Microsoft's Big Data-driven improvement efforts flounder

Microsoft Research and a researcher from North Carolina State University have studied Redmond's internal use of Big Data tools and found that the company needs to change in order to put data to work. And so, probably, does yours. The paper, The Bones of the System: A Case Study of Logging and Telemetry at Microsoft (PDF) …

  1. Destroy All Monsters Silver badge

    "data-driven culture"

    When I hear "data driven", I take the safety off my Browning.

    1. Anonymous Coward

      Re: "data-driven culture"

      The firearm or the actress?

      1. Destroy All Monsters Silver badge

        Re: "data-driven culture"

        I didn't know there is an actress called Browning...

        From Jimbo's Trivia White Hole

        When the Nazis achieved power in 1933, Johst wrote the play Schlageter, an expression of Nazi ideology performed on Hitler's 44th birthday, 20 April 1933, to celebrate his victory. It was a heroic biography of the proto-Nazi martyr Albert Leo Schlageter. The famous line "when I hear the word culture, I reach for my gun", often associated with Nazi leaders, derives from this play. The actual original line from the play is slightly different: "Wenn ich Kultur höre ... entsichere ich meinen Browning!" "Whenever I hear of culture... I unlock my Browning!" (Act 1, Scene 1). It is spoken by another character in conversation with the young Schlageter. In the scene Schlageter and his wartime comrade Friedrich Thiemann are studying for a college examination, but then start disputing whether it is worthwhile doing so when the nation is not free.

        1. Fungus Bob

          Re: "data-driven culture"

          Emily Browning

          http://www.imdb.com/name/nm0115161/?ref_=fn_al_nm_1

          BTW, thanks for the history lesson :)

  2. Pascal Monett Silver badge
    Holmes

    Nothing really surprising in those results

    Trust of the data requires knowledge of how it is gathered and filtered.

    Understanding the data requires knowing what it represents.

    You cannot have meaningful data and put it into a form anybody can understand. The "consumers" of the data need to know the significance of what they are looking at.

    The only thing this research tells us is that Microsoft is having the same trouble managing data as everyone else. There is no magic wand to solve data management issues. It requires expertise, knowledge and, most often, experience. No program can replace that.

    1. allthecoolshortnamesweretaken

      Re: Nothing really surprising in those results

      Learned this some 30 years ago in a statistics class I had to take (which, in retrospect, I'm really grateful for). Unless you do not know exactly how the raw data was gathered and how it was filtered and processed it is meaningless. However, that is no obstacle at all to gently guiding it in the right direction, i.e. having the data show whatever it is that you wanted it to show in the first place. A lot of this actually happens unintentionally, because (for example) having a degree, even a scientific one, doesn't mean per se that you know and understand statistics. (I'm looking at you, medical doctors. Also: economics is not a science, sorry Tim, it rather bears all the hallmarks of religion.)

      Data trawling faces the same problems. Given that it is usually done with a business plan in mind, results are bound to be biased at least a bit - if only to justify the cost of the data trawling.

      1. Michael Wojcik Silver badge

        Re: Nothing really surprising in those results

        Unless you do not know exactly how the raw data was gathered and how it was filtered and processed it is meaningless.

        A gross exaggeration (assuming we correct for the extraneous negative; otherwise it's just nonsense). The value of information does not immediately plummet to zero if you don't know every minuscule detail of its provenance.

        This sort of nay-saying is just as bad as the Big Data cheerleading. Yes, it's easy to take a stream of data and turn it into a pretty dashboard that says nothing useful - and, conversely, difficult to do something productive with it. But even noisy, distorted signals that aren't fully understood can be usefully analyzed.

        1. Destroy All Monsters Silver badge
          Headmaster

          Re: Nothing really surprising in those results

          An excellent book on this:

          Causality by the great Judea Pearl

          And also:

          Causation, Prediction, and Search, Second Edition

          My biggest disappointment is that I am too slow to actually grok these things in my lifetime.

    2. fajensen

      Re: Nothing really surprising in those results

      Big Data == Ad Hock-cracy!

    3. Anonymous Coward

      Re: Nothing really surprising in those results

      Trust is something that's hard to earn, but easy to break.

      Microsoft is an untrustworthy company. Proven time and again.

      Also, Microsoft is the first company to use its operating system (a software monopoly) as a platform to data mine its users. That is morally reprehensible.

      If you're using Windows 10, please set up a local account instead of a Microsoft account. Then tweak the various settings to turn off keylogging, telemetry, bandwidth sharing etc.

  3. Anonymous Coward

    Progress

    arse:elbow

    1:2

    That seems to be the usual ratio; what to make of it, though...

    1. Anonymous Coward

      Re: Progress

      We need more statistics!

      COLLECT IT NOW! SHINGLED DISKS ARE CHEAP!

    2. DwarfPants
      Coat

      Re: Progress

      Does it not depend on how you classify arse? If the individual is one as well as having one, the ratio of Arse:Elbow could be 1:1. They could also be extremely unfortunate and have fallen out of the ugly tree hitting every branch on the way down, resulting in a face like the arse of a donkey as well as being one and having one. The resulting Arse:Elbow ratio being 3:2.

      Sorry, I am overthinking this. I'll get my coat.

  4. Quortney Fortensplibe

    Knock us down with a vulture feature..

    Unless you were trying to be too clever by far, I bet you meant "feather"

    1. Vic

      Re: Knock us down with a vulture feature..

      Unless you were trying to be too clever by far, I bet you meant "feather"

      When you hear that whooshing noise, look up. You might just catch sight of the joke...

      Vic.

  5. Anonymous Coward

    Blindly collecting data won't help you.

    This is a problem of blindly collecting data without any knowledge of their source, true situation and aims. Filtering and processing them then becomes very hard, if not impossible, because you can't properly classify them.

    For example, Adobe recently changed the Lightroom Import module following an assessment made among "photo enthusiasts who never bought Lightroom before". They found the previous Import module too hard to master. Adobe changed it into a much dumber "experience", enraging many long-time users.

    Adobe didn't ask itself 1) Why those users never bought Lightroom before 2) Why they couldn't master the Import module, and their true computer literacy 3) Why they didn't look for a tutorial/manual 4) What actual, paying users would have thought.

    The result was a "We are very sorry, next time we'll do a larger assessment, and listen to more people".

  6. Anonymous Coward

    If the data comes from flaky 3rd-party JavaScript trackers then no, I don't trust it. Besides, the typical software project has a developer deficit and a massive bug surplus. Who's got time to sift through crap data looking for more?

  7. yokel

    Just providing data doesn't work

    Para 3 in the introduction says to me that data engineering is not enough, and that it needs some experienced people to manage and analyse it:

    "The worst-rated problem for all but one activity was combining data from multiple sources (for data science, it was the second-worst problem). Other problems that rate highly include the ease of use of the tools, the amount of clerical work required, the difficulty of getting relevant insights, and the amount of time that the activity requires"

    1. Kobus Botes
      FAIL

      Re: Just providing data doesn't work

      Well, yes. Just look at the helpful links in Event Viewer, if you want to know more about a problem.

      I have taken MS on about broken pages when following the suggested links in Event Viewer, when trying to figure out what the problem was (in my experience, about 80% of the time I would get a "Page does not exist" error, or a friendly message stating that MS does not have any information about the problem, with a helpful suggestion to hone my search terms).

      In my post I provided them with the necessary links, as well as full details about which events it related to (I just started from the top and listed the first ten - all of which ended up at one of the above).

      They replied fairly promptly, asking for links and more info - so this time I took the first fifteen unique events (i.e. ignoring repetitions), which all yielded broken links, plus I pointed out that I had initially provided the info they were looking for.

      Needless to say, I never received any reply, and the links are still broken. Maybe my suggestion that the website maintainers and the Event Viewer writers get together to sort out the problem did not go down too well.

      The funny bit is also that the option to "Search TechNet with Bing" yields very few results, whereas using Google or Bing (ouch!) can return thousands of hits, including many more from TechNet than reported by "Search TechNet with Bing".

      For instance, Event 7001 (e) has the following result:

      Event Viewer Online Help - No information

      Search Technet with Bing - 50

      Bing - 288000

      Google - 186000

      Quite a large discrepancy, I would say.

      Edit: I tried to edit the two lines about the results, in order to line up the columns, but it gets ignored, so I changed it to make it more readable (hopefully!).

  8. Anonymous Coward

    The irony here

    Is that if they'd conducted this study by collecting and analysing a big data set then their conclusion would have been:

    'we can't be sure what the data says, whether what it says is true or if we're even reading it the right way up but what we can conclude for certain is that we need more data to draw reliable conclusions'

    Using the distinctly Web 0.01 approach of actually looking to see what's going on feels like foul play.
