"data-driven culture"
When I hear "data driven", I take the safety off my Browning.
Microsoft Research and a researcher from North Carolina State University have studied Redmond's internal use of Big Data tools and found that the company needs to change in order to put data to work. And so, probably, does yours. The paper, The Bones of the System: A Case Study of Logging and Telemetry at Microsoft (PDF) …
I didn't know there was an actress called Browning...
From Jimbo's Trivia White Hole
When the Nazis achieved power in 1933, Johst wrote the play Schlageter, an expression of Nazi ideology performed on Hitler's 44th birthday, 20 April 1933, to celebrate his victory. It was a heroic biography of the proto-Nazi martyr Albert Leo Schlageter. The famous line "when I hear the word culture, I reach for my gun", often associated with Nazi leaders, derives from this play. The actual original line from the play is slightly different: "Wenn ich Kultur höre ... entsichere ich meinen Browning!" "Whenever I hear of culture... I unlock my Browning!" (Act 1, Scene 1). It is spoken by another character in conversation with the young Schlageter. In the scene Schlageter and his wartime comrade Friedrich Thiemann are studying for a college examination, but then start disputing whether it is worthwhile doing so when the nation is not free.
Trust of the data requires knowledge of how it is gathered and filtered.
Understanding the data requires knowing what it represents.
You cannot have meaningful data and put it into a form anybody can understand. The "consumers" of the data need to know the significance of what they are looking at.
The only thing this research tells us is that Microsoft is having the same trouble managing data as everyone else does. There is no magic wand to solve data management issues. It requires expertise, knowledge and, most often, experience. No program can replace that.
Learned this some 30 years ago in a statistics class I had to take (which, in retrospect, I'm really grateful for). Unless you do not know exactly how the raw data was gathered and how it was filtered and processed it is meaningless. However, that is no obstacle at all to gently guiding it in the right direction, i.e. having the data show whatever it is that you wanted it to show in the first place. A lot of this actually happens unintentionally, because (for example) having a degree, even a scientific one, doesn't mean per se that you know and understand statistics. (I'm looking at you, medical doctors. Also: economics is not a science, sorry Tim; it rather bears all the hallmarks of religion.)
Data trawling faces the same problems. Given that it is usually done with a business plan in mind, results are bound to be biased at least a bit - if only to justify the cost of the data trawling.
Unless you do not know exactly how the raw data was gathered and how it was filtered and processed it is meaningless.
A gross exaggeration (assuming we correct for the extraneous negative; otherwise it's just nonsense). The value of information does not immediately plummet to zero if you don't know every minuscule detail of its provenance.
This sort of nay-saying is just as bad as the Big Data cheerleading. Yes, it's easy to take a stream of data and turn it into a pretty dashboard that says nothing useful - and, conversely, difficult to do something productive with it. But even noisy, distorted signals that aren't fully understood can be usefully analyzed.
An excellent book on this:
Causality by the great Judea Pearl
And also:
Causation, Prediction, and Search, Second Edition
My biggest disappointment is that I am too slow to actually grok these things in my lifetime.
Trust is something that's hard to earn, but easy to break.
Microsoft is an untrustworthy company. Proven time and again.
Also, Microsoft is the first company to use its operating system (a software monopoly) as a platform to data mine its users. That is morally reprehensible.
If you're using Windows 10, please set up a local account instead of a Microsoft account. Then tweak the various settings to turn off keylogging, telemetry, bandwidth sharing etc.
Does it not depend on how you classify arse? If the individual is one as well as having one, the ratio of Arse:Elbow could be 1:1. They could also be extremely unfortunate and have fallen out of the ugly tree hitting every branch on the way down, resulting in a face like the arse of a donkey as well as being one and having one. The resulting Arse:Elbow ratio being 3:2.
Sorry I am over thinking this, I'll get my coat.
This is a problem of blindly collecting data without any knowledge of its sources' true situation and aims. That makes filtering and processing it very hard, if not impossible, since you can't properly classify it.
For example, Adobe recently changed the Lightroom Import module following an assessment made among "photo enthusiasts who never bought Lightroom before". They found the previous Import module too hard to master. Adobe changed it into a much dumber "experience", enraging many long-time users.
Adobe didn't ask itself 1) why those users never bought Lightroom before, 2) why they couldn't master the Import module, and what their true computer literacy was, 3) why they didn't look for a tutorial/manual, 4) what actual, paying users would have thought.
The result was a "We are very sorry, next time we'll do a larger assessment and listen to more people".
Para 3 in the introduction says to me that data engineering is not enough; it needs experienced people to manage and analyse it.
"The worst-rated problem for all but one activity was combining data from multiple sources (for data science, it was the second-worst problem). Other problems that rate highly include the ease of use of the tools, the amount of clerical work required, the difficulty of getting relevant insights, and the amount of time that the activity requires"
Well, yes. Just look at the helpful links in Event Viewer, if you want to know more about a problem.
I have taken MS on about broken pages when following the suggested links in Event Viewer, when trying to figure out what the problem was (in my experience, about 80% of the time I would get a "Page does not exist" error, or a friendly message stating that MS does not have any information about the problem, with a helpful suggestion to hone my search terms).
In my post I provided them with the necessary links, as well as full details about which events it related to (I just started from the top and listed the first ten - all of which ended up at one of the above).
They replied fairly promptly, asking for links and more info - so this time I took the first fifteen unique events (i.e. ignoring repetitions), which all yielded broken links, plus I pointed out I had initially provided the info they were looking for.
Needless to say, I never received any reply, and the links are still broken. Maybe my suggestion that the website maintainers and the Event Viewer writers get together to sort out the problem did not go down too well.
The funny bit is also that the option to "Search TechNet with Bing" yields very few results, whereas using Google or Bing (ouch!) can return thousands of hits, including many more from TechNet than reported by "Search TechNet with Bing".
For instance, Event 7001 (e) has the following results:
Event Viewer Online Help: no information
Search TechNet with Bing: 50 hits
Bing: 288,000 hits
Google: 186,000 hits
Quite a large discrepancy, I would say.
Edit: I tried to edit the two lines about the results, in order to line up the columns, but it gets ignored, so I changed it to make it more readable (hopefully!).
Is that if they'd conducted this study by collecting and analysing a big data set then their conclusion would have been:
'we can't be sure what the data says, whether what it says is true or if we're even reading it the right way up but what we can conclude for certain is that we need more data to draw reliable conclusions'
Using the distinctly Web 0.01 approach of actually looking to see what's going on feels like foul play.