When it comes to Big Data, I’m as geeked out as the next guy – if not a little more so. For the last three years or so, I’ve been telling anyone who will listen (and plenty of people who won’t) that Big Data and enterprise analytics are the "next big thing" both in business and computing. Today, it’s widely accepted that Big …
Sounds like my last reprocessed baked bean
Sounds like he is describing accountants more than most, with the way they make their books look. Give an accountant worth their salt a data-set and they will make a profit from it one way or another.
But data is like shoes - people are different, but there will always be somebody selling sandle mentality.
Newton's law of universal gravitation is quite a good model based upon "fitting a pattern to some data".
You can make accurate predictions. You don't have to understand why it works. It has stood the test of time.
How do you expect to come up with a theory (model) if you are not allowed to use your observations (data)? Close your eyes and guess?
"[Scientists] simply look at loads and loads of data, fit some patterns to it, and then start making decisions based on that model."
Yup. That is what scientists do. If you come up with a crappy model, perhaps your predictions will also be crappy. There is nothing wrong with the overall approach though.
Fenwick> You can make accurate predictions.
I'd correct that to read: "You can make reasonably accurate predictions only for those cases where the model is at least reasonably correct and the data you plug into that model is at least reasonably correct."
1. Correlation != Causality. (My phone spent half an hour outside Victoria's Secret NOT because I'm interested in buying a bra-and-panty set, but because my feet were tired and there was a handy bench outside the shop.)
2. An accurate model (and the huge number of data points -- some of them likely un-collectable -- needed to make it work) covering complex human behavior is unwieldy at best, incomprehensible at worst.
3. Model-makers must simplify their complex-human-behavior models to make them humanly-usable.
4. When you simplify a complex-human-behavior model, you lose accuracy.
5. Simplifying a complex-human-behavior model in a way which makes it still mostly-usable and reasonably-accurate is partly science, and partly art. It's tricky and difficult work.
6. Gullible execs continue to buy corporate "magic bullets" and "wonder cures".
7. "Let's keep their credit-card information. We could probably do something with that."
Big Data = marketing = cargo cult = fail.
I had a long interesting comment on the subject but fck it your article is worthless and the article it is about is just as worthless.
"I had a long interesting comment on the subject"
Oh I doubt that.
1. keep data which is flowing into your organisation
2. package and sell it / access to it / analysis of it ( to anyone including "fetishists" )
3. see title
What you say is great for nice non-chaotic systems. Try recording the movement of a three-bob pendulum for 100 years, sampled every 1000th of a second. Try to use that data to predict where it will be 15 seconds later (if you are going to say that is possible, what if a bearing fails after 7 of those seconds?). Not only does Correlation != Causality but also Determinism != Predictability. Few systems are as nice and well behaved as gravity.
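A rough illustration of "Determinism != Predictability", for anyone who wants to try it. This is not the three-bob pendulum from the comment above; it swaps in the logistic map, a much simpler textbook chaotic system, and the starting value, the size of the measurement error and the step count are all assumptions picked for the demo. Two runs that start 1e-10 apart follow exactly the same deterministic rule, yet they stop agreeing after a few dozen steps.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x), a fully deterministic rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Hypothetical setup: the "true" start and a measurement of it that is off by 1e-10.
true_run = logistic_trajectory(0.2)
measured_run = logistic_trajectory(0.2 + 1e-10)

# Gap between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(true_run, measured_run)]

for step in (0, 5, 20, 40, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The gap stays tiny for the first few steps and then grows until the two runs have nothing to do with each other. Collecting more history does not fix that; only a more exact measurement of the current state buys a little extra forecast horizon.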
In the UK I have seen that August is warm and February is cold (substitute for correlation). The weather is chaotic and unpredictable to some extent.
I predict that this August will be warmer than next February.
What odds would you require to bet against me?
So monthly patterns are reasonably predictable - you don't need Big Data for that, the last 20 years' average monthly temps would do, not 200 years of 2000 samples an hour. But even that amount of data won't help determine the weather next Wednesday. That is my point: nice general trends can often be spotted without massively huge, detailed data sets. Specific detailed predictions can often not be made even with them. When the volume of sample isn't the issue, Big Data isn't going to help.
I reckon that I could make more accurate predictions of the average August temperature using a larger data set. Or at least quantify my accuracy with a little more certainty.
Talking of trends. If you want to convincingly separate (and quantify) any long term trend in say the temperature, from things like the annual cycle, turbulence and sea surface temperatures, then 200 years of 2000 samples per hour (at 1x1 mm resolution around the whole world) would do nicely.
For some applications the volume of the sample is exactly the issue (for example in accurately estimating the failure rate of bearings).
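A minimal sketch of the "larger data set" point about average August temperatures made above. The numbers are synthetic: the 16.0 C long-run mean, the 1.5 C year-to-year spread and the assumption that years are independent draws from a normal distribution are all made up for illustration, not taken from any real UK record. Under those assumptions the standard error of the estimated mean shrinks roughly with the square root of the number of years.

```python
import math
import random
import statistics

# Hypothetical parameters, chosen only for the demo.
TRUE_MEAN_C = 16.0
YEAR_TO_YEAR_SD_C = 1.5

rng = random.Random(42)  # fixed seed so the run is repeatable

def simulate_august_means(n_years):
    """Draw one synthetic average August temperature per year."""
    return [rng.gauss(TRUE_MEAN_C, YEAR_TO_YEAR_SD_C) for _ in range(n_years)]

def standard_error(samples):
    """Standard error of the sample mean: sample stdev / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

short_record = simulate_august_means(20)
long_record = simulate_august_means(200)

se_short = standard_error(short_record)
se_long = standard_error(long_record)

print(f"20 years:  mean = {statistics.mean(short_record):.2f} C, standard error = {se_short:.2f} C")
print(f"200 years: mean = {statistics.mean(long_record):.2f} C, standard error = {se_long:.2f} C")
```

Ten times the data gives roughly a threefold tighter estimate of the average, which is a diminishing return. It also says nothing about next Wednesday, so both sides of the argument above survive the sketch.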
Big Data: The Tower of Babel AGAIN?
"6 And the Lord said, Behold, the people is one, and they have all one language; and this they begin to do; and now nothing will be restrained from them, which they have imagined to do.
7 Go to, let us go down, and there confound their language, that they may not understand one another's speech."
Lo verily though I am a sinner and secular humanist and liberal peacenik & that most undeserving of creatures derived from Adam's rib, I say unto you: let us retreat into our BOWERS of BABBLE and read snail tracks. (Sorry have been out in the garden where I belong.)
'Big Data' not New Just Cheaper to Do
'Big data' (formerly and sometimes still known as database marketing on very large databases, [enterprise] data warehousing, combined with statistical analyses, no wait that's data mining, I mean analytics)... sigh! Every time a new buzzword comes up, it's usually because some vendor wants to put a new spin on their frankly old products and ideas to distinguish them from the crowd.
Not so long ago, because of the costs of storing and managing large volumes of data, we used to be very selective about which data we tried to collect. The continuing poor quality of much data collected in organisations makes that ever more important. Models built on this unstable mountain will be very unreliable indeed.
But, now that data storage costs have fallen (predictably in accord with Moore's Law), companies want to recklessly capture as much data on their customers as they can. I also believe it has something to do with what I call the Microsoft Market Effect, i.e. if Microsoft seriously enters a product market, it becomes a commodity (software licence costs fall and everyone wants one on their desktop).
As for keeping all data forever, this ignores the very real rules of Data Protection legislation (even in the US, Safe Harbour agreements often mean that large companies need to be cognizant of EU and other international laws) that forbid such a thing. Also, as the business evolves, very old data becomes non-representative of the current business. Retention requirements and archiving policy need to be evaluated on a careful (business) case-by-case basis, not simply because the storage/database vendors tell you that you can store as much as you want or at least more than your competitors ('mine is bigger than yours' springs to mind).
The real risk in all this is that as the fad grows, it exacerbates the demand for truly experienced technologists who have the gravitas to tell their decision makers, "No! This is how we do it the right way so that you spend less and get meaningful results quicker." If decision makers then really want to make automated decisions based on faulty data and even worse models, they risk turning loyal customers into sworn enemies--downward spiral begins. And oh, how I have seen this already beginning. Caveat emptor.
We've lost the plot
The emphasis seems to have shifted away from understanding what data means, and we are now more interested in collecting as much as possible, no matter what. Having lots of data is no good if we forget how to analyse it. Having data isn't the same as knowledge and doesn't imply understanding.