# Craptastic analysis turns 2.8 zettabytes of Big Data into 2.8 ZB of FAIL

We can't seem to get enough of Big Data. In its Digital Universe in 2020 report (PDF), IDC forecasts Big Data-related IT spending to rise 40 per cent each year between 2012 and 2020, as the digital universe, now at 2.8 zettabytes (ZB), or 2.8 trillion GB, explodes to 40 ZB. That's very, very Big Data. It's a pity, therefore, …

This topic is closed for new posts.

#### Delta Tango Oscar

...But the delta between today's 0.5 per cent of actually analysed data and 33 per cent that could be useful...

Delta? What does that even mean? Are there not enough descriptive words in the English language already, that you feel the need to use these American-business-twatspeak nonsenses?

#### Re: Delta Tango Oscar

"Delta" or more specifically "Δ" is quite often used in mathematics/physics to denote a difference, see http://en.wikipedia.org/wiki/Delta_(letter)

A little bit of Google would have saved a bit of embarrassment.

The Delta between what you think you know and what you actually know is probably much larger than what you might care to believe.

#### Re: Delta Tango Oscar

It's Greek, therefore older than America, let alone "American-business-twatspeak nonsense". I first encountered it in school when I learnt calculus. I've also encountered it in the phonetic alphabet which you've perhaps ironically used.

As FDR mentioned, perhaps you'd have preferred "difference", but that wouldn't have accurately fit, given the author could then have been talking about the differences in the data sets, not the difference gap. So yes, it's quite a good word. I use it when describing data sets as just one example.

#### Re: Delta Tango Oscar

I won't pile on other than to say that Delta is a commonly used term in mathematics, business and data management...

#### Re: Delta Tango Oscar

What's worse, like most cheesy business jargon, it was stolen from another, far more interesting field. Usually, they borrow from sports lingo, but in this case it's aerospace/engineering jargon. If you've ever listened to the live comms from a space mission, you'll hear the astronauts referring to "Delta-V" after a course correction burn, where "Delta" refers to change and "V" refers, of course, to velocity.

Like those pasty management clowns have ever done any actual engineering, or been anywhere near the cockpit of a Space Shuttle.

#### Re: Delta Tango Oscar

Yes, a lot of consultese is the taking of a legitimate scientific term with a precise meaning (delta, leverage, etc.) and using it as a synonym for a more general term, in order to make what you are saying sound more interesting, or more scientific, than it is.

#### Re: ...those pasty management clowns...

I really wish that the shuttle wasn't retired. One useful purpose for it would have been to send those pasty management clowns into orbit, sans spacesuit.

My New Year Resolution is simple: assist incompetent manglers to achieve self destruction, any way possible.

I look forward to my next manglement tangle within the next two weeks.

#### Re: Delta Tango Oscar

So you're saying that using the name of an obscure specialist mathematical symbol when speaking to a general audience is more apt than using an everyday English word that means the same thing?

Well, forgive me for being stupid ampersand daft!

Does anyone seriously think there is any value in the majority of digital information generated on the internet? Most of it is tat with little meaning even at the point of generation.

Unless you have some idea of what you want out of the data there is little point in collecting it. As the article points out, signal to noise ratio is pretty poor.

The social network example illustrates what I think is flawed reasoning: "social networks aren’t using [our data] to create for us a social experience that is more like our real world, and frankly more in tune with our human-ness". The social experience comes from our voluntary interactions with each other by means of information exchanges which have little intrinsic value. The social networks don't need to do anything with the data to enhance our experience - our brains can do that, thanks.

Sure, there is information that can be derived from the tat, but I would suggest that continually collecting piles and piles of unprocessed data is not the way to get at it.

'social networks aren’t using [our data] to create for us a social experience that is more like our real world, and frankly more in tune with our human-ness'

Or even our humanity.

Anyway, take away the endless drivel that fills Facebook and all the pictures of cats and that predicted 40ZB becomes a more easily managed couple of gigabytes.

#### We don't wanna miss the big one.

I wonder if part of the reason is that we have some instinctual fear of missing something important. Like a piece of obscure information that in hindsight turns out to be key in determining some important aspect of human behaviour. We don't want to be the one, to use fishing parlance, "to have the one that got away." It's like the old saying: 20/20 hindsight. We're trying to get enough foresight to not fall into that trap. But of course how do you know what's the important bits without experience? We're not clairvoyant.

What exactly is an 'open source data solution'?

#### Re: What exactly is an 'open source data solution'?

Funny, the author of this article works for the company that develops MongoDB and sells MongoDB support packages and whatnot. If I didn't know better, I'd say it was leading towards being an infomercial.

#### Re: What exactly is an 'open source data solution'?

Matt is a spruiker. Almost (there are rare, welcome exceptions) every article written has either directly or indirectly touted the services of the company he works for at the time. For reasons best known only to themselves the reg-itors don't have a problem with this.

most 'big data' is only proprietary stuff like pics, word docs, etc...

Trying to index HUGE quantities of data files takes roughly twice the space or more!! (I am only guessing here, based on work I did 10 years ago...) - your data, a copy for checking, a HUGE CRC to check, plus a massive pointer file to track which data point is which...

Go and find a good quality duplicate checker that checks the WHOLE file, and give it 10GB of files to work on... even today's powerful home PCs would start to choke... :) :)

#### Re: loads of crap data...

Why would you want a duplicate checker to check a whole file? In theory you'd only check files of the same size then check the file up to the first difference (which may be the entire file up to the last byte).

Now, if you wanted to check against any future duplicates you'd select a hashing system that makes sense for the number and size of files you will have (CRC may be fine, or SHA-512 if you want to reduce the chance of collisions), then hash the file as it comes in to your system since that should be the cheapest time to do it. You could then save this info to a database that could handle the comparisons quickly. Just make sure you figure a way to handle deletions and moves correctly.
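A minimal sketch of the approach described above, assuming Python and only its standard library: files are grouped by size first, and only files that share a size are hashed. The names `file_digest` and `find_duplicates` are hypothetical, and SHA-512 is used simply because it was one of the options mentioned. Note that hashing reads every same-size candidate in full rather than stopping at the first differing byte, so this illustrates the "hash on arrival" idea, not the early-exit comparison.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose contents hash identically."""
    # Step 1: bucket files by size; a file with a unique size has no duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with at least one other.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return duplicates
```

In a real system the digests would probably be stored in a database as files arrive, as suggested above, and a byte-for-byte check (for example `filecmp.cmp(a, b, shallow=False)`) could confirm matches if hash collisions are a concern.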

#### Obese Data

Perhaps obese data would be a better name for it: it's so big that it's unhealthy, of no real benefit, a burden for those who have to manage it and threatens to destroy our comfortable way of life.

#### Re: Obese Data

>of no real benefit

Not true. It is of great big benefit to the vendors of big data solutions.

Our industry works in recurring waves of the consensual next big thing, with customers periodically being fleeced by buzzword-toting sales and marketing types, as well as industry pundits.

Every so often, something of value comes out of the new paradigms, but most of it is of dubious value and provides little long term competitive advantage*. Because the subject matter is still evolving, costs are high and implementations likely account for a good percentage of failed projects in those organizations that are not otherwise pathologically incompetent at IT by nature (i.e. government IT wastes money with or without those things).

Especially useful to the vendors when the new paradigm translates into big honking servings of hardware and consulting. CRM, anyone? Real time enterprise? Object repositories?

* straight out of Nick Carr

#### Re: Obese Data

GIGO. Garbage In, Garbage Out. What an old saying, but still true. I am, and I am not, worried about all that garbage. But suppose I have read something about "paedophiles", like I did today at the Register, and suppose I use the word "paedophile" again: will I end up in some register of possible "paedophiles"? What about Al Qaeda or whatever? Am I in a possible terrorist database then, perhaps?

The "state" and me. Are we drifting apart again, or have we done so from the beginning?

Until we can define rules and methods to dig into that garbage, for good reasons, just let that garbage go down the drain. Why the hell are there people who think we should collect all the shit we produce without having even one sensible, or workable, or "enthusiastic" idea or reason to do it and to pay for it?

Enough of that, but then this "Big", Big is never big, it is big only compared to something smaller, big is never big for more than some seconds. Big is nice and honest, he remembers the Titanic and much more.

"Big" will survive as far as big tits or big, ahh, leave it at that, headaches horses.

Data will not.

With my highly advanced capability to be accurate, to the point, lovable, clever.

New line here, because of my strong and humble character.

But now, dear El Reg, as I have proved my ability to take part, and add to that big obese data "mountain" of shit.

Could you not provide us with something easier and faster when it comes to deleting old and wonderful posts?

As it is now, one at a time, I give up: too slow, too much work. Could you not provide us with some "big data features", like first year, last year, every year, or something? By now, and to your great surprise, deleting down-voted posts only came into your head long before it came into my head. Never do that.

Then there is this New Year, tends to happen very regularly, and by now I have been able to show how much big shit I can produce, If I want to, and even if not.

Happy New Year, but tell me, if I click on the rubbish adverts I am provided with, does it help you economically? Do I help you? Are you allowed to express an opinion on this?

#### You want to know what your customers wants?

Why don't you pick up the phone when he's calling you then? Why don't you listen to what he has to say? Why do you have this minefield of procedures to prevent your employees from giving him what he wants? I'll tell you why. Because you're not interested in what your customer wants.

Why are you collecting big data then? Because if you do, you get a big budget, and a big computer, and a busty assistant.

#### Re: You want to know what your customers wants?

Because you have big numbers of customers. If you only had 10 customers it would be much easier to build a unique relationship with them.

#### Re: You want to know what your customers wants?

Naah, even if you have big numbers of customers, you can have a personal relationship with your customer. But most companies don't bother. Banks go out of their way to not have you at the counter. Then they call you up because they have this really interesting "financial product" specifically for you? And when you look at it, their "financial expert" who used to sit at the counter is peddling some crappy standard formula that is never interesting.

Another example: my IT-provider is sending me backdated letters that I'm late paying my bills. Sometimes I am, but most of the time I'm not. Probably these letters are backdated because it takes nearly a week between the letters being generated and going to the post to be sent. Either that or it takes the post a week to deliver them. Which is not unheard of either. Either way the IT-provider takes this delay into account so they decide to send the letter a couple of days before the payment is due. So the payment and the letter cross each other, and on the next invoice they charge me 5€ "administration cost". I would complain, but you can't mail them. You can't send them a letter. No, you have to phone them and then they put you through automated phone hell.

I'm sure if I decided to swap IT-providers they would be fawning all over me, because then I would be a potential customer. But because I already am, they're milking me for every cent.

My conclusion: most companies don't want to know what their customers want. Because if they did, they would have to listen to them, or perhaps even worse, listen to the people at the bottom who actually meet their customers.

Instead they merely pretend they want to know what their customers want. This allows them to set up big surveys, use big budgets and, most importantly, puts upper management in a position to not listen to the morons on the floor, but tell them what to do instead.

I'm sure you agree that this is preferable for everybody involved. Ok, perhaps not for the customer, but he doesn't get to decide. And perhaps not for the people working with that company, but they don't get to decide either.

This is what upper management wants and that's the only thing that matters.

#### would you like a cookie?

"You know EVERYTHING about me, but can you show relevant ads? Nope"

I never understood why the big adbrokers think they know my interests better than I do. None of the adbrokers seem to offer an easy way for me to list the kind of stuff I'm actually interested in and might buy.

#### Re: would you like a cookie?

But that's a little backward. They want to sell you to some advertiser: "Here are 1000 people who might buy your crap."

#### The answer's in there somewhere

But they don't have the right question yet. (What do you get if you multiply six by nine?)

#### Re: The answer's in there somewhere

"What do you get if you multiply six by nine"

69, you a fool?.

#### Re: The answer's in there somewhere

Don't know your Perl operators?

"." doesn't multiply but concatenates. If anything, "6 x 9" is 666666666.

(is that in the 0.5% usefulness or in the 99.5% "stuff" ?)

#### Re: The answer's in there somewhere

You've missed the DNA reference .. the answer is of course, 42. It's the question that is wrong.

#### Hmmm, seems like a bit of a product placement story....

Considering that the author is from a data management vendor.

However, data insights can be incredibly valuable, but you have to get the right data, then get that data structured correctly, and get a cost-effective data analysis platform, and the people to generate insight from the output of that data analysis, and an organizational culture that welcomes those insights.

That's a lot of hoops to jump through on your way to improved business results, versus another IT white elephant.

Big brother, because he seems to be the most fitting icon out of those that are available.....

#### actually thankful for ineptitude in this case

"But, as I've argued before, we're actually quite inept at analysing data, be the data big or small."

Hey, works for me.

Apparently, they're also abjectly incompetent at discerning between real, actual data and the bogus stuff. When setting up my account, I told Facebook I was a thirty-year-old black woman born in Cairo and living in Tripoli. You should see the ads in my right column -- almost all in Arabic, and half of them for cosmetics. B'wah ha ha ha ha hah.

#### Re: actually thankful for ineptitude in this case

I write some fiction. I'm told that if I sell it through Google it will of course be indexed by their search engine, which will be a good thing. I have no idea how anyone will craft a search term which will reliably find fiction about bears parachuting into Nazi Germany, headed for an encounter with a spear and magic helmet. And I really don't want to know what might prompt them into doing so.

Based on the adverts I get, the systems struggle to tell the difference between fiction and reality. I am beginning to wonder if it was Google who sold T. E. Lawrence the motor-bicycle which killed him.

(Yes, it is the double-breasted gray leather outfit, and that is a DeLameter in my pocket. QX?)

#### Re: actually thankful for ineptitude in this case

You should see what my cat is supposed to read.

#### Re: actually thankful for ineptitude in this case

Maybe it was way before my time, but I never understood why the letters QX. Interesting thing about the fiction of that era, though. It would make for an interesting fictitious profile: a pilot with a penchant for precise use of maneuvering thrusters in inert space and a knack for making accurate physics calculations using a slide rule.

#### The Pachyderm In the Premises, or The Elephant In The Room...

"So while the US will spend around \$1.80/GB and Western Europe \$2.50/GB to manage Big Data, China (~\$1.30/GB) and India (~\$0.95/GB) are much lower. As IDC notes, this disparity ... 'also represents differences in the sophistication of the underlying IT, content, and information industries — and may represent a challenge for emerging markets when it comes to managing, securing, and analysing their respective portions of the digital universe.' Open-source data solutions will help to lower the costs of Big Data storage and analysis for all regions, but the far bigger problem is still knowing what to do with all these data. "

So...

01) IT has not solved the problem of "knowing what to do with all these data".

02) IT in the West is more sophisticated than in China, India, etc. because they spend \$1.80-\$2.50/GB to manage "big data" even though they don't know what to do with all that data, while "less sophisticated" China, India, et al, spend less.

To me, that sounds like

1) IT in the West is more sophisticated than in China, India, et al, because they suck up larger amounts of resources that result in no gains for anyone not directly deriving an income from the sale of "big data"-related goods and services. That's sophistication alright!

2) IT in China, India, et al is less sophisticated than in the West, as evidenced by the fact that businesses there spend less money managing "big data" for which no one has figured out a use.

03) IT in China, India, et al will only reach the same level of sophistication as IT in the West when they manage to siphon similar levels of resources from their host businesses as does IT in the West.

04) "IT sophistication" is gauged by the resources directed into IT investments and expenditures which yield no returns to anyone other than those deriving their income from Big Data-related goods and services.

Obviously this particular article, though bearing Matt Asay's byline, was ghost-written by Lewis Carroll.

Perhaps we can set an army of dispossessed urchins to work, foraging for scraps of value in the ever expanding digital landfill.

Even organisations which claim to be totally compliant, and clued up about the data they create and hold, are surprisingly lax when it comes to risk management and forward planning. They may think they've implemented foolproof policies to stream a wide variety of data and file types by value and retention date, but we can always build a better fool.

If businesses and public sector orgs managed their data responsibly from the get go, ICO hits would be rarer than vegetarian lions. Slipshod strategies employed piecemeal by poorly trained and badly informed staff make that an unlikely scenario.

#### Encyclopedia Britannica

I have this vision of monkeys, keyboards, and big data crunching machines -

#### Re: I have this vision of monkeys, keyboards, ...

I could not let that one get past me.

Aren't you describing the crack Windows 8 development team???

</fanboy rant>

they have money, and they love to assume that they are smarter than people with less money. (they call it "experience") but really what they are saying when they say they won't hire you because of a lack of experience, they mean they won't hire you because you have a lack of money.

I'm not surprised business leaders frequently fail to spot trends or analyse data effectively. I have worked with business leaders and they are all scum. they only think in terms of how to make the next penny. even when they have kids you can see they have a terrible outlook on life and their kids are inherently unhappy. business is futile.

#### Re: the trouble is that business leaders are inherently stupid people

if you're rich you will get hired straight out of Oxford even if you only got in cos your dad paid the relevant hush money. that's exactly how business in the UK works. There are people in top universities who can write their own name, can't name the planets, but they can get a job above you in the food chain every day of the week.

#### Re: Can't name the planets

Are you upset that they include Pluto, or that they leave it out?

#### Yes, data is worth money

Data is worth a lot of money given to the right person using the right methods and given reasonable time.

#### How do they know?

How do they know how much data there is and which data isn't being analysed?

For example, I have some image files on a USB drive which I keep hidden from the missus... as far as she knows the data doesn't exist and isn't analysed, yet it does exist and from time to time I give it a very thorough analysis (sometimes twice in one session).
