Hype alert; hype alert; Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought leader bullshit. What the heck is this big data thing? EMC says it's to do with handling data at the petabyte scale, where things like …
If Theodore Sturgeon was right...
...when he said 90% of Sci-Fi was crap because 90% of everything is crap, then 90% of the petabytes stored in the "cloud" is crap too.
I think Theodore was on the right lines...
As 90% of Reg articles are crap as well.
Come on kids, THEIR / THERE ... really?
All those SMS messages and Twitter feeds from people making posts about their pets have to be kept somewhere while the Gov sifts them looking for evidence of terrorist activity!
Is it deduplicated? 5 million people posting the same damned lolcat is a hell of a dedupe ratio...
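To put a rough number on that, here is a minimal sketch of whole-object, content-hash deduplication. It assumes every copy of the image is byte-identical (re-encoded or resized copies would not match), and the function name and sample bytes are made up for illustration; no particular product is claimed to work this way.

```python
import hashlib

def dedupe_ratio(blobs):
    """Return (logical_bytes, stored_bytes, ratio) for whole-object dedupe.

    Assumption: two blobs are duplicates only if their bytes are identical.
    """
    unique_sizes = {}  # SHA-256 digest -> size of the single stored copy
    logical = 0
    for blob in blobs:
        logical += len(blob)
        digest = hashlib.sha256(blob).hexdigest()
        unique_sizes.setdefault(digest, len(blob))
    stored = sum(unique_sizes.values())
    ratio = logical / stored if stored else 0.0
    return logical, stored, ratio

# Hypothetical stand-ins for real image bytes.
lolcat = b"placeholder bytes standing in for one lolcat image"
other = b"a different picture"

# 5,000 identical posts plus one different one (scaled down from 5 million).
posts = [lolcat] * 5000 + [other]
logical, stored, ratio = dedupe_ratio(posts)
print(f"logical={logical} B, stored={stored} B, ratio={ratio:.0f}:1")
```

With N identical copies and nothing else, the ratio is simply N:1, so 5 million identical posts would dedupe at about 5,000,000:1 before any metadata overhead.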
Not sure on the value of de-dupe in this space. I could understand wanting to de-duplicate data in the ETL layer, although a lot of products there are DB-based rather than file-based. However, since the aim is for a 'single source of truth' via third normal form, where's the value of de-dupe in the DW? Assuming the reporting tier is ROLAP (as part of that single source of truth that everyone's striving for), there's very little data there apart from cube dimensions.
I suppose you might want limited MOLAP for performance reasons, then de-dupe that, but that ought to happen at the DB level, surely?
Big (file) data is very real
Chris, come spend a day with Isilon and you'll see that 'big data' is very real. It's something we talk to customers about every day. We can talk about the members of our 10PB club too!
There's big data ..
and there's the LHC - in a different league
The Large Hadron Collider will produce roughly 15 petabytes (15 million gigabytes) of data annually
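For a sense of scale, a quick back-of-the-envelope check of that figure. It assumes decimal units (1 PB = 10^15 bytes) and, unrealistically, that the data arrives evenly over a 365-day year, so the rate below is only an average, not a peak.

```python
PB = 10**15                      # decimal petabyte, in bytes (assumption)
GB = 10**9                       # decimal gigabyte, in bytes
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_bytes = 15 * PB
annual_gb = annual_bytes // GB   # checks the "15 million gigabytes" figure

# Average sustained rate if production were spread evenly over the year.
avg_rate_mb_s = annual_bytes / SECONDS_PER_YEAR / 10**6

print(f"{annual_gb:,} GB per year")
print(f"about {avg_rate_mb_s:.0f} MB/s averaged over the year")
```

That works out to a little under half a gigabyte per second, around the clock, all year.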
RainStor's goal is to de-dupe Big Data
Great article as usual Chris and thanks for the mention.
As you pointed out, RainStor de-dupes structured data without sacrificing the original form. We preserve the immutable structure of the data while magically de-duplicating the values so that the footprint physically shrinks 40:1 or more.
We are all about taking petabytes of data and reducing it to terabytes thereby allowing limitless amounts of data to be stored at the lowest possible cost.
Many thanks again for the article and the reference to RainStor.
VP Product Management
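As a sanity check on the "petabytes to terabytes" claim, a small sketch of what a flat 40:1 reduction would mean. The ratio is the figure quoted in the comment above; real ratios depend on the data, and the helper name is made up for illustration.

```python
def reduced_footprint_tb(raw_pb, ratio=40):
    """Footprint in TB after a flat reduction ratio (decimal units, 1 PB = 1000 TB)."""
    return raw_pb * 1000 / ratio

for raw_pb in (1, 10, 15):
    print(f"{raw_pb} PB at 40:1 -> {reduced_footprint_tb(raw_pb):.0f} TB")
```

So 1 PB would come down to 25 TB, and a 10 PB estate to 250 TB, if the quoted ratio held across the whole data set.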