So much data, so little time: How to not flip your wig processing it

The author has no clue about what he attempts to talk about.

So many errors that it would take a rebuttal article to correct the original.

Free clue: batch processing for summaries has to occur at some point when the data is stable. You can do this periodically throughout the day or at night. Note that if you're a global company, night is relative.

Also, if you aren't going to use the data, it makes no sense to spin cycles computing averages and KPIs for no reason.

The real issue is how the data is delivered. Some data comes in throughout the day in flat files, so you have to wait until 'end of day', or until all sources have delivered their data. Then you have data which is streamed. This data can be used to generate running totals, averages or other KPIs as it arrives.
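
To make the streamed case concrete, here is a minimal sketch of a running total and average that updates as each value arrives, so nothing has to wait for 'end of day'. The class name RunningStats and the sample readings are made up for illustration; a real feed would replace the tuple in the loop.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Keeps a running count, total and mean without storing the stream."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> None:
        # Fold one new streamed value into the running figures.
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        # Report 0.0 before any data has arrived instead of dividing by zero.
        return self.total / self.count if self.count else 0.0


# Hypothetical readings standing in for a live stream.
stats = RunningStats()
for reading in (10.0, 20.0, 60.0):
    stats.update(reading)

print(stats.count, stats.total, stats.mean)
```

The flat-file case is the opposite: you can't trust the figures until the last source has delivered, which is why it ends up as a batch job.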

But the author is correct in that most people today only know RDBMSs. We've since flipped back to hierarchical structures, and unless you're old enough to have been taught COBOL, or have worked with a Pick system (Revelation, U2, etc.), or have converted IMS, you really don't know much about them.

Or have spent time working with the newer tools, where you have field and record separators in Hive.
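
For anyone who hasn't seen it, a rough sketch of what separator-based nesting looks like. It assumes Hive's default text-file delimiters as I remember them (Ctrl-A, i.e. \x01, between fields, \x02 between items inside a field, newline between records); the function name parse_records and the sample data are made up for illustration.

```python
FIELD_SEP = "\x01"   # assumed default field separator (Ctrl-A)
ITEM_SEP = "\x02"    # assumed default separator for items within a field
RECORD_SEP = "\n"    # one record per line


def parse_records(raw):
    """Split raw text into records, fields, and multi-valued items."""
    records = []
    for line in raw.split(RECORD_SEP):
        if not line:
            continue  # skip the empty piece after a trailing newline
        # Each field becomes a list, so multi-valued fields nest naturally.
        records.append([field.split(ITEM_SEP) for field in line.split(FIELD_SEP)])
    return records


# Hypothetical sample: a name field, then a multi-valued colour field.
sample = "alice\x01red\x02blue\nbob\x01green\n"
parsed = parse_records(sample)
print(parsed)
```

A Pick record is laid out on the same principle with its own mark characters, which is the point: one record can carry repeating values without a join.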

But I digress. Maybe the author should learn something before he writes an article?

