Companies may be excited about doing Google-style analytics on all aspects of their business with Hadoop and other "big data" tools, but big businesses are bracing for bigger phone bills as big data is starting to generate big traffic across the distributed operations of enterprises. This is music to the ears of Infineta Systems …
Two corrections to this article
1. Google do not run Hadoop internally. They have Google FS, BigTable, Pregel and other things. The Apache Hadoop stack is evolving to be equivalent, but Google have their own stack, which predates much of Hadoop. The paper gets this right; it's the El Reg journalists who appear confused.
2. Bandwidth after the "MapReduce" stage is normally much less than ingress bandwidth. Hint: the word "reduce". This usually means squeezing log data and the like down to a smaller summary.
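To illustrate why the reduce output is smaller than the input, here is a minimal in-memory sketch of the map/reduce idea (plain Python, not Hadoop's API). It assumes a hypothetical whitespace-separated click-log format of `timestamp user page`. Many click records collapse into one summary row per page.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def map_clicks(lines: Iterable[str]) -> Iterable[tuple[str, int]]:
    """Map: emit (page, 1) for each click-log line.

    Assumes a hypothetical format: "timestamp user page".
    Malformed lines with fewer than three fields are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            yield parts[2], 1


def reduce_clicks(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Reduce: sum the counts for each page into one summary row per page."""
    totals: Counter[str] = Counter()
    for page, count in pairs:
        totals[page] += count
    return dict(totals)


log = [
    "2011-12-30T10:00:01 alice /home",
    "2011-12-30T10:00:02 bob /home",
    "2011-12-30T10:00:05 alice /news",
    "2011-12-30T10:00:09 carol /home",
]
summary = reduce_clicks(map_clicks(log))
print(summary)  # {'/home': 3, '/news': 1}
```

Four input records become two output rows. With real logs, millions of clicks reduce to one row per page (or per hour, per user and so on), so the output is a small fraction of the ingress volume.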
Regarding the ingress/egress bandwidth, if all you are collecting is internal log data, you can predict the data rate (your daily click count, compressed), and its origin (your servers). Click log bandwidth will always be much less than site bandwidth, unless your site is something like bit.ly that just bounces 302 redirects back to the caller, in which case it's probably equal. Provided you keep the web servers near the Hadoop cluster, the cluster ingress bandwidth will be straightforward to handle.
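The "daily click count, compressed" estimate can be worked through as simple arithmetic. This is a rough sketch: the record size, compression ratio and click count below are hypothetical figures chosen only for illustration, and the result is a daily average, so peak rates will be higher.

```python
def ingress_mbit_per_s(clicks_per_day: int, bytes_per_record: int,
                       compression_ratio: float) -> float:
    """Average ingress rate, in megabits per second, for compressed click logs.

    All inputs are assumptions supplied by the caller; this ignores
    daily peaks and protocol overhead.
    """
    seconds_per_day = 86_400
    compressed_bytes = clicks_per_day * bytes_per_record / compression_ratio
    return compressed_bytes * 8 / seconds_per_day / 1_000_000


# Hypothetical site: 100 million clicks/day, 200-byte log records, 5:1 compression.
rate = ingress_mbit_per_s(100_000_000, 200, 5.0)
print(round(rate, 2))  # 0.37
```

Even at that scale the average is well under a megabit per second, which is tiny next to the bandwidth the site itself serves.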
The paper looks specifically at the problem of "classic" (i.e. pre-web) enterprises, whose systems are widely distributed for historical reasons, so intra-enterprise traffic becomes a problem. This is probably the case when the application is itself distributed (telcos, banks). If your servers are scattered across 20 datacentres for historical reasons, you should consolidate them for cost reasons.
Despite these critiques of the article, the paper itself is pretty good.
"A pipe so fat it can satiate even the biggest boxes", oh yes, go on, how long did it take you to think up that smutty innuendo? Finish 2011 as you mean to go on... ;)
Finbar Saunders would be proud of that one.
Xylinx Vertex? I suspect that should be Xilinx Virtex