Posts by Joe Landman

1 publicly visible post • joined 25 May 2013

In-array compute ....

Joe Landman

Processing in memory by any other name would provide as much bandwidth and scalability

I appreciate that Chris is trying to get some comments going, and this is an important topic.

First off, my company has been talking about and developing tightly coupled computing and storage for a long time, far longer than it has been fashionable. So we are biased in this regard, but we are up front about our biases. We are building some of the fastest tightly coupled systems around (pay attention to our web site soon for an example).

Second off, no shared pipe will ever scale with capacity unless you scale the pipe, which you largely cannot do after the pipe is installed. So once you lay down your SAN, you are stuck at FC8 or FC16 until you fork-lift upgrade it.

Third off, as you get more requesters hitting that single shared pipe, guess what happens to your bandwidth per requester. It's not pretty. You add more storage because you need the capacity, but if your bandwidth doesn't scale with that capacity, your data gets colder and colder. In the era of big data, streaming analytics, and massive data flows, this is not a plan that will end well for you.

Tight coupling between storage and computing is absolutely essential for most large scale computing and analytics ... you simply cannot get more bandwidth out of shared pipes of fixed size than you can from a distributed computation across machines, each with massive pipes to its own local storage. Whether you call this putting computing in arrays or arrays in computing doesn't matter. What matters is that designs that cannot scale won't scale.
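To make the contrast concrete, here is a minimal back-of-envelope sketch. All the figures are illustrative assumptions (the 1.6 GB/s is roughly what a single FC16 link delivers, the 2 GB/s per node is an assumed local storage pipe), not measurements from any particular system; the point is only the shape of the two curves.

# Back-of-envelope: a fixed shared pipe divided among N requesters
# versus N nodes each reading from its own local storage pipe.
# All numbers below are assumptions for illustration only.

def per_requester_bw(shared_pipe_gb_s, requesters):
    """Bandwidth each requester sees on a fixed shared pipe."""
    return shared_pipe_gb_s / requesters

def aggregate_local_bw(local_pipe_gb_s, nodes):
    """Aggregate bandwidth when every node has its own local pipe."""
    return local_pipe_gb_s * nodes

shared_pipe = 1.6   # GB/s, assumed usable throughput of one FC16 link
local_pipe  = 2.0   # GB/s, assumed local disk/SSD bandwidth per node

for n in (4, 16, 64):
    print(f"{n:3d} nodes: shared pipe gives {per_requester_bw(shared_pipe, n):.3f} GB/s each "
          f"(aggregate {shared_pipe:.1f} GB/s); "
          f"local pipes give {local_pipe:.1f} GB/s each "
          f"(aggregate {aggregate_local_bw(local_pipe, n):.1f} GB/s)")

The shared pipe's aggregate stays fixed and the per-requester share shrinks as you add nodes, while the tightly coupled design's aggregate grows linearly with node count.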

Considering how fast storage and computing requirements are growing, and the ever expanding height of the storage bandwidth wall (the size of your storage divided by the bandwidth available to read/write it ... a measure of how much time is required to read or write your full capacity once), designs which don't let you scale bandwidth and computing power at the same time as you scale your capacity are rapidly falling by the wayside for many users, who need to process that data in realistic periods of time. Placing the storage firmly at the network's edge exacerbates the problem, thanks to shared pipes and the increasing number of requesters using those pipes.
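The bandwidth wall is easy to compute for yourself. The sketch below just applies the definition given above (capacity divided by bandwidth) with assumed, illustrative numbers: the wall grows without bound when capacity scales but the pipe does not, and stays roughly flat when each added unit of capacity brings its own bandwidth.

# Storage bandwidth wall: time to read (or write) the full capacity once.
# Numbers are illustrative assumptions only.

def bandwidth_wall_hours(capacity_tb, bandwidth_gb_per_s):
    """Hours to stream the full capacity once through the available pipe."""
    capacity_gb = capacity_tb * 1000.0
    return capacity_gb / bandwidth_gb_per_s / 3600.0

# Fixed shared pipe: capacity grows, the pipe does not.
for cap_tb in (100, 500, 2000):
    print(f"{cap_tb:5d} TB over a fixed 1.6 GB/s pipe: "
          f"{bandwidth_wall_hours(cap_tb, 1.6):7.1f} hours to read once")

# Scale-out: assume every 100 TB added brings another 2 GB/s of local
# bandwidth, so the wall stays roughly constant as capacity grows.
for cap_tb in (100, 500, 2000):
    bw = 2.0 * (cap_tb / 100)
    print(f"{cap_tb:5d} TB with {bw:6.1f} GB/s aggregate local bandwidth: "
          f"{bandwidth_wall_hours(cap_tb, bw):7.1f} hours to read once")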

We only see the problem growing worse with the rapid growth of storage capacity and the relatively fixed size of network and fabric pipes. The only way you are going to be able to process all the data you need to process is to put the processing adjacent to massive data pipes. This is what Google, Yahoo, Facebook, LinkedIn, ... are doing, and most other folks, not at their scale, are looking to do smaller versions of it. Hadoop and other key-value processing engines are implicitly built on this idea.

Call it processing in disk or disks in processing, but it's not going away. If anything, it's accelerating.

Joe Landman

CEO

Scalable Informatics

http://scalableinformatics.com