Listen and prepare to behold this vision. Storage arrays will become nearline vaults because storage memory will steal their primary data storage role. There is a thundering great divide, a Grand Canyon, between server memory in the single digit terabyte area and "storage": double digit terabytes and up to multi-petabytes of …
Wednesday 30th January 2013 10:15 GMT Khaptain
The 1st and Primary question
Do we "really" need so much data ?
Some mildly interesting facts on this page about storage requirements in relation to how much space is required for all of the books in "the Library of Congress"
The first figure is the most astounding:
•“Every Six Hours, the NSA Gathers as Much Data as Is Stored in the Entire Library of Congress.”
Wednesday 30th January 2013 13:30 GMT Martin Gregorie
Single level storage is a very OLD idea
This concept was first developed in-house by IBM in about 1970 as 'Future Systems' (FS), originally intended to be a replacement for the System 360/370 mainframes. That project was axed in the mid-1970s before being resurrected in the late 1970s as System/38, which was on sale from 1979 and later morphed into the AS/400 range.
Its key feature was single level virtual storage. All RAM and all disk space were mapped into a single address space, so the only storage access method was virtual memory page reads and writes. There was no separate filing system as we know it because all files were in-memory structures. This worked well and was fast and reliable because RAID 5 disk arrays were used. Replacing disks was very easy - you just migrated disk-resident pages off a disk you wanted to replace. Adding a disk was even easier: plug it in and the load-balancing paging algorithm would start moving pages onto it.
I'm not particularly an IBM fan, but this was one bit of hardware architecture that they got right.
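For anyone who hasn't met the idea, the nearest thing on a modern OS is a memory-mapped file: you touch bytes at an address and the pager does the disk I/O. This is only a loose analogue of single level storage, not how System/38 was implemented, and the function name below is made up for illustration.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE  # page size reported by the OS


def demo_single_level_access() -> bytes:
    """Store bytes via a memory mapping, then read them back via file I/O."""
    with tempfile.TemporaryFile() as f:
        f.truncate(4 * PAGE)  # reserve four pages of backing store
        with mmap.mmap(f.fileno(), 4 * PAGE) as mem:
            # An ordinary slice assignment: no explicit write() call.
            mem[PAGE:PAGE + 5] = b"hello"
            mem.flush()  # ask the OS to write dirty pages back
        # The same bytes are now visible through the file interface.
        f.seek(PAGE)
        return f.read(5)


if __name__ == "__main__":
    print(demo_single_level_access())
```

The difference is that here the mapping is something a program opts into for one file, whereas in a single level store it is the only way storage is addressed at all.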
Wednesday 30th January 2013 13:57 GMT AndrewA
But this future will never come
RAM sizes today mean that I can use in-memory databases for the kind of data problems I was handling in the 90's. I remember my first *million* row database. How quaint it seems today.
Just as system memory sizes have grown over the decades, the size of the problem that they *can* address has also grown, so we are today still surrounded by spinning rust.
Tomorrow will be the same. Just as today's systems are orders of magnitude bigger than those of a decade ago, Big Data is a problem that is orders of magnitude bigger again, and it will continue to require spinning rust.
So I'm not so hopeful that it'll all fit in memory...
Wednesday 30th January 2013 16:18 GMT Frumious Bandersnatch
Re: But this future will never come
"So I'm not so hopeful that it'll all fit in memory..."
There's been a trend in research systems at least towards looking at using RAM to store index information while delegating actual data storage to (flash) disks. FAWN-DS (Fast Array of Wimpy Nodes Data Store), for example, reduces the amount of RAM used by each index entry to 6 bytes, while SILT (Small Index, Large Table) achieves even more compression of those index data (somewhere between 1.5 and 2.5 bytes per index entry, iirc). It also helps that these systems are designed from the ground up to work well with flash storage and avoid the write amplification problem (where a single logical write requires several physical writes because an entire erase block has to be rewritten when a single page changes). I'm not sure how many of these design features are implemented in today's commercial-grade systems (like Hadoop's file system) but I'd wager that there are more similarities than differences.
If you add to this the fact that clustering your storage nodes is relatively easy using consistent hashing (or a DHT) to spread the storage across many nodes/controllers each with their own RAM and local storage, then I think that such a future is actually quite practical today. A lot more practical than you think.
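To show how little is involved in the consistent hashing part, here is a minimal sketch of a hash ring with virtual nodes. The class name `HashRing`, the `vnodes` parameter and the node names are all assumptions for illustration; this is not taken from FAWN, SILT or any particular product.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            p = _point(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def lookup(self, key):
        # First ring position clockwise from the key, wrapping at the end.
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"object-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The point of the ring is that adding a node only pulls keys onto the new node; nothing gets reshuffled between the existing ones, which is what makes growing a cluster cheap.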
Wednesday 30th January 2013 18:50 GMT Tom Maddox
Layers upon layers
All that's happening is the next step in an ongoing evolutionary process. Over the past few decades, the number of intermediate steps between slow storage and fast compute has been growing, with on-die CPU cache, level 2 cache, level 3 cache, system RAM, HBA/controller caching, onboard flash cache, storage array cache, on-drive cache, and now array flash storage providing yet another layer designed to improve the speed of transfer from static storage to active compute. The slowest storage has essentially stagnated, from a speed perspective, merely growing in capacity. The next tier up, "fast" spinning disk, is itself turning into yet another intermediary layer for staging data.
All any of this means is the same as it always has: ultimately, the goal is to touch the disk as little as possible and keep the relatively small amount of data you're actually using somewhere else.
Saturday 2nd February 2013 00:47 GMT Herby
Big question for everyone...
Is 64 bits of address enough?
You only get a range of 16 exabytes (2^64 bytes, about 1.8 x 10^19, more or less).
You never know. IPv6 allows for 128 bits of address, about 3.4 x 10^38, so maybe not?
Of course there is the guy who wanted to be paid by grains on a chess board, one grain then two, then doubling for every succeeding one. Problem was there weren't enough grains in the world to satisfy him, and (as the story goes) he lost his head. Never mind.
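For anyone who wants to check the sums, a few lines of Python will do it. The variable names are just for illustration, and the chessboard total assumes the usual telling of the story (one grain on the first square, doubling across all 64 squares).

```python
# Size of a 64-bit byte-addressed space and of the IPv6 address space.
bytes_64 = 2 ** 64
ipv6_addresses = 2 ** 128

# Chessboard story: 1 + 2 + 4 + ... + 2**63 grains over 64 squares.
grains = sum(2 ** square for square in range(64))

EIB = 2 ** 60  # one binary exabyte (exbibyte)

if __name__ == "__main__":
    print(bytes_64)               # 18446744073709551616
    print(bytes_64 // EIB)        # 16
    print(f"{ipv6_addresses:.2e}")
    print(grains == bytes_64 - 1)  # True
```

So the chessboard total is one short of the number of bytes a 64-bit pointer can address.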