The future of storage
This topic was created by Chris Mellor.
I've been talking to Jean-Luc Chatelaine, EVP strategy & technology for DataDirect Networks, and I'd like to check his view of things with you, as I found it surprising.
He thinks that, starting 2014 and gathering pace in 2016, we're going to see two tiers of storage in big data/HPC-class systems. There will be storage-class memory built from NVRAM, post-NAND stuff, in large amounts per server, to hold the primary, in-use data, complemented by massive disk data tubs, ones with an up to 8.5-inch form factor and spinning relatively slowly, at 4,200rpm. They will render tape operationally irrelevant, he says, because they could hold up to 64TB of data with a 10 msec access latency and 100MB/sec bandwidth.
He claims contacts of his in the HDD industry are thinking of such things and that it would be a disk industry attack on tape.
So .... what do you think of JLC's ideas?
Why … kick a dead horse? Tape is not [a] growing business. The smart tape vendors, like Fujitsu with CentricStor, are not enjoying great success. I bet on “cheap” disks with … mirroring in de-clustered RAID [or] Erasure Coding. [For example] 2.5-inch HDDs (2020 - 12TB, 3.5-inch - 60TB), more platters, all SAS.
Some more details:
The vanilla tape market is shrinking constantly; the only way to slow that is by using smart tapes such as virtual tape libraries (VTLs). The current VTL vendors are not selling them in quantities that can stop the erosion of the tape market.
“Cheap” disks connected in a RAIN (Redundant Array of Independent Nodes) structure.
The 2.5-inch will continue as enterprise HDDs, not the “cheap” line; the 3.5-inch will continue as “Nearline” capacity disks. Going back to large form factors such as the 8.5-inch means more energy, noise and space, not fitting standard racks, super long rebuild times, etc.
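To put a rough number on the rebuild-time point, here is a back-of-the-envelope sketch. It assumes the 64TB capacity and 100MB/sec bandwidth figures quoted at the top of the thread, decimal units, and a rebuild that streams the whole drive at full sequential bandwidth (an optimistic assumption; real rebuilds compete with production I/O). The function name is made up for illustration.

```python
def full_drive_transfer_hours(capacity_tb: float, bandwidth_mb_per_s: float) -> float:
    """Hours needed to read or write an entire drive sequentially (decimal units)."""
    capacity_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB
    seconds = capacity_mb / bandwidth_mb_per_s
    return seconds / 3600


# Figures quoted earlier in the thread: 64TB drive, 100MB/sec bandwidth.
hours = full_drive_transfer_hours(64, 100)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
# prints "177.8 hours, about 7.4 days"
```

So even in the best case a single drive takes about a week to refill, which is why de-clustered RAID or erasure coding (spreading the rebuild across many drives) matters at these capacities.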
What Jean-Luc Chatelaine says is interesting from a hardware perspective but it also reinforces our belief that the days of special purpose hardware are gone. Everything has to be software defined. Even in his scenario, the storage will need to be able to determine which combination of tier one or tier zero and slow disks is correct for each use case and how to configure them.
Nexenta agrees with Steve Herrod, chief technical officer at VMware, when he talks of storage being "virtualised into a software service that can be fully automated and deployed as needed."
None of us want a world where individual silos of very specialised hardware are used to treat single purpose applications. To quote Herrod again: "You want to think about your data centre as an entire pool of resources that can be safely and efficiently assigned to any sort of application including those that exist today."
> So .... what do you think of JLC's ideas?
As mentioned in the comments, the problem with this "tape is dead" stuff is that it is mostly wrong. It doesn't make economic sense, as Amazon Glacier is demonstrating. Static, mostly untouched data, if it can be decoupled (unstructured data), should be on the cheapest tier. Sure, there are long-stale records in databases, but they are not easily decoupled. Why would you pay 10 times as much to store it (disk versus tape) if you rarely if ever access it? Especially if there is LOTS of it.
Secondly, there is that whole nasty DR component. If the data isn't being spun off to tape and stored elsewhere, what about DR? I've been in earshot of just that discussion when these tapeless solutions are described. You need two of them, right? Remote enough that they don't both get wiped out, right? Now you need bandwidth to keep them in sync. That's the biggest and ugliest problem in this whole "tape is dead" scenario. Not everyone is an Amazon or Google that owns its own massive pipelines.
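The bandwidth point is easy to sketch. The numbers below are purely hypothetical (nobody in this thread quoted a change rate), and the function name is invented; it only shows how the sustained link speed falls out of the daily change volume and the time allowed to ship it.

```python
def required_link_mbit_per_s(daily_change_tb: float, window_hours: float) -> float:
    """Sustained link speed (Mbit/s) needed to replicate one day's changed data
    within the given window. Decimal units, no protocol overhead, no dedupe
    or compression: a deliberately simple lower bound."""
    bits = daily_change_tb * 1e12 * 8   # TB -> bytes -> bits
    seconds = window_hours * 3600
    return bits / seconds / 1e6         # bit/s -> Mbit/s


# Hypothetical site: 10TB of changed data per day, replicated around the clock.
print(f"{required_link_mbit_per_s(10, 24):.0f} Mbit/s")
# prints "926 Mbit/s"
```

In other words, a made-up 10TB/day site already needs close to a dedicated gigabit link running flat out, before any overhead.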
answer and some more
As explained above, I didn’t use the right words. Despite the benefits of tape, such as the lowest price per GB, the lowest energy consumption per GB, portability, the fastest development in bit density and new features such as WORM, encryption and the Linear Tape File System (LTFS), the tape market is constantly shrinking. The major reason is the use of nearline disks with deduplication as backup targets. The idea of virtual tape libraries, which combine the benefits of disk (as a cache) with tape on the back end, didn’t achieve great success outside the mainframe market and is not able to stop the erosion of the tape market.
Currently most storage control units (CUs) are based on the same technology as servers: multi-core Intel chips. In fact, multi-core is used much more effectively in CUs than in servers. Multi-core technology and server virtualization bring some other developments to watch, such as Virtual SAN Appliances (VSAs) and embedded applications on storage control units. The first makes the server act as a storage CU; the second uses the storage CU to run applications. The VSA:
Emulates a shared block-access storage area network (SAN) system on internal DAS storage (running in a virtualization partition)
The server software acts as the SAN array controller software
A pair of servers with a large amount of direct-attached storage (DAS), accessed by other servers across a network
Pioneered by LeftHand Networks, which evolved into HP's StoreVirtual VSA; other vendors include NetApp, OnApp, Nexenta and StorMagic
The Mellanox Storage Accelerator VSA product, accessed over Ethernet or InfiniBand, supports DAS and SAN and promises better performance
Saves the cost of HBAs, switches and a physical CU
The usage of embedded (integrated) applications:
Remote replication (RecoverPoint on VMAX 40K, 20K)
Compression (Real-time Compression on IBM Storwize V7000, SVC)
Drive encryption (EMC, HDS, IBM high-end subsystems)
Server-less, LAN-less backup
From a physical infrastructure perspective, the two-tier storage model (fast SSD + deep SATA) will become more prevalent in data-centric applications (including big data and HPC). Applications that traditionally relied on tape, such as backup and disaster recovery, are increasingly using disk-based solutions.
What is more interesting is how these disparate tiers of storage will be abstracted and delivered as services. Rather than having capabilities bundled in the hardware, software-defined storage will offer administrators the ability to programmatically control how physical resources are deployed, configured and managed. The control plane, rather than sitting within a single storage box, is integrated across the entire storage system.
Tighter integration between applications and software-defined storage will allow appropriate storage media (SSD, SATA) and configurations (e.g., RAID levels, erasure coding) to be used according to application needs. Also, since flash drives can be used as cache as well as data, SDS systems will allow administrators to programmatically control whether an SSD drive is to be used as data or cache, and set the priority levels for different applications. As software-defined storage gains momentum, the opportunities to leverage the system's flexibility and simplicity are huge.
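As a toy illustration of what "programmatically control" could mean here, the sketch below maps an application's needs to a media type, a protection scheme, an SSD role and a priority. Every name in it (`StoragePolicy`, `choose_policy`, the specific RAID and erasure-coding choices, the priority numbers) is hypothetical and not any vendor's API; it only shows the shape of the decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Media(Enum):
    SSD = "ssd"
    SATA = "sata"


class SsdRole(Enum):
    DATA = "data"    # SSD holds the primary copy
    CACHE = "cache"  # SSD fronts slower disks


@dataclass(frozen=True)
class StoragePolicy:
    media: Media                 # where the primary copy lives
    protection: str              # e.g. "raid10" or "erasure-coding"
    ssd_role: Optional[SsdRole]  # None means no SSD is assigned
    priority: int                # 1 = highest


def choose_policy(latency_sensitive: bool, capacity_heavy: bool) -> StoragePolicy:
    """Pick a policy from two coarse application traits (illustrative rules only)."""
    if latency_sensitive and not capacity_heavy:
        return StoragePolicy(Media.SSD, "raid10", SsdRole.DATA, priority=1)
    if latency_sensitive:
        # Too big to live on flash: keep data on SATA, use SSD as cache.
        return StoragePolicy(Media.SATA, "erasure-coding", SsdRole.CACHE, priority=2)
    return StoragePolicy(Media.SATA, "erasure-coding", None, priority=3)


print(choose_policy(latency_sensitive=True, capacity_heavy=True))
```

A real SDS control plane would work from measured workload data rather than two booleans, but the idea is the same: the SSD-as-data versus SSD-as-cache decision becomes a policy output rather than a property of the box.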
There still will be tiered storage, just SATA will phase out
Companies have an obligation to keep records for at least 7 years, and in many cases in the financial sector you have a 14-year retention period. You don't want to keep this data relatively easily accessible (read: deletable), so I would think a tape in a fireproof vault is still the most appropriate.
I would rather see spinning disk disappear and be replaced with PCM or other chips. Imagine you have 365 generations of backup, each of which you can access almost as immediately as the actual data. Wouldn't this help other trends like true 24/7 computing?
Continuing this thought, if we now have solid state chips of, let's say, 2PiB, why not just put them directly on the PCI bus, omitting the need for a storage controller, or better, put the SSD directly on an adapted memory bus?
To be fair, things are moving that way already. If you look at 3Par, Compellent, EMC and some of the up-and-coming players such as Tintri or Nutanix, they're all adopting an SSD-first approach. The themes come from two angles. One is auto-tiering: shuffling blocks/chunks of data to their final resting place on disk, with varying degrees of granularity and regularity. The other is a tiered cache approach, from PCI-E in the server through to extending cache with SSD in the array. Some vendors are proposing caching appliances on the fabric. It's interesting, as the former approach lends itself to trending IO workload over some period of time to figure out where to move blocks, and the latter is more for bursty workloads. I think the approach taken will also depend in part on how application storage workloads evolve, as today some make good use of cache and others don't; some workloads can be trended, others not so much (such as VDI in a non-persistent environment, as the LBAs used for desktops are trashed and new addresses are used when desktops are refreshed).