Block storage is dead, says ex-HP and Supermicro data bigwig

Block storage is dead, object storage can be faster than file storage, and storage-class memory will be the only local storage on a server. So said Robert Novak ... but who is he? Novak was until recently a Distinguished Technologist in the HP servers' Hyperscale Business Unit and has had an interesting employment history, …

  1. chrismevans

    Kinetic drives are clever, but....

    There are a few points to consider here.

    1. You still need a map/metadata to track which blocks on a device are in use and which aren't. Nothing changes there compared with traditional storage.

    2. If the device (an HDD) makes decisions on where to locate objects, then you have no control over performance. Techniques like prefetching become unusable because you can't take advantage of head proximity or the position of data on a track.

    3. Flash makes Kinetic drives obsolete and pointless, as any 4K block can be stored/retrieved with (typically) equal performance (barring device garbage collection).

    Kinetic-style drives will only be useful when the drive itself has a greater degree of autonomy, i.e. when the drive can replicate its own data to another drive without involving the host. I think that's a while away.

  2. Anonymous Coward

    really?

    "In the early days of disk drives, the fastest access was when the data was in contiguous blocks that could be read sequentially off the disk drive"

    Not true. The processor couldn't keep up with a continuous read from the disk, so the placement of data onto the disk, when optimized, took the processor's limitations into account. He, of all people, should remember this.

    ...ninja'd again...

    1. JacobZ

      Well done. Re: really?

      You are correct, sir.

      I am old enough to remember the days when interleaving was a standard part of laying out a disk, and getting the skip factor right was critical to performance.
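
      For anyone who never had to do it, here's a rough sketch of what the skip factor actually did (a toy model with made-up numbers: a 17-sector track and a 3:1 interleave, in Python purely for illustration):

        # With N sectors per track and an interleave factor k, logical sector i is
        # placed k physical slots after logical sector i-1, so a slow controller/CPU
        # has time to digest each sector before the next one passes under the head.
        def interleave_layout(sectors_per_track=17, factor=3):
            track = [None] * sectors_per_track
            pos = 0
            for logical in range(sectors_per_track):
                while track[pos] is not None:      # skip to the next free physical slot
                    pos = (pos + 1) % sectors_per_track
                track[pos] = logical
                pos = (pos + factor) % sectors_per_track
            return track

        # 3:1 interleave on a 17-sector track:
        # [0, 6, 12, 1, 7, 13, 2, 8, 14, 3, 9, 15, 4, 10, 16, 5, 11]
        print(interleave_layout())

      Get the factor wrong and the head misses the next logical sector every revolution, which is why tuning it mattered so much.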

    2. admiraljkb
      Coat

      Re: really?

      Ah, the good old days. I remember swapping an i8088 CPU out for an NEC V20 just to get enough of a performance boost to bring the interleave down to 3:1 on my 20MB MFM HD. Reports ran much faster after that... *sigh* Does this mean I need to start yelling "get off my lawn" at all the neighborhood kids?

      1. Destroy All Monsters Silver badge
        Windows

        Re: really?

        That's positively STEAMPUNK!

      2. Colin Tree

        Re: really?

        Internal double data paths gave the V20 the edge, and RLL 2,7 encoding upped the storage capacity and transfer speed on the old MFM drives. Nearly wet myself with excitement.

      3. jelabarre59

        Re: really?

        > Does this mean I need to start yelling "get off my lawn" to all the neighborhood kids?

        More like "get your Surfaces and iPads off my lawn"

    3. sailnfool

      Re: really?

      You are right that I neglected the interleaving of blocks of data on the device (we used to time instruction sequences for the interleave time on drums), since that quickly dropped into a hardware rather than software paradigm to manage the interleaving. From a pure application standpoint, the blocks were sequentially allocated, even if interleaved. Thanks for the reminder on the interleave.

  3. AndrueC Silver badge
    Boffin

    inodes (now called metadata)

    Are they? That's news to me. The term 'metadata' is relatively new but inodes are still called inodes. All that might have changed is that we now classify them as metadata whereas that term might not have been around last century.

    Tyres are still called tyres even though we might classify them as 'vehicle parts' ;)

    And anyway as others have said block storage will still exist. It'll just (perhaps) be buried where most people don't see it. Then again most programmers probably don't see it now. I get my data via an ORM most of the time and it's already 'objected'.

    1. Anonymous Coward

      Tyres - aren't these classified as road/vehicle interfaces?

    2. fluffybunnyuk

      I still use binary block storage with no FAT/inode table on 7 drives right now, plus some large tapes that are just binary blocks with no headers. Not dead yet...

  4. Yaron Haviv

    Object can be faster, but UDP?

    I agree with most of the points.

    Yes, object can be faster than block and is the future. Most block vendors use some form of versioned B-tree, while object or K/V stores use faster hashing. The current implementations of object over file are not really efficient, since they double the overhead and add a slow HTTP protocol in front.

    Yes, we need metadata, and it can be coded in yet another K/V set. K/V is not an alternative to object, which has indexed and extensible metadata, security, management, tiering, EC/RAID/DR, etc., but rather the best underlying tech for storing the object chunks.

    Note that a big advantage of K/V is eliminating the "double fragmentation" in flash hardware; I wish some of the NVMe or NVMe over Fabrics (Eth) guys would extend their APIs to K/V. Many new-age apps are already using K/V (in the form of RocksDB or the like) since it's faster to develop with and leaves the hard problem to someone else. Having hardware K/V is the natural evolution.
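
    To make the "object chunks over K/V" idea concrete, here is a minimal toy sketch (not anyone's shipping API): the dict stands in for whatever K/V backend you like (RocksDB, a Kinetic drive, a hypothetical NVMe K/V namespace), and the key scheme is invented for illustration.

      import hashlib

      class ChunkStore:
          def __init__(self, kv=None, chunk_size=4096):
              self.kv = kv if kv is not None else {}   # pluggable K/V backend
              self.chunk_size = chunk_size

          def put_object(self, name, data):
              chunk_keys = []
              for i in range(0, len(data), self.chunk_size):
                  chunk = data[i:i + self.chunk_size]
                  key = hashlib.sha256(chunk).hexdigest()   # content-addressed chunk key
                  self.kv[key] = chunk                      # chunk lives in the K/V store
                  chunk_keys.append(key)
              # the object's metadata ("yet another K/V set") maps name -> chunk list
              self.kv["meta/" + name] = ",".join(chunk_keys).encode()

          def get_object(self, name):
              keys = self.kv["meta/" + name].decode().split(",")
              return b"".join(self.kv[k] for k in keys)

      store = ChunkStore()
      store.put_object("photo.jpg", b"x" * 10000)
      assert store.get_object("photo.jpg") == b"x" * 10000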

    Re UDP: I don't think it works at scale. Coraid pioneered a similar notion and are now closed. You must have ways to deal with network congestion, and mechanisms like TCP congestion windows, or RDMA credits and congestion avoidance.

    A key problem with Kinetic is the CPU overhead on the client side: if I access many drives or a lot of flash, it would eat up all my CPU, versus a SAS HBA or NVMe, which do 6GB/s all in hardware.

    For that reason I think NVMe over Fabrics (RDMA) or Intel Omni-Path will be more efficient when it comes to remote flash or remote K/V.

    Yaron

    SDSBlog.com

    1. CaitlinBestler

      Re: Object can be faster, but UDP?

      Internal traffic within a storage cluster is carried over a VLAN, frequently configured to use no-drop Ethernet. This is how FCoE works. Everything that works for FCoE works for UDP.

      You still need to confirm successful transfer, but that is true with connection-oriented transports as well.

      16- or 32-bit protection is totally inadequate when trying to manage petabytes of data.

      Custom congestion control building upon negotiated reservations can start the transfer at full wire speed. You cannot do that with any connection-oriented transport.

      1. Yaron Haviv

        Re: Object can be faster, but UDP?

        Caitlin, We haven't talked for years :)

        FCoE never got to work at scale with multi-hop networks; I can't point to many customers doing it.

        Cisco's main push was on a single switch hop (only to the rack switch).

        As you know, I built a bunch of IB & Eth clusters with thousands of nodes. Eth & IP are not well designed for lossless operation: Pause is not credits, and it requires careful end-to-end configuration of PFC and the switches.

        Packets do drop due to HOL blocking and require re-transmission. If a bunch of guys send lots of packets to one destination, you will surely hit congestion at the switch egress; new switches don't have enough buffers to hold it, and doing Pause will propagate the congestion through the network.

        We spent a lot of time adding capabilities to RoCE NICs to make them more robust at cloud scale, and there is more to do. Doing the same for UDP would require a pretty complicated layer on top, so why not just stick to TCP for software or RoCE for hardware acceleration?

        As you know from your TOE experience, TCP and DCTCP can be fast. The issues are actually more about DDP (DMA): allowing storage data to be gathered/scattered from/to app buffers without a copy, doing header/data split and so on, just like SCSI, FC, NVMe or RDMA do. Another critical challenge is how to deliver the notifications and doorbells to/from the app CPU core/thread to avoid locking (something NVMe & RDMA do well). So should we re-invent it all now for UDP?

        Yaron

        SDSBlog.com

  5. allthecoolshortnamesweretaken

    IIRC Hollerith punch cards derive from Jacquard punch cards used for controlling mechanical looms.

    1. jelabarre59

      > IIRC Hollerith punch cards derive from Jacquard punch cards used for controlling mechanical looms.

      Never let *facts* get in the way of a half-wit anecdote.

  6. Dave 13

    My thoughts exactly.

  7. Anonymous Coward

    Computers use zeros and ones and this isn't going to go away any time soon. Disk drives and memory, whether the flash type or the RAM type, do the same. There's always going to be a translation from how you want to read and write your data into those zeros and ones. All that's up for debate is where you do that translation and whether it is done more than once, with intermediate layers. Translating as few times as possible will generally lead to better performance, at the expense of portability. That's why some applications use raw devices, and at the other extreme we have virtualisation layers such as those found in dedicated appliances, LVMs or advanced file systems like ZFS and BTRFS. These get in the way and will inevitably affect performance, but they allow you to do clever shit.

  8. Steve Chalmers

    They're still linear arrays of bytes...

    Agreed there are a lot of changes coming in storage, and I agree with Robert that the entrenched players may not be moving fast enough. What else is new in the tech industry?

    As to storage, a disk drive is a linear array of bytes. Historically, we've used a file system to recursively divide that linear array of bytes into a collection of smaller linear arrays of bytes. Or disk arrays to stripe logical disks across physical disks (striping logical linear arrays of bytes across physical linear arrays of bytes). Object storage is yet another collection of smaller linear arrays of bytes (the objects).
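
    As a toy illustration of that point (stripe unit and disk count invented for the example), striping is just arithmetic that maps an offset in the logical linear array of bytes to an offset in one of the physical linear arrays of bytes:

      STRIPE_UNIT = 64 * 1024   # 64 KiB per stripe unit (example value)
      NUM_DISKS = 4             # example value

      def map_logical(offset):
          unit = offset // STRIPE_UNIT        # which stripe unit the byte falls in
          disk = unit % NUM_DISKS             # stripe units round-robin across disks
          physical_unit = unit // NUM_DISKS   # position within that disk's own array
          return disk, physical_unit * STRIPE_UNIT + offset % STRIPE_UNIT

      print(map_logical(300000))   # logical byte 300000 -> (disk 0, offset 103392)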

    A USB stick (which by the way is the replacement for the removable disk drive) is just another linear array of bytes.

    Flash, or more generally nonvolatile memory DIMMs, are simply a new physical linear array of bytes to store smaller linear arrays of bytes (objects, files, whatever).

    Agreed that how we organize and access the smaller linear arrays of bytes within the larger physical ones may change.

    Oh, and accessing storage congests networks. Blindly using UDP is fatally flawed in nontrivial installations. Layering one of the emerging congestion control protocols over UDP is fine.

    But I predict that in 100 years we will still have physical devices which store linear arrays of bytes, and within them we will store smaller linear arrays of bytes. If block storage is dead, long live block storage II!

    @FStevenChalmers

  9. John Smith 19 Gold badge
    Meh

    "if you divide the data into chunks and store the chunks"

    Is anyone else hearing the sound of a very big "if" ?

    Is anyone thinking of MULTICS's direct segment mapping of disk files to main memory?

    Is anyone thinking of the "Kinetic Drive" as ICL CAFS with long string comparisons?

    Just asking.

    1. Aremmes

      Re: "if you divide the data into chunks and store the chunks"

      I thought of the CKD format on IBM mainframes, where a program can instruct the storage device to locate a record by key and transfer it to a buffer in memory without CPU intervention.
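
      Conceptually the offload looks something like this (a toy model in Python, not actual channel-program syntax, and the record layout is invented): the host supplies a key, the device walks the track, and only the matching record's data comes back.

        def search_key_equal(track_records, wanted_key):
            # track_records: list of (count, key, data) records as laid down on a CKD track
            for count, key, data in track_records:
                if key == wanted_key:
                    return data      # device transfers just this record to the host buffer
            return None              # key not present on this track

        track = [(1, b"ACCT0001", b"balance=100"),
                 (2, b"ACCT0042", b"balance=975"),
                 (3, b"ACCT0099", b"balance=12")]
        print(search_key_equal(track, b"ACCT0042"))   # -> b'balance=975'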

  10. Anonymous Coward

    Thank god it's just block storage... For a moment I thought he announced tape dead.

    Who really gives a sh!t these days how data is laid down on the media?

    There will always be a medium, an addressing scheme and a transport protocol.

    And there will always be chief technologists and consultants to first confuse and then enlighten the customer.

    How else could you sell something as boring as storage?

    The real technological breakthroughs happen in materials science labs, where new media are developed.

    What storage vendors do is sell Pepsi or Coke. They all cook with the same water.

  11. Anonymous Coward

    If he'd just used "Laundry" instead of "Chinese Laundry" he wouldn't have had to go through all that nonsense about being Politically Incorrect.

  12. Colin Tree

    CAM memory

    IDT content addressable memory is far more exciting,

    I always objected to objective semantics,

    preferring simplicity and the most direct path.

  13. CrosscutSaw

    Yes yes yes...

    "The problem is that the data in the cloud is accessible only at WAN speeds. That is fine for the data on your phone or tablet, but unsuitable when you want to run analytic applications on archived records.

    In addition, the cost of public cloud storage is deceptive. For your phone or tablet data, it is palatable. When you get into your massive data needs, the cost of retrieving cloud data overwhelms the savings of not keeping the data within the company."

    This ^^. It's what I've been telling my counterparts who are infatuated with cloud this, cloud that.

    You should see the uproar from end users trying to access stuff over the WAN. You can't just tell them to be patient. Also, we're beginning to see the price hit as we build more servers in the cloud. The monthly bills are piling up.

  14. John Geek
    Facepalm

    That sure isn't the IBM card I remember from the 1960s and '70s.
