PCIe flashers bash storage networks

A sea change in storage industry thinking is occurring: storage networks are now seen as slowing down access to data. The PCIe flash DAS hare beats SAN and NAS tortoises every day of the week. We can call PCIe flash "storage memory" if we wish, but it's basically very fast direct-access storage, a closely-coupled data bucket. …


This topic is closed for new posts.

CPU cycles as the most precious data centre resource...

"Virtualised multi-core, multi-threaded servers are impatient. They want data for the apps in their virtual machine instantly. It's odd; now that CPU cycles are cheaper than ever before, they are treated as the most precious resource in the data centre."

Rarely have I seen a case of missing the point more thoroughly than this quote. It's not that people are worried about "wasting" cheap CPU. It's that disk I/O latency has failed to keep up with processors. In consequence, many applications are bottlenecked by the storage. I know of many apps which spend 90% or more of their time I/O bound. So the issue is not "wasting" CPU time, but simply not getting through workloads fast enough or giving quick enough end-user response times. It's not the CPU time that's being seen as precious, but the poor end user who's sitting in a call centre with an impatient customer.

As for some sort of direct-access memory storage model over PCIe being faster, then of course it is. But just how relevant is that to most large real-world apps? Firstly, it's an extremely good idea to establish a clear distinction between persistent storage and volatile working space. For this you need clear and controlled access methods with controlled APIs that can provide for security, protection from rogue apps, clean restart points, data sharing and much else. Those APIs can be at various levels - blocks, files, database etc. - but exist they must. Those are also ideal points to establish shared access models and are hence ideal break points for network access (as anybody who's worked on large application architectures can agree).

Then there is the issue of just how many apps can exploit ultra-low latency times. I know of many real-world apps which are I/O bound at 10ms access times. However, I know of none which would be in that position with 100-microsecond latency, an access time perfectly within the bounds of what can be achieved on common network protocols, such as FC or 10Gbps Ethernet. Indeed, in many cases the largest element of that delay is not the time on the wire, but the time spent navigating the software stacks.
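As a rough back-of-envelope sketch of the point above: if an app spends 90% of its time waiting on I/O, an Amdahl-style estimate shows how much of the possible gain is already captured by going from 10ms to 100 microseconds. The 10-microsecond figure for local PCIe flash is a hypothetical number chosen for illustration, not something quoted in this thread, and the function name is made up for the example.

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl-style estimate: only the I/O-bound share of run time gets faster."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


# 10 ms -> 100 microseconds is a 100x cut in per-I/O latency.
networked = overall_speedup(io_fraction=0.9, io_speedup=10e-3 / 100e-6)

# Hypothetical: a further 10x cut to 10 microseconds for local PCIe flash.
local = overall_speedup(io_fraction=0.9, io_speedup=10e-3 / 10e-6)

print(f"100 us storage: {networked:.2f}x faster overall")
print(f" 10 us storage: {local:.2f}x faster overall")
```

Under these assumptions the second tenfold latency cut adds well under 10% to the overall speedup, because the remaining 10% of non-I/O time dominates.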

This is not to deny that there might be some specialist uses of direct memory persistent access models over PCIe, but these are likely to be very low-level functions, perhaps inside the networked storage units themselves (although dealing with single-point-of-failure issues remains important).

Anonymous Coward

Your comment is a better article than the article - which appears to be a retread of a really weak article that appeared last month and which basically said "PCIe flash storage manufacturers think they have a nice product and this is what their press release said".


"CPU cycles as the most precious data centre resource..." again

The OP still missed it. CPUs are cheap. The Author's slight sounded like a stab at Virtualization, where apparently his mantra is "squeeze every bit out of the CPU as we can!". This is false. Businesses don't virtualize to drive up CPU utilization; they virtualize to reduce hardware count. Sure, a 10-core CPU goes mostly wasted being a mail server with 15K SAS disks. Stuffing in PCIe flash won't improve the matter any. However, treat the SERVER as the most precious data centre resource and stack it high with VMs to utilize the CPU and RAM, and you'll need something with high IOPS (be it remote or local storage), since hosting 10 VMs all doing various loads of sequential and random access will look like a bunch of random access requests. (Think: two sequential reads will have to interleave their segments, causing the heads to jump across the disk, swapping back and forth, unless the disk prioritizes one whole file ahead of the other, which disk access doesn't really do.) This is why high IOPS is in demand in virtualized environments. Each server is doing its own thing. If a server was I/O bound as bare metal, think of how far behind it gets when stacked with 6 other (even occasionally) I/O bound VMs.
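A toy illustration of the interleaving effect described above, assuming each VM reads its own region of the disk strictly sequentially and requests arrive in strict round-robin order (real schedulers and disk queues are messier). The region spacing and function name are hypothetical.

```python
def contiguous_fraction(streams: int, blocks_per_stream: int) -> float:
    """Share of requests that land right after the previous request on disk."""
    region = 1_000_000  # hypothetical spacing between VM regions, in blocks
    # Round-robin arrival: one block from each VM in turn.
    arrivals = [vm * region + i
                for i in range(blocks_per_stream)
                for vm in range(streams)]
    if len(arrivals) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(arrivals, arrivals[1:])
               if cur == prev + 1)
    return hits / (len(arrivals) - 1)


for n in (1, 2, 10):
    print(f"{n:2d} sequential stream(s): "
          f"{contiguous_fraction(n, 1000):.0%} of requests are contiguous")
```

In this model a single stream is fully sequential, while two or more interleaved streams leave no contiguous requests at all, so every request costs a seek.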

The one good mark I can give is that the Author stressed that NAS/SAN isn't dead, but that they just need flash too, and that there are limits to the benefits of DAS. Unfortunately, that's about all of the topic the author seems to grasp.


Not addressing virtualisation

@Ammaross Danan

I wasn't addressing the value of virtualisation in my post, but the role of I/O bottlenecks. However, since you raise it, the point you make about virtualisation disrupting I/O patterns applies equally to any form of multi-application access to shared storage unless you maintain a system of rigid segmentation (which is usually undesirable from a storage utilisation point of view). Indeed even multiple apps under the same OS exhibit this phenomenon of disrupted I/O patterns, and this has got worse as disks have increased in capacity and fewer spindles are available. However, none of this is incompatible with the observation that I/O bottlenecks are increasingly the limiting factor on modern applications - indeed it reinforces it.

I'm well aware that virtualisation saves system boxes, power, software licensing (under some circumstances) and so on (although savings ). Indeed I go back far enough to have worked on virtualised systems in the very early 1980s. Also, virtualisation does make some other demands on fast I/O. There is a phenomenon called I/O elongation (or it was at the time) whereby I/O latency had the appearance of being extended (as far as the guest OS was concerned) because the I/O terminated whilst the guest machine was not scheduled to run. This means that guest OSs running I/O bound apps in contended environments tend to perform notably worse than when run native. Improvements to VM software guest scheduling have helped this, but owing to the common x64 OSs not being written with virtualisation in mind, there are limits to how effective this can be.
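A minimal sketch of I/O elongation, assuming a deliberately simple and hypothetical schedule in which the guest runs for a fixed slice at the start of every scheduling period. Real hypervisor schedulers are far more dynamic; the numbers are illustrative only.

```python
def observed_latency_ms(issue_ms: int, device_ms: int,
                        period_ms: int, run_ms: int) -> int:
    """Latency the guest perceives for one I/O under a fixed time-slice schedule.

    Hypothetical schedule: in every period_ms window the guest runs for the
    first run_ms and is descheduled for the remainder.
    """
    done = issue_ms + device_ms      # when the device really finishes
    offset = done % period_ms        # where that moment falls in the window
    if offset < run_ms:
        return device_ms             # guest is running: sees completion at once
    return device_ms + (period_ms - offset)  # waits for its next time slice


# Guest gets 2 ms out of every 10 ms; the device itself takes 5 ms.
samples = [observed_latency_ms(t, 5, 10, 2) for t in (0, 1)]
print("device latency: 5 ms; as seen by the guest:", samples, "ms")
```

The device is no slower, but the guest only notices the completion when it is next scheduled, so the apparent latency stretches towards the scheduling period.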

Virtualisation is fine, but it should be remembered there is one resource that can't be virtualised, and that is time. Any time an OS has to deal with the real world, the timing issue raises its head. If the guest OS is unaware of running under a hypervisor then there are many complex issues which can lead to unhelpful behaviour on contended platforms.

Bronze badge

The problem really is that they all miss the point

I helped design a data center virtualization network a while ago that made heavy use of VM relocation to achieve insane cost reduction. And this is where SAN was critical. IOPS are great, but IOPS can be achieved pretty easily on almost any media where there are large storage systems.

Using a software-based SAN instead of a hardware one, we loaded up a single massive RAID with 512GB RAM for disk caching purposes. In many cases, this meant that the disks were almost never touched at all. The machines running the disks (and cache) were heavy iron with redundant absolutely everything. We used a fancy file system (mostly my design) which performed logical block addressing and basically made growable partitions that reference-counted block accesses, and therefore stored all the "hot data" on faster drives while leaving the less common data on drives which for the most part are just powered down altogether. Flash wouldn't have improved this situation any... more RAM would be better though.
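The hot/cold placement idea can be sketched roughly as follows. This is not the file system described in the post, only a toy model of counting block accesses and keeping the most-accessed blocks on the fast tier; the class and method names are invented for illustration.

```python
from collections import Counter


class TieredStore:
    """Toy model: count block accesses, keep the hottest blocks on fast drives."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity  # blocks that fit on the fast tier
        self.hits = Counter()               # block number -> access count
        self.fast = set()                   # blocks currently on the fast tier

    def access(self, block: int) -> str:
        """Record an access and report which tier served it."""
        self.hits[block] += 1
        return "fast" if block in self.fast else "slow"

    def rebalance(self) -> None:
        """Promote the most-accessed blocks; everything else stays on slow disks."""
        hottest = self.hits.most_common(self.fast_capacity)
        self.fast = {block for block, _count in hottest}


store = TieredStore(fast_capacity=2)
for block in [7, 7, 7, 3, 3, 9, 7, 3, 5]:
    store.access(block)
store.rebalance()
print("blocks on fast tier:", sorted(store.fast))
```

A real implementation would also age the counters and rate-limit migrations, so that yesterday's hot data does not pin the fast drives forever.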

Now, the big iron was redundant through 16x PCIe switching and therefore all data was passed from machine to machine at lightning speeds. The two monster RAID servers were expensive and use a great deal of hard drive space, but they are rock solid and the power they use is made up for in the next bit.

A pile of cookie sheet servers with gobs of processing cores were stuck into several racks. They were all connected to the SAN using 10GbE and iSCSI. They're plugged into a 10GbE switch and uplinked to the servers using 40GbE. So... load balancing is not a real issue. Other than the servers and the switch, the rest of the network is pretty low power and here's why.

Hundreds of idle or low usage servers reside on a single physical server. As CPU consumption for a virtual server rises, another server is automatically powered up and virtual machines are migrated to that server. When CPU consumption on that server drops, its virtual machines migrate back to the one big server. There are times when hundreds of servers need to be running and there are other times when no more than 3-4 servers need to be running. This wouldn't be possible without SAN and virtual machine migration (and some fancy scripting).
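The consolidation logic might look something like the sketch below. It assumes a simple first-fit-decreasing packing of per-VM CPU load onto identical hosts and is not the scripting actually used; the capacity figure and loads are hypothetical percentages of one host's CPU.

```python
def hosts_needed(vm_cpu_loads, usable_capacity=80):
    """First-fit-decreasing packing of VM CPU loads onto as few hosts as possible.

    Loads and capacity are in percent of one host's CPU; usable_capacity
    leaves some headroom below 100 (a hypothetical choice).
    """
    hosts = []  # each entry is the CPU load already placed on that host
    for load in sorted(vm_cpu_loads, reverse=True):
        if load > usable_capacity:
            raise ValueError("a single VM exceeds one host's usable capacity")
        for i, used in enumerate(hosts):
            if used + load <= usable_capacity:
                hosts[i] = used + load
                break
        else:
            hosts.append(load)  # nothing fits: power up another host
    return len(hosts)


quiet = [1] * 60                 # 60 nearly idle VMs
busy = [1] * 40 + [30] * 20      # 20 of them start working hard

print("hosts powered on when quiet:", hosts_needed(quiet))
print("hosts powered on when busy: ", hosts_needed(busy))
```

Because the VM images live on shared storage, moving a VM between hosts only means moving its running state, which is what makes this kind of power-up and power-down cycle practical.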

The monster servers also use very little power unless they absolutely need to. Sure, the motherboard and RAM suck up a pretty stable 300 watts or more... but the slow as hell 5400RPM drives use less than a watt when they're sleeping. And since we only have about 4 terabytes of actual high access rate data, a terabyte of RAM across two servers does a pretty good job caching it. Also, using 15,000RPM drives for those 4 full terabytes keeps things moving nicely.

Now, Flash would probably speed us up a bit, but given the way we allocated the blocks on the 4TB disks, they are being written A LOT!!! so even "Enterprise Flash" will have to be replaced all the time. Besides, thanks to the RAM cache, we are pretty much flooding the PCIe bus of the servers.

Oh... as a bonus.. thanks to most servers being shut down most of the time... the cooling cost of the room is dirt cheap compared to most data centers. And given the totally modular design, it can be scaled up tremendously for peanuts.

So... the guy who wrote the article doesn't get virtualization... that's pretty obvious. It's not about trying to save CPU, it's trying to save power, CPU, etc... the flash guys might get it, but in reality, if you have 2000 virtual machines and you're using Flash as a cache to speed up operation handling, it'll burn out blocks even on million write cycle flash in no time. RAM caching is the only real solution here. I never really liked SAN companies because their product managers seem to be so off the ball on what is actually needed. They're great if you need a 5 petabyte solution, but if you need a tiny little 1 petabyte solution with high performance, well, DIY works pretty good too.
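For anyone wanting to put numbers on the wear concern, here is a rough estimator. It assumes perfect wear levelling, and every input in the example (a 200GB cache device, 1GB/s of sustained writes, write amplification of 3, and the two endurance ratings) is hypothetical. The answer depends entirely on what you plug in.

```python
def flash_lifetime_days(capacity_gb: float, pe_cycles: float,
                        writes_gb_per_day: float,
                        write_amplification: float = 1.0) -> float:
    """Days until the rated program/erase cycles are used up.

    Assumes writes are spread evenly over the whole device.
    """
    if min(capacity_gb, pe_cycles, writes_gb_per_day, write_amplification) <= 0:
        raise ValueError("all inputs must be positive")
    total_writable_gb = capacity_gb * pe_cycles
    physical_gb_per_day = writes_gb_per_day * write_amplification
    return total_writable_gb / physical_gb_per_day


# Hypothetical workload: 1 GB/s sustained is 86,400 GB written per day.
for cycles in (3_000, 1_000_000):
    days = flash_lifetime_days(capacity_gb=200, pe_cycles=cycles,
                               writes_gb_per_day=86_400,
                               write_amplification=3.0)
    print(f"{cycles:>9,} P/E cycles: about {days:,.1f} days")
```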


I also don't understand why the article mentions networked storage arrays, as regardless of whether these use PCIe cards (not sure if production versions of these exist/have matured yet) or Flash SSDs, the network latency is constant.

In addition, I think you're mistaken in the idea that "servers running random I/O-bound apps are moving from storage networks to PCIe flash" - there will still be a huge need for storage networks as these are absolutely necessary for scaling, consolidation and redundancy. The technology they use to actually store data is less relevant than these other reasons for their existence.


PCIe flash cards are just expensive disk drives

We're seeing the storage industry go full circle with flash.

The reason SAN/NAS was invented in the first place was that disk drives were the most expensive component in a server. By aggregating disks in a centralized location (outside of the server), capacity utilization and availability improved. A PCIe flash card in a server is just like a disk drive in a server: no other server can share its capacity/performance resources = poor utilization of a very expensive resource; and for 99% of the implementations out there, if the card goes offline, your data is offline or, worse yet, you lose data.

PCIe flash cards still cost about as much as the entire server and are, again, the most expensive component of the server. It's a natural evolution that flash will migrate out of the server and into shared storage environments for economic efficiency.

Silver badge

I still need SAN

Latencies aren't much of an issue, but I have a pressing need for 100TB+ sized chunks of storage.

The last data set my users worked with was 250TB, the next set will be petabyte scale. Local caching helps a bit but you can't fit that kind of storage into your back pocket.

(No, I don't virtualise. When 2 users can max out a shelf full of 48-core blades, there's no point in doing so.)


Bandwidth and latency confusion


I'm wondering why you cite improved bandwidth numbers (40Gbps Ethernet and 16Gbps FCP) as the solution to the latency problem? It's particularly irrelevant for the Ethernet example given that 40Gbps Ethernet is multilane 10Gbps :)
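To put rough numbers on the distinction: the time to fetch a small block is a fixed latency plus the time to clock the bits onto the wire, and link speed only helps the second term. The 100-microsecond fixed latency and 4KB block size below are hypothetical figures for illustration, and the model ignores protocol overhead and lane striping.

```python
def transfer_time_us(payload_bytes: int, link_gbps: float,
                     fixed_latency_us: float) -> float:
    """Fixed latency plus serialisation time for one transfer, in microseconds."""
    # 1 Gbps moves 1000 bits per microsecond.
    serialisation_us = payload_bytes * 8 / (link_gbps * 1000)
    return fixed_latency_us + serialisation_us


for gbps in (10, 40):
    total = transfer_time_us(payload_bytes=4096, link_gbps=gbps,
                             fixed_latency_us=100.0)
    print(f"{gbps} Gbps link: {total:.2f} us for a 4 KB read")
```

With these inputs, quadrupling the bandwidth shaves only a couple of microseconds off a small read; the fixed part is untouched.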




No Shit Sherlock.....

And the news is....?

(OK Happy New Year anyway.)



Biting the hand that feeds IT © 1998–2017