Want to super scale-out? You'll be hungry for flash

Scale-out flash arrays sound excessive, but they really are not. After all, we can understand scale-out filers, adding node after node to store rapidly growing file populations. Use cheap and deep disk for the data, with flash stashes holding the metadata so files can be located fast. When the files are large then sequential …

  1. This post has been deleted by its author

  2. markkulacz

    Scale Out w/ Erasure Coding

    Scale-out with erasure coding typically (always?) requires re-striping after node addition. Administrators are rarely well informed of this: more nodes means a different layout, which means fully reading in all impacted data, recalculating parity, and re-writing everything (a rough sketch of the cost is at the end of this comment). As the cluster scales, statistically sustaining the same protection level also requires additional parity. Some of this may be handled automatically with pooling, but some is not.

    SolidFire (which uses 2x protection, if I'm not mistaken), or traditional HDFS with 3x protection on an all-flash box, are ways around this. Or use Isilon with a mirroring protection scheme (not N+M). Scale-out requires mirroring, not erasure coding, to support non-disruptive scaling while avoiding re-striping (which is not the sort of activity we want SSDs exposed to).

    Deduplication and compression have a big impact in making all-SSD scale-out economically viable. The problem there is that compression probably uses a packed-segment approach (common in LFS schemes), which increases metadata activity and snapshot overheads, and can quietly lead to a lot of segment packing and unpacking. Then we are right back to additional SSD wear simply to offset the compression... which is there to reduce SSD wear in the first place (see the second sketch at the end of this comment).

    There is no magic in this world.

    I am an employee of NetApp, and my comments do not reflect the position of my employer.
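
    To put a rough number on the re-striping cost described above, here is a minimal back-of-envelope sketch in Python. The 100TB working set, nine-node cluster and two parity chunks are invented purely for illustration, and real systems may re-stripe incrementally or per pool rather than in one pass.

        # Back-of-envelope model of the write amplification described above:
        # growing an erasure-coded cluster changes the stripe layout, so all
        # impacted data is re-read, parity is recalculated and everything is
        # re-written. With mirroring, existing copies stay put.

        def ec_restripe_cost(user_tb: float, nodes_after: int, parity: int):
            """TB re-read and re-written after a node addition, assuming stripes
            span all nodes with `parity` parity chunks per stripe."""
            reread_tb = user_tb                                          # read back every impacted stripe
            rewrite_tb = user_tb * nodes_after / (nodes_after - parity)  # data plus freshly computed parity
            return reread_tb, rewrite_tb

        def mirror_growth_cost():
            """With 2x/3x mirroring the new node only absorbs new writes
            (plus any optional, gradual rebalancing)."""
            return 0.0, 0.0

        r, w = ec_restripe_cost(user_tb=100, nodes_after=9, parity=2)
        print(f"Erasure coding: re-read ~{r:.0f} TB, re-write ~{w:.0f} TB of flash")
        r, w = mirror_growth_cost()
        print(f"Mirroring:      re-read ~{r:.0f} TB, re-write ~{w:.0f} TB of flash")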
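
    A similarly rough sketch of the packed-segment (log-structured) compression overhead: reclaiming space means re-reading a segment's still-live compressed blocks and repacking them into a new segment. A common first-order approximation puts that write amplification at 1/(1-u), where u is the live fraction of the segments being cleaned; the figures below are invented for illustration.

        # Toy model of segment-cleaning overhead in a packed-segment (LFS-style)
        # layout: the fuller the cleaned segments, the more live data has to be
        # unpacked, repacked and rewritten per byte of reclaimed space.

        def cleaning_write_amplification(live_fraction: float) -> float:
            """Approximate flash bytes written per byte of new user data when
            segments are cleaned at the given live-data fraction."""
            return 1.0 / (1.0 - live_fraction)

        for u in (0.5, 0.7, 0.8, 0.9):
            print(f"segments cleaned at {u:.0%} live -> ~{cleaning_write_amplification(u):.1f}x SSD writes")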

  3. SnapperHead

    Chris - there are some very serious disconnects in your article. Companies cannot run their business on PowerPoint, but it seems The Register can write articles about them.

    What a PowerPoint says makes sense on the surface, but in practice it falls apart very, very quickly. The trade-offs that have to be made on some architectures are huge, and very limiting. You don't see those trade-offs listed in the PowerPoints that have been presented to you.

  4. Nate Amsden

    reading this

    makes me glad I bought a 4-controller 3PAR 7450 that scales up to 460TB of raw flash (and in five years I'm sure that number will be in the 1PB range, with the larger-capacity SSDs that will most certainly come out in that time). Not for everyone, I'm sure, but it does the job for me just fine.

    Realistically though, I don't see my org going past 200TB of raw flash (meaning I don't have to expand the footprint of the original installation, which is 8U) in the next 3-5 years (and even that is a BIG stretch).

  5. Anonymous Coward

    Nimbeciles

    But what about Nimbus??

  6. Anonymous Coward

    Scale Out is Necessary for All-Flash Arrays

    There are a couple of challenges with all-flash arrays that drive scale-out. The first is controller performance. Consider the typical modular SAN array controller, such as the EMC VNX2 and similar products from other storage vendors: these array controllers are based on dual-processor Intel Xeon system boards. The highest-end version of the VNX2 controller can support 1,500 hard drives (seven cabinets of disks), but even with 15,000RPM drives that works out to only about 300,000 back-end disk IOPS. At 20,000 IOPS per enterprise SSD, just fifteen SSDs provide the same 300,000 back-end IOPS (see the back-of-envelope sketch at the end of this comment).

    So when you look at all-flash arrays that take a similar approach of dual, fail-over, high-availability controllers based on two-socket Intel Xeon system boards, you can see that the controller is the performance bottleneck.

    Add to this inline efficiency features such as deduplication and compression: they increase effective capacity and reduce SSD wear, but they also consume controller resources, which further limits the performance scalability of a dual-controller array.

    Finally, inline deduplication requires keeping deduplication metadata in memory, either logged to NVRAM (most AFA vendors) or protected by UPS (in the case of XtremIO). That inline deduplication metadata database limits capacity scaling: more capacity means more metadata, which means a bigger database, which means more RAM and more NVRAM. This is why many AFAs with inline deduplication have a fixed capacity per controller pair (a rough sizing sketch is at the end of this comment).

    This capacity limitation means the only way to scale capacity is horizontal scale-out. If a contiguous data storage space is desired, it becomes more complicated; without that requirement things could be simpler, but if an app needs a lot of space, host-based volume management will be required to concatenate the capacities of discrete flash arrays.

    And given the previously mentioned fact that the controllers are the performance bottleneck, to scale controller IOPS along with back-end capacity, all-flash arrays have to scale out.

    There is a happy medium. If a dual-socket Intel Xeon-based controller can support, say, 500,000 IOPS, and that is all a customer needs, but they need more than one 25-SSD disk tray of capacity, they can scale to multiple disk trays, although the overall IOPS will remain the same. Some scale-out all-flash arrays, such as EMC's XtremIO, do not support more than one disk tray per controller pair, while others, such as Kaminario's latest, allow capacity to scale to two disk trays per controller pair.

    SSD capacities are increasing dramatically (faster than Intel Xeon CPU performance is increasing). All-flash array vendors will choose the SSD capacities that offer the best $/TB and build arrays around them, so the Intel Xeon CPU will continue to be the bottleneck. Because of this, I think the one-disk-tray-per-controller-pair archetype, plus scale-out, will be the norm for all-flash arrays for some time. The other alternative is SolidFire's design, which, with its aggressive CPU-to-SSD ratio, seems to assume very high-density SSDs are coming. The greatest risk is to Pure Storage's design, which is very well suited to the "happy medium" described above but will come under pressure to offer a scale-out solution as SSD densities increase. Alternatively, Pure could build bigger storage controllers based on four-socket Intel E7 CPUs.
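
    The back-of-envelope arithmetic above, as a runnable sketch. The per-device IOPS figures are the round numbers used in this comment (roughly 200 IOPS per 15,000RPM drive, 20,000 per enterprise SSD), not measured values.

        # Controller-bottleneck arithmetic: a fully populated VNX2-class backend
        # of 15K RPM drives delivers roughly the same IOPS as a handful of SSDs,
        # and a single 25-SSD tray already saturates a dual-Xeon controller pair.

        HDD_15K_IOPS = 200        # rough IOPS per 15,000RPM drive
        SSD_IOPS = 20_000         # nominal enterprise SSD
        MAX_HDDS = 1_500          # seven cabinets of disks

        backend_hdd_iops = MAX_HDDS * HDD_15K_IOPS          # ~300,000 IOPS
        ssds_for_same_iops = backend_hdd_iops // SSD_IOPS   # just 15 SSDs

        print(f"1,500 x 15K HDDs  -> ~{backend_hdd_iops:,} back-end IOPS")
        print(f"Same IOPS in SSDs -> ~{ssds_for_same_iops} drives")

        # Why capacity scaling without controller scaling flattens out: adding
        # trays adds TB but not IOPS unless controllers are added too (scale-out).
        TRAY_SSDS = 25
        CONTROLLER_PAIR_IOPS = 500_000                      # the "happy medium" figure above
        tray_raw_iops = TRAY_SSDS * SSD_IOPS                # ~500,000 raw SSD IOPS
        print(f"One 25-SSD tray   -> ~{tray_raw_iops:,} raw SSD IOPS vs "
              f"~{CONTROLLER_PAIR_IOPS:,} IOPS per controller pair")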
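
    And a rough illustration of why the inline-deduplication metadata database ties capacity to RAM/NVRAM. The 4KB dedup granularity and 64-byte fingerprint record are assumptions made for the sketch, not any vendor's actual figures.

        # More capacity -> more blocks -> more fingerprint entries -> more RAM
        # and NVRAM, which is why capacity per controller pair ends up fixed.

        BLOCK_BYTES = 4 * 1024    # assumed dedup granularity
        ENTRY_BYTES = 64          # assumed fingerprint + location record

        def dedup_metadata_gb(raw_capacity_tb: float) -> float:
            """GB of in-memory dedup metadata needed to index the given raw flash capacity."""
            blocks = raw_capacity_tb * 1e12 / BLOCK_BYTES
            return blocks * ENTRY_BYTES / 1e9

        for tb in (10, 50, 100, 250):
            print(f"{tb:>4} TB raw flash -> ~{dedup_metadata_gb(tb):,.0f} GB of dedup metadata")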
