Why storage needs Quality of Service

Storage consolidation looks great on paper. Direct-attached storage is notorious for its inefficiency, with some arrays being just 40 per cent occupied or even less. Providing an Oracle database with 10,000 IOPS could mean aggregating dozens of 15,000 RPM drives, and unless the database is several terabytes in size that is a …
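
For a sense of the arithmetic behind that claim, here is a back-of-envelope sketch; the per-drive IOPS and capacity figures below are rough assumptions, not numbers from the article.

```python
# Rough spindle-count maths for a 10,000 IOPS requirement.
TARGET_IOPS = 10_000
IOPS_PER_15K_DRIVE = 180      # typical random-I/O figure for a 15K RPM HDD (assumed)
DRIVE_CAPACITY_GB = 300       # a common 15K drive size (assumed)

drives = -(-TARGET_IOPS // IOPS_PER_15K_DRIVE)   # ceiling division
raw_tb = drives * DRIVE_CAPACITY_GB / 1000

print(f"{drives} drives, {raw_tb:.1f} TB raw")
# -> 56 drives, 16.8 TB raw: unless the database is several terabytes,
#    most of that capacity is exactly the wasted space described above.
```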

COMMENTS

This topic is closed for new posts.
  1. Storage_Person

    Additional Complexity

    The issue of storage QoS is even more complex than the article suggests. Moving an entire host onto a specific tier of storage is not the correct solution, because most servers have a relatively small amount of active data and a large amount of inactive data, obviously with different I/O requirements. So there needs to be the ability to identify and migrate the hot data onto flash whilst retaining the cold data on HDD. Plus, of course, the definition of hot data changes frequently over the work day, the business month, the calendar year, etc.
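
    As a rough sketch of what that automation might look like (all names, thresholds and data here are illustrative assumptions, not any vendor's implementation): track a per-extent access "temperature" with exponential decay, so the definition of hot adapts through the day, and periodically pin the hottest extents to flash.

    ```python
    # Toy sub-host tiering: hot extents migrate to flash, cold stay on HDD.
    from dataclasses import dataclass

    DECAY = 0.5          # halve each extent's score every sampling interval
    FLASH_SLOTS = 2      # how many extents the flash tier can hold

    @dataclass
    class Extent:
        ident: str
        tier: str = "hdd"
        temp: float = 0.0    # exponentially decayed access count

    def record_io(extent: Extent, ios: int) -> None:
        extent.temp += ios

    def rebalance(extents: list[Extent]) -> None:
        """Decay temperatures, then pin the hottest extents to flash."""
        for e in extents:
            e.temp *= DECAY
        ranked = sorted(extents, key=lambda e: e.temp, reverse=True)
        for i, e in enumerate(ranked):
            e.tier = "flash" if i < FLASH_SLOTS else "hdd"

    extents = [Extent(f"ext{i}") for i in range(5)]
    record_io(extents[0], 900)    # hot this interval
    record_io(extents[3], 400)    # warm this interval
    rebalance(extents)
    print([(e.ident, e.tier) for e in extents])
    # -> ext0 and ext3 land on flash; the cold extents stay on HDD
    ```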

    And defining requirements in terms of IOPS is not a good solution either. Even a slow disk can provide thousands of IOPS if the application is streaming, whereas the fastest HDD can be defeated by a weird access pattern. Flash has a major benefit here for reads, but you can craft write access patterns that bring flash drives to their knees as well. Response time is a nicer way to think about things, but that would require storage companies to work with OS/database/app companies, and they're notorious for having no desire to do so.

    Finally, the idea of having lots of policies around to define the relative priority of servers is too time-consuming and error-prone to be worthwhile. It needs to be a totally automated system, dynamically moving data from one place to another to ensure that the right data is on the right tier at the right time, without operator-created constraints. This is not the type of functionality that someone will purchase as an add-on or third-party feature for existing arrays; it's going to need a new product (or company) that uses this as the basis for its sales pitch. Kind of like Compellent, but much more dynamic and adaptable, and without the manual configuration.

    1. Anonymous Coward
      IT Angle

      Re: Additional Complexity

      Essentially you're making the case for applying a type of predictive analytics with high autonomy in rule determination and application. I've done that in several fields (logistics, medicine (epidemiology), even financials and the social sciences). The fun part will be determining the monitoring (wiring) harness for the workloads, although... that part's going to be hard.

      Now I know what's going to be occupying my thoughts for a few.

      1. Levente Szileszky

        Re: Additional Complexity

        I would argue it's not THAT difficult; see every storage company's perfmon request & subsequent analytics when they are trying to size up your possible workloads to build a possible solution for you... just keep thinking and try getting a lot of samples, you might just end up with something. ;)

  2. Jim 59

    Minimum requirement

    Servicing the "minimum requirement", as explained in the article, sounds like an ingenious solution. Simple and potentially effective.

    I wonder where backups would come in the QoS hierarchy. Whether disk-to-disk or disk-to-tape, they need high priority to stay within the time window, barring snapshot solutions.

  3. Nate Amsden

    can netapp+fujitsu enforce minimums?

    From the article:

    "Another route, and the one chosen by a number of leading-edge developers such as Fujitsu, NetApp and NexGen (now part of FusionIO), is to enforce minimum application data throughput levels rather than maximum."

    I wasn't aware NetApp had QoS (though I don't follow them closely at all). A quick search suggests that as of October 2013 they supported rate limiting, but I see no mention of enforcing minimums:

    https://communities.netapp.com/community/netapp-blogs/sanbytes/blog/2013/10/03/how-storage-qos-works

    "[..]you set throughput limits expressed in terms of MB/sec (for sequential workloads) or I/O operations per second (for transactional workloads) to achieve fine-grained control. When a limit is set on an SVM, the limit is shared for all objects within that SVM. "

    (no solid indication that you can specify both IOPS and MB/sec levels simultaneously for a given workload)

    Fujitsu seems similar. Again, I don't follow them, but a quick search seems to indicate they only support limits, not minimums:

    http://www.fujitsu.com/global/services/computing/storage/eternus/products/diskstorage/feature/strsys-f06.html

    "The Quality of Service (QoS) function [..] setting of an upper load limit for each application enables stable performance and reduces the impact on other applications due to load changes. "

    Perhaps that info is just out of date, though NetApp's seems pretty recent. I'm not sure when Fujitsu's was last updated, but since it is on their product page (rather than a blog post) I'd have to assume it's current.
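
    To make the limit-versus-minimum distinction concrete: a cap of the kind both quotes describe is classically built with a token bucket, as in the toy below (an assumption about the general mechanism, not either vendor's actual code). Note what a pure cap cannot do: nothing here reserves throughput for a starved workload, which is exactly the missing minimum.

    ```python
    # Toy IOPS cap: a token bucket refilled at the configured rate.
    import time

    class IopsCap:
        def __init__(self, max_iops: int):
            self.rate = max_iops
            self.tokens = float(max_iops)
            self.last = time.monotonic()

        def admit(self) -> bool:
            """Allow one I/O if a token is available; refill over time."""
            now = time.monotonic()
            self.tokens = min(self.rate,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False   # caller must queue or delay the I/O

    cap = IopsCap(max_iops=500)
    admitted = sum(cap.admit() for _ in range(1000))
    print(f"{admitted} of 1000 back-to-back I/Os admitted")  # roughly 500
    ```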

  4. Jim O'Reilly
    Holmes

    Why not just use SSD or flash

    Starting the article with "Providing an Oracle database with 10,000 IOPS could mean aggregating dozens of 15,000 RPM drives, and unless the database is several terabytes in size that is a lot of wasted space" raises the question of why you wouldn't just go to SSD.

    No flash cache, no SSD tier... just use SSD. Heck, a terabyte SSD is just $500 retail, which is about the same as a 15,000 RPM hard drive. And you'll only need a couple to get 10K IOPS!

    I'm surprised this discussion about how to get more from hard drives still comes up. It's over, and SSD won by several laps!

    1. Anonymous Coward
      Anonymous Coward

      Re: Why not just use SSD or flash

      Even if enterprise-grade flash were available at those prices, SSD is not the panacea you believe it to be. Maybe in a one-to-one application-to-array relationship it's simple, but building a shared storage infrastructure and guaranteeing performance can be complex, and it has traditionally been very expensive because of the need to physically partition resources. SSD doesn't really change that; if anything, the performance differential between the haves and the have-nots is exacerbated, making QoS even more essential.

      HP's Priority Optimization feature on 3PAR can do all of this in the latest release. It requires no hardware partitioning and no host agents, enforcement is instant with continuous real-time sampling, and you can prioritize QoS goals via the relative importance of each application:

      Set min goals and max caps for IOPS, MB/s and latency.

      Set relative enforcement priorities: low, medium, high.

      Configure per application and/or per tenant (virtual domain).

      So if you have a noisy neighbor or a development area that needs logical isolation, a badly written query that needs a cap to keep it from swamping the array, or you need to provide performance SLAs to particularly important apps, you can do all three. Taking it a step further, you can nest rules and oversubscribe to allow for bursting, and you can also schedule rules to adjust QoS to suit different demands throughout the day.
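
      For what it's worth, here is a minimal sketch of how that kind of min-goal/max-cap arbitration could work in principle: satisfy every minimum first, then share the remainder by priority weight up to each cap. Purely illustrative and not HP's actual algorithm; all names and figures are made up.

      ```python
      # Toy QoS arbiter: minimums first, then priority-weighted sharing.
      PRIORITY_WEIGHT = {"low": 1, "medium": 2, "high": 4}

      def allocate(total_iops: int, workloads: list[dict]) -> dict:
          # Pass 1: every workload gets its minimum goal (never above its cap).
          alloc = {w["name"]: min(w["min"], w["max"]) for w in workloads}
          leftover = total_iops - sum(alloc.values())
          # Pass 2: share the remainder by priority weight, respecting caps.
          total_weight = sum(PRIORITY_WEIGHT[w["priority"]] for w in workloads)
          for w in workloads:
              share = leftover * PRIORITY_WEIGHT[w["priority"]] // total_weight
              alloc[w["name"]] = min(w["max"], alloc[w["name"]] + share)
          return alloc

      apps = [
          {"name": "oltp",  "min": 5000, "max": 20000, "priority": "high"},
          {"name": "batch", "min": 1000, "max": 8000,  "priority": "medium"},
          {"name": "dev",   "min": 500,  "max": 2000,  "priority": "low"},
      ]
      print(allocate(15000, apps))
      # -> {'oltp': 9857, 'batch': 3428, 'dev': 1714}
      # A real arbiter would iterate, redistributing headroom freed by
      # capped workloads; a single pass just shows the ordering.
      ```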

      1. Jim O'Reilly
        Pint

        Re: Why not just use SSD or flash

        These all seem like things you do to get around having slow storage. Why not just put in fast storage in the first place?

    2. Nate Amsden

      Re: Why not just use SSD or flash

      You may be surprised about this discussion; by the same token, I'm NOT surprised you're not in charge of any serious storage (sorry).

      1. JEDIDIAH
        Linux

        Re: Why not just use SSD or flash

        Again we are dealing with the issue of unacceptable latency that I have complained about before. Pretending that you can throw together wildly different use cases and call it done won't necessarily make it so.

        I may not be "in charge of any serious storage" but I am a victim of those that are.

      2. Jim O'Reilly

        Re: Why not just use SSD or flash

        Sorry to deflate you. Been there, done that!

  5. Dave@SolidFire
    Thumb Up

    Indeed, delivering good QoS isn't easy

    Good to see some of the incumbent vendors acknowledging that QoS is essential in large scale, multi-application & multi-tenant environments. But as the article alludes to at the end, it's not such a simple task on most systems today. Between juggling tiers, RAID levels, and noisy neighbors, it's nearly impossible on most systems to guarantee a minimum level of performance... which is really the key.

    Despite the article's claims otherwise, NetApp's QoS features today are just rate limiting ( http://www.ntapgeek.com/2013/06/storage-qos-for-clustered-data-ontap-82.html ).

    Fujitsu has made a few references to "automating" QoS, but there doesn't appear to be any real detail on what that entails.

    Only SolidFire has built its architecture from the ground up for Guaranteed QoS, including the ability to easily specify and deliver minimum performance guarantees, and adjust performance in real-time without data movement. ( http://solidfire.com/technology/qos-benchmark-architecture/ )

    Going forward, high quality QoS, including guaranteed performance, is going to be essential in enterprise class storage systems.
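
    For the curious, SolidFire's published model sets per-volume minimum, maximum and burst IOPS, where headroom banked while running under the cap can be spent on short bursts. A toy rendering of the burst-credit idea follows; the names and accounting are my assumptions, not the actual API or implementation.

    ```python
    # Toy per-volume QoS settings with burst credits.
    class VolumeQoS:
        def __init__(self, min_iops: int, max_iops: int, burst_iops: int):
            self.min, self.max, self.burst = min_iops, max_iops, burst_iops
            self.credits = 0   # seconds of headroom banked below the cap
            # (minimum enforcement would live in an array-wide arbiter)

        def ceiling(self, demand: int) -> int:
            """Admission ceiling for this volume for the next second."""
            if demand < self.max:                 # under the cap: bank credit
                self.credits = min(self.credits + 1, 60)
                return demand
            if self.credits > 0:                  # spend credit to burst
                self.credits -= 1
                return min(demand, self.burst)
            return self.max                       # steady state: hard cap

    vol = VolumeQoS(min_iops=1000, max_iops=5000, burst_iops=8000)
    print([vol.ceiling(d) for d in (2000, 2000, 9000, 9000, 9000)])
    # -> [2000, 2000, 8000, 8000, 5000]
    ```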

    1. R8it

      Re: Indeed, delivering good QoS isn't easy

      Sorry Dave@SolidFire, but 3PAR Priority Optimization QoS software seems to do exactly what you claim "only" SolidFire does. If that was your big unique differentiator, time to think again:

      http://www8.hp.com/us/en/products/storage-software/product-detail.html?oid=5386541#!tab=features

      1. Dave@SolidFire

        Re: Indeed, delivering good QoS isn't easy

        What I said was that only SolidFire has built its architecture from the ground up to support guaranteed QoS. 3PAR has bolted QoS functionality onto a 15-year-old ASIC-based controller -- a good architecture, but not one designed with QoS from the start.

        I expect you'll see every other major vendor follow suit, just as they bolted on thin provisioning after 3PAR innovated in that area. If I've learned anything from 3PAR's marketing over the years, it's that a bolt-on is never as good as designing it in from the start :)

  6. GWagner
    Pint

    I’ve been educating storage admins on the merits of storage QoS over the last couple of years, and I think you’ve hit the key value props (rescuing stranded workloads from DAS and dealing with noisy neighbor syndrome). What I’ve observed is that before learning about storage QoS, admins accepted the fact that resource contention was the ugly reality of shared storage, and their job was to work some MacGyver action to make it bearable.

    Bryan mentioned NexGen Storage / Fusion-io (now ioControl Hybrid) as an example of a company that currently offers storage QoS capabilities. ioControl uses QoS to make the most of what is a limited resource in hybrids: flash. It allows admins to manage which workloads get flash priority. Workloads that need less get less, and those that don't need any get none. As Bryan mentioned, ioControl QoS policies enforce performance minimums, which are based on IOPS, throughput and latency levels.

  7. Iknowsomethingaboutstorage

    This article misses key information

    Hi,

    We're currently using EMC VMAX and VNX systems and both have QoS functionality. We use this feature mainly on VNX because this system is shared between prod and dev apps.

    The VNX QoS engine lets you create policies to limit or guarantee IOPS, MB/s or response time.

    It runs pretty well.

    I'm quite surprised to see no mention of one of the key storage players, so I believe Bryan has a partial view of the storage arena.

    1. Anonymous Coward
      Anonymous Coward

      Re: This article misses key information

      Surprised, because the one complaint I hear time and again about the VNX implementation of QoS concerns its inherent complexity, limitations and hidden "features". To the point that it is seen as unusable for anything outside of a very basic implementation or a typically slick lab demo.

  8. Louw Pretorius

    Not so simple...

    I think that this topic cannot be solved by just one vendor or just one approach.

    Latency becomes a problem as soon as the request leaves the server; hence, server-level caching (SSD or PCIe card) is required.

    VMware provides a simple solution with its SIOC (Storage IO Control) method, where you choose a latency threshold and the VMware cluster will throttle the whole volume's queue length down to 1.

    VMware also provides IOPS and throughput limits per VM if you require this.
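
    Mechanically, SIOC amounts to a latency-triggered feedback loop on each host's device queue depth. A minimal AIMD-style sketch of the idea (the threshold and step sizes are assumptions, not VMware's actual values):

    ```python
    # Toy latency feedback loop: shrink the queue on congestion, grow on headroom.
    def adjust_queue_depth(qdepth: int, observed_ms: float,
                           threshold_ms: float = 30.0,
                           max_depth: int = 64) -> int:
        if observed_ms > threshold_ms:
            return max(1, qdepth // 2)       # congested: back off hard
        return min(max_depth, qdepth + 1)    # headroom: creep back up

    qd = 64
    for latency_ms in (10, 12, 45, 50, 20, 18):
        qd = adjust_queue_depth(qd, latency_ms)
        print(f"{latency_ms} ms -> queue depth {qd}")
    ```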

    Obviously storage tiering is needed, with the requisite SSDs to sponge up the reads and writes, but care must be taken here not to overburden MLC, as it's not an ideal write cache.

    Lastly, the RAID write penalty should be taken into consideration, or NVRAM solutions should be mandatory, to try to alleviate unneeded I/O multipliers on the back end.
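
    To put numbers on that last point: the standard write-penalty factors are 2 for RAID 10, 4 for RAID 5 (read data, read parity, write data, write parity) and 6 for RAID 6, so back-end write capability divides accordingly. The drive figures below are assumptions for illustration.

    ```python
    # Effective write IOPS after the RAID write penalty.
    WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

    def effective_write_iops(raw_backend_iops: int, level: str) -> int:
        return raw_backend_iops // WRITE_PENALTY[level]

    backend = 20 * 180    # e.g. 20 HDDs at ~180 IOPS each (assumed)
    for level in WRITE_PENALTY:
        print(level, effective_write_iops(backend, level), "write IOPS")
    # -> raid10 1800, raid5 900, raid6 600: hence NVRAM/write-back caching.
    ```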

    1. Anonymous Coward
      Anonymous Coward

      Re: Not so simple...

      Agree with most of what you said, "horses for courses", but what if you don't have VMware, or what if your latency-critical apps sit on physical servers? The storage is the lowest common denominator in the stack, so it really should have these features built in. As usual in the storage industry, the usual suspects will claim feature parity, but the devil is, as ALWAYS, in the detail.

      Mirrored NVRAM and write coalescing are no longer really differentiators, and haven't been for years; everyone has them, and if a vendor doesn't, or relies on some external UPS facility or similar for cache backup, you should run a mile.

  9. Anonymous Coward
    Pint

    This is like tuning racing cars which are having parts replaced/upgraded as you are tuning them for the next race. I've been there, done that, burned the t-shirt and I most certainly loved the challenge. When I was young and (somewhat?) stupid. On that note, it's over the yardarm somewhere.
