Aw, SNAP. It's too late, you've already PAID for your storage array

When a vendor says it has feature X, how do you know it's true? Storage arrays are mighty complex boxes and vendors' marketing of them can sometimes take a bypass around reality. How can you detect if this is happening? One way is to look for independent reviews of the product by a trusted reviewer. For example, product …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    can't afford to test?

    If you can't afford a test lab, and yet you're going to be spending 100k on storage devices, then you have your priorities in the wrong order.

    1. Trevor_Pott

      Re: can't afford to test?

      The overwhelming majority of my clients can't afford a testlab, yet they need $100K worth of storage. They need storage to make their business go, but for them that $100K of storage is a *huge* chunk of annual revenue.

      Buying one because without it the business ceases to function is something that can be managed, with sacrifice. Buying two is likely not even possible, given the revenue situation, and certainly not because the nerds "need to test things on the second one" but can't really articulate what they need to test or why.

      This is why people like me build up test labs: multiple businesses combined can afford a proper lab, and someone to run it (me) even when they couldn't afford it on their lonesome. Testlab as a Service, wot?

      1. Nate Amsden

        Re: can't afford to test?

        [ended up being much longer than I expected]

        Some vendors don't even provide units for testing. I might not be a 3PAR customer today if NetApp hadn't outright refused to lend me an eval unit back in 2006. I talked to NetApp again in 2011 and the rep (different rep, different territory) said that even then he would be *really* hard pressed to justify an evaluation unit, something to do with their internal processes. I don't know what HP's stance is on 3PAR evals these days; my last eval unit from them was in 2008 (pre-HP acquisition) and it was two racks of equipment. Basically we gave them a set of requirements and agreed that we'd buy the product if it met those requirements. Given that 3PARs are much cheaper now, I suspect HP would be open to evals. I know HP has *given* 3PAR arrays to some big customers absolutely free, no strings attached, I believe as an incentive to test them out, and I suspect to try to convince folks to migrate off P9000/9500 for their next purchase.

        I don't know what other vendor policies are, though usually the smaller startups are happy to give out eval gear. Testing is difficult though, I've never worked at a place that has had more than minimal resources to properly test something. I've never been at an org that had a test lab for infrastructure period. One of the companies I was at bought millions and millions of $ of storage, others typically 500k-1M.

        At my current company we moved out of a public cloud provider to our own hosted stuff. When we were evaluating what to get we had absolutely nothing to test with: no data center, no servers, nothing. Fortunately I already had experience with most of the products and we didn't need much testing. The only thing that caught us very much off guard from a storage perspective was that we were assuming a typical 60-70% read ratio on the storage, because we could get no reliable metrics from our cloud provider. Turns out we were over 90% write (almost all reads were coming from cache layers above storage). Fortunately the 3PAR system we bought was able to hold up with the initial configuration for about a year (good architecture) before we needed to add more resources. That % of writes is quite expensive!
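
        A rough back-of-envelope sketch of why a 90% write mix costs so much more than the 60-70% read mix that was assumed. It uses the textbook RAID 5 small-write penalty of 4 back-end I/Os per host write and 1 per host read, and ignores caching and full-stripe writes. The 10,000 host IOPS figure and the function name are made up for illustration; none of this is measured from the system described above.

```python
def backend_iops(host_iops, write_fraction, write_penalty=4):
    """Estimate back-end disk IOPS for a given host workload.

    write_penalty=4 is the textbook RAID 5 small-write cost
    (read data, read parity, write data, write parity).
    Caching and full-stripe writes are ignored.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    reads = host_iops * (1.0 - write_fraction)
    writes = host_iops * write_fraction
    return reads + writes * write_penalty


HOST_IOPS = 10_000  # hypothetical figure, for illustration only

assumed = backend_iops(HOST_IOPS, 0.35)  # ~65% read, the planning assumption
actual = backend_iops(HOST_IOPS, 0.90)   # 90% write, closer to what was seen

print(f"assumed mix: {assumed:,.0f} back-end IOPS")
print(f"actual mix:  {actual:,.0f} back-end IOPS")
print(f"ratio: {actual / assumed:.2f}x")
```

        Under these assumptions the same host load needs roughly 1.8 times the back-end IOPS once the mix flips to 90% write, which is the kind of gap that only shows up once real traffic hits the array.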

        Another storage related story, going back to 2008 again: we bought a 2-node 150TB 3PAR T400 with a two-node Exanet cluster to replace four racks of BlueArc storage whose NAS controllers were going EOL/EOS. When testing the evaluation 3PAR/Exanet system we were on a tight timeline and were doing stuff on the fly. I asked the on-site Exanet engineer if their system was thin provisioning friendly. He said yes, he had worked with Exanet on 3PAR before and they did thin provisioning fine.

        So I exported a bunch of storage to the Exanet cluster, somewhere around 90TB usable. And we started testing. Everything went awesome, performance far exceeded that of the previous system. We bought the stuff and migrated more production stuff onto it. The workload was very write-delete-write heavy. As time went on I saw the disk space on 3PAR going up and up and up, but Exanet holding fairly steady. Obviously NOT thin provisioning friendly (Exanet was not re-using deleted blocks before allocating new ones). I ran some numbers and determined, oh shit, if this continues to its logical conclusion I will exceed the capacity of my 3PAR system. The 3PAR system was 4-controller capable, and I was at the maximum physical capacity (150TB at the time) of a two-node system. So if I had to add ANY more disks I needed two more controllers (big big expense). A year or two later a software update came out that allowed that generation of 2-node controllers to scale to, I believe, 200TB. Part of the issue was they were on a 32-bit operating system until something like 2011.
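
        The "array usage climbs while the filesystem stays steady" effect can be shown with a toy model. This is only a sketch: the block counts are arbitrary, the `simulate` function is hypothetical, and it says nothing about how Exanet or 3PAR actually allocate space. It assumes the array cannot see deletes, so it has to back every block that has ever been written.

```python
def simulate(cycles, live_blocks, churn, reuse_freed):
    """Return (fs_used, array_allocated) in blocks after write-delete-write churn.

    The array cannot see deletes, so its allocation is the count of
    distinct blocks ever written. reuse_freed=True models a thin
    provisioning friendly filesystem; False models one that always
    writes to never-used blocks first.
    """
    live = set(range(live_blocks))  # blocks holding current data
    touched = set(live)             # blocks the array has had to back
    free = []                       # blocks released by deletes
    next_new = live_blocks          # next never-used block address

    for _ in range(cycles):
        # Delete `churn` blocks of existing data...
        for _ in range(churn):
            free.append(live.pop())
        # ...then write the same amount of new data.
        for _ in range(churn):
            if reuse_freed and free:
                block = free.pop()
            else:
                block = next_new
                next_new += 1
            live.add(block)
            touched.add(block)

    return len(live), len(touched)


for label, reuse in (("reuses freed blocks", True), ("always allocates new", False)):
    fs_used, array_allocated = simulate(50, 1000, 100, reuse)
    print(f"{label}: filesystem sees {fs_used}, array has allocated {array_allocated}")
```

        With these toy numbers both filesystems report 1,000 blocks in use, but the one that never reuses freed blocks has pushed the array to 6,000 allocated blocks while the friendly one stays at 1,000.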

        So I ran some numbers with my 3PAR SE, and determined that if I converted the system from RAID 50 (3+1) to RAID 50 (5+1) we would have sufficient disk space for Exanet (and others: VMware, MSSQL, MySQL) to grow into and not run out. We had six drive shelves so 5+1 made more sense anyway from an efficiency standpoint.
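
        The efficiency difference between the two layouts is simple parity arithmetic. The sketch below applies it to the 150TB raw figure mentioned above; it ignores sparing, metadata, and any other system overhead, so the real usable numbers would be lower. The function name is just for illustration.

```python
def usable_fraction(data_disks, parity_disks=1):
    """Fraction of raw capacity left for data in a data+parity RAID set."""
    return data_disks / (data_disks + parity_disks)


RAW_TB = 150  # raw capacity of the two-node system described above

for data_disks in (3, 5):
    fraction = usable_fraction(data_disks)
    print(
        f"RAID 50 ({data_disks}+1): {fraction:.1%} usable, "
        f"about {RAW_TB * fraction:.1f}TB of {RAW_TB}TB raw"
    )
```

        Going from 3+1 to 5+1 moves usable space from 75.0% to 83.3% of raw, about 112.5TB versus 125.0TB here, so the conversion frees up roughly 12.5TB without adding a single disk.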

        I started the process. Unlike a lot (all?? I don't know) of other systems, this could be done on the fly without any impact to the applications. I remember talking to HDS at about the time we were going to evaluate 3PAR (HDS was partnered with BlueArc and has since acquired them) and asking them this very same question: can you change your RAID levels etc. on the fly? They said yes, but you need blank disks to migrate the data to. At the time they were referencing the AMS2500 system they were proposing. This was back in Nov 2008 and that platform was brand new; it didn't even have thin provisioning yet, and HDS refused to estimate pricing for TP, which was slated to be released in 6-8 months. Big caveat right there! I'm not sure how others do it.

        Anyway, with 3PAR of course no blank disks are required, you just need space to copy the data to (since the disks are virtualized, finding available space typically isn't too hard). Pretty simple command line, basically one command per volume to migrate. You can migrate a half dozen or so at once and the system self-throttles based on other activity on the system. I'd fire off a set of migrations and they'd take a few days to complete. As the system filled up over time there was more data to move with each set of volume changes, so the time required was longer. Towards the end it was something like 2 weeks to move 7-8 volumes, so it's not like I was having to babysit the thing. I'd submit some commands, check on it when I was bored, and after a few days or a week submit the next set of commands.

        On a very heavily loaded system it took me roughly 5 months of 24/7 conversions to complete the process of probably 110-120TB of raw data. Nobody ever knew what was going on other than me giving periodic updates as to the progress of the process. I was pretty happy, it's one of the core reasons I like 3PAR and am a rabid fan/customer, if you make a mistake up front(or down the line) you can correct it pretty easily without application impact or complicated data migrations. I don't live and breathe storage by any stretch(though to some I give that impression) it is only a very small part of what I do from an ops standpoint. In this case it was a decent amount of data that had to be re-ordered which took a while but we could do it.
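
        For a sense of scale, the figures above work out to a fairly low average data rate. The sketch below assumes 30-day months and decimal units (1TB = 1,000,000MB); both are simplifications, and the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(terabytes, days):
    """Average sustained rate in MB/s, using decimal units (1TB = 1,000,000MB)."""
    return terabytes * 1_000_000 / (days * SECONDS_PER_DAY)


DAYS = 5 * 30  # "roughly 5 months", assuming 30-day months

for tb in (110, 120):
    rate = average_rate_mb_per_s(tb, DAYS)
    print(f"{tb}TB in {DAYS} days: {rate:.1f} MB/s average, {tb / DAYS:.2f}TB per day")
```

        Under those assumptions the conversion averaged somewhere under 10 MB/s, which fits a background job that throttles itself on a heavily loaded system rather than one competing with production I/O.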

        So yeah couldn't afford to do a really good test there, things fell short once we moved to real production but we were able to correct the issue without any additional purchases, or downtime, or complex data migrations.

        After Exanet went bust I advocated for a NetApp V-series to replace it (2010), in part because its SPECsfs numbers were double those of Exanet at the time, so I figured it had to be at least as fast as a two-node Exanet cluster. Again not a lot of time to test, and I was in my final weeks at the company. I left and they later deployed it, and their workload just wasn't compatible with NetApp. Even though the V-series was more powerful it fell down hard (CPUs pegged) and the NetApp reps were unwilling to help (they even threatened the customer that NetApp was pulling support for 3PAR systems, which of course did not happen).

        Another advantage of the 3PAR architecture is that the customer was able to migrate from Exanet to NetApp on the same back end storage without any complex stuff going on. The back end was entirely virtualized of course, so it's just a matter of allocating storage to one and unallocating it from the other (no concept of disk-based RAID, so both systems had access to every spindle on the then 4-node T400). I'm not sure specifically what process they went through as the bulk of that was done after I left, but had I been there I know the way I would have used, and it wouldn't have had much impact at all (you still have to move the data to the other filer since they are on different file systems of course).

        I didn't know the mgmt at the customer at the time (there was heavy turnover after I left), but basically HP went to them and said "we'll own the problem" and the customer went with HP instead (they bought I think a 4-node X9000 cluster to connect to their 3PAR; in that case they did no testing either). I don't know what happened after that, maybe it worked, maybe it failed. The NetApp rep(s) in that territory responsible for some of that shit left or were fired a couple years ago. I was told that even people inside NetApp did not like them, but for years felt they couldn't do much about them since they were pulling in good numbers.

        I spoke with one company that also went to 3PAR many years ago, and they did real testing. I got a copy of their ~60 page test guide and all the things they went through. Honestly quite overkill, but they are a big org. They were consolidating several mid-range NetApp arrays onto a big 3PAR with NetApp V-series on the front end.

        1. benjarrell

          Re: can't afford to test?

          "I've never been at an org that had a test lab for infrastructure period." Me either.

          Nice to hear some storage war stories :)

  2. Tony Green

    Aw, Snap???

    Can someone translate into English please?

    1. Steve Knox

      Re: Aw, Snap???

      "Oh? How very unfortunate for you."

  3. M. B.

    Anyone hawking kit...

    ...should be able to arrange a demo of the product meeting the requirements, if not actually bringing you face-to-face with other clients running on the same platform. If they aren't willing to go the extra mile to actually show you the system doing what you want to see it do, I would question both the product and the partner.

  4. garetht t

    No shiller, but all filler.

    Play this game with the article - after every sentence, add the words "No shit!"

    "One way is to look for independent reviews of the product by a trusted reviewer. No shit!"

    "For example, product reviewers paid by a media outlet should be more independent than ones paid by the product's vendor. No shit!"

    "It is, to be frank, inconceivable that a vendor would pay for a product review and publish the thing if it was negative[.] No shit!"

    "A review of a product's claimed specifications using supplier-provided information is not enough on its own to justify a purchase. No shit!"

    "All such reviews or product comparisons should be discounted unless there are independent reviews of the product. No shit!"

    I hope I have, like the article, or a small asian child whittling at twenty-foot high bamboo, made my simple point at quite unjustifiable length.

  5. DeepStorage

    Test Labs are Expensive to Run/Maintain

    Back in the 20th Century I wrote for PC Magazine and Network Computing when they could afford to spend well into six figures of dollars a year maintaining their labs. I also spent 25 years consulting to midmarket companies that regularly spent $50,000 a year on storage. None could afford a lab at all.

    I run my lab on a shoestring getting donations of gear from vendors and buying other gear used from liquidators on eBay. The lab still costs over $50,000 a year to keep up and running. Rent on space for 5 racks worth of gear, $15,000 for air conditioning installation and electricians and of course a constant flow of new gear.

    I'm currently running a 250 user VDI benchmark, that takes a half dozen servers with 96GB of RAM each.

    Yes I do reviews for vendors, yes when it's turned out their product didn't work as expected the project was canceled. I count myself lucky that DeepStorage is small enough that I decide I generally like a product before I even pitch the vendor to test it.

    You can believe everything a reputable analyst like those Chris mentioned, thanks Chris, tells you. You can't trust what you only think we say.

    I've been lucky that my clients realize that "We'd really like to see feature X that's currently lacking in a future version" adds to the credibility when we say the product does Y and Z well.

    - Howard

  6. storman

    Always Trust, but Verify

    Testing of storage systems in real world scenarios has always been a tough problem. SMBs and SMEs simply can't afford to build test labs. They have to rely on vendor claims, "independent" reports, and published benchmarks like SPEC SFS and peak IOPS, which as Nate correctly points out are simply not representative of typical installed production workloads and are often grossly misleading.

    For the mid to larger IT organizations, which I define as those that spend $Ms per year on storage, a "trust, but verify" strategy that includes a test lab is essential. Making a performance assumption that results in under-provisioning could decrease company revenues, and unnecessarily over-provisioning by just 20% can easily waste many hundreds of thousands or millions of dollars. No company can easily afford to waste such money. For this class of IT customer, vendors will nearly always provide an evaluation unit for a POC. If not, then they are not serious about earning your business.

    What is needed is a solution that can help you understand the I/O profile of your existing production application workloads in great detail and then allow you to use them (in a truly realistic workload model) to evaluate new vendors, products, technologies, and configurations in a test lab. This is where companies like Load DynamiX are extremely valuable. They offer a simple-to-use workload modeling and storage performance validation system that is 100% vendor independent and enables users to run tests based on the characteristics of their production workloads using a load generator. Storage architects can now have the data to make key decisions and align purchases and deployment decisions directly to performance requirements. The power is now shifted back to the users and away from the vendors. Trust, but verify is the key to assuring performance and controlling runaway storage costs.
