VSANs choking on VMware's recommended components

VMware has changed the recipe for its virtual storage area networks (VSANs) after some components it recommended were found not to be up to the job. Virtzilla's notification of the change says it is being made because some “low-end IO controllers” it once recommended “offer very low IO throughput”. So low, in fact, …

  1. Voland's right hand Silver badge

    Bad software engineering

    That is exactly why a (V)SAN should be able to support rate throttling and advanced queue management for IO operations in software, before the IO even hits the controller (a rough sketch of the sort of throttling I mean appears at the end of this thread). As any network engineer will tell you, throwing "more bandwidth" at a congestion problem is only a temporary substitute for active queue and congestion management. Storage is no different, regardless of how much it loves to pretend that it is.

    Take some slightly bigger server(s), throw some more cores in and voila, you just managed to congest that controller queue once more on "supported" hardware.

    1. Random Q Hacker

      Re: Bad software engineering

      QoS is like swap: if it's at all active, you didn't size your system appropriately. In this case, you would have to QoS the rebuild of an array to take hours or days longer. If you're comfortable with that, then I'm glad I don't work at your IT shop.

      Not surprised to see Cisco on that list, UCS sucks.

      1. Anonymous Coward

        Re: Bad software engineering (@ Random Q Hacker)

        Well, a better analogy would be with CPU overload, as QoS pertains to throughput. And you definitely don't buy your systems sized for fifty times the average load just to handle every possible peak, because you'd go bust; especially once you factor in licenses and, for storage, available IOPS, which become prohibitively expensive beyond small tens-of-thousands-of-IOPS implementations.

        So you have workload management in place to throttle resource consumption, enabling applications to coexist on a shared resource, especially when applications are written in such a way that they consume all the performance available (operating systems behave this way). Moreover, a software bug may wreak havoc if it isn't compartmentalized.

        To paraphrase: barring small implementations, NOT using QoS is bad design. And the commenter before you didn't mention throttling the rebuild rate, but IO in a shared-resource environment in general. You should also know that the rebuild rate is always throttled on the controller; otherwise you'd have completely DoS'ed yourself for the entire duration of the rebuild.

        Sure, QoS is just one part of the shared environment management, but dissing it is just dumb.

      2. Captain Scarlet

        Re: Bad software engineering

        I think you will find the majority of makers are on the list (Although I don't see Supermicro).
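
Neither the parent comment nor the QoS discussion above spells out what "rate throttling and advanced queue management in software" would actually look like. A minimal sketch, assuming a token-bucket limiter per IO class sitting in front of a shallow controller queue: the class names, rates and 80/20 split are illustrative assumptions, not how VSAN (or any shipping product) does it.

```python
import time
from collections import deque

class TokenBucket:
    """Simple token bucket: 'rate' IOs per second, bursts up to 'burst'."""
    def __init__(self, rate, burst):
        self.rate = rate          # tokens (IOs) added per second
        self.burst = burst        # maximum bucket size
        self.tokens = burst
        self.last = time.monotonic()

    def try_consume(self, n=1):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

class IoScheduler:
    """Software-side queue management in front of a shallow controller queue.

    Foreground IO gets a larger token budget than rebuild IO, so a rebuild
    cannot monopolise the controller's (small) hardware queue.
    """
    def __init__(self, controller_queue_depth=32):
        self.qd = controller_queue_depth
        self.in_flight = 0
        self.queues = {"foreground": deque(), "rebuild": deque()}
        # Illustrative budgets: roughly 80% of target IOPS to foreground, 20% to rebuild.
        self.buckets = {"foreground": TokenBucket(rate=8000, burst=64),
                        "rebuild":    TokenBucket(rate=2000, burst=16)}

    def submit(self, io, klass="foreground"):
        self.queues[klass].append(io)

    def dispatch(self, send_to_controller):
        """Move IOs to the controller only while tokens and queue slots allow."""
        for klass in ("foreground", "rebuild"):
            q, bucket = self.queues[klass], self.buckets[klass]
            while q and self.in_flight < self.qd and bucket.try_consume():
                send_to_controller(q.popleft())
                self.in_flight += 1

    def complete(self, io):
        self.in_flight -= 1
```

The point is simply that back-pressure is applied before the controller's 32-entry hardware queue fills, rather than leaving congestion for the controller to sort out.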

  2. JonW

    This makes their entire "Certified" program look very questionable - what *exactly* does that testing regime involve? I/O is one of the easier things to test, imho... (see the sketch below for the sort of crude check I mean).
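
For what it's worth, a crude version of the kind of I/O check alluded to above is easy to script. The sketch below times synchronous 4 KiB random reads against an existing file and reports IOPS and average latency; a real certification test would use fio or similar, bypass the page cache with O_DIRECT and drive realistic queue depths, so treat this purely as an illustration.

```python
import os, random, time

def measure_random_read_iops(path, block_size=4096, duration=5.0):
    """Issue synchronous 4 KiB random reads for 'duration' seconds and
    report IOPS and mean latency. The page cache is NOT bypassed here, so
    the numbers flatter the hardware; a real test would use O_DIRECT and
    multiple outstanding IOs."""
    size = os.path.getsize(path)
    blocks = size // block_size
    fd = os.open(path, os.O_RDONLY)
    try:
        ios, start = 0, time.monotonic()
        while time.monotonic() - start < duration:
            offset = random.randrange(blocks) * block_size
            os.pread(fd, block_size, offset)
            ios += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    print(f"{ios / elapsed:.0f} IOPS, {1000 * elapsed / ios:.3f} ms avg latency")

# Example (file must already exist and be larger than one block):
# measure_random_read_iops("/var/tmp/testfile")
```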

  3. Anonymous Coward

    Calls into question "Certified"...

    Not just that, but their entire "we have tested this extensively, therefore we know we're right" ethos with regard to straying from default values. Those of us exposed to the failure of that ethos on a daily basis knew there was something to be called into question.

  4. Trevor_Pott Gold badge

    What I fail to understand is why VSAN is so sensitive. Maxta runs like greased lightning on even marginal hardware.

    1. Anonymous Coward

      Clearly a failure of certification/testing by VMware. Embarrassing. I was surprised to see those controllers in the original HCL to start with.

      However, suggesting that "System X runs like greased lightning" on a controller with a queue depth of 32 shows gross ignorance of basic system (and storage) principles, unless you use System X as a toy.

      Also, per the original reddit thread, vSAN apparently throttles rebuild traffic back, but not down to zero. Their argument is that they want the rebuild to keep making some progress, to ensure a bounded time for re-protecting the affected data. I can see the argument. But, again, they obviously did not test rebuild workloads with those controllers. They also provide no sizing guidelines, so how is one supposed to figure out which hardware to choose and how many components? (Some back-of-envelope numbers follow at the end of this thread.)

      1. Trevor_Pott Gold badge

        Depends on what you define as "a toy", I suppose. "Able to handle 125 VDI instances per node without complaint" seems reasonable to me. Alternatively, "200+ VMs across 4 nodes running a mix of workloads ranging from Exchange to SQL to VDI to image render engines" seems not very toy-like to me.

        Now, to be perfectly honest, I haven't run Maxta against those controllers. Maybe Maxta would run like a dog on them too. I do know that I don't need hardware controllers to run Maxta, and it does a damned fine job using AHCI SATA 7200 rpm spindles + SSDs. It handles all the I/O I want to throw at it on a per-node basis right up to the point that I run out of RAM.

        Now, it's obviously an open question how each of these will behave when we start talking about setups that run $25K+ per node just for the hardware. I can't answer that. But I do know that Maxta runs real-world production workloads just fine on some rather weedy hardware using configs that VMware officially pooh-poohs for their own offering.

        It's all about what you're optimizing for. Are you optimizing for IOmeter and SPC-2 benchmarks, and for ridiculously latency-sensitive workloads that only 0.000002% of the world actually runs? Or are you optimizing for the kinds of workloads (and disk/CPU/RAM balances) used by 80% of the world's businesses?

        So yeah, some of this shouldn't have made it onto the HCL...but there are things that got pulled from the HCL (or never made it on) that also raise huge questions about what the merry hob VSAN is doing that it can't get workable performance out of the same hardware that is used by Nutanix (certain LSI controllers), or Maxta (AHCI).

        It makes me ask uncomfortable questions. Like "is the only thing that VSAN has to offer a series of high benchmarks, and even then only when used on the absolute best of the best hardware"? How does VSAN work with the kinds of hardware normal people and mundane businesses can actually afford, or already have to hand?

        If asking those questions, instead of blindly lapping up marketing tripe and praising a solution that is twice as expensive as other competing offerings, is "not understanding storage basics", I'm remarkably cool with that.

        "How many IOPS do you need" is just as important a question as "how many IOPS can this solution deliver." And what's really of interest is the lovely question "why can X deliver Y IOPS on Z hardware, but W cannot?"

        And, quite frankly, I don't care who gets upset when I ask those questions.
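
To put rough numbers behind both the "queue depth of 32" jab and the "how many IOPS do you need" question, here is a back-of-envelope sketch using Little's law (throughput ≈ outstanding IOs / per-IO latency), plus a rebuild-time estimate with a reserved rebuild share. All of the figures are illustrative assumptions, not measurements of VSAN, Maxta or any particular controller.

```python
def max_iops(queue_depth, service_time_ms):
    """Little's law: throughput = concurrency / latency."""
    return queue_depth / (service_time_ms / 1000.0)

def rebuild_hours(capacity_gb, rebuild_share, controller_mbps):
    """Time to re-protect 'capacity_gb' if only 'rebuild_share' of the
    controller's bandwidth is reserved for rebuild traffic."""
    effective_mbps = controller_mbps * rebuild_share
    return (capacity_gb * 1024) / effective_mbps / 3600

# A queue depth of 32 at 5 ms per IO (spinning-disk territory) caps out around:
print(f"{max_iops(32, 5.0):,.0f} IOPS")               # ~6,400 IOPS
# The same 32 slots at 0.2 ms per IO (SSD-backed) look very different:
print(f"{max_iops(32, 0.2):,.0f} IOPS")               # ~160,000 IOPS
# Rebuilding 2 TB with 20% of a 500 MB/s controller reserved for rebuild:
print(f"{rebuild_hours(2048, 0.20, 500):.1f} hours")  # ~5.8 hours
```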


  5. markkulacz

    The RDT driver is no replacement for InfiniBand

    Note - I am an employee of NetApp, but my thoughts here are my own and do not represent those of NetApp.

    VSAN would be better off, with respect to the cluster interconnect, if it used a true InfiniBand physical cluster interconnect, like Isilon does. Using a layered "RDT" driver in the IO stack of the interconnect to abstract 1Gb Ethernet into something that provides the reliable data transport of InfiniBand is the root of many of the problems the HCL changes are trying to fix. Even VMware has commented in the past that Ethernet over InfiniBand is superior to plain Ethernet. The 10GbE interconnect will help, but InfiniBand (or FC, or RapidIO) is the correct way to go for a *tightly-coupled* scalable storage cluster providing synchronous block IO transactions on the interconnect (a rough latency comparison follows after this thread). 10GbE is an exceptionally capable cluster interconnect for more loosely associated storage clusters (Clustered ONTAP), or for storage clusters that tend to involve less small, transactional IO (such as append-and-read-only Hadoop).

    1. Trevor_Pott Gold badge

      Re: The RDT driver is no replacement for Infiniband

      You know, this could be why I have so much luck with server SANs. I stopped using anything but 10GbE ages ago. You can get 24 ports of 10GbE for $5K from Netgear now. There's just no excuse.
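
The interconnect argument above boils down to round-trip latency being added to every synchronously replicated write. A minimal sketch of that effect, using assumed, ballpark round-trip figures rather than benchmarks of any product:

```python
def sync_write_latency_us(media_write_us, interconnect_rtt_us, replicas=2):
    """Effective latency of a synchronously replicated write: the slowest
    path is the local media write versus one interconnect round trip plus
    the remote media write (remote copies assumed to run in parallel)."""
    remote = interconnect_rtt_us + media_write_us if replicas > 1 else 0
    return max(media_write_us, remote)

# Assumed round-trip times in microseconds (illustrative only):
interconnects = {"1GbE (TCP)": 80.0, "10GbE (TCP)": 30.0, "InfiniBand (RDMA)": 3.0}
ssd_write_us = 50.0  # assumed SSD write-acknowledge latency

for name, rtt in interconnects.items():
    total = sync_write_latency_us(ssd_write_us, rtt)
    print(f"{name:>18}: ~{total:.0f} us per acknowledged write")
```

With fast flash underneath, the interconnect round trip starts to dominate the write path, which is the crux of the tightly-coupled versus loosely-coupled distinction drawn above.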
