vBlock user says EMC bug slipped through VCE's matrix

The Cisco/EMC/VMware/Intel lovechild VCE has a simple schtick: the boxed-up rigs of hardware and software it sells are sold in configurations that have been documented and tested to the last detail. As the company told us by email “we commit to delivering Systems that have been engineered, tested and certified as one.” VCE is …

  1. Nate Amsden

    known bugs

    I remember a LONG time ago I was at a company that used BEA WebLogic, and we had a lot of problems with JMS. One of our top engineers engaged BEA and asked them if they had any outstanding issues with that subsystem. They said no.

    About 3 months later we had about 36 hours of downtime, due to JMS in WebLogic. We "worked around it" by deleting all of the data in the JMS queues (last resort). After about 2-3 weeks of investigation the teams determined not only was it a known bug, but BEA had a fix for it all along and did not tell us even when we explicitly asked. It was their policy at the time not to disclose bugs to customers unless the customer was experiencing that specific problem.

    The customer of ours that was down for 36 hours was one of the largest telcos in the world, and also a big customer of BEA's directly. BEA changed their support policies very quickly (at least for us) after that.

    It was an interesting experience to be in my then boss' office with about a dozen senior software engineers and architects and all of them shrugging "I have no idea why it is broken, it shouldn't be doing this".

    I've talked to people since and they say "oh my god we've had a 2 hour outage! the horror!" pfft. Wake me up when you've been up for 24 hours straight and still don't know what the problem is.

  2. Trevor_Pott Gold badge

    None of this would ever have happened if he weren't using some fly-by-night high-risk startup. Proper enterprise vendors with proper enterprise support are what's needed to prevent this sort of thing from happening!

    You know...I can't even type that with a straight face anymore. I am going to print this article, roll it into a tube and beat the next person who talks about how Nutanix, SimpliVity or Maxta aren't "proper" vendors to within a micron of their cognitive capacity.

  3. Anonymous Coward

    Make sure to buy a converged stack where you get as much equipment as possible (storage, compute, network) from the same vendor.

    HDS does storage and compute. Cisco has the opportunity to do everything, but I don't know if they have any tight integrations on the way. IBM sold the servers, so only Power is left.

    1. Konstantin

      You forgot HP. And suddenly Huawei has a full HW stack plus virtualization.

  4. Erik4872

    Fun with integration...

    This kind of stuff is actually my job (software/hardware/systems integration.) I don't work for VCE, but I deal a lot with this all-too-typical situation. It's often a huge headache being the "make this stuff work together guy" and adding multiple vendors to the mix all blaming each other just takes it to a whole new level. This job does keep me out of 24/7 ops mode, however I can't tell you how many hours have been spent literally refereeing vendor fights over late night conference calls.

    The problem with these integrated stack vendors is that, often, they're so big and unwieldy that Group A, who provides the storage array firmware, doesn't know that Group B just changed the iSCSI NIC firmware to a rev that's incompatible with Group C's latest compute node hardware rev. The problem rolls up at a human-visible level as "server can't see the storage array" and it takes a lot of troubleshooting to walk back through the entire connectivity tree. So when you get an Oracle Exadata stack, or a vBlock, you get "This recipe worked when we shipped the units." It's slightly easier than trying to marry an HP blade system with a NetApp filer over Juniper switchgear, but you can often run into the exact same problem. If you run into a problem in between recipe releases... that's where the whole "converged system" thing breaks down.

    Sometimes the IT exec crowd doesn't realize that there are humans at these companies doing all this work behind the scenes. Humans make typos in documents. Humans also can't test every single little corner case. And when things blow up, you're still relying on humans (a combo of yours and the vendor's) to sit down and figure out what needs to be fixed. One thing that people don't realize is that the squeeze on salaries and entry level IT work dries up the pipeline for truly good systems people. I'm no genius and would never claim to be a "rockstar" or other idiotic term, but doing these kinds of systems integration tasks does require a highly developed troubleshooting skill set. It takes someone with a lot of experience to pull apart a mess and figure out what broke without making the situation worse or losing customer data.

    A vendor can sell you a rack-in-the-box, but they need to back that up with talented integration people...and nothing is foolproof. Stuff like this will always happen.
