OCP supporters hit back over testing claims – but there's dissent in the ranks

Open Compute Project aficionados did not like our story about its allegedly insufficient hardware testing procedures and said so, publicly and loudly. “Terrible journalism” whinged one, adding it was an attack on the entire open source movement. Is this fair? It seems a can of worms has been opened. Cole Crawford, the …

  1. Lionel Baden

    OCP

    I was looking for more information on RoboCop!!

    Imagine my disappointment, on a Friday, to find it was a normal news story!

    1. asdf

      Re: OCP

      You have 20 seconds to comply.

  2. Trollslayer

    Correct acronym

    So they will design server motherboards but not test for signal integrity?

    I bet any problems will be blamed on people manufacturing the boards.

  3. asdf

    >“Terrible journalism” whinged one, adding it was an attack on the entire open source movement.

    No, the OCP implementation is an attack on the open source movement. Their incompetence has the potential to reflect badly on the entire OSS world (i.e. all OSS is amateurish, half-assed about quality, etc.) to those who don't know better or can't tell the difference. In practice a lot of open source is developed by professional developers who are paid to develop it following industry-standard best practices. Something it sounds like these fools haven't heard of.

    1. asdf

      correction

      After more research it's quite obvious this project is the equivalent of buying knock-off goods from a street vendor. The quality is complete crap, but it almost looks like the real thing and costs pennies on the dollar (so you can buy 5 spares). There may be a place for this, but it looks like it is getting marketed where it shouldn't be, and worse, it reflects badly in general on OSS, which does produce some world-class software.

  4. jaybeez

    Cole is Delusional

    To Cole (not the Register),

    I could tear your entire article apart, but it's painfully obvious that you know nothing about hardware quality or reliability. And that seems to be true of OCP in general. And if your hardware is crap, you cannot simply abstract that away with software. Your notion of a cloud run on cell phones is not even in the same universe here. OCP has been given the tremendous responsibility of certifying enterprise servers, switches and, apparently, storage.

    However, the most delusional comment from you was here:

    "This is almost always initiated by a benevolent dictator. Linux was given to us by Linus Torvalds."

    Linus was and is a tech genius. And he has stayed on and is still the man when it comes to the Linux kernel. And the people that are in the upper echelons of kernel development are all hard core geniuses too. These guys are perpetually testing their code, and they are damn good at what they do.

    OCP, on the other hand, was started by people who left a long time ago. The founders don't appear to have any skin in the game. And recent news on the lack of testing at your officially endorsed testing centers (and weak test plans) has shown the world that they are not even at the level of a rubber stamp. Or do you guys simply keep your best talent hidden away in your not-so-open world?

    And could you provide some links for the claim that hundreds of racks of Facebook or Google servers went down without disruption or noticeable performance problems? That's a bold claim, too.

    I applaud the Register for their first article. And your time would be better spent fixing the problems at OCP, because I doubt any intelligent human being would read your blog and concur with your thinking.

    1. Coleinthecloud

      Re: Cole is Delusional

      Cole here.

      Feel free to call me delusional, but I can't imagine any company the size of Facebook not testing the gear that goes into their data center. By the way, maybe you saw the news about the $1bn data center Facebook is building in Fort Worth? I'd put money on Open Compute being deployed there.

      You said the founders have all left? Seems to me Facebook, Rackspace, Goldman Sachs and Intel were all founders and last I checked they were still all on the board.

      It would be bad business for any ODM (or OEM) to sell and deliver untested/uncertified hardware considering the SLA/OLA/RMA agreements in place. Again, there is a discrepancy between what OCP certification was intended to do and what was reported.

      If you like the principles of open source and value open hardware go help OCP make certification better.

      I'm happy to be called a fool. I'm rather indifferent as to whether it makes me wise or not in this case. I jump out of planes often, though, so the basis for comparison might already be skewed. =)

      1. Vic

        Re: Cole is Delusional

        > I'm happy to be called a fool.

        Phew! That's lucky, then.

        Vic.

      2. Trevor_Pott Gold badge

        Re: Cole is Delusional

        > Feel free to call me delusional but I can't imagine any company the size of Facebook not testing the gear that goes into their data center

        What, what whaaaaaaaaaaaaaaaaaaaaaaaaaaaaaat? You are expecting each company that buys OCP hardware to do their own tier 1 class testing? What?

        How does that make sense at all? OCP is about driving down cost. Testing is a cost that should be centralized so that it doesn't have to be replicated.

        More to the point: lots of companies that aren't the size of Facebook have come to rely on OCP gear. What are you doing to the Open Compute Project? Why are you doing it?

        1. Probie

          Re: Cole is Delusional

          Trevor, it's rare I agree with Cole enough to write about things, but I have to break a long-standing tradition of saying sweet FA, and now I have to publicly agree with Cole. At least on this one.

          My name is James Hesketh and I was there at Rackspace as part of the founding member team; I am the guy who chaired the Virtual IO project, and I left a while ago. The great opportunity I personally saw in Open Compute is the one Trevor champions: the one where the little guy gets to use Open Compute. Sadly for me, the project is not there now and I doubt it ever will be. But that is not Cole's or the Open Compute Project's problem. There are some points from the founding that probably need to be revisited:

          1) Driving down cost is an important aspect, but WHERE do you drive down the cost? Open Compute was about reduced cost through efficiency, not primarily about reduced cost at point of purchase. This leads to a long debate on how IT equipment is produced; talk to the ODMs about how stuff is made.

          2) How do you do hardware and testing on open hardware? And just for shits and giggles, think about the testing argument you waded in on with Nutanix and VMware. Open servers have proprietary shit in them too. Do you think OEMs place all their testing out there in the open, or shut the doors on a whole bunch o' crap? When you do certification, what do you do it on: different vendor hard disks, different RAID controllers, NICs, memory? Just where do you draw the line? And do not get me started on firmware updates and the whole cycle of testing. ANY ODM could make the motherboard, so to be comprehensive and unbiased, how many configurations do you test? (See the sketch after this list.)

          3) i) Costs. I mention costs because with certification there comes LIABILITY. You said "this" will work with "that"; who has liability for what? Open Compute is an overarching body, it's a council; it does not fulfil purchase orders. ii) Not to mention, who the hell pays for a full tier 1 testing regime? You, the end customer? You want that cost added to the purchase price?
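
          To put a hypothetical number on that configuration explosion (the component counts below are invented purely for illustration), a quick sketch:

            # Hypothetical illustration: a handful of options per component
            # already multiplies into hundreds of configurations per board.
            from itertools import product

            disks = ["disk_model_%d" % i for i in range(1, 6)]   # 5 disk models
            raid  = ["raid_ctrl_%d" % i for i in range(1, 4)]    # 3 RAID controllers
            nics  = ["nic_model_%d" % i for i in range(1, 4)]    # 3 NIC models
            dimms = ["dimm_cfg_%d" % i for i in range(1, 5)]     # 4 memory configurations

            configs = list(product(disks, raid, nics, dimms))
            print(len(configs))  # 5 * 3 * 3 * 4 = 180 combinations, per ODM, per firmware rev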

          All that said, Barbara raises good points and ones that cannot be brushed away, but Barbara's point does nothing to address the points above; it compounds them. Even the remedy compounds them if undertaken by the Open Compute Project. It has to be done outside of the Open Compute Project main body, because of the Open Compute ecosystem.

          Picture this: in a chain of companies that create and fulfil an order for "something" open source, where is the "value"? Value can mean a lot of things, but in this case I mean "where is the bit where I can say: here is something that you, Mr Customer, value, and that I can assign a monetary value to". Arguably, in Open Compute deeper testing/certification can be provided in the supply chain, most likely at the system integrator, and at a "cost" if that is what is desired; or, as an end customer, I can forgo that and do it myself.

          Trevor, I read your articles regularly and generally agree with a large percentage of what you write, and in my heart of hearts I truly wish "this" certification and testing were up to Tier 1 standards so everyone could use it without fear. But that does not come free, and the cost of filling in the "gaps" falls to someone. So if you really want an OCP certification standard that is equal to or better than an OEM standard, then by all means ask for it, but expect to pay for it, and not just fiscally either. I trust you are wise enough to see the parallel with Cole's comment about Facebook doing testing.

          As I have said in the past: "buyer beware". Holy shit, look at the difference between Red Hat and CentOS, or Debian and Ubuntu, or SUSE Linux and openSUSE, etc. This is philosophically no different. Seriously, why do people buy Red Hat support subscriptions when they could run CentOS?

          To Cole's other point: it is easy to be critical or to castigate from afar and do little else (I personally have not seen you do it), but I have seen the rants of open source software folks on this comment list. How many times have people had to change something just to make it "functional" because it was written for a particular flavour of Linux? I know I have had to do it for community editions of software that have an "enterprise, pay me for it" edition. Frequently I contribute the change back somehow and make the world just an infinitesimal amount better off. Of course, that is the open source way.

          Open Compute stuff is not ephemeral; it is not code, it is physical. It needs to justify the expense of companies developing and building for orders. It has to show a return; it somehow has to conform to a degree of business economics. In the end the community votes with its wallet, and if there is no wallet for Open Compute then the Open Compute Project dies on the vine.

          1. Trevor_Pott Gold badge

            Re: Cole is Delusional

            I grok the liability argument, I really do...but I think centralized testing is core and critical to economies of scale. There has to be a balance between "certifying everything works together" and "meh, I hope it all goes to plan".

            I think that balance rests on testing for established standards, e.g. meeting JEDEC standards for your memory channels/traces/controllers/etc.

            In a perfect world I envision the OCP as essentially becoming the "reference implementation" of various hardware standards. If your RAM doesn't work in an OCP box then chances are you screwed up and didn't meet spec because OCP verified that their widgetry meets the published specs.

            The other side of it is that if the testing is to be left up to the customer, then I think those folks behind OCP should open source testing tools relevant to all elements, as well as procedures for using them and expected results for the tests. This would let any Tom, Dick and Harry assemble OCP gear, select parts from various suppliers and verify it all works to plan before ordering it by the datacenter load.
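
            To make that concrete, here is the sort of thing I mean: a minimal sketch of an open, scriptable test with a published expected result. It's a toy memory pattern test only; the DDR protocol and signal-integrity problems Barbara talks about need hardware bus analyzers and can't be caught from userspace code like this:

              # Toy sketch of an "open testing tool": write a walking-ones
              # pattern through a buffer, re-read it, and count mismatches.
              # Published expected result for a passing system: 0 mismatches.
              import ctypes

              BUF_WORDS = 1_000_000  # 64-bit words to exercise (illustrative size)

              def walking_ones(buf):
                  # Write phase: each word gets a single bit set, rotating through all 64.
                  for i in range(len(buf)):
                      buf[i] = 1 << (i % 64)
                  # Verify phase: re-read and compare against what was written.
                  return sum(1 for i in range(len(buf)) if buf[i] != 1 << (i % 64))

              buf = (ctypes.c_uint64 * BUF_WORDS)()  # zero-initialised array
              print("walking-ones mismatches:", walking_ones(buf), "(expected: 0)")

            Ship a library of tests like that, plus the procedures and pass/fail criteria, and in-house verification stops being a tier 1-only luxury.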

            If OCP is to be just some plans for someone to (apparently badly) put together some motherboards, then what's the point? It becomes something you can't trust to do the job and ultimately doesn't drive down the costs, because instead of centralising the costs of testing, verification and R&D, those costs now have to be replicated by each and every company implementing OCP systems!

            Lots of companies don't feel the need for the liability portion of the equation to be taken by the vendor. And, to be frank, that's a huge part of the cost. But making sure that at least basic quality is dealt with and that testing R&D is central and open is essential.

            The OCP doesn't have to be "a cheap Tier 1" vendor. We have Supermicro for that. But OCP should also be more than a PR exercise or a way to offload hardware engineering on "the community". The community will contribute back if there is a great base to start from. That starts at verifying standards compliance and making available the ecosystem of testing tools and procedures required for companies to do testing in house.

            At least, that's my take on it. I understand entirely that others may well see it differently.

            1. Probie

              Re: Cole is Delusional

              It's hard not to agree with you on the quality of a major design element such as a server motherboard or storage backplane, and honestly, if Open Compute "could" publish everything then it would make life a lot easier, but not being able to publish proprietary stuff makes that impossible. This is why I think Barbara's point resonates so much.

              When I suggested the SI in the supply chain look at taking on that burden, it was because I know that is a viable alternative. Honestly, if the SIs got their shit together they could then pool the results to the community at large. That would solve both problems. Of course there would be "someone footing" the bill, but likely that would be a large anchor customer, for example Facebook, for a specific run of motherboard or storage etc ... but it's an enabler and a way over the hump, and it's an embryo of an idea. Why not run it by the OCP board, either Frank, Mark or Andy? From the sounds of it, Barbara did not have much luck with the C and I project lead, and for that reason I would leapfrog the innovation committee.

              And when I say "why not run it by" I mean could El Reg run an opinion piece? That would be a way of putting 2 pennies in for all the little guys.

              1. Trevor_Pott Gold badge

                Re: Cole is Delusional

                "why not run it by" I mean could El Reg run an opinion piece?

                Well, of course El Reg can. It becomes a question of who is qualified to write it. If you wanted to write something I could get you in touch with the relevant people to see about a guest piece.

                As for myself, the truth is that I don't know enough about all the nooks and crannies of this just yet to open my big mouth in print. There's a lot of research to be done and many opinions and views to gather before I weigh in.

                OCP is a different world from the one I normally inhabit. Perhaps more to the point, VMworld is a month and a half out and the vendors are on fire and their content is on fire and I'm on fire and everything's on fire and air travel is hell. I'm full up for the next while and don't have time to learn a whole new world until after the big game. (I just learned OpenStack and am putting my free time into SDN/NFV at the moment.)

                I think the problems presented are deep and complex. They deserve a full research and analysis treatment. Ideally, I'd like to see the OCP become much more important and central to how we all procure IT, and I fear that going off half-cocked writing about it could do far more harm than good.

                1. Probie

                  Re: Cole is Delusional

                  Me, write a piece? Have you seen my English? It's like I was taught by a weasel!! Holy crap, if you saw my OCP charter you would not ask!!

                  Honestly, I am not sure this is technical in nature. I think it's more about listening to the community around them, and not just the signed-up members but also the nascent rumble from everyone else.

                  As I have hinted, right now I think OCP is in mass-deployment-only mode. That is a hard view to change. Also, I have no context for what really made the anonymous test engineer break cover. One thing I am sure of, though: you are not going to break how IT procurement is done. The SIs such as Penguin, Quanta etc ... see the opportunity. It's about customer momentum and how to make that happen.

                  That said, time to put my money where my mouth is. I can see what's involved in writing something, if you are game...

                  1. Trevor_Pott Gold badge

                    Re: Cole is Delusional

                    For guest pieces, content is more important than style and form.

                    As for not being technical in nature, I agree (to a limited extent). That said, there's still a lot of research to do. We're treading ground that vendors have to walk, so how do they do it, and why? What corners do they cut? What lessons have they learned? Can OCP implement the difficult stuff and leave the easy stuff as a to-do for buyers?

                    And if OCP doesn't move downmarket beyond Facebook-class deployments, what's the relevance? There are only a handful of Facebook-class entities that will ever exist at any one time, and I'm not remotely sure that systems integrators have the capability to take up the slack.

                    If they do, what's in it for them? What's the business case for them to do so? Will it help save them in the face of the public cloud, or just draw out an inevitable painful death?

                    At the same time, large vendors are moving towards massive "black-box" vertically integrated endgame machines. Is OCP - and for that matter systems integrators - relevant in the face of that sort of market shift?

                    As developers cut their teeth on cloud tech (private, public and hybrid) first, is OCP still relevant? Will regular enterprises even be able to field sysadmin teams and developers who code to anything other than the black-box style clouds?

                    And these are just the questions off the top of my head.

                    1. Probie

                      Re: Cole is Delusional

                      Hi Trevor,

                      I do not have the spare time to do all that research; I am, like you, on fire, but for different reasons. Thankfully, though, I have managed to quit the flying. You make really good points, some of which I hope I answered in my reply to jaybeez.

                      I think OCP stays relevant for the moment at least, but maybe not visible, especially if it is only for large, hyperscale deployments.

                      Black boxes will have a place, such as hyper-converged systems, but in truth they only scale out so far. They are great for the SMB and the smallish enterprise that wants ease of use, etc ...

                      A main reason I walked away from OCP was because in the end I could not see a way of making it viable for the small companies. I hope that someone else can prove me wrong.

                2. jaybeez

                  Re: Cole is Delusional

                  Hi James,

                  Glad to know about your background and internal perspective. And you seem much more rational/technical than the overly emotional Cole. As an outsider, I wonder if you were around during the golden years (the early stages of OCP) and whether the problems being raised by this article are more recent. From a quick review of what's publicly available, it looks like OCP certification centers were heavily pumped in early 2014. In 2015, however, it appears as though the director of ITRI and OCP certification resigned, and UTSA has been doing its best to remove links to the work they did in 2014. The only publicly certified OCP system from 2015 that I could find is from ITRI.

                  It seemed as though there was big fanfare, possibly millions invested, ribbon-cutting ceremonies, and no results to show, at least not in recent times. And even Cole acknowledges that Facebook must test systems before deployment. I mean, why have OCP certification if it's lacking in technical merit?

                  So, all emotion aside, isn't this respectable proof that the original article was mostly correct? That OCP testing appears (externally) to be AWOL? And if not, where's the proof of life?

                  I'm 100% behind open source movements, but I agree with asdf: OCP is not yet up to the standards of other open source efforts such as the Linux Foundation. And, from my quick research, it does not appear to be nearly as open.

                  1. Probie

                    Re: Cole is Delusional

                    Hi jaybeez,

                    Yes, I was around at the start, from the golden years up to the Santa Clara Open Compute Summit, so officially in the project until 2013, but I worked with an SI (AVNET) for another year or so. I also know Yf Juan, the ex-director of ITRI you are referring to, and I was in Asia for long enough (about 6 months) when the chapters were being formed for Taiwan; and yes, I know Paul Rad as well (UTSA).

                    The Centres were an answer to a problem, though the problem is not the one that most people are commenting on. First of all, you will have to suspend disbelief for a second and realize that open source software is nothing like open source hardware; and second, stay with me through a rambling explanation.

                    For a start (and as the main driver for the bitching point here), the licenses used to govern contributions and manufacture are taken from the open source world, where software can be 100% open source, or, if not, can be compartmented up enough that a license granting use or non-assert clauses does not overreach into proprietary code, e.g. the Apache License (ASF). There is no equivalent to that that I know of in the hardware world. So Open Compute uses the OWFa 1.0 license, which grants non-assert rights but not transfer of ownership. E.g. "Bob" makes hardware under OWFa 1.0; I can make the hardware exactly the way "Bob" defined, but I am not allowed any deviation from it because I do not own the IP, and as long as I make the hardware the way "Bob" defined, he cannot bring a lawsuit against me. At least that is how I understand it to be. Apply that logic to a project, say a motherboard, and see who owns what. Add to that a fear (reasonable in some cases) that to publish any technical details would be to open up Pandora's box on your IP, and you can see a) why OWFa was adopted and b) why published detailed technical information on a component is scarce. Rather, it is easier to publish specifications that force a particular way of doing things, generally only ONE way of doing things.

                    So, in a way, we have an open-ish thing with a black-box core as a result, because to get to a meaningful state where you can understand this thing you have had to sign NDAs with ODMs and other manufacturers. Remember, the specification does not say HOW you do things, just what it has to be at the end.

                    Now jet back to the start of Open Compute and remember that this is a project for hyperscale deployments. asdf has a couple of things wrong, in my opinion: a) nearly all public clouds of note, certainly the large ones, run on some sort of "bespoke" hardware; b) most users do not give a crap about that. They care about their workload running, not what makes it run; as long as there is an SLA they feel fine. The same can probably be said of large big data farms as well; I mean, do you think AWS run around changing failed drives every second? So, having open sourced a specification that works for hyperscale deployments, where substantial amounts of money can be thrown at hardware by in-house testing teams or contracted testing teams, what do we do to say ODM A produces something specification-equivalent to ODM B? Or that servers from ODM A work with Knox JBODs from ODM B? We cannot open source testing for components because we most likely have an NDA against them; we cannot open source the IP; the only thing we can do is provide tools to test against the specification. Or at least that is all we can do from within Open Compute as a foundation.
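
                    To give a flavour of what "testing against the specification" can mean in practice, here is a minimal hypothetical sketch (the rail names and limits are invented for illustration, not taken from any real OCP spec). The checker only knows the published limits, never HOW the ODM met them:

                      # Hypothetical sketch: check measured values against published
                      # spec limits, knowing nothing about the ODM's implementation.
                      SPEC_LIMITS = {
                          # rail name: (minimum, maximum) in volts -- invented numbers
                          "12V_rail": (11.4, 12.6),
                          "5V_standby": (4.75, 5.25),
                          "3V3_rail": (3.14, 3.46),
                      }

                      def check_conformance(measurements):
                          # Return a list of human-readable failures; an empty list means pass.
                          failures = []
                          for name, (lo, hi) in SPEC_LIMITS.items():
                              value = measurements.get(name)
                              if value is None:
                                  failures.append("%s: no measurement supplied" % name)
                              elif not lo <= value <= hi:
                                  failures.append("%s: %.2f V outside [%.2f, %.2f] V" % (name, value, lo, hi))
                          return failures

                      # Readings from a hypothetical board under test: one rail out of spec.
                      readings = {"12V_rail": 12.1, "5V_standby": 4.6, "3V3_rail": 3.3}
                      for failure in check_conformance(readings):
                          print("FAIL:", failure)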

                    Independent testing labs can/could go a lot further, but, and here is the catch, it still requires the IP holders' consent, and then we also get into that cost exercise I described in an earlier post.

                    So, considering that the use case is hyperscale, and considering the tools at our disposal to help the community, we have two wildly diverging points of view.

                    If you then overlay how ODMs make a run of motherboards or backplanes, say 20,000 in a month, you need that volume to swallow the tooling costs for the production line (including any revenue the ODM potentially loses by holding up other production if you need it in a rush). No ODM wants to make 20,000 motherboards just for them to sit in a warehouse; you need to take delivery of the 20,000 motherboards. This tends to put the non-hyperscale guy out of the purchasing equation.

                    So I go back to my two diverging points of view and only one becomes relevant.

                    Hopefully you can now see a) where OCP is applicable, b) why the certification is what it is, and c) why this is not prime time for non-hyperscale.

                    That's not the end, though, because OCP is also supposed to foster innovation, and innovation can trickle down to the little person, or the non-hyperscale person. I have seen precious little of that myself in the last year or so, but then I have not tried to find it either; it seems happy to just "bimble along", and in that sense asdf is right. What I can say, though, is that ODMs have seen a path to take more of the "value" out of the supply chain, for example Quanta with QCT, and OEMs have responded, with HP white boxes by Foxconn and Dell with DCS (although they were around before OCP). We have seen innovation from storage companies as well; think closely coupled compute and storage. Seagate Kinetic was back-ported to Knox, I think.

                    So, all in all, I have to say OCP has been somewhat successful at what IT said it was going to deliver, not what people HOPE it is going to deliver. So Cole's main points stand, although I may not agree with the method of elucidation, or the self-flagellation around how the points were conveyed. Also, I think there are things OCP could do a lot better: it could be more communicative, and it could take internal and external feedback on board better. I am sad that OCP is still at that juncture - something that I saw in 2012-13 - but it is what it is. The problems are complex (perhaps needlessly so), and what I have described above is like peeling off a layer of paint: there is still more underneath. So most of the OCP folks make what I consider an admirable effort in trying to keep everyone appeased, though I agree the outliers sometimes need a taser up the backside.

                    As for the external testing, well, the answer may well be wrapped up in the explanation above. Proof of life (if you can call this proof of life) is using the taser and seeing if the C and I chair squeaks or smokes.

                    Now, to keep all the lawyers happy: this is my personal, subjective view; treat it as a hypothetical conversation if you like.

  5. Henry Wertz 1 Gold badge

    Agree with both

    I agree with both... I agree with Cole Crawford's quote (you'll see in the recent article about the OCP testing that I said almost the same thing), that the OCP hardware is designed for clusters or "clouds" where you want to get hardware at the best possible cost, without extra faff that is unnecessary for that kind of cluster... whereas for systems that demand hardware fault tolerance, the "extra faff" provides this and you'll want to pay for it.

    On the other hand, I also agree with Barbara Aichinger: even if the base OCP certification does not include her recommendations, perhaps there should be a higher OCP certification available that does. The tests she recommends do sound like a good idea.

    I do think characterizing this as an "attack" on OCP or whatever is hyperbole though.
