Open Compute Project testing is a 'complete and total joke'

Facebook's Open Compute Project testing is sub-standard and doesn't follow well-established industry procedures, according to The Register's sources. The Open Compute Project (OCP) was formed in 2011 and involves the Facebook-initiated design of bare-bones computer equipment that can supposedly be built, installed and operated …

  1. Henry Wertz 1 Gold badge

    I don't think these guys understand...

I don't think these guys understand what this cert is for. If your "cloud" is servers running VMs, you would likely want enterprise reliability. If you are running software that cannot be checkpointed well (so a failure is serious, rather than just a matter of resuming from some checkpoint), you'd want reliability.

    This is not for that kind of software. This is for the Google and Facebook type of "cloud", which assumes failures WILL happen: the software performs its own data integrity checks, stores data at least in duplicate, and tolerates a system failing, resuming whatever work it was doing on another system. In this case, I would not want to spend on enterprise reliability features.
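    As a concrete illustration of that model, here is a minimal Python sketch; the node names and task shape are hypothetical, not anyone's actual stack. Stored data is verified against a checksum before use, and failed work is simply resumed on another node:

    ```python
    # A minimal sketch of the failure-tolerant model described above:
    # stored data is verified against a checksum before use, and failed
    # work is simply resumed on another node. Names are hypothetical.
    import hashlib
    import random

    NODES = ["node-a", "node-b", "node-c"]

    def checksum(data: bytes) -> str:
        """Integrity check done in software, not trusted to the hardware."""
        return hashlib.sha256(data).hexdigest()

    def read_replica(replicas: dict, expected: str) -> bytes:
        """Return the first of the duplicate copies that passes its check."""
        for node, data in replicas.items():
            if checksum(data) == expected:
                return data
        raise IOError("all replicas failed the integrity check")

    def run_with_retry(task, attempts: int = 3):
        """Assume failures WILL happen: re-run the task on another node."""
        for _ in range(attempts):
            node = random.choice(NODES)
            try:
                return task(node)
            except RuntimeError:
                continue  # node died mid-task; resume elsewhere
        raise RuntimeError("task failed on every node tried")
    ```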

    1. razorfishsl

      Re: I don't think these guys understand...

I think you are confused...

      This crap is being specced for use in cloud systems, and it is hardly gamers and porn addicts using such cloud systems.

      These cloud systems are being offered to big business by sales staff who do not give a toss about how it is implemented.

      Software checks data integrity, does it?

      At what level?

      Are you going to check every byte that comes over the wire in software?

      Are you even aware of how difficult de-dupe is in software? You're proposing to run all the checksums and data integrity in software, are you?

      What happens when your program gets corrupted? Are you going to correct bit errors in RAM the same way?

      There is a reason you PAY for a certain level of hardware; a bit like the difference between an Intel network chip and one of those cheap offerings from some far-eastern supplier.
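      For scale, here is a minimal sketch of what "checking every byte in software" looks like; zlib.crc32 stands in for whatever checksum a real stack would use, and the (payload, crc) frame format is invented for illustration:

      ```python
      # Illustration (not a benchmark) of software-side integrity checking:
      # every buffer off the wire is checksummed on the CPU, which is
      # exactly the cost being argued about. The frame format is invented.
      import zlib

      def verify_frame(payload: bytes, expected_crc: int) -> bool:
          """Check one received buffer against its transmitted checksum."""
          return zlib.crc32(payload) == expected_crc

      def receive(frames):
          """Keep only frames that pass their check; the CPU touches every
          byte, cycles a checksum-offloading NIC would not have to spend."""
          return [payload for payload, crc in frames
                  if verify_frame(payload, crc)]

      good = b"order data"
      print(receive([(good, zlib.crc32(good)), (b"corrupted", 0)]))
      # -> [b'order data']
      ```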

    2. fajensen
      FAIL

      Re: I don't think these guys understand...

      "What this cert is for": Its just some parasitic business trying to carve out a place for itself inside a running value chain where it can then extract rent from the flow; nobody cares about the application or usefulness of this "certification" - if money comes in for doing it and sticking a sticker on a certain brand of hardware for ever, then it "works".

      We see a lot of similar business models these days. Value creation must be under some pressure.

  2. Henry Wertz 1 Gold badge

    I question...

I'm not sure what Goldman Sachs etc. need. I would HOPE they would value full reliability... BUT I remember hearing, years ago, that the high-frequency trading systems they employ were beginning to use significantly overclocked CPUs. The goal seemed to be simply raw speed, to be faster than the other HFT competitors, rather than any view toward reliability. (After all, when these systems HAVE malfunctioned and lost them money, the stock exchange has, for no reason I can determine, obligingly rolled back their trades for them.)

    1. razorfishsl

      Re: I question...

They are unlikely to use it for HFT.

      The way HFT went years ago was byte-stream inspection by FPGA.

      Why pull a byte stream through a network card, up a hardware stack, across a PCIe/PCI-X bus, up a software stack, through an OS, validate it, and into your software package to compare data, only to send an order back down the same route?

      Far smarter to have an FPGA looking directly at the data on the wire (NO stacks), triggering a string of AND/OR gates based on STREAMED, REAL-TIME bit values, which in turn trigger a send routine of gates putting a stream of bit values DIRECTLY back onto the wire, BEFORE the data has ever reached your competitor... (ns to µs)

      Then have a 'slow' front end to set up the inspection values... (ms)

      You can even trigger a massive sell/buy, then cancel the order before anyone has had time to react... or in some cases even to receive the initial data you used to trigger the action.

      Flash crash = FPGA systems racing each other in a 'perfect storm', before the humans can react.
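      A rough software model of the trigger logic described above, with a made-up pattern and response; on a real FPGA this is combinational logic clocked at wire speed, and Python only illustrates the idea (the mismatch fallback is simplified, where a real matcher would handle overlapping prefixes):

      ```python
      # Software model of the FPGA trigger: compare incoming bits against
      # a preloaded pattern as they arrive, and emit a canned response the
      # instant the pattern completes, without ever assembling a packet
      # or touching a network stack. Pattern and response are invented.
      class StreamTrigger:
          def __init__(self, pattern: str, response: str):
              self.pattern = pattern    # set up by the "slow" front end (ms)
              self.response = response  # pre-built order, bits ready to send
              self.pos = 0              # how much of the pattern has matched

          def clock_in(self, bit: str):
              """Feed one bit off the wire; return the response bits the
              moment the last pattern bit matches (simplified fallback)."""
              if bit == self.pattern[self.pos]:
                  self.pos += 1
                  if self.pos == len(self.pattern):
                      self.pos = 0
                      return self.response  # fires mid-stream
              else:
                  self.pos = 1 if bit == self.pattern[0] else 0
              return None

      trigger = StreamTrigger(pattern="10110", response="11100001")
      for b in "001011010110":
          out = trigger.clock_in(b)
          if out:
              print("send:", out)  # fires before the "packet" even ends
      ```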

  3. Anonymous Coward

    Ya think?

This must come as sudden news to a few people.

    1. Anonymous Coward

      Re: Ya think?

      Nice to see it so comprehensively splaffed out, in public for all to admire, just the same.

  4. razorfishsl

You only have to read the specs for the hardware and the system design material to realize that, at a hardware level, it sucks...

    and let's not even get into EMC & RFI testing...

    In fact, I'm surprised they did not throw a few FPGAs into the design to show how hip and up-to-date they were.

  5. Indolent Wretch

Watch what you're saying, guys: OCP OWN THE POLICE!

    1. Roj Blake Silver badge

      Yeah, but the hardware's so awful that an OCP ED-209 can be hacked by a small child.

  6. Anonymous Coward

    if your shop can handle a level of unreliability

then this is great; even many enterprise apps can be stateless and survive minimal downtime as VMs or clusters reset.

    Clearly this is not great for everything.

    I don't really want my enterprise to spend $$$ on the underlying infrastructure for the service that orders coffee once a month, either, though it seems some people want everything bulletproofed.

    Much better to consider it a compute performance/reliability tier, just like storage people have been doing for ages...
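    As a sketch of that tiering idea, with invented tier names and example workloads (the placement rule is an assumption, mirroring how storage tiers are assigned):

    ```python
    # A sketch of compute tiering: place workloads by how much
    # unreliability they can absorb, the way storage teams place data.
    # Tier names and the example workloads are made up for illustration.
    TIERS = {
        "tier-1": "OEM kit, redundant PSU/ECC, support contract",
        "tier-2": "standard enterprise hardware",
        "tier-3": "OCP-style bare-bones, degrade in place",
    }

    def place(workload: str, stateless: bool, tolerates_downtime: bool) -> str:
        """Cheap, failure-tolerant work goes on cheap, failure-prone kit."""
        if stateless and tolerates_downtime:
            return "tier-3"
        if tolerates_downtime:
            return "tier-2"
        return "tier-1"

    print(place("coffee-ordering service", stateless=True,
                tolerates_downtime=True))   # -> tier-3
    print(place("trading ledger", stateless=False,
                tolerates_downtime=False))  # -> tier-1
    ```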

  7. dieselapple

Clarifying for ITRI: it takes 15 days to complete the testing process

    I would, however, like to clarify the test methodologies, test cases, and certification processes of the ITRI OCP Certification Testing Center.

    1. ITRI OCP Certification Testing Center works with a global team made up of OCP member organizations to establish a common set of standard OCP testing programs. The primary system testing methods are:

    - Functional testing

    - Stability testing

    - Stress testing

    - Local testing

    - Remote testing

Currently, the testing procedures include more than 60 tests. (A sketch of how tests might be grouped under these categories appears after this comment.)

In conformance with ISO/IEC 17011:2004 (ISO 17011), which requires separation of the testing and issuance organizations, the OCP team sends the testing results to the OCP Foundation for issuance of OCP Certified status.

    2. There are three requirements in order for testing to commence:

    - An executed service agreement authorizing the ITRI OCP Certification Testing Center to conduct the work.

    - A completed SOW (Statement of Work) that includes all relevant information for the submission.

    - The physical acceptance of required samples at the OCP Certification Testing Center.

3. As the ITRI OCP Certification Testing Center adopts the most rigorous testing procedures and the highest standard of engineering testing, it takes a total of 15 working days for the center to complete the process: 10 working days for testing and 5 more working days for logistics. Throughout the process, the center also takes photos and notes of each piece of testing equipment and each testing step.

    4. Since its launch, the ITRI OCP Certification Testing Center has successfully tested several products and remains in full operation; the website is http://www.ocpcertificationcenter.org/. (The web link the reporter clicked may have been ITRI's old website.)

    Please be aware of the above facts. Thanks.
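    As a hedged sketch of how a certification run over the five categories listed above might be organised (the actual ITRI suite is not public, so the test names and the pass/fail gating rule here are assumptions):

    ```python
    # A hypothetical sketch of a categorised certification run. The five
    # categories come from the comment above; the individual test names
    # and the gating rule are invented for illustration.
    from collections import defaultdict

    CATEGORIES = ["functional", "stability", "stress", "local", "remote"]
    results = defaultdict(list)

    def record(category: str, name: str, passed: bool) -> None:
        """File one test result under its category."""
        assert category in CATEGORIES, f"unknown category: {category}"
        results[category].append((name, passed))

    def certifiable() -> bool:
        """Certify only if every recorded test passed; per the comment,
        results then go to the OCP Foundation, which issues the status."""
        return bool(results) and all(ok for tests in results.values()
                                     for _, ok in tests)

    # Hypothetical entries standing in for the 60+ real tests:
    record("functional", "power-on self test", True)
    record("stress", "72-hour load soak", True)
    print("forward results to OCP Foundation" if certifiable() else "fail")
    ```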

    1. jaybeez

      Re: Clarifying for ITRI~~it takes 15 days to complete the testing process

      Dieselapple,

You appear to be an ITRI cheerleader simply spouting their boilerplate marketing fluff. The director of ITRI and head of OCP certification just resigned and is unwilling to talk. I just visited the ITRI website: it is full of broken links, and there is not a lot of evidence of test activity, only a handful of tested servers that "passed".

      The only meaningful links from the ITRI website appear to be their "during testing page" here:

      http://www.ocpcertificationcenter.org/submit/page-during.aspx

It starts with a Phase 1 that appears to be a five-minute out-of-box inspection, and then a Phase 2, which is the OCP testing process. Where are these 15 days that you speak of?

      And this website appears to be developed/maintained by children. Look at these broken links:

      http://www.ocpcertificationcenter.org/submit/~participate/page-join.aspx

      http://www.ocpcertificationcenter.org/services/page-process.aspx

      No director, no head of OCP certification, a broken website, and no pictures or videos of active testing. Isn't this supposed to be open?

      1. dieselapple

        Re: Clarifying for ITRI~~it takes 15 days to complete the testing process

First of all, I have no idea how you got the broken links. There was a website revision not long ago, and the links you have could date from before the revision. Besides, the new C&I Project lead had only just been selected at the time you published the article. ITRI's OCP Certification Testing Center has successfully conducted OCP certification tests for several clients, including multiple submissions in 2015. It is currently very much in active operation.

  8. Probie

    Open letter for an Open Project

I was involved in OCP from the start, running a project group (virtual I/O), and I got involved as an interested party in the testing/certification delivery, asked to look at it by my then employer. I should also mention that I know Paul Rad and YF Juan both professionally and personally.

I left, or got removed from, OCP around 2012/2013, depending on the perspective you want to employ; I certainly made the decision to stop participating in OCP at the summit in 2013, although my project mandate got split and subsumed around the time of the announcement of Intel donating certain optical interconnects. Most of the reasoning behind this is not really germane to this conversation. One part most certainly is, and that is the applicability of OCP projects.

    At the time (and nothing has led me to think otherwise since), the project as a whole was, and is, geared towards massive-scale deployments (or hyper-scale, for those with a marketing disposition). In other words, it was not meant for the legacy enterprise workload. It is true that, given some application development, some effort in testing, and a nice tailwind, the enterprise could use OCP hardware; it was certainly defined as an undertaking. However, my personally held view, and one that resonates in OCP to varying levels, is this:

    "ANY entity wanting to use OCP hardware has to take a level of responsibility in testing this equipment for itself, to satisfy itself that the necessary criteria that it needs met are met. If an entity or company EXPECTS this as a defacto provided service then an OEM should be your first port of call for your hardware, or you need to revise the expectation."

My fundamental, personally held truths are these: if you are not prepared to accept a "degrade in place" infrastructure model, one that places the methodology and implementation of data integrity at the software/application level, then think very hard about the hardware you use and about the needs of the IT landscape you are overseeing; and "OCP is about the delivery of open-source hardware to a community. It was not about being an OEM."

    As for the enterprises, the consumers using OCP, I consider them to this day vital to the effort. They have contributed to OCP more than most people know, they have helped move the rhetoric from a single entity (Facebook) to multiple entities, and perhaps at some junction in the future they will have enabled an ecosystem where community-driven software and community-driven hardware can be used at whatever scale.

The Certification project (at the time this was headed by a representative from the financial industry, whose name I am ashamed to admit I cannot remember, and one from the distribution and supply industry, whose privacy I respect, so he stays nameless) started out of a need: making sure that equipment was built to the specification, that a unit from ODM A was the same as one from ODM B, at least at a functional level, and providing a set of guiding or example scripts or harnesses so that sniff tests could be performed. I say functional level because there needs to be a degree of flexibility around a hardware BOM. I remember discussions around expanding that reasoning, but I took myself out of OCP at that time.
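    To make that concrete, here is a minimal sketch of such a functional-level sniff test; the spec fields and the probe are invented for illustration, not taken from any real OCP harness:

    ```python
    # A minimal, hypothetical sniff test: check that a delivered unit
    # matches the spec at a functional level, without pinning the exact
    # BOM. The spec values and the probe below are illustrative only.
    import os

    SPEC = {"cores": 16, "memory_gib": 128, "nics": 2}

    def probe_machine() -> dict:
        """Stand-in for reading DMI/sysfs data off the unit under test."""
        return {"cores": os.cpu_count(), "memory_gib": 128, "nics": 2}

    def sniff_test(spec: dict, observed: dict) -> list:
        """Report functional mismatches; identical parts are not
        required, matching the spec at the functional level is."""
        return [f"{key}: expected {want}, got {observed.get(key)}"
                for key, want in spec.items() if observed.get(key) != want]

    failures = sniff_test(SPEC, probe_machine())
    print("PASS" if not failures else failures)
    ```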

I do not hold 20 years of testing in my résumé/CV; I barely register a quarter of that time, unlike the anonymous testing engineer. I also have not kept up with Open Compute in any material sense. So I could be speaking out of turn here, but it seems doubtful they are trying to build an OEM-like certification here, at least publicly.

    As for where the labs and facilities are located, it might be wiser to ask "Where was the community effort located at the time?", rather than speculate or pronounce on who is running what, the previous history, or the specific egos involved. Not because I want to defend individuals, but because sometimes the most obvious answers are the right ones. Ask yourself this: "If I started a community project, where would I try to locate resources, people, projects, and things that need doing and need presence?" The rest of who does what I have no idea about, or interest in.

    On a final note, I would be very interested to learn whether the anonymous testing engineer has submitted a proposal, plans, assistance or general help in rectifying what he sees as a problem. I assume he has; few people make a loud cry and stay anonymous without at least trying to fix the problem. I for one would be very interested to know what message he got back from OCP if, as I assume, he did discuss things with them.

    The great thing about it being open, though, is that people get to judge for themselves whether this is something they want, in their own way, releasing their own results. People do not have to take my view, or the view of the testing engineer with 20 years' experience; you get to choose what you do, hopefully with a bit more transparency. At least I hope that is what an open community project stands for, because that is what it should stand for. I cannot necessarily hold the same hope for OEMs; what was it someone said about the condition upon which God hath given liberty?

    I am sure the Register can check me out to see if I am kosher, if anybody cares.

    Regards

    Probie

  9. Pascal Monett Silver badge

    Great news

Now I know that, if I am offered a cloud environment which is OCP certified, I should run like hell in the other direction.

  10. FuturePlus

    My 2 cents....

I am not posting anonymously, so folks can check out my credentials (Barbara Aichinger, FuturePlus Systems). I sit on the OCP C&I committee and I have pushed for more rigorous testing. I started with OCP in 2013 and received considerable resistance to my ideas. As a 20+ year veteran of the T&M industry, I was promoting the type of validation and compliance testing that the tier 1 vendors use. However, those involved did not want to pay the price for that type of testing; I was repeatedly told that the test labs will not use any T&M hardware (scopes, analyzers, etc.) to test the OCP servers.

    Initially, OCP wanted tier 1 quality at a tier 3 price, to be achieved by standardizing the hardware and using volume. However, to get tier 1 quality you have to adopt the tier 1 validation strategy, and this has not happened. I have continued to push my ideas and have submitted a new concept called an 'audit'. This 'audit' is not a full validation of the hardware, as we would expect the OEM to do that. What the 'audit' will do is verify signal integrity on the memory bus, by measuring the data valid window of all the signals and checking for protocol and timing compliance with the JEDEC spec. I should mention that I only cover the memory interface of the server, leaving the other aspects to other members of the C&I team.

    FB has identified memory as the #2 failure in the data center, and Google has also published several papers on memory errors, so the memory subsystem clearly needs some validation. I would encourage the anonymous test engineer to join me in the battle to bring tier 1 validation to OCP servers.
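    As a toy illustration of that memory-bus 'audit' idea: flag any data signal whose measured data valid window falls below a required minimum. All figures below are invented; a real audit takes its limits from the JEDEC spec for the DRAM speed grade under test and its measurements from instrumentation on the bus.

    ```python
    # Toy data-valid-window audit: flag any DQ signal whose measured
    # window falls below the required minimum. All numbers are invented;
    # real limits come from the JEDEC spec, real measurements from
    # instrumentation on the memory bus.
    REQUIRED_WINDOW_PS = 300  # hypothetical minimum data valid window

    measurements_ps = {  # hypothetical per-signal measured windows
        "DQ0": 412, "DQ1": 397, "DQ2": 285, "DQ3": 405,
    }

    for signal, window in sorted(measurements_ps.items()):
        margin = window - REQUIRED_WINDOW_PS
        status = "PASS" if margin >= 0 else "FAIL"
        print(f"{signal}: window={window} ps, margin={margin:+d} ps -> {status}")
    ```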

  11. Anonymous Coward

Wow, you must have struck a nerve. I can't remember the last time multiple vendor (ha!) reps commented on an article. What surprises me is that they didn't contact the author directly, since there's a nice mouse-over byline at the top of the article.
