Collective SSL FAIL a symptom of software's cultural malaise

In the 19 years that have passed since the first implementation of SSL, you could be forgiven for expecting that the industry could do it right by now: and yet last week, not one but three SSL vendors were discovered to have implementation problems. Belkin was caught not checking SSL certificates; WhatsApp was discovered to …

COMMENTS

This topic is closed for new posts.
  1. Jonesy

    You'll get nothing from clang using -w

    -Wunreachable-code .... case is important

    1. T. F. M. Reader

      Re: You'll get nothing from clang using -w

      GCC also has -Wunreachable-code. There are reasons why it is not enabled by default: unreachable code warnings are very often spurious. Debug code may be unreachable with one set of preprocessor flags and reachable with another. Code in inline functions may be unreachable in one instance and not in another. Etc.

      1. Destroy All Monsters Silver badge
        Headmaster

        Who is Wun and how do I reach him?

        Which means it's a case of shitty tools - in this case, the language.

There is a reason why high-assurance code may only use a subset of C.

        1. Destroy All Monsters Silver badge

          Re: Who is Wun and how do I reach him?

          Yep, downvotes.

          Shitty developers abound, too. No surprise there.

          1. Roo

            Re: Who is Wun and how do I reach him?

            "Yep, downvotes.

            Shitty developers abound, too. No surprise there."

Shitty comments abound, too... You should expect some down-votes for making an unsubstantiated snidey comment about a widespread set of tools that a lot of people use by choice.

P.S. Conflating people who 'downvote' you with "Shitty developers" will probably generate some more down-votes for you.

            1. wikkity

              Re: Shitty comments abound, too..

              > Shitty comments abound, too... You should expect some down votes for making an unsubstantiated snidey comment about a wide-spread set of tools that a lot of people use by choice.

Maybe the OP thought that what he said did not need to be substantiated, given the audience on this forum. For many application areas, standards identify subsets of language features that can be used, e.g. MISRA for C/C++.

        2. Ken Hagan Gold badge

          Re: Who is Wun and how do I reach him?

          Er, I think you mean "in this case, the compiler". If the unreachability requires knowledge from outside the function (so the optimising compiler can see it in a particular case, but the programmer cannot assume that for the general case) then the code should not be flagged.

        3. Michael Wojcik Silver badge

          Re: Who is Wun and how do I reach him?

          Which means it's a case of shitty tools - in this case, the language.

          No, in this case, it's the implementations.

          There is no reason - none whatsoever - that the implementation should not emit a diagnostic for adjacent gotos. That's not just unreachable code; it's patently unreachable code due to a trivial and explicit construct in the source. It's not a matter of someone using goto, break, or return to disable code (which is poor practice anyway), and it's not a will-never-succeed control structure like "if (0)" (which is also poor practice).

          This is clearly a case where the implementation can distinguish between cases of unreachable code that are possibly intentional, and those that almost certainly are not.

          That said, I too am of the opinion that if your program is riddled with so much unreachable code that you have to disable the warning - much less have it disabled by default - then your code is crap. Any developer working for me who produced code like that would be required to clean every one of those warnings up.

          (And that said, I'll also repeat what I wrote in another post on this topic: It's possible to write good software in C and C++, but it's not easy, and very few programmers learn enough of the languages to do so. The number of C programmers I've met who understand incomplete structure types, for example, is appallingly small.)

      2. Anonymous Coward
        Anonymous Coward

        Re: You'll get nothing from clang using -w

        You can use pragmas to silence spurious warnings. Compile with everything enabled, investigate, silence ones you know are spurious, then when new ones appear you're not overwhelmed with noise.

      3. David Cantrell

        Re: You'll get nothing from clang using -w

        In current gcc, -Wunreachable-code doesn't do anything. That functionality was removed. In Clang it does work, but isn't turned on by -Wall. Apparently 'all' doesn't mean 'all' in Clang-land.

        The real problem though is that the code clearly doesn't have unit tests, and they probably didn't do any analysis of whether all the code was covered by the tests.

  2. ecofeco Silver badge

    Cultural malaise?

    If by that you mean cheap bastards in suits, then yes, that is the problem.

    1. claude191

      Re: Cultural malaise?

      I think it is deeper than "bastards in suits". Whilst not disagreeing that this could be part of the problem, the "macho coding" ethic I see all the time doesn't help either.

      The two unfortunately reinforce each other in a toxic embrace :(

      1. ecofeco Silver badge

        Re: Cultural malaise?

        I've seen this as well.

        1. Destroy All Monsters Silver badge

          Re: Cultural malaise?

          Add to this "he's cheap, so he's good enough on this team" and "we don't have time to do it right".

  3. Pet Peeve

    Monday morning quarterbacking. There is always one test you didn't realize you needed to perform. Remember that situation with some unixes where they created horribly weak SSL certs due to a subtle bug that didn't mix in enough randomness? This stuff happens, relax.

    By all reports, the test case would be an unholy bitch to create, which makes me wonder exactly how exploitable the problem really is.

    1. Version 1.0 Silver badge

      Never attribute to Malice anything that can be accomplished by Stupidity.

      Come on, we've all looked at code for days and not seen the problem, only to awaken in the middle of the night and realise that the problem is simply that we're always checking for a>b and never a=b or some similar malfunction (sic).

    2. David Cantrell

      Easy. There are examples out there. Which means that creating a test case is also easy. Time-consuming, perhaps, but easy. All the certificate tests would be time-consuming but easy, and I *really* hope that they've got at least some - and that means that they know how to create them, so creating more shouldn't be such an awful task.

    3. Michael Wojcik Silver badge

      By all reports, the test case would be an unholy bitch to create

      Not "by all reports". Adam Langley created a test case for it.

      It is a bit subtle. You have to be using a protocol prior to TLSv1.2 and a non-anonymous DHE or ECDHE suite. The certificate chain has to be valid but the ephemeral key signed with the wrong private key. You could achieve this by modifying the ServerKeyExchange message in flight, or by modifying the SSL/TLS stack; I don't think you can easily do it with an out-of-the-box OpenSSL-based application, for example. (You'd probably have to create a filter BIO and modify the ServerKeyExchange message before it's sent.)

      Exploiting it? Act as a man-in-the-middle and pass messages through unchanged, except for SKE. By substituting your own ephemeral key in the SKE, you can decrypt all the traffic.

  4. claude191

    I've also ignored the conspiracy theories, as the "programmer mistake" sounds about right. But I wonder if there is an outside chance that this duplicated line was a result of a broken merge?

    Either way, the issue is with the testing, and so comes back to the "do it cheap" culture that the article is complaining about.

  5. tonybarry

    Hi Pet,

    The other article on ElReg about the SSL issues is here:-

    http://www.theregister.co.uk/2014/02/23/apple_mac_os_x_10_9_ssl_fix/

    and lists a website (gotofail.com) which will helpfully check your cert authentication process is actually working. I suspect that it is a much more easily crafted exploit than we might think.

    Regards,

    TB

  6. Allan George Dyer
    Black Helicopters

    That's just what THEY want you to think!

    Now, where's my tinfoil-hat icon...

  7. Gene Cash Silver badge

    No, it's a sign that security & encryption is hard

    Security is the one spot where you have to be not just good but perfect, because people will jump on the tiniest mistake (like this one) to crack your things. Security has to be tight all the time and the crackers just need to get lucky once.

    It's also the hardest to test, because the testers in your company won't be 1/10th as motivated or ingenious as the crackers.

    I won't even get into how completely undocumented SSL is, and how hard it is to use. I've ranted on that before.

    1. Destroy All Monsters Silver badge
      Headmaster

      Re: No, it's a sign that security & encryption is hard

      1) This is not a tiny mistake. It is tiny only in the sense of how delivering the wrong container is an off-by-one error because the serial numbers were nearly the same.

      2) This has NOTHING to do with security & encryption. If you have this kind of problem in your radiation therapy machine, you have Therac-25 and dead people.

3) "It's also the hardest to test, because the testers in your company won't be motivated". You are doing your "testing by the gorilla regiment" wrong.

      1. Neil B

        Re: No, it's a sign that security & encryption is hard

        @Destroy All Monsters "2) This has NOTHING to do with security & encryption. If you have this kind of problem in your radiation therapy machine, you have Therac-25 and dead people."

        That's one healthy sense of perspective you've got there.

  8. solo

    Inhuman working hours.. where is my red flag

    I will stick to the original suggestion of the article.

It's easy to see that a dev who isn't at his desk at night is not considered enthusiastic (adding value).

  9. Daniel B.
    Boffin

    Goto

I do wonder why on earth they are using goto in C. Isn't this a bad sign of someone trying to do stuff the ole BASIC way? Isn't goto frowned upon in most functional/procedural languages?

    1. unwarranted triumphalism

      Re: Goto

      The JPL coding standards forbid it, with good reason.

    2. richardcox13

      Re: Goto

      Using goto when goto is the best approach (such as a series of tests with shared non-trivial clean-up) is reasonable. It would be possible to fake the gotos with breaks in a do...while (false) one-shot loop but that is a significant abuse of a loop (and the one-shot nature is hidden at the bottom).

In this case a boolean tracking hasFailed, checked in each test, would be reasonable, but if the existing tests are more complex, checking that boolean adds to the complexity.

In the end, the pattern of a series of tests with a fall-out to common clean-up is seen often enough that the pattern is recognised, and thus the code is easier to understand.

      Remember Dijkstra's paper (it is worth the effort to read) is about the overuse of goto when there are better alternatives. Too many people see the title without reading the content and thus "ban goto".

All that said, in more than two decades of professional programming I've used goto twice (C++'s ability to do clean-up in a destructor of local objects really helps).

      1. NullReference Exception

        Re: Goto

        Not to mention, if you use the do { /* stuff */ } while (false); construct and have two break statements where there should be one (instead of two goto statements where there should be one) you have the exact same bug...

    3. wikkity

      Re: Goto

      The code in question highlights the benefits of at least 4 essential coding standard rules:

Never initialise a return value if the language can detect an unassigned value (e.g. Java); otherwise initialise it to a value that cannot possibly be correct.

      Have automated unit tests, both for success and failures.

      ALWAYS add braces around blocks of code

      Have automated unit tests, both for success and failures.

      Plus a couple of minor ones:

only use goto if it _really_ improves readability (I've only rarely seen cases where it does)

      don't use tabs for code indentation

In this case I'd have thought a development IDE would have made this glaringly obvious to the operator, if they allowed it to indent the code.

      1. Vic

        Re: Goto

        > ALWAYS add braces around blocks of code

        They haven't got you writing python yet, then?

        Vic.

        1. John Gamble
          Boffin

          Re: Goto

> ALWAYS add braces around blocks of code

> They haven't got you writing python yet, then?

> Vic.

          Actually, I wondered if this was a case where programming in one language blinded a programmer to the bug written in another language.

          There are a lot of comments in this and in prior articles about the necessity of using braces even around single-statement if blocks, which up to now struck me as odd -- how could anyone not see the indentation error? But if your brain is trained to view multiple-indented lines as a single block regardless of braces, then maybe you wouldn't "see" the problem.

Note, not blaming python (for what it's worth I'm writing a project in CoffeeScript right now), and I don't know if the coders in question even use python or other indent-block languages. But I do wonder if conflicting programming standards contributed to the problem.

      2. David Cantrell

        Re: Goto

        And also ... use 'else if' for long chains of independent conditions. Not only does it make the code clearer by making their independence obvious, but it would also have turned this particular error into a syntax error and compilation failure.

      3. David Cantrell

        Re: Goto

        And ... code coverage analysis will tell you where you've got missing tests.

    4. Dagg Silver badge
      Trollface

      Re: Goto

      Why do you need to use a goto! These days just throw an exception it does the same thing!

      1. PhoenixRevealed

        Re: Goto

        Very low level system code like authentication is often written in C, not C++. In fact, this page...

        http://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/

        ...not only describes it as C, but the code snippet that shows the error is clearly written in a style commonly used in C, but not in C++ (although it is valid syntax). C++ has exceptions... C does not.

    5. Rich 2 Silver badge

      Yuk!

The style of the code in question is awful. Shite, even. I'm sure it was acceptable 30 years ago, when it was probably written, but really.

      Using "goto"? Nil points (and using break within a while loop is just as bad by the way)

      Not using braces in a condition block? Nil points.

      Assigning within an if() clause? Nil points.

      This is a fantastic example of how to write a really poorly structured piece of code. No wonder the fault was never spotted.

  10. Anonymous Coward
    Anonymous Coward

    So what's the solution?

    I know, some people will say "don't reinvent the wheel". Was it dumb for Apple to (apparently) implement its own SSL library, instead of using something like openssl? Perhaps.

However, serious bugs have been found in widely used open source security software that had been present for a long time before they were found (or publicly disclosed). If everyone was using openssl, on average everyone would be more secure. Apple would have avoided this bug. Sounds good, right?

    Except when a bug was found in openssl it would affect everyone, all at once. There's no way that could be kept quiet long enough for patches to be developed for more than a fraction of devices, so you'd end up with 3/4 of the world open to attack for a few days or longer.

    1. richardcox13

      Re: So what's the solution?

Your third paragraph is why. Mono-culture is bad, however much better the single implementation currently is.

(This is why I am glad IE and Firefox do not use WebKit: everyone using one browser engine would just lead to a repeat of the first browser wars, with WebKit's defects being the new standard.)

    2. Charlie Clark Silver badge

      Re: So what's the solution?

      Two wrongs don't make a right: software can always have bugs. Open source has the advantage of peer review and the chance to learn from each other's mistakes.

Apple already makes extensive use of open source software in the stack but it doesn't really embrace it. No, this doesn't mean that they should suddenly open source all their stuff immediately, but that they could contribute more actively to making key libraries better for everyone. Doing this properly would mean Apple developers wouldn't have to spend time reinventing and retesting the wheel.

Currently, if you buy a Mac your POSIX stack will stay frozen until Apple release a new version of the OS (Apple's openssl on my machine seems to be 0.9.8y, MacPorts is on 1.0.1f). It would be a cinch for them to adopt any of the ports projects, integrate it into the OS and leverage their own sophisticated QA so that we all get better components.

All of this has nothing to do with a caffeine-infused development culture, which I think is irrelevant here. Companies still focus on features over quality. Someone took a decision here not to implement code review, static code analysis, pen-testing, etc., and in all likelihood that wasn't some kid hunched over a keyboard at 3 in the morning.

  11. Anonymous Coward
    Anonymous Coward

    NSA Link

The vulnerable code was publicly released by Apple on 24 September 2012.

    Source: https://mobile.twitter.com/Jeffrey903/status/437273379855667201?screen_name=Jeffrey903

    NSA internally boasts of Apple being "added" in October 2012.

    Source - slide 6: http://www.theguardian.com/world/interactive/2013/nov/01/prism-slides-nsa-document

    More info:

    http://daringfireball.net/2014/02/apple_prism

    1. thomanski

      Re: NSA Link

From the Daring Fireball link you've given:

      > Once the bug was in place, the NSA wouldn’t even have needed to find the bug by manually reading the source code. All they would need are automated tests using spoofed certificates that they run against each new release of every OS. Apple releases iOS, the NSA’s automated spoofed certificate testing finds the vulnerability, and boom, Apple gets “added” to PRISM.

      That's exactly what's needed. A proper functional test suite attempting to fool an SSL implementation. Not in the hands of the NSA but of everyone who produces SSL-based apps.

I find it hard to blame Apple for not having this. But they should definitely be working on it now.

      1. Michael Wojcik Silver badge

        Re: NSA Link

From the Daring Fireball link you've given:

        > Once the bug was in place, the NSA wouldn’t even have needed to find the bug by manually reading the source code. All they would need are automated tests using spoofed certificates

        And once again, Daring Fireball is daringly wrong.

        "Spoofed certificates" would not in fact work with this bug. See my explanation above. Exploiting the bug requires a valid certificate chain and tampering with the key exchange, for a DHE or ECDHE suite.

    2. Anonymous Coward
      Anonymous Coward

      Re: NSA Link

      I could believe the NSA has some employees secretly working for Apple and other large companies to put holes in the OS to give them easy access.

But I find it harder to believe they'd do it this way, with the double goto. I admit I had to think about it for a second or two to realise why that was a bad thing. If they ran their code through 'indent' it would make stuff like that more visible, though I expect that if someone were surfing through the code and saw that, they'd fix it even if they didn't immediately realise it was a problem (let alone have the further realisation that it was a big security problem).

      Hopefully they'll look at all the code modified by whoever did that to see if they inserted other more subtle issues elsewhere...

  12. John Smith 19 Gold badge
    FAIL

So how many fails did it take to get this out there?

a) Compiled without maximum error reporting during compile (if this is release code there should be no errors and no warnings)

b) Never eyeballed by either formal or informal code reviews.

c) No scanning of the code base for the sort of stupid can't-possibly-happen coding errors that in fact can and do happen IRL.

  13. lee harvey osmond

    Wasn't an accidental double-paste

I have seen this, incredibly rarely, myself. It's something to do with Subversion; when committing changes to a resource that has non-conflicting changes, a single line can either be omitted or repeated. I've seen this twice in a decade. This looks like an accidental double-paste; the two I saw did not. It's the sort of thing you don't discover for days or even months, so investigating what actually happened, testing for reproducibility, is ... difficult.

    1. Michael Wojcik Silver badge

      Re: Wasn't an accidental double-paste

      It might have been a merge error, as any number of people have already suggested. You do not know it was a merge error, and not a cut-and-paste error or similar mistake.

  14. Anonymous Coward
    Anonymous Coward

    Bullet-proof code processes?

> we know that the software industry has collectively spent a lot of time and energy creating and implementing processes to more-or-less bulletproof code creation

    Do tell, because I don't think it's as obvious as you do. Even MISRA has suffered some epic failures, and that's one of the tightest standards going.

    1. Michael Wojcik Silver badge

      Re: Bullet-proof code processes?

      Technically, Richard is correct: the industry has put great resources into creating such processes. That those processes are not successful doesn't change either the goal or the effort made to reach it.

  15. Rajiv Dhir

    Is the malaise "Feature Creep" or "Sleep Creep"

Sure, security is hard, but would someone at Apple care to explain how such a straightforward bug made it through regression testing? Do they have regression testing? Is it complete?

One could equally complain that, if the coding standards at the organisation insisted on braces, the second ctrl-v MAY have caused a compile-time syntax error. It's also possible that the programmer put it in deliberately to test something and forgot to take it out.

    Perhaps the malaise is focusing on features rather than writing boring test harnesses for core libraries?

While everyone points the finger at M$ for this sort of thing, it's not exactly unique to them.

    1. PhoenixRevealed

      Re: Is the malaise "Feature Creep" or "Sleep Creep"

Your question assumes that the flaw was added after an earlier version functioned correctly. We don't know this, and the extra line may have been there from the start. Regression testing will only catch errors that were previously integration tested and worked, but not flaws that were never tested for in the past. It seems quite possible that this code just wasn't properly tested from the start before promotion for release. I have quite often seen situations where a developer performs unit tests to make sure things work properly with correct inputs, but neglects to test for incorrect inputs.

      Subsequent integration tests will then often also only test that things work as expected when inputs are correct, especially since the tests are often designed by the same developer that originally wrote the code (they shouldn't be but often are.)

  16. PassiveSmoking

    Good, fast, cheap

Most of us of an engineering bent know the old adage "Good, fast, cheap: pick two", yet we still always cave when our managers almost invariably select "fast and cheap".

  17. John Smith 19 Gold badge
    Unhappy

Shouldn't this be on the list of "idioms that will f**k up software but still compile"?

    IE the sort of stuff you have a search script to scrub your code for?

    I think that the basic ideas of how to produce error free (or low risk of error) software are fairly simple.

Enforcing them (and understanding why you should enforce them) is hard.

  18. PhoenixRevealed

    Malicious? Probably not... but don't rule it out...

    I'm not one for conspiracy theories, and I lean toward the "programmer error" explanation, but if I was a baddie knowing how human cognition works and intent on a subtle, easily missed change that would render SSL worthless, I might do EXACTLY this. I've been a professional programmer since the mid 1980's, and even knowing there was something wrong in the code I would have likely needed to step through it with a debugger to spot the flaw as it just didn't jump out at me. The fact that every other line was "goto fail;" made the offending line recede into the background noise. Code reviews are tedious and boring tasks that will usually only catch things that look "wrong", not flaws like this one that trick bored eyes by repeating a valid pattern with an extraneous element.

    Yes it is obvious... once it's been pointed out to you with a big arrow.

P.S. to those claiming that this would have been caught if "unreachable code" warnings were turned on in the compiler... how so? "Unreachable code", as the name suggests, is code that can NEVER be executed, such as inside an "if" block where the condition can never be true. In this situation there was no code that could never be executed, just one line that shouldn't have been there. Compiler warnings wouldn't have caught this, but correctly written automated unit testing should have.

    1. thomanski

      Re: Malicious? Probably not... but don't rule it out...

      > to those claiming that this would have been caught if "unreachable code" warnings were turned on in the compiler... how so? [...] In this situation there was no code that could never be executed, just one line that shouldn't have been there.

Look again: everything after the second "goto fail", down to the fail: label, is unreachable, since that second "goto fail" is not conditional on the check above it:

    if ((...) != 0)
        goto fail;   // this one's actually conditional
        goto fail;   // this one isn't, always gets executed
    if ((...) != 0)  // from here down to the fail: label is unreachable
        goto fail;
    err = sslRawVerify(...);
    if (err) {
        sslErrorLog(...);
        goto fail;
    }
    fail:
        SSLFreeBuffer(&signedHashes);
        SSLFreeBuffer(&hashCtx);
        return err;

(from http://opensource.apple.com/source/Security/Security-55471/libsecurity_ssl/lib/sslKeyExchange.c, edited)

  19. Anonymous Coward
    Anonymous Coward

    Conspiracy theory time!

    This bug would make it trivial to perform SSL MitM attacks against Apple devices.

    It was introduced in September 2012.

    In October 2012, according to documents leaked by Snowden, Apple was listed as having signed up to PRISM.

  20. Michael Wojcik Silver badge

    This bug would make it trivial to perform SSL MitM attacks against Apple devices.

"Trivial" is an overstatement - it's not hard, but it's not as easy as, say, a failure to perform certificate checks at all would be (and we know there's software out there guilty of that). And there are ways for the client to mitigate it (require TLSv1.2, or disable DHE and ECDHE suites), though admittedly they are rarely used.

    As the article notes, this would be a pretty blunt backdoor. It's not nearly as sophisticated as Dual-EC-DRBG, for example. That doesn't mean it isn't malicious, but I'd like to hope that the Bad Guys have a bit more style.

  21. Anonymous Coward
    Anonymous Coward

If Windows has a bug it's because MS is bad, but if iOS & OSX have a bug it's a "cultural malaise"?

    That's just a demonstration that:

    1) Bad developers are bad developers wherever they are employed

2) Code is never automagically secure - nor is one OS better than another "just because"

3) "Open source" doesn't mean someone will really look at the code - how did this one go unnoticed by the legions of open source supporters? Don't you spend your free time reading code to ensure it's properly written and without bugs? Why don't you?

4) Issues like this would be identified not only by static code analysis, but also by tests complemented by test coverage analysis tools - they should spot the code which tests never exercise. I use one for Windows applications - it's not cheap but it's great, and I'm not the Mighty Apple With Billions in Cayman^H^H^H^Hsh. Don't they buy such tools for their developers? Or do they, but their developers are too lazy and spend more time using their iPhones than writing proper tests and checking coverage?

