Software dev 101: 'The best time to understand how your system works is when it is dying'

At the QCon Developer conference underway in London, William Hill's R&D Engineering Lead Gavin Stevenson told attendees that they should celebrate IT failures. "The best time to understand how your system works is when it is dying," he said. QCon is a vendor-neutral event focused on large-scale software development and …

  1. Anonymous Coward

    Is it just me ..

    .. or would you too have less confidence in someone who thinks that an app dying is brilliant?

    I know it's not very trendy these days if I look at banking or government projects, but I have this weird desire that I want things to actually *work*..

    1. Richard 12 Silver badge

      Re: Is it just me ..

      I would have more confidence.

      It means they're actually testing the limits, not just spouting off a marketing specification.

    2. Lysenko

      Re: Is it just me ..

      The way to keep a system working is to know what its limitations are, and the only way to do that is to break it.

      You test a gun barrel by packing in more and more propellant until you burst it. Tyres: you push the car faster until you finally spin off the track. If you don't know where the limits are then there is always the possibility that "one more transaction" is all it takes.

      So, "no" I don't have less confidence in an R&D guy who thinks failure is brilliant. What gives me less confidence is hearing: "It works fine on the dev system!" or (worse): "It passed the unit tests, so ship it!".
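
      A minimal sketch of that "push until it bursts" idea, assuming failure is monotonic (once a load breaks the system, every larger load does too). The names `find_breaking_point`, `survives` and `toy_system` are made up for illustration, and the 1337 limit is an arbitrary stand-in for a real system under test:

```python
from typing import Callable, Optional, Tuple


def find_breaking_point(
    survives: Callable[[int], bool],
    start: int = 1,
    ceiling: int = 1_000_000,
) -> Tuple[int, Optional[int]]:
    """Return (largest load that survived, smallest load that failed).

    The second value is None if nothing failed up to `ceiling`.
    Assumes failure is monotonic in load.
    """
    last_ok = 0
    load = start

    # Phase 1: keep doubling the load until the first failure.
    while load <= ceiling and survives(load):
        last_ok = load
        load *= 2

    if load > ceiling:
        return last_ok, None  # never found the limit within the ceiling

    first_fail = load

    # Phase 2: bisect between the last survivor and the first failure.
    while first_fail - last_ok > 1:
        mid = (last_ok + first_fail) // 2
        if survives(mid):
            last_ok = mid
        else:
            first_fail = mid

    return last_ok, first_fail


def toy_system(transactions: int) -> bool:
    """Hypothetical system under test: falls over above 1337 transactions."""
    return transactions <= 1337


if __name__ == "__main__":
    ok, broke = find_breaking_point(toy_system)
    print(f"survived {ok}, fell over at {broke}")
```

      In a real lab run `survives` would replay recorded traffic at the given volume and report whether the system stayed up. The point is only that the "one more transaction" boundary becomes a measured number rather than a guess.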

    3. John Robson Silver badge

      Re: Is it just me ..

      In (lab) testing I want them to fail - and this is what happened here.

      They tested against a known large load - and it fell over. They tweaked it and now it doesn't

      That's the point of this testing...

      1. Anonymous Coward

        Re: Is it just me ..

        They tested against a known large load - and it fell over. They tweaked it and now it doesn't

        I think that's where the point of possible conflict occurs: what if YOU are part of that large load? That's not an allusion to any physical shape you may have, more a reminder of the experiments Facebook was running with its customers without their knowledge.

        The problem is that nothing simulates real life quite as well as real life ..

        1. John Robson Silver badge

          Re: Is it just me ..

          I did specify lab testing..

          We used to do this at a previous place of employment: we'd run simulations etc., then we'd test on the real world. But we'd do so in a non-destructive manner (fairly easy, as we were testing torrenting performance, so we contributed as much as we could to the swarm).

          They replayed an old set of data into the system in a lab...

          This is a *good* time for the failure to occur.

        2. Anonymous Coward

          Re: Is it just me ..

          There is a good chance that a whole lot of us started out as part of a large load.

          1. Anonymous Coward

            Re: Is it just me ..

            There is a good chance that a whole lot of us started out as part of a large load.

            Well, if you'll excuse me then for a moment, gotta dump core..

    4. Jedit Silver badge
      Boffin

      "would you have less confidence in someone who thinks that an app dying is brilliant?"

      No. I'd have less confidence in someone who thinks their app can't fail.

  2. Bc1609

    "Architect for failure"?

    I know there ain't no word that can't be verbed, but that's a horrible construction.

    1. Pascal Monett Silver badge
      Coat

      "Failuretect" then ?

      1. Lysenko

        ... or "Archifailure"

  3. Tony-A

    +1

    Does it fail "gracefully" (whatever that means) or does it try to take out the rest of the world as it dies?

  4. happy but not clappy
    WTF?

    "today's best practice"

    Wha? Since when?

    "today's fashion-statement". There, FTFY.

  5. Dan 55 Silver badge
    Devil

    Nope

    I'm pretty sure the best time to understand how it works is when they give you proper training/shadowing at the start, instead of "argh, it's gone tits up again, work out what happened".

  6. tiggity Silver badge

    Basic testing really

    No point having test systems that are unrepresentative of actual use.

    For some dev work you might want a small system, just so you can test on a single machine easily.

    But for proper QA it's good to do worst-case tests.

    It's a far better approach than people testing on a small amount of data / transactions, letting it out of QA, and then finding the system fails in a big heap when real-world big numbers of records / transactions are involved.

    1. HmmmYes

      Re: Basic testing really

      You'd think that, but you'd be wrong.

      One, systems are getting bigger and more complex. Test complexity is shooting off the scale. It's very hard to verify large systems. Chuck in working on a system that's been kicking around for 20-odd years and has been passed around more contractors than a fat ginger girl at a party ...

      Two, testing/QA can become political. Worked on a new system in the early ISO9000 days. Some idiot had staked his large salary on it being a quality product. Problem was the device caught fire after 4 hours. All tests were designed to finish, and the machine to be turned off, at 2 hours. VW appear to have a similar thing 15 years down the line.

    2. Grikath

      Re: Basic testing really

      "No point having test systems that are unrepresentative of actual use."

      You try getting the projected man-hours and resources from Manglement, prodded by the Beancounters about Budget, and by Sales about Delivery to do that..

  7. HmmmYes

    Don't walk away from Java, run!

    Erlang/OTP is designed for massive, interconnected networked applications.

    Java is designed for running an animated tooth.

  8. SVV

    Ground breaking insight

    So, he's saying "you learn from your mistakes".

    How did anyone ever get by before this wonderful new piece of insight was revealed to the hushed masses of the QCon Developer Conference?
