What Murphy’s law has to teach you about data centres

Data centres are loud, noisy places. So loud, it's often hard for anyone to hear your screams and cries of frustration when Murphy decides to make you the example for his law. For some, this is a good thing. For others, this is why social media was invented. Outages of all kinds echo out into social media like waves in a pond …

  1. Anonymous Coward
    Anonymous Coward

    And if worst comes to worst, the US decides for no particular reason to ventilate a JDAM through the HVAC roof opening.

    It's a William Gibson world, and no messing.

  2. Anonymous Coward
    Anonymous Coward

    ""Is it up yet? When will it be back up? Why is it taking so long? Can't you do anything to make it go faster?""

    To which add "Management have scheduled crisis meetings for every hour - your attendance is mandatory".

    1. Fortycoats

      Obligatory Dilbert

      http://dilbert.com/strip/2011-10-02

      Reminds me of a former employer, and also why they are a "former" employer.

    2. Anonymous Custard

      To which add "Management have scheduled crisis meetings for every hour - your attendance is mandatory".

      My usual answer to such is "Do you want me to talk about fixing it, or do you want me to actually just fix it and we can then discuss it later?" I usually find that focuses the mind, even of managers.

      This is doubly true when they also insist that you have a full and nicely tuned PowerPoint presentation to bring to the meeting, as they need pretty graphs and whizzy animations to have a hope in hell of understanding anything even slightly technical.

      1. Anonymous Coward
        Anonymous Coward

        ""Do you want me to talk about fixing it, or do you want me to actually just fix it and we can then discuss it later?"

        And if they respond, "Both—simultaneously"? And perhaps mutter later on, "Who hired this guy who can't be in two places at once?"

        1. Anonymous Custard
          Boffin

          Ah, Schroedinger's Sysadmin...

          1. Hollerith 1

            Can't be: once he's seen, he can be in only one place.

      2. Fatman

        Focusing Manglement

        My usual answer to such is "Do you want me to talk about fixing it, or do you want me to actually just fix it and we can then discuss it later?" I usually find that focuses the mind, even of managers.

        Sorry, but I don't agree with you.

        The only way to focus a mangler in such situations is the judicious use of a clue-by-four to the head of such mangler. (Or if your name is Simon, you might use a cattleprod, instead.)

        Hopefully, with them reeling from that assault, you have time to 'fix things'. Then, you may have to get your resume in order, unless there is a mangler higher up in the corporate org chart that gets it and sends the inane mangler packing.

        I am lucky to have a boss that knows that when the shit hits the fan, she does her best to stay out of our hair until things are fixed. Then comes the debriefing, and finger pointing.

        1. tfewster
          Thumb Up

          Re: Focusing Manglement

          Even better would be a boss that takes responsibility for deciding priorities, shields you from senior manglement and the phones, and gets the resources you need (like caffeine, nicotine, pizza-ine and electricians). In return for which you give him/her 110% and regular status updates so she/he can prove that it's getting results.

    3. Andy A
      Facepalm

      ...and each meeting is scheduled to take 90 minutes.

  3. Anonymous Coward
    Anonymous Coward

    "Murphy's law is typically stated as "anything that can go wrong, will go wrong.""

    "...at the worst possible moment".

    As many of the examples illustrate.

    1. Anonymous Custard
      Headmaster

      And not forgetting the extension to Murphy's Law - "If it cannot go wrong, it will probably still go wrong just out of spite and to make you look like an idiot for not being paranoid enough with your disaster prediction and understanding..."

  4. Anonymous Coward
    Anonymous Coward

    I wonder how many failures have been due to SPOFs that are totally beyond the firm's control, like some critical exchange or trunk some distance away where no alternatives are physically available?

    1. Anonymous Coward
      Anonymous Coward

      SPOF beyond firm's control

      Try, in this order..

      Sustained generator test Friday.

      Back to mains.

      Diesel delivery does not turn up Friday.

      Copper thieves trash the local substation at weekend.

      Generators run out of fuel on Sunday.

      Even this was avoidable but manglement never thought to progress the missing delivery across to anybody important. Stuff could have been shut down.

      Result. Gunged up generators. They didn't quite run out of fuel, just sucked in all the crud at the bottom of the tank. UPS blew because it was constantly cranking over the generators whilst supplying server room only to have them stop a few seconds later due to gunge. Repeat until batteries go bang. Power consequently did not go gracefully. Fluctuating brownouts to server room.

      Manglement wrote this off as unforeseeable.

  5. K

    Am I the only one

    who has a problem with people broadcasting this stuff?

    <rant>

    This might stem from social insecurities, or the fact that I've had team members trying to take disputes and grievances into the twattersphere. But I've always believed that it's the employer's decision to communicate internal issues, not an employee's.

    </rant>

    1. ilmari

      Re: Am I the only one

      Communicating internal issues should obviously stay internal.

      Socializing and talking about your day at work and how your boss must be reading Dilbert to be such a total copy of the PHB, is best not communicated inside a company, as I discovered.

      1. K

        Re: Am I the only one

        True.. no issues with somebody talking about their day, or rather I do, but since my g/f insists on it, I've no choice but to listen and have grown immune!

        It's more that they disclose employers' names, locations, details of events and at the same time publish photos of relatively sensitive locations.. I wonder how many of these people get dismissed under non-disclosure clauses in their contracts!

        meh.. I must be getting old.

  6. Anonymous Coward
    Anonymous Coward

    n00b?

    In the last video, the "network slayer" has no idea how to remove an LC cable from an SFP..

    Surely people like that don't belong in data centres....

    1. Anonymous Coward
      Anonymous Coward

      Re: n00b?

      Same place as diesel above so anon again!

      Fibre SFP failed. No spares because manglement wanted to save costs (about £800 each at time). Critical system down. Many meetings between which we just said "fuck it" and nicked an SFP off a non critical system only to discover the switch had gone. Returned SFP whence it came.

      Manglement is now discussing moving to a DR situation. DR has no fibre just ethernet and yet again this is a Friday. Popped to local computer shop & purchased 30m ethernet cable. Sprawled it across server room and we're running again, albeit a bit slower, or so we expected.

      Turns out not to be the case. We were bypassed in spec'ing this system. It was vastly over-spec'd and ran fine under average load. So well in fact we were forbidden downtime when the new switch arrived and ended up taping the ethernet cable to the floor because manglement kept wandering into the server room unsupervised and tripping over the thing.

      It stayed like that for weeks.

  7. Anonymous Coward
    Anonymous Coward

    Does the author dislike McClain? ;)

    Couldn't help notice that his quote got used twice. Murphy, or just an attempt at adding a little filler? ;)

    1. Anonymous Coward
      Anonymous Coward

      Re: Does the author dislike McClain? ;)

      I think that is Murphy's Law in evidence - but for sub-editors. McClain is no longer doubled up.

  8. Anonymous Coward
    Anonymous Coward

    Generational shift?

    Wondering if this is the people born in the '90s entering the workplace.... waaah, waaah, why it no workey instead of taking the time to understand how the effing infrastructure your shitty apps are running on, suckling on the teat of venture capital and overblown valuations, for reinventing the wheel yet again, ACTUALLY effing works.

    AC since I've just pulled a 24-hour shift fixing some of the aforementioned infrastructure somewhere in the world. And damnit I feel old having read that back to me. Gen X/Y cusper here at the end of the day if you like that sort of thing - my teat had 8 bits.

  9. Andy A

    The Official list of Murphy's Laws

    Murphy's First Law:-

    If anything can go wrong, it will.

    Murphy's Second Law:-

    If anything can't go wrong, it will.

    O'Toole's Comment:-

    Murphy was an optimist.

    1. Charles 9

      Re: The Official list of Murphy's Laws

      I wonder what would happen when a device is built that would require a violation of the laws of physics to fail catastrophically.

      1. Anonymous Coward
        Anonymous Coward

        Re: The Official list of Murphy's Laws

        Simple... one of two things:

        1: Murphy would change the laws of physics.

        2: It would fail non-catastrophically.

      2. swampdog

        Re: The Official list of Murphy's Laws

        The profit margin would dictate such a device would be constructed using ever cheaper labour and ever cheaper parts until it teeters into a quantum entanglement such that the non-desired state becomes inevitable.

        Btw. I know next to nothing about such matters. I happened to rewatch Jim Al-Khalili's excellent documentary about the quantum robin last night & for a short while I thought I might just be able to grasp it. Fading now though!

  10. fajensen

    I really don't care much. I made like 15000 EUR "on call" one year *just* because the muppet java developers:

    a) think it is exceptionally clever to use MySQL instead of syslog for logging,

    b) think that "java.util.logging" still needs to log to a flat file in a place that syslog doesn't know about,

    c) think that MySQL replication == backup,

    d) believe that adding a new framework is always better than writing 100 lines of Java source code,

    e) leaving the old one(s) in place when adding a new framework (since no-one is there long enough to find a person who remembers what the old stuff is supposed to be doing),

    And dum-dum Managers, who couldn't manage to run a fast food joint into the ground:

    f) think it is "too expensive" to fix the code, mainly because "On-call time" is just Hours and all employee hours are Free, because if not used deleting files and chocking dead databases back to life the slobs would just do something else, not work-related with their time, thus wasting it.

    h) have KPI's tracking the extent that developer-hours are billed to customer accounts and linked to customer requirements (The "needle" is pinned at 100%, nobody questions "why").

    Anyway, this is what the other end of your "iDevice" looks like!

  11. Lee D Silver badge

    This story is very relevant

    I work in a school, I'm the IT Manager, my boss is the Bursar.

    My boss just came to me. We're going to be doing something in September that I've never done before.

    Planned, scheduled, disaster recovery scenario with NO warning for the other staff. Not just "Ah, yes, we can failover in this limited test plan that we know won't affect anything, before we've even fully-loaded the system" but instead "Let's just turn off the power to the stuff we use every day - EVERYTHING, during the middle of a full working day - and see what happens".

    He's going to walk up to the fuseboard, turn off half the place, and then see how IT / everyone else copes. Obviously I'm forewarned to make sure we don't do damage (but we should be ABLE to do damage and it not matter!), etc. but still - an exercise you don't often get to do on live systems.

    This is going to be interesting, and I don't mean that in a worried way but in a "Wow, I'm genuinely intrigued as to seeing how this will work out and what doesn't happen more than anything".

    I have complete faith in his intended demonstration (that we still have BUSINESS continuity, i.e. we can easily recover from the situation after the initial upheaval and maybe sending the kids home for a day or two at worst), but I also have faith that we stand a damn good chance - no such thing as a certainty - of having entire continuity too (i.e. apart from the lights going off in the rooms and alarms going off, people won't even notice any change in the way the IT operates).

    Of course, we've had power-outages and problems and failing hardware and all-sorts before that we've come back on so we have a hint that we'll do okay, but just the "Let's just do this quite badly and deliberately, without any kind of safety barrier or hesitation, just to make sure that the systems operate and, if they don't, the staff can cope with that".

    I'm not at all panicked. We've had similar things before with phase-crossing electrics and all sorts knocking out IT. But I am genuinely intrigued as to how it will pan out, and glad of the opportunity for not just an "internal IT test" but a proper, serious test of everything we make promises for. And I actually hope there's something (hopefully small and inconsequential) that we've missed that we can use to say "Ah, ha! Glad we did this! We should do this again next year", etc.

    1. Mark 85

      Re: This story is very relevant

      I like that idea and the support from the upper person. Where I'm working, all "crash" tests are carefully planned and set up such that nothing will go wrong (hopefully) and thus, no heads will roll. Management comes up with very specific guidelines and it seems that only "trusted and tested" systems are ever failed over in order to make nice Powerpoint slides showing success.

      Then when things do crash down for real and things really go to hell, management points fingers and usually not at themselves for all the systems that have never been tested.

    2. keithpeter Silver badge
      Windows

      Re: This story is very relevant

      @Lee D

      Well, good luck with it.

      I'm assuming your simulation won't result in any pupils being sent home, as that would get the head sacked pretty sharpish. Remember we fine parents for keeping Jemima/Jeremy off school for a day without reason.

      You seem quite confident that none of your colleagues read this peerless journal.

    3. Sgt_Oddball

      Re: This story is very relevant

      My mother works in a school where the servers were on UPSs along with a few semi-critical PCs (like the head teacher's..) and local contractors managed to tear through the buried power cables outside... one big bang later and no UPSs... or servers as it turned out, as the UPSs were massively underspecced for the job (yet of course cost 2 times as much as a better one I could have picked up online) and about £10,000 of computer equipment got toasted.

      Lots of shouting on that one...

      I've also had the pleasure of witnessing a fibre upgrade (my old workplace had 4 fibre cables into the premises for just this sort of switch over) and lots of head scratching as to why it wasn't working (turns out they hadn't turned on the switch at the exchange end. Fancy that...).

      And don't even get me started on dealing with the IT support contractors at the place I work at now. Getting a technical answer out of them that isn't condescending, wrong or avoiding answering the bloody question is like getting blood from a stone (or data... both apply I suppose).
