Sysadmin left finger on power button for an hour to avert SAP outage

Welcome to the seventh instalment of Who, me? The Register's new column in which readers share stories of the times they broke stuff without any help at all from users. This week, meet "Jeremy" who back in 1999 scored his first "real" IT job "as part of a team sent out to run the IT at a big publisher." Said team was working …

    1. Uplink

      Re: Typed 'Reboot' where ... ?

      apt-get install molly-guard

      Then you get asked: "you want to reboot what?"
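The check described above (being made to name the machine before it reboots) can be sketched in a few lines. This is only an illustration of the idea, not molly-guard's actual implementation; the function name `confirm_reboot` and its parameters are hypothetical.

```python
import socket


def confirm_reboot(typed_name, actual_name=None):
    """Return True only if the operator typed the name of the machine they are on.

    `actual_name` is a hypothetical override for testing; by default the
    local host name is looked up.
    """
    if actual_name is None:
        actual_name = socket.gethostname()
    # Compare against the short host name, ignoring case and stray whitespace.
    short_name = actual_name.split(".")[0].lower()
    return typed_name.strip().lower() == short_name
```

A wrapper around the real reboot command would prompt for the host name, call `confirm_reboot` with the answer, and refuse to continue when it returns False. Typing the name of the box in the server room while logged in to one on the other side of the planet would then fail the check.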

      1. keithzg

        Re: Typed 'Reboot' where ... ?

        molly-guard has definitely saved me more than once. I don't have it installed on *all* the servers, but I sure as hell do on the servers where it would matter . . .

        (That being said, if things are fragile enough that a clean reboot is a big problem, things are probably too fragile.)

    2. phuzz Silver badge
      Facepalm

      Re: Typed 'Reboot' where ... ?

      I have to admit to this one as well.

      Now I check very carefully which machine the prompt is for.

      (I've also been auto-logged out of a machine, and nearly run the command on the machine I was tunnelling from instead.)

      1. Boothy

        Re: Typed 'Reboot' where ... ?

        I used to use PuTTY a lot on Windows, into *NIX boxes; back then we had direct access to boxes. So we just set up each environment with its own custom colour and window title settings. Green, you're on a Dev box, Red - prod, etc. Nice and easy.

        These days we have to go via jump boxes, and are usually on Linux laptops. So it's all basically one shell, same colour for text etc. But someone did tweak all the Red Hat boxes, so if in a live environment, the user and server name all have a red background colour at the prompt. (The name@server: bit).

        Still doesn't help if you're on the wrong prod box, but at least you are less likely to run something not prod friendly.
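A rough sketch of the red-prompt idea, assuming bash as the shell. The environment names, the colour mapping and the function `bash_prompt` are illustrative assumptions, not the tweak that was actually made to those Red Hat boxes; the escape sequences are the standard bash prompt escapes and ANSI background colour codes.

```python
# ANSI background colour codes: 41 is red, 42 is green.
# The environment names are assumptions made for this example.
BACKGROUNDS = {"prod": 41, "dev": 42}


def bash_prompt(environment):
    """Build a bash PS1 string whose user@host part has a coloured background."""
    code = BACKGROUNDS.get(environment)
    if code is None:
        # Unknown environment: fall back to a plain prompt.
        return r"\u@\h:\w\$ "
    # \[ and \] tell bash that the enclosed escape sequence prints nothing,
    # so line editing still measures the prompt width correctly.
    return r"\[\e[%dm\]\u@\h\[\e[0m\]:\w\$ " % code
```

The string returned for "prod" could be assigned to PS1 in a profile script on the live boxes, which gives the name@server part a red background and leaves the rest of the prompt alone.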

    3. Anonymous Coward

      Re: Typed 'Reboot' where ... ?

      Yeah, I feel your pain on that one.

      I once managed to balls-up a firewall rules tweak on a Linux machine in our new branch office in Sydney, accidentally removing a critical rule and therefore cutting off my own comms from said machine. Muppet.

      Had to wait for local staff to arrive in the office and talk them through restoring remote access via the console.

    4. GrumpenKraut
      Facepalm

      Re: Typed 'Reboot' where ... ?

      Been there, done that, sadly a "power off" in my case. But a machine I could (and had to) drive to. A Friday evening spoiled.

      Since that day remote access terminals have a color background.

      1. Stevie

        Re: color background

        Agree. I have profiles for my terminal software for about a dozen or so foreground/background combinations. If I have something that cannot be the subject of a mistake, it goes in the white on red window.

        The youngsters in my office laugh at this and would rather use other (unapproved) software that either doesn't offer a way to store multiple profiles easily or that they can't be bothered to learn how to use properly.

        One of the bright young things, working in a forest of white on black consoles, restored pages from a test database over our production database and caused a complicated partial outage that lasted a week while we sorted it all out.

        Another Young Genius obliterated a QA cluster under the impression he was working on a dev system.

        Yep. The problem is The Old Guy doesn't "get it".

    5. regadpellagru

      Re: Typed 'Reboot' where ... ?

      "Telnetted into various Unix machines, wanted to restart the one in the server room. Whoops - I forgot which machine I was logged into and typed 'reboot' to a machine on the other side of the planet. It did not come up, had to wait until teatime for the guys there to come in and push a button :-("

      Who hasn't done this one, I wonder. Happened to me as well: wanted to reboot my SUN workstation, so typed "reboot", then I had "end connection" on that very window ...

      Got me quite pale for a moment: I didn't know which system I had just rebooted, and I was logged in to quite a lot!

      Then colleagues told me every workstation had frozen: I was logged in to the NIS server, which fortunately came back 30 s later ...

  1. chivo243 Silver badge

    Little fingers

    Once I was working (playing a game actually) in the office, and my boy (I think he was 4 at the time) comes rolling by, and says what does this light do papa? as he's pushing the power button!! Needless to say, my gaming session was ended, and that PC never seemed right after that incident.

    1. Anonymous Coward Silver badge

      Re: Little fingers

      Similar, but a little girl and the office UPS.

      Suddenly everything went very quiet.

      1. Anonymous Coward

        Re: Little fingers

        I think I can top that one - at security in the airport waiting to get on a plane - my little one accidentally shut off the entire scanning line when he turned off a power bar - had to wait 15 minutes while everything rebooted and then they had to rescan everything... when I left, homeland security was looking at the power bar and talking about how to prevent such an incident again...

  2. &rew

    Fast fingers

    I recall for old PCs that used an actual mains voltage power button, if you pressed the power button in, and then really quickly popped the switch out and in again, there was enough smoothing in the power supply to cover the momentary blip. True, I would not be willing to attempt that on a company server, though...

    1. Oz

      Re: Fast fingers

      I have saved myself from a thorough dressing down doing just that back in the mid 90s. I held the power button down on a server to force a power off, realised it was the wrong server, thankfully before letting go again and, after several minutes of holding the button in and deliberating, was able to release and re-press the button before the power dropped out.

    2. Alan Brown Silver badge

      Re: Fast fingers

      "there was enough smoothing in the power supply to cover the momentary blip"

      On home PCs yes. Not so much on servers.

      Of course if it was critical it would have had redundant PSUs that had to be individually switched off.

  3. Tim99 Silver badge
    Facepalm

    I guess that beats my post

    From last month: my idiocy was only going to trash my work...

  4. Remy Redert

    Ever since I had a cat induced computer outage when one jumped onto the case and sat on the power button, I've taken to the simple expedient of not connecting any of the buttons on the case, setting the machine to start when the power comes on. The big switch for the power bar is much less sensitive to cat induced failures.

    On a related note, which idiot of a designer decided that buttons should be put on the top of the case, where they're hard to reach if the case is in any kind of enclosure and easy to set off accidentally if they're not?

    1. DuchessofDukeStreet

      Which Idiot of Designer?

      The one who recognised that most office users would end up with a large box sitting beside their legs under their desk - buttons on top are the most accessible from a seated position (assuming you're talking about a vertical unit).

      For horizontal ones, on top still makes sense as it prevents them being knocked accidentally for objects being pushed around the desk surface.

      But also one who doesn't own/is owned by a cat, and doesn't recognise their tendency to jump onto any available (and inconvenient) surface, particularly one that's radiating heat.

      1. graeme leggett Silver badge

        Re: Which Idiot of Designer?

        I have exactly that sort of machine (a Dell OptiPlex "designed" for office use) sat beside me and occasionally I nudge the power button with my knee. Fortunately this is set to initiate a hibernation rather than shutdown.

        I have experimented with putting some of those flippy lid button covers over the switch - held on with double sided tape due to location at the top corner of the front bezel. Short of dismantling the front and getting busy with glue and screws the fix is far from permanent.

        1. d3vy

          Re: Which Idiot of Designer?

          "I have experimented with putting some of those flippy lid button covers over the switch - held on with double sided tape due to location at the top corner of the front bezel. Short of dismantling the front and getting busy with glue and screws the fix is far from permanent"

          Pull the side off and disconnect the button completely.

          Then either buy a replacement button that can be positioned at the back of the PC or set the machine to wake on keyboard so you no longer need a physical button on the case.

          I have mine set to boot on power resume and everything on the desk is plugged into a 5 way surge protector so that when I flick the mains switch everything comes on at once.

        2. Alan Brown Silver badge

          Re: Which Idiot of Designer?

          "Fortunately this is set to initiate a hibernation rather than shutdown."

          Assuming Windows, go into the power settings and change "when the power button is pressed" from whatever it's set to, to "ASK"

          It's not that difficult really - and there are similar settings in most *nixes (even if you have a CLI-only system)

          It won't help you if you have an old style single PSU server with a real power switch on the front, but the "switch" on ATX systems is merely an input device and you can change its functionality.

          Just don't do what someone I know did and swap "power" (big button) with "reset" (needed a pencil to press). Reset means RESET and having a wayward cat hit it is more of a problem than having the power go off.

      2. Prst. V.Jeltz Silver badge

        Re: Which Idiot of Designer?

        The one who recognised that most office users would end up with a large box sitting beside their legs under their desk - buttons on top are the most accessible from a seated position (assuming you're talking about a vertical unit).

        For horizontal ones, on top still makes sense as it prevents them being knocked accidentally for objects being pushed around the desk surface.

        Nah, sorry but all of that is bullshit. Buttons go on the front of things - end of. A user with a tower box under their desk will of course instinctively look on the front of it because that's where buttons go. Yes, it may be *physically* easier to put it on the top, but it's still bloody stupid, cos: can't put anything on top of it, what if there's a shelf above it, people don't look there, just as easy to accidentally push, etc, ad infinitum. (This is why top loading VCRs died out?)

        Your middle paragraph makes little grammatical sense but I think the gist of what you were getting at is covered above.

        Your 3rd paragraph is of course correct, cats will sit on warm things , they will also jump on the desk itself and get between you and your game of Farcry in an effort to get fed. The more fiendish ones will do this by standing on F5, which you have assigned to "Load last saved game" :(

        1. Prst. V.Jeltz Silver badge

          Re: Which Idiot of Designer?

          The power button on my home box, apart from being in prime position to get toed when resting a foot on the shelf it's on, has become a bit sticky and will tend to stick in when used, which causes a kind of hernia / stroke in the BIOS. It takes a skilled touch to use it now - I'm not looking forward to having to explain that to someone over the phone in some sort of emergency.

    2. John Stirling

      @automatic power on

      ...I've taken to the simple expedient of not connecting any of the buttons on the case, setting the machine to start when the power comes on. The big switch for the power bar is much less sensitive to cat induced failures....

      I used to do that, until the local power company decided to have an outage, which came back on 1 minute and 45 seconds later, and then went off again at the 2 minute mark, before repeating. For 26 hours over the weekend.

      Which taught me a couple of things;

      1) think hard before enabling auto on after power outage;

      2) always use UPS on anything you care about.

      3) Fridges also benefit from UPS.

      Surprisingly a large percentage of the dozen or so PCs survived that little incident, although a number did not - and the Fridge needed a new motherboard!

      1. Alan Brown Silver badge

        Re: @automatic power on

        "Which taught me a couple of things;"

        Due to many such episodes, $orkplace has trips on all the server room power to ensure that if the power goes off, it STAYS off until manually reset. There are similar setups on all the AC systems. You have to manually power up.

        In the old days I would have put any critical (must be up) systems on a startup timer of 5 minutes or so to ensure the power was stable before booting (that includes UPS inputs, I've seen a couple fried by dirty power when it was restored)

        Whilst you can do this using BIOS delay timers it's not ideal in a lot of cases (drives don't like being spun up/down repeatedly) and there are smart distribution panel controllers around these days which take it a few steps further, with things like a selectable startup delay coupled with longer lockouts if they detect several power failures in a row.
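The "startup delay plus longer lockout" behaviour can be sketched as a small decision function. The 5 minute base delay comes from the comment above; the one hour window, the 30 minute lockout, the threshold of three failures and the name `startup_delay` are all assumptions chosen for illustration, not the behaviour of any particular controller.

```python
def startup_delay(failure_times, now, base_delay=300, window=3600,
                  lockout=1800, max_failures=3):
    """Return how many seconds to wait after power returns before booting.

    failure_times: timestamps (in seconds) of earlier power failures.
    now: timestamp at which power came back.
    All default values are illustrative assumptions.
    """
    # Only failures inside the recent window count towards a lockout.
    recent = [t for t in failure_times if 0 <= now - t <= window]
    if len(recent) >= max_failures:
        # Several failures in a row: the supply is unstable, so hold off longer.
        return lockout
    return base_delay
```

With a supply that comes back for under two minutes and then drops again, as in the weekend outage described earlier in this thread, even the base delay means the machines never start spinning up during the bad spells, and the lockout stretches the wait once the failures pile up.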

  5. Bob Wheeler
    Facepalm

    Repetitive work on multiple servers

    I was working on a 16-node Novell Cluster, updating drivers. A process that had been done many times and non invasive and with no loss of service so deemed by management as safe to do in working hours.

    The process was simple, take a node out of the cluster - “CLUSTER LEAVE”, copy the new device drivers and then reboot that node - “SERVER DOWN”, wait for it to start up and re-join the cluster, and move onto the next node.

    By about the 14th or 15th node, after typing the same commands time after time, instead of typing “CLUSTER LEAVE” to take the node out of the cluster, I typed “CLUSTER DOWN”.

    It should be noted that Novell does NOT ask “Are you sure?” when you type such a command, and it does what the command suggests it does – instantly. All users, potentially some 4,500 of them suddenly lost their file shares, email, printing, internet access – the works.

    My only saving grace was it was late afternoon on a Friday so there was not that many users actually affected.

    1. Anonymous Coward

      Re: Repetitive work on multiple servers

      Wow, that would have qualified for a "Who, me?" article! I think I read they are running short - send 'em in folks!

    2. Alan Brown Silver badge

      Re: Repetitive work on multiple servers

      "All users, potentially some 4,500 of them suddenly lost their file shares, email, printing, internet access – the works. My only saving grace was it was late afternoon on a Friday so there was not that many users actually affected."

      We have a policy of warning users when work is happening. They're a lot more forgiving if they've been given a heads-up

  6. JeffyPoooh
    Pint

    What about Power Failures?

    "UPS" you scream.

    No, I'm referring to the power failure caused by the UPS catching fire, ...again.

    A well designed database would have journaling at the transaction layer, and more journaling again at the FS level. Oh, sorry. SAP.

    My buddy runs the IT for a company. He tells me that the server can have its power cord yanked out, and the backup server in his basement at home will complete the transactions, transparent to the users. They run in parallel and he's done something clever at the networking level.

    1. Yet Another Anonymous coward Silver badge

      Re: What about Power Failures?

      Compaq used to run an ad 20+ years ago of a cluster when you destroy one server (shotgun, drop a safe on it, wrecking ball etc) and the system keeps going

      1. Anonymous Coward

        Re: What about Power Failures?

        cluster when you destroy one server ... and the system keeps going

        HP Non-Stop. Check out the price, then come back after you've recovered.

        BTW us in telecoms have had active / active standby for a very long time, it's how we roll.

        Upgrades? No problem, upgrade "non-live", flip, upgrade old live.

        100% of calls and systems still live.

        1. imanidiot Silver badge

          Re: What about Power Failures?

          And then comes a Who, Me? with the basic storyline of

          Upgrades? No problem, Upgrade "non-live", goes wrong but don't notice, flip, upgrade old live, goes wrong and all hell breaks loose...

        2. Alan Brown Silver badge

          Re: What about Power Failures?

          "BTW us in telecoms have had active / active standby for a very long time, its how we roll"

          Which works really well, until it doesn't.

          At which point you may discover that whilst the running systems were ok, what's in the configuration (and has been backed up to tape for the last 2 years) is scrambled. So if you reboot one controller after the other when applying your y2k fixes, you find your NEAX-61E has forgotten that it's a telephone exchange - and that after spending 2 days finding a working backup (3 years old), you then have to replay every update made from that point - which takes 6 weeks - and means that a large number of your customers can't be sure from day to day what their phone number might be - or even if they'll have dialtone.

          Yes, it happened.

  7. OzBob

    Came close myself just today

    What bright spark decided to allow keypresses on vSphere Client to perform menu actions? So if you don't properly focus on the console, you can type away and get prompted for "do you want to shutdown"? Fortunately I looked up and saw that before I got too far, but it was close.

  8. ysgubor anhysbys

    database reboot

    Our sys admin was doing some maintenance on a replicated database, he had stopped the slave and made the necessary changes and then hit the power button to do a hard reset... unfortunately, the power button belonged to a different server - the live database master. Somehow we got lucky and our 3TB of data survived.

  9. Anonymous Coward

    Probably my fault for being unclear

    I used to manage "the UK's Most Dubious Beowulf Cluster", 80-some Pentium 4s running a scheduling job on each of them that waited for a text file to tell them what simulations to run. Not the world's most brilliant solution (especially since they used a regular user's account), but it worked well enough.

    One day, I was having trouble with my email, probably because Outlook Exchange was a delight back in the day, and our Scottish helpdesk were very helpful, doing all the things they needed to do to fix it until, without warning, they said "Right, your new password is...".

    While I was logged in to 80-odd Pentium 4s that suddenly had outdated credentials and thus, no LAN access. Cue me and a room full of KVM switches, re-logging dozens of machines and restarting failed simulations.

    On the plus side they did fix my email.

    1. Korev Silver badge

      Re: Probably my fault for being unclear

      Which uni was this?

    2. HPCJohn

      Re: Probably my fault for being unclear

      Talking about Beowulf clusters.... Several years ago I was at a customer site in a big UK company which may or may not build jet engines.

      Stood at the console of said machine, I wanted to reboot one of the servers in the cluster. I was telnetted into one of the servers in the cluster and wanted to reboot it. I go ahead and press the Vulcan Death Grip - ctrl-alt-del. Only the whole shooting match went down, not the server I was logged into. Cue red face from me. But they were very good about it.

  10. Greg Stovall

    Silence is NOT golden...

    Back in the 80s, I was on a coop term at a major telecommunications manufacturer. My assignment for the summer was to port a wire wrapping program from a DG Eclipse to an HP 3000. It was a very enjoyable exercise writing a converter from RATFOR to Fortran 77.

    The factory floor was quite a noisy place with all the manufacturing equipment. Since I was new to the HP 3000, I spent a little time exploring. Discovered that as administrator, I could actually poke any memory location directly. I experimented with this...then noticed it was quiet --- too quiet. Panic filled my soul when I realized that the HP3000 I was poking on was the same one that ran all the manufacturing equipment -- and I had crashed it in the middle of the work day.

    I learned NOT TO POKE memory on the HP 3000...

  11. Anonymous Custard
    Mushroom

    The hardware version...

    I take your server shutdowns and offer you a colleague doing it on a semiconductor manufacturing machine (of course in the middle of running 150 production wafers). Needed to power down machine A in a bank of them to work on it, so goes around the back and accidentally hits the power button on machine B beside it. Bye-bye 150 product wafers towards the end of their production flow, in all worth many thousands of dollars.

    We are now strictly verboten from even touching any machine which doesn't have clear ID labelling (customer responsibility to add those, the ones above didn't) and even then we have to point and say plus buddy-check. This is not to say that it hasn't happened since these measures were introduced of course, given some of my colleagues and the old adage about idiot-proofing...

    1. Anonymous Coward

      Re: The hardware version...

      I know that feeling...

      I work for a supplier to a semicon lithography systems maker. They use 4-digit (hex) machine identifiers. They're not sequential but can be quite similar, and a typo is easy to make when remotely accessing a system. I may or may not have shut down the wrong system for service at some point... Luckily this was at the manufacturer's fab and not a field system though. Working on field systems always makes me nervous given the dollar amounts involved.

      From experience it's also not easy to explain to a customer's line manager at 9pm that you broke his system some more instead of fixing it like you were supposed to.

    2. HPCJohn

      Re: The hardware version...

      Do you work at ASML?

      1. Anonymous Coward

        Re: The hardware version...

        Not ASML itself, we supply several of the important sub-modules for several generations of systems. Including the new EUV systems. Pretty interesting stuff.

  12. Rufus McDufus

    Did this myself

    DEC AlphaServer, also around 1999, working for well-known internet-based retailer. Went to power cycle some server, accidentally pressed power button on adjacent server (probably a rather critical NFS server). Boss came in after 10 minutes and laughed at me stuck there.

    1. Anonymous Coward

      And me!

      Dec Alpha workstation, while visiting a physics department in Oregon in the early 90's. Having finished up late while running some simulations, I confidently reach down to the power bar and remove the power brick for my portable CD player .... and the workstation suddenly goes off.

      I'd also nudged the adjacent switch at the same time. Oops.

  13. Rufus McDufus

    Emergency power off

    First job working in the comp sci department at a well-known technology-focused university in London. Annually we'd show prospective students around the facilities including the server rooms. There were big red emergency power-off buttons in various places. A particularly tall budding student decides to lean back against the wall and... These were the days of IBM 4331s, various DEC servers, a big ICL mainframe and others. Generally things didn't tend to work well after a sudden power-off.

  14. Anonymous Coward

    April 01

    haha

  15. Anonymous Coward

    Toggle power switch

    This story is strikingly similar to an anecdote from colleagues at a previous job, when one of the ops guys went to power off a server and was informed as he pressed the switch that it was the wrong machine. Although in roughly the same time period, I'm certain it's not the same incident because our site never ran SAP.

    The box in question was running end-of-day batch processing, so could not be allowed to power off otherwise carnage would be caused.

    Unfortunately the recessed nature of the switch meant that nothing could be jammed in to replace his finger without also releasing the button at the same time - so he was forced to stand there in the comms room for the next two hours or so, with the end of his finger going blue, waiting for the batches to finish so that the machine could be gracefully shut down.

    In a separate incident, another colleague at the same site had apparently stepped into a hole in the floor of the comms room where a tile had been removed ('elfin safety??) - reached out instinctively to stop his fall, but found he'd hit the emergency power-off button on the side of the AS/400 ... oops!

  16. AndersBreiner

    The Big Red Button

    I was once working on a web application, back in the .com boom. We had a production server which was heinously unstable. We'd test on our dev server for a week and then send stuff over to the production one. The production one was administered by another company and we'd call them and tell them how to do stuff. Either this was before the days of VPN or they didn't want to allow that, for reasons that will become clear.

    Anyhow I was in an interminable call with them.

    "Ok, you've got the files unzipped"

    "Yep"

    "Right click on the .reg file and add it to the registry"

    "It crashed"

    "What do you mean crashed? Did you add it?"

    "No it crashed when I right clicked"

    "How did it crash?"

    "It said explorer.exe performed an illegal operation"

    "Well that's odd, isn't it. Try to open up this folder"

    "It crashed again"

    "Ok press the Windows keys and R and type"

    "Crashed again"

    "Let's try a restart"

    At this point I hear a loud clunk, then a pause then another loud clunk

    "What was that?"

    "We're restarting"

    "Don't you do that through the start menu?"

    "No, it always hangs when we do that, so we just use the big red button"

    And then I worked out why nothing new ever worked in production and why things that used to work stopped - the server was so addled at this point that it couldn't reboot without someone power cycling it. It'd probably gone through hundreds or thousands of hard power cycles. This was NT so it was somewhat robust but you did lose data on a hard crash - any files that were open for writing would be corrupted and sooner or later you corrupted something vital.

  17. Joseph Haig

    Going Dutch?

    Is this what the Little Dutch Boy is doing now?

    1. ssharwood

      Re: Going Dutch?

      Yeah I did think of that but couldn't remember which dike would get me in trouble
