Y2K, Windows NT4 Server and Notes. It's a 1990s Who, Me? special

The weekend is over, but for some, the knuckle-chewing over a decades-old event still goes on. Welcome to Who, Me?, The Register's confessional for misdeeds and mishaps in the IT world. Today's tale comes from a reader we shall call "Ivan", who told us of the time he learnt an important lesson regarding the use of CD-ROM trays …

  1. Pascal Monett Silver badge

    Shutting down the wrong server

    In those days, he was a bit lucky that everything came back online correctly.

    On the other hand, having a sticker with the name of the server would have helped as well. Hindsight and all that.

    Of course, in today's virtualized server centers, the sticker could name quite a lot of servers, but then you can put the name of the physical machine.

    In any case, stickers with names on them. It's the only way to maximize the chances of avoiding such errors.

    1. A Non e-mouse Silver badge

      Re: Shutting down the wrong server

      Stickers on the front AND the back of the server. When you're around the back of a server rack, all systems look very similar..

      1. Shadow Systems

        Re: Shutting down the wrong server

        I'll second that tactic.

        Dynamo label with the server name, one front, one back, plus one *inside* the case where you couldn't miss it.

        Yes it's a PITA to create all those labels, but you'll thank your foresight when some numpty starts to futz with the wrong box...

        Especially if the numpty is _you_.

        =-j

        1. big_D Silver badge

          Re: Shutting down the wrong server

          We do this for every device we send out, servers, SANs, tape drives, NAS, PCs, laptops, monitors, printers, smartphones. They also get the asset tracking number and the Redmine ticket number with the device's configuration information.

          Those that have a static IP-address also get a label with that.

      2. chivo243 Silver badge

        Re: Shutting down the wrong server

        And a drawing of all the servers running any VM's somewhere nearby. Front and back have server name, services and ip address, all cables to nics labeled at both ends...

    2. Dave K

      Re: Shutting down the wrong server

      Only potential issue with stickers on the front are with PowerEdge servers (and some others for that matter) where the whole front bezel clips on/off before you can access the drives, buttons etc. So do you unclip the bezel and stick the label onto the chassis of the server? This does mean you know you've got the right one, but have to remove the bezel from a server to see the name - annoying when you've got 50 identical servers! Or do you stick the name onto the bezel? Much more visible, but if someone removes two identical bezels from servers and accidentally clips them back on the wrong way around? Oooh, that can have fun consequences!

      1. A Non e-mouse Silver badge

        Re: Shutting down the wrong server

        With those types of servers, there's usually a little pull out tab with the server's serial number on it. (I've seen this on both Dell & HP) I've put the server name there.

      2. Alister

        Re: Shutting down the wrong server

        For all of our Power-Edges, we put a sticker on the front of the chassis, AND a sticker on the bezel, after having the very problem mentioned, where the bezels of two machines got swapped by accident.

        1. Chris G

          Re: Shutting down the wrong server

          I have always labeled and numbered everything, particularly as stuff is being taken down. At one job a guy asked me if I thought everyone was stupid, my reply was ' no, it's to prevent me or anyone else from being stupid'.

          1. Doctor Syntax Silver badge

            Re: Shutting down the wrong server

            Best reply would have been "Not everyone" said very meaningfully. After all, someone was.

          2. Anonymous Coward

            Re: Shutting down the wrong server

            "I have always labeled and numbered everything, particularly as stuff is being taken down. At one job a guy asked me if I thought everyone was stupid, my reply was ' no, it's to prevent me or anyone else from being stupid'."

            Sure, and I'm doing the same, but you and me are a huge minority. Most people just don't care because it's not gonna impact their job in the next few months. They know what is what, and really don't care about the next guy getting the job and falling in the trap.

            I think most of my job, is to deter such traps, nowadays ...

          3. donk1

            Re: Shutting down the wrong server

            Only had to do this once...my reply (shouted across machine room)..."Who was the last person to make X mistake"...turns to the complainer..."ooh, someone now in your team!"...nuff said!

      3. Trygve Henriksen

        Re: Shutting down the wrong server

        I put the label on top of the server/router/whatever.

        Just print the text double, with a long leader, then fold the tape between the names, and use the 'leader' part to fasten it to the top.

        If the server or whatever it is has a removable lid(as in 'can be removed while server is still racked'), I remove the lid and find a place inside to stick the leader, so that the label ends up in the gap.

        But I NEVER switch off a server in the rack by using the button if I can avoid it. That's what Shutdown commands or ILO/iDrac is for.

        1. Byron "Jito463"

          Re: Shutting down the wrong server

          "But I NEVER switch off a server in the rack by using the button if I can avoid it. That's what Shutdown commands or ILO/iDrac is for."

          In those days, NT (and 9x) would shut down the system, but not power it off. The power button still had to be pressed manually.

          1. H.Winter

            Re: "The power button still had to be pressed manually."

            "It's now safe to turn off your computer"

          2. Trygve Henriksen

            Re: Shutting down the wrong server

            Not with ILO.

            Even the first gen RIB (before RILOE I and II which came before ILO) did a proper power-off.

            1. Michael Wojcik Silver badge

              Re: Shutting down the wrong server

              Even the first gen RIB (before RILOE I and II which came before ILO) did a proper power-off.

              RIB and RILOE were proprietary Compaq (and later HP) technologies. iLO is proprietary HPE. Not all NT4 servers were made by Compaq.

              iDRAC is proprietary Dell, and I don't believe it was available in the days of NT4.

              So what's your point?

              1. Trygve Henriksen

                Re: Shutting down the wrong server

                My point is that SOME servers had the tech back then, and anyone managing a large server room or a remote site that didn't buy servers with this kind of tech really, really got what they deserved.

      4. Anonymous Custard
        Headmaster

        Re: Shutting down the wrong server

        Sticker both inside and out. You then have the added benefit of being able to check the bezel has been clipped back to the right server as well...

        If in doubt label it, and if it ain't labelled, don't touch it.

      5. SImon Hobson Bronze badge

        Re: Shutting down the wrong server

        Ah yes, the removable fronts. As said by others, sticker on both the cover and the front of the server.

        BUT, there's another problem I've found with the likes of PowerEdge - though it probably applies to a lot of modern machines now that sizes seem to keep shrinking ... A lot of them now don't have enough room to put a sticker on !

        At one time, you could rely on the CD/DVD tray, but many of those have gone. So you've a machine where the whole of the front is occupied by removable drive trays, buttons & displays/indicators, and ventilation holes that you really don't want to block when there's only 1U of front to start with. I've ended up with some having to use a smaller print, and cut the label down in width.

        Also, label the network connections ! Really helps when some numpty decides that it doesn't matter which cable goes where and connects systems to the wrong networks.

        Lastly, PowerEdge have that handy status LED - press the ID button and it flashes front and back. I also labelled the racks - you can buy ready made stickers that will identify the U positions so as a last resort you can check the position front and back (yes, number them front and back) before yanking the power.

        Oh yes, if you are on the helldesk and have to talk a customer through "fat fingering" a dead server ... Don't forget that to the average user on site, the UPS in the bottom of the rack looks very much like "the bottom server in the rack" that you told them to power off !

        1. Olivier2553

          Re: Shutting down the wrong server

          But PowerEdge, at least, have a digital display on the front, spend the 3 minutes it takes to set-up the server name and you have the name encoded in the server itself.

          And network cables? I have them labeled in bulk when we receive a new order (at a time it did matter, I also had a convention to label crossed cables). This is a real PITA, but I delegate that to some PFY :) And so I know I can always check the other end of a cable. When we had to do a major network rework after the 2011 flood, having all Eth cables marked at both ends before delivery was part of the requirements for the networking company (and they did deliver something like 6000 patch cords).

          1. SImon Hobson Bronze badge

            Re: Shutting down the wrong server

            I introduced a colour code for network cables - sadly I wasn't permitted to educate others on the significance by just removing any "wrong coloured" cable without warning or notice :-(

            But that doesn't help when there's a number of "RJ45" sockets on the back of a device (some of which might not be network ports) and someone doesn't make a note of what went where before unplugging them to remove a server from the rack. So as well as labelling the devices (servers and everything else), I labelled all the network ports - including those not in use.

            There was a case a few years ago now where someone unplugged some cables from a switch in an internet exchange. They then compounded the problem by realising they shouldn't have been unplugged and plugged them back in again - but not into the right ports. That took out a few service provider links in a manner that wasn't as obvious as if they'd just left the cables unplugged !

      6. Anonymous Coward

        Re: Shutting down the wrong server

        if someone removes two identical bezels from servers and accidentally clips them back on the wrong way around

        That is not confined to servers. Imagine a panel with several rows of big circuit breakers, all identified by a small tally plate on the removable (and other than the tally plate, identical) doors. I think you can see where this is headed.

        I may or may not have been on board a naval vessel under construction, I couldn't possibly comment, hence being an AC ;-) Someone is told to go and close a breaker that connects a machine to the AC power bus. They do that, and it drops out immediately. Try again, same thing. Sneaky b'stard then gets someone else to try - thus being able to put the blame on someone else when he realises what's happened.

        Power has been applied to the wrong machine, without its field being powered, and it's just vibrated the armature back and forth a bit until the breaker tripped. The excess current has burned pits into the sliprings - but luckily (if I recall the story correctly) not so deep that they couldn't re-face them in-situ. Had the pitting been much worse then they might have had to cut a hole in the side of the vessel to remove the machine for remedial work.

      7. Anonymous Coward

        Re: Shutting down the wrong server

        Yes you've just reminded me of an incident where the bezels had been swapped over at some point in the past on a two host VMware cluster. Then one day host 1 was put into maintenance mode moving everything onto host 2. But host 2 got powered off as it had host 1's bezel on it. Caused a bit of mayhem for a bit :-)

      8. jelabarre59

        Re: Shutting down the wrong server

        Only potential issue with stickers on the front are with PowerEdge servers (and some others for that matter) where the whole front bezel clips on/off before you can access the drives, buttons etc. So do you unclip the bezel and stick the label onto the chassis of the server?

        We usually ran such boxes without the bezels.

      9. Mark 85

        Re: Shutting down the wrong server

        Or do you stick the name onto the bezel?

        How about "both" the bezel and case as well as the rear and inside. Can't be too careful which is often realized after someone shuts down the wrong boxen. Been there, done that, learned from it.

      10. rcxb Silver badge

        Re: Shutting down the wrong server

        The rack ears remain uncovered by the bezel, so label them there.

        If you aren't in a dense, shared-hosting environment, you really don't need the bezels at all.

        Those massive swiss-cheese bezels are a thing of the ancient past. They seriously restricted airflow and were named and shamed by Google as to why they build their own servers.

        Dell has been nice enough to provide an LCD on their servers for a couple decades now, which you can use to show the hostname, or any custom string.

        Still, with the XD servers, there is NO SPACE on the front of the server that isn't a hard drive sled, so you're back to labels on the ears. They even had to put the VGA and USB ports on the ears, they have zero other space to work with.

      11. chivo243 Silver badge

        Re: Shutting down the wrong server

        Poweredge servers? Aren't they EOL by now? Dell is a four letter word in my shop, not my viewpoint, so, haven't used them in over 12 years...

      12. Olivier2553

        Re: Shutting down the wrong server

        As you need to remove the bezel to access the switch, there is no need to put the label on the bezel, put it where it belongs, just beside the fatal button.

      13. Criggie

        Re: Shutting down the wrong server

        Take the bezel off and store them in a box elsewhere.

        Label the bare case.

        Minimization of fuckup potential is always good.

    3. Anonymous Coward Silver badge
      Facepalm

      Re: Shutting down the wrong server

      That's great for pressing the button on the wrong hardware, but doesn't help if the KVM is on the wrong port.

      You can guess how I learned to double-check that one. Ejecting the CD drive is handy for confirming both aspects.

    4. askr

      Re: Shutting down the wrong server

      Stickers still might bite you.

      We had 2 racks - A and B, everything down to the UPS-es mirrored in both. Job was to halt one, move it to a different place in the server room, re-wire and connect it back up.

      Higher ups decided that we can get the cablework ready and move it under power (from UPS-es) instead of shutting everything down.

      So admins disconnected rack A from the cluster, put everything there in maintenance mode and we started pulling network cables... Chaos ensued - users lost access to DB's, everything went offline and so on.

      Why? Rack A in our server room was Rack B in the management suite.

      1. Pascal Monett Silver badge

        Re: Stickers still might bite you

        Sorry, that's not stickers' fault. That's management's fault for putting its finger where it shouldn't and deciding things without being in tune with reality.

        1. Anonymous Custard
          Headmaster

          Re: Stickers still might bite you

          Come on, if they didn't they wouldn't be management...

    5. Peter Prof Fox

      Naming machines

      Stickers are a must, but naming is more subtle. As we're talking hardware which is the actual thing we're labelling we should be giving it a name like, say, a ship. Not the purpose like 'Foo server'. My system is interesting boys' names for computers and girls' names for peripherals. So for example when somebody a hundred miles away says the "invoice printer isn't working" you start by asking them for the printer name. That's a unique identifier assigned and controlled by you. Then it turns out it's a replacement printer for some hardwary reason and you're half way home. When dealing with servers frequently, you and colleagues get to know the quirks of 'Brock' and 'Samson'. Also there's a mental step you have to take before 'fixing a server' as you have to translate 'Constantine' into what he actually does.

      1. Stuart Castle Silver badge

        Re: Naming machines

        We have an easy, if a little boring, solution to naming user computers. Every computer we buy has an asset tag. The name is a two-digit area code, followed by a minus, then a two-digit department code, followed by a minus and finally the asset tag. Doesn't help if the department in question occupies more than one building, and doesn't help determine (say) the office number, but that *should* be in the inventory system.

        Not sure about servers, as I don't currently deal with the hardware side of that (I do deal with several servers, but nearly all of them are virtual).
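The area-department-asset scheme described above is regular enough to generate and parse mechanically. A minimal sketch, assuming both codes are numeric and the asset tag is an opaque string; `make_name`, `parse_name` and the sample tag are made up for illustration and do not come from any real inventory system:

```python
import re

# Two digits, a hyphen, two digits, a hyphen, then the asset tag as-is.
NAME_RE = re.compile(r"(\d{2})-(\d{2})-(.+)")


def make_name(area, dept, asset_tag):
    """Build a machine name from area code, department code and asset tag."""
    if not (0 <= area <= 99 and 0 <= dept <= 99):
        raise ValueError("area and department codes must fit in two digits")
    if not asset_tag:
        raise ValueError("asset tag must not be empty")
    return f"{area:02d}-{dept:02d}-{asset_tag}"


def parse_name(name):
    """Split a machine name back into (area, dept, asset_tag)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an area-dept-asset name: {name!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


name = make_name(3, 17, "A00421")   # hypothetical codes and tag
print(name, parse_name(name))
```

As the comment says, the name alone will not tell you the building or office; that lookup still belongs in the inventory system, keyed on the asset tag.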

      2. A Non e-mouse Silver badge

        Re: Naming machines

        Naming conventions are a minefield. I prefer to avoid clever or funny names and stick to something bland: HQ-Dell-VMWare-1

        Otherwise, when you write up an incident report, you're not going to sound professional when you say "yoda died"

        1. Rich 11

          Re: Naming machines

          "yoda died"

          FFS! Give us a spoiler alert next time.

          *sob*

          1. AceRimmer1980
            Coat

            Re: Naming machines

            Mail servers are traditionally called: Pat.

        2. Anonymous Coward

          Re: Naming machines

          An old employer used the naming convention 3 (DC) - Name - 2 (Node - optional), and named database clusters after planets. We had two sites - STN and WAK, so we had STN-Jupiter(-01/2) and WAK-Neptune(-01/2).

          Unfortunately, management objected to the next name in the schedule for the new cluster we were building - WAK-URANUS

          1. A.P. Veening Silver badge

            Re: Naming machines

            What happened to Saturn?

        3. hmv

          Re: Naming machines

          The downside of 'boring' names is that it's too easy to stick a digit on the end. And 'hq-sun-1' is just one typo away from the far more important 'hq-sun-2'. Whereas you have to make a lot of typos to accidentally shut down 'slartibartfast' instead of 'erik'.
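The "one typo away" argument can be put in numbers with edit distance: the count of single-character insertions, deletions and substitutions needed to turn one name into another. A small sketch using the standard Levenshtein dynamic-programming approach; `risky_pairs` and its threshold are illustrative assumptions, not an established tool:

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def risky_pairs(names, threshold=1):
    """Return pairs of names that are within `threshold` edits of each other."""
    return [(x, y) for x, y in combinations(names, 2)
            if edit_distance(x, y) <= threshold]


hosts = ["hq-sun-1", "hq-sun-2", "slartibartfast", "erik"]
print(risky_pairs(hosts))
```

Run against a real host list, a check like this flags the names most likely to be confused at a prompt, whichever naming philosophy you prefer.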

    6. Anonymous Coward

      Re: Shutting down the wrong server

      We had a KVM that got very confused once, and linked the keyboard to all the servers at the same time.

      Our Junior typed in "halt", and all the machines that had the terminal open, at the root prompt, promptly shut down ...

      Not the Junior's fault, and it's never happened again, but definitely a weird one.

      1. Doctor Syntax Silver badge

        Re: Shutting down the wrong server

        "it's never happened again"

        Extreme percussive maintenance on the KVM to ensure that?

    7. Terry 6 Silver badge

      Re: Shutting down the wrong server

      I'm glad you said this. To this day I meet situations where a label on a box would save, at least time, and probably data loss.

      Even something as simple as boxes close to, but not actually on a desk. It's not the switching off that's always the biggest risk either. Hitting the power button on the machine on the desk to the left to turn it on then too late realising that it's the box on the floor, on the right that is the one needed ( that one's screen must have been off or something). And yes, I've done that. And no I didn't notice the tiny green light that showed the power was already on.

  2. Will Godfrey Silver badge
    Linux

    Insurance

    When about to shut stuff down, it's a good idea, just before actually pressing the button to go and make a cup of coffee - for two reasons.

    1/ The break gives your brain a chance to catch up with what you are actually doing.

    2/ If it goes titsup you'll need the caffeine boost!

  3. Mike007 Bronze badge

    "It is now safe to turn off your computer"

    Ahh, yes, the days before electronic devices knew how to turn themselves off...

    Which was followed by the brief era of only gods knowing how to hold a button.

  4. TonyJ

    Even to this day...

    In Windows I will drop to a command prompt and type:

    hostname

    And then manually shut it down after confirming it's the machine I expect it to be with something like

    shutdown /s /t 0 /f (substituting various /s /r etc as required)
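The hostname check above can also be scripted so the comparison is not left to tired eyes. A minimal sketch, assuming you know the name you expect to be on; `shutdown_command` is a hypothetical helper that only builds the argument list and never runs it:

```python
import socket


def shutdown_command(expected_host, actual_host=None, restart=False):
    """Return the Windows shutdown argument list, or raise if on the wrong box."""
    if actual_host is None:
        actual_host = socket.gethostname()
    # Compare short names only, ignoring case and any domain suffix.
    expected_short = expected_host.split(".")[0].lower()
    actual_short = actual_host.split(".")[0].lower()
    if expected_short != actual_short:
        raise RuntimeError(
            f"refusing: this is {actual_host!r}, not {expected_host!r}")
    mode = "/r" if restart else "/s"
    return ["shutdown", mode, "/t", "0", "/f"]


# actual_host is passed in here so the example gives the same result anywhere.
print(shutdown_command("FILESRV01", actual_host="filesrv01"))
```

Handing the returned list to something like `subprocess.run` would be the step that actually shuts the machine down, which is why the sketch stops short of it.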

    1. defiler

      Re: Even to this day...

      Yeah. Problem is that in this case the shutdown command was fed into the right computer, but it didn't switch itself off. That really wasn't unusual back in them days. Even if you had a motherboard and PSU capable of giving the power-off command when the OS was halted, Windows NT4 needed a registry entry manually added to actually go and do it.

      Makes me glad for the iLO lamp - you can confirm the machine before you do something stupid and physical.

      1. Nick Kew

        Re: Even to this day...

        Back in them days?

        End of the 1990s, my recollection is that while advanced power management (like suspend and hibernate) were often dodgy, regular shutdown/halt/reboot were not a problem. Is this yet another classic Windows-ism?

        1. Stuart Castle Silver badge

          Re: Even to this day...

          I am going on memory here, so could be very wrong. However, I don't think rebooting was ever a problem with NT 4 (on the contrary, there were hundreds of things you could do to reboot the machine), but power management was, which meant it was sometimes a little bit of a hack to get the machine to turn itself off after the shutdown was completed.

        2. Alister

          Re: Even to this day...

          My recollection is that server hardware hadn't quite caught up with desktop hardware when it came to APM, and I remember having to manually turn off both Windows 2000 boxes and RedHat boxes after issuing the shutdown command, on various flavours of HP and Dell servers.

        3. rcxb Silver badge

          Re: Even to this day...

          End of the 1990s, my recollection is that while advanced power management (like suspend and hibernate) were often dodgy, regular shutdown/halt/reboot were not a problem. Is this yet another classic Windows-ism?

          NT4 was released back in 96, then patched for years and years after. It did NOT include the standard APM power-off in the base or any of the updates. Windows 95 was just a hair ahead of it in that respect. There were 3rd party patches to give NT4 APM power-off capabilities, but that wasn't a common addition.

          You don't have to go back that far, either. I had a PC in the mid '00s that Linux decided couldn't be APM powered-off. A very minor nuisance... until the right confluence of other human errors conspires with it.

        4. BinkyTheMagicPaperclip Silver badge

          Re: Even to this day...

          No, it's a hardware-ism and isn't limited to Windows.

          Early ACPI was pretty dodgy, and MPS era hardware didn't tend to switch itself off.

          I don't think much before a pentium 3 reliably switched itself off. I've certainly a pentium 2 box at home that needs a manual power off when halting OpenBSD.

      2. TonyJ

        Re: Even to this day...

        "...Yeah. Problem is that in this case the shutdown command was fed into the right computer, but it didn't switch itself off...."

        Fair comment, well made! Between reading the article and responding to the comments, I forgot that part! Have an upvote

    2. Si 1

      Re: Even to this day...

      That's a good idea although I recently had a situation where users were receiving emails from a system I maintain and I could find no evidence that my system had sent it. After looking at the mail headers and finding the sending IP I discovered there was a duplicate of the live VM running right down to the same hostname! It was merrily pulling in data and sending out order updates all on an old copy of its database.

      I don't know who spun that server up or why, but you should always beware that some dodgy sysadmin hasn't cloned your test server from the live one and you're actually on the wrong server! ;)

      1. Alister

        Re: Even to this day...

        Yep, I've suffered from the same ghost machine problems, only not with a VM, we had a monitoring server which was retired in favour of a newer model with upgraded software, but was just turned off and left in the rack.

        We suffered a power glitch on the rack which caused the server to restart, although we didn't know it at the time, and we were getting false alert emails from we didn't know where, until we figured it out.

        1. It's just me

          Re: Even to this day...

          RE: ghost systems

          A few months back we had an AWS instance that experienced some hardware failure, was given the commands to shut down, and then restarted so aws brought it up on different hardware. But the old zombie instance kept running for a couple weeks with us having no way to access it, but it continued to send notices and warnings that took a while to track down as there was no trace of them on the supposed source machine.

    3. simonlb Silver badge

      Re: Even to this day...

      And it's even worse when you're connected remotely to an appliance and meant to type in the reboot command rather than shutdown...

      1. A Non e-mouse Silver badge
        Facepalm

        Re: Even to this day...

        And it's a weekend, no-one's in the remote office and that remote office is a five hour drive away.

        1. Nick Kew

          Re: Even to this day...

          Luxury! Your office was on the same continent.

          A stuff-of-nightmares scenario I've somehow managed to avoid (though I've been in worse places, like when the server disappears and it turns out the company hosting it went bust without telling us).

        2. Anonymous Custard
          Trollface

          Re: Even to this day...

          And it's a weekend, no-one's in the remote office and that remote office is a five hour drive away.

          5 hours? Try 5 timezones...

          1. el_oscuro
            Mushroom

            Re: Even to this day...

            5 timezones just happens to be the difference between Canada and Scotland. I was remotely rebooting a database server in Hallifax which I thought was in Scotland. It was actually in Canada and I rebooted it right in the middle of the day.

            1. Doctor Syntax Silver badge

              Re: Even to this day...

              I never heard of Hallifax and the one true Halifax isn't in Scotland.

              1. Anonymous Custard
                Trollface

                Re: Even to this day...

                Presumably meaning the one in Nova Scotia.

                So it's an understandable mistake, confusing the versioning of Scotlands...

                1. Alister

                  Re: Even to this day...

                  Presumably meaning the one in Nova Scotia.

                  or West Yorkshire?

              2. Nick Kew

                Re: Even to this day...

                It's a bank that went bust in 2008.

                HBOS - Halifax Bank of Scotland.

      2. SImon Hobson Bronze badge

        Re: Even to this day...

        Or you're remotely doing some networking, forget which router you are connected to, and change the address on the wrong bit of the network 8-O Luckily managed to avoid that myself, but had to avoid smirking as I watched the consultant setting up some new routers do exactly that. As it happens, he was able to change the IP address at the local end of the link and reset it, and it was only 10 minutes drive away anyway - but still good to see others make the mistake so you can file it in your "things to avoid" list at the back of the mind. Learn from the mistakes of others, you won't live long enough to make them all yourself !

        1. Kiwi
          Pint

          Re: Even to this day...

          Luckily managed to avoid that myself, but had to avoid smirking as I watched the consultant setting up some new routers do exactly that.

          Be careful! Too many times I've boasted about not making a mistake, or teased someone about one they made, only to do so myself within a few days.

          Sods law 'n all that... :)

          1. The Quiet One

            Re: Even to this day...

            Getting harder and harder to totally lock yourself out of stuff these days.

            Our new SDWAN kit (Cisco Viptela) has a feature that after you apply a config change, if it does not get a confirmation back from you in a set time (say 3 minutes), it reverts back the change as it assumes you screwed up and lost access.

            1. SImon Hobson Bronze badge

              Re: Even to this day...

              Getting harder and harder to totally lock yourself out of stuff these days

              But when you are working on the oldest (ie, cheapest) ${DerogatoryTerm} your manglement and customers' manglement between them will pay for, then that's not much consolation.

              if it does not get a confirmation back from you in a set time (say 3 minutes), it reverts back the change as it assumes you screwed up and lost access

              You can do that on Cisco IOS with (IIRC) a "reload in nnn" command - if you do your changes and can still log in* then abort the reload, otherwise it'll reload and revert the config. Shorewall on GNU/Linux systems has a safe-restart command to achieve much the same thing.

              * With in-band management, I make a point of opening a new connection - just because you can still type in the same session window doesn't guarantee that you can open a new session, particularly when working with session aware firewalls. Guess how I learned about that :-(
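The confirm-or-revert behaviour described in this sub-thread comes down to three steps: keep a copy of the old config, start a timer when the change is applied, and roll back unless a confirmation arrives in time. A rough sketch of that pattern only; the class and method names are invented, time is passed in explicitly so it can be stepped by hand, and none of this claims to be how any particular vendor implements it:

```python
class ConfirmedConfig:
    """Holds a config dict and reverts unconfirmed changes after a timeout."""

    def __init__(self, config, timeout=180.0):
        self.active = dict(config)
        self.timeout = timeout
        self._previous = None   # snapshot taken before the pending change
        self._deadline = None   # time by which confirm() must be called

    def apply(self, changes, now):
        """Apply changes and start (or restart) the confirmation window."""
        if self._deadline is None:
            self._previous = dict(self.active)
        self.active.update(changes)
        self._deadline = now + self.timeout

    def tick(self, now):
        """Call periodically. Reverts an expired change; True if it reverted."""
        if self._deadline is not None and now >= self._deadline:
            self.active = self._previous
            self._previous = None
            self._deadline = None
            return True
        return False

    def confirm(self, now):
        """Keep the pending change. False if nothing is pending or it is too late."""
        if self.tick(now) or self._deadline is None:
            return False
        self._previous = None
        self._deadline = None
        return True


router = ConfirmedConfig({"mgmt_ip": "10.0.0.1"})
router.apply({"mgmt_ip": "10.0.9.9"}, now=0)   # the change that locks you out
router.tick(now=200)                            # no confirm within 180 seconds
print(router.active)                            # back on the old address
```

In line with the footnote above, the confirmation is only meaningful if it is sent over a fresh connection; a session that was already open proves little about whether a new one can be established.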

        2. Down not across

          Re: Even to this day...

          Learn from the mistakes of others, you won't live long enough to make them all yourself !

          I've come across some people who were challenging that.

          1. Kiwi
            Pint

            Re: Even to this day...

            Learn from the mistakes of others, you won't live long enough to make them all yourself !

            I've come across some people who were challenging that.

            Sorry, I didn't know we'd met???

      3. defiler

        Re: Even to this day...

        I'll put my hand up and confess that I've locked myself out of a Cisco ASA and had to call the datacentre night staff one continent and five time zones over to pull the plug on it and start it again.

        Luckily it was the middle of the night. When they did that it only interrupted the offsite copies for the backups.

      4. Anonymous South African Coward Bronze badge
        Trollface

        Re: Even to this day...

        And it's even worse when you're connected remotely to an appliance and meant to type in the reboot command rather than shutdown...

        That happened to me, once. Was a fun time. <grins>

        I now put a wallpaper on the desktop telling me which PC it is (we also use virtual machines) and that helps a lot to ID the right server.

        1. Anonymous Coward
          Anonymous Coward

          Re: Even to this day...

          I now put a wallpaper on the desktop telling me which PC it is (we also use virtual machines) and that helps a lot to ID the right server.

          Yes, BGInfo desktop wallpaper has saved my arse more than once when logged on to a production machine when I thought I was logged on to its duplicate in pre-production, and planning to do a restart...

        2. Donn Bly

          Re: Even to this day...

          Agreed - and production machines get a red background embossed with the machine name while development and staging machines have different colors.

        3. John Brown (no body) Silver badge

          Re: Even to this day...

          "I now put a wallpaper on the desktop telling me which PC it is (we also use virtual machines) and that helps a lot to ID the right server."

          Or/and make sure the command prompt contains the hostname since not all servers have a GUI running.

    4. Anonymous Coward
      Anonymous Coward

      Re: Even to this day...

      May I suggest shutdown /s /t 30 /f is used? That way you can type shutdown -a to abort it if you realise something bad after you hit enter ! Also, don't type shutdown -h to get the help up as I found out once :-)

  5. big_D Silver badge

    DEC Engineer

    I've given this story here before...

    But we had a DEC engineer turn out to upgrade a VAX 11/780, one of about half a dozen in the computer room on that floor (there were two floors full of VAX hardware).

    He turned up, all the jobs and users were shunted across to the next machine in the line, the ops shut it down and the power off message appeared.

    The DEC engineer went behind the wall of hardware that was the VAX and threw the power switch on the wall... It became quieter.

    For a moment, as he re-appeared, the ops stared at him, stared at the console saying power off, stared at the engineer. Then the screaming started. From the next VAX in the line. Yes, he had thrown the wrong breaker switch and the VAX with the extra load and users had gone bye-byes.

    1. Korev Silver badge
      Coat

      Re: DEC Engineer

      Were they screaming because they couldn't clean the office properly?

      1. big_D Silver badge
        Coffee/keyboard

        Re: DEC Engineer

        Not enough coffee, it went whoosh for a couple of seconds, then I wished I still had my old orange and black VAX to clean my keyboard.

        1. Anonymous Custard
          Joke

          Re: DEC Engineer

          That just sucks...

  6. Fading
    Paris Hilton

    Why can't they....

    Come in a selection of colours so at least there is a visual hint between the back and the front of each one? Might jolly up the server room a bit as well....

    1. Doctor Syntax Silver badge

      Re: Why can't they....

      As someone said in a comment somewhere above - label both back and front.

      1. A K Stiles

        Re: Why can't they....

        and if you don't have labels, you could run a strip of colourful 'leccy tape from front to back across the top of the box, which would at least help identify the same box from both sides.

        Add a permanent marker and suddenly you do have labels after all!

        1. Olivier2553

          Re: Why can't they....

          Or a couple of spray paint cans of different colours :)

          1. A K Stiles

            Re: Why can't they....

            Mmmmm, Volatiles - delicious!

      2. Ken Shabby
        Angel

        Re: Why can't they....

        underwear from C&A comes pre labelled.

  7. ISYS

    Oh god the memory!

    I remember the day a third-party company were engaged to audit the datacentre where our servers were. They were mainly HP DL variants and the serial number was on a sticker on the side of the chassis. So these guys were undoing the knurled locking nuts, sliding the server carefully out of the rack until they could read the sticker and log it and then sliding it back again.

    All was going well until they found a rack with lots of 'servers' in it that had no sticker on the side. That rack was an HP EVA SAN and they were sliding out the disk trays. The EVA did not respond well.

  8. Somone Unimportant

    notes, y2k and nt 4

    Potent mix that saw me get three hours at triple time on Saturday January 1st 2000.

    Donated it to charity but got the tax deduction.

    Them were the days...

  9. Digiwake

    IBM Professionals

    We had a similarly nasty experience with our old IBM DS4300 FC SAN once. It had a controller issue and IBM were called to deal with it. All LUNs were confirmed as working on the good controller, the engineer disappeared round to the back of the cabinet and carefully removed the perfectly functional controller. Recovery took a while but I was impressed by how unconcerned all the ESX servers were.

    1. Captain Scarlet Silver badge

      Re: IBM Professionals

      What, how? Literally everything had LEDs on, and when something bad happened there was always an amber LED. I seem to remember the IBM software itself had the option of flashing an LED to locate the parts as well (I always did that to replace the flash cache battery modules)!

  10. This post has been deleted by its author

    1. rcxb Silver badge

      Re: Thank God for modern journaling file systems.

      ZFS isn't journalled so much as it is copy-on-write. And while you can guarantee the file-system is in a crash-consistent state, it doesn't mean your application database can so easily recover from a sudden unplug.

      1. This post has been deleted by its author

  11. Anonymous Coward
    Anonymous Coward

    All hash prompts look the same

    A fellow I knew had a habit of shutting down Unix workstations by issuing 'kill -9 -1' from a root prompt.

    Except one day the terminal he typed it into was a remote session into a server. Bugger.

    1. Anonymous South African Coward Bronze badge

      Re: All hash prompts look the same

      A fellow I knew had a habit of shutting down Unix workstations by issuing 'kill -9 -1' from a root prompt.

      What does it actually do? Something destructive? It looks so innocent sitting there...

      1. GrumpenKraut
        Mushroom

        Re: All hash prompts look the same

        From the man page: kill -9 -1 is "Kill all processes you can kill."

        This does NOT look like a sane way to shut down a system in a clean way: -9 is SIGKILL and that only should be used as very last resort. Send SIGTERM, then wait... long... enough..., only then kill stubborn processes with SIGKILL.
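The "SIGTERM first, wait, SIGKILL only as a last resort" sequence can be sketched for a single child process using Python's standard `subprocess` module. The function name `stop_gracefully` and the 10 second default are assumptions for illustration. How long "long enough" is depends entirely on what the process has to flush.

```python
import subprocess


def stop_gracefully(proc, grace_s=10.0):
    """Ask proc to exit with SIGTERM; use SIGKILL only if it ignores us.

    proc is a subprocess.Popen object. Returns the process's return code.
    """
    if proc.poll() is not None:
        return proc.returncode       # already exited, nothing to do
    proc.terminate()                 # SIGTERM on POSIX: a polite request
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()                  # SIGKILL on POSIX: cannot be caught
        return proc.wait()
```

On POSIX systems a negative return code from `Popen` means the child was ended by that signal number, so a well-behaved process without its own handler comes back as -15 and a stubborn one as -9.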

      2. Flocke Kroes Silver badge

        Re: All hash prompts look the same

        The -9 is a synonym for KILL. It means the kernel will stop the target process(es) so they never get any more CPU time then deallocate all the process's resources like memory and file descriptors. The meaning of -1 depends on who is asking. If root gives this command it means "all processes except for some special system processes". Anything that is not a "special system process" does not get to put its open files into a consistent state. Expect trouble from any database at the least and probably a bunch of other things too.

        If that is not terrifying enough, you then have to guess how special a process has to be to survive. How about the processes that maintain journaled file systems, caches, network and disk access?

    2. Peter Gathercole Silver badge

      Re: All hash prompts look the same

      Was obviously a real old UNIX admin.

      The official way of shutting down (to single user) a UNIX edition 6 or edition 7 system on PDP11 was slightly different, but not a lot.

      From a root session it was "kill -1 1" which sent HUP to the init process that would switch the run level to single-user mode. In single user mode, there was just the shell on the console running (and just maybe init itself).

      From there, you would issue several sync commands to flush any unwritten disk buffers, and then power down the system. It was all documented as the way to do it!

    3. Martin
      FAIL

      Re: All hash prompts look the same

      And, as always, the question has to be asked - why did he even have a remote root prompt into a server, rather than using sudo? Especially if it was a production server?

      Even at home on my own linux box, I never never use a root prompt, unless I can find no other way of doing it. Just too easy to make a mistake.

      1. cmaurand

        Re: All hash prompts look the same

        "And, as always, the question has to be asked - why did he even have a remote root prompt into a server, rather than using sudo?"

        Redhat seems to like it that way. I've noticed it with all Redhat derived distros. sudo is not even installed by default.

  12. Anonymous Coward
    Anonymous Coward

    Not quite the same . . but nearly.

    Many years ago a junior that I was ('mentoring' is overblown) 'looking after' was involved in a project on another site. He had installed a W2K server (if I remember correctly) and UPS but the server wasn't responding to power events.

    So we traipsed up there together; I checked the software installation, checked the cable, checked the COM port numbers, etc. It all looked okay so I told him to turn the mains power off and see if any events were generated.

    He disappeared under the desk and a few seconds later the server, monitor and tape backup all went off!

    For some reason, instead of simply flipping the switch on the wall he had chosen to pull the mains lead. From the UPS. That went to the server.

    I, somehow, managed to avoid screaming, "What the @#@* are you doing?!!", and instead turned to the customer, watching proceedings with interest, and calmly assured him that with modern file systems, and no-one currently using the server everything would probably be alright.

    It was and I had a word with my ward off-site on the way home.

  13. Captain Scarlet Silver badge

    ilo is brilliant except

    I managed to shutdown a server at another site after mis-reading an a as an o. It was either an ip kvm or ilo with a long list of hundreds of servers, only used as RDP wouldn't allow me to logon.

    Annoying thing was I read the name like 5 times and I could feel something was wrong, but only when the device went red on the screen did I realise I kept reading a as o. I really wish site letters were at the front or back of any naming convention as apparently my brain doesn't read everything in the middle.

    Anyway that itself was a Notes 5 server for a customer, but as the transaction logging was always pretty good (Ok yes it was slow) in Notes no problems.

  14. jelabarre59

    jumpbox

    Then there's our lab setup where you had to go through a "jumpbox" machine to get to the management/console network in the server lab. Log into the JB, then log into the machine you're working on. Problem was, sometimes the connection to the machine you were working with would drop (might have been an intentional timeout, it's been a number of years now). But the prompt *wouldn't* change when the system dropped, it would sit there showing the name of the machine you were (no longer) connected to (hitting enter or some other key would make it drop, but you had to actually do that first).

    So one day I'm working on a system, and then go to get a coffee or hit the head. Either way it gave it enough time for the connection to get dropped. I get back to my desk, and issue a reboot command, not paying attention that whatever key I hit to clear the screensaver also dropped me back to the jumpbox. And of course the required command logging pointed it right back to me.
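A stale prompt like that can be guarded against by asking the machine itself who it is before doing anything destructive, rather than trusting what the terminal shows. A minimal Python sketch of that check follows. `on_expected_host`, `guarded` and the host names in the usage note are hypothetical, and this is not how the lab in the story was actually set up.

```python
import socket


def on_expected_host(expected, actual=None):
    """True only if this machine's short hostname matches `expected`.

    `actual` defaults to socket.gethostname(); pass it explicitly for testing.
    """
    if actual is None:
        actual = socket.gethostname()
    # Compare short names case-insensitively, ignoring any domain suffix.
    return actual.split(".")[0].lower() == expected.split(".")[0].lower()


def guarded(expected, action, actual=None):
    """Run action() only when we are on the expected host; otherwise raise."""
    if actual is None:
        actual = socket.gethostname()
    if not on_expected_host(expected, actual):
        raise RuntimeError(
            f"refusing to run: expected {expected!r} but this is {actual!r}"
        )
    return action()
```

For example, `guarded("labbox07", do_reboot)` would raise instead of rebooting if the session had silently dropped back to the jumpbox, where `do_reboot` stands for whatever callable issues the reboot.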

  15. Anonymous Coward
    Anonymous Coward

    Ahhh....Y2K££££

    £2.4K - What I earned for 2 shifts over the Millennium weekend for a software reseller.

    The only calls we got were friend, family and co-workers asking if we had had any calls.

    Then came the tax deductions..

  16. Jim Willsher

    Closest I've come is being on an RDP session all day, then finishing my work in a hurry and clicking Shutdown. GPO had been configured to not display any warnings so I clicked Shutdown then left for the day.

    Completely unaware that I had shut down a remote server and left my own PC chugging away merrily.

  17. Sequin

    I worked for a large government IT department who at the time did mainly mainframe systems - payrolls etc, using ICL kit. I was picked to be part of a new team that would be the first to put in systems based on IBM PC and compatibles, and we also started using Novell Netware servers. The tech support section refused to have anything to do with anything that weighed less than a ton, so we were left to specify our own kit and look after it ourselves.

    As part of our setup we got a UPS system which we plugged the server into, and did a full test about every 3 months, pulling the plug to the UPS and having it shut down the server gracefully. This worked fine.

    After about 2 years of this we were moved to a new building, and we had a computer room in one corner and we were told to put our server in there. By this time the tech support team had finally started to support PC based systems for other teams, but we still looked after our own development kit. A couple of weeks after the move, we did our regular UPS test and our server shut down gracefully. A couple of seconds later there were screams from the other teams as they lost all of their network shares and print queues. It turns out that the tech support team had noticed several spare outlets on the UPS and had the bright idea of plugging the other servers into it, without them having the software installed or being connected by serial port to the box, so while our server closed down gracefully, theirs all crashed and burned! I suppose we were lucky that there was sufficient battery capacity available to allow ours to shut down in time.

  18. Bruce Ordway

    Wrong appserver

    On a much smaller scale but I did shut down the wrong database application server once. There was a treeview where you'd select an instance to control with start/stop icons located on the main menu bar. I didn't notice the focus had shifted from test to live before I clicked that stop icon. Almost immediately, I heard a wave of groans rise and pass thru the entire building. If I remember nobody really asked what had gone wrong and after about fifteen minutes I was actually a hero for getting things back up and running.

    That old application server is still running today, the only difference now is that I double (or even triple) check status before I click on anything.

    1. GrumpenKraut
      Angel

      Re: Wrong appserver

      > heard a wave of groans rise and pass thru the entire building

      Not "entire building" in my case, just a big room full of computers. Sweet memories...

  19. Gerhard den Hollander

    Hold the power button ....

    I learned relatively quickly that certain brands of server hardware would not power off on the push of the button but on the release.

    Back in the NT4 days I remember having to get a colleague (who was within shouting distance) to log in on the console and do a graceful shutdown while I kept the power button depressed.

    To be fair I've returned the favor a few times as well.

    When someone asks "can you power down the server", always ask which of the servers we've just been discussing he meant ....

  20. Yes Me Silver badge
    Facepalm

    Sticky labels help unless...

    This isn't about labels on servers, but labels on network sockets. Yes, they're essential. Unfortunately nobody told the painters, who in order to do a neat professional job removed all the tiny bezels on all the sockets before repainting the walls. Which also removed the labels. And they left tidy heaps of labelled bezels on the windowsills. So, back to trial and error to discover which sockets connect to which VLAN.

    1. Olivier2553

      Re: Sticky labels help unless...

      That is why the cabling company should issue a floor plan with the location of the network sockets and their corresponding labels... That should greatly reduce the guess work.

    2. Criggie

      Re: Sticky labels help unless...

      Not sure what your wall plates look like, but most of them have a clip-on cover and the main part which is screwed to the wall.

      I've always put the pretty sticker label on the clip-on cover, but also written the port number directly on the plate underneath. Sometimes there's a label on the wire in the wall, but that's less-visible.

  21. hairy harrold

    Blue led

    Absolutely, until they introduced identifier LEDs on servers a year or two later.

  22. MtK

    init 0

    I remember a friend coming over to our team and asking "if I do an init 0, is there any way to get control back?". Unfortunately he had no console access and the Sun server was in another country ...

  23. Evil_Tom

    A production server? I raise you a datacentre!

    One of a pair of UPS units had failed and was constantly on internal bypass. To do the work to repair, Health & Safety demanded (reasonably) that the faulty unit be hard bypassed at the wall. This meant we had to have the appointed person electrical engineer on site to flip the switch.

    He did and the entire data centre went silent. "Was that meant to happen?" came the cry from the electrical engineer as all the disks span down and bleeps started happening all over. I just looked at him, bewildered. Without another word he just powered it back on about 20 seconds after power off, and I died a little inside.

    NetApp arrays, VMware hosts and everything booted up. We lost some disks, lost volumes, and the config of one fibre switch reset to an older config (as someone hadn't saved it). The thing was that we lost the only two read-write domain controllers, as they were both stored on the same volume that corrupted. All other DCs on other sites were read-only. To top all of this off, this was an extra secure system which was separated from our main network, and the company had skimped on the DR options and only given us 2 days' snapshots and no tape backups. We also couldn't invoke a full DR quickly as the disks on the main site were Fibre Channel, while the DR site was iSCSI.

    We had to get the RW DCs up again (which we did from old snap-mirrors of the disks - not system state backups), do a system state backup of them in their crap, old state we'd got running, and then do an authoritative restore into themselves. There were so many conflicts.

    It turned out the labels on the electrical switches were wrong (or the labels on the UPS). Either way it was 5 days' work to get it all up and running.

    Things I learned:

    Never store CRITICAL passwords for a system within the system (especially if it's super secure/isolated). Have "break-glass" accounts.

    Always take a system state backup of at least one DC. Consider storing it offline.

    Have more than two read-write DCs, Never store them on the same volumes. Consider one offsite.

    Save your switch configs religiously.

    Authoritative restores of domains aren't too scary. You'll have to make some sacrifices to some account and object changes but that's worth it!

    Don't be afraid to tell your non-technical boss to piss off if he's breathing down your neck looking for answers every 3 minutes. Give him a time you'll update and make sure you update him.

    1. A.P. Veening Silver badge

      Re: A production server? I raise you a datacentre!

      Don't be afraid to tell your non-technical boss to piss off if he's breathing down your neck looking for answers every 3 minutes. Give him a time you'll update and make sure you update him.

      Better to give him something useful (and harmless) to do like getting coffee.

      1. Anonymous South African Coward Bronze badge

        Re: A production server? I raise you a datacentre!

        Or ask him to get you some nice hot pizza.

      2. The Real Tony Smith

        Re: A production server? I raise you a datacentre!

        Don't be afraid to tell your non-technical boss to piss off if he's breathing down your neck looking for answers every 3 minutes. Give him a time you'll update and make sure you update him.

        Better to give him something useful (and harmless) to do like getting coffee.

        And a good boss will go and make it

        1. CalmHandOnTheTiller

          Re: A production server? I raise you a datacentre!

          Obligatory Dilbert - Best use of a Boss in a Crisis

          https://dilbert.com/strip/2011-10-02

          I actually did this to mine once when the SAN was down ... I asked him to stop anyone else from coming in to IT and bothering me, as the phone and email systems were also down so shoulder surfing visitors to the IT dungeon were all slowing me down.

    2. Criggie

      Re: A production server? I raise you a datacentre!

      Switch/router backups are frequently overlooked. I like Oxidized, but RANCID does the same sort of config management.

      Set it up once, add/test new devices, and then forget about them until there's a crisis.

      (wanders off to confirm mine's running as expected)
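For kit that neither tool covers, the core of a config backup is small: keep a new copy only when the config has actually changed. Below is a rough Python sketch of just that part. It is not how Oxidized or RANCID work internally; `snapshot_config` and the directory layout are invented for illustration, and fetching the config text from the device is left to whatever method you already use.

```python
import hashlib
from pathlib import Path


def snapshot_config(device, config_text, backup_dir):
    """Store config_text under backup_dir/device/ if it differs from the last copy.

    Snapshots are numbered 000001.cfg, 000002.cfg, ... in the order they were
    taken. Returns the new file's Path, or None if nothing changed.
    """
    device_dir = Path(backup_dir) / device
    device_dir.mkdir(parents=True, exist_ok=True)
    snapshots = sorted(device_dir.glob("*.cfg"))
    data = config_text.encode("utf-8")
    if snapshots:
        # Compare digests of the newest stored copy and the fresh config.
        last = hashlib.sha256(snapshots[-1].read_bytes()).hexdigest()
        if last == hashlib.sha256(data).hexdigest():
            return None
    target = device_dir / f"{len(snapshots) + 1:06d}.cfg"
    target.write_bytes(data)
    return target
```

Run from a scheduler, this leaves one numbered file per real change, which is also an easy thing to check when confirming that the backups are still running.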

  24. This post has been deleted by its author
