Sysadmin wiped two servers, left the country to escape the shame

Grab a very small cake and a bunch of candles, dear readers, for today we mark the 10th edition of “Who, me?”, The Register’s confessional for IT pros who broke things badly. This week, meet “Graham”, who “ended up as an authority on a fledgling new product called SFT III from Novell. SFT stood for ‘system fault tolerance’.” …

  1. Anonymous Coward
    Anonymous Coward

    “I have subsequently moved to Australia to try and put this behind me,” Graham concluded.

    A bit too far for a mere cockup and definitely not far enough if those servers belonged to people you really need to run from.

    I had to maintain a few of those during the hungry 90-es on the other side of the wall and I prefer not to remember the details of that. Healthier that way. Less likely to find my remains 10m up in the air starting the car.

    1. Mayday
      Pirate

      Sentenced to transportation then?

      “I have subsequently moved to Australia to try and put this behind me,” Graham concluded*

      *from an Aussie too.

  2. TonyJ

    Ohhh SFT III...I set that up for a company that used to make bricks for kilns.

    It was a real pig to get running - the hardware had to be absolutely identical, even down to the disk firmware and, if I recall correctly, geometry. Which was a problem when Compaq sent the same part number disks but from a different manufacturer.

    Anyway I still remember being asked to nip a spare fibre cable down, as they weren't far from where I lived.

    It was quite a thick, glass fibre. When I went to the stores to get it, the store man merrily bent it over itself to fit in a jiffy bag. I could literally hear it snapping each bend.

    "Oh well...we need another one now" :)

    1. Anonymous Coward
      Anonymous Coward

      > Ohhh SFT III...I set that up for a company that used to make bricks for kilns.

      Making bricks - how appropriate for SFTIII. Having inherited an SFTIII installation, I found that as long as you didn't want to change anything ever, it was fine, but in a dynamic environment, it was a ball and chain on change.

      Who remembers those "demonstrations" at the Networks show in Birmingham in the 90s, where Novell dropped anvils on one of a pair of running servers to show, er, gravity or something. That kind of 90s loadsamoney waste was one of the reasons I was turned off Netware and started heading towards this new NT thing, which turned out to be rather good, and cheaper too. Imagine that - MS cheaper, no doubt as Netware had to pay for demo anvils.

      1. TonyJ

        "...Who remembers those "demonstrations" at the Networks show in Birmingham in the 90s, where Novell dropped anvils on one of a pair of running servers to show, er, gravity or something. ..."

        Yoikes! I remember! Seemed pretty wasteful. I'd have loved a server to own back then. My lab was a couple of ageing desktops.

        1. DJSpuddyLizard

          ..Who remembers those "demonstrations" at the Networks show in Birmingham in the 90s, where Novell dropped anvils on one of a pair of running servers to show, er, gravity or something. ...

          I remember seeing the Compaq Destructive Testing Lab when I was a contractor back in Houston in the 90s. It seemed they would attempt to crush servers, then act surprised when the poor machine actually crushed. I really don't know how that was supposed to help the average buyer ... "Yes, but our servers can stand up to FOURTEEN thousand pounds of force!" sorry, mate, if there's 14,000 pounds force extra in your server room, there's a problem that most likely needs immediate attention.

      2. Water Cooler

        Mirror, mirror, off the wall...

        Ah yes, I remember! We ran a semi-critical development environment on one of these setups, way back. We never had the old 'anvil falling from the sky' incident on our site, but we did have some sort of nasty OS software corruption on one of the boxes that caused it to go TITSUP. Fortunately, we had the second box, which was instantly mirrored to duplicate the... no, wait, stop!!!

        The whole system was offline for about a week.

      3. Anonymous Coward
        Anonymous Coward

        Yes, back then mirrored servers was the sort of thing done by DEC at very high cost, so the SFT idea was a good try at a lower cost but did have that unfortunate requirement to be a very homogeneous environment. Then again, in my experience so did the higher end stuff as well, but the variability was lower in that high cost environment.

        I tried the SFT way once but decided it wasn't necessary in a file server environment as long as one kept sufficiently frequent backups. The file focused nature meant that large amounts of data weren't at risk. Novell servers weren't that often used for database hosting.

        Funnily enough we looked at Windows NT as an alternative F&P solution and concluded that it was absolute garbage compared to a Novell installation well through into the 2000s. Even had an MS "evangelist" tout its security advantages until one of my peers pointed out that the "high" security rating only applied when the NT server was running in an air-gapped environment and wasn't connected to any other system. If there were connections the rating changed to very poor.

        Biggest problem sticking with Novell through the late 1990s was the sabotage campaign that MS conducted as it used undocumented procedure calls and hidden features to disable Novell client connections on PCs. This wasn't revealed as an active campaign until much later, and at the time people tended to blame Novell, which was rather MS's intent and did help (in my environment) promote the desire to change over to Windows server despite the inadequacies and heavy footprint of AD at the time.

        1. GrumpyKiwi

          You didn't always need Microsoft's assistance to break the Netware client. Back in the early noughties I was working for the NZ branch of a very large Japanese consumer electronics manufacturer, who would regularly send out updates to their reporting software that would break the Netware client on Windows, requiring a reinstall of the Netware client. On every single computer. I can only guess that it was pushing Japanese language DLLs or something similar.

        2. Stoneshop

          Then again, in my experience so did the higher end stuff as well, but the variability was lower in that high cost environment.

          VMS clusters can be quite unhomogeneous, VAX8800 with Alpha DS10 and MicroVAXes, as long as they're running the same VMS version. Different CPU architectures require that the system disks be separate (you can have a common system for all the machines of the same arch), but data disks can be mirrored right across them. For disk mirroring you do want roughly identical sized disks, although they don't need to be exactly identical. Just that you want to start building the mirror set off the smallest.

    2. J.G.Harston Silver badge

      On a site I needed a 25m length of earth bonding cable. Went to the suppliers who measured it out, then promptly folded it into 25 1-m lengths of earth bonding cable.

      1. ssharwood

        Did you buy another or splice?

  3. John Robson Silver badge

    Don’t need human error for that

    Place I had a summer job had a raid controller do this all on its own - failed drive was pulled, replaced... controller then mirrored the drive - the new drive that is, over the remaining good drive... oops
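The failure described above comes down to the rebuild picking the wrong copy direction. As a rough illustration only, a guard for a two-way mirror might look like the sketch below. Everything in it is hypothetical: the `Member` record, the `is_replacement` flag and the `generation` write counter are assumptions made for the example, not how any particular RAID controller or NAS actually tracks its members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    is_replacement: bool  # True for the drive that was just swapped in
    generation: int       # hypothetical write counter kept in on-disk metadata


def plan_resync(a: Member, b: Member) -> tuple[Member, Member]:
    """Return (source, target) for a two-way mirror rebuild, or raise."""
    # Only a surviving member may ever be used as the copy source.
    survivors = [m for m in (a, b) if not m.is_replacement]
    if len(survivors) != 1:
        raise ValueError("need exactly one surviving member to copy from")
    source = survivors[0]
    target = b if source is a else a
    # Second line of defence: never overwrite a copy that looks newer.
    if target.generation > source.generation:
        raise ValueError("target looks newer than source; refusing to overwrite")
    return source, target
```

The point of the sketch is that the direction is decided from explicit metadata and the function refuses to proceed when that metadata is ambiguous, rather than guessing from slot order.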

    1. Anonymous Coward
      Anonymous Coward

      Re: Don’t need human error for that

      Had that happen on my Netgear ReadyNAS. It's supposed to be hotswappable. I never bothered using that feature again.

      1. Antron Argaiv Silver badge
        Pint

        Re: Don’t need human error for that

        I have one. No failures yet.

        THANKS! (and have a virtual cold one on me) -------------------------->

        ...for the warning.

        I'll be shutting it down before replacing any failed drives in the future.

    2. Doctor Syntax Silver badge

      Re: Don’t need human error for that

      "mirrored the drive - the new drive that is, over the remaining good drive"

      It seems to be a standard feature of mirroring judging by the number of times I've heard of that.

      1. John Robson Silver badge

        Re: Don’t need human error for that

        I've only come across it once - but of course the number of people who have heard me tell of it is probably quite large, and some of those may have repeated it to others....

        I wonder how many instances there actually have been, and how many times we hear the same instance repeated...

  4. Korev Silver badge

    Incremental backups?

    I don't know the full story; but wouldn't taking a full backup before starting a major operation be prudent?

    1. Anonymous Coward
      Anonymous Coward

      Re: Incremental backups?

      But the company probably had an ISO 9001 certificated backup procedure of monthly full backups followed by daily incrementals with specified policies of how the tapes get shuffled around the firesafe/sent off site/etc etc so taking a full backup would have been an exception to the policy and required far too much paperwork/management sign-off to contemplate.

      1. Anonymous Custard
        Headmaster

        Re: Incremental backups?

        And of course you just wait for some minion to make an incremental back-up over the top of the full one, or to use the same tape for the full one each month and end up overwriting the historical archive once it gets too full.

      2. CrazyOldCatMan Silver badge

        Re: Incremental backups?

        had an ISO 9001 certificated

        ISO 9001 only specifies that you have a process and follow that process in a documented fashion. It doesn't specify that the process has to be any good or have any value.

        So a process that says "we do nothing about backups" is a valid ISO 9001 process as long as you can prove in documentation that you did nothing.

        Of course, that would then fail any ISO 27001 certification but that's a whole different can of Annelids.

        I've worked at places that do both ISO schemes. Which is why I'm so cynical about it..

        1. Doctor Syntax Silver badge

          Re: Incremental backups?

          "ISO 9001 only specifies that you have a process and follow that process in a documented fashion. It doesn't specify that the process has to be any good or have any value."

          That was one of the objections I had about ISO 9001. It was supposed to be a quality management standard but providing the "quality" was repeatable it didn't matter how bad it was. I kept calling it the mediocrity management system.

          We introduced it after TQM. It was supposed to be a step up from that. As TQM had a mantra of Get it right First Time Every Time I wanted to know how, if we were already doing that, it could be a step up.

          1. Anonymous Coward
            Anonymous Coward

            Re: Incremental backups?

            TQM brings back memories. The government agency I worked for pretended to adopt it when a political appointee who was mad about it forced all the agencies under his control to do it. The agency had been zig-zagging back and forth between management fads for a while. A lower level manager I knew wrote a proposal that the acronym be changed to CQM (C=first letter of agency's name), and remain CQM regardless of future zigs and zags so it looked like we were maintaining steady progress forward. He did it as a joke - he got an award for it.

        2. Anonymous Coward
          Anonymous Coward

          Re: Incremental backups?

          "ISO 9001 only specifies that you have a process and follow that process in a documented fashion. It doesn't specify that the process has to be any good or have any value."

          In my days at an ISO manufacturer, my standard line was that:

          ISO 9000 = "we have processes"

          ISO 9001 = "we have processes, and we wrote them down"

          ISO 9002 = "we have processes, we wrote them down, and we follow them"

          We ended up doing ISO 13485 at one point. I couldn't even come up with a cute joke about that.

          I've heard that the latest 9001 is focused more on risk management than specific processes. Sounds good in theory, but I have a feeling that most auditors just won't get it.

          1. Bitsminer Silver badge
            Happy

            Re: Incremental backups?

            "ISO 9001 only specifies that you have a process and follow that process in a documented fashion. It doesn't specify that the process has to be any good or have any value."

            We are ISO 9001 registered. We can repeat our mistakes exactly.

      3. Anonymous Coward
        Anonymous Coward

        Re: Incremental backups?

        But the company probably had an ISO 9001 certificated backup procedure of monthly full backups followed by daily incrementals with specified policies of how the tapes get shuffled around the firesafe/sent off site/etc etc

        But the million dollar question is... did the procedure ever include verifying or testing the backups?
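For what it's worth, the cheapest form of "verifying the backups" is a checksum comparison between the live files and the backup copy, run straight after the backup finishes. The sketch below assumes the backup is readable as a plain directory tree, which is an assumption for the example and would not hold for tapes; the function names `sha256_of` and `verify_backup` are made up for illustration. It is no substitute for an actual test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every file matched."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"mismatch: {rel.as_posix()}")
    return problems
```

A check like this only proves the copy matched the source at that moment. If the source was already corrupt, the backup will faithfully match the corruption.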

    2. Dick Emery

      Re: Incremental backups?

      I used to sys admin for a small company and found out when their RAID5 failed that the incrementals had been corrupting slowly bit by bit for some time. Bad data in. Bad data out. That was a very sweaty evening rebuilding the array I can tell you. Did I get any thanks for it? Don't make me laugh.

  5. Korev Silver badge
    Joke

    Down under

    As servers are upside down in Australia does that mean that the discs are now the right way round?

    1. ssharwood

      Re: Down under

      I'll tell you as soon as I can ride my Kangaroo to the office, wrestle the crocs out of the way to get into the data centre and smoke the snakes out of the aircon. Once I'm done I'll drink a slab of Fosters (assuming it still exists) and tamper with a cricket ball.

  6. jake Silver badge

    No shame in cocking up!

    Nothing wrong with an honest mistake. You're only human.

    As Grampa told me, after I replanted the 10x100ft rows of carrot seeds that he had already planted the day before: "Own up to your mistake, learn from it, resolve not to do it again, and move on". Sound advice for a 9 year old half a century or so ago, still sound advice for an adult today.

    1. Korev Silver badge
      Coat

      Re: No shame in cocking up!

      I feel some vegetable puns coming on... Let's hope none leek out

      1. Anonymous Coward
        Anonymous Coward

        Re: No shame in cocking up!

        You can't top a carrot story.

      2. jake Silver badge

        Re: No shame in cocking up!

        Veg puns? I didn't think of that, but I'll beet you're right. We'll probably be peppered with them. Lettuce wait and see ...

      3. Anonymous Coward
        Anonymous Coward

        Re: No shame in cocking up!

        Not interested --- I don't carrot all about vegetable puns.

        1. Anonymous Custard
          Headmaster

          Re: No shame in cocking up!

          The true wisdom is not just to learn from your own mistakes, but to also learn from those of others to save you making your own in the first place.

          At least that's my excuse for reading this section on a Monday morning (and On-Call on a Friday)... :)

          1. Solmyr ibn Wali Barad

            Re: No shame in cocking up!

            Wise man can learn from mistakes of others, but fool cannot learn from his own.

          2. Anonymous Coward
            Anonymous Coward

            Re: No shame in cocking up!

            “Only a fool learns from his own mistakes. The wise man learns from the mistakes of others.”

            Bismarck

      4. Commswonk

        Re: No shame in cocking up!

        I feel some vegetable puns coming on... Let's hope none leek out

        1206A: thank goodness; they seem to have stopped sprouting up now.

        1. Paul Westerman
          Coat

          Re: No shame in cocking up!

          That's a turnip for the books.

      5. keith_w

        Re: No shame in cocking up!

        as a poster in a previously read story said, "There's not mushroom for a pun in Reg story"

      6. Doctor Syntax Silver badge

        Re: No shame in cocking up!

        "Let's hope none leek out"

        Then we'd be in the soup. Cock-a-leekie.

      7. Anonymous Coward
        Anonymous Coward

        Re: No shame in cocking up!

        only for those who can't cut the mustard.

      8. dmacleo

        Re: No shame in cocking up!

        we need to squash this now before it sprouts (especially those of you in brussels) and buds while dropping more seeds.

        no more vegetable punions.

    2. Anonymous Coward
      Anonymous Coward

      Re: No shame in cocking up!

      It is not the making of mistakes - but how quickly you can recover from them.

      That's called "experience". As the saying goes "An expert is someone who has made all the mistakes" ( and learned from them).

      It is always good to have at least a Plan 'B' in case of unknown unknowns. As Sod's Law states "Anything that can go wrong, will go wrong - at the worst possible moment."

      People who never expect to make a mistake will eventually get a comeuppance.

      1. Anonymous Coward
        Anonymous Coward

        Re: No shame in cocking up!

        A friend asked a junior colleague recently about the quality of the tests he was writing with his code. "Oh that's OK, I don't write bugs" was the response.

        1. uccsoundman

          Re: No shame in cocking up!

          Or what I hear all the time... "We're Agile, we don't need to test"

          1. Aladdin Sane

            Re: No shame in cocking up!

            Worked fine in dev, ops problem now.

      2. Fruit and Nutcase Silver badge

        Re: No shame in cocking up!

        It is not the making of mistakes - but how quickly you can recover from them.

        We had some databases which were prone to getting corrupted due to what was eventually traced to a hardware issue. The recovery process was to rebuild by restoring the previous nightly backup and rolling forward using the transaction logs. On one such occasion, I had cleared the wrong directory as part of the reset, losing some configuration files which would prevent restarting the database. No idea what made me think of the solution, which was to connect to one of the other remote sites (each had identical server builds), copy over the zapped files and carry on...
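The "restore the backup, then roll forward" recovery mentioned above can be pictured with a toy model. The sketch below is purely illustrative and assumes a key/value store, a backup stamped with the sequence number of the last transaction it contains, and log entries of the form `(seq, key, value)` where a value of `None` means delete. None of that reflects the actual database in the story.

```python
def restore_and_roll_forward(backup, log):
    """Rebuild state from a backup plus a transaction log.

    backup: (last_applied_seq, dict of key -> value)
    log:    iterable of (seq, key, value); value None means delete
    """
    last_seq, snapshot = backup
    state = dict(snapshot)  # work on a copy; the backup itself stays untouched
    for seq, key, value in sorted(log, key=lambda entry: entry[0]):
        if seq <= last_seq:
            continue  # already reflected in the backup
        if seq != last_seq + 1:
            raise ValueError(f"gap in transaction log before sequence {seq}")
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
        last_seq = seq
    return state
```

Refusing to continue across a gap in the sequence numbers is a deliberate choice in this sketch: replaying an incomplete log would produce a database that looks healthy but is silently wrong.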

    3. Gordon Pryra

      Re: No shame in cocking up!

      "Nothing wrong with an honest mistake. You're only human."

      Yeah that sounds all good, unless you happen to have a mortgage ....

    4. Anonymous Coward
      Anonymous Coward

      Re: No shame in cocking up!

      Own up to a mistake? That's a good one and not something a true BOFH would admit to.

      On the other hand if you get away without owning up at all...then all the better. Least isn't that what the powers that be keep showing us?
