Developer mistakenly deleted data - so thoroughly nobody could pin it on him!

Welcome to the eighth edition of "Who, me?", the column in which Reg readers confess to moments at which they messed things up but good. This week meet "Ben" who told us that "On a Friday afternoon about five years ago I was asked to complete some backup scripts before the weekend." Ben told us his employer "was too tight to …

  1. Anonymous Coward

    Not me, but someone else previously in my team

    There is a folklore story at my company, a managed service provider that deals with many clients and so generally has lots of things on the go at the same time.

    A ticket came through requesting a sysprep of a machine at a client, but due to multiple windows being open and potentially some distraction, my ex-colleague sysprep'd a domain controller. No biggy in most circumstances, but this customer only had one domain controller and so were royally shafted. For some reason the backups had been failing on this domain controller and it couldn't be recovered from backup.

    Through some luck of the gods, this domain controller had been cloned accidentally a couple of weeks before and that was the only reason why they didn't lose all their domain configuration.

    So in this instance, two wrongs made a right, but he was never allowed to live it down.

    1. Adam 1 Silver badge

      Re: Not me, but someone else previously in my team

      Ok. So not you. Nope. Definitely someone else. Got it.

    2. Amos1

      Re: Not me, but someone else previously in my team

      Did you know that when you're in Active Directory Users and Computers, meaning to delete something from the right-hand pane, but focus is accidentally in the left pane and you confirm you're sure, it can delete the entire production OU? I know one sysadmin who now knows that and is also now far slower at his admin tasks. Until the next time...

      1. Phil W

        Re: Not me, but someone else previously in my team

        Maybe he should also learn about the "Protect from accidental deletion" tick box.

        1. J. Cook Silver badge

          Re: Not me, but someone else previously in my team

          @Phil W:

          "Maybe he should also learn about the "Protect from accidental deletion" tick box."

          I'm not quite sure when that little ticky box made its first appearance; it might have been Server 2008 R2. I honestly can't remember if it was in 2k3 or not; it's been quite some time since we had a 2k3 DC at [redactedCo] and frankly, I've slept since then. :D

          That little ticky box sure is a life saver, though.

      2. Youngone Silver badge

        Re: Not me, but someone else previously in my team

        I did know that, as I had to restore our domain controller one Saturday after my colleague deleted some vital top level OU.

        He would have done it himself, except for the whole getting sacked bit.

        1. Alan Brown Silver badge

          Re: Not me, but someone else previously in my team

          "He would have done it himself, except for the whole getting sacked bit."

          Which is pretty indicative of a company you don't want to be working for, unless this guy has a long history of that kind of thing or it was malicious.

  2. tip pc Bronze badge

    I hope lessons were learnt and a proper backup suction put in place, but I suspect not.

    1. Anonymous Coward

      To hoover up the data?

      1. Kane Silver badge
        Joke

        "To hoover up the data?"

        To "vacuum" up the data, I think you'll find.

        1. Herby Silver badge
          Joke

          Vacuum??

          "To hoover up the data?"

          To "vacuum" up the data, I think you'll find.

          This being a UK site and all shouldn't it be "Dyson" up the data??

          1. Random Handle

            Re: Vacuum??

            >This being a UK site and all shouldn't it be "Dyson" up the data??

            ....that would mean rsync to a cheap NAS in Malaysia

    2. Evil Auditor Silver badge

      The proper backup suction that I came across some decade ago: a backup procedure set up just about as manually as Ben's. To check whether the procedure worked as expected, they didn't want to wait until the tape was fully written, so instead of the tape they directed the data stream to /dev/null. The backup ran flawlessly, reported success and was ready to be deployed.

      Problem was, no one switched the output back to the tape device until their auditor asked if they ever did backup restore tests.

      1. 2Nick3 Bronze badge

        In a meeting on how we could save costs on our backups I suggested we could write the backups to /dev/null, and since we only had SLAs on the backups and nothing on restores (yeah, really - no idea how that contract got signed) we would not be violating the contract. Not only would backups run faster, we would also increase the success rate, as we would no longer need to manage a scratch tape pool. It was only slightly (in my opinion, at least) more ridiculous than some of the other suggestions being made.

        It took a few hours for a group of us (who could detect sarcasm) to convince one of the managers that we really couldn't do it.

        1. Alan Brown Silver badge

          backups to /dev/null

          Has someone been reading Simon's early "Red Bucket" efforts?

  3. Timmy B Silver badge

    Sat at my desk one day happily carrying out my programming duties... All of a sudden an expletive, or more exactly a stream of profound and harsh expletives, comes from behind my manager's monitor. He stands up ashen-faced, beads of sweat already forming on his brow. In updating the software for the company's main customer database he had reversed the source and destination databases and totally blatted the whole thing. All of the information about sales and support and leads was gone. The most recent backup was a month old. Thankfully we had a development backup from earlier that morning.

    Our main office is 50 miles away, this is in the days before decent internet connections, and the backup is far too big for us to copy over the network.

    The two of us make the excuse that it seems to be some kind of hardware fault on the server, and take the system offline by deleting a random DLL. We jump in the car like Batman and Robin and drive, in a not dissimilar way, to the office with a freshly burned CD of the data.

    Later that afternoon we're fixed, losing only a tiny amount of work, which we put down to what could be a failing hard drive that we would "keep an eye on". We looked like geniuses because in one day we had not only fixed the hardware and updated the software, but also shown how willing we were to drop everything to support head office, etc.

    1. Sir Loin Of Beef

      I will admit that I did the same thing once but with the helpdesk ticketing system. I wanted to show I was more than a phone jockey so I grabbed the manual and attempted to "update" the software. Instead I ran the delete all script. Luckily we caught it before it deleted the whole database (we lost about 40%). This gave my manager an excuse to rewrite all of the categories and knowledge base so it was a backhanded win for me.

      1. werdsmith Silver badge

        I changed an ODBC DSN, checked it, double checked it, triple checked it. All good.

        So I ran the process, which connected via the 32-bit DSN and promptly shafted everything.

        I experienced that special sinking feeling that is only known to IT people with enough permissions.

    2. PPK
      Facepalm

      Reversi

      A loooong time ago (probably around '93) my old MD was just setting up the small company that I later joined. One of his more technical colleagues came to his house with his PC, in order to upgrade my boss's machine to the (then) latest Win 3.1 build.

      As I recall he was using a transfer utility via serial cable to mirror his newer system onto the MD's one. Unfortunately an incorrect directional button was clicked, and several hours later he discovered he had just downgraded his own working system...

  4. CAPS LOCK Silver badge

    Never edit the fstab table on a production system...

    ...don't ask me how I know...

    1. Korev Silver badge
      Coat

      Re: Never edit the fstab table on a production system...

      Maybe someone would make a fstab at guessing why...

      1. Anonymous South African Coward Silver badge

        Re: Never edit the fstab table on a production system...

        Gots a spare fstab in the nasty pocketses?

        1. Korev Silver badge
          Coat

          Re: Never edit the fstab table on a production system...

          Yep, fstab, a /bin, /etc

          1. Androgynous Cupboard Silver badge

            Re: Never edit the fstab table on a production system...

            I would also highly recommend against (in the days before package management) upgrading system libraries by hand - specifically, upgrading the dynamic loader library. The number of dynamically linked executables on a Linux system is quite high and, unfortunately, includes the "mv" command. Best laid plans and all that.

            1. onefang Silver badge

              Re: Never edit the fstab table on a production system...

              "The number of dynamically linked executables on a Linux system is quite high and, unfortunately, includes the "mv" command. Best laid plans and all that."

              This is why you keep a statically linked busybox or toybox around.

          2. This post has been deleted by its author

          3. Paper

            Re: Never edit the fstab table on a production system...

            Anyone going to ./share what happened?

    2. trollied

      Re: Never edit the fstab table on a production system...

      Oh, crikey. Reminds me of a balls-up from yesteryear, when I was a young scamp in my first job.

      Many 100s of LUNs presented to a large Sun box (E10k or E15k, I think it was). Spent a good few days carving the devices up and configuring them in Sybase.

      Then the machine was rebooted.

      I hadn't updated the vfstab.

      Took me AGES to dd bits off each disk to work out which device was supposed to be mounted where... You live and learn!

    3. Flocke Kroes Silver badge

      Re: Never edit the fstab table on a production system...

      Unless you have physical access and know how to use your boot loader. The magic phrase you need to add to the kernel command line is: init=/bin/bash

      You can then fix /etc/fstab, change root's password and then realise none of your changes happened because you forgot to: mount -o remount,rw /

  5. Julian 8

    Saw similar at a previous job. A contractor had come in and written a few scripts to do some temp folder tidying on a large cluster. Unbeknown to me, over the last few weekends said cluster had had some serious problems, with a node being completely vaped each weekend. Someone else was checking these issues and it was not mentioned in our handovers.

    I was asked to look at the scripts (just windows cmd) and they all looked OK - looked.

    That weekend, down a node went again, so I took a closer look at the scripts.

    In a sandbox I copied the suspect script and ran it line by line. All ran well until it came to a delete with an extra space after a wildcard. So instead of deleting the intended folder, it deleted the root of the drive it was running on (and this was the system drive).

  6. Anonymous South African Coward Silver badge

    I'll leave this here for all of you to ruminate on...

    (the original can be found at https://web.archive.org/web/20090208023917/http://justpasha.org/folk/rm.html thanks to the Wayback Machine).

    Have you ever left your terminal logged in, only to find when you came back to it that a (supposed) friend had typed rm -rf ~/* and was hovering over the keyboard with threats along the lines of "lend me a fiver 'til Thursday, or I hit return"? Undoubtedly the person in question would not have had the nerve to inflict such a trauma upon you, and was doing it in jest. So you've probably never experienced the worst of such disasters...

    It was a quiet Wednesday afternoon. Wednesday, 1st October, 15:15 BST, to be precise, when Peter, an office-mate of mine, leaned away from his terminal and said to me, "Mario, I'm having a little trouble sending mail." Knowing that msg was capable of confusing even the most capable of people, I sauntered over to his terminal to see what was wrong. A strange error message of the form (I forget the exact details) "cannot access /foo/bar for userid 147" had been issued by msg. My first thought was "Who's userid 147?; the sender of the message, the destination, or what?" So I leant over to another terminal, already logged in, and typed grep 147 /etc/passwd only to receive the response /etc/passwd: No such file or directory. Instantly, I guessed that something was amiss. This was confirmed when in response to ls /etc I got ls: not found.

    I suggested to Peter that it would be a good idea not to try anything for a while, and went off to find our system manager.

    When I arrived at his office, his door was ajar, and within ten seconds I realised what the problem was. James, our manager, was sat down, head in hands, hands between knees, as one whose world has just come to an end. Our newly-appointed system programmer, Neil, was beside him, gazing listlessly at the screen of his terminal. And at the top of the screen I spied the following lines:

    # cd

    # rm -rf *

    Oh, shit, I thought. That would just about explain it.

    I can't remember what happened in the succeeding minutes; my memory is just a blur. I do remember trying ls (again), ps, who and maybe a few other commands beside, all to no avail. The next thing I remember was being at my terminal again (a multi-window graphics terminal), and typing

    cd /

    echo *

    I owe a debt of thanks to David Korn for making echo a built-in of his shell; needless to say, /bin, together with /bin/echo, had been deleted. What transpired in the next few minutes was that /dev, /etc and /lib had also gone in their entirety; fortunately Neil had interrupted rm while it was somewhere down below /news, and /tmp, /usr and /users were all untouched.

    Meanwhile James had made for our tape cupboard and had retrieved what claimed to be a dump tape of the root filesystem, taken four weeks earlier. The pressing question was, "How do we recover the contents of the tape?". Not only had we lost /etc/restore, but all of the device entries for the tape deck had vanished. And where does mknod live? You guessed it, /etc. How about recovery across Ethernet of any of this from another VAX? Well, /bin/tar had gone, and thoughtfully the Berkeley people had put rcp in /bin in the 4.3 distribution. What's more, none of the Ether stuff wanted to know without /etc/hosts at least. We found a version of cpio in /usr/local, but that was unlikely to do us any good without a tape deck.

    Alternatively, we could get the boot tape out and rebuild the root filesystem, but neither James nor Neil had done that before, and we weren't sure that the first thing to happen would be that the whole disk would be re-formatted, losing all our user files. (We take dumps of the user files every Thursday; by Murphy's Law this had to happen on a Wednesday). Another solution might be to borrow a disk from another VAX, boot off that, and tidy up later, but that would have entailed calling the DEC engineer out, at the very least. We had a number of users in the final throes of writing up PhD theses and the loss of maybe a week's work (not to mention the machine down time) was unthinkable.

    So, what to do? The next idea was to write a program to make a device descriptor for the tape deck, but we all know where cc, as and ld live. Or maybe make skeletal entries for /etc/passwd, /etc/hosts and so on, so that /usr/bin/ftp would work. By sheer luck, I had a gnu emacs still running in one of my windows, which we could use to create passwd, etc., but the first step was to create a directory to put them in. Of course /bin/mkdir had gone, and so had /bin/mv, so we couldn't rename /tmp to /etc. However, this looked like a reasonable line of attack.

    By now we had been joined by Alasdair, our resident UNIX guru, and as luck would have it, someone who knows VAX assembler. So our plan became this: write a program in assembler which would either rename /tmp to /etc, or make /etc, assemble it on another VAX, uuencode it, type in the uuencoded file using my gnu, uudecode it (some bright spark had thought to put uudecode in /usr/bin), run it, and hey presto, it would all be plain sailing from there. By yet another miracle of good fortune, the terminal from which the damage had been done was still su'd to root (su is in /bin, remember?), so at least we stood a chance of all this working.

    Off we set on our merry way, and within only an hour we had managed to concoct the dozen or so lines of assembler to create /etc. The stripped binary was only 76 bytes long, so we converted it to hex (slightly more readable than the output of uuencode), and typed it in using my editor. If any of you ever have the same problem, here's the hex for future reference:

    070100002c000000000000000000000000000000000000000000000000000000 0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f 8800040000bc012f65746300

    I had a handy program around (doesn't everybody?) for converting ASCII hex to binary, and the output of /usr/bin/sum tallied with our original binary. But hang on - how do you set execute permission without /bin/chmod? A few seconds thought (which as usual, lasted a couple of minutes) suggested that we write the binary on top of an already existing binary, owned by me... problem solved.

    So along we trotted to the terminal with the root login, carefully remembered to set the umask to 0 (so that I could create files in it using my gnu), and ran the binary. So now we had a /etc, writable by all. From there it was but a few easy steps to creating passwd, hosts, services, protocols, (etc), and then ftp was willing to play ball. Then we recovered the contents of /bin across the ether (it's amazing how much you come to miss ls after just a few, short hours), and selected files from /etc. The key file was /etc/rrestore, with which we recovered /dev from the dump tape, and the rest is history.

    Now, you're asking yourself (as I am), what's the moral of this story? Well, for one thing, you must always remember the immortal words, DON'T PANIC. Our initial reaction was to reboot the machine and try everything as single user, but it's unlikely it would have come up without /etc/init and /bin/sh. Rational thought saved us from this one.

    The next thing to remember is that UNIX tools really can be put to unusual purposes. Even without my gnuemacs, we could have survived by using, say, /usr/bin/grep as a substitute for /bin/cat.

    And the final thing is, it's amazing how much of the system you can delete without it falling apart completely. Apart from the fact that nobody could login (/bin/login?), and most of the useful commands had gone, everything else seemed normal. Of course, some things can't stand life without say /etc/termcap, or /dev/kmem, or /etc/utmp, but by and large it all hangs together.

    I shall leave you with this question: if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?
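The chmod-free trick in the story works because overwriting a file's contents leaves its permission bits alone: the file keeps its inode, and the mode lives on the inode. A minimal sketch of that behaviour, assuming a modern POSIX shell rather than 4.3BSD on a VAX, and using a tiny shell script as a stand-in for the 76-byte binary (both file names are hypothetical):

```shell
#!/bin/sh
# Sketch: writing new contents into an existing file keeps that file's mode bits.
# File names are hypothetical stand-ins; everything lives in a temp directory.
workdir=$(mktemp -d) || exit 1

# Stand-in for "an already existing binary, owned by me": an executable file.
printf '#!/bin/sh\necho old\n' > "$workdir/existing_tool"
chmod 755 "$workdir/existing_tool"

# Stand-in for the freshly decoded binary: a file with no execute bits.
printf '#!/bin/sh\necho new\n' > "$workdir/decoded_payload"
chmod 644 "$workdir/decoded_payload"

# Redirection truncates and rewrites the existing file in place; its inode,
# and therefore its permission bits, are untouched.
cat "$workdir/decoded_payload" > "$workdir/existing_tool"

mode_after=$(ls -l "$workdir/existing_tool" | cut -c1-10)
output=$(sh "$workdir/existing_tool")
echo "mode after overwrite: $mode_after"   # still -rwxr-xr-x
echo "tool now prints: $output"            # new
rm -rf "$workdir"
```

The script is run via `sh` only so the demo still works where the temp directory is mounted noexec; the mode shown by `ls -l` is the point.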

    1. Nick Kew Silver badge

      I shall leave you with this question: if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?

      Yes. Take up a new career writing IT suspense stories. That one looks massively TL;DR, but had me gripped!

    2. MJB7 Bronze badge

      DON'T PANIC

      "if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?"

      /usr/bin/python

      But of course, that wouldn't have worked when people were running Unix on VAX.

    3. Alan J. Wylie

      That was heroic and ingenious.

      My rescue mission to get the system back to normal after someone had typed

      chmod 444 /bin/*
      involved an 8" floppy and driving from Keighley to Peterborough (about 140 miles) and back.

      One moral of the story, which I am still trying to instil into my cow-orkers 30 years later, is to use the symbolic modes to add or subtract explicit permissions.
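The difference between the two styles of chmod is easy to see side by side: an absolute (octal) mode replaces every permission bit, while a symbolic mode only changes the bits it names. A small sketch in a throwaway directory (the file names are made up):

```shell
#!/bin/sh
# Sketch: absolute vs symbolic chmod on two otherwise identical executables.
workdir=$(mktemp -d) || exit 1
printf '#!/bin/sh\necho hello\n' > "$workdir/octal_tool"
cp "$workdir/octal_tool" "$workdir/symbolic_tool"
chmod 755 "$workdir/octal_tool" "$workdir/symbolic_tool"

chmod 444 "$workdir/octal_tool"     # absolute: result is exactly r--r--r--, execute gone
chmod a-w "$workdir/symbolic_tool"  # symbolic: strip write bits, keep everything else

octal_mode=$(ls -l "$workdir/octal_tool" | cut -c1-10)
symbolic_mode=$(ls -l "$workdir/symbolic_tool" | cut -c1-10)
echo "after chmod 444: $octal_mode"     # -r--r--r--
echo "after chmod a-w: $symbolic_mode"  # -r-xr-xr-x
rm -rf "$workdir"
```

Run against /bin/*, the first form is the one that leaves every command (chmod included) unexecutable.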

      And for today:

      X-Clacks-Overhead: GNU Terry Pratchett

      1. Aladdin Sane Silver badge

        AT LAST, SIR TERRY, WE MUST WALK TOGETHER.

        Terry took Death’s arm and followed him through the doors and on to the black desert under the endless night.

        1. Sir Runcible Spoon Silver badge

          re:chmod 444

          Is there any justification for chmod to be made read-only..ever?

          If not, then the code should automatically exclude itself from working on its own executable, imho :)

  7. Anonymous Coward

    Code repository wiped by Linux "guru"

    Circa 2004, starting to work for a small startup, which was still "versioning" using folders. I ask to start using at least something free like CVS (SVN was only just born back then, and there was no Git, etc.). Linux "guru" offers to take care of it.

    After a few months the CVS server has issues: Linux "guru" discovers the root partition is full, and cleans it, wiping the whole CVS repository, which had been created there rather than somewhere more sensible. Also, the Linux "guru" hadn't set up a backup. Luckily developers have a copy of the code on their systems, so the repository can be rebuilt: it's version 1.0 of the software, with no complex branches and tags, and only history is lost (plus any trust I had in the Linux "guru"). Management finds the business is still afloat; no reprimand.

    Anyway, for a while, when something didn't work, it was common to say "X deleted it!" - i.e. "Can't reach Google, X deleted it!"

  8. Alan J. Wylie

    Two years ago

    Valve Steam CLEANS Linux PCs (if you're not careful)

    Dodgy shell script triggers classic rm -rf /

    rm -rf "$STEAMROOT/"*

    But STEAMROOT had not been set

    1. Anonymous Coward

      Re: Two years ago

      Yes, we had a QA system with such a bug that was left running tests over a weekend. It had all the user directories NFS-mounted as well. "Fortunately" the script ran as root, so was "nobody" on the NFS shares, but it still managed to delete many world-writable files. It was strange: the first thing we noticed on Monday morning were complaints from people about strange things happening, files vanishing, etc. The penny dropped when we realized the complaints were spreading alphabetically by username...

      Hurrah for an admin with (a) the wit to immediately bring down the home directory server and (b) working and valid backups. ZFS snapshots helped to restore the last few changes, so little was actually lost. The QA person responsible survived too.

      1. Sir Runcible Spoon Silver badge
        Facepalm

        Re: Two years ago

        Why the hell does rm not return an error when an argument is empty? Surely it's an obvious shortcut to wiping the entire drive if the entry is blank?

        1. Phil O'Sophical Silver badge

          Re: Two years ago

          The argument wasn't empty. "${XXX}/" becomes "/" when XXX is empty. It's poor coding, rm is doing just what it was asked to do.

        2. Adrian 4 Silver badge

          Re: Two years ago

          It does, but that argument isn't empty due to lack of programmer thought. If $STEAMROOT is empty, the argument expands to just "/*".

          1. Alan J. Wylie

            Re: Two years ago

            $ echo "rm -rf $xxx/*"

            rm -rf /*

            $ set -o nounset

            $ echo "rm -rf $xxx/*"

            bash: xxx: unbound variable

            $

            1. really_adf

              Re: Two years ago

              $ set -o nounset

              $ echo "rm -rf $xxx/*"

              bash: xxx: unbound variable

              Or, in case you think nounset is set, but you're wrong:

              echo "rm -rf ${xxx:?}/*"
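The `${xxx:?}` form is the belt-and-braces option because it fails whether or not nounset is on, and it also catches a variable that is set but empty. A minimal POSIX sh sketch of it guarding the Steam-style line; `preview_cleanup` is a made-up helper that only echoes the rm command, so nothing is deleted:

```shell
#!/bin/sh
# Sketch of the ${var:?} guard on the Steam-style cleanup line.
# preview_cleanup is a hypothetical helper that only *prints* the rm command.
preview_cleanup() {
    # The subshell contains the failure: with STEAMROOT unset or empty, the
    # ":?" expansion prints an error and exits the subshell before echo runs.
    ( echo rm -rf "${STEAMROOT:?STEAMROOT is unset or empty}/"* )
}

unset STEAMROOT
if preview_cleanup >/dev/null 2>&1; then
    unset_result="expanded"
else
    unset_result="refused"
fi
echo "with STEAMROOT unset: $unset_result"   # refused

STEAMROOT=$(mktemp -d) || exit 1
touch "$STEAMROOT/cache.bin"
set_result=$(preview_cleanup)
echo "with STEAMROOT set: $set_result"       # rm -rf <tempdir>/cache.bin
rm -rf "$STEAMROOT"
```

In a real script there is no need for the subshell: letting the `:?` error end the whole script is exactly the desired outcome.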

        3. David Nash Silver badge

          Re: Two years ago

          It's not blank/empty, it's "/".

        4. onefang Silver badge

          Re: Two years ago

          "Why the hell does rm not return an error when an argument is empty?"

          I'm still wondering why rm returns an error when the thing you are trying to delete doesn't exist. If it had existed, it would no longer exist afterwards, so the result is the same.

          1. Doctor Syntax Silver badge

            Re: Two years ago

            "I'm still wondering why rm returns an error when the thing you are trying to delete doesn't exist."

            Your intention: to delete a file called 0nefile.

            You type in: rm Onefile

            On completion of rm, 0nefile still exists because you didn't tell rm to remove it. If rm doesn't return an error you're no wiser to this unless you then run ls. Wouldn't it be handy if rm gave you some feedback to tell you you'd typed in an incorrect filename?
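Both sides of this exchange show up in rm's exit status: plain rm fails when a named file doesn't exist, while `rm -f` is specified by POSIX not to change its exit status for nonexistent operands, which is the "already gone counts as success" behaviour asked for above. A small sketch reusing the 0nefile/Onefile typo in a temp directory:

```shell
#!/bin/sh
# Sketch: rm reports a missing operand through its exit status, while
# rm -f deliberately treats "already gone" as success.
workdir=$(mktemp -d) || exit 1
touch "$workdir/0nefile"              # the file you meant to delete (digit zero)

rm "$workdir/Onefile" 2>/dev/null     # the typo: capital letter O
plain_status=$?                       # non-zero: no such file

rm -f "$workdir/Onefile"
force_status=$?                       # 0: -f ignores nonexistent operands

survivor=$(ls "$workdir")
echo "rm: exit $plain_status, rm -f: exit $force_status, still present: $survivor"
rm -rf "$workdir"
```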

        5. Long John Brass Silver badge

          Re: Two years ago

          Why the hell does rm not return an error when an argument is empty? Surely it's an obvious shortcut to wiping the entire drive if the entry is blank?

          #!/bin/bash

          set -euo pipefail
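Each flag in that line closes a different hole. A sketch that exercises them one at a time, running every case in its own bash process so a failure there cannot end the demo itself; it assumes bash is on the PATH, since pipefail is a bash option that not every /bin/sh supports:

```shell
#!/bin/sh
# Sketch: what -u, -e and pipefail each change, tested in separate bash processes.

# -u (nounset): an unset variable is an error rather than an empty string,
# so the dangerous "rm -rf /*" text is never even produced.
bash -c 'set -u; unset STEAMROOT; echo "rm -rf $STEAMROOT/*"' 2>/dev/null
nounset_status=$?

# -e (errexit): the first failing command stops the script.
errexit_output=$(bash -c 'set -e; false; echo "still running"')

# pipefail: a pipeline fails if any stage fails, not just the last one.
bash -c 'false | true'
plain_pipe_status=$?
bash -c 'set -o pipefail; false | true'
pipefail_status=$?

echo "nounset: exit $nounset_status"
echo "errexit: output '$errexit_output'"
echo "pipeline: exit $plain_pipe_status without pipefail, $pipefail_status with it"
```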

  9. Dave K Silver badge

    Penny pinching...

    It never ceases to amaze me how companies have documents, systems and other intellectual property worth millions, yet don't want to spend a comparatively paltry amount of money to ensure that information is properly and securely backed up and protected. Many don't learn until something like this happens...

    1. Anonymous Coward

      Re: Penny pinching...

      I work for a company that took the opposite approach - a huge, robust, multi-terabyte daily/monthly/yearly rolling backup system that significantly increased the (already impressive) storage costs.

      Back in the day I was working tech support for one of the Actuarial teams, and our support manager wandered over to me with a grin on his face. "You know how you raise backup restore requests occasionally?"

      "Yes" say I, not going into the breakdown of how often it's me who deleted things rather than the users

      "Five times in the last two years, out of 8 total."

      "Huh. I would have thought we do more than that."

      "Me too, given the company paid nearly a quarter million for those two years of backups. Per ticket, each of those backup restores cost around thirty grand."

      I just about managed a nervous laugh before he assured me he wasn't going to put it like that for senior management...

      1. Sir Runcible Spoon Silver badge
        Thumb Up

        Re: Penny pinching...

        @AC,

        just think, if you hadn't been the source of some of those deletes (and restore requests) then the cost per restore would have been a *lot* higher :)

      2. keith_w

        Re: Penny pinching...

        I think the real question should have been how much would it have cost to NOT be able to do those restores.

      3. This post has been deleted by its author

      4. Antron Argaiv Silver badge
        Thumb Up

        Re: Penny pinching...

        "Per ticket, each of those backup restores cost around thirty grand."

        However, each of those restores saved the company untold hours (at $X/hr) which would have been required to reconstruct the backed up files. Now, if those files had been critical to business continuity, how much would they have been worth?

        See? It's all in how you look at it.

        1. onefang Silver badge

          Re: Penny pinching...

          Or just count them as test restores, a backup system isn't a real backup system until you regularly succeed at test restores.
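A test restore in its most basic form means taking a backup, restoring it somewhere other than the live copy, and comparing the two. A minimal sketch, assuming GNU or BSD tar (for `-C`) and using a fresh archive in a temp directory as a stand-in for a real backup medium; all paths and file contents are made up:

```shell
#!/bin/sh
# Sketch of a restore test: back up, restore elsewhere, compare with the original.
workdir=$(mktemp -d) || exit 1
mkdir -p "$workdir/data/projects" "$workdir/restore"
echo "quarterly figures" > "$workdir/data/projects/report.txt"
echo "meeting notes" > "$workdir/data/notes.txt"

# "Backup": archive the data directory.
tar -cf "$workdir/backup.tar" -C "$workdir" data

# Restore into a separate location, never over the live copy.
tar -xf "$workdir/backup.tar" -C "$workdir/restore"

# The test only passes if every file came back byte-for-byte.
if diff -r "$workdir/data" "$workdir/restore/data" >/dev/null; then
    restore_result="restore test passed"
else
    restore_result="restore test FAILED"
fi
echo "$restore_result"
rm -rf "$workdir"
```

A backup "written" to /dev/null, as in the auditor story further up, would fail this at the `tar -xf` step, because there would be no archive to restore from.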

      5. Allan George Dyer Silver badge
        Coat

        Re: Penny pinching...

        It all depends how you write it...

        CV of Anonymous Coward

        Reduced per-ticket costs of backup restores by 52 grand

      6. Adam 1 Silver badge

        Re: Penny pinching...

        > Me too, given the company paid nearly a quarter million for those two years of backups

        That's a rather strange method of accounting. Is it a waste that I paid about 1.5K to insure my car last year, but I didn't even have an accident?

        The cost of your restore* is the time taken by whatever person needed to locate the right tape and find the right file(s) plus the lost opportunity cost of whatever that person+equipment would have otherwise been doing.

        *I would argue that the restore was free, the cost was on the unintended deletion or hardware failure.

  10. Unicornpiss Silver badge
    Alert

    Stress..

    Not as bad as some of these stories, but after hours I remotely rebooted a SCO UNIX server that a large call center ran on. It had been acting increasingly strange and I felt a reboot might do it good.

    The call center was about 200 miles away and after issuing the command to reboot, I waited for the machine to begin responding to pings again so I could log back in. And waited.. and waited.. and waited..

    Finally, with a sick feeling in my stomach I began leaving messages for the only other person with a key to the dinky server room and resigned myself to a long early morning drive. I didn't sleep very well that night.

    Fortunately for me, the other key holder had gotten my message and being an early riser herself, had gotten there and saw that the console was stuck at some minor error with "Press Y to continue.", which she did, and which was the only thing required to restore normal operation.

  11. Chairman of the Bored Silver badge
    Pint

    I misread part of the text as...

    "[...] one Friday afternoon five *beers* ago [...]" That's usually the start of data problems, in my experience. Not that I admit anything.

    I have a hazy recollection of being called in to back something up after the proper backup software did not work and doing something like 'sudo dd if=/dev/sda<wrong> of=/dev/<mission critical volume> bs=4M status=were_screwed'

    Honestly cannot remember if that was before or after using the intoxicants. And that, your honor, is how it happened

    1. Aladdin Sane Silver badge

      Re: I misread part of the text as...

      "[...] one Friday afternoon five *beers* ago [...]"

      That would be one hell of a start for a sysadmin film noir.

  12. anothersortofleave

    That's not how that works

    It cannot have been rm -rf

    rm -rf fails after it removes itself.

    I cannot tell you how I know this, I just do.

    1. Sir Runcible Spoon Silver badge
      Paris Hilton

      Re: That's not how that works

      Does the executable not run from memory once called?

      1. Korev Silver badge
        Headmaster

        Re: That's not how that works

        It basically scans the file system and then creates a whole chain of rm commands. If you try to run it on too many files then it bombs out with an error.

        You can ask me how I know this, but not how that many small files were written to the filesystem in the first place...

        Won't somebody think of the inodes -->
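For what it's worth, the usual source of a "too many files" error is the shell rather than rm: when a glob such as `rm dir/*` expands to more names than fit in the kernel's limit on exec arguments (ARG_MAX), the command fails with "Argument list too long". One POSIX way round it is to let find hand the names to rm in batches with `-exec ... {} +`. A sketch with 500 throwaway files, far fewer than would actually trip the limit:

```shell
#!/bin/sh
# Sketch: deleting many files without building one giant command line.
# find ... -exec rm -f {} + passes names to rm in batches that fit ARG_MAX.
workdir=$(mktemp -d) || exit 1
i=0
while [ "$i" -lt 500 ]; do          # small stand-in for "that many small files"
    : > "$workdir/scratch_$i.tmp"
    i=$((i + 1))
done
touch "$workdir/keep.me"

echo "ARG_MAX on this system: $(getconf ARG_MAX) bytes"
find "$workdir" -type f -name '*.tmp' -exec rm -f {} +

remaining=$(ls "$workdir")
echo "left behind: $remaining"      # keep.me
rm -rf "$workdir"
```

A recursive `rm -rf dir` on the directory itself sidesteps the problem too, since no glob is expanded.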

  13. Niall Mac Caughey
    Paris Hilton

    No user input required, but still on the clock

    Many years ago, Autumn 1997 I believe, I was looking after the hardware of a radio broadcast network I'd built, or more accurately had designed and had built by people who knew what they were doing - mostly.

    One of those knowledgeable folks had created something that was still uncommon at the time, a hard disk music playout system. It consisted of a pair of identical, mirrored servers, each running a single 20GB SCSI drive (gasp!) via Adaptec cards with the OS being Win 95 SR2 (sharp intake of breath perhaps?).

    The system went in during the summer and performed flawlessly, until 02:00 on an Autumn Sunday morning, when the overnight jock suddenly found himself a little short of material. All of the music on the servers had vanished. He called me and reverted to CDs which, since most of the music was locked in the library, involved the demented jock rushing around a large building trying to find random CDs before the next track ended.

    To make a long story boring, it turned out that Windows had automatically adjusted from BST to GMT and simultaneously the MBR on each disk had been erased. At this juncture I can't remember if it was both copies, but the audio data was still there, just not visible to the system.

    Once I had put everything back together I started making calls. Adaptec, M$ and the writer of the playout software denied all knowledge and responsibility of course, but when I pointed out to the M$ techie that copies of Win 95 SR2 would soon undergo their first "fall back" in the US, there was a looonngg silence.

    The world of Windows 95 didn't end, so I'm guessing it was a glitch in the playout software, but I've always wondered.

    Paris because, well, what would I have preferred to be doing at 02:00?

    1. Antron Argaiv Silver badge

      Re: No user input required, but still on the clock

      Paris because, well, what would I have preferred to be doing at 02:00?

      What?...or who?

    2. The Oncoming Scorn Silver badge
      Pirate

      Re: No user input required, but still on the clock

      I seem to recall that might have been Gemini Radio in Devon; one of the board members/owners was a former Westward/TSW presenter & made a big thing of all the music being stored on HDD.

  14. Anonymous Coward
    Anonymous Coward

    Del *.* - yes of course I'm sure!

    Long ago, in the days of Win3.1, I needed a blank floppy for something. I had a previously used one to hand, so I popped it into the drive, dropped from Windows to a DOS box, typed a: then del *.*. "Are you sure?" asked the computer. Yes, of course I bloody am - just as I realised I had typed a; instead of a: and watched the contents of c:\windows disappear in front of my eyes...

    Over the next ten minutes Windows died a very slow, painful death as parts of it fell over bit by bit until it eventually gave up completely. And I spent the rest of the day re-installing from floppy. Fortunately it was only the OS directory; all my data was safe.

    1. Anonymous Coward
      Anonymous Coward

      Re: Del *.* - yes of course I'm sure!

      About 15 years ago, a Windows 2003 server ran out of disk space. Unfortunately, the kind helpdesk person that noticed the error decided to delete the contents of c:\Windows\System32 to create a bit of space - particularly those files that ended with a .DLL extension.....

      Luckily it was a secondary DC in a call centre, so no service was lost. It was even still running, but quite possibly not for much longer.

      1. DailyLlama
        FAIL

        Re: Del *.* - yes of course I'm sure!

        I once bought two hard drives of the same spec, same manufacturer etc, and it turned out that the serial numbers were 1 character different. I used one for my C: drive, and one for the D: with all my music etc stored on it.

        I was reinstalling Windows (98 SE I believe), and was asked to choose which drive I wanted to format. Given the choice of two seemingly identical serial numbers, I (sleepily) assumed that it was showing me the same one twice, so clicked on the first one.

        I bet you can all guess what happened...

    2. Anonymous Coward
      Anonymous Coward

      Re: Del *.* - yes of course I'm sure!

      Mine pre-dates yours. Running dBase II on a 2 floppy-drive Compaq "portable" in the DOS days. The dBase program floppy was in Drive A: with no write-protect tape on the notch. I wanted to delete the program files I'd been working on on Drive B:. Entered del *.* and immediately went Not That One as the LED in Drive A: lit up. I was able to recover - mostly. DOS didn't actually erase the data with the del command; it just marked each file as deleted by overwriting the first character of its filename. The IT guys had Peter Norton's utilities. I was able to undelete everything except INSTALL.EXE and UNSTALL.EXE - how to tell the difference when they're missing the first character?
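      A minimal Python sketch of why those two names became indistinguishable, assuming the classic FAT directory layout in which DOS marks a deleted entry by overwriting the first byte of its 11-byte, space-padded 8.3 name with 0xE5. The helper names are hypothetical, and the sketch ignores the file allocation table and the file data entirely.

```python
# Hypothetical illustration: FAT 8.3 directory names and the 0xE5 "deleted" marker.
DELETED_MARKER = 0xE5  # first byte of a directory entry that DOS has marked deleted


def to_83_entry(filename: str) -> bytes:
    """Pack NAME.EXT into the 11-byte, space-padded 8.3 form used by FAT."""
    name, _, ext = filename.upper().partition(".")
    return name.ljust(8)[:8].encode("ascii") + ext.ljust(3)[:3].encode("ascii")


def mark_deleted(entry: bytes) -> bytes:
    """Mimic DEL: overwrite only the first byte; the rest of the name survives."""
    return bytes([DELETED_MARKER]) + entry[1:]


def show(entry: bytes) -> str:
    """Render an entry as an undelete tool might, with '?' for the lost character."""
    first = "?" if entry[0] == DELETED_MARKER else chr(entry[0])
    name = (first + entry[1:8].decode("ascii")).rstrip()
    ext = entry[8:11].decode("ascii").rstrip()
    return f"{name}.{ext}" if ext else name


install = mark_deleted(to_83_entry("INSTALL.EXE"))
unstall = mark_deleted(to_83_entry("UNSTALL.EXE"))
print(show(install), show(unstall))  # ?NSTALL.EXE ?NSTALL.EXE
print(install == unstall)            # True: nothing left to tell them apart
```

      Which is why the undelete tool has to ask you for the missing first letter, and why the only way to choose was to guess.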

      1. 2Nick3 Bronze badge

        Re: Del *.* - yes of course I'm sure!

        "I was able to undelete everything except INSTALL.EXE and UNSTALL.EXE - how to tell the difference when they're missing the first character?"

        You guess! Make a copy of the disk using diskcopy, then try it on the copy (diskcopy did a sector-by-sector copy, so would get everything). You see if you were right by running the program and seeing what happens. If you guessed wrong there's only one other option to try. And you can redo the diskcopy, or see if Norton would let you switch the names of the files.

  15. Anonymous Coward
    Anonymous Coward

    You want to login? Not anymore..

    I still remember about 10 years ago making a mistake on a Friday afternoon with group policy. As always, it wasn't immediately noticeable due to replication, but on the Monday I was asked to look into "login issues" on our domains. The fault was "nobody can login".

    Normally you assume this is user-created drama and it's likely someone has locked their account, but nope: having checked, NOBODY could log in; not even domain admin accounts were getting in. They were all denied access, with no permission to log in, apparently.

    Then I remembered the Friday when I'd been configuring remote access accounts and locking down service accounts, and figured I'd somehow added the wrong group... into the wrong group.

    Fortunately, to save my bacon, one of the other techs had left their account logged into a domain controller over the weekend (we had no timeout on accounts only used within the server room), so I was able to undo my mess and blame "transient network issues".

    Was never questioned about it as it was all working by 10am, not even my colleagues were any the wiser.

  16. Anonymous Coward
    Anonymous Coward

    This has never happened to me...

    ...so therefore I have no interesting stories to offer readers. I did once shut down a production WebLogic server, but that was deliberate, and nothing untoward came of it. I've also switched my iMac off from time to time, via the shutdown menu, and it's always restarted perfectly well afterwards.

  17. Anonymous Coward
    Anonymous Coward

    Ben was not chewed out because ...

    The senior engineer probably instituted the practice of no one checking such critical scripts before deployment.

  18. Anonymous Coward
    Anonymous Coward

    Back in several past lives...

    People have occasionally had issues with SQL commands.

    At one place, someone was very confused when one of the tables on the test system seemed to keep losing data; whenever they tried to do something, it seemed to be nuking all the records in the table except for the one they'd been accessing.

    After a while, we figured out that their UPDATE statement didn't have any filtering clauses on it. So while all the records in the table were still present, they'd all been overwritten with the same set of values!

    Still, at least that was a test platform. A few years later, I was helping a new team member with something, when one of my other team-members came up looking sheepish. Once again, they'd managed to issue an UPDATE statement without any constraints on it. On the live platform. On one of the main tables used for billing.

    To their credit, they had realised what had happened after a few seconds and hit Ctrl-C to cancel the transaction. Unfortunately, as this didn't seem to be working, they hit Ctrl-C again, which killed the cancellation request. So the original transaction sailed through, leaving a very large number of customers with the wrong billing information....

    Thankfully, we were able to restore 99% of the data from backups and figured out how to deal with the remainder. But after that, people were very strongly encouraged to use transactions for everything, even - or especially - one-line BAU commands that get used several times a day!

    Equally, I've made a few SQL-related SNAFUs. The most recent was a migration exercise for a table which holds audit data: clone a table's schema, tinker with the indexes, switch the new table into place and backfill with the data from the old table.

    What I hadn't realised for this particular database is that the customer has an hourly script which refreshes all their data; in some cases, twice[*], for no obvious reason. And this has been running for years. So for some items, there were 30,000 audit-trail entries, rather than the 5-10 we'd normally expect!

    Having discovered this, I hit Ctrl-C (just the once, mind), but the rollback took several hours to complete. On the bright side, this did mean that I got to learn lots of interesting information about how MySQL rollbacks work, while googling to try and figure out if the rollback had hung/failed...

    1. Doctor Syntax Silver badge

      Re: Back in several past lives...

      "Once again, they'd managed to issue an UPDATE statement without any constraints on it. "

      1. Always, always, ALWAYS do stuff like this as a script. Get into the habit so that you automatically won't run it straight from the command line.

      2. Write the WHERE clause and test it in a SELECT. (If you know what to expect just select a COUNT without getting all the data streamed out to screen).

      3. Ask yourself if what the SELECT returns is sensible.

      4. When you're convinced it's right add BEGIN WORK at the start of the script and convert the test statement to your risky DELETE or UPDATE. Do NOT add a COMMIT to your script.

      5. Run the script.

      6. Check how many rows were affected. If and only if the result looks right type in your COMMIT, otherwise ROLLBACK.

      Scripts are your friend.
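      The checklist above maps fairly directly onto a script. Below is a minimal sketch using Python's built-in sqlite3 module, chosen only because it needs no server; SQLite spells the transaction start BEGIN rather than BEGIN WORK, and the billing table and the two helper functions are hypothetical. The write is committed only if it touched the number of rows the preview led you to expect.

```python
import sqlite3


def preview_count(conn, table, where_clause, params=()):
    """Steps 2-3: try the WHERE clause in a SELECT COUNT(*) first and sanity-check it."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {where_clause}"  # trusted script text only
    return conn.execute(sql, params).fetchone()[0]


def guarded_write(conn, sql, params, expected_rows):
    """Steps 4-6: run a risky UPDATE/DELETE inside a transaction and commit only
    if it touched exactly expected_rows rows; otherwise roll back.

    Assumes conn was opened with isolation_level=None, so the explicit
    BEGIN/COMMIT/ROLLBACK below are the only transaction control in play.
    """
    conn.execute("BEGIN")
    affected = conn.execute(sql, params).rowcount
    conn.execute("COMMIT" if affected == expected_rows else "ROLLBACK")
    return affected


# Demo on a throwaway in-memory database with a hypothetical billing table.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE billing (id INTEGER PRIMARY KEY, plan TEXT)")
conn.executemany("INSERT INTO billing (plan) VALUES (?)", [("basic",), ("pro",), ("basic",)])

n = preview_count(conn, "billing", "id = ?", (2,))  # 1 row: sensible
guarded_write(conn, "UPDATE billing SET plan = ? WHERE id = ?", ("pro-plus", 2), n)

# The classic slip: the WHERE clause went missing, so all 3 rows match, not 1.
guarded_write(conn, "UPDATE billing SET plan = ?", ("oops",), n)  # rolled back
print(conn.execute("SELECT plan FROM billing ORDER BY id").fetchall())
# [('basic',), ('pro-plus',), ('basic',)]
```

      Unlike the manual procedure, this compares the counts automatically; you still have to look at the preview number and decide whether it is sensible before trusting it.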

    2. Anonymous Coward
      Anonymous Coward

      Re: Back in several past lives...

      If I was re-inventing SQL, the absence of a where clause would imply zero rows rather than all of them; and listing multiple tables without some sort of join criteria that covered every table would be an error.

      1. Anonymous Coward
        Anonymous Coward

        Re: Back in several past lives...

        It's worth noting that MySQL has the --safe-updates (aka --i-am-a-dummy) option for precisely this sort of scenario, and it looks to have been around for at least 15 years. Quite why it wasn't turned on at this particular company is left as a thought exercise for the reader...

        When it comes to things like scripting and testing changes, that's all well and good if you have the time and warm bodies - and test platforms with representative data. In this case, we were distinctly lacking in all three, not least because internal politics and bureaucratic overheads meant that you spent more time begging/borrowing/blackmailing people to progress the paperwork than you did in doing the actual work.

        Which ironically, often led to things being rushed or done under the radar, simply because people didn't want to spend (literally) several weeks bouncing around the change request process for something that was actively causing live customer issues and could be sorted out in 20 seconds with a simple SQL statement [*].

        And that's how you get screw-ups.

        [*] This isn't quite what happened in this particular case, but there was at least one formal disciplinary for someone who decided to invoke the JFDI principle when dealing with something - even though in that instance, their fix worked...

        1. Doctor Syntax Silver badge

          Re: Back in several past lives...

          "When it comes to things like scripting and testing changes, that's all well and good if you have the time and warm bodies - and test platforms with representative data."

          As a DBA you are responsible for the records on which your employer depends to conduct their business. To quote Len Deighton, the price of survival is eternal paranoia, vigilance is not enough.

          What you do not have is time or sufficient warm bodies for getting it wrong because when (not if) you have to fix it after an over-hasty operation it will take a lot more of those resources. At the very least put a potentially destructive operation in a transaction and only commit if the number of rows affected looks right. As to test platforms, how often do we hear, right here in el Reg comments, of people thinking they're on test and then finding they've just done something terrible on live.

          Scripts don't - shouldn't in this context - mean something with a high-ceremony sign-off. They simply mean typing something into a file instead of into a command prompt, which can give you a chance of rehearsing it non-destructively - give that expression to ls, not to rm -rf, count the rows a WHERE clause returns etc - and a chance to take a second look before it's too late.

          More haste less speed. Or, if you prefer, the carpenters' motto: measure twice, cut once.

          1. Anonymous Coward
            Anonymous Coward

            Re: Back in several past lives...

            "As a DBA you are responsible for the records on which your employer depends to conduct their business. To quote Len Deighton, the price of survival is eternal paranoia, vigilance is not enough."

            Oh, I fully agree. Except... the company's only DBA only looked after the "physical" database: day-to-day maintenance of the actual data was the responsibility of an infrastructure team comprised entirely of junior engineers - and it was just one of several dozen jobs that they were expected to manage on a daily basis.

            "Scripts don't - shouldn't in this context - mean something with a high-ceremony sign-off. They simply mean typing something into a file instead of into a command prompt, which can give you a chance of rehearsing it non-destructively - give that expression to ls, not to rm -rf, count the rows a WHERE clause returns etc - and a chance to take a second look before it's too late."

            And we did indeed have standard documented processes which used safeguards such as those above.

            But here's where I get a bit ranty ;)

            It's all very well and good to declare that everything should be done The Right Way; it's a principle I strongly agree with and I generally do my best to adhere to. But in the real world - at least in some companies - that isn't always possible.

            If you underpay people, allow internal politics to take priority, balk at upskilling, refuse to replace people when they leave, constantly favour New Shiny over platform maintenance and upgrades, *and* continually add more red tape to every internal process, then don't be surprised when you get a drop in morale and the quality of work, while at the same time getting a rise in the frequency of incidents and the number of resignation notices. Including mine.

            I'm very much not sorry about having left that culture behind!

  19. Hans 1 Silver badge

    He who uses a variable without making sure IT IS SET deserves all he gets ... yes, I learned that lesson as well ... but on rm -rf YOU BLOODY MAKE SURE THE VAR IS SET ... yes, even drunk, on a Saturday early morning with a picked up babe in my bed I make bloody sure that var is set ... rm -rf without due diligence is like driving without a seat belt ... I feel naked ... I cannot drive for more than 5 yards without a seat belt ... I cannot hit enter without making 100% sure everything is ok ... well, tbh, it has bitten me so many times ... so yeah, that could happen to me also, on the test server, though ;-)

  20. Bucky 2

    All clients do this

    My best guess is that clients resent it so intensely when there is a problem that they gather all their cognitive resources to determine the most effective way to transmit their anger and frustration to the developer.

    The obvious solution is to wait until end of day Friday.

    Clients who limit themselves to emotions no stronger than simple personal hatred only wait until exactly lunchtime.

  21. Oh Homer
    Trollface

    Never rm -fr relative links

    Always explicitly name the directory.

    This hindsight is brought to you by a recording made yesterday, which was accidentally deleted and thus you never saw it.

    Carry on...

    1. Alan J. Wylie

      Re: Never rm -fr relative links

      And be especially cautious about

      find -L ... | xargs rm
      and absolute links in chroot/container trees.

  22. BinkyTheMagicPaperclip Silver badge

    There was a large company with a very resilient back end

    This was in the days of a piece of data polling software with a Novell SFT III back end - so two servers with RAID, and both mirrored to one another.

    At this point there was some sort of hardware issue, and one of the SFT III servers was removed, with no interruption in service - lovely!

    The server was fixed, re-introduced, and SFT III mirroring was turned on again. In the wrong direction. An hour later there was a highly resilient mirror of blank systems.

    We advised them to restore from backup - there wasn't one. 'How about the single floppy configuration backup job?' (that would at least have preserved system scripts) - that wasn't running either.

    An engineer had to sit on site for a week, recreating the config from printouts and specifications.

  23. Doctor Syntax Silver badge

    Something odd going on here

    rm -r simply unlinks the data (including, of course, the directories themselves) from the directory tree. The data is still on disk if not organised in a friendly way; simply running od on the device sees it. Even if /dev is gone it should be visible if the drive is hooked up to another machine. If the data recovery company couldn't find any data then either they were not fit for purpose or something considerably stronger than rm hosed the drives.

    1. Brewster's Angle Grinder Silver badge

      Re: Something odd going on here

      But the blocks used would be marked as free. And, in ext2, the allocation information for files larger than 12 blocks is "data" that is dynamically allocated. So if these indirect blocks get reused before `rm -rf /` completes (and while you're busy "backing up" the filesystem, other processes are appending to logs and writing data) then you'll only be recovering text files.

      Apparently ext3 is even worse: "In order to ensure that ext3 can safely resume an unlink after a crash, it actually zeros out the block pointers in the inode, whereas ext2 just marks these blocks as unused in the block bitmaps and marks the inode as "deleted" and leaves the block pointers alone." (I presume ext4 inherits this?)

      Contrast that with FAT, where there's a good chance the allocation chain is intact. So even if the directory entry has been overwritten, you can probably recover the file; all you have to do is work out which file it is. (And if you're really lucky, you can find an old directory that tells you.)

      1. Doctor Syntax Silver badge

        Re: Something odd going on here

        "So if these indirect blocks get reused before `rm -rf /` completes (and while you're busy "backing up" the filesystem, other processes are appending to logs and writing data) then you'll only be recovering text files."

        But that's still data. According to the article there wasn't any.


  24. John Styles

    My favourite accidentally deleting things story, which I'm sure you will agree is the platonic ideal of a dull historic IT anecdote (I have probably told you all this one before, too)...

    At some point in the 80s we were using both PCs and some strange things called Sages running CP/M 68K.

    The Sages were configured with the hard disk partitioned into 2MB chunks (the limit) with

    A = operating system and tools

    B = source code

    C / D / E etc. = customer data or more copies of the source code

    P = floppy drive

    (the idea being that the number of hard disk partitions depended on the size of the hard disk, but P was always the floppy)

    So in summary

    PC = A: floppy C: system

    Sage = P: floppy A: system

    Now, with a moment's inattention on a Sage, you go to format the floppy with FORMAT A: and it's bye-bye OS and tools.

    The other fun property of Sages was hard disks very averse to the computer being dropped.

    1. Doctor Syntax Silver badge

      "The other fun property of Sages was hard disks very averse to the computer being dropped."

      Back in the day I remember a salesman dropping a Fortune (also a 68k box) from waist height - and it survived. I still can't understand how, although the fact that it wasn't running probably helped.

      OTOH I've dropped a hard drive about 6" whilst installing it and it was completely dead.

      1. onefang Silver badge

        "Back in the day I remember a salesman dropping a Fortune (also a 68k box) from waist height - and it survived. I still can't understand how although the fact that it wasn't running probably helped."

        I remember an ancient hardware-fixing technique: drop a computer from a height of about six inches. If some of the socketed chips (in those days that would be most of them) were a little loose, this reseated them, thus fixing the problem. Have to do it the correct way around, or the chips did the opposite and fell out. Yes, you do this while the rust isn't spinning.

        1. Doctor Syntax Silver badge

          "If some of the socketed chips (in those days that would be most of them) were a little loose, this reseated them, thus fixing the problem."

          All except those which are jolted out of their seats completely. Phase 2 is to gather up the loose chips & decide which sockets to put them back into.

  25. lowjik

    the great shutdown of 2008

    Many moons ago as a fledgling junior dev in a guest access solutions startup company:

    Friday at beer o'clock and everyone leaves the office just a tad early as the bosses weren't in that day. Got about 2 miles down the M62 before one of the bosses was calling asking why we just had a spike in support calls and logs all claiming login failures of some kind ...

    Turning around at the next junction and heading back to the office in time to see our network admin and said senior dev pulling in to the car park too - all got the same call obviously.... worrying

    We start looking for logs and find that nothing can talk to auth1, our central authentication server (auth2 was not quite configured as a master-master replica yet, but the db cluster was just fine), and there was no response when trying to SSH in to this machine.... panic growing

    A call to Rackspace fanatical support to find out more reveals the machine has indeed been shutdown - would we like to start it back up? Yes. Yes we would, very much please

    We patiently waited for said Gentoo production server (ask our network guy) to respond to a network ping, and eventually it did! However, many services were not running: a few simply needed starting, and one or two had issues that were trivial to resolve. We got it back running eventually, with around 1500 authentication attempts having failed during the ~1 hour of downtime. Ouch.

    The logs reveal that it was indeed our senior dev who typed shutdown -h now on the machine. He was then ridiculed for some time and the resident Gentoo expert aliased shutdown to echo "I'm sorry Dave, I can't do that"

    This then led to a series of "hilarious" aliasing wars across various workstations and dev servers. Trying to stop someone from being able to fix their alias problem after everything useful is aliased to something funny is quite difficult on such flexible systems. Fun though, for a given range of fun.

    1. 2Nick3 Bronze badge

      Re: the great shutdown of 2008

      Had a coworker who was famous, or rather infamous, for his pranks log into my Fedora workstation and do a bunch of aliasing and keyboard remapping. A box the team used for running scripts to produce customer reports - very much in production. At the time I was very light on Linux skills, and in an attempt to bypass his "fun" I thought logging in as root would do the trick (Yes, yes, I know - I already admitted I was very light on Linux skills...). The only thing that did was show me the IP address of the last root login, which resolved to his workstation.

      A quick note to him, copying our boss, with a screenshot of the last login info as the body and the subject line of "Fix it" had me back up and functional in just a few minutes, and showed my machine was off-limits for that kind of shenanigans.

  26. onefang Silver badge
    FAIL

    Some time last century a client was complaining to me about some rather important specialised OS they were using, that had managed to resist all methods he had tried to back it up. Penguinista that I am, I boasted that I could do it using whatever version of Linux I was running at the time. So he hands me his one and only copy of said impossible-to-back-up OS on its hard drive, and a suitably sized new and empty hard drive on which to back it up. Naturally, while typing the dd command I thought would do the trick, I got i and o the wrong way around...

  27. DougS Silver badge

    Can hardly blame him

    First of all he's a software developer that's been pressed into service as a sysadmin. OK, pretty much all sysadmins start out as something else since you can't get a college degree as a sysadmin, but still. Second, he's not even a senior guy, as can be inferred since he consulted a guy more senior than him after this happened.

    The blame is with the company, who were too cheap to get proper sysadmins, or, if it was a small enough company that maybe that wasn't feasible, who should at least have made allowances for that and given him more time to develop and properly test the scripts.

    Though I do blame him for having a script execute ANY 'rm' command involving a variable without doing a LOT of sanity checking to make sure the variable is set and set to a value that was expected, and if removing a relative path that the current working directory is as expected.
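    A minimal Python sketch of the sort of sanity checking described above, assuming the script is only ever meant to delete things under one known directory. guarded_rmtree and the directory layout are hypothetical; shutil.rmtree and pathlib are standard library. It refuses an empty or unset target, a relative path (so the current working directory never matters), a symlink, the allowed root itself, and anything that resolves outside that root.

```python
import shutil
import tempfile
from pathlib import Path


def guarded_rmtree(target, allowed_root):
    """Hypothetical helper: recursively delete target only if it passes sanity checks."""
    # An unset or blank variable must never turn into "delete the current directory".
    if target is None or not str(target).strip():
        raise ValueError("refusing to delete: target is empty or unset")
    path = Path(target)
    # Relative paths depend on the current working directory, so reject them outright.
    if not path.is_absolute():
        raise ValueError(f"refusing to delete relative path {str(target)!r}")
    # Don't follow a symlink out of the tree we're supposed to be cleaning.
    if path.is_symlink():
        raise ValueError(f"refusing to delete symlink {path}")
    root = Path(allowed_root).resolve()
    if root == Path(root.anchor):
        raise ValueError("allowed_root must not be the filesystem root")
    resolved = path.resolve()
    # Only delete things strictly inside allowed_root, never the root itself.
    if root not in resolved.parents:
        raise ValueError(f"refusing to delete {resolved}: not inside {root}")
    shutil.rmtree(resolved)


# Demo in a throwaway temporary directory.
base = Path(tempfile.mkdtemp())
victim = base / "backup" / "old"
victim.mkdir(parents=True)

for bad in ("", None, "backup/old", str(base), "/"):
    try:
        guarded_rmtree(bad, base)
    except ValueError as err:
        print("refused:", err)

guarded_rmtree(str(victim), base)
print(victim.exists(), base.is_dir())  # False True
```

    The checks are deliberately paranoid: it is cheaper to have a backup script fail loudly on Friday afternoon than to have it succeed at deleting the wrong thing.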

  28. Stevie Silver badge

    Bah!

    Some years ago I worked at a site with a manufacturing system underpinned by a network database (the sort that has sets organized by linked lists). In this case, the design was complex because parts could be, well, parts or sub-assemblies made of - you guessed it - parts. Classic "employee table" conundrum from Database 101. At this point we were piloting and had no users, only programmers and chief programmers (long ago, trad DP department org tree).

    To construct some of the reports for the factory it was necessary to walk the entire inventory of parts - expensive with a brute-force page-by-page walk over the hundreds of thousands of parts involved. So there was a set defined as having all "part" records as members and a made-up "gpart" as the owner. One could issue a find on gpart, then fetch every part in gpart-part set.

    My colleague, recently promoted to "the" DBA, issued a DELETE ALL GPART to test out his mad DML skills.

    (It is probably helpful at this point for all those "relational or doesn't happen" DBAs to understand WTF is going on to say that the ALL is equivalent to CASCADE, and to see the salient line from the DMS manual: Care must be taken in the use of the DELETE ALL option as large portions of the database may be inadvertently deleted).

    The machine, a brand spanking new Sperry Univac 1100-60, slowed to a crawl and database accessibility dropped to nil. About an hour in, the DBA realized he had Made His Mistake and cancelled the transaction with @@X - the "kill -15" of the Unisys world - graceful death please.

    This initiated rollback, so no improvement seen. Now Mr Genius, analyzing the problem as "it didn't work" in his panic, issued an @@X TIO which is the "kill -9" of the Unisys world - "die please and discard all blocking input, output and error messages".

    So the rollback abruptly terminated leaving the database not only with large portions inadvertently deleted, but inconsistent into the bargain.

    Had to completely restore the database from a (static) save taken after the D/B was built out by a very tired and grumpy Sperry-Univac tech. Oh how we laughed.

  29. Stevie Silver badge

    Bah!

    And this just in - someone appears to be testing on the live Register server because I'm getting daft captcha requests on previews now and the comments allow over an hour to get your final edit in - but only on the last comment made.

    Nice to know the VC DC is as infested as everyone else with danger and excitement.

    1. Doctor Syntax Silver badge

      Re: Bah!

      "someone appears to be testing on the live Register server because I'm getting daft captcha requests on previews now"

      Did you try typing in something that might have looked like a critical file name? I've seen that in the past, and I just got it again by typing in the full pathname of the Unix password file. By calling it "the password file" I got round it. It seems to be some input sanitisation. My experience in the past is that the captcha doesn't work, possibly because I have NoScript firmly tied down.

  30. Anonymous Coward
    Anonymous Coward

    This is from an old boss I had. He said the customer is not always right, and if something looks slightly funny, ask, get permission and document it. He told me that when he was first starting out, a client for some reason asked him to make a change to the SQL server. Something did not look right, so he asked someone more senior. The response was "WTF would you do this?", and he then showed him the email. Apparently this would have fubared the database. He told me this was the same company that thought RAID was a backup solution and would use the cheapest UPS you could find for the servers. He was glad when his company lost the contract over a screw-up the client made but blamed them for.

  31. Dr. Ellen
    Boffin

    And now for something completely different:

    I had an inverse experience of this sort. Way back in the 1960s, I had a night shift on the CDC 3100. (It had a magnificent 12K of 24-bit words!) It wedged. Nothing I could do made it work again. Frustrated, I began pouring curse-words into the console typewriter. Somewhere in all this foulness I must have done something right, because it began working again -- properly. The digital gods answered my prayer.

    They really shouldn't stick grad students with the night shift, but I came out okay that time.

  32. vincent himpe

    @#$% keyboard designers...

    Win95 era... The first keyboards with power/sleep/wake buttons, positioned between the cursor keys and the Delete/End/Page Down keys...

    I was working on a critical machine that ran tests. This thing had been running for a while and it was time to save some data. I was keying in a command and, while reaching for the Delete key, I hit the power-down key. (I was typing blind and was used to HP workstation keyboards where the arrow keys 'touch' the row containing Delete/End/Page Down, so for me the bottom-left key above the arrow keys was Delete, while on this keyboard it was power down.)

    -click- pieuwww ( drive spins down )

    $*#(@* Who in his right mind designs a keyboard that can shut down the machine and puts the keys there?!

    I took the keyboard, cut the PS/2 cable, then grabbed the keyboard firmly by one side and whacked it into the edge of the bench. This essentially cracked the damn thing in half, showering my nearby colleagues in keycaps... After which I threw the broken-in-half keyboard on the desk of the guy responsible for buying our IT equipment, stating in a clear voice: "If you ever order this type again... your head is next..."

    Fast forward a few years, to when we were moving the lab... we moved some cabinets, someone found a few keycaps... and then the questions came.

    This is now part of company history known as The exploding keyboard incident...

    Here is a picture of such a keyboard: https://i.ebayimg.com/images/g/vDYAAOSw5ShZzY9y/s-l1600.jpg

  33. Anonymous Coward
    Anonymous Coward

    Shifting over to PC land, where REM/F/ takes on a whole new meaning: the lack of (l)user backups never ceases to amaze me. We all have stories of spilled soda or clicking noises followed by weeping and gnashing of teeth, but I will never forget the day the head of IT for a huge organization came into our store with a clicky-clicky problem and a sob story about a major court case where the data on the drive (in a laptop) was critical and needed on Monday (it was Friday afternoon). I may not be allowed to be a BOFH, but there is some satisfaction in being a B(Manager)FH who gets to skewer such a person by asking if they had a backup of the data, knowing full well that I was looking at someone mentally getting his CV in order and planning to change his name and relocate immediately to another state.......

  34. martinusher Silver badge

    Been there, done that....

    XENIX system, late 1980s. Lousy keyboard, root user: I typed 'rm -rf *.lst', got a rogue space in, realized the problem and stopped the command, but not before it had removed the system files. Since this was a critical development machine, and back in those days restoring an *ix system was a lot of work, the only solution was to leave the system running 24/7 and hope that there wasn't a power glitch before the work was completed.

    Lesson learned. You only do something that dumb once. Hopefully.
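    To see why one stray space matters, here is a minimal Python sketch (an illustration, not anything taken from XENIX). It splits the command line into words roughly the way a POSIX shell does, using `shlex.split`, then expands wildcards against a made-up directory listing with `fnmatch`. It assumes the space landed between `*` and `.lst`; the file names and helper functions are hypothetical.

```python
import fnmatch
import shlex

# Hypothetical directory contents, invented for illustration.
FILES = ["build.lst", "report.lst", "unix", "etc", "bin", ".profile"]

def expand(word: str, names: list[str]) -> list[str]:
    """Rough model of shell globbing for a single word.

    As in most shells, a wildcard does not match a leading '.' unless the
    pattern itself starts with '.'. Words without wildcards, or patterns
    that match nothing, are passed through unchanged (default sh behaviour).
    """
    if not any(ch in word for ch in "*?["):
        return [word]
    matches = sorted(
        n for n in names
        if fnmatch.fnmatchcase(n, word)
        and (word.startswith(".") or not n.startswith("."))
    )
    return matches or [word]

def rm_targets(command: str, names: list[str]) -> list[str]:
    """Return the operands an 'rm -rf ...' command line would receive."""
    words = shlex.split(command)  # a space starts a new word
    operands = [w for w in words[1:] if not w.startswith("-")]  # skip 'rm' and options
    targets: list[str] = []
    for w in operands:
        targets.extend(expand(w, names))
    return targets

print(rm_targets("rm -rf *.lst", FILES))
# ['build.lst', 'report.lst']
print(rm_targets("rm -rf * .lst", FILES))
# ['bin', 'build.lst', 'etc', 'report.lst', 'unix', '.lst']
```

    With the space, `*` becomes a word of its own and matches every non-hidden name in the directory, which is exactly how a tidy-up of listing files turns into deleting the system.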

    1. Shall I raise my aspect now?

      Re: Been there, done that....

      Ahh memories.

      Many beers ago, I worked for a TLA from Massachusetts that no longer exists.

      Sent to a customer who could not boot their VAX.

      Could not solve it on-site, so took the disk back to the office and started to step through the boot sequence.

      Got to the part where it loaded dcltables.exe and it failed. File missing!

      Spoke to the customer and he mentioned that they needed some space on the disk so they had deleted it!

      Dropped another dcltables.exe in and the system would come up to the point where they could do some work (minus stuff that may have been added to dcltables.exe)

  35. Doctor Syntax Silver badge

    In the real old days there weren't Unix commands for adding users, you just edited the password file.

    Up and down arrow keys and the like on terminals issued escape sequences, which tended to contain tildes. A bit of a timing error in transmission and the escape sequence would get misinterpreted, and if you were unlucky vi applied the tilde to change the case of the character under the cursor.

    That's how I came to have the first name in my password file spelled "Root". I can't remember why I couldn't then su back to root; possibly su didn't like the spelling. But also, in those real old days, if the first character you gave to a login prompt was upper case, the terminal driver assumed you were on a TTY that only had upper case and obligingly changed all your characters to lower case, so trying to log in as Root failed because you were effectively logging in as root. Needless to say, back in those days sudo wasn't a thing. I had visions of crashing the whole machine with the power switch to gain root access in single-user mode on reboot.

    Fortunately I found someone still had a root session open, so we were able to fix it. But for a long time after that I got into the habit of keeping an alias of root (an entry with a different name but still with a UID of 0) well down the password file.
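    The alias trick works because privileges follow the numeric UID, not the login name. In the traditional password-file format each line has seven colon-separated fields, the third being the UID. A minimal Python sketch that lists every UID-0 entry in passwd-format text; the sample data is made up, with the first entry accidentally capitalised and a hypothetical alias named `toor` further down.

```python
# Hypothetical passwd-format sample: name:password:UID:GID:GECOS:home:shell
PASSWD = """\
Root:x:0:0:Super-User:/:/bin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
toor:x:0:0:Emergency root alias:/:/bin/sh
"""

def uid0_accounts(passwd_text: str) -> list[str]:
    """Return login names whose UID field (the third of seven) is 0."""
    names = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed lines rather than guess
        name, uid = fields[0], fields[2]
        if uid == "0":
            names.append(name)
    return names

accounts = uid0_accounts(PASSWD)
print(accounts)            # ['Root', 'toor']
print("root" in accounts)  # False: the login name 'root' no longer exists
```

    Here `toor` still gets you UID 0 even though nobody can log in as "root" any more.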

    1. onefang Silver badge

      "Up and down arrow keys and the like on terminals issued escape sequence which tended to contain tildes."

      Modern USB keyboards on desktops running Unix variants still do that in the terminal program. Dunno about Windows.

      1. Doctor Syntax Silver badge

        "Modern USB keyboards on desktops running Unix variants still do that in the terminal program."

        It's not restricted to USB keyboards, in fact it's not the keyboard itself. The keyboard driver converts the key presses and releases into emulation of some sort of traditional keyboard input, typically that of the VT100 or one of its descendants.
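    As a rough illustration: on VT220/xterm-style terminals the tilde appears in the sequences for editing keys such as Insert, Delete, Page Up and Page Down, while in normal cursor-key mode the arrows send ESC [ A to ESC [ D. In vi's command mode, a lone `~` toggles the case of the character under the cursor and moves right. The Python sketch below assumes those conventional sequences (real terminals vary) and a deliberately simplified model of vi; the names are illustrative.

```python
ESC = "\x1b"

# Conventional VT220/xterm-style sequences (an assumption; terminals vary).
# Arrow keys in normal cursor-key mode send ESC [ A/B/C/D, with no tilde.
KEY_SEQUENCES = {
    "Up": ESC + "[A",
    "Down": ESC + "[B",
    "Insert": ESC + "[2~",
    "Delete": ESC + "[3~",
    "Page Up": ESC + "[5~",
    "Page Down": ESC + "[6~",
}

def vi_tilde(line: str, cursor: int) -> tuple[str, int]:
    """Simplified model of vi's '~' command: toggle the case of the
    character under the cursor, then move one place right (not past the end)."""
    if not line:
        return line, cursor
    ch = line[cursor]
    toggled = ch.lower() if ch.isupper() else ch.upper()
    new_line = line[:cursor] + toggled + line[cursor + 1:]
    return new_line, min(cursor + 1, len(line) - 1)

# Keys whose sequences end in '~' and could leave a stray tilde behind:
print([k for k, seq in KEY_SEQUENCES.items() if seq.endswith("~")])
# ['Insert', 'Delete', 'Page Up', 'Page Down']

# If a split sequence leaves vi seeing the trailing '~' as a command:
print(vi_tilde("root:x:0:0::/:/bin/sh", 0))
# ('Root:x:0:0::/:/bin/sh', 1)
```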

    2. Peter Mc Aulay

      I worked at a place that did this (a duplicate UID 0 user), until someone, following an audit recommendation and not knowing what it was for, did a "userdel -r" of that user.

  36. Paul

    Many years ago I inherited a Sun workstation to look after because I'd shown some interest in learning about it. I was fairly proficient in DOS, and had used Unix a bit at university. It was the only Unix box in the company, had been installed with the newfangled Mosaic web browser, and had a 4800 baud modem to attach to the internet, wow! People would log in and spend 15 minutes doing this new email thing!

    One job was to apply a bunch of patches to upgrade the system. The instructions said to reboot into single-user mode and make sure the /tmp directory was empty. I did that, and deleted all the files in /tmp. But I did "ls -la" and there were a load of files starting with a dot. So I did "rm -rf .*". After a long time, I wondered what it was doing. I ran "ls" and it failed (I forget exactly how), but it was clear the machine was f**ked. The rm command had traversed to .. and had deleted pretty much everything.
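    The trap is that `.*` is a glob, and in some shells `.` and `..` are among the names it matches, so `rm -rf .*` can hand `..` to `rm`. (Current POSIX `rm` is required to refuse operands whose last component is `.` or `..`, but older systems did not always do so.) A minimal Python sketch, using `fnmatch` on a hypothetical list of names, shows what `.*` matches compared with the commonly used safer pair `.[!.]*` and `..?*`. It also shows one way to empty a directory without globbing at all; `glob_matches` and `empty_directory` are illustrative names.

```python
import fnmatch
import os
import shutil
import tempfile

# Hypothetical names a shell might consider when expanding a glob in /tmp.
NAMES = [".", "..", ".X11-unix", ".lock", "..hidden", "core"]

def glob_matches(pattern: str) -> list[str]:
    return [n for n in NAMES if fnmatch.fnmatchcase(n, pattern)]

print(glob_matches(".*"))      # ['.', '..', '.X11-unix', '.lock', '..hidden']
print(glob_matches(".[!.]*"))  # ['.X11-unix', '.lock']
print(glob_matches("..?*"))    # ['..hidden']

def empty_directory(path: str) -> None:
    """Remove everything inside path, dotfiles included, but not path itself.

    os.scandir never yields '.' or '..', so nothing above path is touched.
    Symlinks are removed, not followed.
    """
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)

with tempfile.TemporaryDirectory() as parent:
    tmp = os.path.join(parent, "tmp")
    os.makedirs(os.path.join(tmp, ".cache"))
    open(os.path.join(tmp, ".lock"), "w").close()
    open(os.path.join(parent, "keep-me"), "w").close()
    empty_directory(tmp)
    print(os.listdir(tmp), sorted(os.listdir(parent)))  # [] ['keep-me', 'tmp']
```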

    The boss asked why I hadn't backed it up, since the machine had a floppy drive! I think it had an amazing 40MB drive, so I said it was impractical to feed in 60 floppies (720K), even if we had them. He wasn't impressed. He later bought an adaptor card to use with a Plasmon (I think) optical disk storage thing.
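    Taking the figures as remembered (a 40MB drive, 720K floppies, and assuming 1MB = 1024K), the floppy count works out as:

```latex
\frac{40\ \text{MB}}{720\ \text{KB}}
  = \frac{40 \times 1024\ \text{KB}}{720\ \text{KB}}
  \approx 56.9
  \quad\Longrightarrow\quad 57 \text{ floppies}
```

    So "60 floppies" is a fair round figure for a full backup.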

    So, I ended up reinstalling the system from scratch.

  37. JeffyPoooh Silver badge
    Pint

    "...backups taken some three months ago..."

    Isn't that -^ (ancient backups) an offense worthy of immediate termination?

    Mistakes happen. But months of intentional negligence is inexcusable.

    Perhaps I'm misunderstanding...

  38. jms222

    We had some old build scripts, dating from before virtualisation and containers became all the rage and before fakeroot was adopted, that had to run as the superuser (don't ask) and did this:

    rm -rf ${BUILD_DEST}/

    but didn't check the variable first.

    Now one advantage of proper hard disks is that you have a few seconds to realise what's going on.
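    In the shell, the usual guard is `rm -rf "${BUILD_DEST:?}"/`: the `:?` form of parameter expansion aborts with an error if the variable is unset or empty, instead of letting the command collapse to `rm -rf /`. The same idea as a minimal Python sketch. `BUILD_DEST` comes from the comment, while `safe_clean` and the extra check against the filesystem root are illustrative assumptions.

```python
import os
import shutil
import tempfile
from typing import Mapping, Optional

def safe_clean(env: Optional[Mapping[str, str]] = None, var: str = "BUILD_DEST") -> str:
    """Delete the directory named by an environment variable, refusing to act
    if the variable is unset or empty, or if it resolves to the filesystem root."""
    env = os.environ if env is None else env
    value = env.get(var, "")
    if not value.strip():
        raise RuntimeError(f"{var} is unset or empty; refusing to delete")
    target = os.path.realpath(value)
    if os.path.dirname(target) == target:  # true only for a root such as '/'
        raise RuntimeError(f"{var} resolves to {target}; refusing to delete")
    shutil.rmtree(target)
    return target

try:
    safe_clean({})  # as if BUILD_DEST were never set
except RuntimeError as err:
    print(err)  # BUILD_DEST is unset or empty; refusing to delete

with tempfile.TemporaryDirectory() as parent:
    dest = os.path.join(parent, "dest")
    os.makedirs(os.path.join(dest, "bin"))
    safe_clean({"BUILD_DEST": dest})
    print(os.path.exists(dest))  # False
```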

  39. Herby Silver badge

    Lesson learned...

    This is a general comment about lessons learned. If the "cost" of the failure is large, it is a lesson learned quite thoroughly. If the "cost" is minimal, you gloss over it and learn it many times before it sinks in. When it comes to backups, the initial "cost" is quite small and the lesson isn't learned, but when it becomes "large" it is like being hit on the head with a 2x4.

    So goes human nature (*SIGH*).

  40. Maelstorm Bronze badge

    Clients...

    Recently, I had a client who runs a bunch of Macs for their business. This was an industrial outfit where they repaired industrial equipment. One of the ladies who *used* to work in the office screwed up the secretary's workstation (there were two). The complaint was that all icons on the desktop had vanished. Being a Unix guru, I opened the terminal program and went looking. I found that somehow the owner of the Desktop folder had changed to a different user. Now mind you, this is inside the user's home directory. Furthermore, you have to be root to change a file's owner with chown, and the user account wasn't.

    Once I logged into the terminal as root, I was able to change the owner back to what it was supposed to be. A logout and login later, the icons were on the desktop again. Now, I am no expert on Macs, but I'm still left wondering how that even happened, since Unix systems (which Mac OS X is, btw) are supposed to prevent something like that from happening.
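    The diagnosis above can be sketched in Python: compare the owner UID of a directory with the current user's effective UID via `os.stat`, and map the UID to a name with the Unix-only `pwd` module. `~/Desktop` is the usual macOS location; `owner_report` is an illustrative name. This only detects the problem; changing the owner back still needs root (for example, `chown` run as root).

```python
import os
import pwd
import tempfile

def owner_report(path: str) -> dict[str, object]:
    """Return the owner of path and whether it matches the current user."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # UID with no passwd entry
        owner = str(st.st_uid)
    return {
        "path": path,
        "owner": owner,
        "owned_by_me": st.st_uid == os.geteuid(),
    }

# Check the user's Desktop folder if it exists (typical macOS location).
desktop = os.path.expanduser("~/Desktop")
if os.path.isdir(desktop):
    print(owner_report(desktop))

# A directory we have just created is owned by our effective UID.
with tempfile.TemporaryDirectory() as d:
    print(owner_report(d)["owned_by_me"])  # True
```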

  41. Maelstorm Bronze badge

    And apache marches on....

    Recently, I was a member of a web development team writing a custom application for a client using the LAMP stack. Part of the design was that web pages that required a huge amount of processing to generate on the fly, but didn't change very often, were regenerated on request by a manager through the web application. A large amount of the processing entailed many SQL queries and server processing to match up all the data. So, a manager made this request. They changed some of the data, and made the request again. The second request failed with a filesystem error. You know when they say hindsight is 20/20? The manager came to us, so we looked at the generated file. We tried generating it ourselves and got the same error. Remember, this system wasn't online yet. So I tried to make changes to the file directly, and we found that we couldn't save the changes either.

    After a short investigation, it was discovered that the owner of the file was www. Then it dawned on me that since the file is auto-generated, the web server is the owner of the file, and we, the developers, didn't have permission to alter it. Additionally, for some strange reason, the Apache web server was configured to use a umask of 0222 instead of 022. We had a long talk with the sysadmin who set the server up.

    It was minor, but still caused problems nonetheless. After this happened, I managed to get the root password of the server from a very reluctant sysadmin. Eventually, he saw it my way. I am not going to disclose the techniques that I used to get that password though in case he might be reading this.
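    The arithmetic behind the umask mix-up: a new file gets the mode the program requests with the umask bits cleared, i.e. `mode & ~umask`. Assuming the server requested the common default of 0666 for regular files, a umask of 022 gives 0644 (the owner can still write), while 0222 gives 0444, read-only for everyone including the web server that owns the file. That would explain the second regeneration failing. A minimal Python sketch using `os.umask` and `os.open`; the helper names and file name are illustrative.

```python
import os
import stat
import tempfile

def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a new file gets: the requested mode minus the umask bits."""
    return requested & ~umask

print(oct(effective_mode(0o666, 0o022)))  # 0o644
print(oct(effective_mode(0o666, 0o222)))  # 0o444

def create_with_umask(path: str, umask: int) -> int:
    """Create a file under a given umask and return its permission bits."""
    old = os.umask(umask)  # os.umask sets the new mask and returns the old one
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
        os.close(fd)
    finally:
        os.umask(old)  # always restore the process's previous umask
    return stat.S_IMODE(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as d:
    page = os.path.join(d, "report.html")
    print(oct(create_with_umask(page, 0o222)))  # 0o444
    try:
        open(page, "w").close()  # try to regenerate the page in place
        print("rewrite succeeded (e.g. running as root)")
    except PermissionError:
        print("rewrite failed: file is read-only, even for its owner")
```

    Note that even with 022 the file would be 0644, writable only by its owner (the www user), so the developers still could not have edited it directly.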


Biting the hand that feeds IT © 1998–2019