College student with 'visions of writing super-cool scripts' almost wipes out faculty's entire system

Monday has once more reared its ugly head, but brings with it the charming face of Who, Me?, El Reg's weekly look at cringeworthy events of readers' pasts. This week, we meet "Ted", who tells us of a time many years ago when he was at a local college taking a course in computing. "At the time, we were one of the first to go …

  1. Anonymous Coward

    Never gets old the tales of some fresh out of college/university student with bright ideas only to kill off a system or two.

    It's a good teaching tool. Never make that mistake again. *cough*

    1. big_D Silver badge

      No matter how careful you are...

      I was making changes to some code, I made some changes, then had a brainwave. I stopped what I was doing and made a copy (.cobol) of the source code I was working on (.cob), then went back to work on the .cob files, putting my brainwave into effect. Tested it, it worked perfectly.

      I then "cleaned" up the directory, wanting to delete the partially complete code (.cobol), only I got as far as del *.cob;* and hit enter... Then screamed! Then copied the .cobol back to .cob and did all the changes again, at least it only took 2 days the second time around, because I knew what I was doing... Lesson learned.

      1. Anonymous Coward

        Still in college, two weeks into my first part time development job* as the only employee ( literally just boss with an idea and me ) and before I'd gotten around to setting up VCS or even backups and I deleted the whole lot of my work by mistake.

        I had previously had all the files open but for reasons lost to the mists of time I had closed the text editor. Fortunately it occurred to me to dd if=/dev/mem of=~/mem and I stumbled across my files in one pretty much contiguous block. Two weeks of awful, terrible code saved.

        That was the day that I learned that you cannot be overzealous about backups.

        * How that company didn't go bust I don't know. I had very little idea what I was doing and the "firm" eventually hired another developer who was even less competent than me. It's still going and doing well as far as I know. I feel shame that people are likely still working with/on the code that dribbled out of my eighteen year old brain.

        1. Richard 12 Silver badge

          This is why I love git

          The moment I start a project, even a throwaway, it's in local source control and I commit as often as I save. It's trivial to do and (with some IDEs) can even happen automatically.

          If it turns out to work, I can make the result "public" without letting anyone see the early rubbish...

          1. Michael Wojcik Silver badge

            Re: This is why I love git

            Yes. Source control / change management Is Your Friend. Want to experiment? Take a private branch and bang away to your heart's content. Commit early and often.

            Though personally I always try to use remote source control, even if it's just to a server on a machine in the next room, for redundancy. I've had drives fail hard with no warning, so it's nice to have a copy of what-I-was-doing-5-minutes-ago to hand.

      2. Yet Another Anonymous coward Silver badge

        >Never gets old the tales of some fresh out of college/university student with bright ideas only to kill off a system or two.

        Some student in Finland wiped out his minix install when he accidentally connected his dialup script to /dev/root_partition instead of /dev/modem.

        And instead of reinstalling he decided he would get this home made alternative really working

    2. PK

      I still remember the day I found out chown -R .* goes up as well as down.
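
An editorial sketch of why that happens: in many shells the pattern `.*` also matches `.` and `..`, so a recursive command handed that glob can climb into the parent directory. The throwaway directory and file names below are made up for the demo. Whether `.*` really expands to `..` depends on the shell and its version (newer bash releases skip them by default, as far as I know), so the example prints the expansion rather than assuming it. The pattern `.[!.]*` is one common way to match dotfiles without ever matching `.` or `..`.

```shell
# Throwaway directory; the names below are hypothetical demo files.
demo=$(mktemp -d)
mkdir "$demo/work"
touch "$demo/work/.hidden" "$demo/work/visible"

# Preview what each pattern expands to instead of handing it to chown -R.
risky=$(echo "$demo"/work/.*)        # may include work/. and work/.. in some shells
safe=$(echo "$demo"/work/.[!.]*)     # dotfiles only, never . or ..

echo "risky pattern expands to: $risky"
echo "safe pattern expands to:  $safe"
```

Note that `.[!.]*` deliberately skips names that begin with two dots (such as `..foo`); that is usually an acceptable trade for never touching the parent directory.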

      1. Naich

        find is your friend

        These days I tend to use find for anything simpler than doing stuff in the current directory. You can make sure that it's getting exactly the files you want and then add "-exec..." to do the work on them.

        1. stiine Silver badge

          Re: find is your friend

          if you can remember the syntax...which I cannot, ever.

          1. Yet Another Anonymous coward Silver badge

            Re: find is your friend

            I always use: find | awk 'print the command' > doit.sh

            Then examine the doit for Murphy and his law, fix any odd edge cases manually and then run that.

          2. Dan 55 Silver badge

            Re: find is your friend

            Stick echo after -exec and run it so it shows you what it's going to do. Then re-run without the echo if you agree with that.

        2. ciaran
          Boffin

          Re: find is your friend

          Use the magic command "xargs". So for example

          $ find . -name "*.bak" | xargs echo rm

          Then actually do it...

          $ find . -name "*.bak" | xargs rm

          It's a must-have tool. Well worth practicing...

          1. JoelLkins

            Re: find is your friend

            $ find . -name "*.bak" -exec echo rm "{}" \;

            $ find . -name "*.bak" -exec rm "{}" \;

            Rule for exec: whatever is found replaces the curly brackets. Surround by quotes in case the filename has spaces or other junk, always end command with \;

            1. Michael Wojcik Silver badge

              Re: find is your friend

              Surround by quotes in case the filename has spaces or other junk

              That has no effect - the quotes are eaten by the shell when it word-splits the find command line. Unless your shell interprets {} specially, there's no point in quoting it.

              Find's -exec logic should already prevent word-splitting of the filename. Note, though, that this is not guaranteed by the Single UNIX Specification. The Rationale section for the find command in the SUS suggests that correct word-splitting will be done if you use the plus-sign terminator rather than the semicolon, but as that changes processing in other ways (it basically emulates xargs) it may not be appropriate in a given use case, and I think the Rationale is non-normative anyway.

              You might think that you can use escaped quotation characters around {} just in case you don't trust find, but whether {} is expanded as part of an argument to -exec is implementation-defined.
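
A quick way to check this on your own system rather than trusting anyone's memory: have `-exec` start a tiny `sh` that reports how many arguments it received. This is only a sketch with a made-up file name in a temporary directory. It shows what one implementation does and, as noted above, the standard does not promise the same result everywhere.

```shell
# Hypothetical demo file with a space in its name, in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/two words.bak"

# The inner sh prints its argument count: 1 means the name arrived intact.
unquoted=$(find "$demo" -name '*.bak' -exec sh -c 'echo "$#"' sh {} \;)
quoted=$(find "$demo" -name '*.bak' -exec sh -c 'echo "$#"' sh "{}" \;)

echo "unquoted {}: $unquoted argument(s)"
echo "quoted {}:   $quoted argument(s)"
```

If both counts match, the quotes around `{}` made no difference on that system, because the shell removed them before find ever saw the command line.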

          2. Michael Wojcik Silver badge

            Re: find is your friend

            [Use] the magic command "xargs". So for example

            $ find . -name "*.bak" | xargs echo rm

            In general, it's a good idea to use -print0 on the find, and -0 on the xargs, just in case some ninny has used a filename with a space somewhere in the tree. There's a performance hit (you're back to one execution of the command line per selected filesystem item, just as you are with find's -exec), but on modern systems it's usually negligible because process creation is so fast.

            Some might say that there's no point in find -print0 | xargs -0, and you should just use -exec. Mostly it's a matter of taste.

            1. Michael Wojcik Silver badge

              Re: find is your friend

              Some might say that there's no point in find -print0 | xargs -0, and you should just use -exec. Mostly it's a matter of taste.

              .... except, I should note, that find pays attention to the exit code of the program executed by -exec, which affects any find clauses after -exec, and the final exit code of find itself.

              find is powerful, but there are subtleties.

          3. Jamie Jones Silver badge

            Re: find is your friend

            $ find . -name "*.bak" | xargs rm

            *ouch* :

            $ mkdir -p 'game /etc/passwd over.bak'

            rm: ./game: No such file or directory

            rm: /etc/passwd: Permission denied

            rm: /over.bak: No such file or directory

            Even "cleverer" scripts can be thrown with the following variation:

            $ mkdir -p $'game\012/etc/passwd\012'

            $ touch $'game\012/etc/passwd\012/over.bak'

            $ find . -name '*.bak' | xargs rm

            1. bombastic bob Silver badge

              Re: find is your friend

              yeah this is where you need to investigate the '-print0' and '-0' options for find and xargs, respectively
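
To see the difference without risking anything, the sketch below counts the arguments xargs would pass on, using made-up file names in a temporary directory. It assumes a find and xargs that support `-print0` and `-0` (the GNU, BSD and busybox versions do; older strictly POSIX tools may not) and a temporary path with no spaces in it.

```shell
# Two hypothetical files, one with a space in its name.
demo=$(mktemp -d)
touch "$demo/game over.bak" "$demo/plain.bak"

# Newline/whitespace-delimited: "game over.bak" is split into two arguments.
split_count=$(find "$demo" -name '*.bak' | xargs sh -c 'echo "$#"' sh)

# NUL-delimited: every file name arrives as exactly one argument.
nul_count=$(find "$demo" -name '*.bak' -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "plain pipe passes $split_count arguments for 2 files"
echo "-print0 | -0 passes $nul_count arguments for 2 files"
```

Swapping the inner `sh -c` for `echo rm` gives the usual dry run, and only then `rm` itself.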

        3. JLV

          Re: find is your friend

          for the SQL minded, a DELETE is easily previewed with a SELECT

          Select *

          -- delete

          from foo where bar...

          Updates are unfortunately not quite as amenable.

          1. Anonymous Coward

            Re: find is your friend

            for updates i'm not sure about ( or deletes )

            select id from t where f='b'

            -

            update t set f='c' where id in (

            select id from t where f='b'

            );

          2. bombastic bob Silver badge
            Devil

            Re: find is your friend

            oh good point - so swap in 'ls' or 'echo' for the 'rm' as a dry run...

      2. Microchip

        chmod was what got me, same problem. Cue a good few hours long past midnight manually chmodding enough back to get the system to function, so I could do a proper restore off a known good set of ACLs... never made that particular mistake again!

      3. Rufus McDufus

        Yep - did that on a Solaris box a long time ago, and it didn't just go up one level but all the way to the root directory (and everything underneath). That might have been an early Solaris quirk.

        1. Dr. Mouse

          I found a Solaris "quirk" which ended up requiring a restore of several key directories on a main production server.

          My plan was to take a backup of some stuff and restore it in a different directory to examine the contents. So, a simple tar cvf backup.tar /path/to/important/directory gained me the backup. My problem came when I tried to restore it, on another server, for examination.

          cd ~

          tar xvf backup.tar

          "Why is the production server throwing errors?"

          "Erm..."

          Turns out that, while in Linux the tar command will strip off the leading /, Solaris' version doesn't. So, when I thought I was restoring in my home directory, it was in fact restoring it in its "original" position, overwriting many important files in the process. D'Oh!

          Luckily, this place had a fantastic backup procedure in place, no work was lost and, importantly, the guy who discovered the problem was the backup guy and was discreet...

          Lesson learned: Just because you know something similar to what you are working on does not mean you know the thing you are working on!
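
One way to avoid depending on which tar strips the leading slash is to build the archive with relative member names and to list it before extracting. A minimal sketch, assuming a tar that accepts `-C` for both create and extract (GNU and BSD tar do; check the man page elsewhere). All paths are throwaway demo paths, not the ones from the story.

```shell
# Hypothetical stand-in for the important directory, on throwaway storage.
demo=$(mktemp -d)
mkdir -p "$demo/src/important" "$demo/restore"
echo data > "$demo/src/important/file.txt"

# -C changes directory first, so member names are relative ("important/...").
tar -cf "$demo/backup.tar" -C "$demo/src" important

# Look before you extract: list the member names.
members=$(tar -tf "$demo/backup.tar")
printf '%s\n' "$members"

# Refuse to extract if any member name is absolute.
if printf '%s\n' "$members" | grep -q '^/'; then
  outcome="refused: archive contains absolute paths"
else
  tar -xf "$demo/backup.tar" -C "$demo/restore"
  outcome="extracted under $demo/restore"
fi
echo "$outcome"
```

The listing step costs a few seconds and works the same whether or not the local tar would have stripped the slash for you.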

    3. stiine Silver badge

      I don't think there's a better way to learn than real life fuckups.

      1. Nunyabiznes

        @stiine

        Preferably someone else's.

        1. Mark 85

          Re: @stiine

          Preferably someone else's.

          Human nature says otherwise. We may hear it, laugh or be appalled by it but it won't stop us from doing something similar ourselves. Sort of like being told not to touch a hot stove as a kid.

          1. Anonymous Coward

            Re: @stiine

            Mmmm, I can honestly say that the worst c@ckup I've ever seen will never be replicated by anyone who saw it happening. Thankfully it was nipped in the bud due to quick action, but it could have hosed hundreds of systems, including those critical for national infrastructure.

            (Details anon for very good reason.)

    4. Mark 85

      It's a good teaching tool. Never make that mistake again. *cough*

      I think making the "big" mistake makes us better at our jobs. We all have our "oh crap" moments and most of us learn from them. Most bosses usually understand only if we can recover from the mistake without too much permanent damage being done and never make that or a similar mistake again.

      1. Guido Esperanto

        then there are..

        those teflon coated mofucks, who seem to manage to screw up the simplest of tasks and then expect you to clean it up...

        often your boss....

        still revenge was a choice internet history served cold...

    5. David Woodhead
      Unhappy

      Re College student ...

      Looking at all the responses below the posting: this UNIX stuff is a piece of cake, isn't it?

  2. A.P. Veening Silver badge

    To err is human

    To really screw up, you need a computer.

    Ted had the right idea as such repetitive jobs are ideal for automation. Unfortunately, it is too easy to screw up, as he discovered.

    1. Lee D Silver badge

      Re: To err is human

      It's all too easy to screw up... if you don't plan and test.

      I'd have made the script generate a list. Source files, and final destination filename. I literally wouldn't have a command in the script capable of inflicting damage. But I'd still test it only on a sacrificial test user first anyway. And I'd have a little arrow to correctly indicate source-> destination. And I'd only name the variables things like source and destination so I knew.

      After a handful of runs on that one sacrificial test user, checking to see that it did what I'd wanted, I'd apply it to a real user. That would then tell me what it was intending and highlight anything that was a problem - i.e. that it didn't pick up the different user, or just the current folder (I'd still be seeing my test data, etc.). And then I'd check that it didn't actually do anything (modified dates on their folders). Then I'd try it on a handful more real users. Where I'd - speaking from experience! - pick up things like Unicode filenames, spaces in filenames, etc. that throw the script for six.

      Once those were combated, I'd *actually* run it, with a copy command, on a real user, that I'd copy elsewhere first. I'd expect to see a copy of their folder appear in the destination, and the last modified of the source folder stay the same. I'd then df or properties the folders in question to ensure they were the exact same size (i.e. it didn't miss anything!). Hell, that check might as well be in my script so it can yell if it sees a difference, no?

      Then I'd do small groups of ten or so folders (depending on the amount of folders, this could be done by doing all the A's, then the B's, and so on, or some similar division). Which are small enough to back up somewhere locally somewhere safe (i.e. NOT within the folders you're moving!) on either source or destination. There's no shame in automating a task only down to, say, 26 manual executions rather than hundreds or thousands, because you're making sure you're doing it right.

      But, of course, I would not be doing ANY of that on a system that wasn't backed up first.
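
The procedure described above can be sketched roughly as follows. This is an illustration under assumptions, not the script from the story: the function name `migrate`, the `plan`/`run` modes and the demo paths are all invented here. In `plan` mode it only prints `source -> destination` lines; in `run` mode it copies and never deletes, and `diff -r` does the yelling if the copy differs from the original.

```shell
# migrate MODE SOURCE DESTINATION
#   MODE=plan prints "source -> destination" lines and changes nothing.
#   MODE=run  copies each entry; nothing is ever deleted from SOURCE.
migrate() {
  mode=$1
  source=$2
  destination=$3
  for src in "$source"/*; do
    [ -e "$src" ] || continue              # empty directory: glob matched nothing
    dst=$destination/${src##*/}
    if [ "$mode" = run ]; then
      cp -Rp "$src" "$dst" || return 1
    else
      printf '%s -> %s\n' "$src" "$dst"
    fi
  done
}

# Sacrificial test user with a space in the name, on throwaway storage.
demo=$(mktemp -d)
mkdir -p "$demo/old/test user" "$demo/new"
echo notes > "$demo/old/test user/my file.txt"

plan=$(migrate plan "$demo/old" "$demo/new")
printf '%s\n' "$plan"
after_plan=$(ls -A "$demo/new")            # still empty: the plan touched nothing

# Real run, then let diff yell if source and destination differ.
if migrate run "$demo/old" "$demo/new" && diff -r "$demo/old" "$demo/new"; then
  verdict="copy verified"
else
  verdict="MISMATCH"
fi
echo "$verdict"
```

The `*` glob skips top-level dotfiles, which is exactly the kind of thing a run against a sacrificial user is meant to surface before real data is involved.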

      And how did I learn these lessons? By trashing a thousand users' home folders? No. By thinking about what would happen if I did... and by also "nearly" trashing one of the folders along that testing route because of a trivial bug (e.g. not taking account of spaces in the filename). People who *choose* to learn by disastrous experience alone after leaping into the problem without thinking "well, this is actually live data, I need to be careful" are the problem. Especially if they are inexperienced.

      It also suggests that such people are hired and put into these positions without having it drummed into them that they *don't* risk data. I had an 18-year-old apprentice who was subject to minimal-required-privilege permission delegation at all times. He never once lost data, despite re-imaging machines, handling storage, resetting user profiles, etc. And just a year or two later he was in charge of the site briefly while I was away and successfully restored an entire hypervisor setup from scratch after a site-wide power failure (crossed phases on an UPS made it hard-shutdown) without guidance - including a folder which remains in place to this day, because it makes me smile every time I see it: A folder on the main storage, dated, named with an expletive ("Help! It's all gone fecking wrong!", say), containing copies of all the existing data that was in place, including copies of all the VM images before he started trying to restore them, so that worst-case, he still had whatever was recoverable there before he needed to dive into backups.

      In 20 years of working network management, I have never deleted a file permanently. It's just that simple. (GDPR's right-to-be-forgotten is going to be a fecking nightmare, however, and I may have to resort to only a four-year backlog where I actually have to delete stuff rather than keep it on encrypted offline storage that I retire and destroy only when it's not been accessed for several years in a row!).

      I can describe every time I've lost data. First ever was our friend pressing "Y" on a CHKDSK without checking on my brother's first ever PC many years ago, even when it mentioned "cross-linked chains" and then proceeding to immediately delete a folder that happened to loop back to include the root (so you had, say, C:\WINDOWS\SYSTEM\WINDOWS\SYSTEM\ etc. etc.) and thus taking out the root of the hard drive... because C:\WINDOWS\SYSTEM actually "linked" with C:\... even then we didn't lose anything. We SCREAMED at him, launched ourselves across the desk, Ctrl-C'd the deletion as we saw DOS and other root-folder filenames whizz past, and literally recreated a bare CONFIG.SYS and AUTOEXEC.BAT from memory (in case the thing rebooted), restored the missing DOS files (the process couldn't take out the COMMAND.COM etc. for obvious reasons) but had a copy on a floppy of everything somewhere.

      I was once accused of losing data at a school I worked at. Turned out that the person in question claimed they had 10 years' worth of lesson plans that suddenly "went missing" when I upgraded all the system - an upgrade which hadn't touched any storage whatsoever, only local clients, and for which I had EVERY original local client disk in a box still from before I'd put in fresh blank disks to upgrade the OS with a clean image. As in I literally pulled a dated, numbered hard drive which was originally in their PC and searched it.

      Nothing to do with them being found to have used the same lesson plan for 10 years in a row (which they claimed was just a mistake and obviously they had them all "somewhere"). And absolutely they *must* have had 10 years of more recent lesson plans, obviously! I mean... their lesson plan that they'd written specifically for THAT year wouldn't have included a website that archive.org says has been offline for 7 years, would it? So, yes, I was never actually held to account for that, because we don't think they "lost" any data at all, and it was amazingly specific to have only lost lesson plans for one user, on non-centralised storage that nobody was ever configured to use, for only those years that they'd re-used previous lesson plans and got into trouble for, and not one single other file anywhere. Amazingly, they weren't on the backups either.

      If you work in IT, and you learn only from your own cockups, you need to read more. I can remember even back in the days of PC Pro, reading columns about the "sledgehammer" test - consultants hired to ensure the system is resilient, so laying a sledgehammer on the meeting room and saying to the IT guy "Quite how sure are you?" Even getting verbal permission from the CEO to trash one server and chalk it up to expenses only to see the IT guy gulp... If I remember those stories as a kid, I assure you that you can learn from them before you ever have to. It doesn't take anything special.

      Just think things through. Don't take chances with people's data. Ask if unsure. Don't assume anything ("Of course the destination will have enough space for all those files!"). And test first.

      Hell, I can remember my first Exchange Powershell foray. My first profile move. My first tweak of a GPO and login setup. Everything. Because I isolated, tested, and virtually always found something that suggested "Wow... have to watch out for / code around that". I've had a few heart-stopping moments, but they are of the order of "OMG, that was nearly an 8 hour backup restore there."

      It's ignorance to believe that you can just tinker unchecked with complex systems because you do so all day every day.

      1. sum_of_squares

        Re: To err is human

        tl;dr

        The guy could have done this:

        1) Copy the files via script

        2) Check if all files are there

        3) Delete all files

        No need to create test users, fancy screen displays and whatnot.

        If anything, I'd do a "dry run" with a "echo" before each copy command.

        Testing is for the faint hearted.

        1. dak

          Re: To err is human

          Downvoted for your last line.

          You describe a sensible programme of events that includes testing and then say that testing is for the faint-hearted?

          1. I3N
            Pint

            Re: To err is human

            Make sure Emergency Stop button is within reach ...

            G01 Z1.0 [as in clearance for even the simplest tasks] ....

            Zero Z ...

            Click start ...

        2. Lee D Silver badge

          Re: To err is human

          Not if the reason you're moving them is because you're low on storage space.

          And then you still have the problem... you're issuing a kill-everything delete command over an entire swathe of storage. That's far too easy to go wrong.

          And, as pointed out, you only need to switch source and destination and you've wiped out your originals in one fell swoop via a poorly-written script, and in doing so given yourself a "complete restore" job to get it back, rather than a "Okay, my test on the first guy went wrong, I'll just copy back his storage which I made a safe copy of".

      2. jmch Silver badge

        Re: To err is human

        "I'd have made the script generate a list. Source files, and final destination filename. I literally wouldn't have a command in the script capable of inflicting damage. But I'd still test it only on a sacrificial test user first anyway. And I'd have a little arrow to correctly indicate source-> destination. And I'd only name the variables things like source and destination so I knew.... etc etc"

        All well and good. Thing is, this is what you would have done had you been assigned the task now. I doubt that it's the same things you would have done as an 18-year-old.

        For sure if it was me let loose on that system at 18, I doubt it would have been recoverable!

      3. Stevie

        Re: To err is human

        It's all too easy to screw up... if you don't plan and test.

        Well where's the fun in that?

      4. Steve the Cynic

        Re: To err is human

        > If you work in IT, and you learn only from your own cockups, you need to read more. I can remember even back in the days of PC Pro, reading columns about the "sledgehammer" test - consultants hired to ensure the system is resilient, so laying a sledgehammer on the meeting room and saying to the IT guy "Quite how sure are you?" Even getting verbal permission from the CEO to trash one server and chalk it up to expenses only to see the IT guy gulp...

        I remember that one. The consultant (singular) was Jon Honeyball, and the tool was his well-used industrial-weight chainsaw, not a sledgehammer. It was a board-room power-battle, not (really) a technical issue.

        Honeyball was consulting *for*the*CEO*, who didn't think the CIO was as sure of the redundant servers as he claimed. And he was right. Honeyball suggested the chainsaw test, and the CIO raised a reasonable objection related to the cost of the server that would be destroyed. The CEO said that it wasn't a problem, and that he would sign an authorisation to replace it, seeing as how he had asked for it to be destroyed. The CIO then bottled out and had to admit that he wasn't as sure of the redundant architecture as he had claimed.

        1. tiggity Silver badge

          Re: To err is human

          Not the greatest idea to use a chainsaw on metal *

          Once saw the unfortunate effects of a neighbour using a chainsaw to take down an old timber structure that had a few substantial size nails used in its construction - results of the chain hitting the nail could have been very nasty, and my neighbour was very lucky to escape unscathed.

          * Actually, unless it was using a carbide-bladed chain (or similar) designed for dealing with metal. Even "professional lumberjack" chainsaws do not cope with non-trivial amounts of metal.

          1. Michael Wojcik Silver badge

            Re: To err is human

            There's a program on the US cable channel DIY Network named Renovation Realities. It shows people attempting and often failing at some DIY project.

            Sometimes it's because they're idiots who can't be bothered to learn anything about what they're trying to do. (DIY could have named it The Dunning-Kruger Show.) Sometimes the DIYers are well-intentioned and do know something about what they're doing, and just run into the sort of unexpected problems that all DIYers are familiar with.

            There's one episode where a homeowner wants to remove the stump of a tree that had grown into a chain-link fence. So he rents a chainsaw and starts carving away.

            When the saw kicked back, it cut through his shirt at his left shoulder and cut him just enough to get a good bleed going before he stopped it. A centimeter further and he'd have been looking at surgery, at best.

            They showed the clip several times, in slow motion. An excellent demonstration for the viewing audience.

            In one of Robert B. Parker's Spenser novels, the eponymous character admits he's "a little afraid of chainsaws". Smart man. I only have a little electric one myself, but the battery pack doesn't even go into it until I have all my safety gear on and I know exactly what cut I'm going to make, where and how I'll be standing when I do it, and what might happen in the process.

            1. Kiwi
              Coat

              Re: To err is human

              In one of Robert B. Parker's Spenser novels, the eponymous character admits he's "a little afraid of chainsaws".

              I can understand that. I spent a little over a year in "timber processing", running one interesting saw.

              Number of seconds working that saw : roughly 8,640,000 (50hrs x 48wks)

              Number of seconds being involved in an accident with that saw : 3

              Number of seconds spent in ER : 18,000

              A friend has a table saw that he uses quite often. In the 30+ years I've known him where he probably averages an hour a week on it, he's not had so much as a pucker-moment. But because of those 3 seconds of mine where I nearly had a 2x1 enter my chest cavity at high speed (lucky not to even break a bone!) we both give it some very healthy respect. I'm also the same with any powered saw as well. Or grinder. Flesh takes a long time to slow the momentum of those blades!

              (Yes yes, I know I'm late, on my way now...)

    2. Nick Kew

      Re: To err is human

      Scripts are good, but play in a sandbox. The real lesson: don't play with root, or give it to kids.

    3. sisk

      Re: To err is human

      Ted had the right idea as such repetitive jobs are ideal for automation. Unfortunately, it is too easy to screw up, as he discovered.

      Indeed. I learned long ago that the first run of any automation script needs to have the line that actually does the thing commented out and replaced with an echo. Once you look over the resulting output and confirm it's going to do what you want it to you can go remove the comment and let it do its thing, but you NEVER run an automation script without first confirming that it's going to do what it's meant to do rather than something that's going to take a week of work to fix.
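
A variation on the same habit that avoids editing the script between the rehearsal and the real run: route every dangerous command through a small wrapper that only prints it unless told otherwise. This is a sketch, and the names `run` and `DRY_RUN` are invented for the example. The wrapper defaults to the dry run, so forgetting to set the variable is the safe mistake.

```shell
# run CMD ARGS...: print the command unless DRY_RUN is explicitly set to 0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    printf 'would run:'
    printf ' %s' "$@"
    printf '\n'
  fi
}

# Hypothetical file to delete, in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/old report.bak"

DRY_RUN=1
run rm -- "$demo/old report.bak"           # only prints the command
[ -e "$demo/old report.bak" ] && survived=yes || survived=no

DRY_RUN=0
run rm -- "$demo/old report.bak"           # now it really deletes
[ -e "$demo/old report.bak" ] && deleted=no || deleted=yes

echo "after dry run the file survived: $survived"
echo "after real run the file is gone: $deleted"
```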

  3. Wellyboot Silver badge

    We've all been there

    Ted learns the 'Know where you are before hitting execute' lesson - it's not a good feeling.

    Nice boss though, keeping it together when faced with ongoing FUBAR, after all, the pre-change backups would have him covered wouldn't they?

    1. Anonymous Coward Silver badge
      Boffin

      Re: We've all been there

      Write the script to check its location (`pwd` and/or `hostname` output) before doing anything dangerous.
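
A minimal sketch of such a check, assuming a POSIX shell. The function name `guard_location` and the bogus host name are made up for the example, and `uname -n` is used in place of `hostname` only because it is available on more minimal systems.

```shell
# guard_location HOST DIR: succeed only when running on HOST from inside DIR.
guard_location() {
  expected_host=$1
  expected_dir=$2
  actual_host=$(uname -n)
  actual_dir=$(pwd -P)
  if [ "$actual_host" != "$expected_host" ]; then
    echo "refusing to run: on $actual_host, expected $expected_host" >&2
    return 1
  fi
  if [ "$actual_dir" != "$expected_dir" ]; then
    echo "refusing to run: in $actual_dir, expected $expected_dir" >&2
    return 1
  fi
}

# Usage: put the dangerous part behind the guard.
if guard_location "$(uname -n)" "$(pwd -P)"; then
  right_place="checks passed"
fi
if ! guard_location "some-other-host.invalid" "$(pwd -P)" 2>/dev/null; then
  wrong_place="refused"
fi
echo "correct host and directory: $right_place"
echo "wrong host: $wrong_place"
```

In a real script the expected host and directory would be hard-coded near the top, so that running it from the wrong terminal window stops at the first line rather than the last.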

    2. bombastic bob Silver badge
      Facepalm

      Re: We've all been there

      worst case scenario: the boss does it

      a) prototype device being tested at the last minute, several engineers doing hands on stuff on a weekend or late at night

      b) no other device available for the test except for that one prototype (i.e. no spare). And we're gonna fix the firmware problems as we find 'em.

      c) The test required comparison of 2 devices, though, one of which had 5V power, the other 3.3v power. And both had the same shaped power plug, and only one could be plugged in at a time.

      d) after an hour or two of tests, the boss plugs 5V into the 3.3v unit, which had no voltage regulator. we all went home. Yes, we saw the blue smoke. No, we couldn't put it back.

  4. Giovani Tapini

    I too have had that

    Oh S**T moment when you look at the command you just submitted and ... realise it was on the wrong system, or you forgot the important switch setting...

    The key of being a professional is knowing what to do next... Sometimes stopping the process can be worse than letting it finish and fixing stuff later.

    In terms of boss panics... I think I cause them more issues politically than technically by... accidentally telling more of the truth than they expect :)

    1. Saruman the White Silver badge

      Re: I too have had that

      I agree. Once logged in to a Sun workstation as "root" with the intention of cleaning out "/tmp" (it was really cluttered and causing problems). I entered the immortal command:

      rm -rf / tmp/*

      Note *very* significant space. After a couple of seconds my brain caught up with my fingers. Unfortunately it was too late and I spent the rest of the day reinstalling the OS.

      Lesson learnt, I have never made that mistake since!

      1. PerspexAvenger

        Re: I too have had that

        Similarly, mine was "/fileshare /dirtodelete".

        Huh. This is taking quite a whFUCKFUCKFUCK^C^C^C

        *Confess to boss, schlep off to find backup tapes and enumerate just how far down the alphabet we got*

      2. Rufus McDufus

        Re: I too have had that

        I found a script at a certain bank which did that. Well I found the script after it had destroyed a few systems. It had the immortal "rm -rf /$variable" where of course $variable hadn't been set.

        1. Nick Kew

          Re: I too have had that

          Whoa, that looks like the executive summary of a proper story. A script that didn't test $variable sanity before use, and out there in the wild!

          1. Korev Silver badge
            Linux

            Re: I too have had that

            And use set -u
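
A small sketch of what `set -u` does and does not catch, with `:` (the shell's do-nothing command) standing in for `rm -rf` so nothing can be harmed. The variable name `target` is made up. Each case runs in a subshell so that the failure stops the subshell and not the demo.

```shell
unset target

# 1. set -u turns use of an unset variable into an error.
if ( set -u; : "/$target" ) 2>/dev/null; then
  unset_with_u="expanded to /"
else
  unset_with_u="stopped"
fi

# 2. set -u says nothing about a variable that is set but empty.
target=""
if ( set -u; : "/$target" ) 2>/dev/null; then
  empty_with_u="expanded to /"
else
  empty_with_u="stopped"
fi

# 3. ${target:?message} stops on both unset and empty values.
if ( : "/${target:?target is empty or unset}" ) 2>/dev/null; then
  empty_with_guard="expanded to /"
else
  empty_with_guard="stopped"
fi

echo "unset variable, set -u:      $unset_with_u"
echo "empty variable, set -u:      $empty_with_u"
echo "empty variable, \${target:?}: $empty_with_guard"
```

So `set -u` is a good default, but a path built from a variable that may legitimately be empty still wants the `${variable:?}` form at the point of use.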

    2. Anonymous Coward

      Re: I too have had that

      On the other hand, I did once react quickly enough to Ctrl-C before the filesystem that was about to be wiped had finished mounting.

      What matters is written in big friendly letters on the cover of everybody's favourite guide.

      1. doublelayer Silver badge

        Re: I too have had that

        Another way to do this wrong is to read out terminal commands to people. I remember one particular occasion rather well, when I was at university and helping a beginner student who had a disk quota problem. They had run out of space because their code had many segmentation faults and they hadn't been deleting the resulting core dumps. The problem was that we had a few tools that produced core dumps with different names, so "rm core.*" wouldn't necessarily get them all. So I started reading them a command "rm, then a space, then asterisk core dot asterisk". They didn't quite type that exactly as stated.

        You probably know what happened next. I no longer read out terminal commands. I type them in myself or I'll write them down for you with a written notation to confirm before you run it. By the way, I did have extra rights and was able to recover a relatively new copy of their code from the grading system.
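
        For anyone who hasn't been bitten yet, a small sketch of why that one space matters, run in a throwaway directory (the file names and the `preview` helper are invented). It prints what the shell expands each command line to, one argument per line, instead of running it.

        ```shell
        #!/usr/bin/env bash
        # Hypothetical helper: show the arguments a command would receive.
        preview() { printf '%s\n' "$@"; }

        # Work in a scratch directory so nothing real is at risk.
        scratch="$(mktemp -d)"
        cd "$scratch" || exit 1
        touch homework.c notes.txt core.1234 vgcore.5678

        # Intended command: one pattern, matches only the core dumps.
        preview rm *core.*

        # Mistyped command: the stray space makes two patterns,
        # and the bare * matches every file in the directory.
        preview rm * core.*
        ```

        Running the real `rm` only after the preview looks right costs a few seconds and would have saved the student's code.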

        1. Olivier2553

          Re: I too have had that

          So I started reading them a command "rm, then a space, then asterisk core dot asterisk".

          I am starting to find that youngsters are more careful when typing commands that are voiced to them; maybe the new generation has learned to listen carefully rather than try to interpret what is being said.

          It mostly comes up when I give out my WiFi password, which is in an alphabet foreign to these Thai youngsters (for older persons, I type it for them) and includes spaces; usually they get it correct.

    3. Doctor Syntax Silver badge

      Re: I too have had that

      "accidentally telling more of the truth than they expect"

      ...or need to know.

    4. Anonymous Coward
      Anonymous Coward

      Re: I too have had that

      I did that yesterday. The command was "remove server from domain". The server 1) had the wrong prefix (i.e. was in the wrong data centre), 2) had the wrong suffix (i.e. wasn't the right one of two) and, best of all, 3) was the server I was going to be using for the next 5 hours...

      As it turns out, if you remove a server from a domain, you can add it back to the domain without breaking anything...

      1. LeahroyNake

        Re: I too have had that

        Yeah.... that works great with Exchange :0

    5. Guido Esperanto

      Re: I too have had that

      you need a stark *"oh sh**t" moment at some point in your life to make you realise you are not invincible, god of the opposite sex or god of computers..or perhaps all 3 at the same time.

      It adds a dose of realism and consequence to your lack of attention.

      Like upgrading an ms exchange environment using the disc at hand and only realising at the point the mailboxes don't all convert that somehow you have a standard version rather than enterprise.

      oops

  5. Saruman the White Silver badge

    Head of Faculty was being very fair

    So the only consequence of his screw-up was being made to buy the Head of Faculty a beer and being on the receiving end of a first-class bollocking. He got off lightly, however I suspect that the Head of Faculty realised that he had scared himself completely witless, but at least had been able to repair the damage.

    1. alain williams Silver badge

      Kudos to the Head of Faculty ...

      for helping to fix the problem *first* ... then think about bollocking the student.

      I know too many managers who will rant, rave & place blame (not them) as the first thing that they do.

      1. Anonymous Coward
        Anonymous Coward

        Re: Kudos to the Head of Faculty ...

        The best boss to have is one that has been there, done that, and been allowed to survive. Makes them both more empathetic, and better able to help with recovery sans panic, before they make you buy them a beer, and then zap you with the enhanced voltage bovine sizzle stick.

    2. Gene Cash Silver badge

      Re: Head of Faculty was being very fair

      No kidding.

      My lab manager ripped a roommate a new asshole when he installed that famous hacker tool, rsync.

      Complete over-the-top almost nuclear response to grabbing some GNU sources and compiling them.

    3. Olivier2553

      Re: Head of Faculty was being very fair

      If we admit that every one of us will make such a mistake at least once, better be it while we are still junior and hopefully cannot cause too much damage. After that close call we should be more immune to a repeated mistake, so why should a manager fire an employee who has just improved himself?

      Damage is done and cannot be undone. If money is to be lost, it has already been lost; firing the employee will not magically restore it. The only thing firing the employee does is help release the manager's frustration, and if the manager needs that sort of release, he is the one whose position should be challenged.

  6. defiler

    Drive mirroring.

    Yeah, you know the old one. Replace the dud drive in a mirror set and accidentally mirror the new drive onto the existing data...

    Never did it myself, but in my NetWare days, my boss did it. To the backup tapes!

    1. Alister

      Re: Drive mirroring.

      accidentally mirror the new drive onto the existing data

      <holds up hand>

      Yep, I've done that, and spent way longer than I should have trying to work out where the data had gone, too.

      Thankfully, I'd done a full backup first, so wasn't a complete idiot.

      1. Yet Another Anonymous coward Silver badge

        Re: Drive mirroring.

        Done that on a fancy new "can't lose any data with this beauty" raid system.

        It seems that WindowsNT and the Raid bios number drives differently.....

      2. Montreal Sean

        Re: Drive mirroring.

        @Alister

        I've done the whole "replace the failed drive in a mirror set and copy the new drive to the old" thing while out on a warranty repair case.

        I hadn't done a backup first (not part of the warranty service), but I had asked the client if he had a backup before I headed to site. And again when I got there. And again after shutting down the non-hotswap "server" before pulling out the failed drive. And again when I inserted the new drive. And finally before putting power to the system.

        When the mirror completed rebuilding there was no boot device.

        Reboot, check BIOS boot order is correct.

        Boot with a live Linux CD, both HDD completely blank. Crap.

        Ask client for install dvd and backups.

        They don't have either one, they had never done a backup.

        I hope they learned their lesson and took backups from then on.

    2. steviebuk Silver badge

      Re: Drive mirroring.

      I did that but only to my home Netgear ReadyNAS. That claimed it was hotswappable. So when one drive failed I put a new drive in. It should have started to copy the data over but it didn't. For some reason it decided to copy the blank drive to the data drive.

      When I realised what had gone wrong, I left R-Studio running all week checking the drive so I could restore it. Pretty much got everything back.

    3. John Brown (no body) Silver badge

      Re: Drive mirroring.

      "Yeah, you know the old one. Replace the dud drive in a mirror set and accidentally mirror the new drive onto the existing data..."

      Coming from a DOS/Windows background, experimenting with new installs just involved disabling the primary master "C" drive in the BIOS so the installer would see the other drive as a "C" and do the install. Fine, no issue. Then I started playing with FreeBSD. Moral of the story: Never assume an OS will honour the BIOS config settings and ignore "disabled" drives.

      Luckily this was my own system and I did have a backup of my data.

  7. Jimboom

    Macros

    I had my own tale when I was still in school and helping out at a company that one of my parents worked at. My job was to go delete the thousands and thousands of rubbish or dead entries at the end of a database.

    So, after I had found the macro function I thought I would be really clever by just setting it to go into entry, delete, then next entry and delete, recursively until I stopped it. Then running it and sitting back. I then thought I would be even cleverer by going into the middle of the rubbish entries and start a 2nd instance of the macro (so that they didn't overlap and cause a problem). I then went to lunch very proud of myself. Came back to the desk to find that when you get to the end of the entries then it automatically jumped back to the beginning of the database and the good objects... which is what the 2nd macro did. It was halfway through the A records by the time I stopped it.

    Cue one rather panicked call to IT and a restore later and I was back at square one (with all the deleted entries back) with a promise that I would do it the manual way and no more macros would be used.

    Fun times.

  8. Shadow Systems

    I rebooted the universe...

    If anyone gets a "divide by cheese error" please file a trouble ticket with the help desk & I'll trigger another reboot.

    1. IHateWearingATie

      Re: I rebooted the universe...

      …. or just give the ants some jam to eat

  9. Anonymous Coward
    Anonymous Coward

    I almost pulled the wrong drive out of a raid 5 array once on an on-site many years ago. You learn a lot from almost mistakes.

    1. Rich 11

      I did the same with a drive with a faulty drive activity light. The orange LED would happily flash when tested but not the green one, even though the drive itself was perfectly fine. After ten months of getting used to this I wandered in one Monday morning in a semi-asleep state, having seen an alert about an actual failed drive, and without thinking pulled the one showing no light from the bank of drives. Fortunately it all rebuilt OK, since the failed drive was in the other array.

      1. Anonymous Coward
        Anonymous Coward

        I'm guessing you know ACME, they are brilliant but at the start they were poor.

    2. John Brown (no body) Silver badge

      "I almost pulled the wrong drive out of a raid 5 array once on an on-site many years ago. You learn a lot from almost mistakes."

      Arriving at a customer's site at 7:30pm (yes, it was that urgent, out of hours rate) to replace a drive in a three drive array, there was only the security guy left on site. It was so important that no one with any authority bothered to stay behind to make sure everything went to plan. But someone had kindly pulled the faulty drive out for me. Except the LEDs indicating activity or fault were to the LEFT of the drive, not the RIGHT.

      I left a written note, countersigned by the security guy, telling them that without username/password or knowledge of their system I was unable to even touch their system without further authority in case of any data loss etc. No way in hell was I putting the good drive back in and swapping the bad drive out in the hope that it might be alright.

      When they called my boss in the morning, screaming and ranting, he told them in no uncertain terms that they were in the wrong and he'd be happy to terminate contract once they'd paid for the call out. Apparently it all went very quiet, before they agreed I would come out again and work with their IT guy to swap out the HDD once he had brought the degraded system back up.

      Personally, I still to this day think there was something else going on with their system, something catastrophic, and we were being set up to fail and end up paying for data recovery or some such because theoretically it should have been a simple case of replacing the good drive and bringing the system back up with the RAID in a degraded state, at which point replacing the bad drive would instigate a rebuild. Or the IT guy had screwed up and was trying to cover his arse.

      Oh, and once the bill was paid my boss cancelled the contract with them.

  10. howy
    Mushroom

    Live system - Whilst training

    Did something very similar in late 80's - training on Unix and trained the hard way to create new accounts/directories etc ( editing /etc/passwd directly ) - demonstrated how to create homeless logons etc.

    Finally, on last day of course we were introduced to "easy" way to add/delete users - aka helpful unix scripts adduser/deluser.

    Thought I'd tidy up !!! So I duly run deluser on the homeless account created above.

    Helpful script prompts "Are you sure you want to delete ?" (the spaces here are critical and are hiding a fatal flaw for those uninitiated in Unix)

    Duly press Y and enter and wait patiently.

    After a couple of seconds "rm not found" starts scrolling up the screen ????? Call over the trainer who promptly goes white.

    Yup - deluser found no home directory and promptly issued rm -rf / <SIGH>

    Full rebuild from scratch required - system unavailable to go live following week.

    So not ALWAYS a noob error in scripting.

    1. Giovani Tapini

      Re: Live system - Whilst training

      Good story, and from my experience, and indeed a number of the other stories, always check scripts.

      They are generally rough and ready and do not include failsafes like parameter checking to ensure they don't target root etc.

      I have always told my people to be careful when deploying scripts to "dummy proof" them even for one-off's...

    2. J.G.Harston Silver badge

      Re: Live system - Whilst training

      That's why I usually put paths in quotes in prompts, so you get something like:

      Delete directory '/user/fred'? (Y/N)

      Delete directory '' (Y/N)

      1. Nick Kew

        Re: Live system - Whilst training

        Not good enough.

        If the task is nontrivial enough to be worth scripting, the user will be multitasking with something else, and may not be paying much attention. Your second question comes across as "Delete [implied, unnamed] directory?"

        And of course if it's doing anything in bulk, the user will soon get fed up with repeatedly hitting "Y" (there might just be some buffered "Y"s when your question scrolls up), and want to automate it away. Your script needs to take responsibility for sanity checks.
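
        As a rough illustration of the script taking responsibility itself, here is a sketch of a check a deluser-style script could run before any `rm`. The function name and the assumption that all home directories live under `/home` are mine, not the original script's.

        ```shell
        #!/usr/bin/env bash
        # Hypothetical check: returns 0 only for a non-empty path strictly
        # below /home (assumed location of home directories).
        is_safe_home() {
            case "${1-}" in
                ""|/|/home|/home/)  return 1 ;;   # empty, root, or the parent itself
                *..*)               return 1 ;;   # conservative: reject anything containing ..
                /home/*)            return 0 ;;
                *)                  return 1 ;;
            esac
        }

        for dir in "/home/fred" "" "/" "/home/../etc"; do
            if is_safe_home "$dir"; then
                echo "would delete '$dir'"
            else
                echo "refusing to delete '$dir'"
            fi
        done
        ```

        With a check like this the homeless account falls into the "refusing" branch without the user ever being asked to read a prompt.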

    3. DeadpoolsITguy

      Re: Live system - Whilst training

      Ha, I worked at a company that had us training on their live telephony system and the trainer had not used this version of Mitel. Cue her asking me to run a command that did nothing and then asking me to run it again to no avail.

      5 mins later SD manager comes rushing in asking if one of us did something to bring down the system. we all look blankly as we had moved on and were on the next steps.

      2 days later my account is restricted and im getting grilled about why i took the system down.

      still to this day i don't know what command the person wanted me to run and i ended up leaving the place as the manager was just batshit crazy.

      1. Diogenes

        Re: Live system - Whilst testing

        Live system while testing -

        A large Australian telco which will remain nameless was introducing a new variant of AXE exchanges somewhere in the late 90's or early 00s. My quick & dirty system ("will only be needed for 6 months until the xxxx system is changed to handle what *** does") was the embodiment of "there is nothing more permanent than a temporary system", and was only retired after 23 years of use.

        Anyhoo, had tested on the 'model' exchanges & all was good, so I was given blocks of telephone numbers & had a small block of LI3s reserved to play with on the live 'pilot' exchange (one of the biggest metro exchanges in the country)... issued the command to create a PBX group, and tried to add the 'root' number ... the terminal into the exchange hung, came back after a minute ... tried entering the command manually so that I could continue the script .., terminal hangs again, hmmm , go get a cuppa while waiting for it to unclag ... come back ... try again hangs again ... tries a different command to hook that number up as an ordinary subscriber, no hang, disconnect the number , tried the PBX command - terminal hangs... phone rings "Diogenes what the expletive expletive expletive are you doing to xxx exchange? You are making the expletive drop ALL calls and restart". Me "Just trying to hook the master number up to the PBX group I just created" "What's the command?" , recited the command , flip flip flip - "ah PB..... SNB= .. add number to PBX group blah bla yeah exactly as per the book " tappity tappity tap "oh expletive expletive expletive - stop testing until we tell you!"

        Ahhhh the good old days :-)

  11. Version 1.0 Silver badge

    We've all been there.

    Great story to start the week El Reg!

    Mistakes are useful - we learn from our mistakes - once this happens you learn to make a backup first, test the script, and write a debug mode into the scripts that reports everything it would do but doesn't do any of the things until you edit a variable at the start of the script.
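
    One possible shape for that debug mode, as a sketch only: `DRY_RUN` and `run` are names picked for illustration, and the file being removed is a scratch file created for the demo.

    ```shell
    #!/usr/bin/env bash
    # Edit this one line to 0 once the reported actions look right.
    DRY_RUN=1

    # Hypothetical wrapper: every destructive command goes through run().
    run() {
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "DRY RUN: $*"
        else
            "$@"
        fi
    }

    workdir="$(mktemp -d)"
    touch "$workdir/old.log"
    run rm -f -- "$workdir/old.log"    # only reported while DRY_RUN=1
    ```

    The script defaults to the harmless mode, so forgetting to think is safe and doing damage takes a deliberate edit.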

    I have told folks for years, "There are two types of users in the world, those who have permanently lost their data ... and those who are going to lose their data."

    1. Yet Another Hierachial Anonynmous Coward

      cast the first stone

      Let he, who has never been logged onto multiple machines and multiple sessions, cast the first stone..... Or something like that.

      1. stiine Silver badge
        Pint

        Re: cast the first stone

        I can only upvote you once, so have a virtual drink on me

      2. Stevie

        Re: cast the first stone

        All my colleagues use Putty for ssh connections into the enterprise farm. I use Attachmate Reflections which the department will supply on demand. Typically we each have several sessions open at a time to both dev and prod servers.

        I have a tool that can assign pre-set color profiles to any session. They have pre-set profiles that they can never remember how to use.

        Guess who *doesn't* get confused between production (black text on white b/g - usually unless I feel like a change), dev (green text on black b/g, or black text on a yellow b/g, or green text on purple b/g*) and AWOOGA! EMERGENCY NEEDS SORTING NOW! (White text on red b/g or red text on white b/g depending on how tired I am and how late in the day it is)?

        Not only that but my Reflections keyboard can be mapped to save much wear and tear on the old fingers and make zipping back and forth over the filesystems of machines easy, especially over a slow and sometimes glitchy remote connection.

        But my terminal emulator isn't "cool" and lacks street cred or something.

        * I call this one "chainsaw" after one such device I saw in Home Despot sporting this virulent color scheme.

        1. Anonymous Coward
          Anonymous Coward

          Re: cast the first stone

          Stevie,

          Thanks for the reminder about Attachmate ...... last used it over 30 years ago !!!

          (Did not know it still existed ??? ..... although on checking it is a 'MicroFocus' product now !!!)

          The colour by session idea was always useful ..... but be aware that an 'accidental' control character or three can upset this. :)

          Also use the ability to change the command prompt to a unique ident for each system as well.
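
          For bash users, a sketch of the prompt idea. The convention that production hostnames start with "prod" and the `prompt_tag` name are assumptions for the example; adjust to whatever naming your site uses.

          ```shell
          #!/usr/bin/env bash
          # Hypothetical naming convention: production hosts start with "prod".
          prompt_tag() {
              case "$1" in
                  prod*) echo "PROD" ;;
                  *)     echo "dev" ;;
              esac
          }

          # \h is bash's short hostname, \$ shows # for root and $ otherwise.
          PS1="[$(prompt_tag "${HOSTNAME:-$(uname -n)}") \\h] \\$ "
          ```

          Put in each machine's shell startup file, this labels every session regardless of which terminal emulator opened it.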

        2. Anonymous Coward
          Anonymous Coward

          Re: cast the first stone

          I use Reflections as well. Production systems get a red background, development a blue one.

          Red means stop and think about what you are about to do...

  12. Unicornpiss
    Pint

    Dawning horror

    I'm not going to share a specific anecdote, but let me say (and I know we've all been there), that there's nothing more horrifying in IT than running a script like this that you expect to take a few seconds to complete, and seeing it run.. and run.. and run.. while you think: "What is it doing??" and then you realize it's doing something very, very bad. Something that's likely to skyrocket your blood pressure and wreck the rest of your day, at the least.

  13. Waseem Alkurdi

    This happened to me a few days ago!

    A classmate, female, presented to me with an iPhone 5 that had symptoms of, well, a locked iPhone whose passcode is forgotten. She asked me to bypass the passcode, so I thought, "Better borrow a laptop then" (Mine is on Linux).

    Restored the phone, gave its owner a thumbs-up, only to find the owner broken and in tears.

    Turns out she wanted me to bypass the damned thing FOR THE DATA.

    Cue an hour trying to explain encryption to her, to no avail, because she believes I was too incompetent to break Apple's encryption.

    Sigh.

    1. Doctor Syntax Silver badge

      Re: This happened to me a few days ago!

      The lesson to learn here is always to find out what you're actually being asked to do.

      1. Waseem Alkurdi

        Re: This happened to me a few days ago!

        Or more precisely, get the luser to really confirm it. I actually told the girl I was going to restore the phone, but she said a'ight, perhaps not knowing why the hell Apple would make the word "restore" translate into "wipe my childhood memories to no return", or possibly because my vocal cords weren't at their best, making my confirmation sound more like a whisper to self.

        Anyhooooo, lesson learned! (Get lusers to sign an accident indemnity form! xD )

    2. Loyal Commenter Silver badge

      Re: This happened to me a few days ago!

      She asked me to bypass the passcode, so I thought, "Better borrow a laptop then" (Mine is on Linux).

      My first thought would be "nope", followed by "why don't you know your passcode? Is this actually your phone?"

      1. Anonymous Coward
        Anonymous Coward

        Re: This happened to me a few days ago!

        I've successfully bypassed the passcode on my own iPad, having failed to write it down anywhere. Apparently if it performs an unencrypted backup to a computer, it creates a plaintext file with the seed and passcode hash, and there's only 10000 possibilities...

        1. Waseem Alkurdi

          Re: This happened to me a few days ago!

          For that to work, it must have been synced to that computer before ... identifying you as a possible owner.

          And you don't need to hack the passcode for that ... the backup alone is enough, provided you get iTunes to make one.

      2. Nick Kew
        Facepalm

        Re: This happened to me a few days ago!

        @Loyal Commenter - would your judgement necessarily be that clear if you had a pleading girl in front of you?

        1. Waseem Alkurdi
          Angel

          Re: This happened to me a few days ago!

          Precisely! xD

        2. Loyal Commenter Silver badge

          Re: This happened to me a few days ago!

          I'd advise thinking with your brain and not your gonads...

  14. Locky
    Mushroom

    Don't forget to test all use cases

    Automation fubars are all too fun.

    Back in the early days of Powershell I had the bright idea that any user moved to a "Staff Leavers" OU should have their out of office set by an "I have now left the company" script, which I created and tested, or so I thought.

    All was going well, until a day when there were no users in the folder, and I found a feature of Powershell 1.1 where if no results were found all mailboxes were selected, setting every out of office from the MD down.

    Oh how we laughed in the forthcoming meetings....
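
    The missing test case is the empty one. A shell sketch of the guard (not the original Powershell; `list_leavers`, `LEAVERS` and the function name are stand-ins invented for the example):

    ```shell
    #!/usr/bin/env bash
    # Hypothetical stand-in for the directory query; prints one user per line.
    list_leavers() { printf '%s' "${LEAVERS-}"; }

    set_out_of_office_for_leavers() {
        leavers="$(list_leavers)"
        # Stop here on an empty result instead of letting the next
        # command decide what "no selection" means.
        if [ -z "$leavers" ]; then
            echo "no leavers found, nothing to do"
            return 0
        fi
        printf '%s\n' "$leavers" | while IFS= read -r user; do
            echo "would set out-of-office for $user"
        done
    }

    LEAVERS=""
    set_out_of_office_for_leavers
    ```

    The point is only that "zero results" gets its own explicit branch, whatever the tool does by default.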

    1. Nick Kew

      Re: Don't forget to test all use cases

      A bad workman blames his tools.

      But sometimes the tools really deserve blame.

  15. Doctor Syntax Silver badge

    If you think it's going to be cool you're probably doing it wrong. If you think it's going to be super-cool you're certainly doing it wrong.

    Eventually you're old enough for your criteria to be that it works and doesn't do any damage.

    1. Groaning Ninny

      Oh so true

      All those wonderful plans for that uber-admin swiss army knife script. Nope, just something small that does one thing right.

      1. Alister

        Re: Oh so true

        Nope, just something small that does one thing right.

        You wouldn't like to tell Lennart Poettering that, would you?

        1. Doctor Syntax Silver badge

          Re: Oh so true

          "You wouldn't like to tell Lennart Poettering that, would you?"

          Wouldn't make any difference if you did.

        2. Groaning Ninny

          Re: Oh so true

          I'd not realised what I said could have been a stab at systemd. Opportunity missed ;-)

  16. Sureo

    On a complex transfer to production I unfortunately left out one element .... I can still hear my project leader proclaim "You did WHAT?"

  17. Andytug

    Right-click, reset permissions on all child objects......

    on one folder, or so I thought......oops no that would be the entire workgroup!

    Managed to stop it before it had done more than the first two folders, and put them back before anyone noticed.

    Also, my old boss had perfect timing. Big room re-org, I take one look at the spaghetti junction that is the patch panel and think "If I try to move any of this it will be even worse than it is now". So I data capture it all, then rip the whole lot out as a big tangle on the floor. Which was very satisfying, until at that exact moment my boss walks in, sees the empty patch panel and does a complete "WTF!!" I explain, he takes a very deep breath, says, "I'll trust you......don't bu&&er it up!" and walked out again.

    It did go back fine (and a lot neater), but the look on his face was priceless.

  18. hmv

    Don't Shout "Fuck" in The Data Centre

    A long time ago, I was moving an AlphaServer 4100 from FDDI to Ethernet; around about the time that auto-negotiating the speed, etc. was "problematic". After realising that autonegotiation hadn't negotiated anything mutually workable, I let loose a loud "fuck" (as stress relief) and started nailing it in the firmware environment (can't guarantee that but it definitely required rebooting the whole thing - which wasn't that quick). At each failure, I'd let loose another "fuck" until the network manager came across and suggested I not do that because I was making the director nervous.

    That plus other events leads me to the firm conviction that senior managers should be allowed nowhere near the DC when important work is going on.

    1. Ken 16 Silver badge
      Trollface

      Re: Don't Shout "Fuck" in The Data Centre

      It's a mystery that f-uck isn't a valid Unix command

      1. fnusnu

        Re: Don't Shout "Fuck" in The Data Centre

        and unf-uck...

      2. Waseem Alkurdi

        Re: Don't Shout "Fuck" in The Data Centre

        Well, it is, it's spelled 'fsck' though.

  19. Ken 16 Silver badge
    Devil

    That sounds Agile to me!

    Fail fast, fail often, leave not one stone standing upon another, burn the crops, salt the Earth etc. I forget exactly how it goes, but by their works shall ye know them.

  20. Anonymous Coward
    Anonymous Coward

    Doing system-wide upgrades from WinXP to Win7. Users had been told for six months to ensure everything was in H and not on C. In the appointed week we went around at the end of business ensuring all the PCs were left switched on so they could be re-imaged over the network. Went quite well, users would come in the next day with a new system up and running.

    We would do a morning walk-around to find the odd handful where the imaging had fallen over and do a manual build. You're guessing where this is going... the handful of users screaming "where's all my data gone??????" Err.... it's in your home directory, y'know, drive H for Home, where you've been putting it, where you've been told for the last five years to put it, and where you've been told for the last six months to ensure all your data is put.

    "Restore my drive C to what it was!!!!!!!!"

    Sorry, no can do, the contents are completely low-level overwritten by the new system.

    The next roll-out project I was on was the old tedious method of removing the hard drive and installing a new hard drive, and keeping the old hard drive in a cupboard for two weeks before being wiped and re-used, just to protect against the Drive C idiots.

    1. Nunyabiznes

      Same, but luckily we have statute to point at when the user asks where their locally saved data went.

    2. Nick Kew

      Your drive C idiot (being your vindictive boss) was on a long holiday. Better make that two weeks at least long enough to cover leave, including extended things like maternity.

      Or just back the d*** thing up!

    3. Luiz Abdala

      "old tedious method of removing the hard drive and installing a new hard drive, and keeping the old hard drive in a cupboard for two weeks before being wiped and re-used,"

      Yep. I upgraded my father's machine like this for 10 years in a row, except the older drive would be hanging about as slave, along with the newer drives.

      At one point, his machine had a sata 500GB, an 80GB, a 40GB and a 10Gb hanging in there on IDE cables. All bootable, each one with a valid Windows setup.

      How did I find they all worked? Two words: bios battery. The battery was long gone, he pulled it off the plug and - surprise! - it was running windows 98 instead of XP as it turned back on.

      I call it "resilient mode". Even if it kicks the bucket, it loads an ancient working version of the system. hahahah

    4. Trixr

      The big failure there was assuming the users followed instructions.

      It's actually not that hard to ask a user where they keep their crap if you're doing a manual rebuild and have a script ready to copy it somewhere temporary on the network (that's not their usual home drive, since 9 times out of 10, it IS crap, like copyrighted media files, and they won't be allowed to store any of that stuff in their real home drives. Or family snaps that they can bring in a USB for you to copy to.).

      Yes, it makes doing those workstations slower, but you can get on with other stuff while they have to wait the extra time. For you, it saves a lot of whinging and negative reports to management.

      1. Anonymous Coward
        Anonymous Coward

        Very often the users *don't* *know* where they save their crap. It's just "on the computer". The biggest task of the walk-around on the next day is walking users through File-Open dialogs as they all just fetched files via the recent files dialog. "Why aren't my files in the recently-opened files list?" Because you haven't recently opened any files yet!

        On some systems we discovered users saving files inside application directories - which really bit them when the OS or application was upgraded or, worse, the license expired and the application killed itself.

    5. defiler

      Group Policy is your friend.

      Redirect My Documents etc. Hide the C/D/E drives. Map a home drive to hold their crap. Do this early before they start filling the local drive with pish.

      On the other hand, fixing these little loopholes just means that the users find something even more ludicrous and obscure to trip up over.

      1. John Brown (no body) Silver badge

        "Redirect My Documents etc. Hide the C/D/E drives. Map a home drive to hold their crap. Do this early before they start filling the local drive with pish."

        Ideally, so early that it was all put in place when networking was introduced. In the real world, networks often grew organically as PCs were added. It might even be that the business grew enough that it became possible to standardise on make/model of PC, but all the network magic was set up back when networks specialists were expensive to hire and even more expensive to contract in. Or the company grew and bought other companies or was itself bought and each part of the various mergers over time all do things in their own way and no one knows how it all works, but it does, so don't "fix" it. If you do, it'll break or it will cost a lot. Or worse, both.

        Only last year I was dealing with a large org with a fleet of 4,500 PCs and they were still doing imaging from a boot device (they had recently upgraded to USB sticks from DVDs), not over the network. There was a plan in place as they were about to upgrade everyone to Win10 and bought the relevant licences for network deployment.

      2. Anonymous Coward
        Anonymous Coward

        "Do this early before they start filling the local drive with pish."

        That often requires a time machine.

  21. David Robinson 1

    For reasons lost in the mists of time, I'd made a copy of /usr on a system. Time came to remove the copy.

    root@server # rm -rf /usr

    Yep, absolute path not relative path.

    There's the one rule we live by in our office: "No five-minute changes on a Friday afternoon." Violating this rule usually means you'll be staying back 3 hours to undo the chaos such a change visits upon the system.
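
    The absolute-versus-relative slip above is the kind of thing a small guard can catch. What follows is a minimal sketch, not anything from the original system: `safe_rmtree` and the `PROTECTED` list are hypothetical names, and the rule (only delete things that resolve to somewhere strictly inside a stated working directory) is an assumption about what "safe" should mean here.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical list of system directories that must never be removed.
PROTECTED = {Path("/"), Path("/usr"), Path("/etc"), Path("/bin"), Path("/home")}


def safe_rmtree(target, workdir):
    """Remove target only if it resolves to somewhere strictly inside workdir."""
    workdir = Path(workdir).resolve()
    # Joining with an absolute target discards workdir, exactly like the typo did.
    resolved = (workdir / target).resolve()
    if resolved in PROTECTED:
        raise ValueError(f"refusing to remove protected path {resolved}")
    if workdir not in resolved.parents:
        raise ValueError(f"{resolved} is outside {workdir}")
    shutil.rmtree(resolved)
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "usr").mkdir()          # the copy that was meant to go
    safe_rmtree("usr", tmp)              # relative path: removes only the copy
    try:
        safe_rmtree("/usr", tmp)         # absolute path: rejected before any deletion
    except ValueError as err:
        print("blocked:", err)
```

    The check happens before `shutil.rmtree` is ever called, so the absolute path is rejected without touching the real directory.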

    1. Olivier2553

      No five-minute changes on a Friday afternoon.

      Make it even "No system change at all on Friday." So if you fecked up on Thursday, you have the next day to repair your mistake.

  22. Will Godfrey Silver badge
    Facepalm

    Can I join?

    rsync was my nemesis with --delete set

    1. GrumpenKraut

      Re: Can I join?

      > rsync was my nemesis with --delete set

      Don't you know about --dry-run ?
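
      For anyone wondering what a dry run buys you, here is a rough sketch of the idea in Python. It is not rsync's algorithm and does not call rsync; `files_delete_would_remove` is a hypothetical helper that only lists files present in the destination but missing from the source, which is roughly what a mirror with deletion would target. Directories, permissions and exclude rules are ignored.

```python
import tempfile
from pathlib import Path


def files_delete_would_remove(source, dest):
    """List files under dest that have no counterpart under source."""
    source, dest = Path(source), Path(dest)
    src_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    dst_files = {p.relative_to(dest) for p in dest.rglob("*") if p.is_file()}
    return sorted(dst_files - src_files)


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    backup = Path(tmp) / "backup"
    work.mkdir()
    backup.mkdir()
    (work / "thesis.txt").write_text("draft")
    (work / "notes.txt").write_text("new")
    (backup / "thesis.txt").write_text("old draft")

    # Arguments swapped by mistake: the stale backup is treated as the source.
    for path in files_delete_would_remove(backup, work):
        print("would delete:", path)
```

      Nothing is removed; the point is to read the list before running the real thing.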

  23. Wenlocke

    SQL

    The classic "Forgotten your WHERE clause" is something that I have done at least once.

    Bonus points for fully intending to type it, thinking about typing it, but executing on autopilot

    1. Anonymous Coward
      Anonymous Coward

      Re: SQL

      This is another "this is why we don't do updates on a Friday afternoon" story: someone in editorial had published a newer version of (ironically) a report on the state of the beer industry, 2005. The file was copied to the servers fine, and then a developer had to update the DB table mapping categories to the rtf and html files on disk.

      30 minutes later, I get a ticket saying "whatever category I click on, I just get the 2005 Beer report", and "i-am-a-dummy=1" got added to the [mysql] section of every my.cnf we could find.

    2. PK

      Re: SQL

      I like the way SQL Server lets you highlight and execute part of a command - so you can accidentally miss out the last line which you HAVE typed, checked, rechecked and ... ooops!

      1. Wenlocke

        Re: SQL

        and of course, for readability on very long lines, your WHERE clause is on the next line.

        1. Nick Kew

          Re: SQL

          And if you're on a system (like a mac) that makes cut&paste a great feat of dexterity, who knows what command your fingers might end up with?

    3. Down not across

      Re: SQL

      The classic "Forgotten your WHERE clause" is something that I have done at least once.

      Especially when using a database like Sybase, for example, where the default transaction mode is unchained mode and you did not remember to explicitly begin a transaction.
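
      One way to make the explicit transaction a habit is to wrap it up. The sketch below uses Python's built-in sqlite3 as a stand-in (it does not model Sybase or MySQL), and `guarded_update`, the `reports` table and the file names are all made up for illustration. It begins a transaction itself, runs the statement, and rolls back if more rows were touched than the caller said to expect.

```python
import sqlite3


def guarded_update(conn, sql, params, max_rows):
    """Run sql in an explicit transaction; roll back if it touches too many rows."""
    conn.execute("BEGIN")
    try:
        cursor = conn.execute(sql, params)
        if cursor.rowcount > max_rows:
            raise RuntimeError(
                f"statement touched {cursor.rowcount} rows, expected at most {max_rows}"
            )
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return cursor.rowcount


# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are ours to issue.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE reports (category TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [("beer", "beer2004.rtf"), ("wine", "wine2005.rtf"), ("cider", "cider2005.rtf")],
)

try:
    # The WHERE clause was "forgotten", so every category would point at the beer report.
    guarded_update(conn, "UPDATE reports SET path = ?", ("beer2005.rtf",), max_rows=1)
except RuntimeError as err:
    print("rolled back:", err)

# The statement as intended: one row, committed.
guarded_update(
    conn,
    "UPDATE reports SET path = ? WHERE category = ?",
    ("beer2005.rtf", "beer"),
    max_rows=1,
)
```

      The row limit is a blunt instrument, but it turns "every category shows the beer report" into an error message instead of a ticket.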

  24. markfiend
    Facepalm

    There was the time that I typo'd "DELETE FROM table WHERE id=1" as ..."WHERE id-1"

    1. ibmalone

      At least one record survived!
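
      Why exactly one: `id-1` is an arithmetic expression, and in databases that accept a number as a condition it is zero (false) only for the row with id 1. A small sketch with Python's sqlite3, assuming SQLite's behaviour here; the original database is not named, and some engines would reject the statement outright. The `items` table is invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item{i}") for i in range(1, 6)])
conn.commit()

# Typo: "id-1" is true for any row where the subtraction gives a non-zero result.
conn.execute("DELETE FROM items WHERE id-1")
survivors = [row[0] for row in conn.execute("SELECT id FROM items")]
print(survivors)

# The delete ran inside sqlite3's implicit transaction and was never committed.
conn.rollback()
restored = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(restored)
```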

  25. Joe Dietz

    There was the time after a bit of drinking (well a lot actually) I might have wandered over to my university job as a part-time student Solaris admin for a small research college and 'upgraded' the YP system to NIS+. The next morning (a Sunday) I had this hazy notion of doing something disgraceful the night before... Weirdest thing though - it might have been the one and only time that the YP->NIS+ migration scripts 'just worked'. But yeah... you probably shouldn't let 19-year-olds do stuff on things you care about.

  26. Anonymous Coward
    Anonymous Coward

    Sounds like it was the fault of stupid instructions

    At least it is if he was told to do one directory at a time. A manual process for one directory at a time would never be followed 100% correct over hundreds of directories, there would be a step missed or typoed in there somewhere - that's WHY you automate things.

    The important difference is, when he tried to automate it the problem was obvious. If he'd done the manual job he'd have screwed stuff up but wouldn't have realized it until angry users called up about login errors or missing files. If he managed to combine the files of two users, which seems like a potentially likely outcome, it would be very difficult to split them up - and possibly lose data if they had files of the same name (especially dot files) that were overwritten.

    This is why you don't assign "simple" manual tasks to peons, but have the experts write a script for it. If the script needs babysitting (i.e. control-Z and call me immediately if you see any error messages) THAT'S what the peons are for. That, and being the first line of defense when people call, so they can take care of the stupid problems like "your printer isn't plugged in".

  27. TomPhan

    It never happened

    I mean, a higher up who understood what what happen, total fantasy.

    1. John Brown (no body) Silver badge

      Re: It never happened

      "I mean, a higher up who understood what what happen, total fantasy."

      The story sounds to me like the PFY screwed up and his boss was the BOFH. "Higher up" is relative and doesn't always mean the PHB.

  28. short a sandwich
    Go

    Experience

    First the test, then the learning!

  29. Trixr

    When recovery is not an option

    A lot of the time these days, you'd just restore from backup - and it's handy the more sophisticated products let you restore just ACLs if they've been hosed.

    However, I learned the hard way in the late 90s that you need to watch your backups as well. We had a HP tape jukebox that held about 24 tapes, I think, and only the database backups were exported and stored in a safe (our mail and user files weren't deemed that important back then, but of course you'd ship them offsite now).

    One day I needed to do an inventory to find the tapes that needed exporting. I think it was the second or third time I had to do the job, so it wasn't routine yet. The jukebox control panel had 8 buttons on the front, in two columns, with the actual button names displayed on an LCD panel between the physical buttons (like many ATMs). The INVENTORY button was in the top row.

    So I pressed that and waited for it to finish - it normally took about a minute to scan the barcodes and then we were ready to export. Well over five minutes later, it was still "scanning"... in fact, it was loading the tapes and the drive was active for a minute or so after a tape loaded. ...Oh bugger.

    Turns out the top row of the control panel had the INVENTORY button, sure enough. But the other button in that row was INITIALIZE. The room was dim, the LCD panel was very low contrast, my eyesight is not fabulous, and the words began with the same letters and were about the same length in the fixed-width font...

    So yes, I managed to single-handedly wipe 3 months' worth of backups in less than 10 minutes. Thankfully we didn't actually need to restore anything during that interval by the time those backups aged out. Boss was not happy, but at least I'd fessed up straightaway and we did an immediate backup of everything again.

    1. Down not across

      Re: When recovery is not an option

      Boss was not happy, but at least I'd fessed up straightaway and we did an immediate backup of everything again.

      He should've been. For fessing up, rather than trying to dodge it. Mistakes happen. Most bosses (at least any decent ones) will appreciate owning up. Not to mention that trying to dodge and getting caught is almost certain to get you dismissed.

  30. Anonymous Coward
    Anonymous Coward

    AC because

    Reminds me of the time I deleted a mid-sized airline's entire flight operations database during the middle of the day (back in the 90s, dBase files, Clipper and all that).

    Fortunately the ops team had a fallback, because the system wasn't 100% reliable (indexes would sometimes flake out and need to be rebuilt, that kind of thing) and so they switched over to paper and pencil for a while.

    Even more fortunately, the flight ops database was so intertwined with other stuff (load factors, revenue management, customer database) that every piece of data could be reconstructed from somewhere else, if you knew where to find it and what inferences to make. Luckily for me, I'd worked there long enough to know my way around the weeds.

    Blessedly, my boss—to whom I immediately confessed my ghastly mistake, while offering that I thought I could fix it—after picking herself up off the floor, grinned naughtily and said "You'd better be right, hadn't you?"

    Short of actual combat, I can recall no few hours of my life when I was more existentially focused on anything, than that fateful afternoon, cranking out a metric shitload of Clipper before teatime.

    Dear Reader, it worked: and I had another character-building, albeit metaphorical scar to add to the collection.

  31. Wemb

    While running an inadequately tested script on live data is obviously a recipe for disaster, I do have sympathy for the PFY here. There's nothing more guaranteed to induce mistakes (and ill-advised workarounds) than giving someone bright a boring and repetitive job to do - especially one that really is crying out to be automated. I'd have tried exactly the same as he did back in my youth - in fact, I did - I was given several hundred MS Exchange email accounts to set up at a school I worked at - and my PHB expected me to manually log on to each account, run some bit of MS code to configure some client-end settings, and then log out again. All manually. Well bugger that... Boring and repetitive jobs are what computers were invented for.

  32. kirk_augustin@yahoo.com

    Teacher at fault

    There was nothing wrong with the student trying to automate a script to do it right. Doing it by hand individually is the wrong way to go, and is more likely to cause problems. The problem is the teacher did not explain anything to the student first, so that he would have known how to do it safely, accurately, and more quickly.

    And it is foolish to pretend there was any danger, risk, or that the source files had to be rebuilt by hand.

    ALL multi user systems are always backed up every night.

    And restoring the back up was the correct way to fix the mistake.

    The student should have been told about that.

    And in fact, the student should have been told to make a script first to do a localized back up of all that he was supposed to copy, first.

    He also should have been warned about side effects, and his scripts should have been checked by hand first by the teacher.

    The fault is entirely with the teacher and not the student.

  33. Matt in Sydney

    Ahhh, nostalgia. We need all these stories in print, BOFH style, maybe different coloured bindings for fact and fiction. At least it wasn't a real time inode reconstruction...

  34. David Crowe

    I learned (the hard way) never to do anything important with shell scripts. They are like fire. You think you can control them then whoosh, you just burned your house down.

  35. Jimbob 3

    RAID5 testing

    I remember a long time ago an install engineer who came to site to build our call centre server and get it all configured. Five days work, all going very well until day five when it was fault and resiliency testing on the server. Three disks in RAID5 and I was in the background whilst he pulled a disk out to test setup.....Then he grabbed the second disk and before he heard my scream he had pulled it out. Start again on the server build and setup buddy!
