You deleted the customer. What now? Human error - deal with it

Everyone I speak to about system security seems to panic about malware, cloud failure, system crashes and bad patches. But the biggest threat isn’t good or bad code, or systems that may or may not fail. It’s people. What we call Liveware errors range from the mundane to the catastrophic and they happen all the time at all levels …

  1. Anonymous Coward

    And ideally

    The documentation should point you to tested scripts or some other form of automation that makes the mundane repeatable, without the risk of fat fingers messing things up.

    1. Anonymous Coward

      Re: And ideally

      And this is also why monolithic, non-patchable/scriptable/integrable "my way or the highway and here is a support contract" so-called "applications" are the devil's work.

      Lego bricks is the only way!

    2. This post has been deleted by its author

  2. Shadow Systems Silver badge

    I concur with the procedure guides.

    I was once put in charge of creating all the "How To Guide" manuals for the company I worked for at the time. I thought it a serious pain in the arse, but I was being paid by the hour so sat down to start writing. It took months to go through everything we did, how we did it, & create step by step, "If you do this instead then you'll get this", complete guides for everything. Including such mundane tasks as solving printer issues, email issues, "I can't connect to my network share!", and other such problems.

    I was later told by a coworker that the guides were the biggest boost to productivity they had seen in years, because the staff could grab the appropriate binder, flip open to the right page, & fix the problem themselves in less time than it took to ring up the Help Desk, explain what was wrong, & listen to them walk you through their script... The same scripts that I had included in the guide so the folks didn't HAVE to call the Help Desk unless it failed.

    About a month after I left the company I learned from a different (now Ex) coworker that Manglement had axed the guides I had so painstakingly created. Why? Because it made the Help Desk folks look bad "because they had nothing to do".

    I still shake my head in disbelief at the stupidity of Manglement & their inability to figure out that the immediate drop in office productivity exactly matched the increase of Help Desk call volumes. Strange coincidence, no?

    *Rude noise*

    So I concur that the procedural guides can be a boon. They can help the normal folks to do their own troubleshooting BEFORE having to call up the Help Desk. If you've already wiggled all the cables, checked all the settings, & a reboot hasn't done the trick, you can tell the Help Desk person to skip past those steps in the script to save time. "I've already done steps one through seven to no avail. What's next?" tends to derail them, but it beats having to sit there & pretend to be doing what they ask while drumming your fingers on the desk waiting for them to catch up.

    1. Doctor Syntax Silver badge

      Re: I concur with the procedure guides.

      Guides should also say why things are done this way and the risks involved if they aren't, especially if regulatory or legal requirements are involved. Not only does it mean the readers have a better understanding of what they're supposed to be doing, it enables a review if circumstances change. It also pre-empts manglement's bright ideas - and is evidence to deflect the inevitable shit-storm when they ignore it.

    2. Barry Rueger

      Re: I concur with the procedure guides.

      "So I concur that the procedural guides can be a boon. They can help the normal folks to do their own troubleshooting BEFORE having to call up the Help Desk."

      If you're working with intelligent people there's a lot to be gained by teaching them to be self-sufficient, both in terms of saving you time fixing routine problems, and in terms of making them feel more in control of their work environment.

      And, in a perfect world, they quickly become the de facto Help Desk for everyone sitting within two cubicles of them!

      1. Alan W. Rateliff, II

        Re: I concur with the procedure guides.

        Having more feeling of control is extremely important as I have found providing off-site IT services for numerous customers. Just the mere act of power-cycling a modem and firewall is often enough to not only reduce the calls but to make the customer feel like they are less dependent upon you.

        I have heard in the past "I just didn't want to bother you" or some similar sentiment, but what is really being said is "I don't want to be forced to call you to free me from the shackles of technology every time some 'little' thing goes wrong." Some customers will feel that they are being held hostage, at the mercy of some outside contact with the keys to the kingdom, knowing it is an 80/20 gamble on whether you answer right away or they have to wait 10 or 15 minutes for a return phone call -- when a simple reboot would have been enough to resolve the issue.

        Really. Something as simple as "reboot the computer" is not only empowering to the customer or user as having the ability to resolve many issues, it also lessens the frustration of having a critical call to return, or divert from another job, only to find the solution was as simple as rebooting. Now you have one customer or user waiting for you to return to them, and one customer or user who has had to wait for you.

        Amazingly, a simple document with these lines is like gold:

        "Problem: QuickBooks won't open

        Error: QuickBooks cannot find the data file, or similar

        Resolution: Check on Q: drive by clicking START then 'Computer.' If Q: drive is not present, restart the computer and try again. If Q: drive is present, please note if a red 'X' is present on the drive before proceeding, then double-click on the drive. If the Q: drive opens and you can see files, close the window and open QuickBooks again.

        If the error given is different than above, or any given step results in another error, please call xxxxxxxx."

        Pictures help, too.

        Of course, you will always have a user who just does not want to troubleshoot. Really, that is fine, too, as their job has other things on which to focus, and only a small percentage of those users makes life happier for all involved.

        While customers like to know they can depend on you, most do not like being dependent upon you.

      2. dan1980

        Re: I concur with the procedure guides.

        @Barry Rueger

        "If you're working with intelligent people there's a lot to be gained by teaching them to be self-sufficient . . ."

        Ah, what a fine world that would be.

        1. Mandoscottie

          Re: I concur with the procedure guides.

          @Barry Rueger

          "If you're working with intelligent people there's a lot to be gained by teaching them to be self-sufficient . . ."

          I agree, except when dealing with research scientists: so intelligent it's scary, but no common sense whatsoever.

          I had one who thought that, because IPA contains no water, it was totally OK to "sanitise" a 2-day-old laptop, and couldn't understand why, after cleaning the chiclet laptop keyboard, all the keys were blank.......I gave them a sheet of A4 with the alphabet and numbers 0 to 9 on it, a pair of scissors and a pritt stick.

          "No I cant call that into Dell for on drugs? " "thats not accidental damage!" thats user retardation!!"

          Another blinder is our on-market support scientist. The guy should be at NASA. His record is one week with a new laptop before he bricked the LAN port (snapping connectors inside the port...the mind boggles!!). Then, a week after the backplane was replaced, he managed an Ollie 720 and landed the laptop on its lid, while powered on, on the hard lab floor......he thought he had put it on his lab bench properly but had only put 1/3 of the laptop on it and walked away.........BANG! WTF!

          they brighten up my system support job in a biomedical research company daily with their clear lack of common sense. :D

    3. Skoorb

      Re: I concur with the procedure guides.

      Procedure guides. They are nice when they are up to date.

      I once had to be rotated temporarily into a different unit to cover for staff being on maternity leave/quitting/being seconded elsewhere.

      "But it's OK", I was told, "just follow these signed off Standard Operating Procedures"!

      So I do. Until it turns out one is now out of date due to some system change, and actually following it leads to silent data quality errors in a (random, natch) small percentage of records.

      Which of course was my fault, as I was the one who pressed the button, and obviously I should have known better than to follow that particular SOP.


    4. Matt Bryant Silver badge

      Re: Shadow Systems Re: I concur with the procedure guides.

      ".....I still shake my head in disbelief at the stupidity of Manglement & their inability to figure out that the immediate drop in office productivity exactly matched the increase of Help Desk call volumes....." I had a similar experience, but the good work was undone by crafty consultants pulling the wool over the eyes of duh manijment. A colleague and I wrote up the procedures library over the course of a year, and they were much welcomed by the staff, reducing helpdesk calls and freeing up the IT staff's time for other work. Then a well-known UK consultancy outfit sailed in and offered to provide a "one-stop-shop" for support with an offsite (as in waaaaay offsite in Bangalore) helpdesk, centralized remote builds, etc., at a bargain price. Our internal IT team was gutted to fund the deal. The first thing the consultants did when they got the contract was delete all help files from the desktop and server builds and remove access to the process library we had written. Now staff had to call their helpdesk for even the simplest of issues. The consultants' justification was that the staff were hired to do their jobs, not IT work, which sounded good to manglement. But the real reason was that the helpdesk contract had a threshold for call volume, and removing all the help files and our process library pushed the volume of calls over the threshold and meant additional charges, making the service eventually cost almost twice what the old in-house IT team had.

    5. Anonymous Coward

      Re: I concur with the procedure guides.

      I completely agree that process guides save so much time, so it really does surprise me when no-one - especially management - seems to grasp the concept of creating and maintaining them. Yes, there does need to be someone designated as the owner/creator/updater for them and it can be a chore if your library is large, but if the guides are well maintained and written clearly it is definitely time well spent.

      Of course, the problem then is when someone comes along later and skims through one of the documents instead of actually reading it and misses out a critical step. You know, like skipping one of the numbered bullet points you've put in to make the process steps easy to follow.

      Personally, I always try to read through a guide I've never seen before at least twice just so I can get my head around it before I even attempt any of the instructions contained in it.

  3. frank ly

    Belt and braces

    "The information must have been important as he kept that disk for years – just in case."

    But not important enough to have two physically separate copies, it seems.

  4. jake Silver badge

    "liveware"?


    I think you mean "wetware".

    Kids these days ...

    1. Anonymous Coward

      Re: "liveware"?

      Back when I was working as a contractor at IBM, watching others (and ourselves) train up overseas replacements, we called it "OUTSOURCEware".

      Because management would literally terminate the knowledgeable people's contracts when 60-70% of the knowledge transfer had been done, instead of when the outsourced replacements were actually functionally competent.

      Cue the accidental destruction of a mission-critical telco (main national Oz carrier) database. More than once (different databases though, fortunately). The shit show in penalty costs for them would have nuked all savings.

      But that's IBM for you, where actual competence in staff is optimised out.

      Especially if the costs can be pinned on someone else's department.

      1. jake Silver badge

        Re: "liveware"?

        We actually called it "meatware" when I was at Berkeley and Stanford. One of the professors suggested that that name wasn't conducive to the reality of funding. So we butted heads over pizza & beer and came up with "wetware". Probably 1979.

  5. Anonymous Coward

    A cautionary pair of tales (pt1)

    I have been very fortunate to have only been bitten twice with mis-clicking errors of a monumental scale:

    The first involved an outdated backup procedure we ran twice a week. A manual process between two machines. Take a database snapshot of the production database, verify it, copy it to the shared storage. Switch to the backup database server (stupidly on the same KVM). At this point I got called away to deal with a fault.

    Upon return I merrily followed the next step - to type those famous words "Drop Database". Just before hitting enter I saw the desktop wallpaper. Someone had switched the KVM back to the production server while I was away! This has obviously bitten someone before as the backup has plain blue wallpaper, whereas the production server has bright red wallpaper with pictures of bombs on it! Somewhere in the region of 35,000 asset records saved. Luckily it was only a short(!) time before the obsolete, slow, clunky dust-puppy nests were replaced with a new pair of servers which could be driven by the automated nightly backup system.

    1. Yet Another Anonymous coward Silver badge

      Re: A cautionary pair of tales (pt1)

      NAS with a tabbed web management interface.

      The backup NAS I'm cloning TO decides to go away, leaving the main production NAS tab on top.

      Click clone, yes I'm sure, yes this will erase the target, yes I know that's what I'm trying to do....

      .... ooops ....

      1. Mayhem

        Re: A cautionary pair of tales (pt1)

        Oh god yes. The number of times I've mistyped something critical is fairly low.

        The number of times I've deliberately killed something critical because I thought I was looking at something else? That is definitely an embarrassingly higher number.

        Protecting users from themselves is a lofty goal, but the most important user to protect is YOU.

        Visual cues are a valuable help - different wallpapers, different coloured terminals, a change in text colour when you log on as superuser ... anything to say think twice.
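
        A minimal sketch of the visual-cue idea, in Python. The host names and the `PRODUCTION_HOSTS` set are hypothetical, not taken from any system described in this thread; the point is only that a login script can shout in red on production and stay calm elsewhere.

```python
import socket

# Hypothetical list of production machines; adjust to your own estate.
PRODUCTION_HOSTS = {"db-prod-01", "nas-prod-01"}

RED_BG = "\033[41;97m"   # ANSI escape: red background, bright white text
BLUE_BG = "\033[44;97m"  # ANSI escape: blue background, bright white text
RESET = "\033[0m"


def banner(hostname=None):
    """Return a coloured one-line banner naming the host and its role."""
    hostname = hostname or socket.gethostname()
    if hostname in PRODUCTION_HOSTS:
        return f"{RED_BG} PRODUCTION: {hostname} - think twice {RESET}"
    return f"{BLUE_BG} non-production: {hostname} {RESET}"


if __name__ == "__main__":
    # Print the banner for whatever machine this runs on.
    print(banner())
```

        Calling `banner()` from a shell profile or login script would be one way to use it; whether the colours render depends on the terminal honouring ANSI escapes.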

        1. swarfega

          Re: A cautionary pair of tales (pt1)

          How about the following - bank submits many network user account delete requests daily. Requests submitted in a common format where the requester name is formatted in the exact same way as the deletee name in the request.....

        2. Yet Another Anonymous coward Silver badge

          Re: A cautionary pair of tales (pt1)

          Yes - I learned that one a long time ago (NT3.5?) - always have a red background on the production system.

  6. Anonymous Coward

    A cautionary pair of tales (pt2)

    This one *did* catch me out.

    We had a system outage on the automation system. I work in television, so the system failing to run a programme - especially a Soap Opera - gets the Points of View mailbag bulging. Needless to say we went into manual fairly quickly once it became clear the fault affected both main and backup transmission systems and ran the programme from tape.

    I was still dealing with some of the fallout to make sure other channels were not going to suffer a similar fate when I got someone who should know better demanding to know what happened. It was obvious he wasn't going to leave Mission Control until he had an answer so, a little flustered, I went to the automation logs and opened the verbose logfile (which goes down to keystroke granularity). Unfortunately, I missed out a crucial step - copy it to an offline terminal first. On the offline terminal we have a tool which allows you to open the file without bringing the machine to a juddering halt.

    In my desire to get rid of this person, I accidentally double-clicked the file. Cue the server attempting to open a 3gig logfile in NOTEPAD. I couldn't even get Task Manager open to try and kill it. A few calls on talkback to warn everyone we were about to fall off the air and I could almost hear my P45 coming out of the printer in HR.

    In the Incident Report I put it down to human error and held my hand up, as it would be pointless to mount a full investigation and waste a day, just to find the obvious. Cue some 'suits' descending and tearing a strip off me in front of my team. "How could a Senior be so stupid?!" etc.

    Needless to say MY boss was none too pleased and had "a quiet word" with them along the lines that he would deactivate their passes from Mission Control if they did that again, as it was unprofessional on their part. He also explained that had they bothered to listen, I had already changed the system so if you tried to open a log on the live server you got a dialog box instead telling you you couldn't.

    1. Anonymous Coward

      Re: A cautionary pair of tales (pt2)

      Simple fix for the memory intensive text file using Notepad.

      Change the default program for opening .txt and .log files to Large Text File Viewer.

      You protect the machine from blowing memory and you can still 'right click open with' if notepad is needed, but you have to think about it. Better still, Notepad++ for editing.

      1. Robert Carnegie Silver badge

        Re: A cautionary pair of tales (pt2)

        I think Notepad++ struggles with files of even a few hundred megabytes.

        OTOH I think it now has a "tail" mode i.e. when the file grows on disk, its view in NPP is updated.

        Alternative suggestion though - have your routine editor be one that quickly fails out, SAFELY, on oversized files. MS-DOS EDIT or EDLIN may qualify, may not.
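
        The "fail out safely on oversized files" idea can be sketched in a few lines of Python. The 50 MB threshold and the function names are assumptions made up for illustration; this is not how any particular viewer or the automation system above works.

```python
import os

# Assumed threshold: refuse anything over 50 MB on a live server.
MAX_BYTES = 50 * 1024 * 1024


def check_size(path, limit=MAX_BYTES):
    """Raise ValueError instead of opening a file larger than `limit` bytes."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes (limit {limit}); "
            "copy it to an offline machine first"
        )
    return size


def tail(path, lines=20, limit=MAX_BYTES):
    """Return the last `lines` lines, but only if the file passes the size check."""
    check_size(path, limit)
    with open(path, "r", errors="replace") as handle:
        return handle.readlines()[-lines:]
```

        The check costs one `stat` call and happens before a single byte is read, which is the whole point: the refusal is cheap even when the file is not.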

  7. Alan Brown Silver badge

    BACKUPS


    backup everything, preferably frequently and automatically.

    manglement always disputes the cost of a backup system until they actually need it.

    No amount of money thrown at the problem AFTER you've lost the data will bring it back quickly.

    If there are scratch disks on individual systems you can guarantee that despite any amount of warnings these are not backed up, someone _will_ put critical data on them and then demand they be restored when the drive goes toes up. This happens regularly where I work.

    Background: the scratch disks were supposed to be NFS cache disks for a fairly slow NFS server, but RHEL's cachefilesd didn't work, so manglement decided in their infinite wisdom that they should be put to use as scratch. Bad BAD BAD idea - what has been done is hard to undo, even when the NFS server is now significantly faster than local spinny disks.

    1. Anonymous Coward

      Re: BACKUPS

      Ah, it's not just about cost, I just wish manglement would listen and show some common sense sometime...

      Even after paying an agency to come up with a 'Disaster Management Strategy' which mentioned things like 'maintaining offsite backups', manglement did sweet Fanny Adams to implement them.

      So, off my own bat, I did, told a.n.other where the offsite backups were held, and an unofficial system was thus in place, and I was a lot happier.

      Fast forward about 20 months, manglement find out about my unofficial offsite backups thanks to the brown stuff hitting the rotatey thing when a couple of important files go amiss, they're not on the normal backups for some reason, so I go get the offsite backups (full dumps and incremental changes over the 20 months worth), restore the files, then get it in the neck for actually performing said backups in the first place..despite saving their arses by recovering these bloody files deleted 'by accident' by a soon-to-be-ex member of staff (admin rights really should be pulled well before end-of-contract)

      You can't win..

      1. Yet Another Anonymous coward Silver badge

        Re: BACKUPS

        Most of my fsckups have been with backups

        A RAID system where the management console numbers the drives physically and an OS that numbers them logically. Physical drive 0 fails, OS boots from drive 1. OS asks if you want to rebuild the 2nd drive from the first?

  8. DougS Silver badge

    Write temporary scripts when you're doing something potentially dangerous

    I'll write a little shell script if I'm doing something that might be risky, like deletion or configuration changes that can't easily be undone. Add in something that makes you hit a key to continue after showing the state of things for multi-step processes.

    This lets you test things by making the risky statements print the command line they would execute rather than executing it for a dry run - important if you are using variables or loops or such to ensure what you expect to happen actually happens.

    The extra time required to write the script forces you to figure out exactly what it is you're trying to do, preventing the fat finger or 'in too much of a hurry' type of errors.

    Obviously writing a script is a bit much if you are just going to delete one directory, but it is still a good idea to replace 'rm' with 'ls' and try that first, just to confirm what you are deleting. If you are using 'rm -rf' and expecting to delete a couple dozen files and see screen after screen scrolling by you'll be saved from a potentially costly mistake.
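
    The dry-run habit described above translates to any scripting language. Here is a rough sketch in Python rather than shell; the `run` and `cleanup` names are invented for illustration and the directory list would come from whatever loop or variable you are nervous about.

```python
import shlex
import subprocess


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False.

    Returns True if the command was actually executed, False otherwise.
    """
    line = shlex.join(cmd)
    if dry_run:
        print(f"DRY RUN: {line}")
        return False
    print(f"RUNNING: {line}")
    subprocess.run(cmd, check=True)  # raises if the command fails
    return True


def cleanup(directories, dry_run=True):
    """Remove each directory, or just show what would be removed."""
    for directory in directories:
        run(["rm", "-rf", "--", directory], dry_run=dry_run)


if __name__ == "__main__":
    # Defaults to a dry run, so this only prints the rm commands.
    cleanup(["/tmp/old-build-1", "/tmp/old-build-2"])
```

    Making the dry run the default means the dangerous path needs a deliberate `dry_run=False`, which is the scripted equivalent of removing the echos only after you have read the output.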

    Another favorite habit was aliasing dangerous commands, like reboot. I'd alias it to echo "use reboot`hostname`" and alias reboot`hostname` to the full path of the reboot command. That prevents accidental reboots of the wrong server (this can be a problem in a major rollout where you are doing a lot of active work on some servers while developers are already working furiously on the test/dev/GM servers)
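
    The same guard can be expressed as a check that the operator typed the name of the machine they are actually on. A sketch in Python, with hypothetical function names and an assumed `/sbin/reboot` path; it only returns the command and never reboots anything itself.

```python
import socket


def confirmed_for_this_host(typed_name, actual_name=None):
    """True only if the operator typed the name of the machine they are on."""
    actual_name = actual_name or socket.gethostname()
    return typed_name.strip().lower() == actual_name.strip().lower()


def guarded_reboot(typed_name, actual_name=None):
    """Return the command to run, or a refusal message; never reboots by itself."""
    if not confirmed_for_this_host(typed_name, actual_name):
        return f"refused: you typed {typed_name!r}, which is not this host"
    return "/sbin/reboot"  # assumed path; the caller decides whether to execute it
```

    The `actual_name` parameter exists mainly so the check can be exercised without depending on the real host name.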

    1. Flocke Kroes Silver badge

      Re: Write temporary scripts when you're doing something potentially dangerous

      Mine are:

      type 'm superfluous_thing', proof read, '<home>r<enter>'.

      A script with echo in front of anything dangerous. Run the script, then remove the echos.

      Finally: restore from backups regularly.

      Two days work in 6502 assembler on someone else's computer. Tested, working, and saved twice to 5¼" floppy disks (IAVO) on Friday afternoon ready for demonstration to the customer on Monday. Clean up everything on the borrowed computer, then find both floppy disks are unreadable. Suddenly I was not looking forward to the weekend any more. I have not lost data since then.

      No project is complete until it has been restored from backups, preferably twice, the second time by someone you trust to deal with problems while you are on holiday.
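
      One way to make "restore from backups regularly" checkable is to compare checksums of the originals against a test restore. A minimal sketch in Python, assuming both trees are reachable as ordinary directories; the function names are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(str(relative))
    return problems
```

      An empty list means every original file came back byte-for-byte; it says nothing about files that exist only in the restored tree.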

  9. Tom 7 Silver badge

    Never delete anything.

    At $50 a TB you don't need to. Just mark it as not of interest at this level. Indexes are cool!

    1. a_yank_lurker Silver badge

      Re: Never delete anything.

      At some point, the information in the documents is only historical at best. Either archive it or delete it. Not so much to make space as to not have to keep track of it.

      1. Robert Carnegie Silver badge

        Delete some things.

        Sometimes, responsible use of data includes deleting it when you don't need it any more, such as when it's the law regarding personal data or credit card numbers. Keeping what you shouldn't keep means it also can be stolen and misused and it's your fault.

    2. find users who cut cat tail

      Re: Never delete anything.

      We are no CERN but a TB of experimental data is still not that hard to produce...

      There is a huge difference between the scale of data humans can produce themselves and data that can be acquired by some automated process.

    3. DougS Silver badge

      Re: Never delete anything.

      Keeping everything is a terrible idea. It doesn't matter how good your search is, trying to find what you need is more difficult the bigger the haystack. Especially if you have dozens of different versions of the same dataset but only care about a few of them.

      Already the volume of data is becoming a bigger and bigger problem because of attitudes like yours...

      1. Sam Liddicott

        Re: Never delete anything.

        Keeping everything IS a terrible idea, but not as terrible as deciding which files might need recovering and which files won't -- especially before the urgent need for recovery occurs to grant resources needed to make all those very many decisions.

        If everything is kept, it is then a simple matter for the owner to decide whether or not it is worth trawling through everything.

        1. JerseyDaveC

          Re: Never delete anything.

          The problem with keeping everything is that you're sometimes not allowed to.

          For example, the data protection laws covering personal data (both those that have been around for years and the new GDPR ones) make clear that you are OBLIGED to delete personal data when you no longer have a legitimate purpose for keeping it.

          Defining and agreeing a good retention policy is a pain in the nuts, but it's a pain worth enduring. If someone complains that you deleted something three years ago that they now need, that's tough - because more often than not they WANT the information rather than NEEDING it and may well not have a truly legitimate reason to be using or processing it.

          We hold on to data because most of the time we are obliged to keep it for AT LEAST a given period (e.g. keeping tax-related information for six years). It's easy to forget that there is often an upper limit to how long we're allowed to keep stuff for, whether it's a static measure of time or it's in the context of "the requirement has gone away".
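
          The "at least this long, at most that long" shape of a retention policy can be written down as a small rule. This Python sketch is purely illustrative: the day counts are parameters you would have to agree with whoever owns the policy, and real obligations depend on purpose as well as age.

```python
from datetime import date


def retention_action(created, today, keep_at_least_days, keep_at_most_days):
    """Classify a record as 'must keep', 'may keep' or 'must delete' by age."""
    if keep_at_least_days > keep_at_most_days:
        raise ValueError("minimum retention cannot exceed maximum retention")
    age = (today - created).days
    if age < keep_at_least_days:
        return "must keep"      # still inside the mandatory retention period
    if age <= keep_at_most_days:
        return "may keep"       # allowed, but only with a legitimate purpose
    return "must delete"        # past the upper limit
```

          For the tax example above, `keep_at_least_days` would be roughly six years' worth of days; the upper limit is the part that tends to get forgotten.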

  10. Anonymous Coward

    Salvage on Novell is all well and good, but if I remember rightly it took two keystrokes to delete a volume and there was definitely no "are you sure" window.

  11. Strahd Ivarius

    Documents are a good idea, but...

    When the teams are already understaffed, nobody has the time to write them...

    So no documents available for new people starting after a staff member left because of overwork, not even the list of basic permissions to grant to a new team member.

    Which leads to frustration, overwork for the other people in order to explain everything to the new hire, then another member leaving, and the cycle repeats.

    But management being not interested in the day-to-day operations and only in big projects providing lots of notoriety it is not a problem.

    Up to the moment the users start billing back IT for lost time...

    1. Oengus

      Re: Documents are a good idea, but...

      Nobody bothers to read them when the shit hits the fan....

      I remember documenting a procedure that had a flaw if the overnight processing went past 07:00 (the "start of a new day" in the scheduling system). There was a documented and simple process (one "ad-hoc" program needed to be run after overnight processing was completed) to recover the situation. Sometime after I was let go from that job I was having a catch up with the people for drinks and they related the issue that occurred during the End of Financial year processing where this process failed because of the overnight processing running late and no one was able to get it restarted (no one read the doco). The night shift person came in and was advised of the situation. He told them "there is a process for that in the documentation", opened the documentation, ran the job and it all started working. It had been down for over 15 hours because no one read the documentation that we had put together.

      1. Anonymous Coward

        Re: Documents are a good idea, but...

        So why did you not automate running the process when the conditions were met? You know computers are good at that right?
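
        For what it's worth, the condition itself is easy to state in code. A sketch in Python, assuming the only trigger is the overnight run finishing at or after the 07:00 "start of a new day"; the function names are invented and the real scheduler's rules are not known from this thread.

```python
from datetime import datetime, time, timedelta

DAY_START = time(7, 0)  # "start of a new day" in the scheduling system


def next_day_start(started):
    """Return the first 07:00 boundary that falls after the run started."""
    boundary = datetime.combine(started.date(), DAY_START)
    if boundary <= started:
        boundary += timedelta(days=1)
    return boundary


def needs_recovery(started, finished):
    """True if the overnight run crossed the day boundary before finishing."""
    return finished >= next_day_start(started)
```

        A wrapper around the overnight job could call `needs_recovery` and kick off the ad-hoc program automatically, or at least print the page number of the relevant procedure.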

  12. Tromos

    Change management no use

    The new management was just as bad as the old.

  13. Terry 6 Silver badge

    Confirmation dialogues

    An "Are you sure?" message might save the day.

    But when the original error is unrecognised or the user was confused the probability of pressing either key becomes 50%.

    The only dialogue worth having is the one that says "If you continue the entire contents will be wiped completely. Are you sure you want to wipe this file/disc/hard drive?"

    Followed by "Are you really sure? Really really?"

    And even then a "Would you like to think about this?" wouldn't hurt.

    1. Sealand

      Re: Confirmation dialogues

      Confirmation is (or was) not enough to protect you from me.

      Back in the days of floppy disks, I was learning how to use the DOS commands by doing. Deleting multiple files using wild cards (like file*.* or file*.dat) was handy until I wanted to delete a number of files ENDING with the same characters.

      I told the OS to delete *file.* and was asked to confirm.

      Yes, dammit, that's what I typed, wasn't it?

      ... and the disk was empty.

      I would have liked a couple of those "Are you really, really ..."
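
      A confirmation that shows what the wildcard actually matched would have helped more than a bare yes/no. A small Python sketch of that preview step; note it uses Python's `fnmatch` rules, which are not the same as the DOS wildcard handling in the story, and the function names are made up.

```python
from fnmatch import fnmatchcase


def preview_matches(pattern, filenames):
    """Return the names a wildcard pattern would select, without deleting anything."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


def safe_to_delete(pattern, filenames, expected_count):
    """Proceed only if the pattern matches exactly as many files as expected."""
    return len(preview_matches(pattern, filenames)) == expected_count
```

      If you expected two files and the preview lists the whole disk, the mismatch is visible before anything is gone.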

    2. Doctor_Wibble

      Re: 'ha ha you are too late' dialogues

      > An "Are you sure?" message might save the day.

      But not with the genius Windows 7 folder re-selection 'feature' which changes the selected folder to the containing folder without any visual clue and you don't know what's happened until it asks 'are you sure you want to delete desktop.ini' just as it refreshes the explorer window to show there is nothing else there now.

      And yes, this is the kind of crap that happens as you are sorting what needs to be backed up (method now changed, too late for vanished data) just like the disk that failed the night before that stack of blank DVDs was due to be delivered. Though that last one in particular is a clear demonstration of the temporal and contextual awareness of devices that exists at the most fundamental level of reality, and even CERN's toasted weasel only barely scratches the surface.

      1. moiety

        Re: 'ha ha you are too late' dialogues

        the genius Windows 7 folder re-selection 'feature' which changes the selected folder to the containing folder without any visual clue

        Been there. Done that. Watched /Documents and Settings/ disappear in front of my eyes. Downloaded Explorer ++ and never used the Windows one again.

    3. Doctor Syntax Silver badge

      Re: Confirmation dialogues

      'Followed by "Are you really sure? Really really?"'

      Followed by "We're recording that it's you who's doing this. If this is a mistake it'll be on your head."

  14. J. R. Hartley Silver badge

    That comma

    Is giving me the, creeps.

  15. swampdog

    Dodgy right shift key

    A problem which plagues me now we all have to use cheapo keyboards which too easily fill up with crud.

    eg: "rm *.txt" mutates into "rm *>txt" (ditto "mv").

    1. John Brown (no body) Silver badge

      Re: Dodgy right shift key

      Not so many years ago, it was a final warning if not a sackable offence to eat or drink near a computer/PC/work station. Nowadays, keyboards, when tipped over, could just about feed the 5000. The hardware is cheap so no one cares any more. They forget how valuable the data is. Cheap nasty keyboards don't help.

  16. Mark 85 Silver badge

    Ah... human error..

    Just starting as a full-time programmer. So I created some stock libraries and parked them in my share. The other programmers could, and did, grab one as needed and move it to their share to modify or compile as needed.

    Cue the new hot-shot coming in 6 months later. Grabs my disk read and fill 10 pointers and uses it in one of his programs. However, he only changed my library in my share and then compiled his program. A week later, someone asked if I could speed the program I'd written up a bit. So, I go change the number of pointers to be filled on each disk read to 20, compile and test. Next thing I know, the entire day's work for the company (a massive customer database) is disappearing...

    Hindsight... young lad had changed the line for my memory release for the pointers to "delete file" in a directory his program created. He hadn't copied it over to his share as was standard practice (and he damn well knew about that) but just changed the file in my share. I should have rechecked that library but several hundred lines of code for various functions just wasn't about to get reviewed every time I compiled. Luckily, the nightly backup from the night before was still on site, and the incremental back-ups were set to "write but never delete". Instead of taking several days to recover, we were able to recover overnight.

    After that, we (with manglement's approval) locked down the shares such that they could be read but not changed by anyone other than the share owner.
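    For anyone wanting to try the same lockdown today, here is a minimal Python sketch, assuming the shares live on a POSIX filesystem where "read but not changed by anyone other than the owner" maps onto clearing the group and other write bits. The original system is not named, so the function name lock_down and the permission model are illustrative assumptions, not what was actually done.

```python
import stat
from pathlib import Path


def lock_down(share_root):
    """Clear group/other write bits on share_root and everything below it.

    Assumes POSIX permission bits; the owner's own bits are left untouched.
    Returns the list of paths whose mode was changed.
    """
    changed = []
    root = Path(share_root)
    for path in [root, *root.rglob("*")]:
        mode = stat.S_IMODE(path.stat().st_mode)
        new_mode = mode & ~(stat.S_IWGRP | stat.S_IWOTH)
        if new_mode != mode:
            path.chmod(new_mode)
            changed.append(path)
    return changed
```

    Everyone can still read and copy the library into their own share; only the owner can edit the original in place.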

    1. Anonymous Coward
      Anonymous Coward

      Re: Ah... human error..

      Git or another source control system could have helped you here.

      1. Richard 12 Silver badge

        Re: Ah... human error..

        I assume this dates from before real source control, when at best, only file-control tools existed.

        Like VSS.

      2. tony2heads

        Re: Ah... human error..

        The error here was made by the one who changed the original version instead of using a copy

  17. inmypjs Silver badge

    "Processes that are consistently..."

    "applied make life easier and help us all to avoid making the same errors"

    Oh yeah. Every time some idiot screws up we rewrite the process to make sure it can't possibly happen again.

    So you end up that passing a design/document from one department to the next or from one phase of development to the next requires 14 different people to sign off and at least two of them will be on holiday or working at some other location and the other 12 will be picky as hell because making something happen is your problem not theirs.

    You end up losing the will to live faced with processes which seem to be designed to make sure nothing ever happens again.

  18. x 7

    "Then there was the small business where a staffer accidentally pressed the delete key for files held on an Iomega ZIP and then clicked 'yes' to confirm. Unfortunately, the Recycle Bin doesn’t always save you and the business owner was unable to recover the data"

    I find that hard to believe, unless the drive was overwritten. The usual recovery tools should have worked

    1. ecofeco Silver badge

      Iomega was notoriously unreliable.

      1. Jeffrey Nonken Silver badge

        Why the downvote? Zip disks were horrible. I would explain why except it would likely turn into a rant. With foaming at the mouth.

        1. TheFirstChoice

          Zip discs may have been a bit horrible, but they were nothing compared to the game of Russian Roulette played with your data when you used LS-120 so-called "SuperDisks".

  19. circusmole

    When I was in...

    ...this line of business, when I investigated calls like "Our system is down, we've lost XXXXX customer records, our DB is suddenly corrupted, we can't log on - we MUST have a VIRUS - help!!!!" I found that 90% of the time it was an operator error. More often than not some normally sane and reliable operator had brain fade and done something stupid because they did not document their process and procedures with sufficient rigour - if at all.

  20. Version 1.0 Silver badge

    And then of course there's "system maintenance"

    I had a DEC engineer visit a customer of mine (this was back in the 80's) to fix a minor hardware problem on their 11/73 - in the process the engineer "lost" the application virtual disk, said "Oops sorry" and left the customer - a hospital in Houston, TX - to sort it out.

    So they called me about 4:30pm with patients booked for pre-surgery work the following morning - and I logged into the machine via a 1200 baud modem and spent the rest of the night rebuilding the applications and restoring everything from mag-tape reels. I'd forgotten all about that until I read this story - I remember thinking at the time, "Damn, I'm glad they upgraded to a 1200 baud modem."

  21. Justicesays

    sometime it's the OS (here looking at you linux)

    Work on some files, processing them.

    In order to differentiate the input and the output I capitalized the first letter of the outputs and stored them in the same dir.

    Finished processing, files look good, delete the input files.

    rm [a-z]*

    All files gone?

    Some time spent reading up on LC_COLLATE and swearing.

    Luckily nothing that important lost, just some time.
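    The trap is that in many non-C locales a shell range like [a-z] follows the collation order rather than ASCII order, so it can match uppercase names as well. In the shell, setting LC_ALL=C or using [[:lower:]] are the usual ways round it. As a rough sketch of the same clean-up done without any locale dependence, here is a Python version that tests the first character against the ASCII lowercase letters and defaults to a dry run. The function names are made up for illustration.

```python
import string
from pathlib import Path


def lowercase_initial_files(directory):
    """Return regular files whose name starts with an ASCII lowercase letter."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name[0] in string.ascii_lowercase
    )


def delete_inputs(directory, dry_run=True):
    """Report the matching files; delete them only when dry_run is False."""
    victims = lowercase_initial_files(directory)
    for path in victims:
        if not dry_run:
            path.unlink()
    return [p.name for p in victims]
```

    Calling delete_inputs(".") first shows what would go; only delete_inputs(".", dry_run=False) actually removes anything.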

    1. circusmole

      Re: sometime it's the OS (here looking at you linux)

      I still miss file versions from my days working with VAX/VMS and before that, RSX-11M. Every time you change (edit ?) a file and save it you simply create a new version with an incremented version number, the previous version is still there. VMS operators/programmers/users soon learnt never to delete files but always purge old versions as necessary with "$purge /keep=3". (for instance).
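      For those who never used it, the idea can be imitated in a few lines. This is only a sketch in Python, assuming versions are stored as sibling files named "name;N" in the style of VMS; the functions save, versions and purge are hypothetical helpers, not any real VMS or Python API.

```python
import re
from pathlib import Path

# Matches names of the form "<base>;<number>", e.g. "notes.txt;3".
VERSION_RE = re.compile(r"^(?P<base>.+);(?P<num>\d+)$")


def versions(path):
    """Return (number, Path) pairs for the saved versions of path, oldest first."""
    path = Path(path)
    found = []
    for candidate in path.parent.iterdir():
        m = VERSION_RE.match(candidate.name)
        if m and m.group("base") == path.name:
            found.append((int(m.group("num")), candidate))
    return sorted(found)


def save(path, text):
    """Write text as a new, higher-numbered version; never overwrite an old one."""
    path = Path(path)
    existing = versions(path)
    next_num = existing[-1][0] + 1 if existing else 1
    target = path.with_name(f"{path.name};{next_num}")
    target.write_text(text)
    return target


def purge(path, keep=3):
    """Delete all but the newest `keep` versions; return the removed names."""
    all_versions = versions(path)
    old = all_versions[:-keep] if keep > 0 else all_versions
    for _, p in old:
        p.unlink()
    return [p.name for _, p in old]
```

      The point of the design is that the everyday operation (save) cannot destroy anything, and the destructive one (purge) still leaves the most recent copies behind.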

  22. Adrian Midgley 1

    A human factors tip

    Arrange that nobody interrupts someone who is doing something complicated and important.

    This is amazingly difficult to arrange, even in safety- and mission-critical areas.

  23. Matt Bryant Silver badge

    The old OMG face events.

    Not fun at the time, but definitely a good chuckle looking back at those times when your colleague's usual deadpan was replaced by that look of pants-filling terror, accompanied by the whimper of "<insert expletive>, I typed 'rm -r *'!" Strange how even the most experienced admin seemed to do that at least once, despite it being the most joked about error possible.

    But I think the best was when a trainee electrician pushed the emergency power breaker button (the one behind a flip-up plastic shield, labelled in red "Only use in event of electrocution!") and dropped the power to a whole datacenter hall.

    1. Alan W. Rateliff, II

      Re: The old OMG face events.

      I just have to say that if you have never filled your pants with terror, or some other matter, then you may just not be worth your salt. I know I have learned a couple of damn good lessons in moments of sheer terror followed by thoughts of what it might be like to live in Belize under an assumed name.

      Taking an entire dial-up ISP off-line with a deny filter, not understanding that once a deny is in place you damn well better have an explicit allow, then having to haul ass across town to fix the error via serial console. Orphaning a 48GB Exchange database during an Intel-to-AMD hardware upgrade because the logs were still stored on the system drive, which was by then wiped and reloaded with a new SBS 2003 installation, and the subsequent weekend learning the magic of eseutil. Using tab completion on the target of a cat /dev/null > and missing the target, completely killing a customer portal website, and the time spent restoring from off-site tape.
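      The deny-filter one is easy to reproduce on paper. Below is a small Python sketch of a first-match filter with an implicit deny at the end, which is an assumption about how that particular kit behaved (it is not named here, though many router access lists work this way). The rule format and the evaluate function are invented for illustration, and the addresses are documentation ranges.

```python
from ipaddress import ip_address, ip_network


def evaluate(rules, source):
    """Return "permit" or "deny" for source using first-match semantics.

    rules is a list of (action, network) pairs checked in order.
    Anything that matches no rule falls through to the implicit deny.
    """
    addr = ip_address(source)
    for action, network in rules:
        if addr in ip_network(network):
            return action
    return "deny"  # implicit deny: no rule matched


# One deny and nothing else: every other source is dropped too.
broken = [("deny", "192.0.2.0/24")]

# The same deny followed by an explicit allow for everything else.
fixed = [("deny", "192.0.2.0/24"), ("permit", "0.0.0.0/0")]
```

      With the broken list, evaluate(broken, "198.51.100.7") comes back as "deny" even though no rule mentions that address; with the fixed list it is permitted while 192.0.2.10 stays blocked.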

      I am certain I have a few other little ones not so serious which have taught the value of proof-reading, testing, and testing the tests, and how quickly one can spin up a replacement dust-box when really necessary.

      I have said in the past if ever in a position to hire, I would never hire anyone who answers "no" to the question of "have you ever crashed a server or lost critical data?" I want to know first how you react to a disaster (especially of your own creating), secondly how you work under subsequent pressure, thirdly what you did to recover, fourthly what you did or now do to ensure that particular mistake or similar mistakes never happen again, and last but not least how you reported the incident.

  24. Sam Liddicott

    swapoff too slow?

    On a production Solaris box with no failover, swapoff was too slow and I needed the disk space consumed by the swap file more than I needed the virtual memory.

    > /swap.1

    was effective at truncating the swapfile and recovering the disk space. The prompt even came back.

    It was with a poignant mix of sad humour and annoyance that IT support drove me to the data centre so that I could suffer with them the inconvenience that I had put them to in my thoughtless carefree manner. They were great guys.

  25. Johan Bastiaansen

    not human error, but sheer stupidity

    One of my clients was a company, run by two brothers. One of them died. I made a note of that in our CRM software, indicating he shouldn't be contacted anymore. That comment got removed by HQ and they continued to contact him. The company complained to me and I put the note back in. Again, it got removed by HQ and they continued to contact him.

    I put the comment back in with a warning to a Dutch twat and the physical damage I had in store for him if he removed the note again.

    So they deleted him as a contact person and with that went all information, emails, visits and contracts that were linked to him.

  26. Baldy1138

    User Rights and Admin Wrongs

    "users often have more rights than they need – and it is a no-brainer to rein them back."

    Unfortunately, what IT regards as allowable user rights is sometimes drastically lower than what users actually need. In my workplace, IT crippled workstations in the interest of efficient maintenance and damage control, but wound up crippling innovation instead. It was a different kind of no-brainer to re-elevate permissions.


Biting the hand that feeds IT © 1998–2019