Re: Ah... human error..
Git or another source control system could have helped you here.
Everyone I speak to about system security seems to panic about malware, cloud failure, system crashes and bad patches. But the biggest threat isn’t good or bad code, or systems that may or may not fail. It’s people. What we call Liveware errors range from the mundane to the catastrophic, and they happen all the time, at all levels …
"Git or another source control system could have helped you here."
I assume this dates from before real source control, when at best, only file-control tools existed.
The error here was changing the original version instead of working on a copy.
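For what it's worth, even a minimal git workflow covers this exact case: commit the original before processing, and a clobbered file is one command away from recovery. A quick sketch (the filenames are invented):

```shell
# Hypothetical scratch repo; works with any git version that has "checkout --".
git init scratch && cd scratch
git config user.email you@example.com && git config user.name you

echo "original text" > report.txt
git add report.txt && git commit -m "keep the original safe"

echo "oops, clobbered" > report.txt   # the human error
git checkout -- report.txt            # restore the committed version
cat report.txt                        # prints: original text
```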
"applied make life easier and help us all to avoid making the same errors"
Oh yeah. Every time some idiot screws up we rewrite the process to make sure it can't possibly happen again.
So you end up with a process where passing a design/document from one department to the next, or from one phase of development to the next, requires 14 different people to sign off - and at least two of them will be on holiday or working at some other location, while the other 12 will be picky as hell, because making something happen is your problem, not theirs.
You end up losing the will to live faced with processes which seem to be designed to make sure nothing ever happens again.
"Then there was the small business where a staffer accidentally pressed the delete key for files held on an Iomega ZIP and then clicked 'yes' to confirm. Unfortunately, the Recycle Bin doesn’t always save you and the business owner was unable to recover the data"
I find that hard to believe, unless the drive was overwritten. The usual recovery tools should have worked.
Iomega was notoriously unreliable.
Why the downvote? Zip disks were horrible. I would explain why, except that it would likely turn into a rant. With foaming at the mouth.
Zip discs may have been a bit horrible, but they were nothing compared to the game of Russian Roulette played with your data when you used LS-120 so-called "SuperDisks".
...this line of business, when I investigated calls like "Our system is down, we've lost XXXXX customer records, our DB is suddenly corrupted, we can't log on - we MUST have a VIRUS - help!!!!", I found that 90% of the time it was an operator error. More often than not some normally sane and reliable operator had had a brain fade and done something stupid because they had not documented their processes and procedures with sufficient rigour - if at all.
I had a DEC engineer visit a customer of mine (this was back in the 80's) to fix a minor hardware problem on their 11/73 - in the process the engineer "lost" the application virtual disk, said "Oops sorry" and left the customer - a hospital in Houston, TX - to sort it out.
So they called me about 4:30pm with patients booked for pre-surgery work the following morning - and I logged into the machine via a 1200 baud modem and spent the rest of the night rebuilding the applications and restoring everything from mag-tape reels. I'd forgotten all about that until I read this story - I remember thinking at the time, "Damn, I'm glad they upgraded to a 1200 baud modem."
Work on some files, processing them.
In order to differentiate the input and the output I capitalized the first letter of the outputs and stored them in the same dir.
Finished processing, files look good, delete the input files.
All files gone?
Some time spent reading up on LC_COLLATE and swearing.
Luckily nothing that important lost, just some time.
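The trap, for anyone who hasn't hit it: under many locales, collation (and on some systems the glob range `[a-z]`) interleaves upper and lower case, so a command meant to pick out only the lower-case inputs can sweep up the capitalised outputs too. A small demonstration with `sort` - the filenames are stand-ins, and the en_US.UTF-8 locale may not be installed on every system:

```shell
# "apple" stands in for an input file name, "Banana" for an output.
printf 'apple\nBanana\n' | LC_COLLATE=C sort
# C collation is strict byte order: 'B' (0x42) sorts before 'a' (0x61),
# so this prints Banana first.

printf 'apple\nBanana\n' | LC_COLLATE=en_US.UTF-8 sort
# Locale-aware collation typically compares letters case-insensitively
# first, so apple usually comes out before Banana here.
```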
I still miss file versions from my days working with VAX/VMS and before that, RSX-11M. Every time you change (edit ?) a file and save it you simply create a new version with an incremented version number, the previous version is still there. VMS operators/programmers/users soon learnt never to delete files but always purge old versions as necessary with "$purge foo.bar /keep=3". (for instance).
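For anyone nostalgic for that behaviour on Unix, the idea is easy to fake in a few lines of shell. This is a hypothetical sketch, not real VMS semantics - it assumes GNU coreutils (for `head -n -K` and `xargs -r`) and version numbers without gaps:

```shell
# Save "foo.bar" as "foo.bar;1", "foo.bar;2", ... (VMS-style names).
save_version() {
  f=$1
  n=$(ls "$f;"* 2>/dev/null | wc -l)   # crude: assumes contiguous versions
  cp "$f" "$f;$((n + 1))"
}

# Rough analogue of "$ purge foo.bar /keep=3": drop all but the newest 3.
purge_keep3() {
  f=$1
  ls "$f;"* 2>/dev/null | sort -t ';' -k 2 -n | head -n -3 | xargs -r rm --
}
```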
Arrange that nobody interrupts someone who is doing something complicated and important.
This is amazingly difficult to arrange in quite a few safety- and mission-critical areas.
Not fun at the time, but definitely a good chuckle looking back at those times when your colleague's usual deadpan was replaced by that look of pants-filling terror, accompanied by the whimper of "<insert expletive>, I typed 'rm -r *'!" Strange how even the most experienced admin seemed to do that at least once, despite it being the most joked about error possible.
But I think the best was when a trainee electrician pushed the emergency power breaker button (the one behind a flip-up plastic shield, labelled in red "Only use in event of electrocution!") and dropped the power to a whole datacenter hall.
I just have to say that if you have never filled your pants with terror, or some other matter, then you may just not be worth your salt. I know I have learned a couple of damn good lessons in moments of sheer terror followed by thoughts of what it might be like to live in Belize under an assumed name.
Taking an entire dial-up ISP off-line with a deny filter, not understanding that once a deny is in place you damn well better have an explicit allow, then having to haul ass across town to fix the error via serial console. Orphaning a 48GB Exchange database during an Intel-to-AMD hardware upgrade because the logs were still stored on the system drive, which was then wiped and reloaded with a new SBS 2003 installation - and the subsequent weekend learning the magic of eseutil. Using tab completion on the target of a cat /dev/null > and missing the target, completely killing a customer portal website, and the time spent restoring from off-site tape.
I am certain I have a few other little ones not so serious which have taught the value of proof-reading, testing, and testing the tests, and how quickly one can spin up a replacement dust-box when really necessary.
I have said in the past that if I were ever in a position to hire, I would never hire anyone who answers "no" to the question "have you ever crashed a server or lost critical data?" I want to know, first, how you react to a disaster (especially one of your own creation); secondly, how you work under the subsequent pressure; thirdly, what you did to recover; fourthly, what you did or now do to ensure that particular mistake, or any similar mistake, never happens again; and last but not least, how you reported the incident.
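The deny-filter lesson above generalises: in first-match ACLs (Cisco-style, and many firewalls), applying any list at all adds an implicit deny-everything at the end, so a list containing only a deny for one network silently locks out all other traffic. A toy evaluator to make the point - pure illustration, not any vendor's syntax:

```shell
# acl_allows <src-ip> <rule>...   where rule = permit:<prefix> | deny:<prefix>
acl_allows() {
  src=$1; shift
  for rule in "$@"; do
    action=${rule%%:*} prefix=${rule#*:}
    case $src in
      "$prefix"*)
        [ "$action" = permit ] && echo allowed || echo denied
        return
        ;;
    esac
  done
  echo denied   # the implicit deny at the end of every list
}

acl_allows 192.168.0.5 deny:10.1.2.            # denied: forgot the allow
acl_allows 192.168.0.5 deny:10.1.2. permit:    # allowed: explicit permit-any
```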
On a production Solaris box with no failover, swapoff was too slow and I needed the disk space consumed by the swap file more than I needed the virtual memory.
What I ran was effective at truncating the swap file and recovering the disk space. The prompt even came back.
It was with a poignant mix of sad humour and annoyance that IT support drove me to the data centre so that I could suffer with them the inconvenience I had put them to in my thoughtless, carefree manner. They were great guys.
One of my clients was a company, run by two brothers. One of them died. I made a note of that in our CRM software, indicating he shouldn't be contacted anymore. That comment got removed by HQ and they continued to contact him. The company complained to me and I put the note back in. Again, it got removed by HQ and they continued to contact him.
I put the comment back in, with a warning to the Dutch twat at HQ about the physical damage I had in store for him if he removed the note again.
So they deleted him as a contact person and with that went all information, emails, visits and contracts that were linked to him.
"users often have more rights than they need – and it is a no-brainer to rein them back."
Unfortunately, what IT regards as allowable user rights is sometimes drastically lower than what users actually need. In my workplace, IT crippled workstations in the interest of efficient maintenance and damage control, but wound up crippling innovation instead. It was a different kind of no-brainer to re-elevate permissions.
Biting the hand that feeds IT © 1998–2017