First-day-on-the-job dev: I accidentally nuked production database, was instantly fired

"How screwed am I?" a new starter asked Reddit after claiming they'd been marched out of their job by their employer's CTO after "destroying" the production DB – and had been told "legal" would soon get stuck in. The luckless junior software developer told the computer science career questions forum: I was basically given a …

  1. Lee D Silver badge

    Question:

    Why did first-day-worker have write access to the production database anyway?

    It's not even a question of backups (that goes without saying) - but how did someone walk in and get given production database write access by default without even a hint of training?

    And why are THEY setting up their dev environment? Why is that not done for them, especially if it involves copying the production database?

    The problem is the culture of a place like that, not what the guy did - even if you assume he's being quite modest about how careless he was.

    1. gnasher729 Silver badge

      "Why did first-day-worker have write access to the production database anyway?"

      Because he was given documentation that he was supposed to follow, and the documentation included the password. Password = write access. No password = no access. Everyone with access to the documentation, including you and me, had write access to the database.

      1. Lee D Silver badge

        AND WHY?!

        Why would you do that? Is he going to be committing to the live database on his first day? No. Read-access, yes. Write? No.

        Least privilege principle. If you don't have write access to it, you can't damage it.
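
        For illustration, a minimal sketch of what that least-privilege setup could look like, assuming PostgreSQL (the role, database and schema names here are invented):

          -- Read-only role for a new starter: can look, cannot touch.
          CREATE ROLE new_dev LOGIN PASSWORD 'change-me';
          GRANT CONNECT ON DATABASE prod_db TO new_dev;
          GRANT USAGE ON SCHEMA public TO new_dev;
          GRANT SELECT ON ALL TABLES IN SCHEMA public TO new_dev;
          -- With no INSERT/UPDATE/DELETE/TRUNCATE granted (and no ownership),
          -- a copy-pasted command from a setup doc cannot damage anything.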

        And what prat just puts passwords for write-access on the production database in a document that's going to end up just about anywhere in six months' time?

        That's my question - not the "how", which you've answered, but the WHY!?

  2. zanshin

    Count me among those wishing...

    ...that "whoever wrote the docs" was an option in the survey.

    Really, though, I don't think anyone *needs* to be fired over the actual failure. It's a small company, and they've just experienced a comedy of collective errors across various people, probably the result of technical debt accrued as they got up and running. This kind of thing should never happen, but it's pretty common. It's bad, but you learn, adjust and move on. Well, you move on unless the lack of backups in this case means they are so thoroughly screwed that they have to shutter the business.

    All that said, I do think the CTO needs to go, not necessarily because this happened, but for how he's handled this. Politically, he might have needed to go anyway by virtue of falling on his sword for owning the collective failures of the team(s) under his control, but the obvious scapegoating of the new hire for what's a really broad failure of IT processes says to me that he's not going to learn from this and get his ship right. One way or another he needs to own this, and firing the new guy who copy/pasted commands right from an internal document doesn't convey that kind of ownership to me.

  3. DubyaG

    Last place I worked...

    One of the analysts working with a Prod DB ran an update query without the WHERE clause. Much hand wringing and gnashing of teeth followed, possibly by some braid pulling.

    1. Dave Schofield

      Re: Last place I worked...

      "One of the analysts working with a Prod DB ran an update query without the WHERE clause. Much hand wringing and gnashing of teeth followed, possibly by some braid pulling."

      Way back in the depths of time (mid 90s), I was by default the DBA of an Informix database and was learning SQL. I managed to set the primary key field for a large subset of records in the database to the same value through a badly configured UPDATE query. That took some sorting out, but I managed it eventually.

      1. Doctor Syntax Silver badge

        Re: Last place I worked...

        "I managed to set the primary key field for a large subset of records in the database to the same value through a badly configured UPDATE query."

        The primary key didn't have a UNIQUE index on it?
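
        For context, on most engines a declared primary key would have refused that update outright; a rough sketch of the expected behaviour, assuming PostgreSQL (table and column names invented):

          CREATE TABLE customer (
              id   integer PRIMARY KEY,  -- a primary key implies a unique index
              name text
          );
          -- A WHERE-less attempt to collapse every id to one value should
          -- fail with a duplicate-key error rather than silently succeed:
          UPDATE customer SET id = 1;
          -- ERROR:  duplicate key value violates unique constraint "customer_pkey"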

    2. tokyo-octopus

      Re: Last place I worked...

      "One of the analysts working with a Prod DB ran an update query without the WHERE clause."

      That analyst clearly has never heard of something called "transactions". Or he/she was working on MySQL with MyISAM tables.
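
      For illustration, a minimal sketch of the transaction habit being alluded to, assuming an engine with transactions such as PostgreSQL (table, column and values invented):

        BEGIN;
        UPDATE orders SET status = 'cancelled';   -- oops, forgot the WHERE clause
        -- The reported row count is suspiciously huge, so back out:
        ROLLBACK;

        BEGIN;
        UPDATE orders SET status = 'cancelled' WHERE order_id = 42;
        COMMIT;   -- commit only once the change is confirmed to be the intended one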

  4. Rosie Davies

    Nearly No-one

    Who has:

    - Ultimate responsibility for ensuring all documentation is correct?

    - Ultimate responsibility for ensuring back-ups work (including the testing regime)?

    In both cases it's the CTO's job.

    If the tale had stopped before it got to the bit about the new chap being fired I'd have gone with "No-one, it was a wake-up call for everyone." Even C-level folks are only human. Things can get overlooked; over-reliance on assurances from the management chain can combine with a monstrous workload to help this along.

    But it didn't stop there. The CTO's actions suggest someone lacking the emotional maturity and willingness to take personal responsibility, and driven primarily by covering their own behind at all costs - not someone who should ever be trusted with that responsibility.

    Rosie

  5. Anonymous Coward
    Anonymous Coward

    ehm, one thing

    I agree that the first one to be fired should be the documentation author, the second the person who approved it, then the backup admin, and finally the CTO should follow them too. But one thing: this guy was supposed to use the credentials from the script output, not the ones in the doc. There were creds in the doc that shouldn't have been there, true, but that doesn't mean this guy is completely innocent. He didn't follow his instructions. The fact that the keys to the safe are lying on the safe doesn't mean you're allowed to open it and destroy whatever is inside.

    I guess the credentials in the doc were given as an example - it's unbelievable idiocy to give real credentials as an example, but an example is an example; it doesn't mean those are the values you should actually use.

    1. DavCrav

      Re: ehm, one thing

      "He didn't follow his instructions. The fact that keys to the safe are laying on the safe doesn't mean that you are allowed to open it and destroy whatever is inside."

      This is more like the Big Red Button for a data centre being right next to the exit button for the door, as in another Reg story recently. You can't blame the hapless person who hit the BRB.

    2. Anonymous Coward
      Anonymous Coward

      Re: ehm, one thing

      This assumes the credentials were for the production server at the time they were written. Perhaps they really were for a dev server, and the production server originally used different credentials which were later changed for some reason - aided by foggy memory and so on. Does anyone know how long it had been since the document was last checked? Is there a legal requirement for keeping the document current? If so, how long is the limit under the law?

      1. Doctor Syntax Silver badge

        Re: ehm, one thing

        "Is there a legal requirement for keeping the document current?"

        Very unlikely in most jurisdictions. Would there even be a legal requirement for the document to exist? There might be one if the business were ISO 9000 accredited or something similar - and if so, I'd say this was a clear fail of it.

  6. Tony S

    I worked at a place where we had a major cock up like this. As IT Manager, I took full responsibility.

    I insisted that the production environment should be kept separate, and access limited; but I was overruled by the directors.

    I said that the consultants should not be given admin access; but I was overruled by the directors.

    I demanded extra backup and continuity resources; but I was overruled by the directors. They also cancelled 3 separate planned DR tests in a 2.5 year period.

    When the consultants inevitably screwed up, the entire system went titsup. We were able to get it back up and running, but it took 8 days just to determine the problem. As the fault was not in the data, restoring from backup did not fix the issue.

    Shit happens; how you deal with it shows the character of the individual.

    1. ecofeco Silver badge

      Yep, you can't fix the stupid when it comes from the boss/client. All you can do is cover your ass.

      1. Doctor Syntax Silver badge

        "All you can do is cover your ass."

        Which is just what the CTO seems to have been doing.

    2. Androgynous Cupboard Silver badge

      It's not an airline by any chance?

  7. Flakk

    The CTO told me to leave and never come back. He also informed me that apparently legal would need to get involved due to severity of the data loss.

    "Do what you feel is right, Sir. However, I promise you that if you get legal involved, I'll make you famous."

    Somehow I don't think I'd be hearing from the company ever again.

    1. dlc.usa
      Devil

      > "Do what you feel is right, Sir. However, I promise you that if you get legal involved, I'll make you famous."

      I would never say that. You need to complete the live test of the madhouse by including legal. If they decide to go after you, the lawyers will also have earned their share of the notoriety.

  8. Anonymous Coward
    Anonymous Coward

    separation of responsibilities

    Devs should not get credentials for production databases.

    All stuff they do should be on various dev databases.

    Test deploys of development changes should be done on a DB that is essentially a clone of the production database (with the necessary anonymisation done if it is actually populated from production data). The testing and deployment should be done by non-dev staff, to make sure everything is clear and simple and no hidden, undocumented steps are needed.
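
    For illustration, the sort of anonymisation pass such a clone might get, run against the dev copy only (a sketch in PostgreSQL-style SQL; table and column names invented):

      -- Scrub personally identifiable fields on the cloned copy.
      UPDATE customers
      SET    email     = 'user' || id || '@example.invalid',
             full_name = 'Customer ' || id,
             phone     = NULL;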

    I work in dev and have no access to, or knowledge of, any customer database credentials - nor should I (not even access to the production servers, never mind the databases therein).

    All new dev updates are deployed and tested by test engineers (who, again, don't have production access, so can't accidentally run against the wrong system), running against a test DB that mimics typical customer production databases (and can easily be restored if things break horribly).

    If testing is OK, updates are deployed to customer production sites - and before deployment, database and old-application backups are taken (even where customers have "realtime" backup DB replication).

    People make mistakes, especially someone nervous and overwhelmed on the first day of a new job. No back-covering system is ever perfect, but you at least try to have half-decent procedures in place to reduce the likelihood of nasties (so where I am, the non-dev people do have access to production DBs, and they are the ones who have to be careful, as they could wreak havoc - hence others do the testing first).

    The scenario described, where production DB creds are flying around for all and sundry to see, is just deranged.

    If I do need to look at a problem on a production system (e.g. if it looks like an odd data or corruption issue causing problems), then I work on a clone of that system that reproduces the problem, not the system itself.

    AC for obv detailed real world reasons.

  9. Anonymous Coward
    Anonymous Coward

    Oh, the stories I could tell...

    Oh, the stories I could tell, if only I were allowed to tell them.

    Such as the time, while working as a code librarian, that I wiped out 600 man-years (yes, man-years, as in 30 programmers having worked for 20 years) of work while doing library maintenance, coupled with a failing disk. Whoopsie! Fortunately, we had multiple backups, including some stored off-site. It shut us down for a day while we recovered the library, but that (and a round of beers for the system administrator) was a small price to pay. Whew!

    Dave

  10. Anonymous Coward
    Anonymous Coward

    Production versus Test

    I once worked on a medical prescription handling project. We had the production system, along with a test system that the developers used, as well as a handful of clueless developers who couldn't seem to differentiate between the production and test systems. So, we'd occasionally get a call from a local pharmacy, wanting to know why Dr. Feel Goode had sent a prescription for 10,000 Xanax for patient John Doe to them. Ack!

    Anon Y. Mus

    1. Anonymous Coward
      Anonymous Coward

      Re: Production versus Test

      I have a similar story from one of my first dev jobs, where everything was classic ASP and nothing had any tests. We did have local environments, but you couldn't realllly test anything properly without doing it on production.

      The website sold sex toys, and another developer put a test order through to his home address (lived with his mum) and a 12" black rubber dildo turned up at her door.

      1. DryBones
        Coffee/keyboard

        Re: Production versus Test

        I think this is the first actual, physical manifestation of a cock-up that I have heard of.

  11. clocKwize

    So.... there is no data security, if the production credentials are in a dev guide...

    So.... there are no backups of production data...

    So.... they let a junior developer who is totally new to their system set it up on their own...

    We all mess up once in a while. That is why we do things in such a way that it's really damn hard to do something like this without knowing what you are doing.

    Sure, at my company I can connect to our production system and in theory could wipe it if I wanted to - but it would have to be very, very deliberate. If it did happen, we have several layers of backup to fall back on. Fortunately it has never happened.

    If something like this can happen so easily by accident, it is not the junior developer's fault; it is the CTO's, for not ensuring that the systems are built with such things in mind.

    Hopefully the CTO gets fired. He deserves it. I'd like to say the junior dev could file for wrongful dismissal, but try explaining the above to a judge who has no idea how to computer. It'd be a waste of everyone's time.

    1. Ken Hagan Gold badge

      "but try explaining the above to a judge who has no idea how to computer"

      It really does depend on the judge and I'm sure there's a car analogy you could use in an emergency.

      The points you make don't actually depend on understanding how to protect or run a computer system. It should be clear even to the most out-of-touch judge that if the company depends on "things working", then efforts should be made to ensure that "things work" and "things can be repaired" if they stop working. Then, of course, there's the possibility of dragging in an expert witness and just letting them laugh their arse off in open court when asked whether the company's setup was either "fit for purpose" or "best practice".

  12. cyclical

    When I first started at my current company, access to all the live environments was firewalled from my IP with the message 'f**k off tom, don't you f**king dare'. I guess my predecessor Tom did some damage at some point.

  13. Destroy All Monsters Silver badge

    When DevOps goes too far

    .... The tears are one junior programmer away.

  14. Anonymous Coward
    Anonymous Coward

    British Airways Wot Dunnit Innit?

    Late to the party....

  15. John Smith 19 Gold badge
    Unhappy

    Sounds like plenty of blame to go around.

    At all levels.

    The best test environment I've used had a level above "company" in the DB hierarchy. Although there was only one real company (no separate divisions), there was a second, test company that duplicated all the structure of the main company, so anything could be tested on it. If it didn't work, no problem: scrub the stuff you created (or copied off the live branch) and start again. Your (faulty) code just nuked the (test) database? Well, your co-workers are going to be a bit p***ed and Ops will have to recover a version so everyone else can get on with their work, but otherwise it's not a disaster.

    It's an obvious tactic (once you've used it or think about it) but it takes someone on day 1 to realize you want to have this feature in your system and make sure it's put in.
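
    For illustration, one way that parallel "test company" structure might be sketched with schemas, assuming PostgreSQL (schema and table names invented):

      CREATE SCHEMA test_co;
      -- Duplicate the live structure, constraints and all, but none of the data.
      CREATE TABLE test_co.orders   (LIKE live_co.orders   INCLUDING ALL);
      CREATE TABLE test_co.invoices (LIKE live_co.invoices INCLUDING ALL);
      -- Nuking the test company is an inconvenience, not a disaster:
      DROP SCHEMA test_co CASCADE;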

  16. Jonathan 27

    Yeah...

    This story implies the people involved were so stupid that it makes me think it might be a troll. If it is real, the idiots who thought it was a good idea to print production credentials in training materials should be canned, along with the people who supervise backups (or rather don't), and probably the CTO for being a totally clueless fool. This is just an after-effect of massively incompetent management; you can't blame the first-day junior dev for it.

  17. jgarry

    rulez

    1. Always follow instructions exactly.

    2. Never follow instructions exactly.

    3. See rules 1 and 2.

    1. John Smith 19 Gold badge
      Unhappy

      "rulez"

      Yes.

      What they really mean is "Don't use your initiative, unless the instructions are wrong, in which case do."

      When I put it that way it sounds like bu***hit, because it is.

  18. Fritzo2162

    This is all the CTO's fault. Probably fired the Dev to cover his ass. What a terrible, fragile infrastructure!

  19. Anonymous Coward
    Anonymous Coward

    My money

    Is on the document having been written by a BOFH type of person. This wasn't a mistake, this was someone who likes keeping people on their toes.

    That's why I do all my development on the production server. As root. >:)

    PS: Backups are a sign of weakness.

    1. Wensleydale Cheese

      Re: My money

      "My money ... is on the document having been written by a BOFH type of person."

      My money is on a scenario such as the current sysadmin being asked to write the instructions for his successor.

      Then at some point down the line someone else sees that document and asks for a copy to edit and redistribute for more general use, not realising that it contains production-specific information.

      1. Anonymous Coward
        Anonymous Coward

        Re: My money

        > My money is on a scenario such as the current sysadmin being asked to write the instructions for his successor.

        No credentials should have been mentioned anyway.

    2. John Smith 19 Gold badge
      Coat

      "PS: Backups are a sign of weakness."

      Ahh.

      Klingon DevOps.

  20. This post has been deleted by its author

  21. Stevie

    Bah!

    Too many "Oops" moments in this Buster Keaton movie script.

    Disbelieve.

  22. Schnauzer

    Fire the hiring manager

    If the developer had enterprise database experience, this wouldn't have happened. My fingers wouldn't let me blithely type commands that alter the origin database when the point is to create a local copy.
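
    For anyone wanting a belt-and-braces habit in that situation, a session-level guard is one option - a sketch, assuming PostgreSQL syntax (table name invented):

      SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;
      SELECT count(*) FROM orders;   -- reads are fine
      DELETE FROM orders;            -- ERROR: cannot execute DELETE in a read-only transaction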

    Where I work, erroneous, unmoderated and outdated home-brew docs are commonplace. In time, you learn not to follow them to the letter, but new hires generally get bitten once before they gain this wisdom.

    It's not the dev's fault for not knowing, it's the fault of the hiring manager for not properly qualifying the applicant. Punt said manager, move the applicant into a more suitable role, then make this incident an interview question to ensure that future candidates are screened for database skills.

  23. esoteric

    Protecting Production....

    In the 180 posts above I assume the following is repeated many times, and I have not read it all.

    I just feel compelled to add my mere two cents worth.

    The level of skill at the small end of practice is often abysmal, as in this case. The CTO is an inexperienced, unqualified idiot and almost certainly a bully. If, for example, he genuinely held an ITIL Manager certificate, none of this could have happened had he been following even the basic aspects of best practice.

    In over 40 years of working for large multinationals and one government department: at the multinationals, production systems were not accessible to development staff at all. One of them even had separate networks - as in several air gaps - and military-style access control to the production world. A contrast is provided by the government department, which controlled driving licence data. It was, of course, dominated by unqualified bullies, who were even happy to have offshore contractors access the internal network to build and maintain systems, despite being warned. Oh, and of course the troublemaking people who objected no longer work there.

  24. michael cadoux

    As a newbie I was given some JCL to test a program change and specifically assured that it only read a more-than-usually important database, used in thousands (literally, it was a large company) of jobs. After 2 or 3 runs I noticed that the output showed a run number that increased by 1 each time and reported it, only to be told that was impossible as my Dev sign-on didn't have update authority. Oh dearie me. Turned out there'd been a special dispensation for another project to update using Dev sign-ons, and the authorisation hadn't been cancelled.

    My supervisor was visibly worrying about whether he'd keep his job! Luckily what I was running didn't update client data, only the run number for audit.

  25. tedleaf

    Oops

    1976: a group of us A-level mixed-tech students in Kent, being taught the very basics of computing, were invited to go and see a big, whole-floor mainframe just up the road (literally) from the college. We were shown around for an hour or two, with brief descriptions of what each chunk of hardware was doing, then left alone for a few minutes - nobody at all in the computer room except us 12 students. Someone decided to tap a bit of nonsense on a keyboard; nothing obvious happened. The fella came back, finished the tour with a thank-you and a "maybe I'll see some of you in a few years as employees", we all said cheers and bye, and walked back to college. An hour later, panicked tutors pulled all of us visitors out of various lectures and gathered us in the common room, to tell us that the mainframe we had just visited appeared to have died - and did any of us touch anything? Everyone shook their heads vigorously and denied everything. We found out the next morning that the entire data system was gone, and that we had probably just bankrupted a major American oil/mineral exploration analysis company used by lots of smaller companies!! Oops..

    1. Stevie

      Re: Oops

      Wow. 1976 and no backups.

      Incredible.

      1. tedleaf

        Re: Oops

        From what we were told and heard over the following months, it was all the tapes with fresh data on them that got killed - the ones from small companies with fresh data about new exploration holes etc. were lost!

        You would have hoped it was standard practice for someone, somewhere to take copies, but apparently not.

        The firm involved went into lockdown for months, with all sorts of rumours about big oil firms sabotaging the systems and so on. Nobody ever outright accused us of killing the system/data, so it could have been their own cock-up. This was an American firm, staffed solely by Americans, who had very little interaction with the local population, so all we ever heard was gossip..

        They did survive, but apparently it took 3 years to get over the problem..

        I think they used it as an excuse/reason to upgrade and rebuild the entire system - and it was big: the floor was 70 yards by 30 yards, one entire wall was reel-to-reel decks, the rest of the space was well crammed with the rest of the hardware, and the AC system was massive, taking up another entire floor..

      2. tedleaf

        Re: Oops

        Am racking my brains trying to remember the name of the company - I have Occidental stuck in my head at the moment, but am almost certain it was something different.

        Ashford, Kent, UK, 1976: there cannot have been many big mainframes in the area, and this one was right in the town centre. The building has gone now, but someone must have some memories from then.

        But as I said, it was a totally American firm, with the employees all Americans..

        It's all a long time ago; I can only remember one other name of the 12 of us involved, and she was Polish!!

    2. Anonymous Coward
      Devil

      Re: Oops

      Student Days;

      Ooh such a joy to type format c:\

      or something like that on every available terminal

  26. Anonymous Coward
    Anonymous Coward

    scott

    tiger

  27. Terry 6 Silver badge

    Step away from IT for a minute - Induction 101

    1) No new member of staff in any area of work should be given an instruction document and told to get on with it, because there will always be ambiguity, uncertainty and complexity that some new staff will trip over. That's why organisations have a proper induction (supposedly).

    2) No new member of staff in any area of work should be given any kind of task that brings them close to a destructive error until they know the systems and where the risk factors are, because you can't avoid the trip hazard until you know where the trip hazards are. (See induction comment.)

    3) No new member of staff in any area of work should be given any kind of instruction (written or verbal) that hasn't been looked at and had potential dangers removed, because they don't have the background to recognise those dangers. (See ditto.)

    4) No new member of staff should be expected to discover what the "unknown unknowns" are, because you can only do that by making a mistake, possibly a catastrophic one. The staff responsible for induction should always ensure that they know what those risks for the new employee are likely to be (which effectively means knowing what the job is and what the organisation's culture is) and pre-empt them - perhaps by giving the newcomer a mentor who can still remember being new. (Another ditto.)
