IT 'heroes' saved Maersk from NotPetya with ten-day reinstallation blitz

It's long been known that shipping giant Maersk suffered very badly from 2017's NotPetya malware outbreak. Now the company's chair has detailed just how many systems went down: basically all of them. Speaking on a panel at the World Economic Forum this week, Møller-Maersk chair Jim Hagemann Snabe detailed the awful toll of …

  1. Anonymous Coward

    'internet was not designed to support the applications that now rely on it'

    10 days and only “a 20 per cent drop in volumes”.

    Did he say how it ended up costing them $300m?

    1. Anonymous Coward

      Re: 'internet was not designed to support the applications that now rely on it'

      "...in the near future, as automation creates near-total reliance on digital systems, human effort won't be able to help such crises."

      It's good that Maersk acknowledges this, since the media often downplay 'automation' risks and ignore how breaches, hacks and malware will factor into things. Overall, I suspect many corporations are still just thinking: "It's OK, I won't get that malware 'AIDS' shit, that only happens to others".

    2. Paul Johnston

      Re: 'internet was not designed to support the applications that now rely on it'

      Seems a reasonable figure

      (10/365) * (20/100) * annual turnover

      Not got an exact figure for the turnover but revenue was over $35 billion.

      Add overtime and other sundries and it's a big bill.
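
      Plugging the numbers in as a back-of-envelope check (revenue assumed at ~$35bn, per the above; illustrative only):

        # rough sanity check of the reported $250m-$300m cost; revenue assumed
        annual_revenue = 35e9          # approximate annual revenue, per above
        outage_fraction = 10 / 365     # ten days of disruption
        volume_drop = 0.20             # "a 20 per cent drop in volumes"
        lost = annual_revenue * outage_fraction * volume_drop
        print(f"${lost / 1e6:.0f}m")   # ~$192m before overtime and sundries

      That lands within shouting distance of $250m-$300m once the recovery costs go on top.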

    3. Anonymous South African Coward Silver badge

      Re: 'internet was not designed to support the applications that now rely on it'

      So 20% drop = $300m less?

      Boggles the mind...

    4. MMR

      Re: 'internet was not designed to support the applications that now rely on it'

      Overtime, lots of takeaways, overtime, consulting and probably lots of overtime.

      1. rmason Silver badge

        Re: 'internet was not designed to support the applications that now rely on it'

        Imagine signing off on $3.8 million for energy drinks and pizza.

        :)

        1. This post has been deleted by its author

        2. Anonymous Coward

          Re: 'internet was not designed to support the applications that now rely on it'

          rmason,

          How do you think 'Just Eat' got so big so quickly!!! :) :)

      2. John Brown (no body) Silver badge

        Re: 'internet was not designed to support the applications that now rely on it'

        ...and maybe late delivery penalties or ships overstaying their welcome in port due to the manual process.

        1. netminder

          Re: 'internet was not designed to support the applications that now rely on it'

          I know that FedEx has penalty clauses built into contracts where they provide services. I have no way of knowing if they ever paid a 'fine', but I did hear that they had every available employee hand sorting. With revenue of $50B, a cost of 0.6% would not be surprising.

          Maybe if all these companies listened to their security people and patched, they could have saved most of that money.

  2. TonyJ Silver badge
    Pint

    I hope

    That all the staff that pulled this off were well rewarded.

    Because frankly that's a phenomenal effort that deserves it.

    1. Korev Silver badge
      Pint

      Re: I hope

      I agree.

      Also, I hope for IT's sake that there's a "let's make sure this doesn't happen again" exercise rather than a finger-pointing one.

      Another one -->

      1. macjules Silver badge

        Re: I hope

        ...rather than a finger-pointing one.

        Like WPP did.

        1. Anonymous Coward

          Re: I hope

          At WPP: 3 global AD forests, thousands of servers, dozens of backup environments and tens of thousands of workstations, all encrypted. The networks are still wide open. They will get hit again.

          1. Anonymous Coward

            Re: I hope

            "@ WPP 3 global AD forests, 1000s servers, dozens backup environments, 10000s workstations all encrypted. Networks are still wide open. They will get hit again."

            I mean, it's not as if WPP hasn't had this type of issue in the past, with the constant stream of new companies/offices being taken on or consolidated into existing offices. Of the networks that were hit, some used to have systems in place to stop this sort of thing; however, they were probably left to rot, unmanned and unmonitored, while IBM waited to employ someone in India a year or two after the onshore resource was resource-actioned. Or IBM offered an expensive solution that WPP didn't want to buy, and neither side had the expertise to come up with a workable one...

            And there is some network flow data being sent to QRadar systems now to identify issues, but whether that would spot trouble fast enough to stop senior IBM managers from making bad decisions is a different story. Unless it was a temporary solution and it's been removed pending agreement on costs.

            Still, I'm sure WPP wouldn't want to upset such a valuable client as IBM by making public what the damage actually was.

      2. tonyszko

        Re: I hope

        I can't share all the details, as they are covered by many agreements, but as someone who was/is involved in this process with our managed service team (which was part of the recovery as well), I can say that there is no finger pointing.

        There are a lot of constructive changes and a solid plan being implemented to lower the risk of such an incident happening in the future (you can't rule it out).

        It was really exceptional to see how the Maersk team handled it, and how all the parties involved (Maersk IT, managed service teams - ours included - and external consultants) pulled together and recovered from it.

        It is also exceptional how openly Maersk is sharing what happened and how they handled it.

        We will be covering our lessons learned from this event soon (next week) on our blog -> https://predica.pl/blog/ if you want to check it out.

    2. Kevin Johnston

      Re: I hope

      They have at least been acknowledged, which is already a huge leap forward from the normal management responses of "why did you not prevent this?" and "that's the IT budget blown for the next five years; cancel the refresh programme and forget asking for any overtime money".

      1. Doctor Syntax Silver badge

        Re: I hope

        "cancel the refresh program"

        It looks as if the refresh programme was brought forward.

    3. Halfmad

      Re: I hope

      "That all the staff that pulled this off were well rewarded.

      Because frankly that's a phenomenal effort that deserves it."

      Annoys me that companies don't shout about how well their IT departments recover in situations like this. If they'd had a fire, etc., they'd be thanking the staff who helped PUBLICLY, but IT is seen as a shadow department: we can't possibly talk about those people...

      1. tonyszko

        Re: I hope

        But they do: they are speaking about it at multiple events, and Maersk IT got praised for what they delivered in this case.

      2. Anonymous Coward

        Re: I hope

        > Annoys me that companies don't shout about how well their IT departments recover in situations like this.

        Err... This guy just did. Hence the headline.

    4. Anonymous Coward

      Re: I hope

      Their main IT support is via IBM, so you can guess the chances of reward were between Buckley's and none... (unless you were a manager).

      They had lots of heroes, including the local techie who had the presence of mind to turn off one of the DCs once they realised what was happening - that saved their AD.

      We'd heard bits and pieces of what had gone on during the recovery (the usual stuff you'd expect: block switches, block ports until each host in a segment was confirmed a clean build, then slowly remove the blocking/site isolation). We didn't, however, see any emails publicly acknowledging their efforts.

      1. Ken Hagan Gold badge

        Re: I hope

        "They had lots of heroes including the local techie who had the presence of mind to turn off one of the DC's once they realised what was happening - that saved their AD."

        Hmm. Yes. I imagine the rebuild might have taken more than 10 days if it had included typing in a new AD from scratch.

        1. tonyszko

          Re: I hope

          The main challenge was having data from AD to start the recovery with.

          The main question here is: how many organisations have a procedure for forest recovery? It is mostly a logistics task, plus a good understanding of AD as a service.

          My consulting experience from the last 20 years tells me that 99% of organisations don't have one, having written the scenario off as something that will never happen.

          1. InfiniteApathy

            Re: I hope

            >The main question here is: how many organisations have a procedure for forest recovery

            You just made my butthole clench so fast I'm missing a circle of fabric from my chair.

        2. tonyszko

          Re: I hope

          And BTW, a good forest recovery plan has "typing in a new AD from scratch" planned somewhere along the recovery path, in case restoring can't meet the business's recovery-time requirement. It was included in every GOOD recovery plan I've seen, built or read.
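
          To make that concrete, a toy sketch of that decision (path names, hours and the "usable" set below are all hypothetical, not from any real runbook): pick the fastest recovery path whose preconditions actually survived the incident; typing the forest in from scratch is the one path with no preconditions, which is why a GOOD plan includes it.

            # hypothetical forest-recovery path selection; all figures invented
            recovery_paths_hours = {
                "restore forest from verified backups": 18,
                "promote surviving DC and rebuild peers": 30,
                "type in a new AD from scratch": 72,
            }
            # paths whose preconditions survived; here even the backups are gone
            usable = {"type in a new AD from scratch"}

            candidates = {p: h for p, h in recovery_paths_hours.items() if p in usable}
            print(min(candidates, key=candidates.get))  # the from-scratch fallback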

          1. Anonymous Coward

            Worse things happen at sea.

            Compared to a ship sinking with possible loss of life, IT isn't their biggest problem. One ship and cargo probably cost more than $300 million.

            Their DR plan must be good; comparing the reality with the plan would make a good case study.

          2. Anonymous South African Coward Silver badge

            Re: I hope

            I had to recover an AD site once; it had only one PDC and no BDCs, but luckily there was a recent backup of the PDC (Server 2k3 and NTBackup).

            The process was to reinstall Server 2k3 on a clean server and run NTBackup to restore the AD backup, and we were back in business again. The only niggle was hoping that Windows Activation would go through, as I was not in the mood to faff around with that - but it went through just fine. I then set up a BDC just in case, but still continue to make backups from the PDC, juuuust in case.

            And recovering that forest is no biggie, as there are only about 60 users - but a backup and a BDC make things so much easier.

            But yes, forest recovery, especially with multiple sites and domains, needs to be addressed. Setting it all up from scratch by hand leads to errors and mistakes if due care is not taken.

          3. Stork Bronze badge

            Re: I hope

            My time at Maersk Data (Maersk's IT subsidiary until 2005) made me paranoid about backups. They were considering mirrored systems kilometres apart.

        3. J. Cook Silver badge
          Pint

          Re: I hope

          I've (thankfully) never had to restore AD from a backup, and Bog as my witness, I hope to never need to.

          Pulling the plug on a DC was *definitely* a heroic measure - even if it's not an FSMO holder, if it's a global catalog server it can be promoted to one and used to rebuild from.

      2. tonyszko

        Re: I hope

        This wasn't handled through standard IT support. IBM probably had its role there, but it was mostly recovered by the Maersk IT team, consultants from vendors (you can imagine which one), managed service teams like ours, and external consultants.

        Lots of people were working in shifts, spending several days on site to make it happen.

        I can't speak to public e-mails from Maersk, but the Maersk IT team is very open about what they did and how; I've seen members of the team speak about it at a few sessions.

        I said this earlier in the thread, but for information: I can't provide official details, but our managed services team, which was part of this recovery effort and spent a couple of weeks on site, is cooking up a blog post to be published next week at https://predica.pl/blog/ if you want to check it out.

        1. Anonymous Coward

          Re: I hope

          > consultants from vendors (you can imagine which one) managed service teams like ours and external consultant.

          Job well done, Tony. Congratulations. :-)

        2. Stork Bronze badge

          Re: I hope

          Sounds like the attitude has not changed much from when I was at Maersk Data (separate, but very close to The Client): get it sorted, we'll do the paperwork later. And recognise effort.

  3. Anonymous Coward

    The last person who fixed a malware outbreak...

    ... got thrown under a bus by the UK government and then arrested by the US Feds. I wouldn't touch the repair and removal of malware without a waiver signed and a lawyer present.

    1. Adam 52 Silver badge

      Re: The last person who fixed a malware outbreak...

      "thrown under a bus" as in wasn't told he was going to be arrested. That's not really the same thing and would have been a really big ask of GCHQ.

      1. Sir Runcible Spoon Silver badge

        Re: The last person who fixed a malware outbreak...

        They didn't mind getting him in to consult for them whilst they surely knew about the US intentions.

  4. Anonymous South African Coward Silver badge

    How did they manage it? I would love to hear the side of things from an IT techie... it is stunning... the mind just boggles.

    1. tonyszko

      Lots of people working on it on multiple fronts. Lots of logistics - for example, you might be surprised how few USB storage pens are in stock in the shops :).

      One important aspect - no panic! Don't do things if you don't fully understand the scope and what hit you.

      Besides the technical details, there are lessons from the management side:

      - do not let the people doing the work be bothered by people throwing random questions or issues at them; put a "firewall" in place to handle that

      - good communication about where you are with the effort and what the recovery time is, is crucial. A dedicated team for this might help you A LOT.

      As I wrote in other replies here, we will be covering this soon on our blog, from the perspective of our managed services team, who were on site for a couple of weeks helping with the recovery.

      1. This post has been deleted by its author

    2. highdiver_2000

      This is ransomware, so it's a massive quarantine, wipe, reinstall and safe-zone effort.

  5. Anonymous Coward

    Maersk's own experience is that the attack it endured cost it between $250m and $300m.

    Now if only they had spent a fraction of that on preventing it.

    1. Lysenko

      Re: Maersk's own experience is that the attack it endured cost it between $250m and $300m.

      I'm sure (!?) that after his "Road to Damascus" moment Mr Snabe also directed that a further $300m be invested in DR facilities and redundant systems while also doubling the systems security budget.

      Back on planet Earth.....

      1. Chrissy

        Re: Maersk's own experience is that the attack it endured cost it between $250m and $300m.

        Probably not... now the IT department has un-stealthed itself as still having some staff left, the CEO is asking: "how come these people are still on the payroll... why haven't they been outsourced already?"

        1. Anonymous Coward

          Re: Maersk's own experience is that the attack it endured cost it between $250m and $300m.

          Depends on who was responsible for letting it happen. If the service provider, they will lose the contract at renewal. If internal, at least they recovered.

      2. Stork Bronze badge

        Re: Maersk's own experience is that the attack it endured cost it between $250m and $300m.

        The company may have changed (I stopped working with their main IT supplier in 2007), but it was then a company that was aware of how important IT was to it. Doing any work on their servers at headquarters, even when employed by a subsidiary, meant having one of their guys standing next to you.

        If they think it makes sense to invest further, they will do it.

        I heard a comment from an external consultant once: "we sat at this meeting, and as we talked, hardware costs went up from 50k to 200k in 20 minutes - and the customer (Maersk) didn't blink".

  6. HmmmYes Silver badge

    You'd think that at companies with large PC deployments, maybe, just maybe, a light bulb might come on and go ping: 'Maybe we need to put a bit more diversity into the client OS?'

    I mean, FFS. It's a company. Most of the applications can sit behind a browser now.

    1. Dan 55 Silver badge

      But Office! But Outlook!

      1. Sir Runcible Spoon Silver badge

        Both of which work in a VM.

        Sure, your VM is toast, but offline backups and the underlying OS are all good, yeah?

    2. tonyszko

      But even for browser apps you need a service to authenticate people, plus some common services like e-mail. With 100k+ employees it is not that simple.

  7. Craigie

    20% drop going to manual

    Beancounter: 'How much are we spending on IT?'

    1. Anonymous South African Coward Silver badge

      Re: 20% drop going to manual

      Beancounters are the natural enemy of the BOFH... Death to all things Beancountery! :) :p

    2. A Non e-mouse Silver badge

      Re: 20% drop going to manual

      Beancounter: 'How much are we spending on IT?'

      That's a good question and IT should always be prepared to justify their costs. It's a shame the same can't be said of management.

      1. Boris the Cockroach Silver badge

        Re: 20% drop going to manual

        Quote: That's a good question and IT should always be prepared to justify their costs.

        Horse shit, to put it bluntly.

        IT is the core system that lets the company function: no IT, no company.

        That's as true for the 20-man band I attend to as for mega-corps like Maersk.

        If you are not treating IT as a core section of the business and consider it an incidental expense (like my manager did once*), then you deserve to have your company crash and burn.

        This should be drilled into every C-level exec with a big mallet.

        *His eureka moment arrived thanks to a dead PC containing 10 years' worth of robot/CNC programs...

        1. cantankerous swineherd Silver badge

          Re: 20% drop going to manual

          IT the core system?

          the one that sends emails that can't be replied to?

          that says we're experiencing extremely high call volumes 24/7?

          that turns employees into robots mindlessly reciting words on a screen?

          the one that ensures your details can never be updated because the system won't let me?

          I'm seeing a huge competitive advantage here, at a modest cost, resulting in a more effective and resilient organisation.

        2. Anonymous Coward

          Re: 20% drop going to manual

          Quote: That's a good question and IT should always be prepared to justify their costs.

          Quote Response: Horse shit, to put it bluntly

          Ker-wrong. They should still justify their costs, but as a core function it ought to be a whole lot easier. I'm pretty sure the story above said Maersk struggled but didn't fall over. Maersk is not IT in the same way that the IT companies with banking licences are: Maersk moves metal boxes, not ones and zeros.

          1. Stork Bronze badge

            Re: 20% drop going to manual

            You would be surprised how much IT Maersk actually is.

            When I started in the group I thought it was quite simple to move a box from A to B, but when you have a million or so boxes (then) and have to keep track of where they come from, where they are going, what is in them, who they belong to (and this is "legal title to ownership" sort of serious), whether they can be loaded under deck, and how to stack them on the ship to make as few moves as possible (without making the ship tip over due to bad balance), you really need IT.

            This is before implementing rules such as "do not send ALU containers to certain places, as the locals find it too easy to get into them".
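
            As a toy illustration of just the stacking constraint (the route and the single-stack simplification are mine, not Maersk's): containers come off the top, so each stack wants loading in reverse order of the ports of call.

              # containers are discharged top-down, so load what comes off last first
              ports_of_call = ["Rotterdam", "Algeciras", "Singapore"]  # invented route
              containers = [("box1", "Singapore"), ("box2", "Rotterdam"), ("box3", "Algeciras")]

              # bottom of stack = latest port of call, top = next port: no restows
              stack = sorted(containers, key=lambda c: ports_of_call.index(c[1]), reverse=True)
              print([box for box, port in stack])  # ['box1', 'box3', 'box2'], bottom to top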

    3. J__M__M

      Re: 20% drop going to manual

      Beancounter: 'How much are we spending on IT?'

      IT spending is what pays my bills, and that number has me asking the exact same thing.

    4. The Boojum

      Re: 20% drop going to manual

      I suspect that the loss would be an exponential function of time. Another 10 days? Major crisis. 10 more? No company.

      1. David Hicklin

        Re: 20% drop going to manual

        "I suspect that the loss would be an exponential function of time. Another 10 days? Major crisis. 10 more? No company."

        Plus some ships could have been at sea during those 10 days, so I agree that the loss rate would rise as time went by.
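
        Purely to illustrate the exponential point (the ten-day doubling period is invented, not a claim about Maersk's actual exposure):

          # illustrative only: ~$300m for the first 10 days, then assume the
          # loss doubles with every further 10 days offline
          base_loss = 300e6
          for days in (10, 20, 30, 40):
              loss = base_loss * 2 ** ((days - 10) / 10)
              print(f"{days} days: ${loss / 1e6:,.0f}m")
          # 10 days: $300m ... 40 days: $2,400m - "no company" territory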

    5. cantankerous swineherd Silver badge

      Re: 20% drop going to manual

      spending the extra to get rid of the computers, thereby improving the quality of service, looks like a good move?

      no more robotic emails or dumb chatbots telling you they're passionate about customer service, you'd get to talk to someone who could actually do something. back to the future.

  8. dubious

    Good job guys.

    You don't get company values such as 'uprightness' and 'humbleness' at many other places.

    I'd left the Big M group years before this, but we were well into an outsourcing and centralisation programme, pushing lots of jobs to India. Despite it being as unpopular with the users as these things always are, we went full steam ahead on Helldesk and desktop support, but the plans went much further. We were pushed to an expensive US company for HP-UX, Oracle and Navis, for example, which didn't seem particularly smart, since Maersk had plenty of domain knowledge and was big enough to support centralised internal teams. Can't say they didn't know their stuff, though.

    There was also talk of centralising port infrastructure until the speed, cost, and reliability of Internet/IPLCs in many locations was brought up.

  9. Destroy All Monsters Silver badge

    Hmmm...

    the country's most popular accounting software

    Wasn't it a State requirement? That sure would make it popular.

  10. Tom 7 Silver badge

    All IT installations should be called emergencies

    so the managers can be used to sign cheques and kept out of the decision-making process.

    They "just did the work to keep disruptions to a minimum". So no progress meetings or getting the design dept in to argue over colours and fonts!

  11. Anonymous Coward

    This is where Infrastructure as Code comes into play. If you can blow away the entire lot and stand a whole fresh set of machines back up, where you have an immutable/declarative way of launching everything, you could save a massive amount of time. OK, bare metal is a bit harder, and getting the base hypervisor layer up if you're on-premises will take a short while, but after that IaC would save you a fair old chunk of time.

    Terraform ftw, or Heat if you're an OpenStack shop.
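
    To illustrate the declarative model those tools share (a toy sketch in Python rather than real Terraform/Heat; every name here is hypothetical): you declare the desired fleet, and a reconcile loop rebuilds whatever is missing or has drifted, so rebuilding 4,000 machines is the same operation as deploying one.

      # toy reconcile loop; illustrates the idea, not any real IaC tool's API
      desired = {
          "web-01": {"image": "base-2017.06", "role": "web"},
          "dc-01":  {"image": "base-2017.06", "role": "domain-controller"},
      }
      actual = {}  # post-NotPetya: everything gone or untrusted

      def reconcile(desired, actual, provision):
          """Bring the actual estate in line with the declared state."""
          for name, spec in desired.items():
              if actual.get(name) != spec:
                  provision(name, spec)  # same code path for one box or thousands
                  actual[name] = spec

      reconcile(desired, actual, lambda n, s: print(f"rebuilding {n} from {s['image']}"))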

    1. Doctor Syntax Silver badge

      "This is where Infrastructure as code comes into play. If you can blow away the entire lot"

      Would that be the entire lot as "including whatever tin the infrastructure as code was running on2?

      1. Peter2 Silver badge

        You know, it sounds horribly bad when you first think of the work required. But after thinking about it for a few minutes you can see how it could be done relatively quickly for deploying standard builds. It'd be interesting to know how they did it.

        Doing a job of this scale, personally I'd think the fastest way would be to create a new (clean) desktop image via WDS, rebuild the servers from backups, and then firewall everything but WDS and AD for joining PCs to the domain. Download the image to each server, and then send somebody around each site to ensure that every PC ends up reimaged and on the network with the correct network ID.

        It's a big job, but far from an impossible one (as they demonstrated by doing it in ten days), although I suspect they had a lot of tidying up to do afterwards, such as installing the odd bits of random software on PCs that weren't in the standard build.
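
        A sketch of the bookkeeping for that kind of staged rollout (hypothetical hosts and segments; this mirrors the block-everything-then-unblock-per-segment approach mentioned above): a segment's ports stay blocked until every host in it is confirmed as a clean build.

          # hypothetical reimaging tracker: unblock a segment only when every
          # host in it has been confirmed as a clean WDS build
          segments = {
              "site-A/vlan10": ["pc-001", "pc-002", "pc-003"],
              "site-A/vlan11": ["pc-004"],
          }
          confirmed_clean = {"pc-001", "pc-003", "pc-004"}

          def unblockable(segments, clean):
              return [seg for seg, hosts in segments.items()
                      if all(h in clean for h in hosts)]

          print(unblockable(segments, confirmed_clean))  # ['site-A/vlan11']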

        1. Joe Montana

          Firewall rules

          If you allow rules for AD, then you allow the very ports that most of this malware uses to propagate.
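
          For the record (both port lists abridged and simplified): NotPetya's lateral movement used EternalBlue over SMB plus remote execution via PsExec and WMI, and a domain-joined client needs several of those same ports open towards its DCs, so the overlap is real.

            # ports a domain-joined client typically needs towards a DC (abridged)
            ad_ports = {53, 88, 135, 389, 445, 464}   # DNS, Kerberos, RPC, LDAP, SMB
            # ports NotPetya's spreaders used: SMB (EternalBlue, PsExec) and the
            # WMI/RPC endpoint mapper (plus a dynamic RPC range, omitted here)
            notpetya_ports = {135, 445}
            print(sorted(ad_ports & notpetya_ports))  # [135, 445]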

        2. JimC Silver badge

          Standard builds

          AIUI desktops were the least of their problems. Umpteen different systems had to be rebuilt from backups, brought up again, and then the data from the emergency processes merged back in. Never underestimate that; it's one hell of a job.

        3. cantankerous swineherd Silver badge

          assuming the backups weren't corrupted...

    2. tonyszko

      How will it help you with one thing - DATA? Recovering the infrastructure could be simple, but what if you have lost the data in that infrastructure? Its content? A fresh build gives you nothing by itself; you need to recover the previous state.

  12. A Non e-mouse Silver badge

    Reliance on computers

    But he also warned that in the near future, as automation creates near-total reliance on digital systems, human effort won't be able to help such crises.

    That's why, when J. Lyons and Co. bought their first computer and saw how many people it could replace, they didn't go live with it until a spare was installed, ready to take over.

    That was back in the 50s...

    1. Tom 7 Silver badge

      Re: Reliance on computers

      I think they didn't buy their computer - they built it themselves. I think every IT person should be made to read 'A Computer Called LEO', so when they are wrestling with the formatting of some arsehole's code in a spreadsheet they can ask themselves how the boys at the tea rooms had offices more automated than the 6,000 PCs they are trying to manage, and did it on 6,000 valves that wouldn't power the keyboard on the PC they're swearing at.

  13. jms222

    Hadn't appreciated Lyons and the LEO as much until our event at the Computing Museum. They were absolutely amazing.

  14. Allonymous Coward
    Thumb Up

    I'd like to have a C-level exec like this guy

    Ours could learn a thing or two about recognition of huge efforts pulling business-critical systems out of the fire.

  15. Anonymous Coward

    It is not *the* Ukraine!

    In case you need it explained.

  16. SlavickP

    It’s “Ukraine”, not “the Ukraine”.

  17. Walter Bishop Silver badge
    Terminator

    The internet cannot support applications that rely on it?

    "Noting that the internet was not designed to support the applications that now rely on it"

    What total nonsense and pseudo-technical-sounding waffle. The Internet performs exactly as designed: it transfers packets from one end node to another. The problem lies solely with the "computers" connected at either end.

    What would be of interest to your readers is the dollar cost to Maersk of the downtime, the expenditure on employing people to reinstall all eight-thousand-plus "systems", and what indemnification the software vendors provided Maersk in the event they were victims of a hacking attack.

    "Snabe plans to ensure Maersk .. turn its experience into a security stance that represents competitive advantage."

    What you need to do is run your "computers" off 3½-inch floppy disks with the write-protect shutter in the enabled position.

  18. johnnyblaze

    I assume this means once they take into account the cost of the recovery and the money they lost, Maersk will look to recover some of this by cutting overheads, so IT will be the first to feel the chop. It's great to be an IT 'hero'.

  19. EnviableOne Bronze badge

    Maersk wasn't the only outfit to cop a huge NotPetya bill: pharma giant Merck was also bitten to the tune of $310m, FedEx a similar amount, while WPP and TNT were also hit but didn't detail their costs.

    Hmm, isn't TNT a subsidiary of FedEx?

    So the FedEx numbers are TNT's numbers.

    "2016 - On 25 May, FedEx completed the acquisition of TNT Express." - https://www.tnt.com/corporate/history
