IBM melts down fixing Meltdown as processes and patches stutter

IBM has scrambled to fix the Meltdown and Spectre bugs, but has struggled to develop processes, reporting tools or reliable patches to get the job done for itself or its clients. Internal documents seen by The Register reveal that Big Blue has ordered staff not to attempt any Meltdown/Spectre patches, but that the advice to do …

  1. matjaggard

    Also...

    Dinosaur struggles to rent apartment.

    100 year old couple fail to conceive.

  2. Phil Kingston

    "Big Blue’s remaining employees..."

    Ouch

    1. MyffyW Silver badge

      ... not so long ago they'd have been doing it in Lotus 1-2-3

      1. GruntyMcPugh Silver badge

        1-2-3,...

        ... you're not wrong. For quite a while during my time at IBM, we had no central management tool, our license for BladeLogic expired, and there was nothing to replace it, ... so we ran MBSA scans on each server, imported the results into Lotus 1-2-3, then created scripts for each server based on that report, and then ran those scripts manually to patch. Then of course ran MBSA again to prove the server had been patched. We patched manually on each server for about two years, before we finally got IEM fully deployed.

  3. m0rt

    IBM has no future other than a similar fate to the Atari branding. But the way they are going even that branding is going to be toxic.

    I reckon Oracle will buy them. Larry has a fondness for big Logos so he can add it to his collection. Like a Lepidopterist.

    1. poohbear

      "Like a Lepidopterist."

      Don't you mean Laridopterist?

  4. Anonymous Coward

    Incapacitated By Meltdown

    The new IBM acronym

    1. Anonymous Coward

      Re: Incapacitated By Meltdown

      Back in the 80's I came across a list of 100 translations of the IBM acronym. 30 years later one of them is still my all-time favorite:

      It's Better Manually

      1. Paul Woodhouse

        Re: Incapacitated By Meltdown

        my favourite was always:

        I Blame Microsoft

        1. Anonymous Coward

          Re: Incapacitated By Meltdown

          I've Been Misled

    2. Anonymous Coward

      Re: Incapacitated By Meltdown

      Inshallah Bokrah Malesh

      God Willing - Tomorrow - Nevermind

    3. oldcoder

      Re: Incapacitated By Meltdown

      Itty Bitty Machines

      was the one I liked.

    4. Peter Gathercole Silver badge

      Re: Incapacitated By Meltdown

      I'm not sure if I was the first person to use "It's Broken, Mate", but I don't think I saw it used before I used it.

      1. Anonymous Coward

        Re: Incapacitated By Meltdown

        Donkey's years ago I taught a Unix internals class to a group of IBM lab engineers; they assured me that IBM stood for "I've Been Moved".

  5. Destroy All Monsters Silver badge

    No Shutterstock Photo with obviously posing Stepford People?

    Maybe a short clip from Apocalypse Now - the one where the guy on LSD is, like, totally admiring the exploding fireworks over the doomed river bridge - would be appropriate?

  6. Anonymous Coward

    Electric Avenue

    Lots of upper management were still on leave, so the middle management drones who just follow orders didn't know what to do. The few technical managers on deck get bogged down in continual meetings briefing higher-ups, explaining things, walking through technical challenges etc. instead of being able to provide direction down the chain of command. Or their instructions get reparsed through management groupthink and become delayed as more upper management want to stamp their approval. I can understand that they want a consistent approach and communication, but there was a project managing this before the issue became public knowledge, and we thought they would have already worked out what communications, directions and actions needed to be communicated.

    Meanwhile, delivery executives hassle technical teams to patch, resulting in some offshore resource finding the patch doesn't install, seeing a reference to a registry key to make it work, then applying the patch and bluescreening multiple boxes or breaking the antivirus.

    In the absence of a clear directive on what to actually say to the customers who approach me, I just explain the issue in simple terms: that there are many moving parts, a brief overview of what we'll need to do, and that we are confirming we have all the info so we can execute a plan to remediate.

    Locally, my team know what to do and have been prepping in the background (system inventories, firmware levels, AV status, patch readiness etc) so that we'll be ready when we get the go ahead (still awaiting customer agreement/signoff on the process as we proposed to separate the Meltdown/Spectre patch deployment from the normal monthly patch deployment.) Meanwhile, the offshore team who normally do all the patching are sitting there waiting to be told what to do through the normal chain of command.

    It's like they've been listening to the Eddy Grant song:

    We gonna rock down to Electric Avenue

    And then we'll take it higher

    And are now waiting for further instructions...

    1. Chairman of the Bored
      Pint

      Re: Electric Avenue

      Ouch. Sounds like a perfect s##t storm. Drink one of these, repeat as needed until the sting wears off

    2. InNY

      Re: Electric Avenue

      "...provide direction down the chain of command."? I've heard of it, but does anyone know what it actually is?

  7. Anonymous Coward

    And folks, that's what happens if you get rid of your experienced staff...

    Sure, firing your experienced staff and relying on documentation so that the plebs and n00bs can continue working (and rake in money) will only work for a while (and look great with all things beancountery) - until an IBM (incapacitated by meltdown) event happens.

    Then, when you need more experienced staff, tough luck finding some.

    1. Anonymous Coward

      Re: get rid of your experienced staff...

      Nah, recruiting ten more interns on a four-week contract will do the job. Or a hundred... or a hundred thousand. One of them might be bright enough to solve the problem, and not mature enough to ask for a decent package for it. Or... might he not?

      1. Adrian 4

        Re: get rid of your experienced staff...

        Adding manpower to a late software project makes it later.

        -- Brooks, 'The Mythical Man-Month'

        1. MrBanana

          Re: get rid of your experienced staff...

          But it's the IBM way. A project manager with the problem: "If it takes 9 months for a woman to conceive and then give birth to a baby" then it stands to reason that all you need to do is contract this out to three women and get the job done in 1/3rd the time. Simple Maths.

          1. Anonymous Coward

            Re: get rid of your experienced staff...

            MrBanana:

            Talk about Project Management timelines - 2000-2006: I was subcontracted to IBM for a total revamp/facelift of ibm.com. Development and Project planners laid out a comprehensive plan and reported to "Management" that it would take 12 months from start to "go live." This was a streamlined plan with minimal "fudge factor" built in. "Management" (a person whose last name was Watson; otherwise I seriously doubt she would have lasted long) listened to the planners, then decided that "6 months is plenty of time. Deal with it." Can't begin to tell you how many times I've seen this happen at IBM. BTW, it took 12 months to complete.

            AC for obvious reasons: I'm currently a sub on another IBM contract. Yep, got sucked back into the Big Blue bilge water.

            1. Anonymous Coward

              Re: get rid of your experienced staff...

              "Management" listened to the planners, then decided that "6 months is plenty of time. Deal with it." Can't begin to tell you how many times I've seen this happen at IBM. BTW, it took 12 months to complete.

              Unless the particular Mangler was savvy to IBM fudge factors, and simply shaved off the numbers. If she had left it at 12 months it would have taken 18 or 24 months (a longer project time would lead upper manglement to think they could "resource" more people).

  8. Anonymous Coward

    meh

    What do you expect when you offshore pretty much all your technical people.

    1. Anonymous Coward

      Re: meh

      What do you expect when you offshore pretty much all your technical people.

      And then piss off the remaining staff by making them try to manage them.

    2. Bucky 2

      Re: meh

      You're in trouble already if you're offshoring in the first place. It's like Lauren Bacall selling off all the furniture in "How to Marry a Millionaire."

  9. Blotto Silver badge

    Deep Think

    Probably better for IBM to put out a proper measured response rather than rushing to deploy something that isn't properly tested and breaks customer stuff. There is loads of mitigation that can be done to provide some assurance in the short term that systems are not currently being compromised and will not be compromised.

    Doesn't stop teams learning and working stuff out in case a rapid response is needed.

    1. Gordon 10

      Re: Deep Think

      When MS and AWS have had rolling patch updates going since last Wed, it looks pretty piss poor to me. Given the entire AWS product range, IBM don't even have the defense of scale and complexity.

  10. Anonymous Coward
    Anonymous Coward

    What do you mean - has IBM's Watson AI not got all this covered?

    I'm shocked and surprised that the hype-bullshit is not useful in real world problems where thinking and technical skills are required.

  11. Santa from Exeter

    Potentially worrying

    Quote: "The documents also say some Red Hat Enterprise Linux servers aren’t rebooting after patching". Whilst we haven't seen this happen with anything we've patched so far, this could be concerning if there's any real evidence of this. Anyone experienced this or IBM just blowing smoke?

    1. Anonymous Coward

      Re: Potentially worrying

      It's true. I upgraded a SoftLayer VM today and rebooted, or rather it didn't; I had to boot in rescue mode and change the kernel to the previous one. There is a note tucked away on the general announcement saying not to implement the upgrade on Red Hat Enterprise Linux 6 whilst they investigate.

      I also experienced it with another cloud provider so they are not the only ones.

      That said, their console access is a Java applet which doesn't appear to work under Windows 10, nor does the PPTP that is required to connect in the first place (a Windows 10 issue). You can use SSL, but that only works on IE. After a number of responses from support they sent me (incorrect) instructions for another SSL client, but no response yet as to how to get round the Java console issue.

      So wasted a couple of hours trying to gain console access before booting in rescue mode and using ssh.

    2. AdamWill

      Re: Potentially worrying

      I work for RH, but not on this stuff exactly (I work on Fedora, which we haven't deployed any Spectre fixes to yet). I've poked some people internally about this story to see if we want to come up with a response or something.

      1. AdamWill

        Re: Potentially worrying

        Just in case anyone was waiting for an update: seems that so long as this is just second-hand news about internal IBM documents, we don't want to comment on it. Of course, if you're a paying RH customer and you're concerned about the consequences of updating any of your systems, please contact the relevant support folks and they'll certainly be able to help you with it.

    3. Anonymous Coward

      Re: Potentially worrying

      Anyone experienced this or IBM just blowing smoke?

      IBM is *always* blowing smoke. It's the wheels of a no-longer-well-oiled machine grinding to a halt.

  12. Franken Farter

    I hear IBM has run out of toilet paper

    1. This post has been deleted by its author

    2. Anonymous Coward

      ### SPECIAL ADVISORY ###

      It our understanding that this is library versioning issue in our middleware. We have deployed our finest team of devs to reach a resolution that will work for all our customer of this software and expect a speedy resolution by 2023, which will give our team the opportunity to develop the appropriate skills while finishing college.

  13. Alistair

    Damn I'm glad I left when I did. I know I wouldn't want to be on that team right now.

    We've 8 RH7.3 systems - testing cluster for the patches -

    1) no issues with the RH patches - these are v4 intel dual sockets.

    2) minor issue with network driver, (10G/b Ethernet) fixed with an additional patch to the driver from HP.

    *There is* a performance hit on high-IO processes and, well, Hadoop is massive IO. Still trying to get a consistent number on the real effect here.

  14. cschneid

    IBMers are therefore being urged to ensure client systems are thoroughly backed up before attempting patches, and even then to do so only after rigorous testing and securing users’ signoff of patching programs.

    Backup and rigorous testing isn't SOP?

    1. Anonymous Coward

      If the customer is large enough there might be one team managing backups, one looking after day-to-day support and another working on patching (all offshore). The patching team doesn't necessarily check that the last set of backups was successful before their change. Even smaller accounts might be using shared infrastructure, with a separate team looking after backups for multiple customers.

      Normally a support team would do a manual backup where the nightly backup had failed (if it had been communicated by the team monitoring backups). However, some customers won't allow backups to run during the day, to avoid potential impact to normal day-to-day operations. A few days of problematic backups, no manual backups (communicated to account team and customer) and then something goes wrong in a change or a system fails. I've been pulled into more incidents than I care to remember where an ongoing backup issue had not been communicated properly, an update went pear-shaped and the backups were missing data.

      Testing is often done by the customer application teams: IBM would do an Operating System Post Implementation Validation (PIV) while the customer is responsible for application PIV. Customer pre-production (DEV, SVP, UAT) PIV often isn't as rigorous as that of production instances. Having worked on changes in all environments myself, I've had the customer report back all okay after 10 minutes in a test environment, while production PIV took 2 hours. I've also seen a lot of incidents where the customer has signed off a PIV in pre-prod and then a week later lodged a complaint about a change 'breaking' their application (because they hadn't tested rigorously).

      As the technical staff deal directly with the application teams when organising the changes, the communication would be re-iterating the process (mostly for management types who would start asking about backups etc as they aren't familiar with the day to day operations.) The other issue is that communications in some customers can be just as bad as any other organisation and while the comms goes out from account teams to their contacts in the organisation it sometimes doesn't get passed down. We've had application teams push back on updates being deployed as they are about to perform a release even though their own CIO has declared the update as a CIO override for deployment of the patch.

      1. GruntyMcPugh Silver badge

        checking backups,....

        ... when I worked in security and compliance, we checked backups had run before we deployed patches. Generally we'd check the last week in the event log for any errors, check the backups had been working, and record all running services (yes, I scripted that), and then after we patched I ran a post script that made sure all the services that were running were still running, and checked for errors in the event log again. We got IEM as a patching tool, and supposedly someone was going to build our checks into IEM pre-reqs before the scheduled task would run, but I left before that happened.

        Meanwhile, Service Management were also supposed to alert us if backups failed because they got a daily report on that. Not once was I ever notified of a backup failure by Service Management though, they knew we checked, so they never bothered.
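The before/after routine described above (a recent good backup before patching, and every service that was running beforehand still running afterwards) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IEM's or anyone's actual tooling: it assumes the backup timestamp and the running-service snapshots have already been collected by other means, it leaves out the event-log scan, and the 24-hour threshold, function names and sample service names are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: a backup older than this is treated as "not recent".
MAX_BACKUP_AGE = timedelta(hours=24)


def backup_is_recent(last_success: datetime, now: datetime,
                     max_age: timedelta = MAX_BACKUP_AGE) -> bool:
    """Pre-patch gate: was the last successful backup recent enough?"""
    return now - last_success <= max_age


def stopped_services(before: list[str], after: list[str]) -> list[str]:
    """Services that were running before the patch but not after it."""
    return sorted(set(before) - set(after))


def post_patch_ok(before: list[str], after: list[str]) -> bool:
    """Post-patch check: report any previously running service that stopped."""
    missing = stopped_services(before, after)
    if missing:
        print("FAIL: no longer running:", ", ".join(missing))
        return False
    print("OK: every service that was running before is still running")
    return True


if __name__ == "__main__":
    # Made-up sample data standing in for real snapshots (assumption).
    now = datetime(2018, 1, 10, 9, 0)
    print(backup_is_recent(datetime(2018, 1, 9, 23, 0), now))  # True
    pre = ["Spooler", "W32Time", "BackupAgent"]
    post = ["Spooler", "W32Time"]
    post_patch_ok(pre, post)
```

Run as a script, the made-up sample prints `True` for the backup check and then flags `BackupAgent` as no longer running. In practice the snapshots would come from whatever the platform provides (on Windows, for example, the output of PowerShell's `Get-Service`), and newly started services are deliberately ignored, since only stopped ones indicate breakage.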

  15. GnuTzu

    Red Hat on AIX virtualization

    Are the AIX people happy, or did I miss something?

    1. Anonymous Coward

      Re: Red Hat on AIX virtualization

      Nobody is happy. Some are less unhappy than others.

    2. Anonymous Coward

      Re: Red Hat on AIX virtualization

      AIX is POWER only (the short-lived AIX-5L port to Itanium has long since disappeared, and AIX/PS2 is a dead product).

      AFAIK, nobody has demonstrated that AIX is vulnerable to MELTDOWN (indeed, it relies on virtually the whole kernel memory space being mapped into the address space of user-processes, although it is protected by memory access controls, and AIX does not have that mapping).

      I'm guessing, but I think that the Power Linux distributions are removing the mapping of the kernel memory from the user process address space because this is actually a very sensible precaution (Linus has a lot to answer for: UNIX systems on other architectures like PDP-11, VAX, s370 et al. never mapped kernel memory into user process's address space, so him doing it for Linux was rather short-sighted - although early x86 processors were a bit deficient on the MMU front).

      SPECTRE is a different beast. I would not be surprised to see some elements of SPECTRE affecting Power processors.

      1. Michael Wojcik Silver badge

        Re: Red Hat on AIX virtualization

        the short-lived AIX-5L port to Itanium has long since disappeared, and AIX/PS2 is a dead product

        But let us not forget the also-gone original AIX for the RT PC, AIX/370, and AIX/ESA.

        Anyway: Meltdown (it's not an acronym, so there's no reason to write it in block caps) only applies to CPU+OS combinations where pages with different read permissions are mapped into a single address space, and speculative execution ignores those read permissions. Currently, that's only x86 and one ARM core family (which isn't in production yet).

        The much larger Spectre (also not an acronym) family of attacks are generally possible, in some form, on any CPU that provides speculative execution and any side channel that permits indirect analysis of load contents. The attacks in the Spectre paper use cache timing, but the paper notes some of the other possible side channels.

        Spectre has been confirmed against x86, AMD, ARM, POWER, z, some nVidia GPUs, and by now probably other processor architectures. Because spec-ex is well-known and long-established for high-performance general-purpose processors, Spectre attacks are widely applicable.

        1. Anonymous Coward

          Re: Red Hat on AIX virtualization @Michael Wojcik

          I'm interested in the references to any PoC on Power processors that you may have.

          I can find a ZDNet article that claims vulnerability, which quotes the IBM PSIRT blog item that is hugely non-specific, and does not mention Meltdown or Spectre by name or CVE number. The original Google Project Zero write-up does not list Power as being one of the processors it discovered had issues.

          Because of the specific mechanism Meltdown uses, until I see someone claiming they have a PoC, this bug on Power will remain in the not-proved category as far as I am concerned.

          The write-up for Spectre, however, lists a range of techniques, and lists in passing things like power monitoring, branch prediction table poisoning, and instruction timing exploits, some of which can be made more effective by exploiting speculative execution.

          I know that this may be a complacent view, but I believe that IBM's line on Power is that there is a possibility that one of the various techniques detailed in Spectre may well work on Power, and that not issuing a statement or patches would be more damaging to the reputation of IBM and its Power line than issuing fixes that do something (like removing the kernel address space mapping from user processes) to remove one of the issues identified as causing problems on other processors.

          I've seen no indication that anybody has actually come up with a viable method of removing significant amounts of risk of the Spectre vulnerabilities, other than those which serialize instruction execution, effectively disabling speculative execution. These normally involve code or compiler changes, and this will not make any difference if some malware not compiled with these techniques is executed on the system, i.e. it's not a complete solution.

          Couple that with the referenced Return Oriented Programming, in which existing sequences of bytes in a process that may not actually be code, but which happen to represent valid instruction sequences, are identified and then executed by using buffer overflow techniques to jump to those locations, and you have attacks that are extremely difficult to mitigate.

          So if you have found any references to any real Power PoC, I would be very interested in reading them.

  16. Agamemnon

    "but that the advice to do nothing is incorrect and needs to be changed: ...

    NO! You are FUCKING WRONG.

    WAIT. Sit the hell still and wait for a WORKING remedy. This rushing in with ill-conceived, half-cocked bullshit is bricking shit.

    Patching for the sake of patching is the same idiot game we played with, well SSL seems the latest cluster-fuck apropos.

    Wait, for a usable solution (if you're not Intel) and Then Roll it.

    There is a Huge duplication of effort here and it's going to step on other initiatives' toes.

    I, for one, do not have time to explain to My Powers That Be that I need new motherboards OR a Surface Mount soldering station and a metric shitload of new CPUs and BIOSs.

    Chill out, sit down, and just wait a sec. Bejeabus.

    1. Jamie Jones Silver badge

      Chill out, sit down, and just wait a sec. Bejeabus.

      Blimey! Take some of your own advice dude!

  17. EveryTime

    It's likely true that there isn't a crisis. These security holes aren't trivial to exploit, yet.

    But this should never have been an issue. Vendors had months to work out a response and fixes. IBM is exactly the company that should have process and procedures ready to announce and deploy.

    It really does appear that IBM laid off the experienced people that would have been able to understand the implications of these holes, and how best to deploy the fixes. Instead they have a bunch of people in India that can apply a patch and reboot systems, but don't understand the whole-system implication of code being able to look beyond the address space protection.

    1. Michael Wojcik Silver badge

      These security holes aren't trivial to exploit, yet.

      Spectre is easy to exploit in drive-by Javascript. There's sample code in the original Spectre paper, and the countermeasures introduced by browser vendors are laughable.[1] I'm not sure how much closer to "trivial" you want.

      [1] They've basically disabled the two high-precision timing mechanisms used in the paper. There are several others available to scripts, and they're well-documented. One of the key papers describing them was linked to from the comments on another Reg article recently.

  18. elip

    Actually that's better than what we're doing

    I work for a large cloud/SaaS provider; you guys often do write-ups about our terrible licensing terms. We haven't even begun to plan the remediation.

    1. Anonymous Coward

      Re: Actually that's better than what we're doing

      > We haven't even begun to plan the remediation.

      Ignore it, and let the marketing/sales guys figure out how to spin that as a unique benefit?

      1. Anonymous Coward

        Re: Actually that's better than what we're doing

        > > We haven't even begun to plan the remediation.

        > Ignore it, and let the marketing/sales guys figure out how to spin that as a unique benefit?

        Just tell the customers that they have now automatically been migrated onto the "performance plan"[1] versus the "standard plan"[2].

        [1] Comes with small security considerations. Sign here.

        [2] Reduced performance, not yet available.

    2. Anonymous Coward

      Re: Actually that's better than what we're doing

      We haven't even begun to plan the remediation.

      Is that because Intel didn't even tell you about the problem until just before Christmas?

  19. Anonymous Coward

    Excel?

    When I were an IBMer we were told not to use Office, to eat our own dog food with Lotus Symphony, which last I heard had been rolled into OpenOffice?

    1. Anonymous Coward

      Re: Excel?

      You're going back a few years then. IBM gave up on Symphony back before 2010.

      1. GruntyMcPugh Silver badge

        Re: Excel?

        ... they might have given up developing Symphony ~2010, but I was still using it in 2014. But then I had a Linux laptop, so had little choice.

        I understand Apple hardware is sneaking onto desks now.

  20. dave 81

    Azure

    They patched Azure, and absolutely stuffed many of our customers' installs to the point we have had to wipe and reinstall from scratch.

  21. Anonymous Coward

    And there I was thinking that meltdown is what would happen if you didn't patch.
