RBS Mainframe Meltdown: A year on, the fallout is still coming

A year ago, RBS experienced its Chernobyl moment – an incident when a case of simple human error by those running critical systems resulted in a crisis. IT staff badly botched routine maintenance of the IBM mainframe handling millions of customers' accounts – a system processing 20 million transactions a day. The mistake was …

COMMENTS

This topic is closed for new posts.

  1. Annihilator
    Stop

    Outsourcing <> offshoring

    Straight away the article insinuates it's "outsourcing". Yet again, the need to point out that outsourcing doesn't mean offshoring.

    As I recall, RBS employed their batch support staff directly. Regardless of location, this isn't outsourcing.

    1. Velv
      Headmaster

      Re: Outsourcing <> offshoring

      Whatever you call it, getting rid of 7,500 man years of experience is never going to end well.

    2. Mad Mike

      Re: Outsourcing <> offshoring

      From my contacts at RBS, the cause of this issue has nothing to do with the hardware or software. It has everything to do with money-saving management decisions. They simply don't seem to understand the following:-

      1. Losing tens of thousands of man-years' worth of experience on the job is not a good idea.

      2. Distance within teams causes problems proportionate to the distance. So, Edinburgh to India is a big problem.

      3. When you have different cultures involved, it causes additional difficulties, especially when one side is using a second language and (for understandable reasons) doesn't necessarily understand the nuances. If you think of how often misunderstandings occur between people working in the same office with the same language, and then translate this to different cultures, you start to understand the issue.

      4. RBS IT had (especially at the higher levels) a grossly overinflated opinion of itself.

      5. When RBS took over NatWest, it seems to have kept the worst of NatWest culture.

      6. Don't underestimate how much the ex-NatWest staff hated RBS and their staff, partially due to point 4.

      All of the above is never going to result in a situation that ends well. RBS is effectively wasting £450million as they simply aren't dealing with the real issue.

      1. Anonymous Coward
        Thumb Up

        Re: Outsourcing <> offshoring

        Well I for one as a UK tax payer and by implication an indirect share owner applaud RBS senior management's cost saving initiatives be they either outsourcing or offshoring.

        Mind I would also add that this statement is a lot more comfortable as an 'ex customer' of RBS............

        1. Mad Mike

          Re: Outsourcing <> offshoring

          "Well I for one as a UK tax payer and by implication an indirect share owner applaud RBS senior management's cost saving initiatives be they either outsourcing or offshoring.

          Mind I would also add that this statement is a lot more comfortable as an 'ex customer' of RBS............"

          Don't worry, all the major banks and organisations are doing very similar things. Your time will come.........

        2. Anonymous Coward

          Re: Outsourcing <> offshoring

          @Titus: So as a UK tax payer, are you happy for RBS to pay the redundancy to those highly paid techies, the first to go always being the most skilled, longest serving and most highly paid? Are you happy for them to give up work for a few months while they work out what to do? Are you happy for all those employee's and employer's NICs to go unpaid and for all that higher rate tax to go uncollected?

  2. Anonymous Coward

    This wasn't a hardware flaw. This wasn't a mainframe flaw.

    This was a botched CA7 upgrade that caused chaos during the back-out of said upgrade.

    The fact that it was on a Mainframe means that the system was more important, but the mainframe itself performed the actions it was required to flawlessly.

    The actions it was required to perform were wrong, and that was the fault of the people operating the system.

    1. Gordon 10
      Meh

      Re: This wasn't a hardware flaw. This wasn't a mainframe flaw.

      Correct me if I'm wrong, but not even IBM mainframes cost £450m. It's entirely possible that this budget includes new teams to run the stuff.

      (It's also possible that they have just spunked £450m just to IBM, but I'm a glass-half-full type of person.)

      1. Anonymous Coward

        Re: This wasn't a hardware flaw. This wasn't a mainframe flaw.

        "but not even IBM mainframes cost £450m"

        Agree, there is no way the mainframes themselves cost anywhere near that amount of money. The cost of the hardware was probably trivial. The cost of rewriting/modernizing millions of lines of COBOL probably is the bulk of these costs.

        1. tom dial Silver badge

          Re: This wasn't a hardware flaw. This wasn't a mainframe flaw.

          It appears it may not have been a programming flaw, but more a series of operations flaws: first, in committing an error upgrading CA-7, then in compounding it during the attempted rollback, with the finishing touches applied when application recovery procedures were not up to snuff, or perhaps there was not enough available capacity to handle the recovery and normal workloads concurrently. All speak to operations and capacity management, none to the applications or the geriatric implementation language. RBS management may wish to shift blame away and empower themselves to spend great bundles of money, but the facts are as they are.
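For readers who have not worked with batch scheduling, the operations point above can be illustrated with a minimal sketch. CA-7 is a job scheduler; here Python's standard `graphlib` stands in for it, and the job names and dependencies are hypothetical, not RBS's actual batch. The sketch assumes (the thread does not give this detail) that after a failed upgrade and back-out the operators must re-establish which jobs have really completed before anything else can safely run.

```python
from graphlib import TopologicalSorter

# Hypothetical overnight batch: each job maps to the jobs it depends on.
NIGHT_BATCH = {
    "load_transactions": set(),
    "post_to_accounts": {"load_transactions"},
    "calculate_interest": {"post_to_accounts"},
    "produce_statements": {"calculate_interest"},
    "update_atm_balances": {"post_to_accounts"},
}

def remaining_jobs(batch, completed):
    """Return the jobs still to run, in an order that respects dependencies."""
    order = TopologicalSorter(batch).static_order()
    return [job for job in order if job not in completed]

def runnable_now(batch, completed):
    """Jobs that have not run yet and whose predecessors have all completed."""
    return sorted(
        job for job, deps in batch.items()
        if job not in completed and deps <= completed
    )

# If the record of completed jobs is wrong, the restart point is wrong too:
# with nothing recorded as done, only the very first job may be started.
print(runnable_now(NIGHT_BATCH, set()))
print(runnable_now(NIGHT_BATCH, {"load_transactions", "post_to_accounts"}))
```

With five jobs this is trivial to reason about by hand; the commenters' argument is that with a real bank's batch it takes people who know the schedule.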

    2. Matt Bryant Silver badge
      Facepalm

      Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

      "....This was a botched CA7 upgrade....." Which neatly ignores the question: did the platform's complexity, inherent inflexibility and convoluted design have anything to do with the inability of the staff to complete what would probably have been a far simpler task on UNIX, Linux or Windows.....?

      I would suggest RBS's money would be better spent putting a million into a redesign using commodity servers and Linux, and it would probably come in at a fraction of the cost of the ludicrous mainframe upgrade.

      1. GreyWolf
        FAIL

        Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

        " what would probably have been a far simpler task on UNIX, Linux or Windows.....?"

        You have just revealed how little you understand of what the night batch at a bank actually entails...

        1. Matt Bryant Silver badge
          FAIL

          Re: GreyWolf Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

          "You have just revealed how little you understand of what the night batch at a bank actually entails..." Yeah, 'cos bank systems aren't about shuffling numbers round or anything that other industries already do very well, and in some cases on a far larger scale. It's all magic and unicorn farts, right? Seriously, if you want to pretend that even the largest banks do anything unique then you're probably the muppet that shelled out for the £450m mainframe upgrade.

          1. Intractable Potsherd

            Re: GreyWolf AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

            For the record, I'm the one agreeing with Matt. I therefore want to insert that disclaimer that I don't work for RBS, I am merely a customer that can't see that any of the banks are any different, and so stay where I am and take the free insurances offered through my account.

          2. Skoorb

            Re: GreyWolf AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

            There is a fair point about whether a full rip and replace would have been warranted.

            Whilst there does come a point at which a system becomes so unwieldy the only option is to bin it, RBS do not feel that the time has come to put themselves through that pain.

            As for moving to a more 'standardized' system, Nationwide are having a shot at moving to SAP, and it seems to be sort of on track. Mostly.

            Anybody think that moving to SAP would be a better bet for RBS? It seems more like simply swapping to a different kind of masochism to me...

            1. Anonymous Coward

              Re: GreyWolf AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

              Germans and masochism? Sounds like a great party! Will there be beer?

      2. tom dial Silver badge

        Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

        Mainframes, like Linux (or Windows), have their complexities. However, nothing about a mainframe environment, from an application development or management viewpoint, is inherently more complex than a single Linux/Unix/Windows server. The hardware has a MTTF measured in decades; the hardware architecture has been gradually extended and polished for about half a century to remove bottlenecks and single failure points. As has the operating system. While great herds of commodity servers surely have a place, it is unlikely that they have a complexity, manageability, or reliability advantage over mainframes for large workloads. It is quite possible that they also have no overall cost advantage.

        All that I have read about the RBS failure leads to the conclusion that the cause was management and operator error. As another commenter noted, 450 million GBP probably covers a great deal more than a hardware in-place replacement which would not address the underlying cause anyhow.

        1. Kebabbert

          Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

          "...While great herds of commodity servers surely have a place, it is unlikely that they have a complexity, manageability, or reliability advantage over mainframes for large workloads. It is quite possible that they also have no overall cost advantage...."

          Google thinks it is more cost effective and reliable to have herds of commodity servers for their services than a few large servers. Google has designed their system so it doesn't matter if a server breaks; they just insert a new cheap server. If you have 100 cheap servers that can switch roles anytime, you will have a better uptime than a single mainframe. Distributed beats centralized in terms of cost, reliability and performance.

          But sometimes, you cannot distribute. Some workloads are inherently centralized and cannot be parallelized. Google has embarrassingly parallel workloads, so they can just use a lot of COTS servers.

          For banking, Mainframes are the norm. They can be replaced by a fleet of COTS servers, but no one has done it yet, as far as I know. There will be years of R&D. Maybe it is easier to just use a Mainframe.

          Another note. In banking, Mainframes are the norm. They are slow, have extremely slow cpus, but good I/O. Banking is doing a lot of account updates like calculating interest, etc. Nothing sexy. Old, boring, dusty stuff. More, accountant stuff. It doesn't matter if your account gets updated 0.1 s later or so. Latency is not important. Much work is done in batches, and can wait another hour. Mainframes have good throughput, bad latency.

          In finance, you never use Mainframes. In finance, you typically use Linux/Unix. You are doing real time high performance calculations. HFT. Quant. Math. Algorithmic trading. Linux has low latency, so that is important in some fields. Mainframes are too slow to do finance as they have very weak cpus, much weaker than x86 cpus. For instance, large stock exchanges typically run Linux/Unix. No stock exchange runs on Mainframes, they have too bad latency for that.

          Banking: accountant stuff, boring. Needs to be reliable and just work. No requirements for bleeding edge performance. Simple calculations.

          Finance: math, high performance stuff, complex calculations. Needs to be reliable and high performance. If you are faster than any one else, you get your order filled and you can earn money.
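The uptime claim about 100 cheap servers can be put as rough arithmetic. This is only a sketch: it assumes server failures are independent and that the service works as long as any `k` of the `n` servers are up, and the availability figures are made up for illustration rather than taken from the thread or from any measurement.

```python
from math import comb

def availability_k_of_n(n, k, p):
    """Probability that at least k of n independent servers are up,
    where each server is up with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative numbers only (assumptions, not measurements):
single_reliable_box = 0.9999  # one highly reliable machine
cheap_fleet = availability_k_of_n(n=100, k=90, p=0.99)  # need 90 of 100 cheap servers

print(f"single box:  {single_reliable_box:.6f}")
print(f"cheap fleet: {cheap_fleet:.10f}")
```

Under these assumptions the fleet comes out ahead, but the result rests entirely on the workload being spreadable across machines and on failures being independent, which is exactly the caveat about inherently centralized workloads.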

          1. Anonymous Coward

            Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

            Slow CPU? I suggest some research: google zEC12 processor speed. That's a mainframe, BTW.

          2. David Beck

            Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

            Google also doesn't care if you get the "complete" answer to your query. If a section of the data is "down" then you just don't see that in the results. How'd that work with your bank account?
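A minimal sketch of the distinction being drawn here, using hypothetical shard data and function names: a search-style read can return whatever the available shards hold, while a ledger-style read has to fail outright if any part of the data is missing.

```python
class ShardDown(Exception):
    """Raised when a shard cannot be read."""

def read_shard(shard):
    # A shard is modelled as a list of values, or None if it is unavailable.
    if shard is None:
        raise ShardDown("shard unavailable")
    return shard

def best_effort_search(shards):
    """Search-style read: skip unavailable shards and return what is left."""
    results = []
    for shard in shards:
        try:
            results.extend(read_shard(shard))
        except ShardDown:
            continue  # a missing slice of results is tolerated
    return results

def account_balance(shards):
    """Ledger-style read: any unavailable shard makes the answer invalid,
    so the error propagates instead of returning a partial sum."""
    return sum(amount for shard in shards for amount in read_shard(shard))
```

Calling `best_effort_search([[10, 20], None, [5]])` quietly returns the three values it could find, whereas `account_balance` on the same input raises `ShardDown` rather than reporting a balance that is silently short.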

      3. MikeC711

        Re: AC: This wasn't a hardware flaw. This wasn't a mainframe flaw.

        Agreed, put them on a cloud of Windows machines, then the 57 Windows security releases in 1Q2013 could be rolled out to each one individually. Mind you, they could not be rolled out all at once as there was an order and a need to do it in at least 3 separate waves. And also, keep in mind that as you handle the workloads on these 32 and 64 core machines, you have to pay licensing for your DB and Middleware on all those cores. Still, there are times that this makes sense, but there are countless migration nightmares we can talk about. And the migration consultants (who are eager to assist you for a price) will forget to work out costs for extra hardware for firewalls, dev/test environments, DR environments, ... etc. So as long as RBS can handle a far less reliable environment and a migration that will likely cost at least 3x what the consultants quote you ... it's all good.

  3. Dominion
    Thumb Down

    It ain't broke??

    450 million to replace a system that is still at the mercy of incompetent staff managed by stupid managers? Do the clowns in charge not realise that throwing shiny tin at a problem will not make a blind bit of difference? Outsourcing / offshoring, it's all the same. Getting rid of competent staff and replacing them with incompetent replacements.

    1. Brewster's Angle Grinder Silver badge

      Re: It ain't broke??

      Q: "Do the clowns in charge not realise that throwing shiny tin at a problem will not make a blind bit of difference?"

      A: They're bankers.

      1. Anonymous Coward

        Re: It ain't broke??

        The Government needs to bring in someone qualified from outside. A familiar face, someone who can be trusted. Terry Wogan.

    2. Richard Wharram

      Re: It ain't broke??

      This wasn't the mainframe's fault certainly. Doesn't quite add up.

      1. Ken Hagan Gold badge

        Re: It ain't broke??

        It adds up perfectly. It's just doing a different sum from the one you wanted.

        By throwing a big pot of cash at the problem, the management appear to be taking "tough decisions" to sort out the problem, thereby justifying their "compensation packages". They don't know how to actually fix it, but this should get the regulators off their backs for a while.

        I expect Bruce Schneier would call it "management theater".

  4. Anonymous Coward

    If an IT focused business like RBS can fuck up your banking, what chances have other companies got who aren't IT focused? Like a supermarket?

    1. Dominion

      All large corporations are run by bean counting bureaucrats. IT is just an expensive irritation to them. All they want to do is spend as little as possible, regardless of impact. I include all of the large IT outsourcing companies in that sweeping generalisation as well.

      1. Anonymous Coward 101

        "IT is just an expensive irritation to them. All they want to do is spend as little as possible, regardless of impact."

        That explains why they are spending £450m on a new computer.

        1. Mad Mike
          Facepalm

          "That explains why they are spending £450m on a new computer."

          Both points are true. Most large organisations these days are run by accountants of one form or another. A finer example of blinkered sight is hard to find. They will spend as little as possible on everything. An accountant knows that to improve profits (all they care about), one simple answer is to cut costs. There are always loads of people who claim to be able to do that. So, they go to lowest common denominator at all times.

          However, once the shit has hit the fan, an accountant (or general senior management type) cares about only one thing......saving their own necks. At this point, it's all about blaming someone or something else. Hence, blaming 'the mainframe'. It was actually their stupid decisions, but can't let that get out as it would result in them having to take responsibility for being idiots. So, you blame something else. Then, you come in with the solution which obviously has to be fixing whatever you've blamed. At this point, you look like you weren't responsible and can gain advancement through providing the fix. Cost becomes irrelevant as it's about making yourself look good, which is no longer about saving money. So, £450million for a new mainframe system........chicken feed. Where do I sign!!

          Never underestimate the stupidity of senior executives and other rampantly self-centred people in the desire to advance their own careers.

          P.S.

          My father was an accountant!!

          1. vagabondo
            Meh

            Re: @Mad Mike

            "Most large organisations these days are run by accountants"

            In the case of banks perhaps a few more accountants, and not so many salesmen/gamblers at the top would have saved the rest of us a deal of grief.

            1. Richard Wharram

              Re: @Mad Mike

              Accountants would be no use. I'm sure their figures all added up. What they didn't have was adequate assessment of risk.

            2. Mad Mike

              Re: @Mad Mike

              "In the case of banks perhaps a few more accountants, and not so many salesmen/gamblers at the top would have saved the rest of us a deal of grief."

              The salesmen/gamblers weren't at the top. They were further down, but were employed by the accountants at the top. Problem is, the current problems aren't really the fault of the dealers etc. themselves. Governments have this idea that continuous growth is all that's important. Things simply have to keep going up...GDP for instance. This is why it's all they talk about and recession is the enemy. Problem is, continuous growth isn't actually possible or even desirable in the long run. It encourages people to borrow money that isn't fully backed by real money (cause of the crisis) and then encourages them to do so in circles amongst many banks and government coffers. This leads to the illusion of growth, but in reality, it hasn't. Effectively, you're just inventing money. Loans that are not 100% covered by real money aren't real themselves. Run this cycle for long enough and eventually someone stops paying. Then, it all comes crashing down.

              It is, in effect, a giant pyramid scheme. Problem is, it's generally promoted by governments, not banks. Yes, they jump on board and make money whilst they can, but eventually it has to come down. It pretty much started in Margaret Thatcher's era, but the larger part of bank deregulation (and certainly the worst parts) actually occurred during Brown/Darling's tenure. They needed more and more money to try and prove to people life was good and they should stay in power. So, they deregulated more and more, leading to lower and lower levels of money backing loans. Then, the inevitable happened. When they couldn't deregulate any more, they then went down the PFI route. Again, it's effectively inventing money that doesn't exist.

              This whole issue is much wider than the bankers. Yes, they played a significant part, but politicians especially are, if anything, far more culpable.

              1. Anonymous Coward

                Re: @Mad Mike

                " Governments have this idea that continuous growth is all that's important".

                Yes, our industries (especially financial) must grow faster and make bigger profits than the unscrupulous foreigners. Unfortunately, that involves becoming even more unscrupulous than the foreigners. And before we know it, everyone is engaged in a Gadarene race to the cliff edge.

                It's going to be very hard, because all our government policies and corporate strategies - even our whole view of economics - are predicated on continuous growth. It's a testament to the sheer innumeracy of our leaders... We badly need a scheme for running the world in equilibrium from year to year and century to century, instead of continually pursuing the growth that is bound to annihilate us.

        2. Dominion

          Is 450m enough? I genuinely have no idea how much a replacement would / should cost. If a robust replacement should cost 500m, then cutting corners and spending 450m means they are building down to a price, not up to a spec.

          1. Mad Mike

            The spec and building down to a price are not really relevant here. The problem is they aren't dealing with the actual problem. Therefore, it will reoccur. Doesn't matter how much you spend, it will happen again, because the core issue hasn't been resolved.

        3. Anonymous Coward

          Well, assuming the Parliament cost about 400M, and the RBS computer prob wouldn't fit, then maybe some kind of giganto-monolith making muffled but deep bleeping noises and lights flashing slowly on and off ought to do it? Perhaps Renzo Piano can do them one, I would suggest on Arthur's Seat or Salisbury Crags? About 1200M ought to do it, for that price you could maybe get titanium skin and some Swarovski, and get bolts for the roof beams that don't require immediate fixing? And maybe train a new set of "priests" for it :P

    2. Anonymous Coward

      Outsourcing

      It's not about the type of organisation, it's about outsourcing, and especially about outsourcing to what might be called "warm body consultancies" where people are moved around projects (I know some very good consultancies that have domain specialization).

      I worked as an on-site contractor for a company, managing their intranet and after about 6 months, making changes was 2nd nature. You learn where things are and it sticks. Something breaks, you shorten the process of fixing it by hours because you know the stack - the name of the web page, the service it calls, the table that it writes to. You aren't having to find it out each time.

      When I left, it got outsourced because it was cheaper. But... what was happening is that they got new staff every couple of months. So, every change took longer because people had to work out how to make the change. Fixes took longer to resolve because they weren't like "oh, that'll be the overnight update job failing", double-check it, make a fix and kick-off the update.

      That's what none of these pinheads doing outsourcing understand. They see development staff as being like checkout staff or plumbers, where you can pretty much hire one working in one place and move them to another.

      Simple question: if outsourcing is such a great idea, how come Google and Amazon hardly ever do it?

      1. David Neil

        Re: Outsourcing

        Bang on, they also assume that all of those little nuances of the systems can be captured in documentation - they can't be, as things will change slightly from change to change, so you end up with a monster list of exceptions which say do x, unless it's y in which case do z unless b is....

        1. Ken Hagan Gold badge

          Re: "monster list of exceptions which..."

          ...is exactly the list that elicited the comment "you actually understand all that stuff?" when a previous commenter waved it at the people who (through their management decisions) actually wrote it.

          It's a bit like politicians and tax law. They've made it so complicated that they don't understand it anymore and when it starts delivering the wrong answers they blame whoever or whatever is following the rules.

      2. Anonymous Coward

        Re: Outsourcing

        You are absolutely correct about consultancies. And that's why it's relevant that there was no outsourcing in this story (and El Reg should really bother to understand that). They were RBS staff working in a different RBS office.

        They were inexperienced but this issue happens to all companies... no-one stays at a job forever. RBS should have moved some experienced staff to that office to cross-train, maybe for the first year, say. But they (apparently) didn't.

    3. Anonymous Coward

      Supermarkets not IT focused?

      Well yes and no.

      Tesco has multiple z/OS mainframes in a quite nice DR configuration and quite a few big Unix boxes, 590's, 595's, I even think they kept the old 690's. A smaller number of Sun boxes and of course a gazillion Windows machines. Shed loads of cross-site replicated SAN storage, and at one time they managed their own national network (now outsourced to BT). The IT team there was a good one: long service, good and deep business knowledge, solid applications knowledge. The company needed that IT function to grow as it did, and from the late 80's through to the mid 90's IT was respected. From then until around the early 2000's the company's respect for IT slowly drained away.

      Then the main board and bean counters decided that it would be a wonderful thing for the company ($$$$$) to offshore and all that systems and business knowledge was shown the door. Even as the quality of service provided by IT started to dive and the off shoring costs kept rising the main board still kept insisting that it was a "success". One IT director joined, looked over his domain, challenged the main board on their assumptions and was shown the door but at least he had the balls to do so.

      The only reason that the Tesco CA-7 roles were not off-shored was that the compensation they were offering at the time was too cheap even for Bangalore and they couldn't get enough takers.

      The bottom line is unless the IT director is on the main board as an equal then the IT department is always going to end up a ripe target for cost cutting. Sadly many directors seem to be deaf to the implications of ravaging their IT departments, could be a genetic thing....

  5. Anonymous Custard
    Headmaster

    Apps?

    Since when did mainframes run apps rather than programs? Makes it sound like a big smartphone running the show...

    1. Jemma

      Re: Apps?

      iOS/360 perhaps?

      This article has me a little lost.. why, if the cause of the problem was some khat stunned PFY who didn't know what he was doing, do you rip out the hardware? To replace it with more of the same no less.

      I could understand doing a burn and rebuild of the software as it sounds like it's needed.. but hardware doesn't make sense..

      More to the immediate point; how stupid are RBS going to look if they start the shiny and another Nasreen the Nerk frags it within the week?

      Methinks a point has been missed here on a truly monumental scale. If the problem is user error you fix the user (now where's that quicklime..), you don't put in a faster computer so they can trash the accounts that much faster.

      1. John Smith 19 Gold badge
        Happy

        Re: Apps?

        " ...khat stunned PFY..." "...Nasreen the Nerk..."

        You aren't by any chance based somewhere around Edinburgh?

        "khat stunned PFY"

        I'll be remembering that one.

    2. Corinne

      Re: Apps?

      Apps is short for application software, i.e. the software that does the work of the business as opposed to the system software such as the OS. So they aren't running apps rather than programs, they are just different terms for the same thing.

      Apps has become the trendy term for software that does stuff the user wants, and tends to be used more often for modern small computing devices because that's what most non-IT people see and use.

    3. This post has been deleted by its author

  6. Anonymous Coward

    Would they have if still private?

    I wonder, if they were still a private company would they have spent this kind of money?

    As they're currently backed by the public, and there's zero expectation of profit, there's really no risk in splashing money around on big ticket projects.

    Once the dust has settled and it's 'back' to running correctly they can then go ahead with returning the bank to private hands.. After they've used the tax payer to fund all the improvements.

    Even 'better' if they leave the old mainframe in the proposed "bad bank" that stays with public while spinning off the new and improved profitable bank back to the private sector.

    Or am I just getting too cynical in my middle age?

  7. Anonymous Coward

    Re: Would they have if still private?

    You misunderstand how this works. If you're in government, you make sure you pump public money into something BEFORE you hand it over to your mates in the city. It would be unthinkable for those buying the bank at some heavily discounted price to have to buy their own replacement mainframe. What'll be in it for them apart from a knighthood and a seat in the Lords?

    1. Anonymous Coward

      Re: Would they have if still private?

      "You misunderstand how this works. If you're in government, you make sure you pump public money into something BEFORE you hand it over to your mates in the city. It would be unthinkable for those buying the bank at some heavily discounted price to have to buy their own replacement mainframe."

      Not to mention that spending megabucks with a supplier might just end up with a nice little retirement number as a board director.
