RBS Mainframe Meltdown: A year on, the fallout is still coming

A year ago, RBS experienced its Chernobyl moment – an incident when a case of simple human error by those running critical systems resulted in a crisis. IT staff badly botched routine maintenance of the IBM mainframe handling millions of customers' accounts – a system processing 20 million transactions a day. The mistake was …

COMMENTS

This topic is closed for new posts.
  1. This post has been deleted by its author

  2. Anonymous Coward

    "The exact reason for the problem have been found and poured over in a tedious level of detail, then process will have been put in place to prevent the set of circumstances repeating."

    There was a story many years ago about a new CEO of M&S. He examined their thick manuals of "process" - and after a few weeks replaced them with a much slimmer one.

    He said that the previous manuals had so many processes - added with specific hindsight - that no one would be able to absorb them. Therefore they were only useful for apportioning blame after the event. His new manual dealt with principles that could easily be understood.

    1. Terry 6 Silver badge
      FAIL

      A bit like education.

      There's a policy document or manual for every little thing, and every time something's gone pear shaped they bung in a new one. But no one can actually read all of it, let alone absorb it. And that's assuming the things are actually workable in the first place.

      (BTW, "Apps?" Please don't say my bank account is being run on software from Google Play Store. Still that would explain...........)

  3. Brewster's Angle Grinder Silver badge

    "Unlike, say, the nuclear or airline industries, where accidents have led to investigations that have produced operation and safety standards..."

    And yet you're more likely to be affected by a banking crash than a plane crash or a nuclear meltdown. Admittedly, it's not quite as fatal; but several days without money is tough.

    1. Ken Hagan Gold badge

      "several days without money is tough"

      If you're counting since 2008, it's quite a bit longer than that.

  4. Anonymous Custard
    Headmaster

    The Edinburgh mainframe was so old that parts of its code had been written in Assembler, a language rooted in the immediate post-war years, with dates going back to 1970.

    Umm, didn't the war end in the mid 1940's?

    And we haven't fought with Scotland itself since the mid 1500's, which is a little early even for Assembler.

    1. Anonymous Coward

      Assembler/assembly

      Of course, the suggestion that assembler is old-fashioned reveals immense ignorance. All computers intended to be programmed have assembly languages, which are simply symbolic forms of machine code made easy for trained human beings to program in. True, assembler was used more frequently in the early days of computing - simply because there weren't any compilers or 4GLs yet.

      1. Anonymous Coward

        Re: Assembler/assembly

        Indeed, assembler still has its uses. I wrote some small modules to perform functions that are awkward and/or inefficient to do in Cobol.

        Some examples:

        1) Making 32-bit hashes of records and/or files.

        2) ASCII to EBCDIC conversion where the record has binary numbers (can't convert those parts)

        3) pah and other stuff I forget now.

        Also did some simple VSAM access modules for TSO/ISPF panels (CLIST and REXX can't read VSAM files unless you got REXXTOOLS). Plus it makes the panels more responsive with the efficiency of the asm programs.
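
        As a rough illustration of examples 1) and 2) above, here is a minimal Python sketch rather than the assembler modules described. The record layout, the field names, the choice of code page cp037 and the use of CRC-32 as the 32-bit hash are all assumptions made for the example; the post does not say which hash or code page was used.

```python
import struct
import zlib

# Hypothetical fixed-width record layout (an assumption for illustration):
# a 10-byte text name, a 4-byte big-endian binary balance, a 6-byte text branch code.
LAYOUT = [("name", 10, "text"), ("balance", 4, "binary"), ("branch", 6, "text")]


def ascii_to_ebcdic(record: bytes) -> bytes:
    """Convert text fields to EBCDIC (cp037) and copy binary fields unchanged."""
    out = bytearray()
    offset = 0
    for _name, width, kind in LAYOUT:
        field = record[offset:offset + width]
        if kind == "text":
            # Character data is safe to translate byte for byte.
            out += field.decode("ascii").encode("cp037")
        else:
            # Binary numbers must not be translated, or their value is corrupted.
            out += field
        offset += width
    return bytes(out)


def record_hash(record: bytes) -> int:
    """Return a 32-bit hash of a record (CRC-32 is an assumed choice)."""
    return zlib.crc32(record) & 0xFFFFFFFF


if __name__ == "__main__":
    record = b"SMITH     " + struct.pack(">i", 12345) + b"EDI001"
    converted = ascii_to_ebcdic(record)
    print(converted.hex())
    print(f"{record_hash(record):08x}")
```

        The point of the layout table is that the conversion has to know where the binary fields sit; a blind whole-record translation would alter the balance bytes.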

    2. Rabster
      FAIL

      so old that parts of its code had been written in Assembler

      Wow - spectacular ignorance. A lot of user exits can only be written in Assembler. I'm primarily a java coder now developing web-based tools on WAS/unix but I spent May rewriting the TWS logon submission exit and a Rexx functional extension package to allow rexx execs to talk to MQ.

      Tied in with the "apps" quote well........

      1. Brewster's Angle Grinder Silver badge
        Facepalm

        Re: so old that parts of its code had been written in Assembler @Rabster

        "A lot of user exits can only be written in Assembler."

        I think we've found the problem. I don't know what a "user exit" is (perhaps mov ax,4c00h; int 21h?), but a system that requires you to write bits in assembler is broken. And I speak as someone who was once paid to write apps in assembly. That's just no way to run a bank.

    3. James O'Shea
      Devil

      "And we haven't fought with Scotland itself since the mid 1500's, which is a little early even for Assembler."

      <cough> Culloden Moor </cough>

      <https://en.wikipedia.org/wiki/Battle_of_Culloden> Note the date.

      Still a trifle old for assembler, though I may have once tried to use an old GA-18 which may have been designed close to that time.

      <exit, stage right, to the tune of 'Charlie, he's my darling'. and 'Scotland the brave but stupid for following a Stuart'.>

      1. Anonymous Coward

        Look here, it's me, your old friend, Charlie! Help me, Scottie-boys! Aieee!

        What is this "Lu-Ther-Ann", is that a real word?

    4. Anonymous Coward

      We should check with the Moderator of the General Assembler of The Church of Cobol :P

  5. Anonymous Coward

    Oh dear...

    ...oh dear. How many times have I witnessed this panic response in my life as a service management consultant? RBS - think, please think. What happened was either a process failure or the staff were too lazy or too stupid to follow the process correctly. Throwing money at the problem by buying new tin is likely to make the problem worse, not better. Stand by for another disaster before too long.

  6. JayBizzle

    The article talks about an LTSB outage based on a change to old hardware... well, they moved to a brand spanking new core banking system as part of the integration of HBOS, and this appears to have gone very smoothly (from a customer perspective).

    I'm with the others making comments that user and process errors are the main root causes, not the kit itself... but whatever keeps the top brass happy; no one in IT is really going to say no to more investment and new shiny stuff.

  7. sugerbear

    Porting Apps? Downtime.. Eh

    "RBS faces a Herculean job in bringing online a new mainframe operating in a core part of its day-to-day business. It must plan and execute the job without interrupting the existing service by taking the old mainframe offline during the transition.

    RBS did not say when it plans to bring the new mainframe online.

    But hardware is only one thing: RBS must also determine what do with the existing apps running on the system. Either it must port existing apps to the new system - which is likely - or write or buy new apps. If the former, RBS must design, write, test and then shift. If the latter, RBS must make sure the new apps work on the new mainframe and interoperate with other RBS’s other, connected systems."

    Spoken by someone who knows nothing about mainframes. Are you a consultant by any chance?

    RBS are just buying a bigger mainframe and then plugging it into their existing parallel sysplex system. Whoop de doo. All that will get them is the chance to use the newest version of the bits of mainframe software and things will run a bit faster.

    No porting of apps, no downtime required. That is what makes the mainframe such a great environment to develop and run on. Something that lesser mortals don't get.

    1. Richard Wharram

      Re: Porting Apps? Downtime.. Eh

      Indeed. Updating zSeries and z/OS was not a minor task but a lot less effort than desktop and server upgrade programmes in my experience :)

    2. John Smith 19 Gold badge
      Meh

      Re: Porting Apps? Downtime.. Eh

      "No porting of apps, no downtime required. That is what makes the mainframe such a great environment to develop and run on. Something that lesser mortals don't get."

      True.

      But is not £450m somewhat expensive for a hardware upgrade alone?

      1. tom dial Silver badge

        Re: Porting Apps? Downtime.. Eh

        It is more than a bit expensive and I, for one, do not believe it without serious proofs offered.

    3. Geoff Lamb

      Re: Porting Apps? Downtime.. Eh

      YES +1

      A mainframe is one of the EASIEST systems to upgrade. Most banks (if they pay the IBM tax) run on pretty much the latest kit while still running decades old code. All completely virtualised - decades before VMware came along.

      Oh, and the idiot who thinks you can swap a banking system with a Google cloud - no one cares if your Google search returns an answer that is not quite right. See how many customers you have if you do that with a couple of accounts!

  8. Anonymous Coward

    Surprise meme!

    The press of middle England weighed in, too, pillorying the bank’s already unpopular chairman as he a grovelling apology to MPs for the whole episode.

    Surely you mean "as he accidentally a grovelling apology"? ;-)

  9. Oliver 7

    Is the editor off this week?

    1. James O'Shea

      Oh, the editor's been off for a while.

      It was the smell which first alerted the rest of the staff.

  10. ukFletch

    ...and the IT problems still haven't been fixed...

    FCA are investigating... blah blah blah... Replacement Mainframe... refinaciate IBM'icus... DR Plan... of course there is one... somewhere, try looking under that lads mag from 2005, the one with Jordan on the front, a classic that is...

    The only thing I really care about, despite working in the IT industry for getting close to 20 years, is why since 1 year after the SHTF at RBS, they still can't update my balance properly? I'm assuming that the roll back they must have done, recovered them to a restore point from 1983, say around April the 1st?

    How the hell they can on the one hand roll out a new "helpful" system to give you cash when you've lost your card while on the slash, yet can't tell you if you have any money left to spend on the monthly shop when you're stone cold sober, beggars belief. Or am I living in cuckoo land?

    1. Anonymous Coward

      Re: ...and the IT problems still haven't been fixed...

      Err... I think you're living in cuckoo land. I never had any problem being told how much money is in my account for immediate access. Unless, perhaps you've got lots of non-faster payments going through the account or a very high volume of card transactions?

      1. ukFletch

        Re: ...and the IT problems still haven't been fixed...

        Then I and a number of others I know, suffering the same problems are all proud residents of CuckooLand. Which I am assuming that Jamie and His Magic Torch (along with Wordsworth the dog) will be visiting soon then.

        1. Richard Wharram

          Re: ...and the IT problems still haven't been fixed...

          No two nights are the same.

  11. Anonymous Coward
    IT Angle

    Lessons learned from the meltdown?

    "it's quite possible that what happened at RBS might be replicated elsewhere as old and overloaded mainframes like the one at RBS hold millions of accounts at other banks who’ve also sent their IT jobs overseas"

    What happened at RBS wasn't caused by "old and overloaded mainframes" but by errors that a single operator introduced into the overnight CA-7 batch processing job. Subsequent measures to roll back the error made matters worse, as they rolled it back to the beginning of the financial quarter and not just the previous night's transactions. It took ages to manually restore all the accounts.

    "The exact reason for the problem have been found and poured over in a tedious level of detail, then process will have been put in place to prevent the set of circumstances repeating."

    So, who was it that fired the most experienced staff, then handed responsibility for overnight batch processing over to a single individual with only four months' experience with the system?

    1. Rabster

      Re: Lessons learned from the meltdown?

      Exactly. Most mainframe shops have large teams doing the ITIL Capacity Management process.

  12. Anonymous Coward
    Holmes

    Looks like the CIO decided to exploit the crisis and pitch a bid for new kit (like we all did for Y2K!)

    Maybe the clapped out mainframe can be given to the RBS 'bad bank' along with all the other toxic assets?

    1. Dominion
      FAIL

      Jesus wept! The mainframe isn't clapped out - the staff and management running it are!

    2. Mad Mike

      The mainframes aren't clapped out. You wouldn't believe how often they're replaced!!

  13. Caff

    new mainframe solution??

    Switching out one IBM mainframe for a newer one is pretty much par for the course and relatively painless.

    From what I heard about the RBS meltdown caused by the CA7 upgrade, most if not all of the ensuing madness was down to the way the various mergers over the years affected their ability to roll forward daily batches to catch up with what they had missed over the days, while still allowing some day-to-day banking.

    The hardware upgrade will do nothing to fix these issues; rather, they need to go back and properly integrate the various disparate batches and clearing processes of the banks that make up RBS (not straightforward by any means).

    1. Anonymous Coward

      Re: new mainframe solution??

      RBS don't have one mainframe. They had something like seventeen spread over multiple sites when I worked there (IIRC), they were fairly proud to say that they had the largest footprint of IBM mainframe in Europe.

      1. Caff

        Re: new mainframe solution??

        That would complicate things more than a little if they inherited all the smaller mainframes from the member banks and tied each one's batch together with file transfers and attempts to schedule and trigger jobs between them.

        This could tie in with the "new mainframe" plan mentioned, which would go some way to explaining how this could be a solution to their problems.

  14. Anonymous Coward
    FAIL

    Not a good article

    Quite apart from the already-pointed-out fact that this was not a hardware problem at all, a lot of what the author is saying amounts to: mainframes are big and complicated, and people have to know a lot to use them.

    Banks are big and complicated, and are supposed to be managed and staffed by people who do know a lot about them, their systems, and how those systems are computerised. And if you are going to computerise something like that, you're going to need something big. Now, let's see, what are those systems called that are up to the job? Ahh, yes, mainframes.

    It's not the sort of task one does with a PC and a bit of cloud.

  15. Lord Elpuss Silver badge

    "25 of the world’s top banks use mainframes from IBM, according to Gartner"

    25 out of how many? Without context this statement is meaningless...

    1. Jemma

      Context...

      The ones that got bailed out and are still trading?

      God knows there can't be many left now... scary to think there are more MG6 customers than there are competent banks.

      Even scarier that these people are CHARGING us to take our own money and lose it.. and people seem OK with this..?

      Is it me, or is there something wrong with this woodcut?

    2. Dominic Connor, Quant Headhunter

      True

      Actually, I'd expect the 25 to be 25 out of 26 or 27, ie nearly all of them.

      As the article says, it is insanely difficult to replace these things and even where they aren't the spine of the firm they do run some critical bits.

      Also the biggest banks grew by sticking bits of smaller banks on, so even if the "main" bit of the bank didn't use mainframes, some other part will often have it.

    3. Anonymous Coward

      There will never be a need in the world for more than 26 banks :P

  16. Captain Scarlet
    Facepalm

    Heh, yeah right, everywhere I have ever worked has treated IT as a thing that can just be upgraded and replaced. It's odd to see people's reactions when they realise something isn't as simple as plugging something in.

    "financial IT where the technology is treated as black box – meaning it can be installed and operated, without much thought to who runs it"

  17. Arbee
    Alert

    To the people asking what £450m will buy a bank...

    For a good reference of what £450m will buy you, Nationwide recently completed their 5-year Voyager project, which replaced their Unisys mainframes running whatever banking application they did, with SAP Banking Platform. Essentially, this was a complete replacement of the back-end banking platform, a rewrite of most of the middleware, and a rewrite or replacement of most of their front-end applications. We commonly called it "rewriting the bank."

    That cost £400m.

    (Note they still have the mainframes, but they just have savings and mortgages).

    1. Sir Runcible Spoon

      Re: To the people asking what £450m will buy a bank...

      Well, all I can say is that if that had been a government IT spend you could have added a 0 to the figure

    2. Anonymous Coward

      Re: To the people asking what £450m will buy a bank...

      Sure, but wouldn't most of that have gone in profit margin to SAP? :P

  18. Anonymous Coward

    Brilliant insightful article, but....

    Call me a cynic, but I just don't see it changing the behaviour of the banks or any corporations chasing revenue and cutting costs.... More like this from the Reg please....

  19. dabar
    FAIL

    HSBC - suffering a meltdown at the moment - 21-6-13

    Just tried to pay in at my local HSBC to be told by one of the staff that HSBC is currently having a global outage on all systems including card transactions in shops.

    1. Mad Mike

      Re: HSBC - suffering a meltdown at the moment - 21-6-13

      My earlier prediction in these comments that it'll happen to them all in the end seems to be coming true quicker than I thought!!!!

      1. Anonymous Coward

        Re: HSBC - suffering a meltdown at the moment - 21-6-13

        Astonishingly prescient Mike .... I noticed that as well :).

  20. Anonymous Coward

    OVERSIGHT vs. Regulation....

    What this case illustrates is that outsourcing, language barriers, cost cutting, and assumptions all conspired together to create the mother of all fuckups! The fact that corporations are always playing bungee cord management games in cost cutting and over-efficiency means that assumptions will always be made. I laugh when I read that more regulation may be needed... Ya think?!

    What's needed is constant OVERSIGHT. Not more regulation. Regulation is just another form of form-filling, bureaucracy and INTERPRETATION of rules. Look at Dodd-Frank for Christ sake. What we need are experts being regularly sent in to assess the true-grit reality of the situation before a crisis occurs...

    "If regulation is mandated then don’t expect a quick fix. Unlike, say, the nuclear or airline industries, where accidents have led to investigations that have produced operation and safety standards, similar standards in financial systems will be difficult because of a fundamental refusal to share information".. Fine. Then send in experts who won't make the details public but will whip complacency into shape!

    But please don't write more rules that will just lead to a fine! It's not enough of a compliance incentive. If the situation is found to be dire, then you need to threaten to remove their banking license.... But this is only workable if we break up these banking monoliths into lots of smaller banks. As an ex-derivatives HFT guy I cannot understand why they are not doing this.... Lobbying is the only answer!

  21. John Smith 19 Gold badge
    Unhappy

    Nothing actually *wrong* with mainframe

    So why replace it?

    IIRC Assembler had a big part to play supporting ATMs on OS/370, but that was decades ago, due to response speed and the number of them. Hard to believe it's still a key tool in the MF developer's arsenal.

    Back in the day Cybermation built an awesome mainframe scheduler.

    It was the mother of all TSRs. The devs (patching a live mainframe OS) must have been quite special.

    Sadly CA bought them and the rest is history.
