RBS Mainframe Meltdown: A year on, the fallout is still coming

A year ago, RBS experienced its Chernobyl moment – an incident when a case of simple human error by those running critical systems resulted in a crisis. IT staff badly botched routine maintenance of the IBM mainframe handling millions of customers' accounts – a system processing 20 million transactions a day. The mistake was …

COMMENTS

This topic is closed for new posts.

Silver badge
Meh

Bah!

Yes, let us replace all the old technology.

We'll get the expense, which can be written off (it has been an industry maxim since forever that IT departments are a money sink and a Tax Loss). Shareholders=happy.

We'll be able to employ lots of programmers. Jobs=popular.

We'll experience cost overruns in the millions. Vendors=ecstatic.

We'll miss deadline after deadline. CoB="Concerned".

We'll probably experience several false starts as we discover the hard way what we forgot to spell out in the spec or what the Universities forgot to teach *this* crop of IT Graduates. CoB=fired (problem solved, at cost of "New Broom" policies that invalidate man years of work - and before you scream and leap, what woman would be stupid enough to get ensnared in this can of worms?)

I can see no downside to the plan.

On the other hand, we *could* just develop a migration path to a more cross-connectible solution over time, in which new development is built using newer ideas.

Yes, I *know* we all said "yes" when given those "are you planning to migrate from mainframe to client-server architecture in the next five years" questionnaires at every f---ing IT confab run in the 90s, but we did that to qualify for the free tote bag of goodies.

No-one in their right mind would seriously jeopardise their career by actually *doing* that. The vultures would start circling as soon as you ran the first project meeting and from that moment on your days would be numbered as every pair of eyeballs in the place would be vetting your every move just waiting for the main chance to crap all over you and take your job.

Rewrite everything in "C"? Are you barking mad? In what world does *that* make sense? All the expense of rewriting your financial software, but in a language intended for crafting device drivers. Remind me what your degree subject was.

Change is bad. Ask the guy who was put in charge of the Y2K project so his failures would be public enough to fire him should he continue to agitate with the board for unpopular tech reforms.

3
1
FAIL

Stick to what you're good at, Gavin

I don't think you have the background to write intelligently about the mainframe world, Gavin, and looking down the list of your last dozen or two articles suggests that you might have been mis-assigned to this story.

For one thing, you emphasize the replacement of the IBM mainframe as RBS's solution to the problems causing last year's outage but never even hint at what models are being replaced and what levels of operating system are involved. For another, IBM's parallel sysplex environment is specifically designed to make upgrades on the fly fairly straightforward (as straightforward as such things can be, I grant you), and parallel sysplex has been around for more than 30 years and as a result works rather well.

The root of the problem, as has been pointed out by several other commenters, was a failed CA-7 upgrade. This job management package has also been around for more than 30 years and is well understood. True, CA has done its usual half-fast job after acquiring it from Uccel (formerly University Computing Corp.) but the problem still boils down to the wrong people responsible for the wrong tasks at several levels of tech and management. If you have an incompetent team, taking away their old hand tools and giving them shiny new power tools just means they will do more damage more quickly.

3
1
IT Angle

And yet...

executive types still ask questions like, "why do we spend money on IT when they don't directly make money themselves?", and "why do we spend money on systems monitoring?".

The thick will only get thicker. Bad IT policy is a result of the profit-this-week-to-hell-the-rest mentality sprung upon companies by investors blinded by their own impulsive, myopic and short-term self-interests. Yes, I do believe it's the investors who are the root cause, the bad-children actors in all this mess.

0
0
Silver badge
Devil

>500 IT jobs that have been outsourced to suppliers in India.

you know, because IT guys in India have this huge experience running IBM mainframes...

0
0
Anonymous Coward

I heard they all had 50 years each, on their CVs.

0
0

Two CEO Viewpoints

There are 2 ways of looking at an IT department:-

1) Our IT Systems work perfectly and without problems because we have many highly skilled and experienced staff.

2) Our IT Systems work perfectly and without problems, so why do we have all these expensive IT staff?

3
0
Gold badge
Unhappy

Odd but when they multiply No of staff used x No of yrs to develop banks discover

It costs a f**k of a lot of money to duplicate the functions and then test the software, to confirm you have.

You cannot overestimate how far a bank will go to avoid having to re-implement a system, especially a system that has been running reliably for decades.

BTW what has tended to happen is that support systems were added over the years: first as applications on the mainframe, then (typically) on DEC VMS boxes, then on various flavours of Unix, then on Windows (and Linux) servers, with rising levels of virtualisation as that technology has improved.

Would reverse engineering this "architecture" to re-factor the system so some modules were brought "closer" together be a good idea? Probably.

Is it going to happen? Probably not.

0
0
Silver badge

Knight Capital

It wasn't an algorithm problem. It is believed that the market-making testing software got released into production along with the market-making software. It then did its job and tested the automated market-making software - across the entire market, not just KC's.

http://www.nanex.net/aqck2/3525.html

0
0
Flame

Applying Some Brutally Honest German Rationality Here

A) Shoddy Journalism: RBS apparently bought new mainframe hardware (certainly not just one machine for 450 million). Everything we know suggests that they simply bought IBM mainframes that are compatible at the instruction level and the operating-system level, and which will happily run code from the 1960s. Essentially, the latest model of S/360. Disregard the IBM marketing bullshit "zOS". These are the latest S/360s running MVS (aka bullshit "zOS") and VM.

B) If they actually decided to completely rewrite their operational software, this would cost something like 5000 million pounds and take 10 years, if they ever finished that project at all. So, they will simply run their existing software on newer, faster hardware. Still S/360 compatible, and it will happily run all the old assembler programs and all the programs for which source code has been lost. Which certainly is something like 5% of all source code. Don't bullshit me with "this is a bank, they have proper source control". They have NOT.

C) Everything we know suggests that the M.B.A. idiots (and other management "talent") purged their skilled IT/mainframe/operations specialists in a mad drive to "cut costs, improve profits and get a fat bonus for top management". In other words, my dear Englishmen and Scotsmen, the Banksters have shafted you thoroughly by eliminating well-paid, demanding, high-quality jobs. The "icing on the financial cake" so to speak.

D) The awe of the financial sector displayed in this article is just ridiculous. Stop Believing In The Money Religion. The financial industry is rife with crooks and they will crook up your entire country, the entire economy if you don't grow a healthy scepticism towards them. If you continue to let them lie unpunished, then you can look forward to the rule of Ricardo Cromwell, Prime Master in 2020.

0
1

Think I'll put my savings under the mattress.

Almost all "big" banks run their core systems on IBM mainframes. They do it because they work, they are reliable, and they are cheap! Don't believe the $450m rubbish, maybe this included the new building for their new DR centre? The hardware and system software is simple, standard IBM stuff with a handful of add-ons from other vendors like CA-7 in this RBS example. The system software provides the database management systems and the transaction management systems. This is all standard stuff and not banking specific. A typical bank has a handful of machines clustered together in a thing they call a sysplex. They can pull boxes into or out of the sysplex for maintenance or upgrade whenever they like, and almost all of these machines are less than 2 or 3 years old. The sysplex behaves as a single system, and most of these have run 24hrs per day for years without outage. Then they will have another sysplex in some other location ready to handle a disaster at the main site.

The "system" might be standard stuff, but the application software is another story. These systems have grown incrementally for half a century. Surrounding the online transactional systems that drive the ATMs or your smartphone app there will be thousands and thousands of interconnecting batch jobs which have been patched and tweaked over the years every time some new regulation comes along to tweak the banking rules, or every time some bright marketing person comes up with another gimmick to help sell some bank service. Every bank take-over or merger escalates the complexity. This network of jobs is what CA-7 tries to manage, and what "broke" when CA-7 itself got broken, apparently by a "pilot error".
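To make the "network of interconnecting batch jobs" concrete, here is a minimal sketch of what any job scheduler has to work out: an order in which every job runs only after the jobs it depends on. The job names are made up for illustration, and this is not a description of how CA-7 itself works internally; it just uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical overnight jobs: each job maps to the set of jobs it must wait for.
jobs = {
    "post_transactions": {"load_atm_feed", "load_branch_feed"},
    "calc_interest": {"post_transactions"},
    "update_balances": {"post_transactions", "calc_interest"},
    "print_statements": {"update_balances"},
}

def run_order(dependencies):
    """Return one valid run order in which every job follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())

order = run_order(jobs)
for step, job in enumerate(order, start=1):
    print(step, job)
```

With four jobs this is trivial. With thousands of jobs, decades of patches and a scheduler that has lost track of what already ran, working out where to restart is the hard part.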

The RBS meltdown was spectacular but as they say "...you aint seen nuthing yet". Most of these same "big banks" have been steadily hollowing out their core support skills. Either by outsourcing or offshoring or just simple downsizing. The systems are so darned reliable that they normally just tick along trouble free, it is only when something goes badly wrong that those "experts" really earn their keep. And it is not the "Wall-Street-Way" to pay people who are not immediately vitally needed ... right now. So the experts are disappearing. Next time a bank has a technology meltdown who will they call? They might be hoping to rely on 3rd party expertise, but even the likes of IBM are shedding people so fast that finding someone who understands how bank XYZ's systems actually work will be an ever bigger challenge. Think I'll put my savings under the mattress.

2
0
Anonymous Coward

Architects with a God Complex v Darwinian IT Evolution

Here we have an example of why Outsourcing / Offshoring doesn't work and how errors are compounded by refusal to wake up and reverse bad decisions.

A working lifetime of skills doesn't get transferred in a time-boxed handover, only a list of procedures and instructions. Eventually (sooner than you might think) those instructions are undermined by rapid turnover of staff in the offshored / outsourced environment, which is compounded by the minimal maintenance / upgrade investment of the Outsourcer, who typically inherits ageing kit and who will eventually hand back vintage Unix / Mainframe / Wintel boxes that are now virtually unmaintainable, with limited remaining ISV and HW Support and with limited/complex/expensive upgrade options.

The Business Brains @ RBS made a terrible gaffe: they made short-term savings and put their business at unnecessary risk. Now, rather than admit their mistakes, they appear to be planning to rip out a perfectly working system (that they admittedly no longer fully understand) that has grown over decades - you cannot re-design systems this complicated overnight.

Last I heard of someone doing something that complex was the big beardy guy upstairs. AFAIK Architects / Business Managers are not omnipresent God-like beings, just a bunch of hairless monkeys who ought to wake up and recognise that complex systems evolve over a long, long time. Some bits don't work and will die off or be adapted or updated and there's a bunch of redundant code that probably isn't used any more.

0
0

What is the root cause

I have read the article and do not see what the root cause is. For an upgrade to go south, and not be recoverable in a short time span, means a number of things: bad planning, the wrong resources being forced to take on the upgrade, and not understanding the product and the environment. And lastly, where was the fallback plan? Such a change should not have been approved. Probably too much pressure from senior management.

Replacing the system, whether hardware or software, is more an executive management escape from finding out what the real cause was. I doubt very much it had anything to do with the platform (software or hardware) but, as others have pointed out, with a lack of real experience with the environment they are working in.

If the same philosophy is used by Senior/Exec Management years from now, the same will occur, whether this be Mainframe, Windows or Linux.

This occurs far too much in all sectors within large organizations, and they go and spend millions to show shareholders and the board that something is being done.

But again, many Exec Management only stay around 3-5 years and move on... Exec management 101

0
0
Anonymous Coward

RBS uptime

Even internet banking has a maximum 97.2% uptime, and that is before the other fiascos...

The notice below applies every single day!

From the RBS website:

System maintenance - 02:00am to 02:40am

Please be aware that our systems update every morning between 02:00am and 02:40am. During this time you will be unable to log into your Digital Banking
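For anyone checking the 97.2% figure, here is a quick sketch of the arithmetic. The only assumption is that the 40-minute window quoted above is the sole downtime in a 24-hour day.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

def daily_uptime_percent(window_minutes: float) -> float:
    """Best-case uptime if the service is down for window_minutes every day."""
    return 100.0 * (MINUTES_PER_DAY - window_minutes) / MINUTES_PER_DAY

maintenance_minutes = 40  # 02:00 to 02:40, per the notice quoted above
uptime = daily_uptime_percent(maintenance_minutes)
hours_down_per_year = maintenance_minutes * 365 / 60

print(f"{uptime:.1f}% uptime, about {hours_down_per_year:.0f} hours down per year")
```

That is (1440 - 40) / 1440, which rounds to 97.2%, or roughly 243 hours of planned downtime a year before any unplanned outage is counted.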

0
0

Forums

Biting the hand that feeds IT © 1998–2018