'Inexperienced' RBS tech operative's blunder led to banking meltdown

A serious error committed by an "inexperienced operative" caused the IT meltdown which crippled the RBS banks last week, a source familiar with the matter has told The Register. Job adverts show that at least some of the team responsible for the blunder were recruited earlier this year in India following IT job cuts at RBS in …

COMMENTS

This topic is closed for new posts.

You get what you pay for

End of line.

Re: You get what you pay for

In this case, bigger yachts for the execs, smaller UK payroll for staff.

Anonymous Coward

Re: You get what you pay for

But what are you paying for?

You're maintaining (or not) the investment you've already made in 1,000 man-years of intimate systems knowledge!

(I know how many staff ran RBS Batch Services prior to the outsourcing).

Bean counters, however, don't see it that way - they simply see a figure on THIS YEAR'S books.

ALL companies should learn from this, ESPECIALLY those in Financial Services. The EU is going to push to split up banks, and there's going to be a substantial number of new entrants on the market. They need to learn quickly that running a bank is not the same as running a supermarket. A bank survives by the knowledge in the computer, not by the stock on the shelf.

Re: You get what you pay for

Having worked in IT both for large banks and large supermarkets I must point out that neither is like 'running a supermarket' in the way you imply. For instance a small glitch in your invoice processing system could quickly lead to nothing being on the shelves to sell.

Anonymous Coward

Re: You get what you pay for

Really?

As a taxpayer, what I mostly seem to get is shafted.

Re: You get what you pay for

Not to worry, a fresh container of bills-o-the-queen has just left the print shop.

This should lube you up for a fortnight.

This post has been deleted by a moderator

Anonymous Coward

Re: You get what you pay for

Not quite true: I don't think RBS paid for a comprehensive trashing of their banking platform, which is what they got.

Serves them right for outsourcing IT though. It's always a disaster. Every time.

AC because the company I work for outsources to India too, with the same level of incompetence resulting.

Re: You get what you pay for

What else is new? Some damned exec needs a larger bonus.

What would be `interesting` would be the fallout in the executive ranks for the lousy son of a b---- that decided to outsource in the first place. Serve his head on a platter.

Anonymous Coward

Major error?

Was it sudo rm -rf /

Re: Major error?

The computer got a virus, which no anti-virus software could yet recognise. Luckily it was very easily removed. They simply had to delete System32.

Anonymous Coward

Re: Major error?

I was under the impression it was an iSeries box, so I can't see your command doing much :)

iSeries?

You mean that NatWest runs on a glorified AS/400, sorry, make that the S/38.

No, I think it will be a Series Z, which is really a 370 in a pretty frock and new OS. Bet it's still running OS/270 with VM/CMS.

As for CA-7, Computer Associates says, and I quote, "reliably manage enterprise-wide workload processing". This gives a whole new depth to "reliably".

Re: iSeries? @Spartacus

Any system is only as reliable as the people who use and maintain it. This is the technological equivalent of giving your car engine a good going over with a sledgehammer, and then complaining it won't start so it must be unreliable.

Anonymous Coward

Re: iSeries?

When RBS took over NatWest, they moved everything over to the RBS system. Everything worked. They didn't lose track of money.

That does suggest that the underlying system is solid. But that is no protection against humans doing the wrong things.

Re: iSeries?

There is nothing wrong with an AS/400 or VM/CMS.

Damn youngsters.

Re: iSeries?

I agree. I especially like IBM mainframes, though personally I cut my teeth on MTS and then MVS/TSO.

Re: iSeries? @Spartacus

More accurately the equivalent of having your car serviced by a work experience sociology student who's there only because his benefit will be cut if he isn't, rather than an engineer of twenty years' experience who loves cars (who isn't there because the garage "let him go" to save a few pennies in the short term).

Re: iSeries? @Spartacus

I was working on the basis that the owner knows nothing about cars, hence the sledgehammer, but your analogy works for me too.

Re: Major error?

Not quite, it seems to have been: sudo nohup rm -rf / > /dev/null &

Re: Major error?

Nah. Meant to type crontab -e but did crontab -r.

Computer Associates

Well, hell, there's your problem right there. CA: where lousy software goes to be murdered, resurrected as a zombie, and unleashed upon an unsuspecting world.

Re: Major error?

Oh, that sounds an interesting command. Let me just try that.

I'll be back in a minu... oh shit!!

Re: iSeries?

I bet they are written on the assumption that the person knows what she/he is doing and is qualified for the job.

It's just like rm -rf, which won't ask you newbie questions like "are you sure?": you have root, so it is assumed you know what you are doing.

Have they tried turning it off and on again?

Failing that, adding more RAM and an SSD can help to speed up a sluggish system.

Re: Have they tried turning it off and on again?

I suggest they visit http://www.downloadmoreram.com/ and get some decent RAM downloaded.

Should get things up and running again in no time.

Anonymous Coward

Re: Have they tried turning it off and on again?

You used to be an EDS manager didn't you?

Re: Have they tried turning it off and on again?

Well, I went to the downloadmoreram web-site - and found that I have to download 4GB of RAM via my browser. You would have thought that they could at least have made a torrent available, hmmm? Or they could have made the RAM into a rar, probably halving the download traffic.

That's quite important with VirginMedia: the zeroes are often too round for the fibre-optic cable. You could rotate them 90 degrees around their vertical axis of symmetry, but then they would look too much like ones. It's my bet the ones would come down the wire quicker if they were also rotated 90 degrees, to look like hyphens.

RBS should use advanced critical thinking skills like this. Oh, wait....

Good work, El Reg

Keep digging!

Re: Good work, El Reg

Indeed! The Reg is now the technical authority in this affair.

Re: Good work, El Reg

You're showing up the rest of the media as a bunch of useless press-release advertorial writers. Proof that the average journo writes bollocks as doing tech was too hard for them.

Well done.

Have a virtual beer token on me.

Re: Good work, El Reg

+1; but with one correction

"Proof that the average journo writes bollocks as doing [select * from useful_careers] was too hard for them"

Anonymous Coward

no backup of the schedule?

So did they have no backup of the CA-7 schedule?? It certainly sounds like that is the case.

Re: no backup of the schedule?

More likely they did take a routine backup that included the database, but had never exercised a full recovery of the application back to service for this failure mode, followed by a successful completion of the batch schedule.
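
To illustrate the difference between having a backup and having rehearsed a recovery, here is a minimal Python sketch. It assumes a made-up schedule format (job name mapped to the jobs it waits for) and hypothetical function names; it has nothing to do with CA-7's real database layout. The drill restores into a scratch directory and only passes if the restored copy matches the original and could actually be run in some order.

```python
import hashlib
import json
import tempfile
from graphlib import CycleError, TopologicalSorter
from pathlib import Path

# Hypothetical schedule format: job name -> list of jobs it waits for.
Schedule = dict[str, list[str]]


def checksum(schedule: Schedule) -> str:
    # Canonical JSON (sorted keys) so identical schedules hash identically.
    blob = json.dumps(schedule, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def restore_drill(schedule: Schedule) -> bool:
    """Back up, restore into a scratch area, and check the copy could actually run."""
    with tempfile.TemporaryDirectory() as scratch:
        path = Path(scratch) / "schedule.json"
        path.write_text(json.dumps(schedule, sort_keys=True), encoding="utf-8")
        restored: Schedule = json.loads(path.read_text(encoding="utf-8"))
    # 1. The restored copy must match what we backed up.
    if checksum(restored) != checksum(schedule):
        return False
    # 2. Every dependency must name a job that exists in the restored copy.
    if any(dep not in restored for deps in restored.values() for dep in deps):
        return False
    # 3. There must be a valid run order (no dependency cycles).
    try:
        list(TopologicalSorter(restored).static_order())
    except CycleError:
        return False
    return True
```

A real drill would go further and actually run the restored batch to completion, which is exactly the step the comment suspects was never rehearsed.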

Anonymous Coward

Re: no backup of the schedule?

Not knowing how CA7 works, I'm not sure, but even if you could restore the batch schedule from tape, you'd have to be careful to ensure you don't lose the current schedule state, i.e. which jobs are still in the queue pending to run, which have run successfully, which may have failed, etc, etc, etc. It's not just a case of reloading from tape, I'd have thought.

Of course, there should be some method of recovering batch jobs (needed mainly because of manual error, I'd imagine); whether it has ever been designed to cope with such a massive loss or corruption of the schedule is another matter.
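
One way to read the point about not losing the current schedule state is to keep job definitions and run state separate, reload only the definitions from backup, and merge back whatever state survived on the live system. A rough Python sketch, with a hypothetical `JobState` enum and made-up job names; it is not how CA-7 actually does it:

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    COMPLETE = "complete"
    FAILED = "failed"


def merge_restore(backup_defs: dict[str, list[str]],
                  live_state: dict[str, JobState]) -> dict[str, JobState]:
    """Rebuild run state after reloading job definitions from a backup.

    Jobs whose state survived on the live system keep it, so finished work is
    not run twice and failures stay visible; jobs known only from the backup
    come back as PENDING. Jobs added since the backup was taken are dropped
    here; a real procedure would need to flag them for review."""
    return {job: live_state.get(job, JobState.PENDING) for job in backup_defs}


def runnable(defs: dict[str, list[str]], state: dict[str, JobState]) -> list[str]:
    """Pending jobs whose predecessors have all completed."""
    return sorted(
        job for job, deps in defs.items()
        if state[job] is JobState.PENDING
        and all(state.get(dep) is JobState.COMPLETE for dep in deps)
    )
```

A job left FAILED is not quietly reset to PENDING, so it blocks everything downstream until someone has looked at it.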

Re: no backup of the schedule?

I should think that such a piece of software was built solely to solve such problems and, if not, then you should be leaving obvious markers as you churn through jobs and also have proper transaction rollback so you can "undo" a broken / incomplete job before continuing with its replacement (and even just rollback to before any of the jobs were run or any of the schedules deleted!).

We shouldn't apply common server management functions to such large jobs, I think, but we should hold them accountable to a HIGHER level of control over such things.

How did someone inexperienced get on the team?

How did they get access to the schedule controls?

How did they manage to delete EVERYTHING on there?

Why did the software allow such deletion without confirmation?

Why is there not a rollback or even versioning function for the schedules?

Why, precisely, does one mess-up by one employee in front of one computer put your ENTIRE BANKING SYSTEM out of action, nationwide?
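
The versioning and rollback questions above can be sketched roughly like this: every change produces a new version, a rollback is just another version, and a bulk delete over a threshold is refused outright. The class name, the job names and the 10% limit are all made up for illustration.

```python
import copy

Jobs = dict[str, list[str]]  # hypothetical: job name -> predecessor jobs


class VersionedSchedule:
    """Keeps every version of the schedule so any change can be rolled back."""

    def __init__(self, jobs: Jobs, max_delete_fraction: float = 0.1):
        self._versions: list[Jobs] = [copy.deepcopy(jobs)]
        # Hypothetical guard: refuse to delete more than this share of jobs at once.
        self._max_delete_fraction = max_delete_fraction

    @property
    def current(self) -> Jobs:
        return copy.deepcopy(self._versions[-1])

    def delete_jobs(self, names: list[str]) -> int:
        jobs = self.current
        targets = set(names)
        unknown = sorted(targets.difference(jobs))
        if unknown:
            raise KeyError(f"unknown jobs: {unknown}")
        if len(targets) > self._max_delete_fraction * len(jobs):
            raise PermissionError("bulk delete over the limit; needs a second approver")
        for name in targets:
            del jobs[name]
        self._versions.append(jobs)
        return len(self._versions) - 1  # new version number

    def rollback(self, version: int) -> int:
        # Rolling back is itself a new version, so the history is never lost.
        self._versions.append(copy.deepcopy(self._versions[version]))
        return len(self._versions) - 1
```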

Anonymous Coward

Re: no backup of the schedule?

"Why did the software allow such deletion without confirmation?"

You seem to be assuming it was without confirmation?

Maybe there was a language issue and the operative simply didn't understand the confirmation?
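
A common way to stop a confirmation hinging on whether the operator understood the wording is to make them retype the name of the thing being destroyed, so a reflexive "Y" can't get through. A minimal Python sketch; the action and target names are hypothetical, and `ask` defaults to the built-in `input` but can be swapped out for testing:

```python
from typing import Callable


def confirm_destructive(action: str, target: str,
                        ask: Callable[[str], str] = input) -> bool:
    """Proceed only if the operator retypes the exact name of the target.

    A reflexive "Y" cannot get through, and the prompt spells out what is about
    to be destroyed instead of relying on the operator decoding a terse message."""
    prompt = (f"{action} will permanently affect '{target}'.\n"
              f"Type {target} to continue, or anything else to cancel: ")
    return ask(prompt).strip() == target
```

This narrows the gap rather than closing it: someone following a crib sheet can still type whatever the sheet tells them to.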

Anonymous Coward

Re: no backup of the schedule?

This is a mainframe - the language was probably so convoluted even a native English speaker would struggle to figure out what he was being asked.

Re: no backup of the schedule?

Then I refer you back to the first question - how did someone who couldn't understand the confirmation get into a position where they were presented with it in the first place? And/or, what were they doing confirming it if they didn't understand it rather than CHECKING with someone else? And/or, what is your entire banking system doing hinging on the wording of a confirmation?

Re: no backup of the schedule?

1) Even experienced people drop a bollock sometimes. A colleague of mine had over 15 years experience of the systems concerned when he heroically deleted the entire environment for ${country}. Yup, a whole machine's application set and data down the crapper in one misplaced rm -rf *. Longest recorded ohnosecond in history.

2) When you're in rolling back from upgrade mode, you tend to be playing with O/S commands and infrastructure utilities, not user software. There are many things around at that level in most environments which do stuff quite capable of ruining your day without asking. It's sort of assumed that the type of person allowed to play with them is allowed to use the sharp scissors. However, as we see in (1), even the best of us drop a clanger occasionally.

Re: no backup of the schedule?

The same way call centres work... Crib sheet.

Press this, click that, just say "Yes" to anything it asks you to confirm.

Now off you go.

Re: no backup of the schedule?

@TeeCee

"Even experienced people drop a bollock sometimes. A colleague of mine had over 15 years experience of the systems concerned when he heroically deleted the entire environment for ${country}."

Of course experienced people make cockups. Backups and well tested recovery procedures aren't there just to recover from hardware failures, but human error too.

I've never kept a tally, but my real life restores due to human error far outnumber those due to a hardware failure, probably by a factor of 100.

Re: no backup of the schedule?

From the CA-7 sysprog manual.

Because CA-7 is controlling a production environment, backup and recovery of its database becomes extremely important. Backups of the CA-7 database should be scheduled on a regular basis, at least once each day. If possible, CA-7 should be down or at least reasonably inactive during the backup, with no permanent updates being made to the database. All data sets in the database must be backed up at the same time. Additionally, the backup procedure should be as fast as possible especially if scheduling is to stop. Two other concerns for backups are to produce a single source for recovery and, where practical, to provide error checking of index and pointer elements.

With the above items in mind, you may find that no single utility satisfies all your concerns. On the one hand, the SASSBK00 program provided with CA-7 creates a single source file for recovery and performs error checking of index and pointer elements; however, it is slow for a large database. (It is slow because it creates a logical as well as a physical backup for conversion purposes and therefore produces many more records than a utility such as IDCAMS or CA-ASM2.) On the other hand, utilities such as CA-ASM2, IDCAMS, and DFDSS are fast and can produce a single source for recovery, but they have no error checking of elements.

Seems pretty simple to me.
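
The manual's two requirements (all data sets backed up at the same point in time, and error checking of index and pointer elements) can be illustrated with a toy two-data-set database. This is a hypothetical sketch, not what SASSBK00, IDCAMS or DFDSS actually do: both data sets are copied under one lock, and the copy is rejected if any index entry or pointer leads to a missing record.

```python
import copy
import threading
from dataclasses import dataclass, field


@dataclass
class Database:
    # Hypothetical two-data-set database: records plus an index into them.
    records: dict[int, dict] = field(default_factory=dict)  # record id -> record
    index: dict[str, int] = field(default_factory=dict)     # job name -> record id
    lock: threading.Lock = field(default_factory=threading.Lock)


def check_pointers(db: Database) -> list[str]:
    """Report index entries and record-to-record pointers that lead nowhere."""
    errors = []
    for name, rid in db.index.items():
        if rid not in db.records:
            errors.append(f"index {name!r} -> missing record {rid}")
    for rid, record in db.records.items():
        for target in record.get("next", []):
            if target not in db.records:
                errors.append(f"record {rid} -> missing record {target}")
    return errors


def snapshot(db: Database) -> Database:
    """Copy both data sets under one lock (same point in time), then verify."""
    with db.lock:  # writers are assumed to take the same lock
        backup = Database(records=copy.deepcopy(db.records), index=dict(db.index))
    problems = check_pointers(backup)
    if problems:
        raise ValueError("backup failed pointer check: " + "; ".join(problems))
    return backup
```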

Anonymous Coward

Re: no backup of the schedule?

To answer Lee Dowling's questions:

Q1. How did someone inexperienced get on the team?

A1. RBS got rid of all the experienced staff and outsourced their jobs

Q2. How did they get access to the schedule controls?

A2. RBS got rid of all the experienced staff and outsourced their jobs

Q3. How did they manage to delete EVERYTHING on there?

A3. RBS got rid of all the experienced staff and outsourced their jobs

Q4. Why did the software allow such deletion without confirmation?

A4. Computers do what you tell them, which more often than not is not what you want them to do; that's why you get experienced staff to do it. You can try this yourself: open a command prompt on your PC, enter "format c:" and answer "Y" to the question.

Q5. Why is there not a rollback or even versioning function for the schedules?

A5. There is but RBS got rid of all the experienced staff who knew how to do it and outsourced their jobs

Q6. Why, precisely, does one mess-up by one employee in front of one computer put your ENTIRE BANKING SYSTEM out of action, nationwide?

A6. RBS got rid of all the experienced staff and outsourced their jobs

Simples

Anonymous Coward

Re: no backup of the schedule?

"Then I refer you back to the first question - how did someone who couldn't understand the confirmation get into a position where they were presented with it in the first place?"

You're making a lot of assumptions there, matey. You appear to be getting all steamed up over a hypothesis.

"And/or, what were they doing confirming it if they didn't understand it rather than CHECKING with someone else?"

People. Fuck. Up.

There doesn't have to be a conspiracy. There doesn't have to be a failure in process. Sometimes people just screw up, and sometimes there really is just a lone gunman. And I very much doubt that you will get the answer in a public forum.

"And/or, what is your entire banking system doing hinging on the wording of a confirmation?"

What, you've never accidentally deleted an entire file system by rushing and hitting the wrong key?!

Maybe there were four phones ringing with people screaming for an update and an ETA on the fix. Maybe three managers were over his shoulder 'helping'. That's pretty distracting when you're trying to fix a major system outage in a FTSE100 company, in my experience.

Anonymous Coward

Re: no backup of the schedule?

I don't think it was the scheduler that was at fault. More likely some poor sod omitted vital step(s) from the EOD runs, or the scheduling table got corrupted during the upgrade.

Have you ever worked in a complex environment where multiple systems feed data to each other? No, I thought not! Even if you have recovered from the changes to the scheduler, you would have to restore the missing input feeds, massage header files etc.

This has to be a concerted effort for all production systems up/down stream from the host the EOD batch runs from.

Even under controlled DR environments with known input files it is usually 2 days effort with months of planning.

To make matters worse, if you realise you've missed something 2 nights ago, you have to do this concerted effort for every EOD run.

So please, before criticising the people who are recovering the system / transactions, engage your brain.

Recovering a stand alone box is so easy, I would not even shed a single bead of sweat for it.
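
The knock-on effect described above (a feed missed two nights ago means redoing every EOD run since) can be sketched as: find the first business date with any required input feed missing and repeat every run from there onwards. The feed names and dates are hypothetical.

```python
from datetime import date


def eod_runs_to_repeat(run_dates: list[date],
                       received: dict[str, set[date]],
                       required: set[str]) -> list[date]:
    """Return the end-of-day runs that must be repeated, oldest first.

    The first business date missing any required feed is bad, and so is every
    run after it, because each EOD run builds on the previous night's output."""
    for i, day in enumerate(run_dates):
        if any(day not in received.get(feed, set()) for feed in required):
            return run_dates[i:]
    return []
```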

Re: no backup of the schedule?

I work in a large investment bank and am very au fait with scheduling of overnight processes, although we use Control-M.

We did have an instance where a global upgrade to Control-M failed and brought the whole tool down. The trouble is, best practice is now that all batches are scheduled by many disparate systems, and the batches need to be "granular" i.e. each part of the batch runs to completion and is sanity checked by automated checkers before the next stage can run. Because of the insistence that this is done by schedulers, many teams struggle to run their overnight processes without knowing the scripts that are run, or the order they are run in. In the event that the whole system is down, this information may not be readily available. There are some right numpties on particular teams in any large institution, and they're often trying to unravel things designed a decade ago.

I bet something like this happened here - the scheduler fell over, overnight batches were interrupted (potentially mid process where that process isn't rerunnable, requiring a database restore) and the teams with the knowledge to run these overnight processes manually may not necessarily have been on call or available.
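
The "granular" pattern described above, plus resuming from a checkpoint instead of re-running a half-finished stage that isn't rerunnable, might look something like this minimal Python sketch. The stage tuple shape and the function names are assumptions for illustration, not Control-M's or CA-7's API.

```python
from typing import Callable

# (name, step, checker): a hypothetical shape for one granular batch stage.
Stage = tuple[str, Callable[[dict], None], Callable[[dict], bool]]


def run_batch(stages: list[Stage], data: dict, done: set[str]) -> set[str]:
    """Run stages in order, skipping any already recorded in `done`.

    Each stage must pass its checker before the next one starts. `done` acts as
    the checkpoint; in real life it would have to be persisted durably after
    every stage, so a restart resumes rather than re-running completed work."""
    for name, step, check in stages:
        if name in done:
            continue
        step(data)
        if not check(data):
            raise RuntimeError(f"stage {name!r} failed its sanity check")
        done.add(name)
    return done
```

The checkpoint only helps if the people on call can see it; it does nothing for a team that never knew which scripts the scheduler was running.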

Re: no backup of the schedule?

"Why, precisely, does one mess-up by one employee in front of one computer put your ENTIRE BANKING SYSTEM out of action, nationwide?"

It's all about management controls.

Whilst sometimes it is not possible to build in management controls to prevent a single operator nuking a system (though many times it is possible and they just can't be arsed to build them in), in this situation you use a checklist: do not think, do not use initiative, follow the checklist to the letter.

There is a reason pilots use them.

They work just as well for operations and application support.

And if they can't follow a checklist, you shouldn't have hired them for that job.
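
Enforcing a checklist in software mostly comes down to refusing to let anyone skip or reorder steps. A minimal sketch with a hypothetical `Checklist` class and made-up step names:

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    steps: list[str]                      # hypothetical step names, in order
    completed: list[str] = field(default_factory=list)

    def tick(self, step: str) -> None:
        """Mark a step done; refuse anything that isn't the next step."""
        if self.done:
            raise ValueError("checklist already complete")
        expected = self.steps[len(self.completed)]
        if step != expected:
            raise ValueError(f"out of order: expected {expected!r}, got {step!r}")
        self.completed.append(step)

    @property
    def done(self) -> bool:
        return len(self.completed) == len(self.steps)
```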

Re: no backup of the schedule?

Unless there is no checklist, or the checklist is out of date...

Anonymous Coward

Re: no backup of the schedule?

"You can try this yourself, open a command prompt on you PC and enter "format c:" and answer "Y" to the question"

Oh, really now? Quick Robin, to the CMD prompt!

C:\Windows\system32>format c:
The type of the file system is NTFS.
WARNING, ALL DATA ON NON-REMOVABLE DISK
DRIVE C: WILL BE LOST!
Proceed with Format (Y/N)? y
Formatting 286181M
System Partition is not allowed to be formatted.

Nope, turns out you were wrong.
