some people have already thought this
The NatWest and RBS IT cock-up that caused 600,000 transactions to go missing this week was entirely unrelated to the 2012 mega IT cock-up, the bank has said in a not-too-reassuring update. In a webcast about the Royal Bank of Scotland's IT strategy today, Simon McNamara, chief administrative officer, said: "It is different …
I respectfully disagree. The management gobbledegook filter clearly states "ingest", so they had the file. What you've described is a failure to "transfer" a file.
No, the word "ingest" means, quite clearly, that the file had some kind of unexpected content.
And we're now ALL thinking the same thing.
"The CSV file had a comma in the wrong place."
Because decades of experience, billions of pounds and ever-improving technology STILL can't defend itself against a comma in the wrong bloody place.
Such as it ever was, is, and no doubt will be.
Perhaps there was a data error, then.
But nobody checks anything these days until something goes very visibly wrong. The doctrine in schools is that we must not query the creativity of the little darlings by anything so vulgar as re-reading and checking their work.
And so they become the programmers who let the customer find the faults and complain.
Would have thought they used something like a dry run process, some kind of environment that would let them look at what the file is and does before it goes live.
I don't know, let's call it test: you are testing the system, so perhaps call it a test environment, where you can test the 13 million transactions and find missing commas before they hit your live environment.
Or have I just worked in places with logical thinking?
No, no. They have the filter that checks for the misplaced comma. It runs a VM that logs the results, and that's all that VM does. However, the guy who normally archives the logs was fired downsized rightsized retired last week. As a result the VM crashed when it could no longer write the log file. This in turn caused the ingest failure.
Seems plausible to me. Alternative explanation: Cleaner unplugged the server for her Hoover / tripped over the relevant cable and no one noticed. Well, it is RBS.
600,000 transactions. FROM 1 THIRD PARTY. Should be possible to verify that by checking what from whom has gone BOING. The anguished cries of the twitterati et al should give a good cross section. That would check the veracity of the only clear part of the gobble.
Anyone got a few minutes to waste checking?
I do love the word Ingest. Do they still use paper tape / tie readers? Enquiring BOFHs need to know etc.
FROM 1 THIRD PARTY
Which is HMG. Most of the screams were about tax credits and such. I did not notice a twitterati scream about salary (or other form of money earned by hard work).
When HMG is involved it can always be presumed to be the guilty party. So my guess is - incorrectly formatted transaction file coming in.
It's reasonable to assume that the RBS systems are able to handle exceptions at a transaction level, i.e. reject records that contain bad data, rather than the entire file. If they didn't, problems like the recent one would happen at least once a week.
If a system that can handle bad records rejects a whole file, the likelihood is that the third party that supplies the file has modified the format, either deliberately or accidentally.
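To illustrate the distinction above, here is a minimal sketch, assuming a made-up three-column CSV layout. The header, field names and validation rule are hypothetical and have nothing to do with whatever RBS actually runs. A bad record is set aside on its own, while a changed file layout rejects the whole file.

```python
import csv
import io

# Hypothetical agreed layout; not taken from any real bank interface.
EXPECTED_HEADER = ["account", "sort_code", "amount_pence"]


class FileFormatError(Exception):
    """Raised when the file as a whole does not match the agreed layout."""


def ingest(text):
    """Return (accepted, rejected) rows; raise FileFormatError on a bad header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        # The supplier has changed the format: nothing in the file can be trusted.
        raise FileFormatError(f"unexpected header: {header!r}")

    accepted, rejected = [], []
    # Line numbers assume one record per physical line (no multi-line fields).
    for line_no, row in enumerate(reader, start=2):
        if len(row) == len(EXPECTED_HEADER) and row[2].isdigit():
            accepted.append(row)
        else:
            # Record-level exception: keep going, report this row separately.
            rejected.append((line_no, row))
    return accepted, rejected
```

With this shape, a stray comma in one record costs one transaction, whereas a renamed or reordered header stops the entire batch, which is the whole-file failure described above.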
You're all wrong! At least if it has been established that this is a government batch-file.
They've said they can't restore it until the weekend, so it's obvious. The government sent the CD, and it's got lost in the post. So they've re-burned it and posted it again. Natwest now have a techy permanently stationed in their post-room, Segway at the ready, to zoom him off at top speed to run it up to their server room. 15 minutes after it arrives, all will be sorted.
Presumably if this one goes wrong, then they'll put it on a memory stick, and lose it in a taxi instead. It's important to plan for a variety of failure scenarios...
Standard processing for Direct Debits and Standing Orders can still be three days depending on the source and target banks and the size of transaction, so three days plus the missed day means people *could* be affected for four days. The vast majority will have completed by now, but you don't want to further upset the tiny minority still affected.
Failure to ingest
Is it just me that has a mental image of a server in an old fashioned metal cabinet, tapes whirring of course, and green vomit spewing out of it (accompanied by smoke and plaintive beeping), as it fails to ingest this file?
Perhaps a computer like this one (Youtube link).
Scalability is not the same as availability.
No-one with a clue about what they are doing says a system is 100% available. That would be saying that there is no remote possibility that could ever endanger your system, including Godzilla strike, changes to the laws of physics and err... Moth-ra strike. Or Mechagodzilla. You would only offer an SLA of 100% if you are happy that the payout you are contracted for (or likely to be fined for in this case) is affordable.
If it's perfectly sane (i.e. can't be crashed by bad data), perfectly stable (i.e. no changes are applied) and perfectly secure (i.e. no patching needed), you can design the infrastructure for 5-nines.
However, basic probability theory says that an event in the past does not affect the probability of future random events.
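For a sense of scale on the "5-nines" figure mentioned above, a quick back-of-envelope calculation. It assumes a 365.25-day year and treats "N nines" as an availability of 1 - 10^-N. The function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(nines):
    """Minutes of downtime per year permitted at an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes/year")
```

Five nines leaves a little over five minutes of downtime a year, so a single lost day of processing uses up that budget for a couple of centuries.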
I imagine the import routine puts the direct debit/standing order transactions into a single monolithic database that is latency/performance dependent on the server it runs on and/or the storage it uses. No doubt RBS have upped the server spec over the years, put the database on faster storage, but baulked at actually rewriting or modifying the app to use a distributed database. It's a scale-up problem.
Investment from RBS probably means "we'll buy more hardware" not invest in staff who know how to write the application for the modern world.
You need to understand the way the updates are processed, the parallel update requirements etc. before you can make a statement like "modifying the app to use a distributed database".
I never worked with RBS, but certainly while we were working with NatWest they were willing to rewrite, the question was into what? While there have been many advances since RBS bought NatWest, 2002?, what would you write it in today?
Can I just make a point here about the number, 600,000.
Firstly, it's a bit exact. Anyone who works with numbers knows that when someone gives an exact round number, it's bullshit; "around 600,000" or "just over 600,000" would both be OK.
Secondly if it was say 6 million but you say 600k no one will ever find out unless either a. you know 600k other RBS customers or b. Over 600k people complain on twitter/facebook (which strangely enough would be 10% of that number)
Just my tuppence worth.
What good does it do fining an organisation - where is there justice in that?
That 56 millions of squids would be better spent in relief to those directly affected and making infrastructure improvements so it limits chance of things happening again.
There aint no justice in a fine. And a fine has potential to harm those already harmed by the earlier incidents by diverting dosh away from where it is needed? (If a fine is required it is better to fine those individuals directly rather than the organisation itself?)
Organisations do not create money and any money in an organisation is generally and principally provided by its customers. High earners in the company are likely to remain unaffected however ...
Far better for the authorities to instruct the organisation to implement an improvement plan and compensation plan to 56 millions squids level. (Just like Health & Safety inspections in the UK - there are no justifications for slapping on a fee because additional improvements are required?)
Point well made and taken.
But the banks use customers money to do risky stuff that creates profits or losses for the bank (and increasingly these days particularly in days of high inflation? - there is attitude: get your money working for you. Meaning any organisation handling money speculates with that money in its possession with a view to creating profits or at least covering staff costs and staff bonus for those involved?)? In UK Icelandic bank "crash" (term used loosely) cost local and central guvmint administrators quite a bit as they had dosh (that is customers dosh and profits made from handling that dosh) tied up in Icelandic funds no?
is a waste of time, they just charge their customers more to recover the money. What should happen is that those tossers directly responsible and accountable for the fcuk-up should be personally fined. Taking money directly out of their pockets would seriously focus their minds.
True, true, ...
The trouble is in UK that the strong tradition of Tort, vicarious liabilities, redress, ... sort of is overlooked and ignored by recent practice of not holding individuals to account with a preference to apply a fine to an organisation.
It seems a bit of a strange set-up considering England's attraction to Common Law?
I suppose we may draw our own conclusions as to why this set of circumstances comes about.
Is a bastage a cross between a bastard and a hostage?
Are you suggesting that we approach each board member with an attractive "secretary", and once he's fully honey-trapped, she sprogs and then you hold the child hostage to his good behaviour? Either with risk of harm to the child or exposure for his affair - plus huge paternity suit?
I can see this idea working. However, rather than wasting your talents regulating the banks, I'd prefer to move you into Ofcom first.
I would approach this in a top-down manner, not from the poor sod in the basement upwards. Ultimate responsibility is with the CEO, fine him, then comes the CTO, fine him, and if there is a senior manager of IT, he gets a fine as well. Each should have a fine based on their salary, plus their (extravagant) bonuses - 20% of this total sounds about right. That should send the message and keep them on the ball.
I'm only half joking here :-)
"Organisations" do generate money, yes, from customers who, once they hand it over, no longer own it.... that is the whole point of "commercial organisations". Sell shit for more than it costs to make and supply; get it?
Your statement would have some truth if the "organisations" you were referring to were charities or not for profit.
Now that said......................
I totally agree with your point that it is counter productive to fine something like RBS that is in debt and loss making. This merely adds to their problems.
However it is increasingly clear that the "Regulators" despite the hugely expensive and time wasting re-brand are not able to hold such Banks to account when it comes to investing "a fine" in better infrastructure.....
After all they can barely do the job they are supposed to do and that is stop "market and customer abuse"
I think a better approach is heads rolling at senior levels plus a fine that needs to be invested in defined service improvements that are auditable and measurable.
The problem is not so much with the fines as with how they are assessed. We invented the corporation to limit liability so that, if I wanted to invest say 5,000 pounds in RBS, that 5,000 pounds would be the limit of my liability (I wouldn't lose my house or retirement fund). Likewise we've limited the damages an officer of the corporation can suffer. The problem is that we've extended that protection too far. While it is reasonable to offer officers some protection against what Donald Rumsfeld called unknown unknowns, we've extended it not only to known unknowns, but even known knowns. If the fines for this were applied to the officers of the corporation they could have a salutary effect.
I see this time and again on mainframe (and server) batch processing. Coders write code to process an input file, which fails when it gets an 'unexpected item in bagging area' such as an odd character or a blank file.
Expect data received to be wrong sometimes and code to handle it or at least alert that it has happened so you can fix the data.
It's not rocket science people.
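A minimal sketch of the "expect bad data and at least alert" approach described above. It assumes the input is meant to be plain ASCII text, and the function names and logger name are invented for illustration. The check runs before any processing and logs what it found instead of letting the job fall over.

```python
import logging

log = logging.getLogger("batch")  # hypothetical logger name


def check_input(data: bytes):
    """Return a list of problems found in a batch file; an empty list means OK."""
    problems = []
    if not data.strip():
        problems.append("file is empty")
        return problems
    try:
        text = data.decode("ascii")  # assumption: the feed is plain ASCII
    except UnicodeDecodeError as exc:
        problems.append(f"non-ASCII byte at offset {exc.start}")
        return problems
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Note: str.isprintable() also treats tabs as unprintable.
        if any(not ch.isprintable() for ch in line):
            problems.append(f"line {line_no}: unprintable character")
    return problems


def process(data: bytes) -> bool:
    """Validate first; alert and refuse cleanly if anything looks wrong."""
    problems = check_input(data)
    if problems:
        for p in problems:
            log.error("input rejected: %s", p)
        return False
    # Real processing would go here.
    return True
```

The point is only that the blank file and the odd character are handled as expected cases with a clear message, so someone can fix the data before the batch window closes.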
"We would love to be able to say we will never have a technology failure again," he said. "But it is not feasible to run a 100 per cent faultless system."
It isn't possible to have a cheap fault free system, but the active word there is cheap.
RBS are now running a span between failures of less than 4 years. The embedded software in my TV, car, fridge, washing machine, tumble drier, dishwasher, burglar alarm, etc etc etc has all managed many magnitudes of that. Even my Windows Phone has run error free for all of that time.
Yes, RBS systems are rather more complicated than any of those systems, however, large complicated systems only look large or complicated if you view the whole thing at once without looking at the repetition & commonality, or you don't break it down into components and tasks.
Amazingly, from recollection that may be incorrect, RBS didn't have any of these issues, but did have the archaic systems, prior to offshoring all of their staff. Quite how anyone can, with a straight face, deny that is central to the problem is beyond me.
While that is obviously the problem, the solution is not quite so easy as it might seem. The problem when you have a group of highly skilled people working in a specialized area is that once you lose them (in this case from firing them) you can never reassemble the team. Each of them has had to go and find new work. And even if you did approach them and offered them their old salary plus inflation with full reinstatement of seniority plus a 10% signing bonus, they'd have to be nuts to trust you again. So you now have to build that expertise from scratch even if you insource the work.
I wonder if this is yet another time bomb left by the outsourcing deal put through by Mike Errington and Ron Teerlink. Isn't it about time that organisations looked at the real cost of moving the work from experienced hands to developing countries, or at least the people doing the deals being held accountable?
RBS gets what it gets. They don't even bother using an SPF record to protect their users from spam. If they can't even bother with a simple DNS TXT record, why should we believe they have any skill for serious issues??
;; QUESTION SECTION:
;rbs.com. IN TXT
;; AUTHORITY SECTION:
rbs.com. 1799 IN SOA ns1.markmonitor.com. hostmaster.markmonitor.com. 2015061803 3600 1800 3600000 21600
;; Query time: 30 msec
Some careful wording in that statement. Seems logical to assume that the 'file from a 3rd party' probably relates to a 'related' third party. In fact it probably relates to a file and team which would probably have been 'internal' a few years ago but can now conveniently be referred to as an anonymous '3rd party'. It's probably RBS India or Accenture or similar.
The last time there was a major cock up there was much talk of 'coming clean' and 'full disclosure' when the full facts were available, but then when things went quiet nothing happened. Save for some chat about CA Technologies contributing to cover RBS's fine there has been no clear admission of the previous fault, how it happened, who was to blame or who was ultimately punished. Don't expect anything different this time.