Re: I read that as an expired cat!
> Have you ever seen the damage that mice will do to the wiring under the floor of the server room?
Yeah, the USB ones are the worst.
On the fourth day of an IT systems choke-up that has left customers unable to access money and in some cases unable to buy food or travel, Natwest and RBS – which both belong to the RBS group – still have no idea when the issues will be fixed. A spokesperson said the banking group had been working overnight to fix the problems …
Things appear to be much more serious than is being admitted, if you can believe what you read in the newspapers: http://www.independent.ie/national-news/ulster-bank-thousands-without-cash-as-bank-fails-to-fix-it-crash-3146374.html. On fora you get a feel for the inconvenience caused, but will everyone be compensated should they suffer unwarranted charges caused by the glitch? http://www.boards.ie/vbulletin/showthread.php?t=2056676772
It never rains but it pours, and it is lashing down here today.
"The words "we'll just get SAP and adapt it for our needs" are a sure sign that you have morons running your company."
Hmmm... I've been on the front line of implementing and managing SAP projects for end clients for 12 years now and I've never heard those words mentioned once. In my not inconsiderable experience, the morons on SAP implementations are normally the leaders/management of the third-party SIs such as Wipro, Accenture, Capita etc. rather than the end clients themselves, who generally go through quite detailed business cases and product selection processes before they choose ERP systems or related products.
I'm not sure what your experience with SAP/ERP is, but I would like to point out that I do agree with the first 3 paragraphs of your comment.
Unless it has changed drastically, IIRC RBS (and Natwest; they migrated Natwest customers to the RBS system after the takeover) update main customer accounts in batch overnight (on an IBM mainframe, natch). Input arrives from a number of "feeder systems" (BACS, Accounting Interface, etc.) and is run through their main account update system (can't remember the exact name it had; Sceptre? Something like that) in a number of "streams", each covering a range of branches. These originally reflected the distance of the branch from Edinburgh, so stream 'A' was the branches in the far north and was run first, allowing the van with the printouts to leave earliest as it had the furthest to go. Seem to recall that Natwest started at stream 'L'.
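A toy sketch of the stream idea described above, assuming streams are simply consecutive slices of branches sorted by distance from Edinburgh, farthest first. The branch list, the rough distances and the `assign_streams` helper are illustrative assumptions, not RBS's actual configuration.

```python
from string import ascii_uppercase

# Illustrative only: (branch, rough road distance from Edinburgh in miles).
BRANCHES = [
    ("Glasgow", 47),
    ("Inverness", 156),
    ("Edinburgh", 0),
    ("Aberdeen", 125),
    ("Carlisle", 93),
]


def assign_streams(branches, per_stream):
    """Group branches into lettered streams, farthest from Edinburgh first.

    Stream 'A' gets the most distant branches, so it runs first and its
    printouts can leave earliest. Supports at most 26 streams (A-Z).
    """
    ordered = sorted(branches, key=lambda b: b[1], reverse=True)
    streams = {}
    for i, (name, _miles) in enumerate(ordered):
        letter = ascii_uppercase[i // per_stream]
        streams.setdefault(letter, []).append(name)
    return streams


for letter, names in assign_streams(BRANCHES, per_stream=2).items():
    print(f"stream {letter}: {', '.join(names)}")
```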
The actual definitive customer account updates were carried out by a number of programs written in assembly language dating back to about 1969-70, and updated since then. These were also chock-full of obscure business rules ("magic" cheque numbers triggered specific processing), and I do not believe anyone there really knew how it all worked any more, even back in 2001. I remember sitting in a meeting discussing how the charge for using an ATM that charged for withdrawals could be added as a separate item to a customer statement, and waving a couple of pages of printouts of the source of one of them. The universal reaction was "wow, can you actually read that?". I can't see them having mucked about with that code too much, since good assembler people are like hen's teeth and it was decidedly non-trivial to make any changes to it, but the accounting rules did change frequently in the feeder systems.
My bet is that some change has resulted in discrepancies in the eventual output from these systems, and that a combination of retirements and redundancies has left them with very few people who know how it is all supposed to work, and who can therefore identify the cause of the error and fix it. Or they might have just blown the size limit on one of the output files from the various feeder systems (VSAM IIRC; the limit was 4GB unless they moved to the extended types), or possibly the FICON cable just dropped out of the back of whatever device some of the disk volumes live on.
Anyway, it's very complex stuff which all just has to work together. The moral, of course, is that complex mainframe systems require staff with the skills and, in this case, the specific system knowledge to keep things running smoothly. The fewer of these you have, the more difficult it is to recover from problems like this.
My $0.02. Perhaps someone with more recent knowledge will care to rebut?
My partner worked at Halifax/HBOS and their experience sounds exactly the same as yours: complex spaghetti code with stupid naming rules that made the complex software even harder to understand, and no concept of testing or proper debugging. Debugging at Halifax is pretty much a matter of printf-style statements scattered throughout the code rather than proper tools. Halifax even threw out the Leeds' system, which was Unisys and a high-level language, in favour of IBM and assembler because it wasn't invented here.
I understand that your description of the RBS mainframe-based batch update process is fairly accurate. The source of the problem was a software update to the batch scheduling suite CA7. The upgrade went so well that now there is no schedule to run all of those thousands of batch jobs to receive and make BACS payments, update balances, schedule printouts, etc.
I am sure the problem with the CA7 upgrade and the unfortunate misplacing of the batch schedule has absolutely nothing to do with the last UK-based technicians leaving recently. The guys in India are of course perfectly able to cope and fix their mistake. I'm sure they understand how the thousands of jobs in the schedule need to be ordered to make sure there is no data corruption or loss. After all, the problem happened on Tuesday and it's only Friday.
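CA7 keeps its own job definitions and dependency database, and the sketch below is not CA7. It only illustrates the generic idea behind any batch schedule: each job may start only after its predecessors finish, so a valid run order is a topological sort of the dependency graph. It uses Python's standard-library `graphlib` (3.9+); the job names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job names. Each job maps to the set of jobs that must
# complete before it may start (its predecessors).
SCHEDULE = {
    "RECEIVE_BACS": set(),
    "LOAD_FEEDERS": {"RECEIVE_BACS"},
    "UPDATE_BALANCES": {"LOAD_FEEDERS"},
    "SEND_BACS": {"UPDATE_BALANCES"},
    "PRINT_STATEMENTS": {"UPDATE_BALANCES"},
}


def run_order(schedule):
    """Return one job order that respects every dependency.

    Raises CycleError if the dependencies loop back on themselves,
    i.e. no valid order exists.
    """
    return list(TopologicalSorter(schedule).static_order())


order = run_order(SCHEDULE)
print(order)

# A schedule with a loop cannot be ordered at all.
try:
    run_order({"A": {"B"}, "B": {"A"}})
except CycleError as exc:
    print("cannot schedule:", exc.args[0])
```

Losing the schedule means losing exactly this dependency information, which is why rebuilding it from memory across thousands of jobs is so risky.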
I wonder how many ex-RBS staff have received very lucrative short term contracts in the last few days......
I'm AC for very obvious reasons, having been one of the recent 1000+ to find their roles now being done from Chennai. However, I have been speaking to a few ex-colleagues who are still there, and they say the same as the poster above: a CA7 upgrade was done, went horribly wrong, and was then backed out (which will have been done in typical RBS style: 12 hours of conference calls before letting the techie do what they suggested at the very start).
My understanding is that most if not all of the batch team were let go and replaced with people from India, and I do remember them complaining that they were having to pass 10-20+ years' worth of mainframe knowledge on to people who'd never heard of a mainframe outside of a museum. The Indian team were keen and willing to try and learn, but without those years of experience they will now be deep in the smelly stuff.
The only good thing is that, since it was the batch and overnight processing that failed, all the data will still be in the system awaiting processing, so no one should find their money going missing as a result of this incident.
I hope they will indeed be able to execute all transactions and make things balance.
But a few years ago I heard privately of a FTSE 250 company whose accounts got corrupted. When they turned to the backups, either they were already corrupt or they were screwed up during the "restore". So they lost everything, and had to write piteous letters to suppliers and customers.
I scanned the financial press for months afterwards, looking for a public report, but saw nothing. Full marks to the cover-up team, then!
If it was a software update to CA-7 and they corrupted (or otherwise lost) the various VSAM datasets which hold the schedule database, then I think that backing out and restoring should have been a fairly simple exercise, and the complete failure of an entire overnight batch run is something they would have noticed pretty quickly. Assuming that they are even slightly competent.
Well, if as the previous poster says, it takes about 12 hours of conference calls to get anything done, then I guess that they held over until the subsequent night's run to try to re-run everything. Of course, unless things are staged very carefully, they then have to process twice the transaction volume, and there may just be some hard limits on the feeder system dataset sizes which are now too small, or the batch runs now take too long, so the on-line daytime stuff cannot start. And undoing the problems which cascade from there is where you really, really want your experienced system people.
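The arithmetic behind the "twice the transaction volume" worry is easy to sketch. All figures below are invented, the 4GB ceiling is only the limit recalled earlier in the thread for non-extended VSAM, and the sketch assumes both file size and run time scale linearly with the number of nights being caught up. Under those assumptions, a run sized comfortably for one night can fail both the dataset limit and the batch window once two nights are processed together.

```python
GIB = 1024 ** 3

# Assumption: the ~4 GB ceiling recalled in the thread for non-extended VSAM.
DATASET_LIMIT_BYTES = 4 * GIB


def fits_dataset(bytes_per_night, nights, limit=DATASET_LIMIT_BYTES):
    """True if the accumulated backlog still fits under the dataset size limit."""
    return bytes_per_night * nights <= limit


def fits_window(hours_per_night, nights, window_hours):
    """True if re-running the backlog in one go still fits the overnight window."""
    return hours_per_night * nights <= window_hours


# Hypothetical figures for one night's feeder file and batch run.
NIGHTLY_BYTES = 5 * GIB // 2   # 2.5 GiB
NIGHTLY_HOURS = 5
WINDOW_HOURS = 8

for nights in (1, 2, 3):
    ok_size = fits_dataset(NIGHTLY_BYTES, nights)
    ok_time = fits_window(NIGHTLY_HOURS, nights, WINDOW_HOURS)
    print(f"{nights} night(s): size ok={ok_size}, window ok={ok_time}")
```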
Which it seems as if they no longer have.
Based on my experience at a large bank that outsourced support to India, I'm sure the guys in there will work diligently to meet their contracted SLA.
Probably by sending an email requesting further information 15 seconds before going off shift and then not returning to the office for three days.
It is pretty much disastrous. In RBS world there are many interconnected systems, some of which can maintain a view of an account for some time, but eventually all transactions need to be reconciled via the main overnight mainframe batch. If this is not done, the account information maintained by these satellite systems (ATM, card purchases, etc.) becomes stale, and increasingly risky from the bank's point of view. So the CA-7 failure seems entirely plausible. That it has left them in the shit for 4 days, however, is not a situation one would expect a competent mainframe site to find itself in. If this is a consequence of "off-shoring" support, then someone has made a very bad judgement about an essential component of the bank's ability to stay in business, and heads need to roll over this.
"The source of the problem was a software update to Batch scheduling suite CA7. The upgrade when so well that now there is no schedule to run all of those thousands of batch jobs to receive and make BACS payments, update balance, schedule printouts, etc."
Yep. Had the same problem with CA batch scheduling software in the late 1990s.
It's all a bit hazy now, but there were ways for someone who had seen it before to recover from that situation.
Ah, in the time it took me to type the above paragraph, it has come back to me that one of my first tasks in a new job in 1998 was to write a bit of code to export then reimport CA scheduler jobs, just in case it all went titsup.
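The export format of a CA scheduler is product-specific and not described in this thread, so the sketch below only shows the general shape of such a safety net: dump every job definition to a plain file and check that it reads back unchanged. `JobDef`, its fields and the JSON layout are all hypothetical.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class JobDef:
    # Hypothetical fields; a real scheduler definition holds far more detail.
    name: str
    frequency: str
    predecessors: list = field(default_factory=list)


def export_jobs(jobs, path):
    """Write all job definitions to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(job) for job in jobs], f, indent=2)


def import_jobs(path):
    """Rebuild job definitions from a file written by export_jobs."""
    with open(path, encoding="utf-8") as f:
        return [JobDef(**record) for record in json.load(f)]


jobs = [
    JobDef("RECEIVE_BACS", "DAILY"),
    JobDef("UPDATE_BALANCES", "DAILY", ["RECEIVE_BACS"]),
]
path = os.path.join(tempfile.mkdtemp(), "schedule_backup.json")
export_jobs(jobs, path)
restored = import_jobs(path)
print("round trip ok:", restored == jobs)
```

The round-trip comparison is the important part: an export that has never been re-imported is not known to be restorable.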
This is RBS, remember. ISTR them cutting their batch schedule from ~7 hours to 1 1/2 hours by the simple expedient of remembering that they no longer had IBM 2314s and changing sequential datasets from LRECL=80 BLKSIZE=800 to half-track blocking on 3390s. Don't get me started on METACOBOL.
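The block-size change above is easy to put numbers on. For fixed-blocked records, the best half-track block on a 3390 is the largest multiple of LRECL that fits in 27,998 bytes, which for LRECL=80 is 27,920 (349 records per block, against 10 per block at BLKSIZE=800). The sketch counts blocks, i.e. physical reads or writes, for a hypothetical 10 million record dataset; it does not model channel or device timing, so it shows why the run got faster rather than by exactly how much.

```python
import math

# Largest block that still fits two per 3390 track (half-track blocking).
HALF_TRACK_3390 = 27_998


def best_fb_blksize(lrecl, max_block=HALF_TRACK_3390):
    """Largest multiple of LRECL that fits in max_block (fixed-blocked records)."""
    return (max_block // lrecl) * lrecl


def blocks_needed(n_records, lrecl, blksize):
    """Number of blocks (physical I/Os) needed to hold n_records."""
    per_block = blksize // lrecl
    return math.ceil(n_records / per_block)


N_RECORDS = 10_000_000  # hypothetical dataset size in 80-byte records
old_blocks = blocks_needed(N_RECORDS, 80, 800)
new_blocks = blocks_needed(N_RECORDS, 80, best_fb_blksize(80))
print(best_fb_blksize(80), old_blocks, new_blocks)
```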
I transferred a couple of grand from HSBC to my NatWest account last night using the Faster Payments system. It worked fine; the money was there in minutes. Now, I think all the people having issues are waiting for salaries, benefits etc., which I believe use the old BACS system. So my guess is RBS's BACS gateway has gone tits up. Unacceptable, though.
You're missing something. Your transfer may show up in whatever webby system which shows incoming transactions. Your real account balance is updated overnight in batch on a mainframe.
In other words, RBS/NatWest do not actually know what balance any of their customers actually have in their account, have not done so since this failure occurred, and will not know again until the problem is fixed. So good luck getting your money back out, since they do not know how much you actually have.
A Maximum Fail icon is required for this.
Christ, that was a bit harsh. I'm just using basic logic to hypothesise what the issue could possibly be. My NatWest balance is completely accurate, and cash withdrawals/card payments are working fine. People who have been paid via BACS, on the other hand, don't have an accurate balance. Think the maximum fail might be with you son
The systems which tell you your account balance via ATMs and handle that side of cash withdrawal run on a "snapshot" balance which is eventually updated to maintain your "real" balance via batch systems on a mainframe. So they can maintain that view, for some time at least. This "snapshot" will not be updated with your salary, say, until that is processed into your "real" balance, and therefore your salary will not be available to you via ATM or card payments until this problem is fixed.
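A toy model of the snapshot/ledger split described above, with amounts in pence. It is a deliberately simplified assumption, not RBS's actual design, and `Account` and its methods are hypothetical: daytime card and ATM authorisations check and reduce the snapshot, incoming BACS credits are queued, and only the overnight batch folds the queue into the ledger and refreshes the snapshot. While the batch is not running, the salary sits in the queue and cannot be spent.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account: an overnight-updated ledger plus a daytime snapshot."""
    ledger: int                                   # "real" balance, in pence
    snapshot: int = field(init=False)             # balance seen by ATMs/cards
    pending: list = field(default_factory=list)   # signed amounts awaiting batch

    def __post_init__(self):
        self.snapshot = self.ledger

    def authorise_debit(self, amount):
        """Daytime card/ATM check: only the snapshot is consulted and reduced."""
        if amount > self.snapshot:
            return False
        self.snapshot -= amount
        self.pending.append(-amount)
        return True

    def receive_credit(self, amount):
        """Incoming BACS credit: queued for the batch, snapshot unchanged."""
        self.pending.append(amount)

    def run_overnight_batch(self):
        """Post everything queued to the ledger, then refresh the snapshot."""
        self.ledger += sum(self.pending)
        self.pending.clear()
        self.snapshot = self.ledger


acct = Account(ledger=5_000)             # £50.00
acct.receive_credit(200_000)             # £2,000 salary arrives via BACS
before = acct.authorise_debit(10_000)    # £100 withdrawal before the batch runs
acct.run_overnight_batch()
after = acct.authorise_debit(10_000)     # same withdrawal after the batch
print(before, after, acct.ledger, acct.snapshot)
```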
"Think the maximum fail might be with you son"
Unless you have worked on the systems in question (and I have), try to avoid comments like this.
Makes you appear to be a bit of a tool.
I've found that sometimes you just need cash. Relying on banks (and, tbh, myself) not to fuck up, or on never losing my card or whatever, is going to catch you out eventually, so I keep a couple of hundred quid hidden around my flat. I appreciate that not everyone can afford to do this, though.
It is kind of ironic, given that Natwest have wall-to-wall TV adverts declaring "a better way" and promoting features to allow you to get access to your money even when you lose your card.
You, sir, win the word prize of the day! That word has been repeating in my head as I read the background and think of the number of IT mates struggling for work.
Maybe this will be the tipping point, as happened with off-shored call centres, that brings a change of attitude towards it. One can hope, now that it has actually hurt a corporate so obviously.
nK
"No customers will be permanently out of pocket as a result" - Natwest.
Personally, I wouldn't trust them as far as I could kick them. I highly doubt that they will compensate people for the time they wasted as a consequence of this mess, for starters. This is the same bank that promised that it wouldn't close the last bank in town, then closed Farsley Branch (which was the last bank in town - see http://www.bbc.co.uk/news/uk-england-leeds-17041251).
Right now, I only have one thing to say: "There is another way, you know".
NatWest used to have adverts making fun of the fact that some bank branches had been closed and turned into trendy wine bars.
Well ..... the Standing Order pub on Iron Gate, Derby is a Wetherspoons house -- not exactly a trendy wine bar, but ..... guess which bank it used to be a branch of?
You beat me to it! I'm not sure whether they use in-house or outsourced software, but it's much easier for them to actually start their "contingency plan" and get everything keyed in manually or onto paper ledgers. No idea how many accounts they have, but there cannot be many millionaires left right now. :(
"The problem is that few people seem to realise the importance of good documentation until they don't have any."
Or in this case, with the major cull and outsourcing, the outgoing guys may have known exactly how important documentation would be to their cheaper replacements ... if there was any.
"Yes, yes, of course I'll be delighted to spend the remaining 37 hours of my contracted time documenting how Sanjeet can do my job perfectly while I'm in the dole queue. What's that? The spam box needs emptying? ... oh dear, seem to be out of time. Documentation: 'try turning the mainframe off and on again at the wall, a couple of times, quickly. if that doesn't work, pee on the socket.' Bye, enjoy the 'savings' from downsizing me..."
If someone out there knows what's gone wrong, I'd be putting another zero on the day rate, and expecting a massive completion bonus of at least five figures too, seeing as you'd be saving RBS's global reputation.
You'd get paid what you're worth, and after three or four days now, that rate has increased significantly...!
Steven R