
BA's 'global IT system failure' was due to 'power surge'

Jaded Geek

State of IT

It's interesting reading the various comments on here, including the links to the issues Capita had. Let's also not forget RBS/NatWest's major batch failure of a few years ago.

I've worked in IT for nearly 30 years and have seen the gradual dumbing down of the field, partly due to technology changes and partly due to offshoring.

When I first started you had to understand the systems. I was lucky enough to work on early Unix servers, when there was no such thing as Google: you either sat down with a manual and learnt something, or you figured it out yourself. It was also a time when, if you upgraded the hardware (new cards, drives and so on), you had to know what you were doing, as more often than not the cards had DIP switches that needed setting, and in some cases you had to mess around with kernels to get things to work. Don't even get me started on RS232 communications and hardware handshaking.

As computing became more intuitive, as things became more plug and play, and as programming became easier, people needed to understand less and less about how computer systems actually work.

I've worked for several large companies with huge offshore offices and teams, some directly employed and some brought in through various consultancies. The reason is obvious: it's cheaper, considerably cheaper. However, the average skill level is not what I would personally expect. This isn't a criticism as such, more an observation, and the issue boils down to experience. I kept being told by my employers that these people were the brightest graduates, but how does that help with a banking system developed in the 70s? Did all those graduates have the experience when RBS's batch failed?

I'm not saying the onshore people would necessarily be any better, but you also have to factor in the cultural aspect: as someone pointed out, offshore teams tend to do as they are told and not question an instruction, even if it's wrong. There are plenty of examples where I have seen entire test systems trashed because someone was told to push data in, which they did despite errors in the data or problems with the system. When this was pointed out to them, they said they knew, but they had been told to push data in.

There have been a couple of comments in the thread about using simple Java-type apps, which would possibly have made things easier to recover, but if BA is anything like the large banks, they will have legacy systems bolted into all sorts of things and a real rat's nest of systems that evolved over the decades; it's just the way it is. If you then offshore this (or even onshore it via a consultancy), all the knowledge of how that stuff connects gets lost, along with simple things like what order systems should be started in.

I worked through a major power failure at a data centre where we lost almost every system because the diverse power turned out not to be that diverse when push came to shove. The real issue, though, was the recovery process: hundreds of people on a call trying to get their own system started first, without understanding the dependencies of what they needed. Someone actually wanted the Outlook servers starting up first so we could communicate via email, forgetting that all the remote gateways we needed to access said email were down!
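To illustrate the startup-ordering point, here's a minimal Python sketch that works out a bring-up order from a dependency map. The service names and dependencies are made up for the example, not BA's (or anyone's) real estate; the point is that the graph has to be written down somewhere other than in people's heads.

# Minimal sketch of dependency-aware startup ordering.
# Services and dependencies below are hypothetical examples only.
from graphlib import TopologicalSorter, CycleError

# Each service maps to the services it depends on (which must start first).
dependencies = {
    "network-gateways": [],
    "dns": ["network-gateways"],
    "storage": ["network-gateways"],
    "database": ["storage", "dns"],
    "auth": ["database", "dns"],
    "booking-app": ["database", "auth"],
    "email": ["dns", "auth"],  # note: email is nowhere near first
}

try:
    startup_order = list(TopologicalSorter(dependencies).static_order())
    print("Bring services up in this order:")
    for step, service in enumerate(startup_order, start=1):
        print(f"  {step}. {service}")
except CycleError as err:
    # A circular dependency means the recovery runbook itself is broken.
    print(f"Cannot derive a startup order: {err}")

In reality you'd drive this from a CMDB or runbook rather than a hard-coded dict, but on that conference call nobody had even that much.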

We still don't know what the actual cause of BA's outage was, but I do wonder whether the reason it dragged on was that people didn't have a document that said "open when sh*t happens". You can't document everything, and at the end of the day there is no substitute for experience, regardless of the location of that person or team.

If the extended outage is a result of the offshoring, it would be interesting to know how much it will cost BA in compensation versus what they saved with the deal.
