difficult to track down?
Spanning tree strikes again...
University College London hospitals trust (UCLH) has launched an investigation after a network glitch led to the closure of A&E to blue light traffic. The problem also led to cancellations of operations. The trust was last month forced to halt a number of services, including the cancellation of 50 per cent of its operations, …
Perhaps the highly paid external "IT experts" from Logica (oh...quelle surprise!) were too busy examining the insides of their eyelids to check their e-mail.
I mean these "expert" private sector consultants wouldn't have installed such a network without the most basic of SNMP monitoring tools. Shirley!
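And "the most basic" really does mean basic. A toy sketch of the idea (device names, interface names, and statuses all invented for illustration; a real setup would poll each switch's ifOperStatus over SNMP with pysnmp, Nagios, or the like, rather than read a dict):

```python
# Toy stand-in for "the most basic of SNMP monitoring tools".
# In real life you'd walk IF-MIB's ifOperStatus (OID 1.3.6.1.2.1.2.2.1.8)
# on each device; here the poll results arrive as a plain dict so the
# sketch runs anywhere.

def find_down_ports(poll_results):
    """Return (device, interface, status) triples for anything not 'up'.

    poll_results maps device name -> {interface: status}, where status
    mimics the values an SNMP walk of ifOperStatus would return.
    """
    alerts = []
    for device, interfaces in sorted(poll_results.items()):
        for ifname, status in sorted(interfaces.items()):
            if status != "up":
                alerts.append((device, ifname, status))
    return alerts

polled = {
    "core-sw1": {"Gi0/1": "up", "Gi0/2": "down"},
    "edge-sw7": {"Fa0/1": "up"},
}
alerts = find_down_ports(polled)  # [("core-sw1", "Gi0/2", "down")]
```

Run that every few minutes and page someone on a non-empty result, and you already know a switch has gone pop before the ambulances start queuing.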
Oh please... the spanning tree protocol is there to _help_. Many people think it's a bad thing, but it's not. Many people configure their network wrong and blame spanning-tree for the ensuing problems.
You could be right about this being a spanning-tree related problem, i.e. a problem created or exacerbated by wrong STP configuration, but STP is not at fault. The problem lies with the (highly paid?) incompetent "consultants" that design and operate the network.
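For anyone who hasn't met it: STP's whole job is to take a cabled-in loop and block the redundant leg so frames can't circulate forever. A deliberately simplified sketch of that effect (no bridge priorities, port costs, or BPDU exchange; the root bridge is just handed in, where real STP elects one):

```python
from collections import deque

def spanning_tree(links, root):
    """Return (kept, blocked) links, STP-style, for an undirected topology.

    Real STP elects a root bridge and exchanges BPDUs; here we take
    `root` as given and do a plain BFS, which has the same net effect:
    redundant links are left out ("blocked") so frames can't loop.
    """
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    kept, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj[node]):
            if nbr not in seen:
                seen.add(nbr)
                kept.append((node, nbr))
                queue.append(nbr)
    blocked = [l for l in links
               if l not in kept and (l[1], l[0]) not in kept]
    return kept, blocked

# Three switches cabled in a triangle: a broadcast storm waiting to
# happen, unless one leg is blocked.
links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw1", "sw3")]
kept, blocked = spanning_tree(links, "sw1")
# kept = [("sw1", "sw2"), ("sw1", "sw3")], blocked = [("sw2", "sw3")]
```

The blocked leg isn't wasted: if a kept link dies, the protocol unblocks it. Which is exactly the resilience people throw away when they misconfigure it and then blame the protocol.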
It sure does sound like they need a redesign: A network with a single point of failure, where said SPoF can disable the whole network and where fault isolation takes several hours. Amateurs! :-)
"A full investigation into the network design and components is being undertaken to verify if there are any design issues to be addressed."
There is clearly a nasty single point of failure here. I am going to stick my neck out and suggest that it isn't the only switch which could have gone pop (as they had to systematically close off the network).
Is this stuff not monitored?
It probably wasn't a consultant who designed the network. In fact, the network was likely never designed at all. A room had computers put in, and so cables were run to it. As a switch fills up, another one is added on to expand capacity. If anyone mentions redundancy, it probably goes something like this:
"We should probably get a second switch for resilience."
"What do you mean?"
"Well if this switch breaks..."
<irate>"Why would it break? Have you recommended the wrong thing? We have paid enough for it, why would it fail?"
<techie mentally weighs the likelihood of this ending well> "...no, it's fine. Forget it"
Management is obsessed with avoiding blame. If they hear that there is any risk *at all* in doing something, they simply won't do it. If it will cost money, they won't do it for fear of blame over the budgets.
Again, IT is seen as a cost-centre, rather than a system that enables people to do their jobs. So everything is done cheap, crap and quiet.
You might very well be right about the current operator (Logica) probably not being directly responsible for the initial sorry state of the network. But _anyone_ assuming responsibility for a network _has_ to observe some kind of "due diligence". If Logica (or whoever) is willing to take the money, they implicitly also accept taking the blame.
Call me a fool but I wouldn't classify having to cancel all operations and divert ambulances to other (possibly farther away) hospitals as 'business as usual'. Their 'business continuity plan' patently wasn't anything of the sort. Perish the possibility someone in management will take responsibility though, some lowly network techy will walk the plank but nothing else will change.
As for the NHS and consultancy, just wait till Dishy David and Curious George get through privatising it...
Jon
"Call me a fool but I wouldn't classify having to cancel all operations and divert ambulances to other (possibly farther away) hospitals as 'business as usual'."
[fake innocence of an MBA]
What do you mean? The management were still there carrying on with their usual business. There may have been fewer patients to bother about, but nothing that stopped any of the normal business of the managers. The same thing can happen with bad weather.
This is obviously why all NHS IT jobs state that you must have previous NHS experience.
Lord forbid that they risk actually getting somebody on staff with a track record of successful delivery and operation of mission-critical systems in geographically dispersed locations, vendor management, testing, etc.
How many NHS medical facilities do you think we have that have two separate landlines at opposite ends of the building from different suppliers? (*)
(*) as in something better than having all the staff at a hospital locked out because THE (as in single) authentication server was offline, and wouldn't be fixed until Monday morning (this hospital has a minor injuries unit, plus a neurological ward)
What would it take to make the repair process *that* slow?
1) No up-to-date logical network map linked to a physical location map, so nobody is sure how the data gets from A to B.
2) No remote monitoring of critical network devices, so someone has to go out there and *look* at a front panel (possibly reporting back what they see to someone in a network admin office). Not a good idea when hospitals tend to be big and hardware tends to be stuffed in locked cupboards blocked by heavy equipment or the odd dying patient on a trolley.
3) No on site spares to do the replacement.
4) A network vulnerable to *multiple* single point failures (which *might* have been picked up if 1 had existed and a competent person had looked at it), so you know the *whole* network's down but you don't know why.
5) Written authority required from some senior management type who *absolutely* must sign off any drastic action (although they won't understand what it does even if *explained* to them) who is naturally out of contact, probably at a conference on improving network reliability.
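Point 4 is the one that a competent person with the map from point 1 could have caught in minutes, because finding every single point of failure in a network map is a textbook graph exercise (articulation points). A rough sketch, with switch names invented for illustration:

```python
def single_points_of_failure(links):
    """Find nodes whose failure splits the network (articulation points).

    Standard DFS low-link method on an undirected graph. Switch names
    used below are made up for illustration.
    """
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    disc, low, spofs = {}, {}, set()
    timer = [0]

    def dfs(node, parent):
        disc[node] = low[node] = timer[0]
        timer[0] += 1
        children = 0
        for nbr in adj[node]:
            if nbr == parent:
                continue
            if nbr in disc:                      # back-edge: part of a loop
                low[node] = min(low[node], disc[nbr])
            else:
                children += 1
                dfs(nbr, node)
                low[node] = min(low[node], low[nbr])
                # Subtree under nbr has no path back above node,
                # so losing node strands that subtree.
                if parent is not None and low[nbr] >= disc[node]:
                    spofs.add(node)
        if parent is None and children > 1:
            spofs.add(node)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return spofs

# A daisy-chained network: every switch in the middle is a SPoF.
links = [("core", "ward-sw"), ("ward-sw", "a-and-e-sw"),
         ("a-and-e-sw", "theatre-sw")]
spofs = single_points_of_failure(links)  # {"ward-sw", "a-and-e-sw"}
```

Add one link to close the chain into a ring and the set comes back empty, which is rather the point of paying for redundancy.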
I'm not a network admin, so I'm sure you guys who do this for your day job (one or two of whom I'll bet work in the NHS) can find plenty more ways to turn what I would think should be no more than a one-hour task (and that includes getting the replacement box to where it has to be and swapping the plugs) into a *minimum* of a ten-hour job (going by when they say normal ambulance service resumed).