back to article How UK air traffic control system was caught asleep on the job

A big outage that struck Britain's air traffic control system on Saturday was due to a technical fault with a touch screen interface provided by Frequentis, The Register has learned. On Saturday 7 December, during the run-up to one of the busiest times of the year for the UK's airports, controllers at NATS (National Air Traffic …

COMMENTS

This topic is closed for new posts.

One million lines of code

That's an interesting defence. "Look mate, this system is huge. It cost loads and loads of money. It's so complicated that my head spins just thinking about it. So, when it fails I want it to fail big! No trivial little glitches that nobody even notices for me - oh no. Ask the banks, they understand. If you've paid for serious software then you want to see serious failures. I want my money's worth."

35
0
Silver badge
Facepalm

Re: One million lines of code

Frequentis: a division of CGI Federal?

You'd think, when updating software in something as big and important as the touch screen interface on an air traffic control system, that the software company concerned might do a bit more than the normal "yup, looks OK to me and it's out the door" variety of testing...

...but, apparently, you'd be wrong.

10
4
Silver badge

Re: One million lines of code

They may well have used the same rigorous testing procedures as RBS.

6
0
Anonymous Coward

Re: One million lines of code

So, it's twice as good as this: http://dilbert.com/fast/2003-08-26/

1
0
FAIL

Re: One million lines of code

Reminds me of my time working for a large UK veterinary group - one of our patient/client management apps couldn't work out birth dates or ages correctly - you'd register a new pet and enter the birth date given by the customer and the software would fill in the approximate age field (or you could fill in the approximate age and it would work out the rough birth date).

Trouble was, you'd enter something like 'Poochy'..born 9-12-2005 and according to the software, the poor dog would be something like 402 years old!

After some email correspondence with the support team, their conclusion was 'date handling is complex'.

It was never fixed. We eventually kicked out the system for a myriad of reasons.
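
For what it's worth, the two calculations described above (age from birth date, and rough birth date from age) need not be complicated. A minimal Python sketch, assuming whole-year ages; the function names are hypothetical and have nothing to do with the actual product:

```python
from datetime import date


def approximate_age_years(birth: date, today: date) -> int:
    """Whole years between birth and today (hypothetical helper)."""
    if birth > today:
        raise ValueError("birth date is in the future")
    years = today.year - birth.year
    # Knock one year off if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def approximate_birth_date(age_years: int, today: date) -> date:
    """Rough birth date for a given whole-year age (hypothetical helper)."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    try:
        return today.replace(year=today.year - age_years)
    except ValueError:
        # today is 29 February and the target year is not a leap year.
        return today.replace(year=today.year - age_years, day=28)
```

With a birth date of 9 December 2005 and a check on 7 December 2013, this gives 7 years rather than 402.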

12
0
Bronze badge
FAIL

Re: One million lines of code

unless the code is in one big text file, that is a rubbish excuse.

Code quality or difficulty is not a simple linear metric....

P.

9
0
Silver badge

Re: One million lines of code

Thinking about it, I would much rather it failed big-time and was really obvious about it than having a subtle little bug somewhere that allowed aircraft to collide. I'd also like to add the feature that it only fails on days when I'm not due to fly in the following week.

5
1
Bronze badge

Re: One million lines of code... Sigh...

What "comPetent" "veteran" "software designer" doesn't vet date problems? Sounds like that "developer" did not want to force the users to put dates in specific date fields and use those fields in boilerplate reports and letters components.

I despise developers who allow users to enter any old random shit in any field the user chooses just "because data type entry enforcement slows us down" is what power-wielding users/buyers will sometimes bandy about.

(In the early 90's, I once temped at a famous "memory leak detector" software developer. I and 2 others had to trudge through MS Access to clean up data for future sales types (replacing the ones who must've been binned, I sometimes wondered) or for future investors, because the previous lot had entered phone numbers in conversation fields, addresses partially in phone number fields, states in the city field, and so on. It was horrendous, mind-boggling, and blood-pressure boiling, and I'd only been at home playing with Lotus Approach for under two years or so.

It was supposed to be a 1-week contract, or maybe it was 4 or 5 days, but after two days of that shit, and the ratty interface cobbled together by some wannabe or unfortunate Access interface putter-togetherer, it seemed to me it would take 3 weeks or longer to comb through multiple thousands of records and make the data all-right and alright. I recommended Lotus Approach for this task, not to replace Access, but to work from a non-programmatic, data-entry-clerk-on-limited-time basis. I convinced the in-charge developer I knew what I was doing and of what my limitations were, and what I'd done at home with Approach. He permitted me to install it and give him a quick run-through of what my plan was, and after a few minutes, green-lit it. We did the task in around 3.5 or 4 days total rather than the 2 weeks or so it was clearly becoming as we earlier kept uncovering more and more SHIT entered by uncaring, clueless, reckless sales/marketing people who obviously did not value a need to revisit and make sense of the data.

Similarly, I did this at 2 or 3 other South Bay offices, each staffed with fewer than 20 people, NONE of whom wanted to or had time to do this "drudgery" type of work. Unfortunately for my ego, Approach usually did not stay around and take hold. Ditto in the mid 2000s when I again was at (larger) firms with maybe 5,000 employees and 10s of thousands of records, one problem being sifting out related, duplicate, and possibly fraudulent multiple (dozens) entries of employees for pay enhancement purposes. At least, that is sometimes what seemed to keep popping up in my face as I kept relating sites to employees. I was not privy to SSN information, so I was not able to fully settle my suspicions. Still, in that industry, it would not be uncommon for employees to be related to each other by 2-4 people. Unfortunately, some relatives had very similar or identical middle or between names.)

1
4
Bronze badge

Re: One million lines of code

Did anyone say they did actually do an upgrade? They seem to be suggesting it's worked fine for years and just stopped randomly, rather than suggesting an upgrade went wrong?

9
0
Silver badge

Re: One million lines of code

Or http://dilbert.com/strips/comic/1996-01-31/

Or more likely http://dilbert.com/strips/comic/1996-02-01/

1
0
Anonymous Coward

@ dssf: ...allow users to enter any old random shit in any field...

The best example (that I can think of) of the effect that a) bad design of input forms and b) indifferent and/or stupid data capturers can have, has to be the South African eNatis/AARTO system.

The bad design came when the postal address field (on the form that motorists have to complete) was put below the residential address field. Since many people have their mail delivered to their residential address, the residential address was completed fully, and the postal address filled in as "As above".

I swear, they must have taken people off the street whose only skill needed to be the ability to match up letters on the form with letters on the keyboard, and simultaneously be able to enter said characters in the corresponding field on the screen.

The upshot (I guess by now you know where this is heading...) was that more than sixty percent of traffic fines were (probably still are) mailed to "As above".
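
A check at capture time would have caught most of this. A minimal Python sketch, purely illustrative: the function name and the list of placeholder phrases are my own assumptions, not anything from the real eNatis/AARTO system.

```python
# Phrases a motorist might write instead of repeating the address.
# This list is an assumption for illustration, not an exhaustive one.
PLACEHOLDERS = {"as above", "same as above", "same", "ditto", "see above"}


def resolve_postal_address(residential: str, postal: str) -> str:
    """Return a usable postal address, falling back to the residential one.

    Hypothetical helper: if the postal field is empty or holds a
    placeholder such as "As above", use the residential address.
    """
    residential = residential.strip()
    if not residential:
        raise ValueError("residential address is required")
    cleaned = postal.strip().rstrip(".").lower()
    if not cleaned or cleaned in PLACEHOLDERS:
        return residential
    return postal.strip()
```

So a form with a residential address of "12 Main Road, Pretoria" and a postal address of "As above" would be mailed to 12 Main Road, Pretoria.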

The system was supposed to have been implemented country-wide in 2009 or 2010 (can't be bothered to check), but is still seriously hobbled and running as a pilot project in Gauteng only.

For the record: whilst it was enacted in 1998, with the intention of full deployment in the early 2000's, it is still not operational.

See here: http://en.wikipedia.org/wiki/Administrative_Adjudication_of_Road_Traffic_Offences_Act,_1998

And on fines going astray: http://www.arrivealive.co.za/mobile/news.asp?NID=1766

http://ezinearticles.com/?Why-Most-AARTO-Traffic-Fines-Issued-Go-Astray&id=4996024

2
0
Anonymous Coward

"Ladies and gentlemen, please keep your seat belts fastened in case we have to engage in some violent manoeuvres to avoid oncoming aircraft. There has been a little glitch in Air Traffic Control that I am assured will be fixed shortly. If any of you have concer..............."

"For fcuk sake! turn left! TURN LEFT!"

0
1
Anonymous Coward

Re: One million lines of code... Sigh...

I'm doing a CRM migration at the moment and I feel your pain! In the new system we have built there is a big red button that disables an advisor when they leave the company. That means any historic orders handled by that advisor still maintain the record of being handled by that user, but no new orders can be assigned to them.

However, that is too easy for the end users. Instead, if Steve Smith leaves the company they will just edit his name to Steve DONOTUSELEFTTHECOMPANY. So when a customer logs in to look at their old orders they can see it has been handled by Steve DONOTUSELEFTTHECOMPANY. And Steve DONOTUSELEFTTHECOMPANY gets assigned new orders too. And that is just the tip of the iceberg... *sigh*
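
The "big red button" idea boils down to a flag on the advisor record rather than a rename. A minimal Python sketch of that approach; the class and method names are made up for illustration and are not the real CRM's:

```python
from dataclasses import dataclass, field


@dataclass
class Advisor:
    name: str
    active: bool = True  # the "big red button" flips this to False


@dataclass
class Order:
    order_id: int
    advisor: Advisor


@dataclass
class OrderBook:
    orders: list = field(default_factory=list)

    def assign(self, order_id: int, advisor: Advisor) -> Order:
        # New orders are refused for disabled advisors...
        if not advisor.active:
            raise ValueError(f"{advisor.name} has left; cannot assign new orders")
        order = Order(order_id, advisor)
        self.orders.append(order)
        return order


def disable(advisor: Advisor) -> None:
    # ...but the name is untouched, so historic orders still read correctly.
    advisor.active = False
```

Disabling Steve Smith this way leaves his old orders showing "Steve Smith" and stops new ones landing on him, which is exactly what the rename fails to do.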

2
0

Re: One million lines of code

I was a software tester on one of IBM's flagship mainframe products for 20 years. And I can tell you, anyone who knows a way of catching every glitch in complex software before it goes live (yes, even the ones that bring the customer's business-critical systems crashing to a very visible, embarrassing and expensive halt), that is simple and robust enough to be used in practice in widely diverse environments and by development teams using different approaches and tools - AND simple enough to be understood by non-technical management, so that they won't simply throw out what works two years later in favour of the latest "flavour of the month", in order to be perceived to be "managing" - REALLY needs to get themselves some good marketing and legal support, because they're in line to make a LOT of money.

2
0
Bronze badge

Re: One million lines of code

What we don't know is how many times the resilience HAS worked, i.e. the primary system could have failed hundreds of times over the years, with the resilience kicking in perfectly every time until now.

We just don't get to hear about those occasions as there's no impact.

3
0

Re: One million lines of code

RE: PassingStrange

One tip might be to use more than one sentence when writing the Functional Spec!

1
0
Bronze badge

Re: One million lines of code

My vote goes to

http://dilbert.com/strips/comic/1996-01-31

0
0
Bronze badge

Wrong way

AC: "Ladies and gentlemen, please keep your seat belts fastened in case we have to engage in some violent manoeuvres to avoid oncoming aircraft. There has been a little glitch in Air Traffic Control that I am assured will be fixed shortly. If any of you have concer..............."

"For fcuk sake! turn left! TURN LEFT!"

If you were to ever find yourself in that situation you should turn RIGHT! As the other sod heading for you should also be doing. (Them's the laws of the sky and the agreement between all airmen.) Unless of course you find there is no other option, and pray to god the other guy doesn't turn right instead.

2
0
Bronze badge

Re: Wrong way

If you were to ever find yourself in that situation you should turn RIGHT! As the other sod heading for you should also be doing. (Them's the laws of the sky and the agreement between all airmen)

Sounds singularly UStanii.

Don't you / they / whoever, know that people naturally break left?*

It was always thus until the US forces made us change the direction of engine rotation in WW2.

Bloody stupid pondjumpers.

*If you don't believe me just look at the muddy tracks outside university entrances where our young lords and masters are being trained to lead us. Immature (usually males) always take a short cut to the left over once green and pleasant landscapes. (Some extremely stupid bright young things will do it too. (IKYN!))

0
2
Silver badge

Surely...

...we should be talking to Security Service and GCHQ about this?

After all, they have justified a sizable chunk of their budget by saying that they will now be the authority responsible for defending the UK's Critical Infrastructure. Which means looking after its Confidentiality, Integrity and AVAILABILITY.

They took the money - now's the time to ask them what they did with it.

And no getting off with "I'm afraid that's classified information..."...

8
1
Bronze badge
Coat

Re: Surely...

"Which means looking after its Confidentiality, Integrity and AVAILABILITY"

Or CIA for short....

My coat's the one with the roll of tinfoil in the pocket for when I need extra layers on my hat.

2
1
Anonymous Coward

Re: Surely...

They took the money - now's the time to ask them what they did with it.

Ah well you see... underwater fibre channel taps are very expensive, not to mention the storage systems which we had to purchase to store copies of all your packets on, have you seen how much a couple of petabytes of enterprise storage costs these days? And trust me you really, really don't want to know about all the electricity we have to burn processing every word of every email, just in case someone wrote a dodgy word in one... then there's the staff costs... but I won't bore you with them, I'm sure you have no interest in how much clever paranoid fascists cost to run...

2
1
Mushroom

Odd.

There are systems in place to hand ATC over to the military, and this has happened in the past.

Why not this time?

2
2

Re: Odd.

At a guess, because the military wouldn't even have 80% of the capacity of NATS? The assumption being that if the military is in charge, something more fundamental is wrong, flights to Ibiza are less of a priority and the capacity wouldn't be needed.

2
0

Re: Odd.

Whilst I'm sure they have the capability to do ATC, I'm sure they don't have the necessary capacity at the drop of a hat to cope with the same volume of traffic. The limitation may be personnel! I doubt air travellers/taxpayers want to pay for 100+ people to sit around at the correct locations doing nothing 364 days a year. Then there is the number of extra workstations needed; they won't be cheap.

But they do need an alternate backup system in place for as many parts of the system as possible.

0
0
Bronze badge

Re: Odd.

Well, reading a few posts later than yours, The_H appears to have answered your question. It looks as if MIL ATC was also being handled at Swanwick from that day onwards, which just utterly removes any fallback to MIL ATC in case of any future issues. Perhaps that decision might be coming up for a rethink!

2
1
Anonymous Coward

Re: Why Not The Military?

ATC the military way: a fighter flies alongside with the pilot pointing down.

0
1

Re: Odd.

As someone who used to work for NATS I can't say very much, but what I can say is that we provide the infrastructure and radar services for the military air controllers these days, so they had the same problems we did.

1
0
Anonymous Coward

Scary

This is making me wonder about just how safe air travel is.

As far as I understood it air traffic control was meant to be able to fall back to operating completely manually, with bits of card for each aircraft and the like... if that's not true WTF are they going to do in the event of a major technology failure... as opposed to just not being able to log some staff members into their system, which is what this amounts to.

0
13

Re: Scary

I think that this is a different system to the one they can operate by cards. The one they can operate by cards being the tracking, prioritising, and routing of planes on the ground and in the air. The system that crashed being the one that enables them to tell the other traffic control areas that they've got a plane entering their airspace.

0
0

Re: Scary

It's probably no more scary than to sit at home waiting for the rest of the crap IT systems in this country to fall apart in a heap.

2
0
Anonymous Coward

Re: Scary

The system that crashed being the one that enables them to tell the other traffic control areas that they've got a plane entering their airspace.

So when they're operating on cards, they can't hand over aircraft to overlapping control areas... jesus fuck... excuse my French, but that's not very reassuring.

0
5
Silver badge

Re: Scary

The thing about the fall-back card system and military control is that they are for emergencies. All the systems are fail-safe in that they do control airspace safely. It's just that the military and card scribbling/pushing can't handle anything like the normal volume of traffic required. Also, nowadays especially, large and rich companies lose a lot of money if civilian ATC falls over, hence the political pressure being brought to bear.

4
0
Silver badge
Facepalm

Re: Scary

Do you really think they could cope with 80% of capacity with paper cards?

YES they can fall back, but really, 80% is pretty impressive with a faulty system...

10
0

Re: Scary

The fall-back system to flight strips (the "cards") will probably also fall back to manually looking up and dialling the controller you have to hand the flight over to when it runs off the end of your RADAR screen. i.e. instead of a 20% reduction in capacity it's probably closer to 50% because of the added workload.

The manual dial system is what worked for decades before all these fancy computers came in and cocked everything up.

2
1
Hoe

Re: Scary

But surely we still need an enquiry about it?

Just think, David, my mate Lord Billy Milksalot can chair it for just £500,000 of the taxpayers' money (plus my 50k referral fee of course).

Thanks

I.M.Athief (MP).

0
1
Bronze badge
Facepalm

The system that crashed ..

'One of the key changes involves improving the warning messages that flash on the air traffic controllers' screens when an aircraft moves out of their area of control and responsibility. The aim is for a warning to flash on the display to remind the controllers to ensure that they have completed all their co-ordination checks before an aircraft leaves their screen and becomes the responsibility of others.

"There is a quirk over whether it flashes or not," says Chisholm. "We want it to work in 100% of cases".

It is important to fix this problem because the Swanwick system, unlike the current manual process, supports the automated transfer of aircraft from one air space sector to another.

Currently at the London Air Traffic Control Centre, when controllers relinquish responsibility for an aircraft, they confirm this by phoning the appropriate new controller. This will not happen under the new automated procedures at Swanwick.' link

0
0
Anonymous Coward

Re: The system that crashed ..

The aim is for a warning to flash on the display to remind the controllers to ensure that they have completed all their co-ordination checks before an aircraft leaves their screen and becomes the responsibility of others.

"There is a quirk over whether it flashes or not," says Chisholm. "We want it to work in 100% of cases".

So one minute after flight 666 has declared an in flight emergency, and requested an emergency landing. The air traffic controller is going to lose sight of the aircraft he is marshalling out of 666's way, as his screen is filled with flashing messages...

Maybe I'll take the train.

1
0

Re: The system that crashed ..

That article is from when the centre was being built 10 years ago, and the system envisaged there was implemented successfully some years ago.

0
0

Is there more to the story than this?

PPRUNE (a pilots' forum) carries this interesting NOTAM (Notice to Airmen)

AT 0100 ON 07 DEC 2013 SCOTTISH AIR TRAFFIC CONTROL CENTRE MILITARY (SCOTTISH MIL)(SCATCC(MIL)) WILL TRANSITION TO SWANWICK AND ASSUME THE TITLE LONDON AIR TRAFFIC CONTROL CENTRE MILITARY (LONDON MIL)(LATCC(MIL)) NORTH. THERE WILL BE NO CHANGES TO SERVICE PROVISION ARRANGEMENTS OR INITIAL CONTACT FREQ. ALL LATCC(MIL) SECTORS WILL ASSUME THE VOICE CALL SIGN 'SWANWICK MILITARY'. A SINGLE UK FLIGHT PLAN ADDRESS, EGZYOATT WILL BE USED FOR ALL OPERATIONAL AIR TRAFFIC FLIGHT PLANS.

In other words - this "phone system failure" just happened to coincide with the day that all military air traffic control transferred to Swanwick. Hmmm.

15
0

Anonymous Coward

99.9998% Availability.....

One 14 hour failure in 11 years, that's what 99.9998% availability ?

What's the betting Frequentis won't even have to pay any SLA money... ;-)

3
0
Anonymous Coward

Re: 99.9998% Availability.....

If they get picky about it they delivered approximately 90% of the required capability for the 14 hours of faults - which works out at 84 minutes of lost capacity-time in 11 years. As epic failures of critical systems go that's not at all bad.
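
The sums are easy enough to check. A rough Python sketch, assuming 365.25-day years and taking the figures quoted in this thread (14 hours, 11 years, about 90% of capacity delivered) at face value; the function names are mine:

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length


def availability(downtime_hours: float, period_years: float) -> float:
    """Fraction of the period during which the system was up."""
    total_hours = period_years * HOURS_PER_YEAR
    return 1 - downtime_hours / total_hours


def lost_capacity_minutes(outage_hours: float, delivered_fraction: float) -> float:
    """Outage length weighted by the share of capacity that was lost."""
    return outage_hours * 60 * (1 - delivered_fraction)


if __name__ == "__main__":
    print(f"raw availability:      {availability(14, 11):.6%}")
    lost = lost_capacity_minutes(14, 0.90)
    print(f"lost capacity-minutes: {lost:.0f}")
    print(f"capacity-weighted:     {availability(lost / 60, 11):.6%}")
```

On those assumptions a full 14-hour outage in 11 years comes out at roughly 99.985% rather than 99.9998%, and the capacity-weighted figure (84 minutes lost) at roughly 99.9985%.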

0
0
Silver badge

Re: 99.9998% Availability.....

Length of failure time is less important than how badly everything can screw up during the failure.

2
0
Bronze badge

Re: 99.9998% Availability.....

With mission critical systems the backup solution isn't supposed to be a copy of the primary system. It's supposed to be developed separately, updated separately, managed separately. This way a bug in the primary system isn't replicated to the backup.

There is a difference between the primary system availability and a Disaster Recovery system. DR is used when the primary system and its resilience completely fail. The problem here is the DR system and processes didn't work. That system would not be provided by Frequentis. Probably it's owned by NATS themselves (the original card shuffle systems...).

0
0
Anonymous Coward

Re: 99.9998% Availability.....

"With mission critical systems the backup solution isn't supposed to be a copy of the primary system. It's supposed to be developed separately, updated separately, managed separately. This way a bug in the primary system isn't replicated to the backup."

I've heard the theory.

Who's seen any recent real examples ?

Or is "It's too expensive to do dissimilar redundancy, we'll just do one and test it properly" followed by "We can't test it properly, it's too expensive, just ship it" the universal refrain in recent decades?

0
0
Anonymous Coward

Let (s)he who has failproof software cast the first stone.

For all the griping and complaining can someone point me to any piece of software that has worked without a major cock-up for 12 years ? (Ok I admit the solitaire game bundled with windows seems rather resilient :-))

I've seen military C3I systems crash less than gracefully in the middle of live operations, I've seen signalling software that control rail traffic lock up at rush hour and the list goes on with telcos, power grids, chemical factories and nuclear plants.....

Software will crash - it's a fact of life - and I think the guys at NATS did a rather good mitigation job handling 90% of the workload in a crippled situation (and I'm sure they do have procedures to handle things with pins on a paper chart and analogue telephone lines should the whole IT infrastructure go kaput)

To answer a previous comment - when a mission critical system crashes you actually prefer it to crash in a big and obvious way and "get your money's worth"... There would be nothing worse or more dangerous than an "elusive minor bug"! Imagine a bug that would randomly omit to show some flights on an ATC controller's screen - now that is a scary scenario...

9
0
Anonymous Coward

Re: Let (s)he who has failproof software cast the first stone.

> (Ok I admit the solitaire game bundled with windows seems rather resilient :-))

If it is so resilient then why is it on version 5.1?

3
0
Silver badge

Re: Let (s)he who has failproof software cast the first stone.

If it is so resilient then why is it on version 5.1?

Several answers to that - one is that it's resilient now because the bugs have been fixed, another is that it's been introducing features (rather than bugs) with each new release.

0
0
