Two be Sure To be be sure
Big O rah.
An air traffic control fault that brought Dublin airport to its knees last week has been traced to an intermittently flakey network card. Sadly, while the problem was simple enough to diagnose, it’ll be weeks before the airport’s air traffic control system will be able to run at full capacity. The system went for a little lie …
"an intermittent malfunctioning network card which consequently overcame the built-in system redundancy"
Erm ... what sort of redundancy is it that can be "consequently overcome" by the intermittent malfunctioning of a SINGLE network card ?
It strikes me that a little MORE redundancy might be required here ... preferably starting with whoever it was that designed the present supposed redundancy.
A new bunch of hosts was being turned up on a client's network and nothing was routing anywhere for anyone.
Turned out to be a uniquely flawed batch of <major anonymous large vendor's> NICs that had been burned with the same MAC address and shipped together for the install.
We were the vendor of the switching and routing gear and it was nice to discover it was not our fault. Good times.
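For anyone who wants to rule out the same duplicate-MAC problem before an install, here is a minimal sketch. It assumes you already have a host-to-MAC inventory from somewhere; the `inventory` list and the `find_duplicate_macs` helper are hypothetical names for illustration, not anything the vendor shipped.

```python
from collections import defaultdict


def find_duplicate_macs(inventory):
    """Return {normalized_mac: [hosts]} for every MAC claimed by more than one host."""
    hosts_by_mac = defaultdict(list)
    for host, mac in inventory:
        # Normalize case and separator so 00-11-22... and 00:11:22... compare equal.
        normalized = mac.lower().replace("-", ":")
        hosts_by_mac[normalized].append(host)
    return {mac: hosts for mac, hosts in hosts_by_mac.items() if len(hosts) > 1}


# Hypothetical inventory: (hostname, MAC address) pairs collected before the install.
inventory = [
    ("host-a", "00:11:22:33:44:55"),
    ("host-b", "00-11-22-33-44-55"),
    ("host-c", "00:11:22:33:44:66"),
]

duplicates = find_duplicate_macs(inventory)
print(duplicates)
```

Any MAC that maps to more than one host is worth pulling before it ever reaches a switch port.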
Kudos to the tech that figured out where the "flakey" NIC was located. ROTM?
Couldn't they just unplug the damned Ethernet offender and restore the peace? If that didn't work, then something else is amiss.
A star topology should, at least, guarantee that once a faulty piece of equipment is unplugged, the network works just fine.
Unplugging a (faulty) network card has never caused me problems since I got rid of my 10BASE2 Ethernet coaxial cables, some 10-15 years ago.
I assume they are not telling the whole story.
Mine is the one with an Etherkiller® in the inner pocket.
@Luiz. Yeah. Except since it's Air Traffic Control, they're going to have to investigate, reinvestigate, and re-re-investigate what "caused" the card to go bad, probably decide how to redesign it so it won't be a problem again, then find out they spent all the cash on the above so they can't actually implement the suggested changes 8-)
And, I don't think it was the Air Traffic Control, but I remember reading just a year or two ago about ANOTHER airport that was knocked offline by a faulty NIC -- I think it was the flight scheduling? OK, based on the article, it was because of the security theatre systems breaking down.. Oh, here's a link courtesy of google: "Because of one faulty NIC, 17,000 stranded at LAX" (from August 2007): http://www.boingboing.net/2007/08/15/because-of-one-fault.html
I once had a NIC go bad and take a whole network down. It was on a switch but that didn't isolate it. All the other computers went off the network and nothing worked unless I shut everything down and powered back up. I had to power each computer off and check each one out until I found the problem. It was also intermittent at first.
A totally redundant link on all the computers could have still allowed operation, but it's also possible the bad link might have shut the other one down too. Off-the-shelf network gear might not solve this; one might need to design custom hardware to get past it. I suspect the industrial redundant ring switches like the ones from Sixnet would have still failed in this case.
Paris because she knows all about networking of videos.
When the dodgy NIC is in mission-critical hardware, "just unplugging" it isn't a good option.
@ the redundancy wankers:
You don't seem to consider the expense or difficulty of retrofitting redundancy into an existing system. Having a backup for every device is unmaintainable if they don't have identical configurations, and that can be difficult to accomplish if the original kit is a decade or so old. In this case you might just add a second NIC to every device. But running both cables along the same route doesn't add redundancy if there's physical damage, so they have to take completely separate paths. You'd have to rip open the walls and ceiling to do that, and it makes godawful spaghetti of the wiring schematics.
Investigate? In Ireland? Are you crazy? I mean seriously the investigation will take place around the table at the local pub.
"O'Leary I think this investigation needs another round."
Not to mention the ATC ladies and gents.
"Umm ATC, this is US Airlines Flight 197 could you repeat that last command, the words were heavily slurred."
Trust me, I'm Irish and I drink like a fish :)
Hehe though airport and Ireland in the same article reminds me of the Family Guy episode where the jet lands on a runway full of beer bottles :)
/mine's the one with the bottles in the pocket and the caps for buttons.
They're using it for flight critical stuff too, not just for in flight entertainment or stock control or whatever. They're not calling it Ethernet (they're calling it CDN or ARINC664), and PCworld aren't stocking the relevant NICs or switches, but bear it in mind next time you book a flight on a trendy modern Boeing or Airbus aircraft.
Obviously it's designed properly and tested properly so that this kind of fault can't happen.
Which is of course what Thales originally said too.
Exactly right: retrofitting redundancy to a safety-critical system is a nightmare. The spec will likely have been set in stone years ago, and even changing a single component for one with a later revision number can cause months if not years of work to revalidate and certify the installation.
Of course, nobody ever gets the numbers right for spares holding or some smart arse decides it needs to work for an extra ten years..
Ran into this problem recently with some old kit used in the nuclear industry. Swapping in an alternative part isn't an option, and god forbid you try to change the configuration from the specs, because the system might not perform to the manual in all conditions, conceivably resulting in the uncontrolled emission of radioactive material.
Paris, she knows all about uncontrolled emissions...
> They're using it for flight critical stuff too (they're calling it CDN or ARINC664)
03 July 2008
Thales UK welcomes contract signature for new aircraft carriers
The Ministry of Defence (MoD) has today signed the contract for the manufacturing phase of the carrier programme.
http://www.thalesgroup.com/
Coupled with the order for the aircraft to play on these stations going to USA...
Should be interesting if we follow them into Iran.
I can just see the headlines from the boarding parties:
"All your inspection personnel are belong us."
Surely a custom mobo with 4 NICs (two to connect to the service equipment and two to the controllers' network) would be the key here.
Expensive, yes, but you can't put a price on human life (unless you're Ryanair of course, when that figure is precisely 0 after they've charged your credit/debit card)
Mine's the one with the 4 RJ-45 tipped cables sticking out of the pockets.
We once had a University subnet regularly going down, but only on every second winter morning. Turned out a part-time administrator had pushed their PC tower so far over on their desk that it was overhanging a radiator, which cooked the NIC (located in the lowest PCI slot). At a certain temperature it turned distinctly hostile...
(We also had the classic 'industrial vacuum cleaner in the backbone switch power socket every night' issue, but that's another story)