US states join watchdog probing CenturyLink's Xmas data center outage that screwed 911 system

Wyoming is the latest US state to formally probe CenturyLink's network outage, which black-holed 911 calls over Christmas. America's comms watchdog the FCC, and regulators in Washington state, are also investigating the blunder – asking exactly how it happened, and why it took so long to resolve – along with Wyoming's Public …

  1. eswan

    Somebody plugged a 4-mbit card into a 16-mbit ring?

    ... turned off spanning tree protocol?

    1. Dimmer

      Doubt it, but ....

      If it was a spanning tree issue, they might not have root guard on the core. I have seen a customer's kit take the root away from an ISP / farm supply provider.

      I don’t work for these guys but I have seen them pull all-nighters just like the rest of us. I would like to see the equipment vendor in the hot seat as well. When you pay that kind of money for networking kit, “odd, we have never seen it do that before” just doesn’t cut it.

      Time for the window, cattle prod and roll of carpet.

      1. Marty McFly

        Re: Doubt it, but ....

        Vendor: You installed the kit, correct?

        Tech: Yes.

        Vendor: It was new in the package, correct?

        Tech: Yes.

        Vendor: So you opened the package, correct?

        Tech: Yes.

        Vendor: Your honor, we move for immediate dismissal of the court case. The box clearly states that by opening the package the customer agrees to the license agreement. The license agreement clearly states we are not liable for any cock-ups caused by the kit beyond the value of its original purchase cost.

  2. swschrad

    of course, the FCC shuts down tomorrow...

    Thanks to a little "mine is bigger" argument between two branches of government, some 15 agencies of the US government are in the process of closing due to no budget. The FCC is basically shutting down everything except oversight of life safety and the sale of spectrum.

    1. Anonymous Coward

      Re: of course, the FCC shuts down tomorrow...

      Will they shut off that eejit Pai?

  3. Anonymous Coward

    Hmm

    Normally you wonder if this was a BGP route reflector doing stupid crap. These days it's just as likely to be an MPLS route reflector too. While it's been a long time since I had enable on what was then Global Crossing, they did suffer some MPLS traffic engineering issues in late 2000 caused by Cisco's inability to support more than 10,000 LSPs per router (as I remember it!). Perhaps RSVP was being signalled bad data and caused some links to overflow?

  4. Anonymous Coward

    Huawei?

    Q: Was the card made by Huawei?

    A: No. They need their cards to work reliably ready to assist the Great Takeover(tm), when it comes.

    Okay, that's a joke, but an anecdote: years ago, a colleague told me he did some work using a 4-transputer card (that dates it) to generate Ethernet frames by driving the line voltages directly. He said it was remarkably easy to upset (i.e. completely bugger) most of the network cards available at the time simply by sending malformed Ethernet. This incident suggests that not much has changed since.

  5. Fred Goldstein

    Most 911 centers do not use the Internet -- that would be ridiculously foolish. The failure in this case was of the optical layer. You'd think that would be localized. But a nice leaked outage report in Telecom Digest gives some better clues. They were losing optical connections all over the place. What can do that? My suspicion is GMPLS, which applies Internet routing techniques to optics. A bad card sent out bad GMPLS packets and the other devices didn't discard them as they should have. Hilarity ensued. The vendor is not named... you might want to google around though to see who sells to CLQ.

    1. emdeedee

      Telecoms Digest

      http://telecom2018.csail.mit.edu/archives/back.issues/recent.single.issues/latest-issue.html

    2. Wzrd1

      "The failure in this case was of the optical layer."

      Had something very similar with a bank's home office. The. Entire. Network. Was. Down. Hard.

      Flooded with traffic. Set a sniffer, went out and enjoyed a smoke, came back and read gibberish on a copper network. But I did get enough fragments to have the MAC and traced it to one cheap, off-brand NIC. It was mangling packets in just the proper way to be broadcast to each and every switch.

      Padded the time, of course, since they should have had packet examination switched on in the core, which would've only dropped a small segment, if that.

      So, what we really have is, yet another case of "critical services" having a single point of failure.

      Because, public safety is number one...

      Indicated via a raised third digit.

  6. Anonymous Coward

    yeah, Telstra in Australia have used that as an excuse recently too... Funny how a cheap network card becomes a single point of failure, bringing down their entire network....

    Even more strange that it occurred after massive redundancies, and off-shoring support to cheap 3rd world countries. Totally unrelated to any management decisions of course....

  7. a_yank_lurker Silver badge

    Redundancy

    Have they ever heard of redundant systems?

    1. Rupert Fiennes Bronze badge

      Re: Redundancy

      They are almost certainly redundant, but this is no guarantee of failover if the issue isn't a link down. The worst network problems are usually intermittent issues that cause repeated failovers and churn in the routing process.

      1. Wzrd1

        Re: Redundancy

        "The worst network problems are usually intermittent issues that cause repeated failovers and churn in the routing process"

        Which is when the management software marks that carrier/route as deficient, switches to the properly functioning, reliable carrier, and triggers an alert to be ignored by management.

        But, I'd have digitally signed e-mails notifying management of single point of failure mode now, due to the primary carrier becoming unreliable. Something a plaintiff would find fascinating in any discovery case. And an absolute defense, as I'd re-warn management and the superior in the case of non-response. And preserve a "shut the hell up" response, proving a lack of due care and due diligence.

        Giving me quite a legal shield, while the highest levels of management have their pantsless sacks hovering millimeters from the fan blade.

    2. A.P. Veening

      Re: Redundancy

      "Have they ever heard of redundant systems?"

      Nope, only of redundant pay check receivers.


Biting the hand that feeds IT © 1998–2019