Router crash downs CloudFlare services

On Sunday, US time, prominent Web services outfit CloudFlare sent an instruction to its routers in response to an attempted DoS, and instead took down its own network. In a rare example of detailed disclosure, the company has posted an explanation of what happened. The network collapse occurred, the company explains …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    test?

    rule/ACL/traffic munge not validated and tested on their test kit before being rolled onto the production network? for shame.

    at that rate the admins will have to fill in change request forms in triplicate (or worse!) to appease the seniors... so they won't be dynamic/agile in the future.

    test/verify before deploying into the wild (and if you don't have a test/dev network then maybe after this event they will have....oh wait a minute..I see their ploy now! ;-) )

    1. Bill the Sys Admin
      Facepalm

      Re: test?

      Because it was to stop a DoS attack maybe they didn’t have time to go through full testing? Maybe they don’t have a full replica network?

      Who knows...I just assume that next time they probably won't be so quick on the trigger.

      1. Lee D Silver badge

        Re: test?

        Except this company sell a CDN product that is supposed to relieve stress on servers when they are under DoS and provide (and I quote) "Always Online™" and "Rock solid reliability" so that even if your server goes down, your visitors can still see your content.

        So it's a bit embarrassing to not test, to just roll out, and not have an adequate testing procedure (I mean, rolling it out to all your routers before you notice is a bit stupid, no matter what).

        And I can attest that at least one site I'm aware of was down for quite a long time despite the fact that it uses CloudFlare CDN to keep itself online "no matter what" and was returning all sorts of errors even though the underlying origin servers were up. Next time, their accountants will be telling them to test before they deploy, I think.

      2. Matt Bryant Silver badge
        Happy

        Re: Bill Re: test?

        ".......maybe they didn’t have time to go through full testing?......" I've seen similar mistakes; usually they are a combination of management pressure - "fix that NOW" - and over-confidence in one's own ability. Many, many moons ago, there was a rumour of a ping of death for Cisco Catalyst routers (5000 models IIRC) and much argument amongst netties as to whether it would work or not. At the company I was working for at the time, our network architect, having the authority to do as he pleased, was firmly in the "it-won't-work" camp and decided to test it against one of our routers, only to find that not only did it work but it also propagated through all the same models in the network. Cue an embarrassing company-wide network outage which we definitely did not step up and explain to the customers!

  2. koolholio
    WTF?

    According to the 'simplified' rule... it's filtering outbound DNS port packets of a certain abnormal packet length... Wait a second, outbound packets??? >_<

    Just a thought? *shrug*

  3. Chris Miller

    I thought the (theoretical) maximum length of an IPv4 (and v6) packet was 65,535 bytes.

    1. jcrb
      Boffin

      I think someone meant to do that.

      Chris, I'll make you a bet: the packets weren't really "between 99,971 and 99,985 bytes long", they just had header fields saying they were. CloudFlare sort of say as much when they say no packet should have matched the rule because no packets were actually that long. And I'd bet that range of lengths was picked because the attacker knew a rule blocking them would crash the routers badly.

      1. Anonymous Coward

        Re: I think someone meant to do that.

        "that range of lengths was picked because the attacker knew a rule blocking them would crash the routers badly"

        Wasn't it nice of them to explain the attack vector to all 'bad guys' everywhere.

        1. This post has been deleted by its author

        2. M Gale

          Re: I think someone meant to do that.

          Well, IPv4 uses 16 bits to store the packet length. Basically, 65,536 combinations or 0-65,535. IPv6 has the same limit unless the "Jumbo Packet" option is turned on, in which case the packet can be up to 4GB in size.*

          So basically, it was an IPv6 attack with the Jumbo Packet option turned on. Why routers will even process a ping that's a Jumbo Packet, I don't know.

          *Wikipedia is your friend. Even if it isn't an academic source.
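The length limits quoted in this sub-thread can be checked with a little arithmetic. What follows is a minimal sketch, assuming only the field widths mentioned in the comments (a 16-bit IPv4 length field and a 32-bit length field for the IPv6 Jumbo option) and the 99,971 to 99,985 byte range quoted earlier in the thread. The names are hypothetical and chosen for illustration; the sketch says nothing about what the attack traffic or the routers actually did.

```python
# Field widths as stated in the comments above (assumptions for illustration).
IPV4_TOTAL_LENGTH_BITS = 16   # IPv4 length field
IPV6_JUMBO_LENGTH_BITS = 32   # length field used by the IPv6 Jumbo option


def max_field_value(bits: int) -> int:
    """Largest unsigned value that fits in a field of the given width."""
    return (1 << bits) - 1


def fits(length: int, bits: int) -> bool:
    """True if a length in bytes can be expressed in a field of that width."""
    return 0 <= length <= max_field_value(bits)


# The range of packet lengths quoted in the thread (inclusive).
REPORTED_RANGE = range(99_971, 99_985 + 1)

ipv4_max = max_field_value(IPV4_TOTAL_LENGTH_BITS)
jumbo_max = max_field_value(IPV6_JUMBO_LENGTH_BITS)

# Does any quoted length fit in 16 bits? Do all of them fit in 32 bits?
any_fit_ipv4 = any(fits(n, IPV4_TOTAL_LENGTH_BITS) for n in REPORTED_RANGE)
all_fit_jumbo = all(fits(n, IPV6_JUMBO_LENGTH_BITS) for n in REPORTED_RANGE)

print(f"16-bit maximum: {ipv4_max:,} bytes")
print(f"32-bit maximum: {jumbo_max:,} bytes")
print(f"any quoted length fits in 16 bits: {any_fit_ipv4}")
print(f"all quoted lengths fit in 32 bits: {all_fit_jumbo}")
```

The point is only that none of the quoted lengths can be written in a 16-bit field, while all of them fit comfortably in a 32-bit one, which is why the thread turns to either the Jumbo option or a rule that could never legitimately match.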

        3. Martin 71 Silver badge

          Re: I think someone meant to do that.

          Open disclosure. It means HOPEFULLY this will be patched that much sooner.

      2. This post has been deleted by its author

  4. Anonymous Coward

    Honesty is the best policy

    Nice to see someone confess to a cockup.

    Perhaps the routers were in effect subject to the same or similar flaw as that which was being "mitigated" against. I'd have thought a router OS would be pretty keen on bounds checking.

    Cheers

    Jon

  5. Charlie Clark Silver badge
    Thumb Up

    Commendable

    While it is certainly embarrassing for both CloudFlare and Juniper, I agree with the article that the best way to handle this kind of SNAFU is to be open about it. CDNs are, despite the marketing blurb, a very technical product, with preventing DoS attacks one of their key reasons for existing. You're dealing not only with customers but also other networks and possibly, depending on the size of an attack, with the IETF. While exploits like these that depend on discovering esoteric bugs can be developed silently, fixes need to be public and pushed out across networks as quickly as possible.

  6. Jason Bloomberg Silver badge
    FAIL

    Blog server down

    Either they've ceased to be as fully disclosing as they were or their servers cannot handle the load ...

    http://blog.cloudflare.com/todays-outage-post-mortem-82515
