On Sunday, US time, prominent Web services outfit CloudFlare sent an instruction to its routers in response to an attempted DoS, and instead took down its own network. In a rare example of detailed disclosure, the company has posted an explanation of what happened. The network collapse occurred, the company explains …

COMMENTS

Sunday 3rd March 2013 22:45 GMT Anonymous Coward

test?

Rule/ACL/traffic munge not validated and tested on their test kit before being rolled onto the production network? For shame.

At that rate the admins will have to fill in change request forms in triplicate (or worse!) to appease the seniors... so they won't be dynamic/agile in the future.

Test/verify before deploying into the wild (and if you don't have a test/dev network, then maybe after this event they will... oh wait a minute, I see their ploy now! ;-) )

2 0
1. Monday 4th March 2013 08:40 GMT Bill the Sys Admin
  
  Re: test?
  
  Because it was to stop a DoS attack maybe they didn’t have time to go through full testing? Maybe they don’t have a full replica network?
  
  Who knows... I just assume that next time they probably won't be so quick on the trigger.
  
  1 0
  1. Monday 4th March 2013 10:01 GMT Lee D
    
    Re: test?
    
    Except this company sell a CDN product that is supposed to relieve stress on servers when they are under DoS and provide (and I quote) "Always Online™" and "Rock solid reliability" so that even if your server goes down, your visitors can still see your content.
    
    So it's a bit embarrassing to not test, to just roll out, and not have an adequate testing procedure (I mean, rolling it out to all your routers before you notice is a bit stupid, no matter what).
    
    And I can attest that at least one site I'm aware of was down for quite a long time despite the fact that it uses CloudFlare CDN to keep itself online "no matter what" and was returning all sorts of errors even though the underlying origin servers were up. Next time, their accountants will be telling them to test before they deploy, I think.
    
    2 0
  2. Monday 4th March 2013 10:15 GMT Matt Bryant
    
    Re: Bill Re: test?
    
    ".......maybe they didn’t have time to go through full testing?......" I've seen similar mistakes; usually they are a combination of management pressure - "fix that NOW" - and over-confidence in one's own ability. Many, many moons ago, there was a rumour of a ping of death for Cisco Catalyst routers (5000 models IIRC) and much argument amongst netties as to whether it would work or not. At the company I was working for at the time, our network architect, having the authority to do as he pleased, was firmly in the "it-won't-work" camp and decided to test it against one of our routers, only to find that not only did it work but it also propagated through all the same models in the network. Cue an embarrassing company-wide network outage which we definitely did not step up and explain to the customers!
    
    2 0
Sunday 3rd March 2013 23:13 GMT koolholio

According to the 'simplified' rule... it's filtering outbound DNS port packets of a certain abnormal packet length... Wait a second, outbound packets??? >_<

Just a thought? *shrug*

0 0
Sunday 3rd March 2013 23:29 GMT Chris Miller

I thought the (theoretical) maximum length of an IPv4 (and v6) packet was 65,535 bytes.

0 0
1. Monday 4th March 2013 00:01 GMT jcrb
  
  I think someone meant to do that.
  
  Chris, I'll make you a bet: the packets weren't really "between 99,971 and 99,985 bytes long", they just had header fields saying they were. CloudFlare sort of say as much when they say no packet should have matched the rule because no packets were actually that long. And that range of lengths was picked because the attacker knew a rule blocking them would crash the routers badly.
  
  4 0
  1. Monday 4th March 2013 01:19 GMT Anonymous Coward
    
    Re: I think someone meant to do that.
    
    "that range of lengths was picked because the attacker knew a rule blocking them would crash the routers badly"
    
    Wasn't it nice of them to explain the attack vector to all 'bad guys' everywhere.
    
    0 1
    1. This post has been deleted by its author
    2. Monday 4th March 2013 03:22 GMT M Gale
      
      Re: I think someone meant to do that.
      
      Well, IPv4 uses 16 bits to store the packet length. Basically, 65,536 combinations or 0-65,535. IPv6 has the same limit unless the "Jumbo Packet" option is turned on, in which case the packet can be up to 4GB in size.*
      
      So basically, it was an IPv6 attack with the Jumbo Packet option turned on. Why routers will even process a ping that's a Jumbo Packet, I don't know.
      
      *Wikipedia is your friend. Even if it isn't an academic source.
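The arithmetic in the comment above can be checked with a minimal sketch. It assumes only what the comment states, that IPv4 stores the packet length in 16 bits; the function name `fits_ipv4_total_length` is hypothetical, chosen for illustration, and the sketch says nothing about how the actual attack packets were built.

```python
import struct


def fits_ipv4_total_length(length: int) -> bool:
    """Return True if `length` can be stored in a 16-bit unsigned field.

    "!H" is network byte order, unsigned 16-bit, the same width as the
    IPv4 length field described above (0 to 65,535).
    """
    try:
        struct.pack("!H", length)
        return True
    except struct.error:
        # Raised when the value does not fit in 16 unsigned bits.
        return False


print(fits_ipv4_total_length(65_535))  # True: the largest encodable length
print(fits_ipv4_total_length(99_971))  # False: lower bound quoted in the thread
```

Under that assumption, the quoted 99,971 to 99,985 byte range cannot be written into a 16-bit length field at all, which is the point the surrounding comments are debating.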
      
      0 0
    3. Monday 4th March 2013 04:14 GMT Martin 71
      
      Re: I think someone meant to do that.
      
      Open disclosure. It means HOPEFULLY this will be patched that much sooner.
      
      0 0
  2. This post has been deleted by its author
Monday 4th March 2013 00:49 GMT Anonymous Coward

Honesty is the best policy

Nice to see someone confess to a cockup.

Perhaps the routers were in effect subject to the same or similar flaw as that which was being "mitigated" against. I'd have thought a router OS would be pretty keen on bounds checking.

Cheers

Jon

2 0
Monday 4th March 2013 08:53 GMT Charlie Clark

Commendable

While it is certainly embarrassing for both CloudFlare and Juniper, I agree with the article that the best way to handle this kind of SNAFU is to be open about it. CDNs are, despite the marketing blurb, a very technical product, with preventing DoS attacks one of their key reasons for existing. You're dealing not only with customers but also other networks and possibly, depending on the size of an attack, with the IETF. While exploits like these that depend on discovering esoteric bugs can be developed silently, fixes need to be public and pushed out across networks as quickly as possible.

1 0
Monday 4th March 2013 14:47 GMT Jason Bloomberg

Blog server down

Either they've ceased to be as fully disclosing as they were or their servers cannot handle the load ...

http://blog.cloudflare.com/todays-outage-post-mortem-82515

0 0