Re: Kessel Run?
13 hours and change. In my defence, I was asleep for most of it...
I contributed to the massive DDoS attack against Spamhaus. What flowed through my network wasn't huge – it averaged 500Kbit/sec – but it contributed. This occurred because I made a simple configuration error when setting up a DNS server; it's fixed now, so let's do an autopsy. I should start off by apologising to …
"The keen eye will notice two other flaws in my server design. The first is that BIND isn't chrooted. This is because the spywaredomains.zones file from malwaredomains isn't really designed with RedHat-based distros in mind. If you were to chroot BIND you'd have to post-process the zone file to cope with the path differences."
The paths are relative to the chroot, so if your chroot is /var/named, you can just copy the blockeddomain.hosts file to /var/named/etc/namedb/blockeddomain.hosts. No post-processing needed. Shared virtual hosting and fail2ban have nothing to do with this - chrooting BIND is there to make it harder to exploit bugs in BIND.
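That "just copy it" step can be sketched in a couple of lines. This is a minimal illustration, not the commenter's actual setup: the chroot path is faked with a temp directory (a real one would be e.g. /var/named) and the blocklist file is a one-line sample.

```shell
# Minimal sketch: drop the malwaredomains blockeddomain.hosts file into
# the chroot so the path referenced in spywaredomains.zones resolves
# without any post-processing. Paths here are illustrative stand-ins.
CHROOT="$(mktemp -d)"                                  # stand-in for /var/named
printf '127.0.0.1 bad.example\n' > blockeddomain.hosts # sample blocklist file
mkdir -p "$CHROOT/etc/namedb"
cp blockeddomain.hosts "$CHROOT/etc/namedb/blockeddomain.hosts"
```

Inside the chroot, BIND resolves /etc/namedb/blockeddomain.hosts relative to the chroot root, so the copied file is found at the path the zone file already names.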
"The second is that DNSSEC isn't enabled."
One of the reasons why this is such an efficient traffic amplifier is that a DNSSEC signed zone can have a much larger response to a (small) query than an unsigned zone. Not saying that it isn't useful, but DNSSEC does require care, really wants modern software probably with rate limits (which are non-default build options / patches in most current implementations), and keeping track of development. There are exciting new opportunities to break your DNS with it too, of course.
Since there were a few comments suggesting djbdns for inexperienced admins - oh $DEITY no, please....
The particular implementation of BIND + chroot utterly refused to look in the chroot directory for /etc/namedb, no matter how much tinkering I tried. I gave up eventually and left it. As for the shared virtual hosting and fail2ban comment, that is there because most of the "bugs in BIND" we might care about are exploits that work if you have managed to gain a remote console.
SSH on an alternate port + fail2ban + not actually giving the information to anyone and having a very small user footprint means your chances of getting into the system to exploit BIND in that fashion are hella slim. There is always the remote possibility that you could use some sort of remote attack against BIND like that, but the chances are even smaller. In terms of the risk posed, I think I can get away with not chrooting the thing for the 2-3 months between initial roll out of the service and the replacement of the unit with a CentOS6 box.
At least on CentOS6 the bloody chroot works right and the malwaredomains zone works without post-processing the text file. I should also point out that the DNSSEC implementation set up in CentOS6 is actually pretty good.
I almost stopped reading there, but you lost me at honeypot. As far as I know, a honeypot is a place where you profile/catch attackers. Wikipedia says:
"In computer terminology, a honeypot is a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems. Generally it consists of a computer, data, or a network site that appears to be part of a network, but is actually isolated and monitored, and which seems to contain information or a resource of value to attackers."
I don't think honeypot is the correct term to use when setting up a machine for the security of your own users. I like your article Trevor, regardless of your DNS "sin" - yes, it really is that bad. However, as a tech author, you should really use commonplace terminology. Since I'm not in the UK, I could be wrong - maybe that is how IT professionals talk over there.
Yes. A honeypot is indeed where you profile and catch attackers. Why are you hitting the honeypot machine if you aren't clicking on stupid things and aren't an attacker? The honeypot allows me to catch not only attackers but stupid users. I would say that "redirecting a user to a honeypot machine that displays an error or educational message when they try visiting a site on the list, then logs the thing so I can find and LART someone" counts as a honeypot.
As for edge scrubber, the system also does IDS and DPS. It scrubs my datastream. It lives on the edge of my network. What the hell would you call it?
If it's a ship and it goes through the gate, you call it a gateship. You only call it a puddle jumper if you need something that sounds good on TV. It's an edge device, it scrubs my datastream. Should I call it a boysenberry?
That is what WROK PALCE does to prevent damagement from viewing pr0n sites.
Boy, did some male execs scream when the female CEO got an email listing their attempt in real time to access a prohibited site. She often paid a visit to their office - unannounced!!!!
IT got called some really nasty names; but, what the hell, we were only following ORDERS!!!!!
If someone asks your server "where is www.google.com" a whole bunch of times then your server starts flooding google.com's DNS servers.
Not if your server is set up to cache recursive results for a period of time (which I believe is the default.) More likely attackers are asking for:
www1.google.com
www2.google.com
www3.google.com
.....
which would result in multiple lookups even if caching is enabled.
Feel free to correct me if I'm wrong.
Nope, you are 100% correct. If you are attacking properly that is exactly how you do it. (Actually, if it is the DNS for www.google.com you want to take down, you attack with 1.www.google.com and 2.www.google.com etc.) That said, I was a little out in the weeds on describing the attack as is, and the sysadmin blogs are supposed to be 600 words. Had to leave out some details somewhere. :)
*interjects* You would need a method of applying an address answer limit... but then surely this could also be covered by:
http://tools.ietf.org/html/rfc2827 or http://tools.ietf.org/html/bcp38
It talks primarily about forged packets; I assume that relates to DNS spoofing, or even cache poisoning? Is there a difference between the two?
Network ingress filtering requires you be "part" of the wider internet, rather than merely the equivalent of a consumer with a fat pipe. We don't have access to BGP. We have no way of seeing, processing or acting upon the internet's wider routing table. Without this, the sort of ingress filtering discussed in those documents simply isn't possible.
So what's left? Whitelisting systems manually that you want to connect to your DNS in iptables? How's that work when some of those units are mobile? Users with dynamic residential IPs, connecting from hotels or even over mobile links? What we really need is a DNS server and client infrastructure that allows for authentication of clients before they can look things up. DNS + TLS if you will. It might be time to start building something internally similar to opendns' infrastructure. I'll give it a thought.
Or, depending upon your network setup... you could use a router's or switch's iptables/netfilter (provided it supports the string match with --hex-string and --algo) to match inbound UDP packets that have the recursion-desired flag set at a certain offset. iptables/netfilter is included within most Linux and Unix distros. Zeroshell (a Linux-based router distro) may even allow you to enter raw commands to utilise this.
Wireshark is useful for finding the offset and the DNS query flags - which give you the hex string you wish to filter for... you may also apply a rate limiter using the same patterns, with the rate set appropriately.
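A sketch of what such a rule might look like. The offsets assume a standard 20-byte IP header plus 8-byte UDP header, putting the DNS flags field at byte 30 of the packet, and 0x0100 is the flags value of a standard query with the RD (recursion desired) bit set; verify both against your own traffic in Wireshark before relying on this.

```
# Illustrative only -- needs root, and offsets must be confirmed for your
# network. Rate-limit recursion-desired queries, then drop the excess.
iptables -A INPUT -p udp --dport 53 \
  -m string --algo bm --hex-string '|0100|' --from 30 --to 32 \
  -m limit --limit 10/second --limit-burst 20 -j ACCEPT
iptables -A INPUT -p udp --dport 53 \
  -m string --algo bm --hex-string '|0100|' --from 30 --to 32 -j DROP
```

Note this matches a byte pattern, not parsed DNS, so fragmented or EDNS-unusual packets can evade or false-positive it; it's a blunt instrument compared to proper response rate limiting.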
Not quite right -
----snip----
If someone asks your server "where is www.google.com" a whole bunch of times then your server starts flooding google.com's DNS servers.
----snip----
The correct statement is:
If someone asks your server "where is www.google.com" a whole bunch of times while spoofing a source address of [one of spamhaus's external IPs] then your server starts flooding spamhaus's external IP address with large DNS replies. Local caching means nothing.
Then spamhaus blacklists your IP address
Then all of your email firewall's requests to spamhaus start being blocked
Then you can't evaluate incoming email traffic against spamhaus' database
Then you start letting spam in.
THEN since this is a DDOS attack from many improperly configured DNS servers, spamhaus' servers go offline.
This is a DNS amplification attack because small amounts of DNS specific traffic from one group of attackers to a single DNS server results in large amounts of traffic to the victim.
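The "small in, large out" arithmetic can be sketched in two lines. The byte counts below are assumptions for illustration (a short spoofed query versus a large DNSSEC-signed or ANY response), not measured figures from this attack:

```shell
# Back-of-envelope amplification factor: the attacker spends QUERY_BYTES
# per spoofed packet; the victim receives RESPONSE_BYTES per reply.
QUERY_BYTES=64        # assumed size of a small DNS query
RESPONSE_BYTES=3000   # assumed size of a large signed/ANY response
echo "amplification: $((RESPONSE_BYTES / QUERY_BYTES))x"
```

So every megabit the attackers send turns into tens of megabits arriving at the victim, multiplied again across every open resolver they can find.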
So are you saying that TTL, expiry and any cache - including an EDNS0 cache timeout - are redundant and have no effect? If that is the case, caches may as well not exist...
If that is the case, I also think a cached response shouldn't have its own flag assigned to it?
To be honest, I wouldn't lose sleep over accidentally running a DDoS on Spamhaus. Everyone in the industry has been frustrated by them at some point in the past, and frankly, they are pretty much getting what they deserved.
There are plenty of other organisations who provide the same service but with less attitude.
Whenever I set up a new network/DNS zone, one of the first things I do is to configure the external version of the zone as MASTER on the edge DNS server (similar to your scrubber). However, my ACLs prevent external access from the Internet to DNS except by my ISP's DNS servers. I then configure (or request configuration - if the ISP is still in the dark ages) the zone on the ISP DNS servers as SLAVE zones with matching SLAVE entries on my MASTER. The domain's ICANN registered servers are then configured as the ISP's DNS servers. This serves several purposes:
1) All external DNS requests go to the ISP's "properly configured", high throughput DNS servers
2) If my edge server needs to go down for maintenance it doesn't take external DNS offline.
3) The network admin maintains operational control of the domain and can do all the updates locally on the edge server
4) The edge DNS server's IP address is never published as a DNS server for the domain
5) The edge DNS server only handles zone transfers/updates to the ISP's DNS servers while maintaining its MASTER status.
6) Edge devices on the local network can do local-external and recursive lookups on the ISP's DNS servers while internal devices use internal DNS servers (especially when using private addressing).
I ALWAYS use a completely separate set of internal DNS servers and MASTER/SLAVE zones for internal authoritative access and recursive lookups - which also gives me the ability to blacklist bad domains there.
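The hidden-master arrangement described above might look something like this in named.conf. All names and addresses here are made-up placeholders (RFC 5737 documentation ranges), not the commenter's actual configuration:

```
// Edge (hidden master) server -- illustrative values only
options {
    allow-query { localnets; };                        // no general Internet queries
    allow-transfer { 198.51.100.10; 198.51.100.11; };  // ISP slave servers only
    also-notify { 198.51.100.10; 198.51.100.11; };     // push updates promptly
};
zone "example.com" {
    type master;
    file "master/example.com.zone";
};
```

The ISP's servers carry matching slave zones and are the only ones listed in the registry, so the edge box never appears in public NS records.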
The "fix" is easy.
In the general options you set "allow-query { localnets; };" (plus any networks you think should be allowed to make general recursive queries)
Then in each zone you add "allow-query { any; };"
Problem solved. You won't send answers for domains you're not authoritative for, except to explicitly defined networks
It's not fucking rocket science, it's not hard and above all, it was what I was recommending 15 years ago to keep leeches off of DNS servers. You don't need DNSSEC or any of the other bullshit to reduce the nuisance factor of an open DNS server.
Additional hacks to rate limit responses have been published. These and DNSSEC help a bit, but not as much as the simple (in most cases 1 line) config change above.
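As a config sketch of that one-line change (zone name and network are placeholders):

```
options {
    // recursion answered only for networks we trust
    allow-query { localnets; 192.0.2.0/24; };
};
zone "example.com" {
    type master;
    file "example.com.zone";
    allow-query { any; };    // but authoritative answers for anyone
};
```

The zone-level allow-query overrides the global one, so the server stays a public authoritative server while refusing to act as an open recursive resolver.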
And from Preventia:
Whilst I appreciate this is a known problem, it seems to be an area of increasing risk: only in the last few days there was the largest DDoS attack to date. This 300 Gbit/s attack is making 'the internet' shake. The internet will fail before Prolexic would (we are a virtual second internet for our customers). With 800 Gbit/s attack bandwidth available (and in the process of tripling that), Prolexic are the ONLY service/solution/product in the world that could handle an attack of this scale and duration.
Actual human being alerted to suspicious behavior
Auto throttling of bandwidth cutting in even before a human response.
I think if they had, the answer would be "quite a lot better than what actually happened."
Hopefully this will have given various sysadmins a wake up call to review their configurations and tighten up their procedures (unless the proverbial PHB puts their foot down and insists it cannot be changed because it would inconvenience the CEO).
This presumes some of them even realized they were involved of course.
Yes they are. Wide open is the default setting for BIND. Even DJB and MS were wide open last time I looked.
It's the same mentality which STILL treats any zero-padded IPv4 addresses in zonefiles as octal, despite the RFC explicitly stating that IPv4 addresses are dotted decimals. I got royally flamed when I pointed that particular "issue" out 18 years ago and asked that the RFC or the software be altered for consistency (given they were written by the same person, it didn't seem to be an unreasonable request). Not long after that, spammers started using dotted and long hex/binary/octal/decimal URLs in spam (it took filter authors to nail that down; BIND is still open to that abuse).
"Yes they are. Wide open is the default setting for BIND. Even DJB and MS were wide open last time I looked"
I was thinking of the human alerting on exceptional behaviour, and the auto throttling until the cause was investigated.
That's part of his configuration.
You can capture just dns requests from a dns server itself using a capture filter, such as this one:
"<CONNECTIONTYPE> host <GATEWAYMAC> and src net <LOCALNET/CIDR> or not src net <LOCALNET/CIDR> and port 53" (optionally adding "and udp", and changing the port if configured differently)
of course you can specify destinations respectively, if you're doing this further upstream by using:
host <IP> or net <IPRANGE/CIDR> or mask <netmask> if its over multiple subnets
Which will capture all requests and responses to and from... Heres where it gets difficult:
You would just need to apply filters to this, using pattern matching for distinguishing characteristics but there may be need for utilising comparisons within the filters.
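A concrete tcpdump rendering of the capture described above. Interface and network are placeholders (RFC 5737 documentation range), and live capture needs root:

```
# Illustrative only: capture DNS queries leaving the local network
# 192.0.2.0/24 on interface eth0.
tcpdump -ni eth0 'udp port 53 and src net 192.0.2.0/24'
```

From there you can count query names or source addresses offline to spot the repetitive patterns that distinguish amplification traffic from legitimate lookups.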
Last year I decided to switch my DNS pointers from the hosting service I have (GoDaddy) to my own. Alas, I forgot that while ns2 had the "recursion disabled by default" setting, ns1 *didn't*.
2 weeks later, I check out my bandwidth usage and notice that it's waaay off chart. Monkeying with iptables, I was able to pinpoint the extra traffic to port 53. Firing up tcpdump gave me a zillion DNS requests for some weird domain, which itself was pointing to CloudFlare as well. Ouch! I outright blocked port 53, sending it to DROP. I even switched the DNS order ... but I did notice that ns2 didn't get the zillion requests. So I went on checking and finally found out about both the open recursion configuration, and the default config switch. Even after securing my ns1 BIND, I still had to leave port 53 blocked on my main DNS 'till the request flood died out. It cost me a lot in bandwidth that month, but lesson learned...
It's not rocket science, I described the correct configuration for AusCERT back in 1999 in response to DDoS we were seeing then. (Modify the "bogon" list for the newer "end of IPv4, so let's use every Class A possible" list of bogon networks.) See AL-1999.004 at http://www.auscert.org.au/render.html?it=80