415 posts • joined 15 Oct 2008
Cleversafe? You must be joking. They lost all credibility when they disappeared their earlier open source implementation from their website. And even if they hadn't, dispersed data storage only works when you have an incredibly fast interconnect between the nodes, which demolishes most of the use cases they were touting for it.
For real-world use GlusterFS is a far more sensible solution.
"I guess IBM is talking about a similar thing to ZFS where it only rebuilds used blocks on the disc."
While that works when the FS is mostly empty or contains only large files in large blocks, rebuilding a mostly full vdev full of small files can take much longer than rebuilding a traditional RAID because you go from linear read/write to largely random read/write (150MB/s linear speed vs 120 IOPS which could be 480KB/s on 4KB blocks). Then again, if your RAID is under load during the rebuild you are going to end up in the random IOPS limit anyway.
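As a back-of-envelope sketch of that arithmetic (the throughput and IOPS figures are the round numbers assumed above; the 1 TB of small-file data is a hypothetical example):

```python
# Back-of-envelope resilver arithmetic, using the round figures from the post;
# the 1 TB of small-file data is a hypothetical example.
SEQ_MB_S = 150.0   # sequential (linear) rebuild throughput of one disk
IOPS = 120         # random IOPS of a typical spinning disk
BLOCK_KB = 4       # block size when rebuilding lots of small files

random_kb_s = IOPS * BLOCK_KB                # 480 KB/s effective throughput
slowdown = (SEQ_MB_S * 1024) / random_kb_s   # linear vs random ratio

data_gb = 1000  # hypothetical amount of data to resilver
seq_hours = data_gb * 1024 / SEQ_MB_S / 3600
rand_hours = seq_hours * slowdown

print(f"random rebuild is {slowdown:.0f}x slower: "
      f"{seq_hours:.1f} h vs {rand_hours:.0f} h for 1 TB")
```

On these assumed figures a fully random rebuild is over 300x slower than a linear one, which is why a mostly full vdev of small files can take days where a traditional RAID rebuild takes hours.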
You do know Nvidia released the source for their Tegra GPU drivers, right? And they had done so months ago.
Re: Add another layer of indirection...
"Not possible if you are already running under a hypervisor..."
I guess you haven't noticed that several hypervisors have been shipping for years with support for nested virtualization.
Re: This way the virt did fly!
"Our testing showed only about a 9% maximum performance drop versus native tin"
On what RDBMS/OS/hypervisor? Throughput of carefully tuned MySQL on Linux under ESXi at full saturation (100% CPU usage with at least twice as many concurrent active threads as there are CPU cores, hot pre-primed caches, read-only) drops by about 40% when you run virtualized on the same hardware with full hardware virtualization support. KVM is a little worse, Xen is a little less bad, but the difference is in the low single-figure % points.
Did your comparison involve pushing the server to the limit with a highly concurrent load or were you just measuring latency under low load?
Re: Real world experience "does not count"
I have no idea what Hitachi's support is like (never needed to use it), but Seagate's RMA process is very smooth and problem free. I can vouch for it as I have exercised it very extensively.
Re: Curse of the bad model
Unfortunately it's not about a single unreliable model - it's about multiple consecutive unreliable generations. Backblaze figures actually excluded some Seagate models because the failure rate was so ridiculously high that nobody would have taken the results seriously.
The other problem is that most manufacturers are having bad models and even lines increasingly frequently. WD green drives with their aggressive spin-down leading to vastly premature spindle motor and bearing failures are such an example, not to mention firmware bugs that lead to gems such as pending sectors that are either not found by extended self-tests (WD, Seagate) or are located at sector numbers that exceed the LBA limit without HPA enabled (Samsung).
Does that mean the drive encrypts everything by default and just throws away the key to "erase" the data?
Re: ARM needs standards
Funny you should say that. A standard is exactly what has been established recently. At around the same time AMD announced their "maybe available to developers in the second half of this year" Opteron A series, there was also an announcement of a standard way (broadly similar to BIOS/EFI on x86) for all the hardware on ARM to be presented to the software layer (e.g. memory maps, I/O ranges, etc.).
Re: The other way around?
That is pretty much the size of it. I wasted a number of days getting various ATI cards to work fully before I eventually caved in and bought a Quadro 2000 for testing. As if by magic, everything started to "just work". Modifying Nvidia cards isn't too difficult if you just want an Nvidia card that works in a VM.
Re: Data storage for shared systems
Indeed, ZFS is very much the way forward.
Just FYI - you can do PCI passthrough on ESXi as well, and many people have successfully gotten it to work with modified Nvidia cards.
This, however, is probably not a particularly suitable solution because you would need another machine (e.g. a laptop) to run the VM management tools from, whereas if you run KVM or Xen the management can be done from the local machine.
Re: The other way around?
I'm not sure what features KVM has and supports, I use Xen because it is far more mature and performs considerably better.
Recently the Xen guys have been working on adding an additional reset method - bus reset. This may or may not make it into the Xen 4.5 release, and I'm pretty sure it isn't going to be in the upcoming 4.4 release, so you are looking at at least 6-12 months before the feature is in the release branch and available pre-packaged for your distro. That is a long time to be hanging on for something that might, but is not proven to, solve the problem. The Nvidia solution works perfectly now.
It is also not the only issue I have had with Radeons - there are many others. For example, the XP drivers are utterly broken: you cannot glue together multiple monitors into a spanning resolution above 3200x1600, which is completely useless when I need to stitch together two 1920x2400 stripes to get my IBM T221 to work properly. There are other issues as well that I am not going to go into now since they are off topic, but suffice it to say that Nvidia suffers from none of those problems.
I would strongly advise you to stick with proven hardware and software. Anything that is bleeding edge and unproven is going to put you at a very high risk of running into bugs and regressions in hardware, firmware and software. This is another reason why I strongly recommend you get one of the Citrix approved workstations for VGA passthrough. In terms of software, something like EL6 + Xen RPMs from the source I mentioned is a good, stable, proven choice, and since you aren't going to get RH support for anything involving Xen you might as well go with CentOS or Scientific Linux, or see if you can get a vaguely reasonably support package on XenServer (based on CentOS).
Re: It's nice to dream...
Actually, you can pass through the GPU better than the CPU. Unlike the CPU, which you virtualize, you can pass the GPU device completely to a VM. Inside the VM, the GPU driver loads as it would on bare metal, provided you have a GPU whose drivers work virtualized: ATI technically works but is too broken to be usable in a serious way; Nvidia Quadro cards work without any issues, as do GeForce cards modified into corresponding Quadros; but unmodified GeForce cards don't work because the driver has a whitelist of PCI device IDs which it will allow to boot up virtualized.
Trust me on this - I am typing this on a triple-seat virtualized rig, of which 2 seats are for heavy-duty gaming (with modified GTX780Tis) , and the 3rd is my Linux workstation.
Dual Booting Same Instance of Windows native and Virtualized
This generally doesn't work particularly well. All the underlying drivers will be different, and it is akin to replacing the motherboard with a completely different one - Windows doesn't handle this gracefully at all, and you will more often than not find that instead of greeting you with the login screen it will greet you with a BSOD when it finds that the paravirtualized SCSI controller it was expecting to find its C: partition on doesn't exist on bare metal.
Maybe he just doesn't have the space (he already said he has no desk space for a 2nd keyboard). Seriously, creating a setup like this is not difficult if you have hardware that isn't buggy. I have a 12-core (24-thread) physical machine, with two VMs given 4 cores (8 threads) each, and I can still run big software builds in Linux dom0 while having two L4D2 or Borderlands 2 gaming sessions on the go on the same physical machine.
Re: The other way around?
There are not one but two options for passing through USB devices in Xen. You can use PCI passthrough to pass the USB controller through, or you can use USB passthrough to pass a specific device through. The former is usually a little more efficient, but the latter is more flexible (e.g. if multiple ports are on the same USB hub and you need to pass different USB devices on the same PCI USB controller to different VMs). For example, I have 2 VMs with a mouse/keyboard passed to each one via PCI passthrough, and it works extremely well.
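As a sketch, the PCI passthrough variant amounts to a couple of lines in the domU config (the PCI addresses below are made up - substitute the ones lspci reports on your box):

```
# Hypothetical example - pass a USB controller and a GPU (plus its audio
# function) to this domU via PCI passthrough.
pci = [ '0000:00:1a.0',   # USB controller carrying this seat's keyboard/mouse
        '0000:02:00.0',   # GPU
        '0000:02:00.1' ]  # GPU's HDMI audio function
```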
Re: Apologies in advance since this is going to be long
Forgot to mention - RH only support KVM, so you are out of luck support-wise with Xen. KVM support for PCI passthrough is nowhere nearly as mature as Xen's, so your chances of success with KVM may be diminished. If you really want support with Xen, you can probably get something from Citrix for XenServer (which recently went free / open source, and the most recent version is based on CentOS 6, i.e. EL6).
Forget BTRFS - it doesn't matter whose distro you use, it is not going to make a turd into a diamond. If you want a similar FS that works, use ZFS (look at the ZoL port). I wouldn't touch BTRFS with a barge pole. If you use ZFS, you can put your VM's disk block device on a ZVOL and get a lot of advantages such as performance overhead free snapshots. Again, this is the setup that I use on my similar system.
Finally - you would probably be a lot better off asking a question like this on the Xen users mailing list rather than here.
Apologies in advance since this is going to be long
I have been running a setup like this for the past year or so. It is pretty easy if you get the right hardware. It is frustrating to the extreme if you have buggy hardware. When it works, it works fantastically well.
The setup I use is triple-seat: EVGA SR-2 with 96GB of RAM, dual 6-core Xeons, 3 GPUs, 3 monitors, 3 mice, 3 keyboards. The EVGA SR-2 was a terrible choice - it uses Nvidia NF200 PCIe bridges which have broken IOMMU support. I had to write a patch for Xen to work around it, but now it works like a dream. If you are not averse to spending more on hardware I would strongly advise you to buy one of the (bare-bones) HP or Dell machines certified by Citrix for VGA passthrough use and build it up from there. Having a reasonably bug-free motherboard is essential if you want it to "just work".
I use EL6, with the Xen and kernel rpm packages from http://xen.crc.id.au/.
If you get a machine with on-board graphics, use that for your host (dom0). Once you have configured it all you can just not have a console plugged into it. Alternatively invest into whatever cheap GPU you can get your hands on for dom0 - it won't matter much what you use. My advice would be to get something like an Nvidia GeForce 8400GS since it is passively cooled.
For your domUs don't even bother with ATI - they work right up to the point where you need to reboot domU, and then the whole host will need rebooting.
Go with Nvidia. A Quadro 2000, 5000, 6000, K2000, K5000 or K6000 work beautifully, but if you are aiming for something more than a Quadro 2000 they are ridiculously expensive. Instead you can modify a GeForce card. My advice would be to pick one of the following three, according to your performance requirements:
GTX480 and modify it into a Quadro 6000. This requires a minor firmware patch, no hardware modification required. Details on how to modify it are here.
GTX680 and modify it into a Tesla K10. This requires a simple hardware modification, removing one resistor off the back of the PCB. Make sure you get a non-Gainward model (they exhibit a weird limitation in what video modes they can display in domU - my modified Gainward 680 and 690 only work in SL-DVI modes. My MSI 680 works fine with DL-DVI mode).
GTX780Ti and modify it into a Quadro K6000. This requires a hardware modification, adding one resistor across specific two pins on the EEPROM. This mod is easy to reverse, but requires taking off the heatsink which on most models means voiding the warranty.
For details on how to carry out the hardware modifications on the Kepler series cards (680, 780) see the thread on the forum here: http://www.eevblog.com/forum/chat/hacking-nvidia-cards-into-their-professional-counterparts/
Whether the Nvidia card will work in a domU is purely down to the whitelist hard-coded into the driver, specifying which device IDs to initialize if the driver detects that it is running on a virtualized machine. The modifications described above simply modify the device ID, which makes the driver agree to initialize the card in domU.
Other than that, my setup is exactly like what you describe - pass specific PCI devices (USB, GPU, audio) to domU and it should all just work. With Nvidia GPUs you can reboot the domU as many times as you like and it will work fine. The only thing you will not get is the VM's BIOS POST and loading splash screen, but as soon as it goes into the GUI, you will get output on the monitor and it will work fine from there. As I said I run a triple setup, with two modified 780Tis for two virtual gaming rigs and they work beautifully even at 4K resolutions. The 3rd console is dom0 Linux.
If you are running Linux on Linux you might as well just use VServer (or LXC or OpenVZ). No need for full virtualization sucking away the performance.
I'd just like to point out there are several ARM boards available with SATA ports, just off the top of my head: the recent SheevaPlug, GuruPlug and DreamPlug, TonidoPlug2, CompuLab SBC-A510, Cornfed i.MXQ board, higher end SolidRun CuBox-i models, and the TrimSlice (note: its SATA port actually runs via a built-in USB->SATA adapter because CompuLab couldn't get PCIe SATA working properly on Tegra2 for some reason).
On the point of how long it takes - I'm pretty sure it would take me substantially less time to set up a DreamPlug with Linux from scratch to act as a DHCP, DNS and NTP server than it would take to install Windows Server and configure the services on an even remotely similarly specced Atom machine. The process for the DreamPlug would be:
1) Extract rootfs to disk (one-liner)
2) Copy modules/firmware to rootfs and the kernel/initrd to bootfs (2 lines)
3) Configure uboot to tell it where to get the kernel from and tell the kernel where the rootfs is (2 lines)
4) Reboot into new OS
5) yum install dhcpd bind bind-utils ntp ntpdate
6) dhcpd would need 3-4 lines in the config to tell it the IP range, DNS servers and gateway to serve, plus any specifics for fancier things, e.g. assigning specific IPs to specific MACs. NTP should need no extra configuration, and Bind comes configured by default to act as a caching name server.
All of that would take little more time than it did to type this message.
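To illustrate step 6, a minimal dhcpd.conf might look something like this (all addresses and names are made-up examples):

```
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.200;        # IP range to serve
    option routers 192.168.1.1;               # gateway
    option domain-name-servers 192.168.1.1;   # DNS (the DreamPlug itself)
}
# Fancier things, e.g. pinning a specific IP to a specific MAC:
host nas { hardware ethernet 00:11:22:33:44:55; fixed-address 192.168.1.10; }
```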
Re: High performance ARM
How are you meaningfully comparing revenue between Oracle and a FOSS package?
Also how are you comparing the "volume" between MS SQL and a freely downloadable FOSS database? If we are comparing the number of downloads of MySQL (and don't forget to include the distro ISOs rather than just the deb/rpm/tgz as well as summing the totals for MySQL, Percona and MariaDB) with the number of copies of MS SQL shipped/downloaded from MS I suspect that the "volume" of MySQL deployments is some orders of magnitude bigger than MS SQL.
If you are going to compare statistics then make sure those statistics are meaningful in the context of all the items in the comparison.
This was discussed to some extent back when the soft-float/hard-float issue on ARM was being considered. In the end, the conclusion was that rather than coming up with a dynamic linker bodge that could handle both and rolling fat binaries that contained the payload for both - thus taking up double the memory (or requiring awkward strapping that would purge the non-executable payload from memory after mmap()-ing it) - it was more sensible to have a clean break. And you know what? This isn't such a big deal because the vast majority of software people run on ARM is FOSS.
The question is really why would you want a single hugely bloated executable that runs on multiple platforms? What problem does it solve that isn't much better solved by just having different binaries? You have to compile it once for each platform before gluing them all together anyway, so what does it actually gain you that shipping 4 different binaries (or a zip file, or shell archive containing all 4) wouldn't do just as well?
Re: "SUSE / Redhat for Linux stuff"
AArch64 has not been in any official Fedora build thus far. Last time I checked there were only preview developer builds, because there were still a few important packages that were broken on aarch64 due to package and toolchain bugs.
GCC took quite a while (a year or two) after ARMv7 hardware was easily available before the hard-float support was working sufficiently well to be suitable for full distro building. I would not be surprised if ARMv8 support at toolchain level takes a similar length of time to become sufficiently stable after easy general availability of hardware.
Re: High performance ARM
It's funny you should say that your vendor only supports "enterprise class" DBs and then mention Oracle - when Oracle now owns MySQL. Oracle DBs don't even feature in the top 10% of the biggest databases by size or throughput I have seen in the field; every one of those has been MySQL. Oracle is down there in the noise with PostgreSQL. I'm not saying that PostgreSQL isn't as good a database - far from it; I'm merely saying it doesn't seem to be as popular.
I'll let Lew answer the exact cost part (I had no idea he was reading the reg forums :)). I had already mentioned that seeing a product without a price tag was offputting.
Thankfully, for once the lack of a price tag does not equal a lack of a product.
Re: High performance ARM
Have you heard of ARM's big.LITTLE architecture? It gives you 4 Cortex A15 CPU cores and 4 Cortex A7 CPU cores. The A7 cores are a little less powerful, but a LOT less power hungry, so depending on the load the system is under, it will use one or the other set of cores (or all 8 cores on the recent Exynos CPU systems such as the Arndale board). So ARM already provides this kind of functionality on a single slab of silicon.
As for your apps - if they are based on closed, proprietary technologies you are indeed screwed, but that was always going to be the case, starting with the cheque you write to the vendor for the licences every year.
Re: Database file compatibility
You can already do this with both MySQL and PostgreSQL. The on-disk formats and wire protocols are exactly the same, completely compatible, and endianness independent. I do this sort of thing all the time.
I spoke to them this week and my motherboard is in the post. Definitely not vaporware, unlike the AMD offering. Don't get me wrong - what AMD announced looks _great_. I would love a powerful 8-core ARM (as in, more powerful than Arndale) with 128GB of RAM (mock build everything on tmpfs!) to replace my build farm of Sheev/Guru/Dream Plugs. Unfortunately, I cannot wait for another 6 months for that to become available - I need to get my EL7 ARM build going at a decent pace _now_.
Both Debian/Ubuntu and Fedora/EL based distributions now have their fully featured ARM variants, so for a typical Linux shop running a LAMP or similar stack there really is nothing to port - it has all already been done. I have worked with several clients whose LAMP(ish) stack projects migrated without any problems at all on the ARM variant of the same stack.
The biggest obstacle to adoption, IMO, has been the lack of good, standards compliant ARM motherboards - and by this I mainly mean *TX form factor. The random-slab-of-PCB form factor of the vast majority of ARM machines is a huge pain in the backside, as is the frequent lack of SATA ports for any more serious use. AMD's recent noise about the A1100 Opterons is welcome, but the announcement was vastly premature - I spoke to them earlier this week and was told that the earliest these might be available to developers is the 2nd half of this year, which is not what you would expect after the announcement fanfare; call me old fashioned, but when they talk about something the way they do I expect to be able to get my hands on one within a week or two at most.
Having said all that, there does seem to be one ARM offering that is head and shoulders above the rest in terms of standards compliance (*TX form factor, SATA), specification (4-core Freescale A9, 4GB of RAM) and cost (~£220 for the motherboard which includes the CPU, RAM and PSU, cheaper than comparable x86 Atom based offerings). To put the price into perspective, this is only about 2x the cost of a SheevaPlug, but has about 7x the CPU performance and 8x the RAM, plus the huge added *TX convenience. Best of all - you can buy one _today_. Google cornfed servers and you should be able to find it.
Re: VMware CEO ...
"VMware CEO Pat Gelsinger last year opined that x86 would still be the data centre's CPU of choice even if ARM silicon consumed no electricity whatsoever."
The key point to me is that this is coming from the CEO of a company that doesn't have a port of any of their products for ARM. Of course he would say that - he doesn't want to have to invest a fortune into porting their product.
Xen, OTOH, already has an ARM port, and the thinner virtualization methods such as VServer already run on ARM.
Re: My experience
My experience is similar, with nearly 300% failure rate during the warranty period on Seagates.
WD and Samsung exhibit more worrying "features", though, such as seemingly either lying about their reallocated sectors or reusing them; both possibilities are bad. (Observed by the pending sector counts disappearing on overwrite but reallocated sector counts not increasing from 0.)
Hitachis being least unreliable of the lot also tallies up with my own experience, although that is over low hundreds of drives rather than many thousands as per the study in the article.
Monopoly on Jobs
No pun intended in the title, but this sounds essentially like a monopoly on jobs. Has it really come to this? Wow...
"So the bet was that compilers are better placed to decide scheduling between multiple units in advance than a traditional superscalar scheduler is as the program runs. The job of organising superscalar dispatches has become quite complicated as CPUs have gained extra execution pipes so why not do it in advance?"
The problem is that most compilers are really crap, and even when they aren't, most developers aren't educated enough to take advantage of them.
See here for an example of just how much difference a decent compiler makes:
A similar problem was faced back in the day of the Pentium 4, which had a reputation of being a very poor performer, when in fact, with a decent compiler, it outperformed the Pentium 3 by a considerable margin.
Or to put it differently - it wasn't a crap processor, it was the software developers that were too incompetent to use it properly.
Server + SAN in one = DAS + LVM?
Or at least that is what it sounds like to me - consolidating a SAN and a VM host. Maybe this is a "new paradigm" for the virtualization hype types, but having read the article twice, the only thing that is "new" is that the disks are pre-configured (handy for those only up to the task of accessing them by clicking on pretty icons in the GUI).
Is this seriously intended to be seen as fundamentally different to running a Xen/KVM VM host full of disks, running FreeNAS in dom0?
Re: Wrong thread???
@Bronek - I guess you didn't pick up on the fact that I even quoted the fragments of the article I was specifically referring to.
If the whole point of RedHat is a support contract, then how come CentOS, Scientific Linux and RedSleeve have thriving communities? Last time I checked Facebook uses CentOS, rather than RedHat, as do many, many other very large companies with thousands of servers.
People run RH clones for reasons of familiarity and stability (as in, no chasing of continuously teleporting goal posts), and many companies want to do things that aren't supported by RH and/or have good enough sysadmins in-house that they don't need vendor support.
"Canonical is all right, but where is Red Hat? We were too early."
Not RH per se, but a 3rd party build based on RH sources, RedSleeve, is probably close enough if you don't need a support contract, and it's been around for quite a while now.
"and will likely want some kind of software ecosystem to be present as well."
Pretty much all packages used by most distros today build reasonably cleanly on ARM these days.
All AV software eats pretty much all available CPU + 10% as soon as you start opening files. I tend to use ClamWin + ClamSentinel (mainly because it's open-source, having used AVG, Avira, and half a dozen others in the past). Having AV running typically makes "patch Tuesday" a case of leaving the machine installing updates overnight.
Windows has no PV kernel available. All it has available is PV I/O drivers. This is not the same as a fully PV kernel.
From the article: "The instances mandate the use of Linux, rather than Windows, as they only support hardware virtualization (HVM) machine images"
This is erroneous. HVM can support any guest OS. It is the more performant PV mode that requires a PV aware kernel in the guest, which rules out Windows.
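The distinction boils down to a couple of lines in the Xen guest config. A sketch (the device paths and file names here are hypothetical):

```
# HVM guest - full hardware virtualization, boots any OS including Windows:
builder = 'hvm'
disk    = [ 'phy:/dev/vg0/guest,hda,w' ]

# PV guest - needs a PV-aware kernel supplied to the guest, so no Windows:
# kernel  = '/boot/vmlinuz-xen-guest'
# ramdisk = '/boot/initrd-xen-guest'
# extra   = 'root=/dev/xvda1'
```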
Re: put an Aakash in the hands of every school kid in the country in 5-7 years
I think you are missing the point. Support for ARMv6 and earlier was dropped for a lot of things, including the Firefox browser. It doesn't matter that you have Android 2.2 or 2.3 which might technically be supported - if it's running on an ARMv6 it is NOT supported regardless of the Android version.
What is to say that ARMv7 isn't going to be similarly dropped in a couple of years time as the 64-bit ARMv8 devices start to flood the market and everything older than that is neglected because it isn't what "all the cool developers work on"?
As for older versions of Android and app compatibility - only a couple of years ago the market was flooded with Android 1.6 devices, typically cheap Chinese slates and those had issues running current apps even back then when they were new. While compatibility has improved between more recent versions of Android, compatibility between 1.6 and 2.x was very poor.
Re: put an Aakash in the hands of every school kid in the country in 5-7 years
"I am sure that this will be able to run up-to-date web browsers for the next 5-7 years,"
How do you figure that? The first Android device was released in late 2008, about 5 years ago. No Android device older than maybe 2 years will run a recent browser, and many Android devices released in 2013 will not run Chrome or Firefox either, since they seem to require ARMv7 (the ZTE Blade and Skate were ARMv6).
Other apps follow a similar pattern.
So based on track record of Android progression and compatibility, the chances of the majority of mainstream apps made in the next 5-7 years being able to run on _any_ Android device sold today don't look that great. The apps' backward compatibility with older versions of Android has never been all that great.
Re: Not another smart phone?
@ted - it sounds like a HTC Universal wouldn't be miles off from what you're describing (apart from the fact it's not ruggedized).
Let's hope this doesn't end up as yet another proprietary ARM GPU with no open source driver support, like PowerVR and Adreno. Mali open source efforts saw pitifully little support from ARM, which necessitated a reverse engineering effort that led to the as yet incomplete Lima driver implementation.
Ironically, in a twist few could have seen coming, Nvidia opened up their accelerated Tegra driver, thus making Tegra the only complete open source accelerated GPU solution on ARM. It's a shame that Tegra didn't get a mention in the article, especially considering it is in many ways superior to the competition, both technologically and in terms of open source friendliness.
"£125m had been spent on software code and a further £27m on software licences."
£125m on software development isn't that outrageous for a system of this scale that has to be as bulletproof in terms of security as possible.
£27m on software licences is, however, completely ridiculous. Whatever happened with the recent noises made about using open source where possible/appropriate?
Re: Any drive
375PB isn't enough for you? Really? What are you using to generate that much transient data? For argument's sake, I have a medium load server (moderate MySQL load, Apache, several websites of various descriptions, mail, etc.) that has been running for a couple of years, and according to dumpe2fs, it has averaged 800GB of writes per year. That gives a life expectancy of a little under 470,000 years (give or take a few centuries).
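The arithmetic is straightforward (decimal units assumed throughout):

```python
# Sanity check of the endurance arithmetic: 375 PB of rated write endurance
# divided by the ~800 GB/year write rate reported by dumpe2fs.
ENDURANCE_BYTES = 375e15      # 375 PB
WRITES_PER_YEAR = 800e9       # 800 GB/year

years = ENDURANCE_BYTES / WRITES_PER_YEAR
print(f"{years:,.0f} years")  # prints "468,750 years"
```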
Anybody telling you that write endurance is an issue today either doesn't have a clue what they are talking about, or they are trying to sell you something (or both).
Re: iptables is your friend
I didn't change the point at all - if you read the iptables rules I proposed you will see that it was specifically about throttling connections to port 25 to 3/minute. The elaboration on the TTL part for mitigating source IP spoofing DoS vector was there to clarify why that part is included at all, but it was never the primary point of the exercise. If you'd read and understood the iptables rules that would have been clear to you.
If the OP's problem is that he is getting spam floods from a small number of IPs, then the iptables rules listed will alleviate the resource pressure on the MTA by dropping the TCP connections to port 25 from any source IP that has exceeded 3 new connection attempts in any 60 second period. The OP's problem is that his mail server is running out of resources due to excessive SMTP spam traffic. Dropping a substantial chunk of that traffic via iptables at TCP before the packet ever gets as far as the SMTP layer in the stack will reduce pressure on the MTA and help the server not run out of resources.
A similar technique can also be used to mitigate brute force dictionary attacks on any service (simply tune the port number, connection attempts, and time period variables to suit), within reason (obviously it won't help as much against a distributed attack via a botnet with lots of IPs). The point here is that the OP didn't specify whether this was a heavily distributed flood or a flood from a small number of IPs. Without any indication to the contrary, I assumed (correctly or otherwise) that this was caused by a relatively small number of IPs. Still, even if it is a heavily distributed attack, you can whittle away a lot of traffic by using a method like this - potentially enough to make the difference between a usable system and a complete DoS.
Re: iptables is your friend
You're missing the point - if the SMTP traffic is coming from one server (or a small number of servers), that server's connectivity would get limited at TCP level. Connections over the 3 new connections/minute limit would get dropped at TCP level, which means they would never deliver SMTP payload. Processing SMTP payload is expensive, while dropping TCP connections is nearly free. So if he were to drop most of the offending traffic before it ever got past TCP layer to SMTP layer, the server wouldn't get overloaded.
Re: iptables is your friend
The TTL consideration is to stop a malicious party from performing a DoS where they try to fool the throttling into thinking that an IP has exceeded its new-connections limit. In classic Alice-and-Bob terms:
Alice periodically sends email to Bob. Bob throttles new connections from everyone to 3 per minute. Trudy spoofs Alice's source address and starts issuing new connections to Bob. Trudy doesn't care that the 3-way handshake never completes, because all she is trying to do is fool Bob's throttling system into thinking Alice has opened too many connections in the given time period. Thus, if Alice then tries to send email to Bob, Bob's throttling system will deny Alice's legitimate connection. If we add the TTL check, and if Trudy's distance from Bob is different from Alice's (pretty decent chance), then Bob's server will not treat this as the same connection (it sees a connection source as an (IP, TTL) tuple, rather than just an IP), and this will allow Alice's connection despite Trudy's DoS attempt.
Of course Trudy can try to figure out what Alice's TTL to Bob is and attack that harder, but that's an extra hurdle to cross, and we are not really looking at fighting an escalating arms race, just to provide a good enough first-pass solution that covers the basic defence.
The iptables throttling fixes the issue by making no server able to open more than 3 (or some arbitrary number - up to the OP to decide what they want to set it to, but 3 is probably a good start for a personal mail server where very high bandwidth mailing lists aren't used) TCP connections to port 25 per minute. If it wants to send more than that, it'll have to time out, wait and retry later. Since the MTA won't see any subsequent connection attempts from the source server in that minute, it won't be chewing through all the resources it is chewing through at the moment (because it doesn't have to accept the connection or receive the payload to analyze it). Therefore the little SheevaPlug won't be DoSed and will continue working as its owner requires. The only caveat is that the occasional email from a relatively high bandwidth mailing list might get delayed once in a while.
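For reference, rules of the kind being discussed look roughly like this, using the recent match (shown in iptables-restore format; the 3-new-connections-per-60-seconds limit matches the discussion above, and --rttl is what ties an entry to the sender's TTL as described):

```
# Drop a new SMTP connection if this source IP (at this TTL) has already
# opened 3 in the last 60 seconds; otherwise record it and accept.
-A INPUT -p tcp --dport 25 -m state --state NEW -m recent --name smtp --update --seconds 60 --hitcount 4 --rttl -j DROP
-A INPUT -p tcp --dport 25 -m state --state NEW -m recent --name smtp --set -j ACCEPT
```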