love that built-to-fail mantra
I haven't had to rebuild or reinstall a single server since moving out of EC2. After two years of using EC2, I can say it was by far the most frustrating experience of my career.
I'd wager more than 98% of operators out there are not equipped to handle such a built-to-fail system, because they have never needed to operate one themselves: they just repair and keep going, or, in the VM world, move the VMs to another system (perhaps automatically).
This sort of behaviour can be somewhat of a culture shock. Contrary to common belief, it actually takes more skill (far more) to operate effectively in EC2 than to operate on your own infrastructure (or on another cloud provider that utilizes more intelligent infrastructure).
I loved this article from El Reg:
It was (and still is) a masterpiece.
I knew EC2 was going to be a nightmare even before I started using it; it turned out to be far worse than I expected.
Built to fail makes sense at very large scale, but makes no sense (too much overhead) at smaller scales. Most places are small scale (say, 500-1k servers or fewer). Amazon should stop trying to shoehorn its operating model onto its customers. But I'm not holding my breath on that one.
I've worked at nothing but internet-facing software companies for the past 10 years, and nobody has ever really built anything (software-wise) that could live in a built-to-fail environment (e.g. with no single points of failure). Companies would rather focus on developing features for their customers than worry about making the application resilient to failure. And the automation required to handle such situations gracefully is extremely complex.