It's Cloud
What did you expect?
CRM services have been disrupted for companies all over the world after seven of Salesforce's instances went down. Salesforce said on its status update page, which was also down for a while, that service from seven of its instances was being disrupted, though at the time of publication it was claiming to have whittled this …
This is a fundamental failure of a "critical" system. Customers will rightly be demanding to know why the system didn't fail over to DR at the first sign of trouble. Still being down 10+ hours later is not acceptable for a company in Salesforce's position.
Either their DR plan was incomplete, or it wasn't tested regularly. Either way, it's a failing, and one that will lose people's trust in the product. Rather ironic that their status system is on http://TRUST.salesforce.com.
Salesforce.com uses low-end Dell servers with a RHEL cluster, with Oracle and a bunch of other software on top of it. That is not a mission-critical environment for cloud/hosting or internal use. I imagine some patch was incompatible with some other software or firmware level, which then knocked down the entire cluster... or some bug was passed across the cluster. I have seen it happen many times. It isn't that RHEL isn't stable, or that x86 servers, Oracle, VMware, etc. aren't stable. It is that, over time and updates, the entire configuration doesn't maintain an enterprise-grade level of uptime because all of those independent companies, most of which hate each other, are not testing configurations as one system.
But that's exactly what Salesforce is doing - with the results we see.
And even though they are having some issues, from what I see their reliability is above that of RIM with its proprietary Blackberry thingy, so they're not doing so bad.
Still doesn't convince me to put company data in a place where I cannot control access.
I don't think it's even about data access control. To me it's more of "would I want to get the sack because an outsourced supplier/SaaS vendor really fu*cked up?".
To those clients bleating about it - did you really expect the uptime they told you there'd be? Did you check how they were planning to achieve that? Was that expectation realistic given the problems that even the mighty Amazon has faced? So far, to me, cloud is just a shortening of "cloud cuckoo land" as that's where a lot of the companies are living.
Everyone tests the initial configuration during the build with these x86 roll-your-own stacks, but it is impossible to retest every component of the system every time any one of those half a dozen plus vendors releases a patch or firmware upgrade. The changes get pushed through test really rapidly because they are happening on a weekly basis, and eventually something gets missed. There are just too many cooks in the kitchen and too many moving parts.