A power cut in London's Park Royal area has knocked out half the servers in a Rackspace UK data centre. The web server hosting firm is telling customers it is "working to gain further information and updates will be provided in due course". About 10 per cent of Rackspace's UK customers, with servers housed at the company's LonB …
Pity it didn't take down The Register with it lol
"Managed hosting"? A managed mess, more like. You might as well use 1and1, save yourself the cash, and be safe in the knowledge that you're getting what you paid for, rather than getting far less and a false sense of security!
1and1: The thing with offshore hosting is that if your main customer base is local to you, then you have added a failure point to your plan: international peering. This was ably demonstrated already this month when a peering issue developed between a certain massive UK ISP and Germany. 1and1 were also known to be telling customers that the ISP was blocking them, compounding the confusion many web users felt... a tad personal, methinks. The fact is that almost everyone in Germany and neighbouring countries was unreachable from the ISP's network for some 12 hours.
Anyway, this sort of thing is inevitable every now and then. Current thinking does not seem to extend to hosting in 2 different locations for redundancy. OK, so it'd increase running costs, but one could then certainly offer 99.99% uptime SLAs and meet them, thus protecting reputation and offsetting penalty costs...
Just a thought..
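To put a rough number on the two-location idea, here is a minimal sketch. It assumes the two sites fail independently and that failover between them is instant, neither of which is guaranteed in practice; the function name and the 99% per-site figure are purely illustrative.

```python
def combined_availability(site_availabilities):
    """Chance that at least one site is up, ASSUMING independent failures."""
    all_down = 1.0
    for availability in site_availabilities:
        # Probability that this site is down at any given moment.
        all_down *= 1.0 - availability
    return 1.0 - all_down


# Two hypothetical sites, each up 99% of the time.
two_sites = combined_availability([0.99, 0.99])
print(f"{two_sites:.4%}")
```

Under those assumptions two 99% sites come out at roughly 99.99% combined. A shared fault, such as one upstream power feed or one network core serving both sites, breaks the independence assumption and lowers the real figure.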
I've used Rackspace for three years now and have got servers in both London facilities. On the whole their mantra of Fanatical Service holds true, I've been extremely pleased with them.
Anyone but a fool knows that a 100% network uptime guarantee doesn't mean that the network will be up 100% of the time. It just means that they will compensate you if it isn't. Compensation is small for a single server, but given the number of servers they have it gives them a huge incentive to get things up and running again as quickly as possible (65 minutes today). The 1&1s of this world have no such incentive and might well have taken days to get back up and running.
A sensible person also keeps hot spare servers in a geographically separate data centre just in case ....
On the whole I still think it's worth paying the extra.
What I found great is that this article was the first one on the Register homepage and the Rackspace advert was prominently to the right of it.
Odd isn't it...
You hear about all these amazing hosting companies with backup systems, but when it comes to the crunch, it never seems to work.
I have personal experience of one on the south coast of the UK who boasted of wonderful backup systems. When the power cut finally came, the UPS system kicked in, the generators kicked in, and everything was fine for a while...
... until someone noticed that the air conditioning wasn't connected to the generators, and the servers slowly cooked up and started throwing fits in their 19" ovens.
Yeah, but 1and1 costs a LOT less, and come on, having a UPS that works and a power generator that they can get running isn't exactly difficult. They should have a backup backup generator, and if they want to get their credibility back then they'd better go buy one soon!
And they might as well sack whoever checks the generator!
Node4 in Birmingham are very good, and despite their price tag they offer a pretty decent service.
No backup power?
Exactly how can you run a data center for other people without some sort of backup power? Even my own microscopic co-location company has an automatic generator and a UPS to keep the machines up until the generator comes up to speed.
If I ever get more customers I feel that I'll have to invest in a second generator simply for redundancy. How can a "real" data center get away with this? Don't their customers have a guaranteed uptime agreement (with penalties)?
Speaking as somebody who hasn't had a single sleepless night or major incident since my move to Rackspace 2 years ago, I'd have to say that it IS worth the extra!
I've had hosting with a couple of other managed hosting providers, and they've all had outages which have caused me major headaches (shouting customers, etc), but I've breathed a hell of a lot easier since we moved.
Rackspace have had a couple of incidents recently, but they all seem to have been resolved in a very timely fashion. This kind of a thing is going to happen (to expect 100% uptime is extremely foolish IMHO), but as Dr Who says, Rackspace do have the incentive to make the extra effort, and it shows.
According to Rackspace technical support half of the servers in "LONB" lost power, but the other half were without networking due to the same power fault affecting the network equipment for the whole facility:
"As a result of the power outage, the Lon B generators kicked in, however a further UPS issue caused several segments to lose power. Within one of these segments we also housed our internal network services and this effectively cut all connectivity to the Lon B facility."
Whilst the network issues were fixed by 15:59, the power failure occurred at 15:18 and power was restored at 16:54. That is not sixty-five minutes - more like 96. They also claim that everything has been tested now:
"Utility power has been fully restored to our Lon B facility. Power is working as normal, all UPS's are back online and the generators and auto start function have been fully tested. Our engineers are continuing to monitor the facility to ensure that it remains stable. We thank you for your support."
not the first time
There was a power outage in this location about 2 months ago.
I do not know if Rackspace was impacted. Anyway, as far as I know, Rackspace does not own the facility, so they have no control over the UPS / generators.
I didn't think 100% reliability was possible...
I always thought we aimed for the five 9s? 99.999% reliability. :-)
Five '9's not really appropriate for rented web servers.
The 2, 3, 4 and 5 '9's are a baseline SLA tool designed to show how much downtime a system can tolerate before it is considered to affect the purpose of the system. For example, in the case of a banking system, that is how long before the cost of bringing the system up is lower than the money you are losing, which in most cases is a very, very short time indeed, so we aim for the 5 nines.
In the case of rented hosting, it's probably reasonable to suggest that a longer period of downtime should elapse before compensation payments are triggered.
The downtime is worked out roughly like this:
One nine     90.0%      36 days 12 hours
Two nines    99.0%      87 hours 36 minutes
Three nines  99.9%      8 hours 46 minutes
Four nines   99.99%     52 minutes 34 seconds
Five nines   99.999%    5 minutes 15 seconds
Six nines    99.9999%   31.5 seconds
So, over a year, 99% means that about 3.65 days of downtime is expected. A hosting company that offers you an SLA of 99% is saying that you can't really complain until the overall downtime in the last 12 months reaches 3.65 days.
A 99.9% SLA is about 8 hours 46 minutes in a year, and so on.
Five '9's is only 5 minutes or so. I think it would be unreasonable to ask a low-cost shared hosting company to aim for a five '9's target, so we should probably be sceptical of those that claim they can.
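For anyone who wants to check the downtime figures in the table, they can be reproduced with a few lines. This is only a sketch: it assumes a 365-day year, and the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def allowed_downtime_seconds(nines):
    """Yearly downtime budget for N nines (2 -> 99%, 5 -> 99.999%)."""
    return SECONDS_PER_YEAR / 10 ** nines


def format_duration(seconds):
    """Split a number of seconds into days, hours, minutes and seconds."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{int(days)}d {int(hours)}h {int(minutes)}m {secs:.1f}s"


# Print the yearly budget for one to six nines.
for nines in range(1, 7):
    print(nines, format_duration(allowed_downtime_seconds(nines)))
```

Two nines comes out at 3 days 15 hours 36 minutes, which is the same 87 hours 36 minutes (3.65 days) quoted above.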
Uniquely, Rackspace offer a 100% Infrastructure Availability Guarantee, which suggests they pay out for any time your site or server is not available. That definitely sounds fanatical, and I actually find that promise more believable than some company that throws in the term 99.999% uptime.
However, I am not a Rackspace customer, but I would be interested to know if any of their customers received credits or cold hard cash as a result of today's power outage.
I am thinking
that maybe these places need to make their own power, like steel mills. It won't solve all the problems, but it would mitigate quite a few. Hey, if electricity is your highest expense, then controlling its cost must be in your interest.
@David Bell RE: Uh..
I was told something different by the data center manager for a nameless colo provider housed in said Equinix (formerly IXEurope) facility in Park Royal. I was told that the blame fell on the shoulders of Equinix, who apparently provide the power infrastructure for the facility. I was also told that 1 out of 3 of the supposedly N+1 redundant generators failed to start, and for some unknown reason the other 2 generators were not able to feed the area of the datacenter that had an outage.
Five nines (99.999%) uptime (or 5 minutes per year) is really the best anyone can hope for. I did a fair amount of work in high availability clusters and stuff, and although it's theoretically possible to get better, in practice you still have application restart or failover time of a few seconds for each server or application crash. Those seconds will add up over a year.
Hardware fails. Software sucks. Power is unreliable. Murphy is alive and well and living in YOUR computer room. Good luck getting better than "five nines".
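A rough way to see how quickly those seconds add up is to count how many failovers fit inside a yearly downtime budget. This is only a sketch: the 10-second failover time is a hypothetical figure, a 365-day year is assumed, and the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def failovers_within_budget(nines, failover_seconds):
    """Whole failovers of a given length that fit in a year's downtime budget."""
    budget = SECONDS_PER_YEAR / 10 ** nines
    return int(budget // failover_seconds)


# Hypothetical 10-second failover per crash.
print(failovers_within_budget(5, 10))  # five nines
print(failovers_within_budget(6, 10))  # six nines
```

With those numbers, five nines leaves room for about 31 such failovers a year and six nines for only 3, before counting any other kind of outage.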
100% just possible
"I didn't think 100% reliability was possible..."
"I always thought we aimed for the five 9s? 99.999% reliability. :-)"
99.999% reliability...to the nearest whole percent is 100%
Rackspace have been having a few high profile problems recently, and I have been quite impressed by how they handled them. Circumstances beyond their control have caused the trouble, and they are learning from the experiences, to ensure it cannot happen again. We have servers in both areas that have been recently affected (Dallas and London), and our Account manager, along with the rest of the staff, have been nothing but helpful.
To me, that's what Fanatical support means. That when the s**t does hit the fan, they don't just leave us in the dark.
Actions speak louder than words
As has been seen countless times over recent weeks with reg-123, fasthosts, 1&1 etc. what matters is the support when something does happen, how quickly it is fixed, how competent those people making the decisions are, how honest the company is when communicating the problem, how they communicate, actions taken to try and prevent future issues and compensation.
On most of the above I would guess Rackspace come out on top, unlike most of the others mentioned. I don't use them but I've not heard anything bad about them.
I'm not a great fan of compensation; I'd rather they focus on providing the best service they can rather than having the idea at the back of their mind that customers will be happy because they get compensated. It's usually a trivial amount compared to the problems caused by the outage anyway, and that money would be better invested in ensuring problems don't happen again.
No system will ever be 100% available and there are always things outside your control and things that don't work as they should.
As a couple of examples, I've had power loss because of a lightning strike on the mains supply during a storm, which blew the main trip on one of the main phases in the data centre. Because of this, the N+1 UPS was disabled and therefore the generators didn't kick in.
Similarly, there was once a small component issue at a data centre causing smoke to be emitted, and I think it took a trip out. The alarms went off and the data centre had to be evacuated even though the engineers knew what it was. Once the fire brigade had given the all clear, the trip could be reset, restoring power, and then the faulty kit could be fixed.
"Pity it didn't take down The Register with it lol"
The Register is hosted in Lon 1, their datacentre by Heathrow, or it was a year ago....
LonB is just a small facility of white box clone towers.. nothing meaty lives there :)
Rackspace customers have been told compensation details will follow shortly. Email was sent out proactively.
We are torn with Rackspace. It is expensive and they completely buggered up an upgrade we did recently. Been thinking about switching to Easynet.