One titsup server kills Brit-hosted Donhost websites for THREE DAYS
A fault in just one server at Brit web hosting biz Donhost took out thousands of websites and emails for more than three days. The service slowly found its feet again on Wednesday afternoon after the company officially confirmed it fell over on Monday at 7am. Affected customers posting in a help forum put the start of the …
Were the thousands of websites that failed
on the one server that fell over?
Re: Were the thousands of websites that failed
Could be a storage server which acts as storage for multiple front end servers.
Someone from the Don might know.
Re: Were the thousands of websites that failed
"We have created a new server to host the sites and services on and our system administrators are currently restoring all sites from our fail safe backups."
Odd, then, that they are restoring sites from backups. Because that means that there's only a single server hosting storage for EVERY website, and that doesn't have a redundant spare, a failover, etc. - just a restorable backup. There's little point farming all your storage out for every front-end server if you're then not going to have a usable replacement/failover with up-to-date data.
Years ago I used to use WebFusion for a variety of things. Nowadays, I'm loath to touch it. They are more expensive and less capable than they ever were.
Re: Were the thousands of websites that failed
From what I know of their systems, I doubt it's ALL their hosting, probably just this set.
Re: Were the thousands of websites that failed
"Could be a storage server which acts as storage for multiple front end servers. Someone from the Don might know."
Netcraft says "F5 BIG-IP Apache/2.2.3 CentOS"
My guess (given they're suggesting customers move to a different brand) would be it's an old-style shared hosting server which is backed up but doesn't have multiple redundant servers. Would also explain how the server took out mail and web.
Maximum incompetence
Not much else to say, really.
Maybe once they're laid off, they can find a job at Microsoft, managing Azure, they'd fit right in.
Re: Maximum incompetence
You bad boy!!!
I just spewed coffee all over my monitor and keyboard.
Was it a windows server?
I think we need to know, so we can decide whether to throw our comments at the software, or the admins?
Re: Was it a windows server?
No, you don't.
It's the admins. A failure like this always is. Failures can be mitigated in Windows, Unix, and Unix-like OSes.
Re: Was it a windows server?
I have to agree, this is a damagement issue.
Re: Was it a windows server?
It was Linux / Apache so most likely it was hacked....
Next killer app...
Looks like the next killer app will be one to remind people to renew their SSL certs!
Clarification from Webfusion ..
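For what it's worth, such a reminder is not much code. Below is a minimal Python sketch, not anything Donhost or Webfusion is known to run: the function names (`days_left`, `needs_renewal`, `fetch_not_after`) and the 30-day warning window are assumptions made up for illustration. It uses only the standard `ssl` and `socket` modules.

```python
import socket
import ssl
import time

DAY = 86400  # seconds in a day


def days_left(not_after, now=None):
    """Whole days between `now` (epoch seconds) and a cert's notAfter string."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // DAY)


def needs_renewal(not_after, warn_days=30, now=None):
    """True once the cert is within `warn_days` of expiry (threshold is an assumption)."""
    return days_left(not_after, now) <= warn_days


def fetch_not_after(host, port=443, timeout=5.0):
    """Fetch the notAfter field of a live server's certificate.

    Note: with the default context an already-expired cert fails
    verification and raises ssl.SSLCertVerificationError instead.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run from cron against a list of hostnames, something like `needs_renewal(fetch_not_after("example.com"))` would be enough to trigger a nagging email.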
How long does it take to swap out the hardware and restore from last night's backups? I would suspect that most of the delay was in finding someone technical enough to do the job, as they fired most/all of the technical staff - to save money.
"It was a technical failure that resulted in a complete system overhaul, which regrettably took longer than we had anticipated"
That's totally cleared up the issue for me.
One of our VPS hosted "servers" fell over at the same time. While the service status page said there was a network issue, their helpline ("we are VERY busy at the moment", yet answered immediately) said a similar thing had befallen it: a disk failure, with the server being restored from backup. What puzzles me is that their VPSes are supposed to be RAID-5. Don't they keep an eye on the disks?
Additionally, said VPS is down again tonight.
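On "keeping an eye on the disks": nothing in the thread says what kind of RAID these VPS hosts use, so the following is only a sketch under the assumption of Linux software RAID (md), where `/proc/mdstat` shows a status string such as `[UUU_]` with `_` marking a missing member. The function name `degraded_arrays` is made up for illustration; hardware RAID controllers would need their vendor's tools instead.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status string shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # e.g. "md0"
            continue
        # The status looks like [UUUU] when healthy, [UUU_] when a member is gone.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None  # one status line per array
    return degraded


if __name__ == "__main__":
    with open("/proc/mdstat") as fh:
        print(degraded_arrays(fh.read()))
```

A RAID-5 array survives one failed disk, so an alert at the first `_` leaves time to swap the drive before a second failure forces a restore from backup.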
Used to use Donhost...
...and didn't mind that there were cheaper alternatives (dedicated server) - I always got good UK based support and don't mind paying. Example: server died 10 pm one xmas day and was swapped out and restored within a couple of hours.
When the server was looking overdue for replacement they were keen to move me to Webfusion, and so far so good, but this event is worrying. So is the fact that the support often seems to be off-shore, and I've been disappointed with "experts" who try to fob me off with incorrect diagnoses until I prove to them that I'm right and they're wrong.
There would have been less chance of downtime if it were a Linux server.
Erm, but it is a Linux Server (CentOS)
Website hacking / defacement statistics show that Linux is the worst OS to use... Most secure is Windows Server and OpenBSD!
And so the nightmare continues...
First, server 50 went down early yesterday, and there's still some way to go before full services are restored - more than 30 hours later. Second, server 51 was down for over 4 hours today - and never even mentioned in the server status pages...
At least they are providing a very consistent service - of utter ineptness and bare-faced lies!
