Might be tempting fate, but...
I have stuff hosted by them, and I've not noticed any interruption in emails coming in. Just checked, and websites appear to be up and running. Maybe this fault is only affecting part of the data centre?
Nottingham-based web hosting company Heart Internet has gone TITSUP* due to a power outage at what it claims is "one of the most efficient and resilient data centres in Europe". Aggrieved customers have been in touch with The Register to claim Heart had initially blamed the issue on a DDoS attack, before the company …
Heart Internet was bought by Host Europe Group a couple of years ago, so although their offices are in Nottingham they use HEG's Leeds data centre, along with 123-reg, who were also affected by the outage.
90% of their data probably does go via London but they do peer at ixLeeds.
Yeah, I spent some time this morning recovering databases from backup as the sudden power interruption wasn't particularly appreciated by InnoDB.
They did a reasonable job of updating their webhostingstatus page, but I can't help thinking a cursory notification email would have been nice once they were aware there was an issue.
I found it strange that it appeared to be down earlier on, then apparently came back on, then went down *again* (much more permanently) after that. The original DDOS claim made more sense in that context.
Anyway, yeah. Our database got ****** up as well. I'm incredibly glad I'd made a backup of all the databases (for genuinely coincidental and unrelated reasons) the previous evening, so we only lost a day's worth of data instead of several weeks.
Spent a while wondering why the VPS itself seemed to be up yet the MySQL connection wasn't (since it connected via localhost on the same server), then realising that mysqld was constantly dying, *then* realising that this was because the database was corrupt due, probably, to the power-off.
Eventually got the database in a repaired-enough-state to make a mysqldump backup, but the whole caboodle (i.e. the underlying InnoDB storage, not just the database content) was still obviously corrupt and it was approaching midnight, so I wiped and reinstalled from the backup anyway.
Nice way to spend an evening (ahem). The fact that even this was the "lucky/good" version of the scenario says a lot, but it did have the benefit of teaching me more about MySQL...!
At any rate, I'm scheduling regular backups from now on. :-O
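For anyone else in the same boat, here is a minimal sketch of what a scheduled dump might look like. It only builds the mysqldump command line and works out which old dumps to prune; the output directory, file naming and retention count are my assumptions, not anything Heart or the poster above uses, and you would still need cron (or similar) to run it.

```python
from datetime import datetime
from pathlib import PurePosixPath


def backup_command(now: datetime, out_dir: str = "/var/backups/mysql"):
    """Return (argv, target_path) for one dated dump of every database.

    out_dir and the file naming scheme are illustrative assumptions.
    """
    target = PurePosixPath(out_dir) / f"all-databases-{now:%Y%m%d-%H%M%S}.sql"
    argv = [
        "mysqldump",
        "--all-databases",        # dump every schema, not just one
        "--single-transaction",   # consistent snapshot for InnoDB tables
        f"--result-file={target}",
    ]
    return argv, str(target)


def dumps_to_prune(filenames, keep=7):
    """Return the oldest dump names, leaving the newest `keep` in place.

    Relies on the timestamped names sorting chronologically.
    """
    ordered = sorted(filenames)
    return ordered[:-keep] if keep > 0 else ordered
```

The argv list can be handed to subprocess.run() from a nightly job. Copying the resulting file off the server is the part that actually matters, since a dump sitting on the same corrupt disk is not much of a backup.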
The pilots flew a fully functioning plane into the ocean because of a faulty sensor, believing it and ignoring everything that indicated otherwise.
At least nobody got killed here but my experience of failures is you keep one spectacled eye on the manual and the other scepticalled eye on whether you really are where you are told you are. The report does suggest the procedure rather than the fault was the problem.
The first law of Disaster Recovery is to treat any Disaster Recovery Plan as having a flaw and you need to spot it before it pearly gates you.
"Is this in any way related to the article?! Almost thought this comment had been transferred from somewhere else!"
The parallel is that a faulty sensor caused them to initiate a faulty procedure that likely caused an unnecessary catastrophic failure of a data centre when the power was OK anyway. But apart from that and not double-checking the cause - you are right, nothing whatsoever.
Sorry, I come from an age when computers ran on valves and failure was omnipresent and you had a lot of practice coping and sorting. These days the very reliability means disaster recovery is rarely tested in reality. And simulated failures are never quite the same. That's why you don't implicitly trust procedure. It's a help, not a master.
"The pilots flew a fully functioning plane into the ocean because of a faulty sensor, believing it and ignoring everything that indicated otherwise."
Two out of three airspeed sensors failed simultaneously, out of a group of three. The two which failed were identical devices which failed identically. A third dissimilar one continued to function normally, but was outvoted by the failed ones.
There was a known weather-related problem with the particular model of (failed) sensors in use on that particular model of aircraft, with an Airworthiness Directive (?) already in place to remedy it.
The "two identical simultaneous failures in three sensors" is one of a number of "must never happen" circumstances in the AF447 picture. Fixing any one of them would probably have led to a different end result.
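The outvoting point is easier to see in a toy model. This is a sketch of a generic two-out-of-three voter, not a description of how any real avionics system works; the tolerance and the readings are made-up numbers.

```python
def vote(readings, tolerance=5.0):
    """Toy 2-of-3 voter: return the mean of the first pair of readings
    that agree within `tolerance`, or None if no two readings agree.

    Purely illustrative; real voting logic is far more involved.
    """
    a, b, c = readings
    for x, y in ((a, b), (a, c), (b, c)):
        if abs(x - y) <= tolerance:
            return (x + y) / 2
    return None


# One sensor fails: the two healthy ones outvote it.
single_failure = vote([270.0, 271.0, 60.0])

# Two identical sensors fail identically: the healthy one is the "outlier".
common_mode_failure = vote([60.0, 60.0, 272.0])
```

In the second case the voter confidently returns the wrong answer, which is why identical devices failing in an identical way defeats the redundancy. The same applies to a DR setup where primary and backup share a power feed or a procedure.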
"The first law of Disaster Recovery is to treat any Disaster Recovery Plan as having a flaw and you need to spot it before it pearly gates you."
The "two identical simultaneous failures in three sensors" is one of a number of "must never happen" circumstances in the AF447 picture. Fixing any one of them would probably have led to a different end result.
The sensors iced up for around one minute, causing the autopilot to disconnect. The pilots panicked, stalled the aircraft, and ignored the stall warnings until the plane flew into the ocean at 200mph. The main thing to fix was pilot training. Air France does not have a great record; over the last decades the only airline with a greater number of passengers killed in crashes is Aeroflot.
And they flew *into* bad weather (which caused the airspeed failure) rather than round it. And when the crew in the cockpit got out of their depth, they didn't *immediately* seek assistance from the senior captain who was "resting" at the time. Coincidentally those are arguably training related too.
Gross oversimplification, obviously, but *lots* of things went fatally wrong, and fixing any one of them would probably have changed the outcome.
This illustrates the value of having proper procedures and of actually following them. Just like a proper disaster recovery - expertise, procedures, details.
with an Airworthiness Directive (?) already in place to remedy it.
There was not. EASA ATA 34 is dated the 31st of August 2009, with the AF447 incident occurring two months earlier.
Fixing any one of them would probably have led to a different end result.
Nevertheless, the stated procedure in the Flight Manual for the condition in which they found themselves - and which the CVR records them discussing - is to fly straight-and-level at nominal cruise power for 60 seconds. Had they followed this procedure, the issue would have been resolved. But they didn't.
Once they were into the full stall, standard procedure - which is drilled in from very early on in a pilot's career - is to drop the nose. PF actually tried to do this, somewhat tentatively. But PNF had the stick hard back, when he should not have been touching it at all. Thus the aircraft remained at high AoA, and the stall continued all the way down.
TL;DR: although the Thales pitot tubes fitted to that aircraft were a bit crap, the aircraft was downed not by equipment but by the flight crew failing to fly the aircraft. They thought it was unstallable. It's reminiscent of the Titanic being unsinkable...
My apologies Vic, I may have got my Airworthiness Directives confused - there were plenty of ADs and manufacturer recommendations on this subject:
"When it was introduced in 1994, the Airbus A330 was equipped with pitot tubes, part number 0851GR, manufactured by Goodrich Sensors and Integrated Systems. A 2001 Airworthiness Directive required these to be replaced with either a later Goodrich design, part number 0851HL, or with pitot tubes made by Thales, part number C16195AA. Air France chose to equip its fleet with the Thales pitot tubes. In September 2007, Airbus recommended that Thales C16195AA pitot tubes should be replaced by Thales model C16195BA to address the problem of water ingress that had been observed. Since it was not an Airworthiness Directive, the guidelines allow the operator to apply the recommendations at its discretion. Air France implemented the change on its A320 fleet where the incidents of water ingress were observed and decided to do so in its A330/340 fleet only when failures started to occur in May 2008."
https://en.wikipedia.org/wiki/Air_France_Flight_447 (lost on 1 June 2009)
The Airworthiness Directive you reference did indeed come a couple of months after AF447, but as you'll see from the above, the Pitot tubes had plenty of previous.
Multiple pilot errors from multiple pilots were indeed a very important factor. But one of several important ones.
as you'll see from the above, the Pitot tubes had plenty of previous.
The pitots in question weren't exactly top-notch, but as you state, Airbus suggested they be changed. Had this been a safety issue, an AD would have been issued, and the change would have been enforced. This is an upgrade, not a safety recall...
Multiple pilot errors from multiple pilots were indeed a very important factor. But one of several important ones
The plane was completely flyable with the problems that occurred. *I* could have flown it to safety. What crashed this aircraft was the pilots - and PNF in particular - not doing very standard stuff when it was required. Had PNF not been holding the stick back - when he shouldn't even have had his hand on it - PF would have recovered from the stall. At that point, everything would have been back to normal.
It's really not useful to try to blame kit here; this was pilot error, and any mitigation we might try to draw simply obfuscates the root cause of the crash, thus rendering it more likely to reoccur. I don't think anyone wants that.
Anon because, well just because ...
I know what it's like when you tell manglement that we just need to do this and this, like this and ... oops it all goes quiet. I also know what it's like when you switch it all back on again and that essential box that you don't normally touch because it ... just works ... now refuses to power up again - and you finally get to find out what it does, or rather, now doesn't do.
But having just moved our own website out to them because otherwise it goes off when we lose our connectivity (or power, or ...) - we then find our website goes off because the hosting people lost power. The irony wasn't lost on us.
If you are moaning about losing business, where is your redundancy? Where is your DR plan? If you are reliant on your website & email for your business you should have thought about what happens when it goes TITSUP, as it will.
No service is perfect and electrical issues like this aren't that uncommon. When power goes down unexpectedly things break and things get corrupted. There will be a limited number of staff running round trying to fix everything. Some things will require an expert to fix. Some things will take a long time as they may need to resource extra hardware, recover backups or rebuild huge arrays.
So take this opportunity to sit back and make a plan for next time, preferably make sure you have your own local backups, perhaps replicate your operation elsewhere so you can switch over in the event of an issue and then have a plan when that doesn't work either.
Just changing provider won't solve this issue as it can quite easily happen somewhere else.
If you haven't got a plan for what's happened then you probably haven't got a plan for when your office gets flooded or burnt down (many flooded this winter and 3 local businesses near us burnt down to the ground recently due to an electrical fault).
It's amazing how many businesses make the same complaints when they lose their internet connection, sometimes for days (remember the London telecoms fire?), especially when they rely on VOIP or SKYPE.
A National business based in Hull lost its offices due to flooding last winter. They had a plan and were back up and running elsewhere the next morning and turned over £1m using 45 laptops the following day.
In summary don't put all your eggs in one basket!
Small businesses don't always understand IT. I've seen plenty of small companies set up with the computer systems just put in by some guy they know who knows computers. All very well, but usually the home experts don't think in terms of business uptime and continuity, and backups are covered by buying a removable hard disk from PC World.
Yeah, I know it's not really an excuse for the business owner (they should know everything about running a business, yada yada), but some people genuinely do not know enough for it to even occur to them, or believe the promises sold to them enough to believe all will be good. Companies that have started off because someone had a good idea for a local business or service sometimes end up becoming a company more through accrued sales suddenly making someone think "hmmn, I could run this as a business, let's do it!" They can often be on thin margins, and having interesting learning experiences about cash flow etc. as well, so it doesn't surprise me this happens.
Any company that is big enough to hire IT, hasn't, and has then cocked up: well, maybe you should look at your directors' bonuses and wonder whether some of that money would be better spent on someone who looks after that sort of thing for you.
"Small businesses don't always understand IT,"
Well then tough luck. If your entire business relies on it then you'd better make sure someone working with you DOES understand it. It's like a taxi company saying it doesn't understand cars and didn't realise that sometimes they can break down and need servicing.
I do not think you read the full post about small businesses and how they sometimes grow, which did sort of address the issues about that point.
Most businesses don't realise they need it till it goes. It's not like a taxi company forgetting about cars; that would be daft.
It's more like a company manufacturing something, making sure those plant machines are all working, and forgetting about IT, which is not actually their business but rather something that supports it. IT and emails coming in don't necessarily seem important, plus generally speaking IT systems do usually behave themselves. It's probably closer to the taxi company remembering to do structural checks on their building.
"If you are moaning about losing business where is your redundancy? Where is your DR plan? If you are reliant on your website & email for your business you should have thought about what happens when it goes TITSUP as it will."
Well some of us do have sophisticated DR plans. But, as I have posted above, they are always inadequately tested and there is always an unknown risk in activating them. I have been caught on that. Which is why, on failure, you want to know if the cause is known, being acted on and you know their best estimated time to fix with frequent updates.
Then you can take a calculated risk on whether to ride through the failure or bring up the backups. With normal DNS TTLs of 60 minutes this may not bring immediate relief, and then you have to switch back and re-sync everything. That of course is if the DNS isn't in the same DC!
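That ride-it-out-or-fail-over call can be written down as rough arithmetic. A minimal sketch, assuming the provider gives you an honest ETA and that a resolver may keep serving the old record for a full TTL; the 30-minute switch effort is a figure I have made up, and the cost of switching back afterwards is ignored.

```python
def worth_failing_over(eta_minutes, ttl_minutes=60, switch_minutes=30):
    """Return True if the provider's ETA is longer than a worst-case failover.

    Worst case assumes clients cached the old DNS record just before the
    change, so they keep hitting the dead site for one full TTL after the
    switch work is done. switch_minutes is a hypothetical estimate.
    """
    worst_case_failover = switch_minutes + ttl_minutes
    return eta_minutes > worst_case_failover
```

With those defaults a 10-minute blip is not worth acting on and a four-hour outage clearly is. The catch is that the sum only works if the ETA is believable, which is the whole complaint.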
That's one reason I don't host with Heart. I don't have confidence that I am going to hear the whole truth straightaway. I host with suppliers who still manage to have resilient status servers and will reply to tickets when they have issues. The repliers are trustworthy engineers not computer illiterate customer services who can't tell the difference between a DDoS attack and a power failure!
> Heart's gone downhill then
They got bought by 123-reg and things went downhill so fast Swiss people were banging cowbells.
We had a DS with them that used to fall off its VLAN every Monday morning when they rebooted it. So we moved. Unfortunately we forgot they still manage some of our DNS, so we briefly vanished off the internet anyway.
We do have a DR plan (based on Amazon WS, other cloud suppliers are available) but like all businesses we face a go/no-go period - the effort of invoking DR procedures might not be worth it for a 10 minute outage. My gripe is that we had no information from Heart on which to act. ETAs for recovery were, as it turns out, as accurate as Chinese GDP figures.
<quote>In summary don't put all your eggs in one basket!</quote>
DR plans are all fine and good, until the PHBs tell you that they (DR plans) do NOTHING to increase shareholder value.
One tires of banging one's head against a wall trying to get PHBs to understand the consequences of NOT having a DR plan in place; but all they see is the shrunk (year end) bonus pool.
Rarely do PHBs ever face the consequences of their shortsightedness.
Having been on the receiving end of this in the past, I can tell you that if you are a small business servicing other small businesses / no-capital start-ups, your customers sometimes don't want to pay for a DR plan. Some can be convinced it is required, but many point blank refuse. They still think DB-driven websites should cost a few hundred quid, and the idea of paying around £20 a month on top of their hosting for a reasonable offsite backup plan tends to get pushed aside / met with "you are trying to extort more money out of me".
This does not however prevent them from shovelling the blame and the stress onto you, despite what your contract says (and the very clearly worded email specifying they have elected for no ongoing support and that it is now their responsibility to maintain backups of their system going forward). Small business is a lot about customer relationships, and in these exact cases that is where they fall down.
I've had a couple of sites knocked out, which was at best annoying as I wanted to get some updates done yesterday, but given the amount I pay I'm not going to lose sleep over it. I've seen far worse screw-ups with customers that pay far more than Heart or its resellers charge. Just amazed that some of those screw-ups never made it to The Register!
Well not always. I've told the story before about when the power substation at bottom dollar Rackshack blew up and put them off grid for a week without one of their 25,000 servers ever noticing. And where a pretty good DR plan wasn't quite good enough and they had to physically rebuild it in real time as it came under increasing strain.
Sadly Head Surfer has sailed away. But my replacements who are competitive with Heart et al just ooze professionalism. If there is an issue it is explained in technical terms, no fobbing off. Not only do they fix issues quickly but go on to sort root causes in a transparent manner.
And while I have found excellent (as well as awful) low cost DC operators in the US and in continental europe - all my attempts to find a budget UK DC have ended in tears.
Something about how we value engineers?
Chant until you believe again that nothing bad can happen anymore since you're on the cloud.
Shit happens, and when it does, Cloud means there's jack all you can do about it.
I can buy that it is a freak chain of circumstances. I can even imagine that there is no single one person that is responsible - neither the contractor, or local employees, maybe even the procedure was fine and Murphy's Law was just strolling through that day.
What I cannot buy is this insane, lemming-like attitude of "let's put all we got into The Cloud because everything will be all right and don't look at the invoice, ever."
No, something will go wrong and, as anyone with Internet access can find out, there is not one single Cloud provider that has not had problems staying online for whatever reason. So yeah, if you're going to put the one thing that makes your money - your website - on the Cloud, it's up to YOU to imagine where you might be if that cloud disappears for a while.
Instead of just drinking the Kool-Aid and looking at your shiny, 99.99% Uptime guarantee.
It's The Cloud. THERE IS NO GUARANTEE.
You're preaching to the converted - I happen to think the Cloud is a stupid IT professional's idea of a clever thing to rely on. That is why we are not based on the Cloud, but as a DR option it's not bad (being able to spin up multiple cores in seconds works for me).
Actually, my biggest worry about The Cloud (other suppliers of rain are available) is that they are a hacker's Holy Grail. Why attack hundreds of separate enterprises, when you can hack into Amazon? I'm sure they have great security experts, but they have to be lucky all the time whereas the hackers have to be lucky only once.
"Cloud" is just marketing bullshit. A long time before "Cloud" existed shared hosting existed. And it had the same problems then as "Cloud" does now. First and foremost, when it goes wrong you're beholden to someone else to fix it. If you want to take responsibility for your own IT, don't give it to someone else to look after. "Cloud" or otherwise.
What's your preferred strategy? Self hosted? Co-lo with multiple providers?
There seems to be the view in management where I work that inevitably we'll reduce the self hosted kit and move it to Amazon / Azure. This in a non-Internet business where best performance of the out-dated app comes from having the servers close to the users.
Curious to know what the cloud naysayers are doing.
Cloud doesn't mean there is jack all you can do about it, unless you threw it in the cloud and didn't think about DR. That's the same whether you do it in the cloud or by other methods. Whatever method you are using, if you have thought about failure then you probably have an automated failover and simply carry on running. The advantage of cloud is you can probably add some extra processing power cheaply and quickly at your DR site to cope with the full load. Traditional methods probably mean having fully duplicated services (doubling the cost of operation).
Good job I've been at the other side of the world for this... On the other hand.... Damn my inbox is going to suck on Monday.
Also, does anyone else know the number of a less error-prone dedicated host? (They lost comms to a server bank for about 22 mins last month, so this is possibly the start of an ongoing trend.)
Actually it's a trend that started well over a decade ago. I thought that Nottingham sounded familiar and it all became clear seeing that Heart are part of Host Europe, who were based there. HE were a disaster in 2001, made no better when Pipex bought them, and sound just as flakey now. People who worked there when we had troubles told me that everything was a shambles, and when we used them we suffered several power outages, as well as someone else's hard drive, complete with recoverable data, being put back into our server as a second drive during a drive upgrade. The drive they put in was also from Windows, which is why the so-called engineers couldn't mount it on Linux and said that all data was lost - accessing the raw drive revealed the terrible reality of what they had done. Totally incompetent, and leaving them to host in the US was a breath of fresh air. As for cloud, we switched almost entirely away from bare metal to AWS and DO a few years ago, and that's been a great move too.
How can they say it's resilient now?
Because resilient doesn't mean failure-proof.
a : capable of withstanding shock without permanent deformation or rupture
b : tending to recover from or adjust easily to misfortune or change
A resilient service comes back again, just as Heart apparently has.
Has this advert paid the usual Reg advertising tariff?
I realise that the advert's presentation is a lot less offensive than many of the more intrusive ads featured on the Reg these days, but in terms of content it looks a little off: this poster has signed up today specifically to make this one all-caps advert/post.
They say 50% of adverts don't work. Usually it's hard to tell which work and which don't. Not a problem in this case.
[If mods wish to delete this and the post to which it refers, that's fine by me. As is the "leave it as is" option.]
Posting anonymously this time as I am worried Heart will take a dislike to my comments.
You can measure a service not by how well it works when everything is going fine, but by the actions taken when it goes wrong. Heart used to be fantastic, until about a year ago that is. It doesn't surprise me that we are now 40 hours on and still the service status is showing continued faults and without any idea of how long it will take to get service resumed.
From last year we started waiting longer and longer for tickets to be answered, sometimes days at a time. When we did get a reply, the answer was often confused or we might have to point out the solution and beg for that solution to be implemented.
I understand that a big organisation will not want to talk to its customers directly, but it is frustrating when you need to escalate a problem and there is no-one willing to do so, and when the staff on the front line don't have the answers.
I'm so glad we moved our sites away from Heart.
Yeah, we have two Hybrid VPS' with Heart. The one with the small database was back up fairly quickly, luckily I guess that database wasn't corrupted because of minimal size/traffic and that VPS was on a switch that got back up and running fairly quickly.
Different story with the other one: a bad switch prevented reaching it for several hours, then when I could shell in, InnoDB had taken major exception to suddenly going dark and spat the dummy. I'm not a DBA, so I raised a ticket with Heart asking them to step in and fix MySQL so I could go about restoring our database. Then their support site had a lie down and it was a further several hours before they got back to me. In that time, I learnt how to recover InnoDB and fixed MySQL myself, then proceeded to restore the database from a backup as there were tables aplenty missing :/
All in all, Cphulkd, Roundcube, Eximstats and a couple of our databases were goosed, along with some other random file system corruptions here and there.
We don't pay for Premium Hosting (although I notice that has the same SLA), so I had to go learn stuff, but if we had been paying Premium, we couldn't communicate with Heart anyway, that's what really annoyed me about the whole event. Sure things happen and you need to accommodate that, but when you can't communicate with the people and are consequently waiting to see if they're doing anything so you aren't both working the problem at the same time, it gets more than a little frustrating.
The same VPS died last year for a while too, resulting in about the same downtime as this time. I wonder how their SLA is calculated, it's more like 99.6% from where I'm standing.
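For what it's worth, the sums behind an uptime percentage are simple enough to check yourself. A minimal sketch, assuming the SLA is measured over a plain 365-day year; how Heart actually calculates theirs is not something I know.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def uptime_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Percentage of the period the service was up."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime a given uptime percentage permits over the period."""
    return period_hours * (100.0 - sla_percent) / 100.0
```

On that basis 99.6% works out at roughly 35 hours down in a year, while a 99.99% guarantee leaves under an hour, so two day-long outages in successive years sit nowhere near the headline figure.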
Maybe some virtual BBUs on those Virtual RAID cards?
Quick tip (in the absence of this critical advice from Heart). If innodb_force_recovery = 6 won't allow a mysqldump without dying, delete and re-create /var/lib/mysql, reinstall with mysql_install_db and restore from backup. Easy when you know how... no thanks to Heart for the fairly obvious tips there...
But now the file system on my other VPS has gone read only! Tried to reboot via control panel and it won't come back up at all... Assuming they don't come online by the morning, anyone got any tips on any other small biz friendly, tried & tested, not too costly, UK based VISPs, with decent support...?
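The usual approach before reaching for 6 is to start at the lowest innodb_force_recovery level and only go up as far as needed, since the MySQL documentation warns that levels of 4 and above can permanently corrupt data files. A minimal sketch of that escalation: try_dump is a hypothetical callback standing in for "set the level in my.cnf, restart mysqld, run mysqldump, report whether it finished", none of which this code does for you.

```python
def lowest_working_recovery_level(try_dump, max_level=6):
    """Return the lowest innodb_force_recovery level at which a dump
    succeeds, or None if even max_level fails.

    try_dump(level) is a hypothetical callable supplied by the caller;
    it should return True when a dump completed at that level.
    """
    for level in range(1, max_level + 1):
        if try_dump(level):
            return level
    return None
```

A result of None is the point at which wiping /var/lib/mysql and restoring from backup, as described above, becomes the only option left.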
Due to a failure with HEART INTERNET's service my servers have been corrupted. Customer websites have been broken since last Wednesday.
HEART are telling me it is now my problem and I need to fix my server myself.
Websites have been corrupted. My customers have been hit hard.
I am now left with one server which has 12 key websites DOWN , crucial to my business and a server which is not functioning.
I have raised tickets and am being fobbed off.
I spend thousands of pounds a year with Heart Internet. How have others been treated in the 'Downtime' they call it? In fact it's more than downtime it has caused so much more trouble...
Angry customers, loss of revenue for myself and customers, lost time for development work, now facing a huge mountain to re-build work etc..
This is appalling. This really needs to be flagged up as possibly one of the worst hosting issues in years.
Seriously annoyed - and still no-one calls me (despite requests to discuss the issue). No one senior emails or responds to a ticket, instead I have people who say "oh there's a problem, you'll need to fix it yourself".
Heart - this is going to be a widely known problem! I am awaiting your response.
Sorry to hear of your problems. We too have had on-going issues, but as usual when the going gets tough you are met with silence from Heart. They used to be good, but now we have to class them as a low grade service. We have been using Siteground for some of our customers. Not perfect by any measure and often very tight on resources, but for speed, reliability and service they are in a different league.
We make use of Cloudflare and so can shift a site to a new hosting service within minutes, and that is what I have had to do for some of my customers. Once the files are in place, simply change the IP in CF to the new location. Not so easy if you have SSL etc., but for the mainstream it works.