Heart Internet outage... three days and counting

A number of Heart Internet customers remain unable to get online for three days thanks to a hosting failure. On 28 January, the Nottingham-based web hosting biz explained in its status update page that one of its KVM hosts (KVMhost69) had suffered a disk failure, which had forced the file-system to mount in read-only mode. " …

  1. Professor Clifton Shallot

    KVMhost69

    KVMhost69 - some kind of head alignment problem, I guess.

    1. Korev Silver badge
      Pint

      Re: KVMhost69

      For you Prof. Shallot -->

    2. big_D Silver badge

      Re: KVMhost69

      It says a lot about their disaster recovery practices!

      I thought the whole point of virtual machines is that they aren't restricted to any physical hardware. Why can't they just spin up a new machine and move the VMs over from the read-only array or recover from backups? Or just move the VMs over to other hosts with free capacity?

      My previous employer had a power outage and lost our first production server and the backup server (both mainboard failures) when the power came back 2 hours later (complete industrial estate lost power). We managed to recover all VMs from the first server onto the second server within a couple of hours and jury-rig a backup process until 2 new servers and SANs could be installed a couple of days later.

      The VMs were a little sluggish as the second production server was carrying its load and the load from the first server for a couple of days, but everything was back up and working.

      The IT manager was on his own and managed to get the infrastructure back online in a few hours. Surely an ISP, whose job it is to provide IaaS, must have some sort of disaster recovery procedure in place to deal with such problems? I mean, that is their bread and butter after all, they are not the underfunded IT "department" of a manufacturing concern...

      Edit: I posted this against the next thread... :-S

  2. Alan J. Wylie

    failed drives in the Raid-10 array

    Plural! If it's not very bad luck, how long had the array been running degraded?

    1. Adam 52 Silver badge

      A long time ago I learnt the hard way that if you buy a set of identical drives and subject them to an identical workload then they all fail at the same time.

      1. Anonymous Coward
        Anonymous Coward

        and they fail even quicker when a company uses cheaper consumer grade disks instead of disks designed for 24/7 use in a data centre environment...

        1. Alan Brown Silver badge

          "they fail even quicker when a company uses cheaper consumer grade disks "

          A lot of stats show that if anything the more expensive drives fail faster. It was certainly the case for all our scsi-UW drives.

          There's at least one filesystem out there which works on the basis of "Disks are crap. Deal with it" - where loss of a drive or two isn't a big deal, vs systems with expensive raid systems and expensive disks that don't get adequate supervision and where loss of a drive is a performance-sapping event.

          In any case, any outfit which doesn't have monitoring set up to send out a distress call when a RAID drive dies isn't fit for hosting other people's VMs.
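
To illustrate the kind of check that comment has in mind, here is a minimal sketch, assuming Linux software RAID (md) and the status format found in /proc/mdstat. Nothing in the thread says which RAID implementation Heart uses, so the function name `degraded_arrays` and the sample text are hypothetical, and a hardware controller would need its vendor's own tooling instead.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array stanza starts with a line such as "md0 : active raid10 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with e.g. "[4/3] [U_UU]"; "_" marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Hypothetical sample in the /proc/mdstat layout: md0 has lost one of four members.
SAMPLE = """\
Personalities : [raid10]
md0 : active raid10 sdd1[3] sdc1[2] sda1[0]
      1953260544 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]

md1 : active raid10 sdh1[3] sdg1[2] sdf1[1] sde1[0]
      1953260544 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

In practice the text would be read from /proc/mdstat on a schedule and a non-empty result would trigger the "distress call" (email, pager, whatever the outfit uses); that alerting step is left out of the sketch.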

    2. CrazyOldCatMan Silver badge

      how long had the array been running degraded?

      Was also my first thought. So, either they have had an uncommon run of bad luck (which can be mitigated by doing things like not having all the drives in the array from the same production batch) or their monitoring and supply arrangements are nothing short of shocking.

      I've added them to the reasonably-long list of "companies with which to not do business"..

  3. Anonymous Coward
    Anonymous Coward

    Status page shows that two kvmhosts had issues. How many more are going to fail due to Heart's inability to maintain its own servers properly? Surely their data centre team have received notifications of a degraded array? Assuming they actually have a data centre team and they haven't all been poached by a rival hosting company...

    Obviously they learnt nothing from the last two major incidents they had in 2016 and 2017, or are they aiming to have an incident every year as some kind of twisted anniversary gift?

  4. Anonymous Coward
    Anonymous Coward

    glad I moved

    Previous employer used to use a Heart server and I convinced them before moving that it was high time to get an upgrade. Managed to get everything moved before all these incidents really kicked off.

    1. Adam 52 Silver badge

      Re: glad I moved

      Why would anyone pay Heart £15/month when you can get something better from AWS for $10/month?

    2. Ken Moorhouse Silver badge

      Re: Managed to get everything moved

      A Heart Bypass Operation.

  5. Ken Moorhouse Silver badge
    Coat

    KVM Host Failure

    Couldn't they have just bought another monitor from PC World?

  6. Black Rat
    Joke

    KVM host failure

    Because you can't blame mum for unplugging the server so she could hoover your room

  7. Anonymous Coward
    Anonymous Coward

    #Metoo

    I've lost a client over this, which is really saddening.

    Never been quite so frustrated and upset with their service in over 5 years

    1. Dominion

      Re: #Metoo

      Every time there's been a buyout, Heart -> Host Europe -> GoDaddy the level of service has degraded. Another week and I'll be off them for good.

      1. Anonymous Coward
        Anonymous Coward

        Re: #Metoo

        I can see myself leaving very soon too :/

      2. CrazyOldCatMan Silver badge

        Re: #Metoo

        Every time there's been a buyout, Heart -> Host Europe -> GoDaddy

        Same happens across the industry (and is the reason why I haven't been a customer of Demon Internet for a number of years..).

        Also, GoDaddy is one of those acquisition canaries - when a company gets bought by them you know it's well on the way to dying and it's time to bail. Much like Capita..

      3. Alan Brown Silver badge

        Re: #Metoo

        "Every time there's been a buyout, Heart -> Host Europe -> GoDaddy the level of service has degraded"

        When it comes to virtual hosting, redundancy is best attained by hosting with multiple providers in different locations.

        Likewise with RAID cloud storage. Each element in a different cloud provider.

  8. Jay 2
    Meh

    Unimpressed

    It doesn't give the impression that their setup/kit/etc is ideal. Things happen, but you shouldn't really ever be in the position where a disk failing in a RAID 10 array causes such an outage. If it was multiple disks, then either someone wasn't monitoring or they used crappy disks (no excuse for either if hosting is your business!). And that's before you even mention the fact that their estate doesn't seem to support moving virtual machines to other hardware, as other commenters have mentioned.

    The only bit I have empathy with is when a disk goes, the RAID controller does its best, but a filesystem makes itself read-only until you can run an fsck/check. But still, at worst that's a reboot and a few hours. And if your business is hosting, you should be able to withstand a piece of physical hardware breaking.

    1. Anonymous Coward
      Anonymous Coward

      Re: Unimpressed

      "when a disk goes, the RAID controller does its best, but a filesystem makes itself read-only until you can run an fsck/check"

      That'll be news to most commentards on here! I've never had a situation where a single failed disk in a RAID configuration causes that to happen.

  9. Anonymous Coward
    Anonymous Coward

    If you could sense the conflict of emotions I'm feeling right now ...

    Where I used to work, they had a massive push to move everything to Heart and get rid of all in-house "anything". No, they (manglement) didn't do any of the things I mentioned in internal discussions - like looking into their capabilities to see if what they promised was something they could provide. Their portal isn't the best I've used, especially for DNS which is (IMO) a right PITA to work with.

    Anyway, now I've been made redundant, reading this gives me a right feeling of Schadenfreude. They (my ex-employer) got rid of everyone with a clue and have been busy making screwup after screwup.

    On the other hand, I feel bad for any of the customers who have been affected by this. As a professional I really dislike seeing customers screwed up - and then usually lied to to avoid taking the blame.

    But there is another hosting outfit that promises "we can move your entire Heart hosting setup to us - automatically". Apparently it was set up by the people who set up Heart before it got sold on.

  10. Anonymous Coward
    Anonymous Coward

    The Cloud...

    Other people's computers you have no control over.
