Google broke its own cloud again

Google has 'fessed up to breaking its own cloud. Again. The most recent mess occurred on June 28 when Google Compute Engine SSD Persistent Disks in us-central1-a “experienced elevated write latency and errors in one zone for a duration of 211 minutes.” The mess meant that disks probably stopped accepting writes and instances …

  1. Anonymous Coward

    The Cloud...

    Other people's computers you have no control over

    1. LittleOldMe

      Re: The Cloud...

      You mean other people's computers that cost much less to run and are far more reliable than the ones you do have control over? That cloud? Where do I sign.

      1. Anonymous Coward

        Re: The Cloud...

        "You mean other people's computers that cost much less to run and are far more reliable than the ones you do have control over? That cloud? Where do I sign."

        Are they? Our dev server here has been chugging away for 6 years with only 1 replacement RAID drive required. I wonder how many Cloud failures there've been in that time. It seems all the good reasons to switch over to networked PCs & mini computers from rented mainframes in the first place have been forgotten by the current generation of chumps. They'll probably get a clue one day, but I won't hold my breath.

        1. Anonymous Coward

          Re: The Cloud...

          "Our dev server here has been chugging away for 6 years with only 1 replacement RAID drive required. I wonder how many Cloud failures there've been in that time. "

          You're extrapolating from a datapoint of one and comparing its reliability (possibly an outlier) with a whole cloud. What you perhaps mean is "I wonder how many Cloud failures there've been in that time that affected my one dev server".

          1. Pascal

            Re: The Cloud...

            "You mean other people's computers that cost much less to run and are far more reliable than the ones you do have control over? That cloud? Where do I sign."

            It's a good debate to have for sure but it's not nearly as "rainbow and unicorns" as that statement claims.

            "Cost much less" ?

            In some cases. In others, not. We've selectively targeted workloads that would be cheaper in the cloud as candidates; other workloads, not so much. In fact, some would cost many times more.

            "far more reliable" ?

            Google Compute Engine's SLA is 99.95%. That's a very good claim, but that one 211-minute outage alone puts them at about 99.5% for that month. A 10% credit towards the next month (as per their SLA) doesn't make up for 3.5 hours of unscheduled chaos.
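            For anyone who wants to check that arithmetic, here is a rough sketch. It assumes a 30-day month (June) and counts plain wall-clock minutes; the real SLA defines downtime in its own terms, so treat this as a back-of-envelope figure only. The function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def monthly_availability(downtime_minutes, days_in_month=30):
    """Percentage of the month that was NOT lost to downtime."""
    total = days_in_month * MINUTES_PER_DAY
    return 100.0 * (total - downtime_minutes) / total


def allowed_downtime_minutes(sla_percent, days_in_month=30):
    """Minutes of downtime per month that still meet the given SLA percentage."""
    total = days_in_month * MINUTES_PER_DAY
    return total * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    # 211 minutes is the outage duration quoted in the article.
    print(f"Availability with 211 min down: {monthly_availability(211):.2f}%")
    print(f"Budget at 99.95%: {allowed_downtime_minutes(99.95):.1f} min")
```

            On those assumptions a 99.95% target leaves roughly 21.6 minutes a month, so a single 211-minute outage uses up nearly ten months' worth of budget.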

            In the end it depends on how critical your systems are and how good you are at maintaining them. I trust Google to know their shit, obviously, so yeah their cloud is very reliable. But I also understand that my SLA (the one I provide my customers) is the last thing on their mind when things go tits-up (and things invariably do). For those "absolutely must not fail" systems, where you can afford to plan specialised backup / redundancy / disaster recovery scenarios, you can definitely be more reliable than the cloud. Or at least, when all hell breaks loose, you get direct control of the fixing process.

            ^^^^^

            As you can probably see from that, I'm a bit cloud-shy. I do see it as "another guy's computer, that won't even take your calls when things go wrong" (well, you sure as hell are not going to talk to "the guy" unless you have a lot more clout with Google than I do)!

            My general thinking is that I'll happily "cloud" anything that I would have run on a rented server at the local datacenter/colocation facility. Anything more serious than that, and I get scared.

            1. quxinot

              Re: The Cloud...

              I suppose the proper way to argue is that it isn't just someone else's computer that you have no control over--it's also someone else's staff minding it.

              That you've never met. Never vetted. And can't fire (individually) when appropriate.

          2. Anonymous Coward

            Re: The Cloud...

            "You're extrapolating from a datapoint of one and comparing its reliability (possibly an outlier) with a whole cloud."

            I've worked in almost a dozen IT depts over the last 25 years. In all that time I have NEVER seen an entire data centre go offline that brought the firm to a standstill. That however seems to be par for the course with a lot of cloud services these days and whenever it happens we get the old "yes, but their reliability if you look over X years is blah blah blah" excuse. I don't give a flying monkeys about X years, I care about day to day and if an entire day's business is lost and/or trades or standing orders are not processed it could - certainly for a financial services corp - mean the whole ship going down.

            Cloud services are for clueless SMBs who don't know how or can't be arsed to run their own systems because some know-nothing accountant is in charge who only sees the cost, not the value of internal systems.

            1. Anonymous Coward

              Re: The Cloud...

              Hmm, your experience doesn't match mine. Some time in the last few years we had a company-wide outage for a couple of days caused by some unfortunate interactions between power surges, UPSes, management decisions and design compromises. Anonymous coward because, well...

      2. Anonymous Coward

        "You mean other people's computers that cost much less to run and are far more reliable"

        .... "than the ones you do have control over? That cloud? Where do I sign"

        =====================

        Sure...

        .....But they're also far further away if and when a cable gets cut...

        .....And as a guide, judging Cloud by terms of service today is about as reliable as painting in water...

        .....Who can say what the costs will be or what the service will be like once there's greater adoption.

        .....That is unless of course you agree to pay an extra premium to be a tier-one-customer etc...

        .....In short the external / portable drive is not completely dead yet.

        .....And if ever there are some well-secured hacker-free Router NFS solutions, it'd be handy to keep one at a family residence nearby, that you can drive to in the event of long data center / net outage.

        .....Auto-rerouting on the internet still fails from time to time too. Happened a couple of months ago as reported here on the Reg.

    2. Brian Miller

      Re: The Cloud...

      Right, it's your responsibility to spend more money on them for more redundancy in their network when they trip over the cable.

  2. Alistair
    Windows

    @boltar - datacenters

    I've seen, *cough* 4 DC power downs in 19 years. Two of them were outright mechanical failure of power delivery systems, one was human error on the part of a power provider and the fourth was bad planning of a change, resulting in a fire that took out the switches.

    It is possible to have well supported and managed DCs go completely south. I suppose it should be rarer than in my experience, but it *does* happen.

    Cloud, however, when it goes wrong, being larger, more widely spread and such, can take a *hell* of a lot longer to recover from the outage. The first three took between 6 and 9 hours to recover from, the last, for a smaller, outlying DC took 18 hours, primarily due to having to wait for the fire department to sign off on the rebuild of the damaged bits.

    < times are from *oops* to *everything is back to normal* >

    1. Anonymous Coward

      Re: @boltar - datacenters

      "It is possible to have well supported and managed DCs go completely south. I suppose it should be rarer than in my experience, but it *does* happen."

      Sure, but any sensible company will have at least 1 backup centre. It seems when cloud services go down they take all backups with them and the whole thing is gone until such time as they fix it. No failover, nothing. The chances of 2 separate company datacentres going down at the same time, while possible (massive natural disaster, sabotage), are so remote that there's not much point worrying about it, since if it does happen then frankly you've probably got more important things to worry about, like how do I escape from this flood/volcano/zombie apocalypse etc.

      1. Anonymous Coward

        Re: @boltar - datacenters

        But if you were going to use cloud you would use totally separate cloud operators just like you would use two totally separate data centres.

        The problem is people chuck everything on one cloud and expect it to be somehow totally resilient. One of anything is always bad, 2 is better, 3 is best.

        If you rely on storing your backups with the same supplier you deserve everything you get.
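        The "one is bad, 2 is better, 3 is best" rule can be put into rough numbers. This is only a sketch: it assumes every provider has the same availability and that they fail independently of each other, which is a strong assumption (shared dependencies or a failover that doesn't work would make the real figure worse). The function name and the 99.5% input are illustrative, not anyone's published figures.

```python
def combined_availability(single, copies):
    """Chance that at least one of `copies` independent sites is up.

    `single` is the availability of one site as a fraction (0.995 = 99.5%).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All sites must be down at once for the service to be down.
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{100 * combined_availability(0.995, n):.5f}%")
```

        Under those assumptions, two independent sites at 99.5% each come out at about 99.9975% combined. The gain disappears quickly if the failures are correlated.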

        1. Anonymous Coward

          Re: @boltar - datacenters

          "The problem is people chuck everything on one cloud and expect it to be somehow totally resilient. One of anything is always bad, 2 is better, 3 is best."

          True, but the extra administrative complexity both on the accountancy and sys admin side not to mention differing APIs if you're using them in some sort of RPC capacity would probably put most companies off.

      2. Anonymous Coward

        Re: @boltar - datacenters

        Seen the situation where DC2 was working just fine, but the damage at DC1 prevented failing over to it.

  3. Mike Shepherd
    Meh

    "But on this occasion..."

    Says it all really.

  4. Anonymous Coward

    Tell us the brand/type of the SSDs

    So we can avoid them.
