Google broke its own cloud, again, with dud DB config change

Google's again 'fessed up to cooking its own cloud. This time the mess was brief – just under two hours last Monday – and took down its Memcache service. The result was “Managed VMs experienced failures of all HTTP requests and App Engine API calls during this incident.” There's a little upside in the fact that Google now …

  1. Anonymous Coward

    DevOps

    Making the change straight into production and using your customers as testers

    1. teknopaul Silver badge

      Re: DevOps

      Highly doubt that. DevOps is all about test pipelines. It isn't foolproof, but I doubt they have no change control on prod config.

  2. boltar Silver badge

    I wonder how long it'll be ...

    .... before CTOs realise that having all your business critical apps at the mercy of a 3rd party provider (over and above the ISP) is a Really Bad Idea and that having a local failover server might be useful. It strikes me that these people think a business contract is some magic wand that makes all IT issues disappear never to be seen again. I guess you just can't educate pork.

    1. Anonymous Coward

      Re: I wonder how long it'll be ...

      When OpEx is no longer the preferred accounting method.

      1. Anonymous Coward

        Re: I wonder how long it'll be ...

        I wouldn't say that OpEx is preferred now. People are usually set up for CapEx because that is the way it has worked forever.

        It isn't so much an OpEx vs CapEx thing. It is a wasting-a-lot-of-resource-time thing. Cloud charges for something close to actual utilization. On-prem charges you for peak utilization 365 days of the year.

        1. Nate Amsden

          Re: I wonder how long it'll be ...

          Almost no IaaS cloud charges for anything close to utilization. They charge for provisioning. Exceptions typically include object storage.

          Go provision 100 8-CPU VMs, let them sit at 99% idle, and see how much it saves vs running at 80% utilization.

          Go provision 30TB of Amazon EBS storage and write 10GB to it: do they charge for the 10GB? (My main storage arrays operate at about a 10:1 oversubscription model and that approach has worked fine for me for a decade.)

          If you have a real solid handle on utilization and capacity requirements and ongoing capacity testing then public cloud can be good. Otherwise you're most likely either going to be paying out the ass (previous company peaked at 500k/mo, roughly 10x what was needed), or you will be having a lot of problems.

          Certainly it is possible to "get it right", seems very few and far between though.
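
The provisioning-versus-utilization point above can be sketched with a little arithmetic. This is only an illustration: the prices, the 730-hour month, the decimal conversion of 30TB to 30,000GB, and the function names are all assumptions made up for the example, not real provider rates or anything stated in the comment.

```python
# Hypothetical prices, for illustration only; not real provider rates.
HOURS_PER_MONTH = 730          # assumed average hours in a month
PRICE_PER_VM_HOUR = 0.40       # assumed price for one 8-CPU VM
PRICE_PER_GB_MONTH = 0.10      # assumed price for provisioned block storage


def provisioned_vm_cost(vm_count: int, utilization: float) -> float:
    """Monthly cost when billing follows provisioned VMs.

    utilization is accepted only to show that it does not affect the bill.
    """
    return vm_count * PRICE_PER_VM_HOUR * HOURS_PER_MONTH


def provisioned_storage_cost(provisioned_gb: float, written_gb: float) -> float:
    """Monthly cost when billing follows provisioned capacity, not data written."""
    return provisioned_gb * PRICE_PER_GB_MONTH


# 100 VMs at 99% idle versus 80% utilized: same bill under this model.
idle = provisioned_vm_cost(100, utilization=0.01)
busy = provisioned_vm_cost(100, utilization=0.80)
print("VM bill identical:", idle == busy)

# 30TB provisioned (taken as 30,000GB), 10GB actually written.
storage_bill = provisioned_storage_cost(30_000, written_gb=10)
written_only = 10 * PRICE_PER_GB_MONTH
print("Storage bill vs paying for written data only:", storage_bill / written_only)
```

Under these assumed numbers the VM bill does not move with utilization, and the storage bill tracks the provisioned 30TB rather than the 10GB written. A model that billed on real usage would need the utilization and written_gb arguments to enter the formulas.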

  3. dc_m

    Funnily enough, Google Apps is running like a dog right now.

    1. Anonymous Coward

      It's free. You are the product being sold.

      1. Anonymous Coward

        Really, it isn't free.

  4. joekhul

    Regtard Alerts

    Here comes all the usual Regtard comments from people who have never released a line of code into production in their life. Let's preview what is sure to be the highest voted comments;

    "I have never had a bug in my life. Anyone who has is a loser." - RegTardiusMaximus

    "Trusting 3rds parties is ridiculous. BIGLY" - RegTrumpTard

    "You are the product. YOU. YOU!" - RegTarden Little

    1. boltar Silver badge

      Re: Regtard Alerts

      "Here comes all the usual Regtard comments from people who have never released a line of code into production in their life"

      I've released probably half a million lines of code into production in the last 10 years. Some mine, some other people's, so you might want to give your nursery school attempts at patronising a miss.

      ""Trusting 3rds parties is ridiculous. BIGLY" - RegTrumpTard"

      We had a major outage with an offsite VM supplier recently that meant our clients couldn't log in to our front end. Luckily we DO have on-site failover systems. But yeah, what the fuck do I know?

      You prize ass.

  5. Claptrap314 Silver badge

    Former Google SRE here. What folks don't understand is just how big Google really is. Google is figuring out how to do SRE as we speak. And they have so many applications that you cannot simply decree best practices throughout the stack.

    The legacy system problem is significantly worse in Google SRE than in most places because the original work was done by sysadmins. These guys were and are smart and dedicated, but they had the skillset and mindset of sysadmins, not professional programmers. As such, a lot of the legacy software is unmaintainable, and requires deep or complete rewrites.

    And the whole thing is just so big that you just cannot know where all the wtf's are lurking, let alone which ones are likely to bite you next.

    So there will be major fails. What will be interesting to watch is what their incident rate looks like compared to AWS at similar points in maturity. I don't think they claim to be caught up already.

    1. Nate Amsden

      Can't imagine it's that bad at Google. I have been in the SaaS space for 14 years and have seen exactly one SRE at any of the companies I have worked at (though at the time he was a "performance engineer", maybe not quite an SRE, but the term SRE didn't exist at the time as far as I recall).

      1. Claptrap314 Silver badge

        I'm sorry. Did I say it was "bad"? The industry as a whole is trying to figure this stuff out, and Google is naturally at the forefront of a lot of it & therefore making lots of mistakes. Mistakes that hopefully later entrants will be able to skip entirely.

        The only thing I was trying to point out is that one of their earliest mistakes was to use the wrong people for the job of SRE. Again, this happened before they even understood that SRE was something that needed to be created, so it's not "bad" in the sense of "worse than other places", but merely "worse than someone who's not been there might expect".


Biting the hand that feeds IT © 1998–2019