Microsoft explanation for Visual Studio online outage leaves open questions

Microsoft has posted a resolution report on a recent problem with Visual Studio Team Services, a cloud-based code repository and developer collaboration platform. Users were unable to log into the service: "Between 09:10 and 14:28 UTC on 04 Feb 2016, customers attempting to log into their Visual Studio …

  1. allthecoolshortnamesweretaken

    Cloud-based, you say?

    1. Paul Crawford Silver badge

      Yes, it pisses down on you from time to time.

      1. Anonymous Coward
        Anonymous Coward

        it pisses down on you from time to time.

        Not like the local IT department then, who piss you off all the time.

  2. Anonymous Coward
    Anonymous Coward

    99.9% SLA

    Well, they've still got just under three and a half hours left this year.....

  3. Anonymous Coward
    Linux

    Of course

    This is why my Nan and I only code in assembler, using an assembler I wrote in machine code. NB Watch the skies, they're coming!

    1. Anonymous Coward
      Anonymous Coward

      Re: Of course

      I know! I'm glad you see it too!

  4. Pascal Monett Silver badge

    "built on the enterprise-grade infrastructure of Microsoft Azure"

    Enterprise-grade, except memory management and automatic failover are conspicuously absent.

    So it's more like piling sub-par software on a platform that can fail for several continents after a wrong DNS command.

    So it's totally Microsoft-grade quality, then.

  5. Steve Davies 3 Silver badge

    did the testing team do its job?

    Obviously not. Seems to be an increasingly common thing with MS these days.

    And they still want us to put everything in their cloud? Clod more like.

    1. a_yank_lurker

      Re: did the testing team do its job?

      New Slurp slogan: "Quality Assurance? Who needs stinking QA?"

    2. getHandle

      Re: did the testing team do its job?

      Perhaps they need some DevOps!

      Ahem, sorry - don't know what came over me...

      1. al_langevin
        FAIL

        Re: did the testing team do its job?

        DevOps = Developers. Who do you think caused the problem? DEVELOPERS!!!

        Get back to your coding and check your Stored Procedures while you're at it. Microsoft needed QA to check the epic fail by their developers.

  6. Asylum Sam

    Test it? Nahhh, we'll know if it works when we switch it on.

  7. Anonymous Coward
    Anonymous Coward

    Microsoft Visual Studio goes tits-up !!!

    There .. corrected title for proper elREG house style ...

  8. Doctor Evil

    MS developers

    "A SQL stored procedure that was being called was allocating too much memory in one of the critical backend SQL databases"

    You guys used Visual Studio to code that, didn't you?

  9. raving angry loony
    FAIL

    Ah yes, the "cloud"

    Or, in other words, central processing where everyone is affected if something goes wrong, instead of distributed processing where only the local node is affected.

    Well done world, you went all the way around and completely ignored the lessons of the last 50 years. Glad I got out of the business when I did.

    Oh, and just in case someone forgot:

    BWAHAHAHAHA!!

    Warned you to stay away from the evil empire! Told you so!

    BWAHAHAHAHA!

  10. x 7

    I propose that all future references to "cloud" should be prefixed by "cuckoo-"

  11. al_langevin

    Has less to do with the cloud and more to do with developers writing crap Stored Procedures. The world is still filled with crap developers who haven't progressed beyond copying somebody else's crap code.

    If Microsoft had proper QA, then this would have been checked. Yay for continuous integration, because we have to push crap code out as fast as possible. Epic fail.

    1. x 7

      "Has less to do with the cloud and more to do with developers writing crap"

      problem is, with the cloud you are beholden to someone else writing crap and someone else's quality policies

      far better to keep things local: that way if there's a fuckup, you know it's YOUR fuckup and YOU can resolve it

    2. Anonymous Coward
      Anonymous Coward

      @al_langevin

      "Has less to do with the cloud and more to do with developers writing crap Stored Procedures."

      I disagree, it has everything to do with the cloud. Because all it does is create a single point of failure and the moment it goes awry many people get to suffer from it. And you see this issue all over the place...

      "No, I don't need to backup my downloaded (bought!) products because I can always download them again".

      Sure, until the moment the company's website goes offline, or whenever you need to do a re-installation without an Internet connection. Then it becomes a different issue.

      "Who cares if this game needs to authenticate itself online?"

      I do whenever I want to play it on my laptop without an Internet connection. In fact, this goes for any kind of serious software: the moment an Internet connection is demanded, I'm really not that enthusiastic about using it anymore.

      In the end it boils down to the simple strategy of not putting all your money on a single horse.

  12. oldcoder

    5 hours... 99.9% uptime guarantee...

    So that leaves them about 3.7 hours of downtime for the rest of the year (99.9% uptime over 8,766 hours allows only about 8.8 hours of downtime).

    Not likely to meet that 99.9% uptime. And given Microsoft's history with uptime, the next outage will last several days...

  13. Anonymous Coward
    Anonymous Coward

    Some things are still better done offline

    Games and productivity are good examples.

    All the obsession with cloud and online collaboration is putting the cart before the horse. The online stuff should be an optional bonus feature, rather than a foundation.

    Perhaps this is the brave new world of 'software-as-a-service', 'always-on Internet connection', and software non-ownership via leasing/subscription.

    1. Anonymous Coward
      Anonymous Coward

      Re: Some things are still better done offline

      "Perhaps this is the brave new world of 'software-as a service', 'always-on Internet connection' software non-ownership of leasing/subscription."

      Also a never ending stream of EULAs which are designed to deprive you of your rights.

  14. Bitbeisser
    Mushroom

    Well, there is a reason why I stay away from M$ $QL server as far as I can throw the installation media...

    1. Anonymous Coward
      Anonymous Coward

      I can actually throw the DVD install media pretty far (DVDs are like frisbees) :)

  15. richardcox13

    If you want some detail...

    A much more detailed write up:

    https://blogs.msdn.microsoft.com/bharry/2016/02/05/vs-team-services-incidents-on-feb-3-4/

    and

    https://blogs.msdn.microsoft.com/bharry/2016/02/06/a-bit-more-on-the-feb-3-and-4-incidents/

    The latter includes some rather low level details...

  16. Alister

    SQL Server 2014 memory allocation

    Reading the blog here:

    https://blogs.msdn.microsoft.com/bharry/2016/02/05/vs-team-services-incidents-on-feb-3-4/

    It appears there is a serious bug in SQL Server 2014.

    In the SQL Server 2014 query optimizer they made significant changes to the cardinality estimation. I’m sure they were improvements but not for this query. The cardinality estimation was used to estimate a memory grant for the query (SQL preallocates memory for queries to avoid spills to disk, which are big performance problems and additional memory allocations, which create the possibility for deadlocks. The cardinality estimate is an important input into the memory request).

    In this query, the memory grant estimation shot up from something pretty small to 3.5GB. Given that the server only has 48GB, that meant that it could run very few of these queries before it ran out of memory, causing every query in the system to back up and, essentially, serialize. That caused a traffic jam and resulted in so many of our customer requests timing out/failing.

    The ultimate resolution, for now, is that we added a hint to the query that tells the query optimizer the maximum memory grant to use for the query. It’s expressed in % of memory and, for simplicity’s sake, we set it to 1% of the memory available for this (or more on the order of 160MB). That was enough to unclog the system and allow everything to flow freely.
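
    The blog doesn't name the hint, but it sounds very much like SQL Server's MAX_GRANT_PERCENT query hint. A minimal sketch of what capping a grant looks like, against a made-up table and query (not the actual VSTS code):

        -- Hypothetical example: cap this query's memory grant at 1% of available
        -- memory, so one bad cardinality estimate can't starve the rest of the workload.
        DECLARE @ProjectId int = 42;

        SELECT w.WorkItemId, w.Title, w.ChangedDate
        FROM dbo.WorkItems AS w            -- table and columns invented for illustration
        WHERE w.ProjectId = @ProjectId
        ORDER BY w.ChangedDate DESC
        OPTION (MAX_GRANT_PERCENT = 1);    -- supported in Azure SQL Database and later SQL Server servicing releases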

    It is not clear from the blog whether this is a custom version of SQL Server 2014 used internally by Microsoft, or whether it is the production release. If it is the latter, then anyone running SQL Server 2014 in SQL Server 2014 compatibility mode is likely to suffer issues with massive over-allocation of memory to queries and stored procs.

    Maybe El Reg can clarify this?

    1. richardcox13

      Re: SQL Server 2014 memory allocation

      > It is not clear from the blog whether this is a custom version of SQL Server 2014

      > used internally by Microsoft, or whether it is the production release.

      No, it isn't a custom internal version, but the SQL Server used in Azure is not the same as the version you would deploy locally. See the Books Online reference for the many differences. That said, they are mostly the same.

      > anyone running SQL Server 2014 in SQL Server 2014 compatibility

      > mode is likely to suffer issues with massive over-allocation of memory to

      > queries and stored procs.

      "is likely": no, not likely. Otherwise current users of SQL Server 2014 (which has been around now for almost two years) would have noticed.

      However, you could hit the same bug, in which case raise a support issue to get early access to the fix.
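
      If you do hit it, the usual 2014-era workarounds (a rough sketch only, not a substitute for the actual fix) are to drop the database back to the pre-2014 cardinality estimator, or to force the legacy estimator for just the offending query:

        -- Option 1: revert the whole database to the old cardinality estimator
        ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 110;  -- database name is an example

        -- Option 2: stay on compatibility level 120 but use the legacy estimator
        -- for a single query (trace flag 9481; needs sysadmin rights or a plan guide)
        SELECT o.OrderId, o.Total
        FROM dbo.Orders AS o               -- table invented for illustration
        WHERE o.CustomerId = 42
        OPTION (QUERYTRACEON 9481);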
