Auntie remains MYSTIFIED by that weekend BBC iPlayer and website outage

BBC techies have no idea why the load on its database "went through the roof" last weekend, when Auntie was struck by a huge, two-pronged outage that caused its iPlayer service and website to go titsup. During the downtime, the Beeb was pretty reticent on social media about what had gone wrong, preferring instead to simply …

  1. Warm Braw

    58 application servers and 10 database servers for metadata

    Hm. That could be the problem, then...

    1. Anonymous Coward
      Anonymous Coward

      Re: 58 application servers and 10 database servers for metadata

      Why? Care to elaborate?

      1. Warm Braw

        Re: 58 application servers and 10 database servers for metadata

        >Why? Care to elaborate?

        My (limited) understanding is that these servers do not exist in such quantity just to provide capacity and redundancy but because the metadata has to be patched together from a wide variety of existing systems that were originally intended to serve other purposes.

        Even the programme content itself seems to come from a variety of sources (some from source material, some off-air or at least off-playout-system).

        Complexity makes for problems that are hard to both diagnose and fix.

      2. Daggerchild Silver badge

        Re: 58 application servers and 10 database servers for metadata

        Because typically the only dynamic part is which part of a linear data progression you select, as the answer will in nearly all cases be constant, it being a historical record.

        So you'll need some form of nearby storage of a linear constant string of data, probably chopped into manageable chunks, of which only the latest portion is typically hot.

        Yes you *could* put it into multiple databases, frontended by multiple application servers.. but there is another hierarchical system with mass low latency access to large datasets and automatic in-memory caching of hot data..

  2. Dr Who

    It would make sense that if the cache was wiped, the load on the database servers would suddenly shoot through the roof as every request would have to be served from the original metadata. The cache failure may therefore be the root cause of the problem, not a coincidental second problem.
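
    The reasoning above can be illustrated with a toy model. This is only a sketch under stated assumptions: the key space, batch size, uniform popularity and lack of request coalescing are all hypothetical, and nothing here reflects the BBC's actual architecture. It counts how many requests per tick fall through to the database with a warm cache versus a wiped one.

```python
import random

def db_queries_per_tick(batches, cache):
    """Return how many requests in each batch miss the cache and hit the database."""
    load = []
    for batch in batches:
        # All requests in a batch arrive together, so each one sees the cache
        # as it was before the batch (no request coalescing is assumed).
        misses = [key for key in batch if key not in cache]
        load.append(len(misses))
        cache.update(misses)  # results are cached only once the batch completes
    return load

rng = random.Random(0)
KEYSPACE = 50_000   # hypothetical number of distinct metadata items
BATCH = 10_000      # hypothetical requests arriving per tick
batches = [[rng.randrange(KEYSPACE) for _ in range(BATCH)] for _ in range(5)]

warm_load = db_queries_per_tick(batches, set(range(KEYSPACE)))
cold_load = db_queries_per_tick(batches, set())

print("warm cache:", warm_load)
print("cold cache:", cold_load)
```

    With the cache fully populated the database sees no queries at all; with it wiped, every request in the first tick reaches the database, and the load only falls as the cache refills.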

    1. Matt 21

      Makes sense

      An up-vote from me. However, I'm surprised the load is so high that it can't be served by 10 DB servers. Even if each request required a write, I'd expect 10 DB servers to manage between 10000 and 30000 requests a second (the ones I'm working on here certainly can), and even more if it was only reading.

      So, if we take the higher write count, assuming the transactions aren't that large, we're looking at 1.8 million requests a minute. I find it hard to believe that the British public could put that kind of load on the system for hour after hour (Wiki says that around 70% of access is from the UK and very little at all if we only look at iPlayer).

      I believe the all time peak for hits on the BBC was 1 billion in one day after the 7th of July bombings. So at the rate of transactions/queries outlined above it should only take 9.3 hours of processing time even without caches.

      I'm sure it's more complicated than that as some pages require more than one lookup but I'm still unsure why removal of the cache knocked the site out for so long.

      PS. The numbers are just rough figures to play with.
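
      For anyone who wants to play with those rough figures, the arithmetic can be checked in a few lines. The inputs are the commenter's own estimates (10000 to 30000 requests a second across all 10 servers, 1 billion hits on the peak day); the variable names are made up for illustration.

```python
# Back-of-envelope check using the rough figures quoted in the comment above.
HIGH_RATE = 30_000               # requests/second, upper estimate for 10 DB servers
LOW_RATE = 10_000                # requests/second, lower estimate
PEAK_DAILY_HITS = 1_000_000_000  # quoted all-time peak for one day

requests_per_minute = HIGH_RATE * 60
hours_at_high_rate = PEAK_DAILY_HITS / HIGH_RATE / 3600
hours_at_low_rate = PEAK_DAILY_HITS / LOW_RATE / 3600

print(f"{requests_per_minute:,} requests per minute at the high rate")
print(f"{hours_at_high_rate:.1f} hours to serve the peak day at the high rate")
print(f"{hours_at_low_rate:.1f} hours to serve the peak day at the low rate")
```

      The high estimate reproduces the 1.8 million a minute and 9.3 hours quoted above. At the low estimate the same billion hits would take a little under 28 hours, so the conclusion is sensitive to which end of the range you pick.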

      1. Dr Who

        Re: Makes sense

        I agree. It's certainly unlikely and it does sound too simple. But it fits. Each time they switched from their emergency site back to the full site the thing died, presumably because the missing cache caused a database overload. So they then switched to the emergency site for several hours, probably to restore the cache from a backup, meaning it wouldn't have to be rebuilt organically.

        Fun as it is shooting in the dark, it would be rather nice for the Beeb's technicians to provide El Reg with a full explanation, so that we can all take away the learnings (you've no idea how much I hate that phrase but I'm sure they use it a lot at the BBC).

  3. John 156

    Outsourcing to India looks like the cause.

    1. iRadiate
  4. John Deeb
    Pirate

    speculation

    "The timing of the outage came just days after the BBC's Internet Blog ... celebrated the fact that it had been nearly a year since the Corporation .... moved live processing into the cloud".

    Perhaps somebody got the wrong idea and the timing this week with the international media contest "how to pizz off 75% of the Russian population" might also have provided fuel. As mentioned above, the caching service might have been targeted and then it's just a question of stressing the load.

  5. John Brookes

    Don't know if it's related, but...

    ...iPlayer Radio now has 1mth catchup. I'm pretty sure it was 1 week before the weekend....

    Both of those statements are based on the 'days left to listen' bit below each listing in the category section of the Android app...

    Can anybody with a better memory than me confirm it changed at the w/e?

    1. GlenP Silver badge

      Re: Don't know if it's related, but...

      I noticed that, however as of late last night some programmes from after the outage (e.g. Monday's I'm Sorry I Haven't a Clue) still weren't available so I dispute that the system is back to normal.

      1. William Towle

        Re: Don't know if it's related, but...

        > I noticed that, however as of late last night some programmes from after the outage (e.g. Monday's I'm Sorry I Haven't a Clue) still weren't available so I dispute that the system is back to normal.

        The system is still returning to normal service, evidently; further to my previous post, Pick of the Pops [Saturday] became available mid-week, as has the third episode of "It's a Fair Cop" ... which is roughly on schedule however the bbc.co.uk availability data says ep1: three weeks; ep2: four weeks; ep5: five days.

        Clue fans will be happy to read, from a get-iplayer search earlier (Fri AM):

        11473: I'm Sorry I Haven't A Clue: Series 61 - Episode 3, BBC Radio 4, Comedy,Highlights,Popular,Radio, 3 days 20 hours ago - Harry Hill joins regular panellists Tim, Graeme and Barry. Jack Dee hosts.

        11474: I'm Sorry I Haven't A Clue: Series 61 - Episode 4, BBC Radio 4, Comedy,Highlights,Popular,Radio, 0 days 1 hours ago - Harry Hill joins regular panellists Tim, Graeme and Barry. Jack Dee hosts.

  6. Mage Silver badge

    iPlayer

    I don't give a fig about iPlayer*.

    But iPlayer and normal web content should not be the same servers.

    (* I don't have enough Cap for ANYONE'S video, not YouTube or Netflix (So I buy DVDs) and most of iPlayer doesn't work outside UK, which is where I happen to be. I get all the Broadcast UK content live fine though none via DAB platform and my media PC can record 2x DTT, 2 x Satellite (from four satellite positions) and 1 x Analogue Radio simultaneously (100kHz to 1300MHz, Analogue includes up to 8kHz bandwidth narrow band data such as PSK or FSK Weatherfax) UHF Digital reception, Motorised Sat Dish and 28 + 19 + 13 + 9 E sat reception with 16 outlets).

    1. Valeyard

      Re: iPlayer

      you seem to have started talking about one thing and then listed lots of random letters and numbers, when in reality everyone is thinking "but you still pay for the bbc licence, right?"

      1. Mage Silver badge

        Re: iPlayer

        There is no BBC Licence. There is a UK tax collected by BBC that partially funds BBC. They refuse to collect it outside UK.

        Of course in some countries there is a TV tax applicable even if you can't get local reception.

        The BBC may be partially funded from UK TV tax. It's not a BBC tax though. It's a tax for being able to receive UK TV in the UK.

        1. Anonymous Coward
          Anonymous Coward

          Re: iPlayer

          "There is a UK tax collected by BBC"

          If you think that then buying bread and milk is a tax on living, buying an air-con unit is a tax of comfort, buying wine is a tax of relaxing after work...You sound like some sort of disgruntled old man here.

          And fyi I have no problem getting iPlayer when I am abroad! Chrome even has plugins to make it seemless.

          1. itzman
            Paris Hilton

            Re: iPlayer

            Hard to make it seem less than it already is..

          2. Anonymous Coward
            Anonymous Coward

            Re: iPlayer

            It is classed as a tax

            http://www.ons.gov.uk/ons/guide-method/classifications/na-classifications/classification-articles/public-sector-broadcasting/broadcasting--how-ons-will-classify-public-sector-broadcasters.html

        2. Caesarius
          Flame

          Re: TV Tax

          <rant>And I feel bound to point out that I pay for all channels funded by advertisements, even if I watch no TV ever anywhere.</rant>

          Sorry: that was a bit off-topic.

  7. graeme leggett Silver badge

    Probably not related

    But when I went to the BBC homepage half an hour ago, I was greeted with a page from yesterday complete with "Wednesday, 23 July" in big letters.

    And refreshing the cache didn't shift it.

  8. Daggerchild Silver badge
    Gimp

    Your techies have failed you, come to the Cloud, we have unicorns...

    Cloud will make all your problems go away, trust the cloud, no nasty techies who will tell you not to do things. Databases made of candy, worldwide loadbalancing with every bite!

    Don't listen to your internal techies. Throw them away! Let go of your clue. Release yourself from responsibility and self-reliance. Open yourself to the freedom of total dependency. If anything ever goes wrong, you can blame us for everything, rant at us, pick up the phone and scream at us, it's all goooood. Come to ussss... come.....

    *starts humming Hotel California*

    1. Anonymous Coward
      Trollface

      Re: Your techies have failed you, come to the Cloud, we have unicorns...

      Yep, all those in-house techies and expertise really helped on this occasion!

  9. Richard Wharram

    So they moved away from Kontiki to their own platform?

    Did they?

  10. Zog_but_not_the_first
    Joke

    An alternative explanation...

    I believe the outage was caused by Samantha rummaging around in the record department with one of the archivists.

    Over to you for the details...

    1. Chemist

      Re: An alternative explanation...

      "I believe the outage was caused by Samantha rummaging around in the record department with one of the archivists."

      I believe the outage was caused by Samantha rummaging around the department archivist whilst going for a record - fixed (although really I Haven't a Clue)

      1. graeme leggett Silver badge

        Re: An alternative explanation...

        You'll feel better - once you've had your tea?

  11. JPL

    Cricket Live Scores not fixed yet - same problem?

    The live score page (http://www.bbc.co.uk/sport/cricket/live-scores) currently shows the close of play score for the first day of the second test match between Sri Lanka and South Africa. That's OK.

    But it also shows the in play score for day 4 of the first test, which ended with a SA victory on day 5 (20 July). Not quite what I would call a live score.

    1. Anonymous Coward
      Anonymous Coward

      Re: Cricket Live Scores not fixed yet - same problem?

      To be fair though it IS cricket, and the person who updates the scores probably fell into a coma whilst waiting for some excitement in the game and hasn't yet woken up.

      1. Anonymous Coward
        Anonymous Coward

        Re: Cricket Live Scores not fixed yet - same problem?

        Two sorts of people in this world:

        1. Kind patient nice people.

        2. People who don't like cricket.

  12. Andy The Hat Silver badge

    But what caused the excessive load?

    They still haven't really said where the excessive load came from - a failure internal to the system or something like a DoS external to it ...

    If it was an internal failure it's unlikely that both systems (geographically separate hot mirrors) would fail at the same time - unless there's a fundamental bug in the (replicated) system? The whole point of their system redundancy (as I understand it) is to kill one system if the other fails, which they could obviously not do, which implies the caching is not a redundant system ...

    All this suggests to me the failure could have been triggered very close to the external gateways which again suggests external influences rather than internal ones.

    Would the BBC tell us if they could be crippled by a DoS attack?

    1. PeeKay
      Black Helicopters

      Re: But what caused the excessive load?

      Couldn't help but notice that the iPlayer apps for IOS and Android were all updated yesterday...related perhaps?

      1. Anonymous Coward
        Anonymous Coward

        Re: But what caused the excessive load?

        thanks for reminding me to check the app store - radio is back on iPlayer for Windows Mobile

  13. Anonymous Coward
    Happy

    who cares....

    It's all shite anyway.

    I weaned myself off the shouty drama, posh twit presenters and insane stylised editing years ago.

    A couple of years ago I weaned myself off the 'News' Lies. Can't even stand the radio.

    Caught myself watching the '6 O'Clock Lies' at a friend's house the other day and the lobotomised presenter was talking about the 'recovery' and 'unprecedented fall in unemployment' with a straight face! ROFL HAHAHAHA!

    1. Yet Another Anonymous coward Silver badge

      Re: who cares....

      >It's all shite anyway.

      In our Time, News Quiz, Sorry I haven't a clue, Old Harry's Game, Cabin Pressure, Hut 33, Museum of curiosity, Absolute Power + anything written by John Finnemore or Andy Hamilton

      Of course the radio-with-pictures is unwatchable crap.

      1. Anonymous Coward
        Anonymous Coward

        Re: who cares....

        John Finnemore - worth the licence fee alone.

  14. This post has been deleted by its author

  15. Lee D Silver badge

    - He confessed that "restoring the service itself is not as simple as rebooting it (turning it off and on again is the ultimate solution to most problems)."

    BLOODY MICROSOFT HAS A LOT TO ANSWER FOR. No, a reboot is NOT the ultimate solution to most problems. It's a temporary stopgap when you confess that you have no idea what happened and cross your fingers that the unknown problem will never happen again.

  16. Badvok
    WTF?

    1200 devices supported - WTF?

    Anyone care to have a go at enumerating that list? No wonder they haven't got around to supporting the Xbox One yet.

    On the other hand this might indicate a significant flaw in their system because it might mean that there is completely separate software to support each version of Android on each different handset!

    1. Anonymous Coward
      Anonymous Coward

      Re: 1200 devices supported - WTF?

      1200 devices sounds a pretty small list. All those different TVs, PVRs, BluRays, Game Boxes, Tablets, and other non-standard bits of kit over the years. PCs and things using web browsers are easy as the browsers keep getting updated. But think of the TVs whose firmware tends to get abandoned after a couple of years. No more updates and just have to hope that manufacturer followed the spec exactly.

      I have been involved with supporting a few bits of abandoned hardware over the years. PVRs built by companies who then go bankrupt. Each change of iPlayer, Freeview, etc all comes with us all crossing our fingers that the old boxes will keep stumbling onwards. Hardware manufacturers don't always follow specs correctly, so I bet the Beeb has all kinds of daft little work-arounds for some of these bits of kit.

      IMHO - This new iPlayer update seems to have been bodged through testing. It may look nice on a tablet, but it is hopeless to now try and use on a big screen TV with a remote control. It makes me wonder if this is an area that could have caused an issue. Some of those older hardware products could just plain be having problems accessing iPlayer data and asking stupid questions to the databases. The bizarre lack of usability testing makes me think this could be a distinct possibility...

  17. Forget It

    Normally these weird poorly explained outages

    are due to installation of NSA friendly backdoors.

    But why would they be interested in what we are watching or listening to?

  18. Anonymous Coward
    Anonymous Coward

    Treacle telly

    Still seemed a tad sluggish last night (Wednesday), using my telly, rather than the PC. Still, I suppose we should be grateful for reasonably watchable quality from the Beeb over the net when it's working properly. Have you seen the foul pixellated stuff put out on the ITV player? Faces quite often degenerate into a Google unrecognisable blurred look if moving fast. Tried to watch Darling Buds of May last night and it reminded me of good ol' VHS. Bloomin' 'orrible quality on my Panny HD set. No - on second thoughts, even my old Super VHS tapes look surprisingly good on that. Let's equate it with 8mm movie film.

  19. Anonymous Coward
    Anonymous Coward

    Has the BBC moved its databases over to XML-based MarkLogic now, or will these have been unfashionable SQL databases?

  20. AndrueC Silver badge
    WTF?

    So was it a coincidence that the weather site was also very flaky?

    1. John Brown (no body) Silver badge
      Coat

      Flaky weather in a summer heatwave? That's no joke!

  21. Robert E A Harvey

    Thursday night

    On the web interface the category "Drama - crime" was empty, despite having 20-odd entries on the IOS app. It all seems a bit rickety still.

    1. Robert E A Harvey

      The next Saturday

      The 'on demand' content for the news quiz was some dodgy music. I tweeted, they deleted the episode.

  22. angelo c

    Maybe a call to AWS could avoid any such future events, I have a brilliant contact!

    Amazon Web Services
