I got 502 problems, and Cloudflare sure is one: Outage interrupts your El Reg-reading pleasure for almost half an hour

Cloudflare, the outfit noted for the slogan "helping build a better internet", had another wobble today as "network performance issues" rendered websites around the globe inaccessible. The US tech biz updated its status page at 1352 UTC to indicate that it was aware of issues, but things began tottering quite a bit earlier. …

  1. Spacedinvader
    Trollface

    Is El Reg

    pondering if maybe they depend on its services just a little too much?

    1. Martin Summers Silver badge

      Re: Is El Reg

      Well it seems these issues have only been happening since El Reg moved to using them. I think it's just so popular they keep knocking Cloudflare over. </end brown nosing>

      1. diodesign (Written by Reg staff) Silver badge

        'knocking Cloudflare over'

        Aw shucks. Don't let the other websites know, they'll only get envious.

        C.

        1. Youngone Silver badge

          Re: 'knocking Cloudflare over'

          Wait, there are other websites?

          1. Flywheel Silver badge

            Re: 'knocking Cloudflare over'

            Nah, that's Fake News, surely ?!

    2. diodesign (Written by Reg staff) Silver badge

      Re: Is El Reg

      Given that we've faced multi-gigabit DDoS waves in the past for annoying black hats, Cloudflare's CDN is particularly useful in staying online at the moment.

      We are planning to expand our infrastructure tho to improve connectivity (and then IPv6 etc etc)

      C.

      1. Nick Ryan Silver badge
        Pirate

        Re: Is El Reg

        Given that we've faced multi-gigabit DDoS waves in the past for annoying black hats, Cloudflare's CDN is particularly useful in staying online at the moment.

        ...and Microsoft and Apple and IBM and Sun and Google and Adobe.. and [n]. :)

        Unless it turned into a sales advertorial for Cloudflare, a write-up of the scale and what it takes to keep El Reg online would be quite an interesting read for us commentards. Without wanting to encourage more attacks, of course...

  2. ArrZarr Silver badge
    Happy

    I will most certainly be using Cloudflare's update including the phrase "...caused primary and secondary systems to fall over." as a reason to include the term "Falling over" as a technical term for TITSUP* situations. If it's good enough for their CEO, it's good enough for me.

    *Total Inability To Send Users Pages

    1. Anonymous Coward
      Anonymous Coward

      Gin or Vodka?

      When Cloudflare engineers go on a bender at work, is their favorite spirit Gin or Vodka? Obviously day drinking is a requirement for working at Cloudflare... But all their techs should be reminded that drinking Gin or Whisky is preferable to Vodka, as then management can tell customers their Brainiacs were drunk, not Stupid!

    2. jake Silver badge

      "falling over" or "fell over" ...

      ... has long been used for a system crash, ABEND or other TITSUP[0] event. See here. I don't know how far back the term goes, but it was in common use when I was hacking the pre-BSD kernel at Berkeley.

      [0] Today It Totally Stopped User Processes

      1. ArrZarr Silver badge

        Re: "falling over" or "fell over" ...

        It has, however, always been a fairly informal way of indicating a TITSUP*. Hopefully we can now treat it as a technical term.

        *Toppling IT Servers Uncovering Problems

        1. jake Silver badge

          Re: "falling over" or "fell over" ...

          You may call it "informal", but I've been hearing it at Board level meetings and seeing it written in failure reports for several decades now. At the very least, it's in the common vernacular.

          When you think about it, it is one of the few technical terms that you don't have to translate into single syllable words before the C* suite understands it. Handy.

          1. ArrZarr Silver badge
            Happy

            Re: "falling over" or "fell over" ...

            Screw the C-Suite, I'm talking about client comms here - the people doing the work at the client will also need to translate "Greatly increased CPU load leading to cascading server failures" into "Fell over", but the externally facing paper trail is the formal bit.

            1. jake Silver badge

              Re: "falling over" or "fell over" ...

              I've been using it with clients for decades, too.

              It's a known thing everywhere I'm aware of.

              Try it (in your example "The computer was overloaded and fell over"). Report back.

  3. Falmari

    Ah so that explains why access to El Reg was crap about an hour ago.

  4. Pen-y-gors Silver badge

    502

    Got a 502 on ElReg a bit earlier - assumed it was just Putin or the Chinese DDOSing again.

  5. LeahroyNake Bronze badge

    All eggs one basket?

    Now it seems that you can put your eggs in lots of baskets and if one of them goes bye bye then so does whatever is depending on it / ElReg.

    1. Vometia Munro

      Re: All eggs one basket?

      Hmm, saw "eggs" and managed to read the rest of the sentence as being about lots of breakfasts and digesting.

    2. VikiAi Silver badge
      Go

      Re: All eggs one basket?

      Distributing your eggs in many baskets, then putting all your baskets in one cart.

  6. Chris G Silver badge

    Old man shouts at cloud

    That was me, I was also shouting at a pc and a tablet!

  7. DJV Silver badge

    "bad software deploy"

    Aha, the old BOFH pad of random excuses!

    "Internal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again."

    ...until the next time, that is!

    1. John 104

      Re: "bad software deploy"

      It's called testing. It requires a test lab. And not the dev's laptop. I've worked in IT for around 20 years and I've worked at 1 company that actually had a copy of their production environment to test on. We never had a deployment failure. Not once. Everyone else just mangles together something in a half-baked effort and then management screams bloody murder when a deployment goes sideways. This is, of course, after being told that spending on a proper lab would be ideal...

      1. IGotOut

        Re: "bad software deploy"

        I take it you understand the size of Cloudflare? That will be one hell of a test lab. And even then something as slight as a different NIC can take it down. All the best laid plans etc.

        1. Yet Another Anonymous coward Silver badge

          Re: "bad software deploy"

          Couldn't they have tested on a spare internet?

        2. yoganmahew

          Re: "bad software deploy"

          Yeah, the whole 100% test coverage is spin from some consultant based on some academic with a toy network. It's complete spoof.

      2. Claptrap314 Silver badge

        Re: "bad software deploy"

        Yeah, I've worked at G. Studied some of the FB papers. When you do this stuff at scale, even when you do it right, human error happens. That includes when you try to figure out which human error can happen and what to do about them.

        I've also done microprocessor validation at AMD & IBM, so even if all of your code and processes are perfect, it is in fact possible (although HIGHLY unlikely) that the processor executing the code will itself have a different idea.

        So if you were for a period of time at a place that had good processes, and was small enough that no fails happened anyway, that's wonderful. But don't expect that experience to scale, because it does not.

      3. Potemkine! Silver badge

        Shit happens

        Problem is with such a huge system you can't have a testing environment the same size as the production one; there's no InternetTest network.

        Of course, having a test lab is a very good thing and avoids a lot of problems, but there may still be real-life conditions that can't be emulated, so it cannot be an absolute guarantee against failure.

        IT is so complex I even wonder why it doesn't fail more often ^^

  8. chivo243 Silver badge
    Pint

    Scared

    We were doing some switch upgrades today, and EL Reg didn't come up afterwards, then BAM 502! I was never so glad to see that message, we were done and packed up already!

    1. Claptrap314 Silver badge

      Re: Scared

      My first connect test is ping 1.1.1.1. Let's not insist that TCP is working right away...

  9. Ken Moorhouse Silver badge

    fingering a colossal spike in CPU usage

    Is it Patch Tuesday already?

  10. Anonymous South African Coward Silver badge

    Today's excuse:

    Dynamic Programming Interrupt

    Sounds feasible. Okay, let's do it.

  11. Blockchain commentard Silver badge

    Strange - when I couldn't get on el Reg, it said something about Cloudflare. Went onto another UK Cloudflare based site and it worked fine. Looks like they were probably just taking el Reg down.

  12. The Original Steve

    Independence

    Am I the only person who is a little uncomfortable about Cloudflare? Not just its dominance in the market it plays in, but also that El Reg uses it.

    I have nothing against them, and actually think they are a great company who have done some incredible innovation. I have no issue with them per se. But it just doesn't fit right to me that the mighty El Reg - who operate using open source (https://www.theregister.co.uk/about/company/website/) - have such a dependency on a commercial 3rd party.

    Where does it end? The ethos of El Reg comes across to me as being fiercely independent which I like (they have cynicism for all IT vendors equally), but being so dependent on a sole provider just doesn't seem right. I'd like to think that they have half their servers in one colo, and their others in a different one, with different telcos (inc backhauls) supplying connectivity.

    I know that they'll likely be dependent on lots of commercial 3rd parties (from hosting to water supplier) but the (valid) DDoS comment aside it's an optional choice to place your tin behind Cloudflare, not a technical necessity. Proudly declaring your technology stack which is all open source on your website just doesn't seem to fit with funneling every inbound packet over a single for-profit 3rd party. Might as well use Microsoft/Oracle/IBM (urgh - I feel dirty even writing that) if you're going to give up any semblance of ownership and independence by slinging everything to a commercial 3rd party.

    (I know that Cloudflare are also big users and contributors of OSS - it's not that I think it's proprietary - it just doesn't seem to fit with the independent nature of El Reg. I have a huge amount of respect for both organisations and wish them all the very best)

    1. Anonymous Coward
      Anonymous Coward

      Re: Independence

      Independent, not wealthy. And there is no opensource fiber.

    2. Anonymous Coward
      Anonymous Coward

      Re: Independence

      Sometimes the IT that El Reg bites, bites back I guess.

      1. Anonymous Coward
        Anonymous Coward

        Re: Independence

        >Sometimes the IT that El Reg bites, bites back I guess.

        Nearly but perhaps..

        IT, biting the hand that feeds El Reg

    3. Marco Fontani

      Re: Independence

      it just doesn't fit right to me that the mighty El Reg - who operate using open source [...] have such a dependency on a commercial 3rd party.

      We also have another hard dependency on a commercial third party in the form of the providers of the servers we use; same goes for the commercial third party OS installed in the load balancer, the firewall, etc. as well as other bits and pieces which there's either no free software or open source version available for, or for which it's infeasible to use one. I don't think it's avoidable much. Where should one stop? Organically in-house grown free BIOS-laden servers?

      DDoS comment aside it's an optional choice to place your tin behind Cloudflare, not a technical necessity

      Having a sorta kinda CDN in front of the infrastructure provides other technical tangible benefits. Substitute Cloudflare with Akamai or Fastly and it'd be kinda the same, modulo feature set. Should we hand-roll our own CDN? I strongly prefer not to, and I do like the fact I don't have to as there's a commercial service available which can do it for us. The only other alternative would be to not have one at all, and that'd be worse for us, even worse than having to hand-manage a home-rolled one.

      Unfortunately, as all things - sometimes things go TITSUP and there's not a lot we can do about it.

      At other times, some of our previous ISP's network went TITSUP - and there wasn't a lot we could do about it, either. We can control some things; just not all of them; or, if we can - it's probably too time consuming to control it down to the tiny bits.

      What we can and do control is what's running on our servers, and that's a fairly healthy mix of mostly free and open source software, with some commercial stuff peppered in-between.

      Just my 2c :)

    4. IGotOut

      Re: Independence

      I've used Cloudflare on my little website and even with that I can see the advantages.

      Faster response times, less load on the server, huge savings on bandwidth, Geoblocking and on and on.

      They said what the fault was and they are trying to work out what happened.

      1. llaryllama

        Re: Independence

        Once you get into page rules and other features it's extremely powerful for the price. Most of the pages on our site are static so I set up page rules to cache them along with all the images, fonts etc. used by dynamic and static parts of the site. You can block or challenge visitors with lots of parameters to fine tune. Oh and you get brownie points with Google search rankings for having a fast site as well.

  13. Ken Moorhouse Silver badge

    Next time...

    ...the residents of Guadalajara would appreciate a bit more warning.

    https://www.bbc.co.uk/news/world-latin-america-48821306

  14. JohnFen Silver badge

    Single points of failure are bad

    While I'm unaware of Cloudflare acting in an objectionable manner, the widespread use of Cloudflare has long caused me a great deal of nervousness, and this sort of thing is one of the reasons why.

    I think it's a mistake for so much of the internet to be so centralized. It's a huge part of why the internet has become so brittle.

  15. Doctor Syntax Silver badge

    "the internet is a brittle thing"

    When it was first designed it was supposed to be resilient, proof against chunks of infrastructure being taken out in a nuclear attack. "Routes round damage" was a popular term.

    Somehow we forgot.

    1. Totally not a Cylon

      The internet is fine, it's just all the pages which are broken.

      A quick shufti at the page source is revealing.

      Back in the day we would hand code html to get the page and all images into a few KB to ensure fast loading on 300baud modems. It also made sites very resilient.

      Now it's all JavaScript and dynamic pages with bits from tens or hundreds of different sites; it only takes one of these to be Titsup to kill the original site. All because of 'metrics' and 'tracking'.

      1. defiler Silver badge

        300 baud? HTML? You have a valid point without having to exaggerate. 9600bps was pretty mainstream when web pages first appeared, with 14400 available.

        You're right, though, in that I spent a lot of time hand-coding HTML, and squeezing every last byte out of images. Now bandwidth is plentiful so nobody cares. It's a choice between paying a human to optimise stuff vs paying for bandwidth. Humans are expensive.

        Edit - you're definitely right about the tracking/metrics though!

        1. Blane Bramble

          Of course your (V.32) 9600bps modem operated at 2400 baud.
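
The baud-versus-bps point above comes down to one division: bit rate over symbol rate gives the data bits carried per symbol. A minimal sketch of that arithmetic, using only the two figures quoted in the thread (the function and constant names are made up for illustration):

```python
def bits_per_symbol(bit_rate_bps: float, symbol_rate_baud: float) -> float:
    """Data bits carried by each symbol: bit rate divided by symbol rate."""
    if symbol_rate_baud <= 0:
        raise ValueError("symbol rate must be positive")
    return bit_rate_bps / symbol_rate_baud


# Figures quoted in the thread for a 9600bps modem signalling at 2400 baud.
QUOTED_BIT_RATE_BPS = 9600
QUOTED_SYMBOL_RATE_BAUD = 2400

if __name__ == "__main__":
    # 9600 / 2400 = 4 data bits per symbol
    print(bits_per_symbol(QUOTED_BIT_RATE_BPS, QUOTED_SYMBOL_RATE_BAUD))
```

So "baud" and "bps" only coincide when each symbol carries exactly one bit.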

    2. jake Silver badge

      "When it was first designed it was supposed to be resilient, proof against chunks of infrastructure being taken out in a nuclear attack."

      Oft repeated, but simply not true. The networks that were designed to survive nuclear attack included the "Minimum Essential Emergency Communications Network", or MEECN, and the prior "Survivable Low Frequency Communications System", or SLFCS. Besides, if you use an ounce of common sense, it only stands to reason ... no military would design a command and control system that inherently wasn't securable, and the Internet was not then, and still isn't, securable.

      In The Beginning, the first two nodes of what became TehIntraWebTubes were at SRI and UCLA, conceived, designed, implemented and run by students and professors. With no Pentagon oversight, input or anything else "intellectual". Money, yes. Oversight, no.

      Boiling it down to basics, the (D)ARPANET was just a research network designed to research networking. The "survives nukes" myth came about much later ... The only reason it was built to be resilient is because the existing hardware was really, really flaky.

    3. doublelayer Silver badge

      Even if we could magically decentralize CloudFlare and make people write nice HTML or at least store their own scripts, the internet wouldn't be a lot less fragile. The reason for that is that there are very few places that process all our traffic. There's only one line leading to your house that actually works, but that's a short length that isn't the main issue. The issue is that there's only one line that connects your ISP's local unit to whatever center they have for sending it out of local, and only a few lines (or maybe just one) connecting large areas to other large areas. What happens when cables stop working? Large parts of the internet lose connectivity. Routing around that kind of damage requires a web of lines, but a lot of the world operates on chains of lines instead. It's hopeless; the internet can't really route around damage. We just put our systems in lots of parts so we can weather most small disconnects and otherwise we're hoping nothing really bad happens.
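
The "chains versus web" point above can be illustrated with a toy graph. This is only a sketch under made-up assumptions: four hypothetical sites A to D, joined first as a chain and then as a ring, with one link cut. It is not a model of any real network.

```python
from collections import deque


def reachable(nodes, links, start):
    """Return the set of nodes reachable from start over undirected links (BFS)."""
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def survives_cut(nodes, links, cut):
    """True if every node can still reach every other after removing one link."""
    remaining = [link for link in links if set(link) != set(cut)]
    return reachable(nodes, remaining, nodes[0]) == set(nodes)


# Hypothetical topologies: a chain A-B-C-D, and the same chain closed into a ring.
NODES = ["A", "B", "C", "D"]
CHAIN = [("A", "B"), ("B", "C"), ("C", "D")]
RING = CHAIN + [("D", "A")]

if __name__ == "__main__":
    print("chain survives cut:", survives_cut(NODES, CHAIN, ("B", "C")))
    print("ring survives cut:", survives_cut(NODES, RING, ("B", "C")))
```

Cutting B-C splits the chain in two, while the ring still has the long way round. One extra link is the difference between "routes around damage" and "large parts lose connectivity".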

  16. Pirate Dave Silver badge
    Pirate

    You might know...

    it fell over for me right as I clicked the link to go to the Comments section for the Deep Nudes story. At first I thought the company webfilter was gonna squeal about so many semi-naughty words on a page, then realized the 502 message was coming from Cloudflare. Phew, that was close...

    Not that I read El Reg for fun at work. It's "Industry News", not leisure reading. Yeah...

    1. Anonymous Coward
      Anonymous Coward

      Re: You might know...

      Reading el Reg at work is staying abreast of industry trends, especially with articles like the Deepnudes one

      1. Anonymous Coward
        Anonymous Coward

        Re: You might know...

        abreast

        I see what you did there.

        1. Anonymous Coward
          Anonymous Coward

          Re: You might know...

          Shamelessly stolen from BOFH.

  17. Jove Bronze badge

    Cloudflare are not the only ones that have problems ...

    ... it appears that BT's services took a hit, at least for part of the country, earlier today.

    1. TimR

      Re: Cloudflare are not the only ones that have problems ...

      And irony of ironies, down detector was down for a while - in UK, at least...

      1. Anonymous Cowtard

        Re: Cloudflare are not the only ones that have problems ...

        https://downforeveryoneorjustme.com/downdetector.com

  18. Potemkine! Silver badge
  19. fastmack

    Don't update software on live systems. Daft or what? Raymond Petrie, Aberdeen, Scotland

    1. Alister Silver badge

      don't update software on live systems

      Right, they should have updated the backup internet first

  20. Aladdin Sane Silver badge

    I was dangerously close to having to do some actual work. Please don't let it happen again.

    1. phuzz Silver badge
      Pint

      Pretty much all our customers use Cloudflare now, so after answering the panicked phone calls ("nope, it's cloudflare, nothing we can do, sorry"), it was pretty much beer-o-clock until the sods got it back running again.

  21. fredesmite Bronze badge

    Remember - Cloud computing

    Is nothing more than some provider letting you run your crap on their machines that other people are using at the same time.

Biting the hand that feeds IT © 1998–2019