Is El Reg pondering if maybe they depend on its services just a little too much?
Cloudflare, the outfit noted for the slogan "helping build a better internet", had another wobble today as "network performance issues" rendered websites around the globe inaccessible. The US tech biz updated its status page at 1352 UTC to indicate that it was aware of issues, but things began tottering quite a bit earlier. …
Given that we've faced multi-gigabit DDoS waves in the past after annoying black hats, Cloudflare's CDN is particularly useful for staying online at the moment.
...and Microsoft and Apple and IBM and Sun and Google and Adobe... and [n]. :)
A write-up of the scale of El Reg and what it takes to keep it online would be quite an interesting read for us commentards, as long as it didn't turn into a sales advertorial for Cloudflare. Without wanting to encourage more attacks, of course...
I will most certainly be citing Cloudflare's update, with its phrase "...caused primary and secondary systems to fall over.", as grounds for adopting "falling over" as a technical term for TITSUP* situations. If it's good enough for their CEO, it's good enough for me.
*Total Inability To Send Users Pages
When Cloudflare engineers go on a bender at work, is their favourite spirit gin or vodka? Obviously day drinking is a requirement for working at Cloudflare... But all their techs should be reminded that gin or whisky is preferable to vodka, as then management can tell customers their brainiacs were drunk, not stupid!
You may call it "informal", but I've been hearing it at Board level meetings and seeing it written in failure reports for several decades now. At the very least, it's in the common vernacular.
When you think about it, it is one of the few technical terms that you don't have to translate into single-syllable words before the C-suite understands it. Handy.
Screw the C-Suite, I'm talking about client comms here - the people doing the work at the client will also need to translate "Greatly increased CPU load leading to cascading server failures" into "Fell over", but the externally facing paper trail is the formal bit.
It's called testing. It requires a test lab. And not the dev's laptop. I've worked in IT for around 20 years, and in that time only one company I worked at actually had a copy of its production environment to test on. We never had a deployment failure. Not once. Everyone else just cobbles something together in a half-baked effort, and then management screams bloody murder when a deployment goes sideways. This is, of course, after being told that spending on a proper lab would be ideal...
Yeah, I've worked at G and studied some of the FB papers. When you do this stuff at scale, even when you do it right, human error happens. That includes errors in working out which human errors can happen and what to do about them.
I've also done microprocessor validation at AMD & IBM, so even if all of your code and processes are perfect, it is in fact possible (although HIGHLY unlikely) that the processor executing the code will itself have a different idea.
So if you were for a period of time at a place that had good processes, and was small enough that no failures happened anyway, that's wonderful. But don't expect that experience to scale, because it doesn't.
The problem is that with such a huge system you can't have a testing environment the same size as the production one; there's no InternetTest network.
Of course, having a test lab is a very good thing and avoids a lot of problems, but there may still be real-life conditions that can't be emulated, so it can never be an absolute guarantee against failure.
IT is so complex I even wonder why it doesn't fail more often ^^
Am I the only person who is a little uncomfortable about Cloudflare? Not just its dominance in its market, but also the fact that El Reg uses it.
I have nothing against them, and actually think they are a great company who have done some incredible innovation. I have no issue with them per se. But it just doesn't fit right to me that the mighty El Reg - who operate using open source (https://www.theregister.co.uk/about/company/website/) - have such a dependency on a commercial 3rd party.
Where does it end? The ethos of El Reg comes across to me as fiercely independent, which I like (they have cynicism for all IT vendors equally), but being so dependent on a sole provider just doesn't seem right. I'd like to think that they have half their servers in one colo and the rest in a different one, with different telcos (including backhaul) supplying connectivity.
I know that they'll likely be dependent on lots of commercial 3rd parties (from hosting to water supplier), but the (valid) DDoS comment aside, it's an optional choice to place your tin behind Cloudflare, not a technical necessity. Proudly declaring on your website a technology stack that is all open source just doesn't seem to fit with funneling every inbound packet through a single for-profit 3rd party. Might as well use Microsoft/Oracle/IBM (urgh - I feel dirty even writing that) if you're going to give up any semblance of ownership and independence by slinging everything to a commercial 3rd party.
(I know that Cloudflare are also big users of and contributors to OSS - it's not that I think it's proprietary - it just doesn't seem to fit with the independent nature of El Reg. I have a huge amount of respect for both organisations and wish them all the very best)
it just doesn't fit right to me that the mighty El Reg - who operate using open source [...] have such a dependency on a commercial 3rd party.
We also have another hard dependency on a commercial third party in the form of the providers of the servers we use; the same goes for the commercial third-party OS installed in the load balancer, the firewall, etc., as well as other bits and pieces for which there's either no free or open source version available, or for which it's infeasible to use one. I don't think that's really avoidable. Where should one stop? Organically in-house grown free BIOS-laden servers?
DDoS comment aside it's an optional choice to place your tin behind Cloudflare, not a technical necessity
Having a sorta kinda CDN in front of the infrastructure provides other tangible technical benefits. Substitute Cloudflare with Akamai or Fastly and it'd be kinda the same, modulo feature set. Should we hand-roll our own CDN? I strongly prefer not to, and I do like the fact I don't have to, as there's a commercial service available which can do it for us. The only other alternative would be not to have one at all, and that'd be worse for us, even worse than having to hand-manage a home-rolled one.
Unfortunately, as with all things, sometimes stuff goes TITSUP and there's not a lot we can do about it.
At other times, parts of our previous ISP's network went TITSUP, and there wasn't a lot we could do about that either. We can control some things, just not all of them; or, if we can, it's probably too time-consuming to control them down to the tiny bits.
What we can and do control is what's running on our servers, and that's a fairly healthy mix of mostly free and open source software, with some commercial stuff peppered in-between.
Just my 2c :)
Once you get into page rules and other features it's extremely powerful for the price. Most of the pages on our site are static so I set up page rules to cache them along with all the images, fonts etc. used by dynamic and static parts of the site. You can block or challenge visitors with lots of parameters to fine tune. Oh and you get brownie points with Google search rankings for having a fast site as well.
While I'm unaware of Cloudflare acting in an objectionable manner, the widespread use of Cloudflare has long caused me a great deal of nervousness, and this sort of thing is one of the reasons why.
I think it's a mistake for so much of the internet to be so centralized. It's a huge part of why the internet has become so brittle.
The internet is fine, it's just all the pages which are broken.
A quick shufti at the page source is revealing.
Back in the day we would hand-code HTML to get the page and all its images into a few KB to ensure fast loading on 300 baud modems. It also made sites very resilient.
Now it's all JS and dynamic pages with bits pulled from tens or hundreds of different sites; it only takes one of these to go TITSUP to kill the original site. All because of 'metrics' and 'tracking'.
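To put a rough number on the "only takes one" point: if a page hard-depends on several independently hosted scripts, its availability is roughly the product of theirs. A minimal Python sketch, assuming independent failures and made-up 99.9% uptime figures for the origin and 30 third parties (neither assumption comes from the thread):

```python
from collections.abc import Iterable
from math import prod


def page_availability(dependency_availabilities: Iterable[float]) -> float:
    """Probability that every hard dependency is up at the same moment.

    Assumes failures are independent and that the page breaks if any
    single dependency is down, i.e. the dependencies form a serial chain.
    """
    return prod(dependency_availabilities)


# Hypothetical figures: the origin server plus 30 third-party
# scripts/trackers, each assumed to be up 99.9% of the time.
origin = 0.999
third_parties = [0.999] * 30

overall = page_availability([origin] + third_parties)
hours_down_per_year = (1 - overall) * 24 * 365
print(f"availability: {overall:.4f}, roughly {hours_down_per_year:.0f} hours down per year")
```

Under those assumptions the page as a whole comes out at about 97% availability, roughly 11 days of breakage a year, against under nine hours for the origin on its own.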
300 baud? HTML? You have a valid point without having to exaggerate. 9600bps was pretty mainstream when web pages first appeared, with 14400 available.
You're right, though, in that I spent a lot of time hand-coding HTML and squeezing every last byte out of images. Now bandwidth is plentiful, so nobody cares. It's a choice between paying a human to optimise stuff vs paying for bandwidth. Humans are expensive.
Edit - you're definitely right about the tracking/metrics though!
"When it was first designed it was supposed to be resilient, proof against chunks of infrastructure being taken out in a nuclear attack."
Oft repeated, but simply not true. The networks that were designed to survive nuclear attack included the "Minimum Essential Emergency Communications Network", or MEECN, and the prior "Survivable Low Frequency Communications System", or SLFCS. Besides, if you use an ounce of common sense, it only stands to reason ... no military would design a command and control system that inherently wasn't securable, and the Internet was not then, and still isn't, securable.
In The Beginning, the first two nodes of what became TehIntraWebTubes were at SRI and UCLA, conceived, designed, implemented and run by students and professors. With no Pentagon oversight, input or anything else "intellectual". Money, yes. Oversight, no.
Boiling it down to basics, the (D)ARPANET was just a research network designed to research networking. The "survives nukes" myth came about much later ... The only reason it was built to be resilient is because the existing hardware was really, really flaky.
Even if we could magically decentralize CloudFlare and make people write nice HTML or at least store their own scripts, the internet wouldn't be a lot less fragile. The reason for that is that there are very few places that process all our traffic. There's only one line leading to your house that actually works, but that's a short length that isn't the main issue. The issue is that there's only one line that connects your ISP's local unit to whatever center they have for sending it out of local, and only a few lines (or maybe just one) connecting large areas to other large areas. What happens when cables stop working? Large parts of the internet lose connectivity. Routing around that kind of damage requires a web of lines, but a lot of the world operates on chains of lines instead. It's hopeless; the internet can't really route around damage. We just put our systems in lots of parts so we can weather most small disconnects and otherwise we're hoping nothing really bad happens.
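The chain-versus-web point can be illustrated with a toy graph check. This Python sketch assumes a hypothetical four-hub topology (the hub names and links are invented for illustration, not real infrastructure) and asks which single cable cuts would split the network:

```python
from collections import deque


def connected(nodes, links, broken=frozenset()):
    """True if every node can still reach every other node once the
    links in `broken` are cut, checked with a breadth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        if (a, b) in broken or (b, a) in broken:
            continue  # this cable is cut
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)


def single_points_of_failure(nodes, links):
    """Links whose loss on its own partitions the network."""
    return [link for link in links if not connected(nodes, links, broken={link})]


# Hypothetical hubs: a local exchange, the ISP core and two regions.
hubs = ["local", "isp_core", "region_a", "region_b"]
chain = [("local", "isp_core"), ("isp_core", "region_a"), ("region_a", "region_b")]
web = chain + [("region_b", "local"), ("isp_core", "region_b")]

print(single_points_of_failure(hubs, chain))  # every link in the chain
print(single_points_of_failure(hubs, web))    # []
```

In the chain every link is a single point of failure; two extra links turn it into a small web where any one cut can be routed around. The check says nothing about two simultaneous cuts, which is roughly the "hoping nothing really bad happens" part.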
It fell over for me right as I clicked the link to go to the Comments section for the Deep Nudes story. At first I thought the company webfilter was gonna squeal about so many semi-naughty words on a page, then realised the 502 message was coming from Cloudflare. Phew, that was close...
Not that I read El Reg for fun at work. It's "Industry News", not leisure reading. Yeah...
Biting the hand that feeds IT © 1998–2019