Major irony alert
"Ironically, outage monitoring sites DownDetector and isitdownrightnow.com were also offline, thanks to the issue."
Tuesday's Amazon Web Services mega-outage knocked offline not only websites big and small, by yanking away their backend storage, but also knackered apps and Internet of Things gadgets relying on the technology. In fact, the five-hour breakdown was so bad, Amazon couldn't even update its own AWS status dashboard: its red …
My Wacom Intuos tablet had a driver update (on Windows) that totally changed the settings GUI and gave me a cloud based parameters storage and loading 'facility'. The idea is that I can more easily manage its settings and have them backed up in case of accidents and also, of course, easily migrate my tablet between different computers.
I uninstalled the drivers and reinstalled from the CD then blocked Wacom software at the firewall.
Indeed this 'instead of' methodology that many IoT thingies have is a serious worry.
The IoT aspects should be a layer on top of, not instead of, a working self-contained system.
App settings - sure, back them up so you have them preserved or migratable to a new system. But don't have them as the sole storage for settings.
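A minimal sketch of the "layer on top of, not instead of" idea, assuming settings are kept in a local JSON file and the cloud copy is only a best-effort backup. The class name, the `cloud_backup` callable and `broken_cloud` are hypothetical and not taken from any vendor's software.

```python
import json
import tempfile
from pathlib import Path


class LocalFirstSettings:
    """Settings live in a local file; the cloud copy is only a backup."""

    def __init__(self, path, cloud_backup=None):
        self.path = Path(path)
        self.cloud_backup = cloud_backup  # hypothetical callable taking a dict
        self.values = json.loads(self.path.read_text()) if self.path.exists() else {}

    def set(self, key, value):
        """Save locally first; return True only if the cloud backup also worked."""
        self.values[key] = value
        self.path.write_text(json.dumps(self.values))  # local file is the source of truth
        if self.cloud_backup is None:
            return False
        try:
            self.cloud_backup(dict(self.values))
            return True
        except OSError:
            # Backend unreachable: the setting is still saved and usable locally.
            return False


def broken_cloud(_values):
    """Stand-in for a backend that is down (assumption for the demo)."""
    raise ConnectionError("backend unreachable")


def demo():
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "settings.json"
        settings = LocalFirstSettings(path, cloud_backup=broken_cloud)
        backed_up = settings.set("sensitivity", 7)
        reloaded = LocalFirstSettings(path)  # no cloud needed to read it back
        return backed_up, reloaded.values
```

With the backend down, `demo()` still finds the setting on disk; only the backup step is skipped.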
"Serves you right."
I can't help but think you're being a bit harsh here - for instance, ever since the Dawn of Time, if you don't feel like using 156345346 different remotes the only universal ones that are sold anywhere are basically the Logitech Harmony series; and yes, they come with a cloud-only config tool, whether you like it or not. Yes, they do _work_ without being online*, but they cannot be reconfigured. Believe me, I would never willingly choose such a setup but what choice do I really have...?
* I have no idea whether a similar issue is at hand in this particular case, or if this remote wasn't working at all...
It's been a few years since I had / used my Logitech Harmony remote, but back then the tool to configure the remote was online, as you say. However, who is changing macros on their remote on a daily basis? Once it's set up you normally only need to change the config when adding a new device.
Agree with the overall sentiment, but as the Harmony remotes rely on an enormous database of known devices it sort of makes sense for it to be online. (As the database gets updated daily)
" Someone complaining they couldn't change their mouse sensitivity?"
From the Razer site:
Razer Synapse is our unified configuration software that allows you to rebind controls or assign macros to any of your Razer peripherals and saves all your settings automatically to the cloud. No more tedious device configurations when you arrive at LAN parties or tourneys, as you can pull them from the cloud, and get owning right away.
"If you want a thing done well, get a couple of old broads to do it."
-- Bette Davis
If neither of those broads is named Alexa. I just now asked my Echo if it was all right, and it responded, "Great! I'm ready to stroke your man-parts or whatever!" Um, okay.
I think I speak for all of us when I say that my sympathies go to the poor soul who couldn't order a coffee. OH. MY. GOD. S/he's probably still shaken by the experience.
There's a reason why this nonstop idiocy is called a "first-world problem." The reference isn't meant to be self-congratulatory or a compliment.
This is great : a cloud service falls down so hard it can't even notify customers it is down. And, way down the line, thermostats can no longer be changed, mouse settings are frozen and God knows what else.
This is absolutely perfect and should happen a lot more often until people finally get fed up and demand things that work ALL THE DAMN TIME, like they used to before this happy-happy age of sharing everything with the NSA whether you want to or not.
IoT ? Not while I still have a functioning brain, thank you very much. My light switch does not depend on the Internet and never will.
“IoT ? Not while I still have a functioning brain, thank you very much”
I agree with the sentiment 100%. However I fear IoT will be foisted onto the unsuspecting public, in various guises, whether they want it or not. ‘Smart’ meters being a prime example, which are presently being aggressively deployed by energy companies in the UK.
"... ‘Smart’ meters being a prime example, which are presently being aggressively deployed by energy companies in the UK."
Not aggressively enough !!!
I have been with 2 Electricity/Gas Suppliers that have 'shouted' from the rooftops their wonderful 'Smart' meter functionality ..... only to be told that they could not fit a 'Smart' meter due to the existence of Solar Panels (new install covered by FITS) or not available in your area !!!
How can you design a Smart Meter that is unable to cope with self-generation of power when they are everywhere in the UK and not exactly leading edge Technology.
The 'Not available in your area' was only discovered AFTER I had changed Supplier !!! :(
[I was assured that I could get a Smart Meter when I queried about it and mentioned the Solar Panels !!!]
people finally get fed up and demand things that work ALL THE DAMN TIME
My retro-style analog light switches, coffee maker, thermostat, range, fridge, garage door opener, laundry equipment, etc. all functioned perfectly throughout the Great Outage. They also had the additional benefit of being half the price of their (dis)connected brethren.
I do wish that the cat had gone offline for a while, though. That actually would have been kind of nice.
It's terrifying that we've gone from a network designed to survive nuclear attack without loss of communication, to a situation whereby a single company's IT failures affect tens of millions of people.
Whilst you can argue that the majority of the disruption is, in the scheme of things, minor, the IoT is pointing towards a lot more serious issues further down the line. Imagine what could happen if say, self driving trucks relied on AWS for back end updates of road closures, and due to a crash, couldn't be notified of temporary road closures, nor updated to signal that they should park up.
Any critical service like that should be built with multi-region availability. AWS has 14 regions to choose from and easy DNS features for latency and healthcheck based routing.
Don't get me wrong, this outage was annoying and Amazon's multiple AZs per region are meant to prevent an entire region falling over. For us it gummed up a bunch of batch jobs we had running and we lost time rejiggering them to not lose data. But our frontend is multi region, clients got directed to west coast and didn't miss a beat.
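A rough sketch of the health-check idea described above, done on the client side rather than in DNS (so this is a swapped-in illustration, not the DNS routing feature itself). The function name and the `is_healthy` callable are assumptions; the region names are just examples.

```python
def pick_region(regions, is_healthy):
    """Return the first region, in preference order, whose health check passes."""
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except OSError:
            # A health check that cannot connect counts as an unhealthy region.
            continue
    raise RuntimeError("no healthy region available")


def east_is_down(region):
    """Stand-in health check (assumption): everything except us-east-1 is up."""
    return region != "us-east-1"


# Preference order: nearest region first, the fallback after it.
chosen = pick_region(["us-east-1", "us-west-2"], east_is_down)
```

Here `chosen` ends up as the fallback region. A real deployment would also need the data to exist in both regions, which this sketch does not cover.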
Should have. But let's take a peek inside a dev's mind after it happened. Something like this....
"But, but the time to market was tight and the protocols were complex and AWS hardly ever fails and besides it was going to cost extra and my friend told me no one else does it."
I think that just about sums up most of the people who did this.
BTW in real engineering there is the idea of a Licensed engineer. If you design a building and it's built as you specify (IE all materials and procedures followed) and it falls down below design loads it is your fault.
I think that was more the dev's boss's mind rather than the dev's - most devs would not have the luxury of being consulted on things that potentially involve big cash (choice of cloud, backups, hybrid solutions, failsafes); they just get told it's this solution, work with it. In very few places do devs have a decent degree of input into solutions; mainly they are just treated as a coder for hire with their thoughts / opinions ignored
AC - obv!
"Should have. But let's take a peek inside a dev's mind after it happened. Something like this...."
Another common failure with various applications is the assumption that if an internet connection is available at the beginning of a session, then it's there forever.
Broadband is a lot more reliable than dial-up was, but things still go wrong at the client side, and as more and more work moves to mobile devices, this is a problem which won't go away any time soon.
Exactly because it was designed as a distributed system with different paths, and not as a single monolithic architecture putting all the eggs in one basket.... but it was designed by scientists to address an issue, not by MBAs trying to understand how to reinstate big monopolies and extract as much money as possible from users...
And now we've got networks where some faults have a tendency to go nuclear. How quaint.
Granted, it is rather hard to account for a possibility of getting unwanted positive feedback somewhere in the system that'll lead to catastrophic overamplification. Especially if you can control only a small part of the system.
But just for fun, I'm going to snap into the old git mode and blame it on whippersnappers having no experience with op-amps these days.
yes and no.
Part of the ARPANET brief was to design a network that had no single point of failure, hence no command centre. Its "use case" was to allow remote access by other institutions to various specialized machines (DEC 10s, the ILLIAC IV supercomputer of the time).
The Bell System Electronic Switching System (ESS) had RAM and ROM elements which were designed to be rad hard as well.
The Kaikoura earthquake in New Zealand knocked out communications. Trucks were stranded between slips, a train was too. In those cases the human driver realised the problem and applied the brakes.
Kaikoura is still cut off from the North by massive slips on the road/rail line. Initially various mapping and route finding apps were not updating to the alternative inland route bypassing Kaikoura so motorists and trucks were being directed down the blocked coastal road. The police had to permanently man a checkpoint and turn vehicles around with new instructions on the alternative route.
So we already know the sort of problems an internet reliant automatic vehicle would face.
Add in that significant parts of NZ have no cell phone coverage, too remote, mountainous, unpopulated to make it economic. Woman caver near Nelson recently fell and injured herself. No cell phone coverage made getting a rescue a problem. Emergency services have radios so once in place it worked, but we are so reliant on cellphones now.
Isn’t the Met moving from radios to a cellphone based system? . . .
Can't even TURN OFF your oven? Talk about shitty design! If basic functionality like that is dependent on an internet connection, what happens if the manufacturer goes out of business, or simply decides that it is tired of supporting 10 year old products and takes down the cloud site it relies upon?
Too bad the general public that is suckered into buying this useless crap doesn't see news like this. I guess we need something like that to cause a fire that kills children to make the national news before it reaches the public consciousness and the deserved backlash comes against non-tech companies putting "internet" and "IoT" into their products for marketing reasons without any understanding of the consequences.
Who knows, maybe the HTML5 UI on the oven's integrated touchscreen linked to a cloud-based JQuery for its "OnClick" action for the "off" button so when that didn't load there was nothing to execute* **...
*Yeah, I know ancient fossils sometimes tell tall tales of ridiculous "clickable links" that once were purportedly integral parts of webpages and didn't need code to be executed on a click, but those are obviously just invented stories right up there with those hilarious "frames" that clearly never really existed...
** Okay, in all actuality this is probably a case of "I went out for some milk knowing I can turn off the oven with the cookies remotely from the supermarket and then it all just fahahahaileeeeed.... *sob* *sob*"
"Nest warned customers that its internet-connected security cameras and smartphone apps were not functioning properly – as in, weren't recording video footage – as a result of the AWS blunder."
....Where is the ability to cache for x hours / days in an offline mode???
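A minimal sketch of the offline cache being asked for, assuming the camera keeps recent clips in a bounded local queue and uploads them once the backend answers again. `OfflineBuffer` and its `upload` callable are hypothetical names, not any vendor's API.

```python
from collections import deque


class OfflineBuffer:
    """Keep recordings locally and upload them whenever the backend is reachable."""

    def __init__(self, upload, max_items):
        self.upload = upload  # hypothetical callable; raises OSError when offline
        self.pending = deque(maxlen=max_items)  # oldest clips drop first when full

    def record(self, clip):
        """Store the clip locally first, then try to drain the queue."""
        self.pending.append(clip)
        return self.flush()

    def flush(self):
        """Upload queued clips in order; stop at the first failure."""
        while self.pending:
            try:
                self.upload(self.pending[0])
            except OSError:
                return False  # still offline, clips stay queued locally
            self.pending.popleft()
        return True
```

The `max_items` limit stands in for the "x hours / days" of local storage; once it is reached the oldest clips are discarded first.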
"Other IoT devices were also impacted and caused some rather surreal scenarios for their owners. We're told that cloud-connected lightbulbs, thermostats, ovens, and similar gear, stopped working properly as their backends fell over."
....Oven burned house down cos cloud backend failed. Insurance will pay?
I have an internet-connected DVR with 5 cameras. I can view live or recorded footage on my smartphone wherever I am in the world.
Guess what... it records to a local 1TB hard drive, and depends only on my broadband line. I can't believe there are devices out there that cease to function without an internet connection. Surely an ISP or local phone provider exchange would be more common than an entire AWS DC failing, and manufacturers would have realised the flaw in their design by now?
Well that's the whole point.
AWS DC failures are rare enough that this bunch of companies thought they did not need to code migration into their "cloud" software.
Result. "Cloud" reverts to 1 site server farm.
Server farm fails.
System is borked.
~ When a leading IoT supplier like Nest has zero fault tolerance regulation is badly needed. But the US has opted out because no one else will follow, so they claim. But this is a disaster.
~ I doubt they'll even add heartbeat safety to ovens or similar appliances etc, in the event of overheating when remote smartphones lose connection etc.
~ The demise of tech journalism is lamentable. It takes security specialists on unknown blogs to research / reveal weaknesses. Meanwhile all mainstream journalists do is sing IoT's praises.
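The heartbeat safety mentioned in the second point could look roughly like this. It is a sketch under assumptions: the appliance runs its own control loop, the phone or backend sends a periodic "still here" signal, and time is passed in as plain seconds so the logic is easy to follow. All names are hypothetical.

```python
class HeartbeatWatchdog:
    """Switch the heater off if no heartbeat arrives within timeout_s seconds."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_beat = None
        self.heating = False

    def start(self, now):
        """Begin heating; starting counts as the first heartbeat."""
        self.heating = True
        self.last_beat = now

    def beat(self, now):
        """Called whenever the remote controller checks in."""
        self.last_beat = now

    def tick(self, now):
        """Called periodically by the appliance's own control loop."""
        if self.heating and now - self.last_beat > self.timeout_s:
            self.heating = False  # fail safe: no heartbeat, so stop heating
        return self.heating


oven = HeartbeatWatchdog(timeout_s=60)
oven.start(now=0)
oven.beat(now=30)
still_on = oven.tick(now=80)   # 50 s since the last beat, within the limit
shut_off = oven.tick(now=91)   # 61 s since the last beat, over the limit
```

The point of the design is that the decision to switch off is made locally, so it works precisely when the connection is gone.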
You want local storage so you aren't dependent on the internet (i.e. thieves cut the fiber to your building before breaking in) but also cloud storage so taking the DVR with them doesn't help.
Of course, they could do both, but I'm probably assuming too much intelligence from the average thief thinking they might come up with doing even one of those things...
"If an S3 region becomes unavailable, S3 itself should route applications to another region that hosts a redundant automatic copy"
That would make things simple for users. Chris Mellor's article points out that S3 stands for Simple Storage Service. So now we know that's simple for Amazon, not for the user.
TBH I thought so too. That was (it seemed) the USP of a cloud system.
But apparently not.
So for anyone who's not coded those features into their software AWS is just a remote sited server farm which you don't own.
Maybe other cloud providers are better at this than AWS.
But does anyone know?
My understanding is that Amazon's cloud is supposed to be redundant. If you hosted on another cloud for redundancy, you made everything more complicated and your odds of an outage probably went up due to that extra complication unless you really know what you're doing.
"They don't but marketers saw something shiny and demanded it."
s/demanded it/convinced the gullible they needed it/
And could see the potential for gain for them and the companies they work for, with all the lovely data they'd get, and more potential control over those customers.
To slightly misquote Gary Numan's (Dark) lyrics:
hostility connectivity to lead the faithful and the blind."
A lot of disinformation here regarding Philips Hue bulbs.
The Hue lightbulbs work perfectly well without any IT whatsoever. Guess how?? Yep - that trusty old light switch turns them on and off. Just like a real bulb. So i can turn off my phone, my router and my PC and i can still turn them on and off - using a light switch. They perform just like normal light bulbs.
If i want to change them to a different colour, (or turn them on and off remotely), my router needs to be on, because the app to change the colour is on my phone and therefore my phone needs to be able to talk to the bulbs (they are not Bluetooth bulbs).
Following so far? ;)
If my router is connected to the internet, the bulbs can then download patches and interface with services such as IFTTT. But they certainly do NOT require internet connectivity for basic operation.
>Why do we need internet connected light bulbs??
Most commonly to be 'in when you're out' - but a side benefit for disabled users is a Smart Home setup is now measured in £100/1000's instead of £10,000's. Hasn't killed suppliers of the latter and the related gravy trainers who somehow still claw staggering amounts from the NHS, but has changed a lot of lives......and before you cite this as an example of Cloud's unsuitability - specialist AT is not only hugely expensive but incredibly unreliable, buggy as hell and implemented using decades old tech which wouldn't find an application elsewhere.
Your points are good ones, but there is a related issue. IoT services for the elderly and the disabled are useful and good. BUT the economics of these applications are not good. Takeup amongst the elderly and confusion over how to make it all work allied to aged cussedness mean the market amongst the elderly is likely not large even though the benefits are manifest.
So, to make the economics of this work they have to sell them to fit, healthy, able Joe Public and for us the utility beyond ‘Ooh! Shiny!’ is simply not there.
You read things like the guy who couldn’t get his kettle to boil. When after 8 hours it finally worked he had to eat his tea in the dark as his lights were downloading an update and were offline. His lights had power, the bulbs were not blown but they could not be turned on when needed. This is a health and safety issue and means the products are not fit for the likes of the elderly. Do you want your Granny to fall and break her hip because her lights won’t turn on because they are offline?
> Muscleguy & AC above that...
Good points, yes the elderly need support, including my 96 yo mother living alone until recently. Struggling with failing eyesight and memory loss she couldn't read the time on her very large analogue clock. Enter the Raspberry Pi, a large TV screen, a bit of Python/Pygame code (with hi contrast colours) and we solved the problem. :-)
With touch sensitive table lights and socket mounted timers (etc) we weren't hankering for internet light bulbs - and certainly no AWS in the loop.
- See; I still don't get it. ;-)
I'm not sure about nest but quite a few shiny IoT soho DVR systems sell the off-site cloud storage as their continuing revenue stream; small-print says something like "give us £30 per month or your system will not work"
I found a nice netatmo (French!) IoT biometric indoor motion-sensitive DVR that records to local microSD (with a free option to additionally dump to DropBox) the biggest problem for me from netatmo is the DVR will only boot up with their specific netatmo 5V micro-usb PSU wall-wart, I had planned to supply via USB UPS.
Interesting for me is Heroku. Now we all know that Heroku lives on AWS, no issue there, but they reacted by taking their whole management API down. Which meant that nobody could scale to meet the change in demand or reconfigure to route elsewhere. And they were down for a lot longer than AWS were.
Tested and found wanting, in my view.
Most of the status pages I've seen seem to be run by the marketing department rather than directly linked to the service they claim to be monitoring. They generally don't admit there's a problem until several hours after it started, and use weasel words to minimise the apparent size of the problem. I don't trust them.
Or, at the very least a separate set of isolated infrastructure inside your cloud property. This outage is a bit more than egg on the face to be blamed on power. It is obvious they use the same infrastructure for their dashboard and that is plain stoopid.. They bought this one through design flaws and I wouldn't blame anyone for getting off of AWS as soon as they can. I don't precisely know what happened here - could be a simple DNS issue, or something more severe - but it shows the AWS infrastructure design is very flawed. Hate to say it, but there should be a watchdog org that verifies public cloud provider infrastructure. So much rests on that infrastructure that something needs to be done. I once heard a budding cloud entrepreneur tell about how he started his cloud storage business in his garage. Oh, right, I bet he never told his customers where his secure and standards-conforming data center was.
Sometimes it happens for quite anecdotal reasons. At one company it was because MD did not like the humming noise coming from their server room just over the corridor.
So IT department got an order to scrap servers and move everything to the cloud. They weren't happy. In a true BOFH spirit they contemplated splashing few grand on soundproofing and setting up a company à la "Cloudy McCloudface Ltd." to issue invoices. But eventually they chose to be good sports and went for a reliable colocation company. Which happened to have "Certified Cloud Solutions Provider" prominently written on their marketing brochure.
Hopefully a wake up call for connected device makers, don't make your tools reliant on the internet!
Wake up call, ha. More like they hit the snooze button. The wake up call should have been when video game makers made single player mode require an always-on internet connection and made the game save to the cloud. I believe Gears of War players ran into issues where the servers crashed for days and you could not play.
One of the congress critters actually bought a clue and the DMCA was updated for games that require an always-on internet connection for single player mode.
As of October 2015, always-online games with single player modes that now have had dead servers for six months and longer are now exempt from DMCA prohibitions on circumventing copyright protection
"Why do we need internet connected light bulbs?
Most commonly to be 'in when you're out'"
Yea, well, I use time controllers. Do the job perfectly barring power failure when you won't have a light anyway. Their only issue is re-setting the time (and finding the unintuitive instructions), which I find more intellectually challenging than setting up a new cloud instance ... but then I'm a born masochist.
Chances are the clock in a mechanical timer is an electric one. When the power goes out, the clock stops. When it comes back on, unless you are exceedingly lucky and have had a multiple of 12 hour (or 24 hour if you have a 24 hour clock) outage, the clock will be wrong and you will need to set it.
But it's usually a matter of turning it until it's correct again.
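The arithmetic behind that can be written down in a couple of lines. A small sketch, assuming the clock simply stops for the length of the outage and carries on from where it was; the function name is made up for illustration.

```python
def timer_error_minutes(outage_minutes, dial_hours=24):
    """Minutes the dial lags real time after an outage of the given length."""
    dial_minutes = dial_hours * 60
    # A stopped clock loses the whole outage, but the dial wraps around,
    # so only the remainder shows up as an error.
    return outage_minutes % dial_minutes


short_cut = timer_error_minutes(90)                    # 90 minutes behind
full_day = timer_error_minutes(24 * 60)                # exactly one lap: no error
half_dial = timer_error_minutes(13 * 60, dial_hours=12)  # 13 h on a 12 h dial
```

Only an outage that is an exact multiple of the dial length leaves the timer correct, which is the "exceedingly lucky" case.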
"In when you're out" - I saw some funny coloured LED lamps on sale just the other day; I was wondering why they seem to have that strange reflector-like shape until it dawned on me these are supposed to project a randomly flickering muted RGB lightshow onto your wall - not entirely unlike the one made by a turned-on TV set in a room - for the benefit of "legally-challenged uninvited guests"...
"Chances are the clock in a mechanical timer is an electric one."
Ha. A few weeks ago the clock on the CH boiler had a little problem. It would run until it came to the start of an on period. It's an electric clock so I'd expect it to work unless the mains went off but with mains behind it why should it fail like this? Replaced under warranty and I took the old one apart. It's a battery operated quartz clock with the battery charged, as far as I could see, by a diode & dropper from the mains. Presumably when the battery goes on the blink it doesn't have quite enough voltage to trip the switch.
As a price for the clock not stopping when the mains goes off - I can live with that as it would be just another clock to reset - I have a timer that fails to work at all after a few years service.
One of our customers does use AWS, but only as a third backup behind their two physical datacenters.
So, even if, as the sysadmins, you've done all you can to make the site redundant and fault tolerant, never underestimate the opportunity for the developers to fsck it right up.
the trouble with relying on cloud services is...
Could go down...
Probably will go down...
Guaranteed to go down...
All this is known, but people choose to ignore the realities of cloud storage.
I know it is all super convenient. But if you use cloud storage as the foundation of your livelihood, it's just a matter of time before you get burned alive.
Question re the IoT devices.. do they fail open or closed?
If you were cooking a delicious roast dinner for the in-laws in an attempt to impress and had set it all up from work so that it would be ready when you got home.. would you get home to a cooked dinner or just a freezing cold flat?
Same with lightbulbs, if they were on would they remain on?
Wonder if any internet connected burglars noticed and figured it would be a great time to go on a bit of a spree knowing a whole bunch of security systems were suddenly not working. And whether the insurance would still pay up in that situation.
"Other IoT devices were also impacted and caused some rather surreal scenarios for their owners. We're told that cloud-connected lightbulbs, thermostats, and similar gear, stopped working properly as their backends fell over".
This gives a whole new meaning to "building on sand". Doesn't anyone take Engineering 101 any more? What do all those fools in "risk management" departments do all day long? Oh yes, that's right - bend over when the Big Cheese says, "We're doing it because it's cheaper and all the other bosses are doing it so I don't want to look old-fashioned and out of touch".
" I can't change my mouse sensitivity because @razer @razersynapse servers are down...
"Joys of the @internetofshit - AWS goes down. So does my TV remote, my light controller, even my front gate. Yay for 2017".
Funniest things I've heard since 1993, when Grady Booch told a conference about how some dinner guests of his spent an uncomfortable few minutes being ignored on the front porch. Seems it's not good design to wire up your front door bell through a server that sometimes goes down unnoticed...
Having recently emerged from an AWS exam, I thought that one of the selling points of S3 was that data is automatically replicated across multiple availability zones within a region without the customer needing to worry about the details. I also thought that the availability zones within a region were highly isolated from each other (e.g. separate data centres in different cities). I guess I'm wrong about at least one of those things.
At least the problem was largely fixed the same day. When problems occur within my employer's on-premises infrastructure, it usually takes several days to get it fixed, including a phase during which even the existence of the problem is denied.
Wrong XKCD reference. To reference a prior commentator, sure AWS will replicate data across availability zones in a region. In this case, the entire region had issues, so it was up to the customer to span multiple regions. In other words, this one applies:
Yes, loads of companies do in house stuff badly or don't bother with resilience or disaster planning etc. So the argument is that the Cloud is better.
Maybe from the point of view of the users in one company cloud is better than In house, perhaps you can't then order from one supplier when their in house IT falls over if they don't use cloud.
But the "cloud" could mean that no-one can order from anyone. Instead of just RBS or HSBC being down, all banks, Mobile billing (so no mobile calls due to no credit, PAYG or Bill Pay), no ATMs, no POS, no card payments ...
Maybe fantasy today, but not as more companies outsource to cloud EVEN if it's done better than in house. Not as we head toward various mono cultures. It won't be a cyber war, but a Friday afternoon patch to Edge Routers, or load balancing, or DNS servers, or database etc.
The famines in the 19th Century (not just in Ireland) were due to mono culture.
The very concept and "savings involved" of Cloud Computing is heading all of the first World to a cyber potato event horizon.
There was a lecture at Gresham College the other day (free, open to public on a wide variety of topics, worth a look around the website as transcripts etc are put online) titled "Living Without Electricity" about our dependency on power for normal life, communications etc
The transcripts in this case didn't go up for a couple of days, probably as the address is
I assume the Prof might have found this ironic.