Cloud doesn't fix stupidity
Cloud computing has elements of a fool’s paradise. We’re told it’s elastic, infinitely elastic even, with thousands of virtual servers spun up in minutes. But there's just one problem ... this is palpable nonsense, short-term excess capacity being mopped up by early entrants to a large resource. Virtual servers are not magic. …
I think the maths applied by the business here is that you can have at most N% of servers idle. Which means that the more servers you have, and the more clients providing the load for the other (100-N)% of servers, the more servers in absolute numbers you can afford to keep idle. So if your N% is 2% and you have 10,000 servers in total, that's 200 idle servers you can afford. If you have 100,000 servers, that number becomes 2,000. In practice, the more load you have, the lower N% can become (thanks to the statistics of large numbers) while still giving you extra capacity for peak loads. Back to the example above: with 100,000 servers, N% could perhaps be lowered to 1.5%, giving you 1,500 idle servers - which means your base cost just decreased.
I am not trying to sell cloud services (not my business at all), just applying some common sense to the problem.
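The scaling claim above can be put in rough numbers. A minimal sketch, assuming client loads fluctuate independently (so aggregate jitter grows like the square root of the client count while mean load grows linearly); the per-client standard deviation and the three-sigma headroom are invented for illustration, not anything a provider has published:

```python
import math

def idle_fraction(total_servers: int, per_client_sd: float, clients: int,
                  sigmas: float = 3.0) -> float:
    """Fraction of the fleet to keep idle to absorb demand spikes.

    Assumes independent client loads: the standard deviation of the
    aggregate grows like sqrt(clients), while the mean load grows
    linearly, so the required idle fraction shrinks as the platform scales.
    """
    aggregate_sd = per_client_sd * math.sqrt(clients)
    return (sigmas * aggregate_sd) / total_servers

# Invented numbers: each client's load jitters by about 2 servers (s.d.)
small = idle_fraction(total_servers=10_000, per_client_sd=2.0, clients=1_000)
large = idle_fraction(total_servers=100_000, per_client_sd=2.0, clients=10_000)
print(f"{small:.1%} idle needed at 10k servers vs {large:.1%} at 100k")
```

With these made-up numbers the tenfold-bigger fleet needs roughly a third of the idle percentage, which is the direction the comment describes.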
Better still - not all crunching jobs need to be *NOW*; many are *soon*, or RSN. If the 10k instances are a regular thing, they'll be scoped in hardware. If they're occasional, jobs can be prioritised, so the 10k instances can take priority on the hardware and then hand it back to the lower-priority work.
Mainframes have done this since just about forever, so I can't see why this is worth an article?
How dare you introduce reality into cloud services! :)
Maybe a fun question to ask: has anyone got a contract where such elasticity is made explicit? Anyone with a KPI that states they can grow their demand 100% in a day (or whatever other measure would not be possible with "traditional" facilities)? If yes, question 2 would be: "have you audited this, and how?".
I don't think anyone has, really. Illusions, illusions.
The point of the Cloud is that there are many entities requesting many instances, making the exercise one of scale and availability, and making your physical capacity match "traffic".
"Virtual servers are not magic. They don’t exist in a parallel universe, decoupled from hardware. You still need physical servers, sitting in a data centre somewhere."
shows the real problem. You need a seriously fat data pipe for this to work, or you'll bottleneck rather horribly. It's all well and good to have One Meeeeelion servers' capacity, but if you can't feed them they're next to useless. The world's infrastructure is simply not up to serving the Cloud Dream.
They are not magic, but it is a bit naive to think they have to just drop to idle. There are classes of problems which can start and stop at a moment's notice, which require high CPU but are not necessarily time-critical. This idle VM time can be sold by the second at a discounted rate, on the understanding that your VM will start and stop as per the requirements of the cloud provider at that time, not yours.
Also I think there's an element of setting up a straw man - the article says "We’re told it’s elastic, infinitely elastic even, with thousands of virtual servers spun up in minutes". But I don't know who's really saying that.
To take Amazon as an example, their EC2 "Purchasing Options" page (https://aws.amazon.com/ec2/purchasing-options/) is very clear that on-demand instances are not guaranteed to launch at times of high demand. If you try to launch thousands of servers, you've likely just brought on that high-demand situation yourself, so it will fail.
In practice I suspect that Amazon will terminate spot-priced instances running below the on-demand rate to make room for your on-demand requests, and only turn those requests down once that's exhausted. They will then retain a certain limited amount of spot capacity which you're free to bid for. But clearly you can't bid for more than is there, and attempts to bid for too much will drive the price through the roof.
If you have a workload which occasionally requires you to reliably fire up lots of instances, you need the third commercial model, which is (and the hint's in the name) reserved instances. You can reserve up to 20 instances a month for no cost, or pay up front to get reduced fees. More than that and you have to put in a manual request to them.
My suspicion is that if you say "can I reserve 1000 instances, please", they'll come back and say "please can you pay a bit up front, so we know you're serious". Then they'll try to flog any unused capacity as spot priced instances.
All pretty transparent, if you ask me...
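The bidding dynamic described above can be sketched as a toy uniform-price auction. To be clear, this is not Amazon's actual spot mechanism (which isn't fully published), just an illustration of how the marginal bidder sets the price once capacity runs out; all the bidder names and numbers are invented:

```python
def clear_spot_market(capacity: int, bids: list[tuple[str, int, float]]):
    """Toy uniform-price auction: allocate capacity to the highest
    bidders; the clearing price tracks the last (cheapest) winner.

    bids: (bidder, instances_wanted, max_price_per_hour)
    """
    winners, price, remaining = [], 0.0, capacity
    for bidder, wanted, bid in sorted(bids, key=lambda b: -b[2]):
        if remaining == 0:
            break
        granted = min(wanted, remaining)
        winners.append((bidder, granted))
        price = bid                    # marginal winner sets the price
        remaining -= granted
    return winners, price

# Scarce capacity: the high bidder is served first, the low bidder
# only gets what's left over
winners, price = clear_spot_market(
    capacity=100,
    bids=[("batch-job", 80, 0.05), ("trading-floor", 90, 0.50)])
print(winners, price)
```

Bid for more than is there and only part of the request is filled, which is the "can't bid for more than is there" point above.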
As with "infinite traffic" (ISPs around the world) or "infinite storage" (OneDrive), "infinite scalability" is... not infinite.
Let's accept this: we can't win a fight with marketing. Once they take hold of a word there is no hope for it to keep its actual meaning. We clearly need a new word to be adopted as international standard to mean "without bounds", like the "infinite" of the old days. Of course, over time that word will also be taken over, so we'll have to invent a new one. And so on. Suggestions?
Ultimately the "Cloud" is just someone else's servers they have to make a profit from.
You don't know what the security/privacy, backup or other arrangements are.
You might not know where they are.
You are at the mercy of your Internet connections.
Imagine a world where accountants think it saves money to outsource IT as if it's a call centre (c.f. banks, which is madness seeing as in reality IT is their core business), as well as the servers to run it.
Imagine there are eventually only four "cloud" providers, and maybe two ecosystems of software.
(c.f. the same false-positive virus warnings on hosted files/websites, from two unconnected hosting companies in different countries).
The hosting companies (aka the Cloud) need to make a profit. If you run your IT well then the cloud will inherently cost more than DIY eventually (connectivity costs + hosting company profits). The Cloud does make sense for websites, though for a high-traffic site co-location of your own server at a data centre is perhaps cheaper and better, with more control. Really the Cloud makes no sense, and is a backward step to the 1960s and 1970s, for stuff that should only be on your own LAN or corporate network.
Good to see an article with some reality about Cloud.
If there is widespread migration to the Cloud of banking, billing, IT, etc., then one day a badly done patch will kill civilisation.
If no electronic transactions, no food. Riots & looting in less than 2 days.
Billing on Mobile will go down. So soon no calls.
Unless the "Cloud" ecosystem is fixed in less than 2 days, you are looking at a cascade of failures ... no water, gas, electricity etc. Being able to buy & sell and make calls (mobile or fixed) would be the first to go.
How would you get cash from a Bank or Cash Machine?
How many people have cheques or cash, or will have in 5 years?
The Cloud hype has got to stop. Yes, it makes sense for minuscule-to-medium websites. For anything else it will eventually be a disaster of apocalyptic proportions.
The logic presented in the article (and also in my comment above) does not only apply to public clouds. It could also apply to a private server farm full of Docker instances (or similar technology). In which case you do have all the control over the factors listed above. But of course it only makes sense if indeed you do have large enough load to justify the expense of such a large server farm in the first place, which is not very common at all.
One mitigation to the problem suggested is Amazon's spot pricing.
When someone suddenly wants 1,000 servers NOW they don't take ones that were running idle; they steal ones from people who had put in low bids for spot pricing. When sensible people want 1,000 servers they don't say they want them now, but use spot pricing to wait until the bank trading floor has finished its 1,000-server 'must be run at 4pm' job, then grab all the spare servers.
Those were my thoughts on the article: it will come down to a bidding "war" where you offer money for services and don't get a cast-iron guarantee of delivery, just a position in the scheduler based on who else is bidding for it and how much they are willing to pay.
What, you really need it to work? Maybe just buy your own server then...
Right. So you want to run your IT infrastructure as a market, in a conceptually similar way to the one the UK has chosen to run its electricity infrastructure, largely based on a combination of contracts and spot pricing, with "joined up thinking" nowhere to be seen.
Best of luck with that.
Earlier in November the Grid issued a "Notice of Insufficient Margin" followed a few hours later by a Demand Side Balancing Reserve notice (we, the Grid, will pay you to disconnect your loads) and by the purchase of a few hours of 50MW worth of spot market electricity at over £2000/MWh (vs well under £100/MWh at normal times).
This was largely blamed by Gridco on a 400MW outage at Ferrybridge. The fact that the 10GW or more of installed wind was contributing under 500MW to the grid wasn't mentioned. Ferrybridge and other coal-fired stations will soon be closed, permanently, because their owners have failed to invest in flue gas desulphurisation, which has been a requirement for the last decade and a half (Large Combustion Plant Directive). That closure of coal plant was happening before today's announcement that the UK is going to have another insane "dash for gas" to suit Cameron and his fracking mates. Gas is handy stuff to have, and burning it simply to generate electricity is a dire waste of a limited resource.
Meanwhile, the head man at the Grid has paid himself a £million or two per year, and is now moving on to better things, having achieved excellent returns for his shareholders, and capacity margins of 1% vs 10%+ for electricity customers in the UK, with blackouts the likely consequence.
If you didn't see it on TV yesterday, the BBC/Open University co-produced new series Power to The People looks quite promising, though you may have to read between the lines a little sometimes.
Maybe there are lessons to be learned there somewhere: variable/unpredictable demand, variable/unreliable supply, someone's going to end up unhappy, and usually it's the people paying.
If we learn something from the energy market, it would probably be that guaranteed capacity for base load should be at the very centre of whatever system we come up with. Which, as you demonstrated above, is not exactly the case for the UK energy market. However, hybrid clouds might get you there - you have guaranteed capacity in your own server farm for your base load, and anything beyond that might (or might not, depending on how your actual load compares to base load) be subject to spot pricing.
Well, of course, then you have a bit of a problem if the farm is designed for unrealistically low base loads, or if the farm is getting long in the tooth while the base load keeps growing. Which I suspect might also be the actual problem with the UK energy market.
The author forgot to mention that the propensity for spare capacity is directly related to the cost of holding that spare capacity (the cost of capital invested, plus ongoing costs) vs the potential reward for having it available. A low cost of spare capacity vs a high potential reward (e.g. you can charge a lot of money for no-notice spinning up of 10,000 instances, and customers are happy to pay that much higher price) means this may be possible in the future. Or may not - I'm not sure anyone is sure enough of the long-term economics of public cloud yet to say which way it will fall.
Hotels, mentioned in the article, are a good example, but the author is too simplistic. Take a tourist resort: the hotel may be huge and nearly empty in the winter. Then in the summer it's full of people paying a premium, and it makes sense for the hotel operator to maintain large amounts of spare capacity in the off season, as the cash in the peak more than makes up for it. I can imagine a very similar calculation being made by cloud providers, if there is demand for high-price, no-notice spinning up of large sets of VMs, a bit like the large demand for high-price hotel rooms in summer.
Sure, there can be some scheduling. But when is it "winter" for data centres? Certainly party conferences etc. fall just after the peak holiday time for this reason.
Sure there will be spare capacity from time to time, but not predictably and certainly not instant.
How often does the hypothetical company need this capacity? For how long, and with how much delay? Only small volumes are going to be "on demand" instantly, for as long as you want. It also needs to be stuff where latency and traffic to your own site aren't an issue, unless it's traffic with the public. So we're back to: is the Cloud any real use except for websites?
It suggests that once we approach the cloud-service demand plateau, pricing may become more nuanced: perhaps discounts applied when usage can be scheduled (or, more likely, extra costs for instant access). Perhaps an element of prioritisation cost.
The main take away is that if your application is based around the concept of quickly spinning up compute resource, you may encounter greater expenses down the line.
The flaw in the argument is the assumption that the 10,000 instances have a server array to themselves whilst they're running. What's more likely is that, rather than a big server farm with 1,000 spare servers sitting around, you have 10,000 servers all active running 9 VMs each (or 5,000 running 8), and the 10,000 VMs just get spun up as additional jobs on each server.
Thinking bigger, though, I'm sure there are people cleverer than I working out the maths behind not just those 1000 servers that need spinning up, but the ups and downs of all the other customers elasticising into and out of the same set of resources... I'll wager they get better than 90% utilisation and may even be flirting with over-provisioning most of the time - anything lower reduces ROI.
> The flaw in the argument is the assumption that the 10,000 instances have a server array to themselves whilst they're running.
It really doesn't matter. The point is, in order to spin up 10,000 instances, there needs to be hardware available now with 10,000 spare instances worth of capacity. Yes in practice it may well mean 10,000 different servers all taking on one extra instance - but the point remains that prior to the usage and after the usage, there is hardware sitting there with 10,000 instances worth of spare capacity.
Now, if that spare capacity weren't needed, then (taking that worst-case scenario) there are 10,000 servers, each with at least one instance of spare capacity. So for every 10 servers, you can move one server's instances into the spare slots on its nine mates, leaving a server doing nothing - which you could shut down, or not even have in existence.
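That consolidation arithmetic is simple enough to sketch, using the thread's hypothetical numbers (10-slot servers, instances that can migrate individually):

```python
import math

def servers_needed(instances: int, slots_per_server: int = 10) -> int:
    """Minimum servers required if instances can migrate one at a time."""
    return math.ceil(instances / slots_per_server)

# Ten servers each running 9 of their 10 slots: 90 instances in total,
# which fit on 9 servers - freeing one machine in every ten
before = 10 * [9]
needed = servers_needed(sum(before))
print(f"{len(before)} servers running, {len(before) - needed} can be freed")
```

The same ratio holds at any scale: one spare slot per ten-slot server means a tenth of the fleet is, in aggregate, idle hardware.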
As others have suggested, when things start maturing a bit, we'll start to see tiered pricing. Want a lot of computer power NOW - then pay through the nose for it. Want it "sometime in the next day or so" and it'll come cheap as it can be slotted in around those expensive high priority jobs. And probably several levels in between.
But then there is the question ...
If your 10,000-instance job isn't important enough to pay priority pricing for, why run it on 10,000 instances? Fire up a small fraction and allow it to take longer.
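That trade-off is easy to put numbers on: for a perfectly divisible job billed per instance-hour, using fewer instances stretches the duration but leaves the base bill unchanged, so only an urgency surcharge should separate NOW from later. The 3x priority multiplier below is a made-up figure, sketching the tiered pricing suggested above:

```python
def run_options(total_instance_hours: float, instances: int,
                base_rate: float, priority_multiplier: float = 1.0):
    """Duration (hours) and cost of a divisible job under flat
    per-instance-hour pricing plus an optional urgency surcharge."""
    hours = total_instance_hours / instances
    cost = total_instance_hours * base_rate * priority_multiplier
    return hours, cost

# Hypothetical job: 10,000 instance-hours of work at $0.10/hour
now = run_options(10_000, instances=10_000, base_rate=0.10,
                  priority_multiplier=3.0)   # assumed NOW surcharge
later = run_options(10_000, instances=500, base_rate=0.10)
print(f"NOW: {now[0]:.0f}h for ${now[1]:.0f}; "
      f"relaxed: {later[0]:.0f}h for ${later[1]:.0f}")
```

One hour for $3,000 versus twenty hours for $1,000 - exactly the choice the comment poses.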
Comparisons with hotels and rental cars don't work.
A better comparison is "Hello, is that Avis? I represent Foo and we have a contract with you that says I can request 500 green Corollas with magenta interior for pickup at JFK at 21:00 tomorrow."
"Whaddaya mean I can't have them? We pay you to have them available to us at short notice. That's the contract. Imma get legal on ya ass"
That's the real-world comparison. Any other comparison is nuts.
"And what were they doing one minute after the 10,000 compute instances were spun down? Sitting there and idle again. That’s not very good resource management."
I call bullshit on this bit. Bad resource management is a company maintaining 1,000 physical servers themselves in order to use them once (or at some similarly infrequent rate).
Given a large, global and eclectic customer base, demand volatility will be lower, so excess capacity can be a much lower percentage.
Thank you for that.
And the Worstall response here would be to use the word "elastic" in its economic sense. Nothing is infinitely elastic nor inelastic. Everything sits on the spectrum of elasticity.
And cloud supply is obviously more elastic than dedicated supply of computer cycles.
At which point, well, yes?
"This is a scaling question.
If your hotel has 1,000,000 rooms, booking out 1000 of them at short notice is not a problem."
No, it's more like a question of safety margin.
If my supplier has capacity X, which is normally operated with a few percent safety margin, an order like yours for an extra 0.1% of X is neither here nor there, **regardless of scale**. An order for 20% of X is a problem, regardless of scale.
If the normal safety margin is typically 1% then a customer order for 2% capacity NOW is likely to be a problem unless other customers are willing to be kicked off without prejudice. Maybe that works if those customers get cheap service in return for being kicked off. Maybe.
See my essay on the way the UK grid currently operates.
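The safety-margin point, and the earlier observation that an eclectic customer base lowers demand volatility, can be illustrated with a quick Monte Carlo sketch. The demand distribution, percentile, and all the numbers are assumptions chosen only to show the shape (independent customers, so relative headroom shrinks with scale):

```python
import random

def required_margin(customers: int, mean_demand: float = 10.0,
                    sd: float = 3.0, trials: int = 500,
                    percentile: float = 0.99) -> float:
    """Monte Carlo estimate of the headroom, as a fraction of mean
    aggregate load, needed to cover the given percentile of total
    demand, assuming customers fluctuate independently."""
    random.seed(42)                    # deterministic for illustration
    totals = sorted(
        sum(max(0.0, random.gauss(mean_demand, sd)) for _ in range(customers))
        for _ in range(trials))
    peak = totals[int(percentile * trials)]
    mean_total = customers * mean_demand
    return (peak - mean_total) / mean_total

# More customers -> proportionally less headroom for the same percentile
print(f"{required_margin(100):.1%} margin for 100 customers, "
      f"{required_margin(2_500):.1%} for 2,500")
```

Which is why a 2% single-customer order can blow through a 1% margin however big the supplier is: scale shrinks the *random* fluctuations relative to capacity, not the size of one correlated demand.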
"Bad resource management is a company themselves maintaining 1000 physical servers, in order to use them once (or at some other level of infrequent)."
That really depends on the impact on the organisation of being unable to handle an unusual peak in demand. Some won't care and it won't matter if service levels are appalling occasionally. Some will care and plan accordingly, because it will matter to them and others if they fail to perform when demand is high. Some won't care even though it will matter; they're the ones that will be out of business in due course.
Take your pick.
As in other fields, cloud providers will be coming to people like you and me, saying "Can we rent part of your hard drive on an on-demand basis?"
This is already happening with such things as torrents and research projects such as SETI.
The provider would have to have comprehensive redundancy for the resource, because they have no control over the user switching off their computer while there's important data on it. In other words, the same data would be distributed among multiple users to ensure no byte of data was on one PC only. There is still a statistical risk, but presumably that would be advertised as part of the sales blurb.
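That statistical risk can be put in rough numbers: assuming hosts go offline independently, the number of replicas needed grows only logarithmically with the reliability target. The 60% availability figure for a home PC is a guess, purely for illustration:

```python
import math

def replicas_needed(host_availability: float, target_miss: float) -> int:
    """Copies needed so the chance that *every* replica's host is
    offline at once falls below target_miss, assuming hosts fail
    independently (optimistic for flaky home PCs, but it shows the
    shape of the cost)."""
    return math.ceil(math.log(target_miss)
                     / math.log(1.0 - host_availability))

# Guessed numbers: home machines online 60% of the time, and we want
# less than a 1-in-100,000 chance of the data being unreachable
print(replicas_needed(0.60, 1e-5))  # → 13
```

Thirteen copies of every byte is a steep overhead, which hints at why commercial providers keep data in their own data centres rather than on customers' disks.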
AWS Frankfurt turned me down for 2 x m4.xlarge instances today, citing insufficient capacity:
StatusMessage: We currently do not have sufficient m4.xlarge capacity in the Availability Zone you requested (eu-central-1a). Our system will be working on provisioning additional capacity. You can currently get m4.xlarge capacity by not specifying an Availability Zone in your request or choosing eu-central-1b. Launching EC2 instance failed.
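One pragmatic response to that error is exactly what the message suggests: retry in another Availability Zone. A provider-agnostic sketch; the exception class, launcher, and instance id below are hypothetical stand-ins, not any real SDK's API:

```python
class InsufficientCapacity(Exception):
    """Stand-in for a provider's capacity error (cf. the EC2
    InsufficientInstanceCapacity response quoted above)."""

def launch_with_fallback(launch, zones):
    """Try each zone in turn, moving on when capacity is exhausted.

    `launch` is whatever provider call you use; it should raise
    InsufficientCapacity when a zone can't satisfy the request."""
    for zone in zones:
        try:
            return zone, launch(zone)
        except InsufficientCapacity:
            continue
    raise InsufficientCapacity("no capacity in any requested zone")

# Toy launcher pretending eu-central-1a is full, as in the error above
def fake_launch(zone):
    if zone == "eu-central-1a":
        raise InsufficientCapacity(zone)
    return ["i-0abc123"]               # hypothetical instance id

print(launch_with_fallback(fake_launch, ["eu-central-1a", "eu-central-1b"]))
```

Of course this only works when the workload doesn't care which zone it lands in, which is the same flexibility trade-off the thread keeps circling back to.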
What this proves is that the cost of computing resources we have been charged for the last n decades is far in excess of their actual cost, making it possible for Microsoft, Amazon and Google to create vast server farms with the excess capacity needed and still make a profit.
My guess is the OP doesn't like cloud computing and wants to rail against it. In my view, the only thing exposed is a lack of knowledge of the economics of cloud computing. I don't have any better insight, but it takes only the time needed to read the post to poke holes in the OP's line of reasoning.
Not everyone will want to start 10K servers at once, or even at the same time. Nor will they want to run them indefinitely (they would run out of cash fairly quickly). Suppose the cloud platform already supports 1 million virtual servers. I run 4 instances in the AWS US-East availability zone, and I don't think I'm alone, so I think 1 million is not a fantasy number. 10K instances is 1% of that. So yes, I think a service providing 1 million instances will have many more than 10K instances in reserve.
The car rental fleets are several times the number out on hire because they know they need cover and customers want their choice to be available on demand. It's not unreasonable that the cloud platforms do the same. If there are 1 million active instances, having a spare 1 million available only seems unreasonable if the cost of providing the hardware is the same as you and I would pay. But if the cost to the cloud platforms is a fraction of that, then the economic concerns expressed in the post do not apply.
The spare capacity doesn't even have to be switched on until required, so the only consumption is space - which in many parts of the world is not at a premium, and anyway the hardware will be stacked vertically.
Finally, the number of times 10K servers (or any other number) are fired up is probably well known and the statistics analysed constantly. Wouldn't you?
So I think the post illustrates a lack of knowledge on the part of the OP rather than anything that merits genuine concern.