Thank you, I learned from it.
I got into an argument with a friend of mine a little while back. This person abhors containers and virtualization, claims not to understand why anybody should need to use them, and refuses to deploy them in production. What I took away from this encounter were some valid gripes about the design of modern operating systems and …
I do take issue with the article's implication that process isolation inside an OS is somehow weaker than when run in separate VMs. All that a hypervisor is is an application running on top of an OS. Sometimes that host OS is Windows, or Linux, or macOS. Sometimes it's something cooked up by VMware, or similar. But it's all still just software, running on hardware. You can even see this first hand in VMware. The various hacks for VMware Workstation on Windows that allow running of OS X VMs are simply turning on various GUI features to allow you to choose Apple Mac as a machine type. The hypervisor in Workstation is actually the same hypervisor they compile for OS X, ESXi, etc.
Sure, a hypervisor might be making use of specific CPU extensions to help with VM separation, but then again it can be just as buggy as any other piece of software. And use of those CPU extensions is not unique to hypervisors; AMD's latest ideas (memory space encryption) are available equally to OSes to separate processes as they are to hypervisors to separate VMs.
The only reason to think that VM separation is somehow better than process isolation is that fewer bugs have been found. Note that the bug count in hypervisors is not zero. Exhibit A: Amazon's occasional global AWS reboots when they fix bugs in Xen. Etc.
VMs are, first and foremost, simply a convenient way to manage installed OSes and applications. No one should ever assume that hypervisors are more secure. A lot of people see VMs as being more secure, but that is just a dangerous illusion that can bite one in one's arse, which may be more likely to occur if one has been lulled into a false sense of security.
Why Does Containerisation Exist in the First Place?
One of the primary drivers I've seen behind containerisation is that devs see it as a way of not having to resolve dependencies on myriad different Linux deployment platforms. Here's a Docker container, away you go; or at least that's the idea. That's quite often their first, and only, concern. Yes there may be security and runtime management benefits too, but I've yet to see a dev choose Docker specifically for those reasons.
Except of course not everyone runs Docker... Or Unikernels. Or <insert technology name here>. By the time you throw in Vagrant (yes, people use that for deploying stuff too, and it brings its own flavour of dependency hell), the anarchy is worse.
A lot of containerisation has actually come out of the fact that the fragmentation of the Linux world's package managers means that no one knows which package management system to build for. So to resolve that they're inventing more elaborate ways of packaging up stuff. But it's all bollocks. More fragmentation simply makes the issues worse.
When you look at the chaos involved in packaging up a binary for distribution in the Linux world, you stop and marvel at how well both Microsoft (eventually) and Apple have done to bring sanity (i.e. one way of doing things) to Windows and OS X. Sure, they're proprietary platforms, but installers generally work, it's generally not a problem for end users to install a binary. Linux can only look at that and weep.
There's a lot of criticism aimed at MS for how Windows goes about these things, e.g. storing multiple versions of a DLL simply because multiple installers have dropped down their own preferred version. Well, so what? Distributing something as a Docker container is no better; each container brings its own dependencies with it. Objectively speaking it's a worse end result; if two containers have exactly the same libraries, they're not de-duped (unless your storage layer does that for you).
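To put a rough number on that de-duplication point, here is a minimal Python sketch. The container names, file paths and contents are made up for illustration, and `dedup_savings` is a hypothetical helper, not part of Docker or any storage product. It hashes each file's contents and compares the bytes stored naively against the bytes a content-addressed store would keep.

```python
import hashlib

def dedup_savings(containers):
    """Return (naive_bytes, unique_bytes) for a mapping of
    container name -> {file path: file contents as bytes}."""
    naive_bytes = 0
    unique_blobs = {}  # content hash -> size in bytes
    for files in containers.values():
        for contents in files.values():
            naive_bytes += len(contents)
            digest = hashlib.sha256(contents).hexdigest()
            unique_blobs[digest] = len(contents)
    return naive_bytes, sum(unique_blobs.values())

# Hypothetical example: two containers shipping the same library.
containers = {
    "app-a": {"/usr/lib/libfoo.so": b"x" * 1000, "/app/a.bin": b"a" * 200},
    "app-b": {"/usr/lib/libfoo.so": b"x" * 1000, "/app/b.bin": b"b" * 300},
}
naive, unique = dedup_savings(containers)
```

In this toy case the shared library is counted twice in the naive total and once in the de-duplicated one, which is the gap a de-duplicating storage layer would close.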
Worse still, the near impossibility of having sensible install time dependency resolution on Linux has bred a certain laziness; more than a few devs don't even bother trying any more beyond a vague README.md. And this laziness is spilling over into ecosystems where it's actually quite easy (Windows, OS X). For instance, installing RabbitMQ properly on Windows is unreasonably hard, when it's perfectly possible to create an installer that'll do all the right things in the right order on any Windows box. Poor installation experiences guarantee that your software will never, ever enter the mainstream.
Gosh, that turned into a rant. Sorry.
“Gosh, that turned into a rant. Sorry.”
Don't worry about that. You described the operation of the veritable 'snake oil' (aka DevOps) vendor perfectly. Their products can magically solve any problem known to man at the press of a GUI button and it is so simple to use even an untrained monkey can run the whole business from a shack in the DRC at a fraction of the cost of us dinosaurs.
IMHO, in a couple of years some new 'one size fits all magic potion' will emerge and a whole load more bandwagon jumping will go on.
“All that a hypervisor is is an application running on top of an OS.”
If it’s a Type 2 Hypervisor, yes.
“The hypervisor in Workstation is actually the same hypervisor they compile for OS X, ESXi, etc.”
Workstation is a Type 2 Hypervisor, same as Parallels and VirtualBox.
ESXi is a Type 1 Hypervisor, same as Hyper-V. Yes, even Hyper-V when running “on top” of Windows is a Type 1, because the OS you see is actually running on top of Hyper-V itself when you add the role/feature.
I’ve always been a tinkerer, so VMs have always been handy to experiment with. Now that I am on a team managing a large company with SCCM and WSUS, VMs are a necessity.
Seconded - really good and informative article. Thank you.
He finds them to be kludgy Band-Aids, and they seem to offend him on some deeper ethical level.
This sounds awfully like my head of IT; he even used the same 'Band-Aid' phraseology to say that we must stop everyone using Docker containers on development machines. Then again, Apple would appear to have taken care of this with their latest MacBook Pro range - running any Docker-containerised application is like running BLAST on a 386.
It's full of mistakes. A process with a "bug" can't step easily into the memory of another. It takes a kernel bug or bad design to allow it, as it is enforced at the hardware level - Meltdown was a CPU bug coupled with bad design (kernel memory mapped into user space, and protected only by a paging bit).
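The memory separation point is easy to see for yourself. A minimal sketch in Python, assuming a Unix-like system where `os.fork` is available; `fork_and_mutate` is just an illustrative name. The child gets its own copy of the address space, so whatever it scribbles on never shows up in the parent.

```python
import os

def fork_and_mutate():
    """Fork a child that overwrites a value, and return
    (value seen by the parent afterwards, value the child reported)."""
    data = [0]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: modify its private copy and report it through the pipe.
        os.close(read_fd)
        data[0] = 99
        os.write(write_fd, str(data[0]).encode())
        os._exit(0)
    # Parent: read the child's report, then check our own copy.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data[0], child_value

parent_value, child_value = fork_and_mutate()
```

The only way the child's value reaches the parent here is through the pipe, i.e. through a channel the kernel explicitly provides.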
Two processes can't write to the same file if one opened it with the proper locks, again, unless there are huge bugs in the kernel.
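As a rough illustration of the locking point, here is a sketch using `fcntl.flock` from Python's standard library. It assumes Linux or another Unix, and `second_lock_is_refused` is a made-up helper name. One caveat worth stating loudly: `flock` locks are advisory, so they only hold back writers that also ask for the lock.

```python
import fcntl
import os
import tempfile

def second_lock_is_refused():
    """Take an exclusive lock, then try to take it again through a second,
    independent open of the same file. Returns True if the second attempt
    is refused."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as first, open(path, "w") as second:
            fcntl.flock(first, fcntl.LOCK_EX)
            try:
                # LOCK_NB makes the call fail instead of blocking forever.
                fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                return True
            return False
    finally:
        os.remove(path)

refused = second_lock_is_refused()
```

The two opens stand in for two cooperating processes; the second one is told to go away for as long as the first holds the lock.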
Actually, an OS on Intel architecture could be written to be much more secure and isolate processes much better, but the price would be less performance, and loss of backward compatibility for software. AMD removed segmentation in 64-bit mode, but that was a way to partition and control access to memory in a much more secure way.
Software virtualization or containers on Intel don't actually change anything - they are still software layers, and bugs can open holes into them (i.e. https://www.cvedetails.com/cve/CVE-2016-8867/, https://www.theregister.co.uk/2017/09/15/vmware_svga_driver_critical_bug/).
VMware, Docker, etc. have been built to solve the issues introduced by OS designs, but on hardware not designed to run more than one OS at the same time, even if some support has been retrofitted.
Sure, they help somewhat, especially to isolate users and their processes from each other, and control resource allocations - but they are not proper security boundaries. They are useful, but they don't solve everything. Sometimes, physical separation is exactly what you need.
Know your tools, and use them in the proper way. Just, don't believe in magic.
Yay for commentard understanding the difference between complete electrical isolation and virtualization in terms of security/isolation separation.
Virtualisation and containers are great, but nothing beats electrical separation in terms of delivering complete isolation.
Meanwhile people rush into virtualising their entire infrastructure onto a single device as instances on a common backplane (NFV and friends) without understanding the difference or the implications a compromise might have, and without planning for one happening.
Virtualisation exists because we're trying to build a mainframe out of jumped-up PCs. We waste inordinate effort in solving problems of our own making. 90% or more of modern cloud apps would run more than fine on a nice modular mainframe environment (with containers if necessary)... Time to revisit the drawing board?
Mainframes are very nice machines, I enjoyed working with them.
The main problem I have with them is oomph for the buck. Processing cost in them is way too high, and while some of that goes into the reliable architecture, most of it is simply the margin.
Also, they sell relatively few of them, so research and development costs are quite high per unit.
Had IBM decided to sell at a reasonable price, personal computers would not have exploded in popularity, and therefore Intel and friends would not be a big thing now.
The last mainframe I managed was a 370, so no Z for me.
As for dockers.. I do run BLAST on them, with no problems and very little speed reduction.
Maybe the thing is using Xeon processors..
Also, the latest patches for Intel processors DO slow BLAST a lot, and worse if inside Docker.. I use an old i5 at work and yes, BLAST is slower..
Note: most big Internet companies run stupid Intel boxes with smart software managing them. They don't run expensive mainframes.. and the big companies that do use mainframes are suffering when they try to compete with the agile companies that don't use them.
I can't remember the last time that someone actually went through and posed very difficult questions like this. All too often we skirt around these issues and talk about security and networking technologies as products which everyone must have because there is no other way around it.
Very thought provoking, thank you Trevor.
Another very good reason for VM use is to get round the lack of drivers for ancient OS to match modern hardware. In many cases you can get "immortal" hardware as the VM sees little if any changes to the machine upon which it runs.
Of course one would not want to use an out-of-date OS, but in the real world you may well have some very expensive / difficult to replace software that works just fine on W2K, for example, and would be way too much cost/trouble to replace and run on a current OS.
> ... proper binary prefixes: kilo being 1,000, kibi being 1,024; mega being 1,000, mebi being 1,024, and so on.
Those are not proper binary prefixes. They're a magical invention by sales and marketing droids who want their ram/storage sizes to appear bigger than they really are. And they've been pushing it for years, thankfully with mostly only newbies and the general clueless thinking they're legit.
Please don't spread that false crap further, as if they're terms our industry should use.
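Whatever one thinks of the names, the arithmetic behind the two conventions is easy to check. A minimal Python sketch; the unit tables and `to_unit` are illustrative names, not any standard library API.

```python
# Decimal (SI) and binary (IEC) prefixes as multipliers in bytes.
SI = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
IEC = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

def to_unit(num_bytes, unit):
    """Convert a byte count to the given SI or IEC unit."""
    return num_bytes / {**SI, **IEC}[unit]

# A drive sold as "1 TB" (decimal) expressed in binary gibibytes.
one_tb_in_gib = to_unit(1 * SI["TB"], "GiB")
```

The gap grows with each step up: about 2.4% between kilo and kibi, and about 7% by the time you compare a decimal gigabyte with a gibibyte.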
Because I cheaped out on my NUC, with only 4GB of RAM, I found that the applications I wanted to run became a bit of a resource hog, swallowing up pretty much all the RAM. In fact, with so little RAM it wasn't possible to run those applications and Docker side by side.
So I went at it another way: I ran Docker, and ran the applications inside Docker. It's currently running rather nicely. So there you go, a use case for containers: being able to run with limited resources.
I don't recognize your categorization of "Unix die-hards" being proponents of real time computing.
UNIX was made multitasking almost from the beginning in order to allow several people to share what was an expensive and scarce resource. UNIX was not then, and never has been, a proper 'real-time' operating system like DEC's RT-11 or RSX-11 (note, there have been real-time extensions, like AT&T UNIX RTR, but they are not really mainstream).
In fact, completely counter to what you said, the movers and shakers of UNIX (Dennis, Ken, Doug and Joe - although Brian was less involved) were involved in various degrees with Multics, with all of them taking an active role in that project. Multics was multi-user and multi-tasking, and the desire when creating UNIX was to preserve many of the good things in Multics, on much smaller and less costly systems than Multics needed.
So as a result, UNIX was written, pretty much from the ground up, as a multi-user and multi-tasking system.
In my view, if IBM had chosen a cut-down OS based on UNIX rather than what Microsoft provided, the whole computing world would have been better. As it was, proper multi-tasking did not appear on desktop-class machines for many years, and Windows was only dragged into the multi-user world very late indeed.
I take the point made in the article that the poor implementation of many computer OSs and applications does not provide sufficient isolation between applications, but a properly designed OS with the correct resource fences (for CPU, memory and IO) should really do everything that is currently being done by a type 2 hypervisor. Basic UNIX has always provided process and memory separation, and AT&T derived UNIXes had a 'fair share scheduler' back in the 1980s to enforce CPU limits, and AIX has had Work Load Manager (WLM) since AIX 4.3.3, which is used for WPARs (Workload Partitions - much like Solaris Containers) for limiting CPU, memory and I/O resource use.
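For a flavour of what an OS-level resource fence looks like in practice, here is a minimal sketch using plain POSIX rlimits from Python. It assumes Linux, where `RLIMIT_AS` is enforced, and it is a far cruder fence than WLM or a fair share scheduler; `run_with_memory_cap` and the workload are made up for illustration.

```python
import resource
import subprocess
import sys

def run_with_memory_cap(code, max_bytes):
    """Run a Python snippet in a child process whose address space is
    capped at max_bytes via RLIMIT_AS. Returns the CompletedProcess."""
    def apply_limit():
        # Runs in the child just before exec; the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )

# Hypothetical workload: try to allocate 2 GiB under a 1 GiB cap.
result = run_with_memory_cap("bytearray(2 * 1024**3)", 1024**3)
```

The kernel refuses the allocation and the child dies with a MemoryError, while the parent and everything else on the box carry on untouched.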
A proper OS should enforce memory separation (UNIX has since it was re-written on the PDP-11), although the current Meltdown has shown that Linux (note, Linux is not UNIX) has taken some (in hindsight, and IMHO) poorly thought out efficiency shortcuts (like mapping most of the kernel memory space into each process). UNIX never did this, at least not on the PDP-11, s370, VAX, Sun Motorola and SPARC platforms that I know most about.
It would be interesting to look at Intel UNIX ports like Sun/OS i386, AIX PS/2 (damn, I should know this for this platform), Xenix/386, Interactive UNIX, Microport UNIX and UNIXware to see whether those platforms properly separated the kernel address space from user-land.
I am aware that prior to Edition 4, UNIX on the PDP-7 was one user at a time, but it had the concept of multiple users, albeit only one at a time, from earlier than that.
Bearing in mind that in the beginning, it was a side-of-the-desk project, borrowing a system that did not belong to them, it is not surprising that it took a short while to become fully multitasking with multiple concurrent users.
The early '70s was before my (computing) experience. I first used UNIX Version (Edition) 6 at Durham University in England in October 1978 (Yay, 40th anniversary of first using UNIX coming up), although I had used ICL George 3 or 4 and TENEX as a guest a few months earlier, and MTS at the same time as UNIX, but shared access computers were a real rarity at the time, especially outside of universities and other research establishments.
UNIX must have been quite the breakthrough for those who came across it at the time.
I think that the article (and most of the comments) could be summed up in the old direction advice: "the best way to get to over there is not to start from over here", yet here we are, and as well as getting to there we have to balance competing (and contradictory) goals, resource constraints, legacy investments and organisational politics. Take three axes of success and you can pick (at most) two of them. Achieving at least one is a better-than-average day in my book. We need both the single track, mission-poster nutters as well as the pragmatists to move forward in balance, as there is no right answer and no point pretending that there is.
This sort of diverse, opinionated, unpredictable, irreverent but mostly constructive and respectful discussion is what I have always liked about El Reg.
In a shrinking world where the available software and its sources are shrinking, it is easy for some software, o/s or driver to simply refuse to be corralled by a VM or container. For some software we can say gotcha by running another full o/s implementation with the miscreant within, but that just leaves us & our system feeling bloated and sluggish. We are then forced to find an alternative, often with less functionality, do without, or let it have its way.
There's one area where VMs are particularly useful - testing.
Being able to spin up an environment with specific software versions, do some testing, and then revert it back to a snapshot, all without cluttering my desk with machines is invaluable.
I can't see a way to do that in hardware without a whole desk full of machines, and re-imaging them after each test.
Virtual machines and containers bring with them a whole host of their own unique problems, just as running directly on hardware brings its own unique problems.
I don't think that any stance that says you should always or never use virtualization is valid. It's a tool, and like all tools, there are appropriate and inappropriate uses for it.
The only thing that really concerns me about virtualization these days is the number of people who view it as some sort of panacea, and believe that if they're running something in a VM, then that means there's no way that misbehaving software in that VM can affect the rest of the system.
That simply isn't true, and brings to mind the truism in security circles that you are the most vulnerable the moment you believe that you are safe.
Well, I have seen type 1 hypervisors used for just that purpose.
I'm not suggesting that I agree, as you have to trust the vendor that the isolation is complete enough, but in at least three places I have worked, systems in the environment including security components have been virtualized onto the same hardware for separation.
And that is the worry, especially as Linux is often used as the basis of the hypervisor in many installations, and the various forms of Spectre, including Meltdown, have been shown to affect Linux.
I assume the article author means "docker" style containers (app containers) when referencing containers, though I'm not certain.
I deployed my first VMware in production, on what was then called VMware GSX Server I believe (later renamed, I think, to VMware Server??), back in 2004. It was a last-minute idea I had. The situation was that the company had to roll back a major production software upgrade, which included support for a new customer that was going to launch in production in a few days (they were already advertising their service on TV at the time, I saw). The company was really upset, but there was no way to go forward: the code base and data set were shared with other, bigger customers that had critical bugs and could not go forward.
So after an all-nighter I was in the CEO's office along with some other senior folks. I suggested that we could take one of the QA VMware hosts, reconfigure it, put it in production and use that. Traffic to the customer was expected to be tiny. Production ops had no spare hardware able to run VMware (mainly because of memory requirements; our standard at the time was about 4GB/system). Even though we were used to ordering HP ProLiant servers UPS Red (overnight), this was Friday morning and there was no way we could get new hardware deployed and in production by Monday morning.
So I led the effort to build the new customer a new application environment, which consisted of a single Apache front end VM, a Tomcat web server VM behind it, a WebLogic app server VM behind that, and an Oracle server VM behind that. All running on a 2U Dell something with 2 CPU cores and 16GB of RAM with 6x3.5" disks, I think 10 or 15k RPM, I don't remember. It had 3 network connections to the 3 levels of networking we had at the time. It took about 30 hours of configuration work but we got it working and the customer launched on time. The first day they took 10x the traffic we were expecting and we had to augment the setup with a 2nd web server. That VMware box ran in production for probably 1 year before they had migrated everything to bare metal.
The biggest use case for hypervisors to me at this point is that hardware is just so powerful now that it is difficult to leverage the power of a single system that can have dozens of cores and tons of memory. Most of the 1,000 VMs I have sit at very low utilization levels most of the time. Trying to shoehorn dozens of applications in different environments onto a single OS image is, well, too difficult. Add to that network complexity: my VM hosts have a dozen or more VLANs assigned to each host, making it easy to select where to put a VM network-wise.
With containers I have only used LXC OS-level containers where they run all of the same services as a VM with exception of the kernel/drivers. You can login to them (Ubuntu in our case) and do whatever like a normal system. The biggest benefits here that still hold true today for our app stacks is massive CPU scheduler improvements for our stateless applications. There is some "risk" of a container/app overrunning the others WRT to CPU but in reality that has never come close to happening in the past 3 years(a good chunk of that time the servers never could go above maybe ~30% cpu usage due to app code bottlenecks that were resolved later). So the data says otherwise at least for our workloads (it would make less sense if you were doing this with workloads you do not have control over(e.g. service provider), we control/monitor everything end to end). Result is we haven't had to adjust the CPU levels of our main app servers in 3 years. Most of the time they sit around 5% usage, ultra high usage have seen as high as 70-80% but that is very rare. Response times improved a lot taking the hypervisor out, and giving the systems more cpus to spread their threads across. I don't have to worry about CPU contention on these systems since it is all 1 kernel. It's worked better than anyone expected. The original LXC server hardware paid for itself almost immediately simply in licensing savings. Operational savings not having to worry about scalability for the following years despite ever increasing traffic was just piled on top.
Application containers are something I am, myself, disgusted with. Not the containers themselves, but more how sloppy development has gotten over the past decade to bring up a need for such a technology in the first place. I thought Ruby on Rails was bad back in 2007 when I was first exposed to it. To my horror, not only have they not fixed that problem but it's gotten exponentially worse with things like Node.js. By comparison, my current org ran the same PHP stack for about 4 years other than security updates. The past 2 years have had 1 upgrade (pretty minor, from x.x to x.y) and a few security updates. Node seems to get that on at least a monthly if not more frequent basis (with the modules and stuff).
VMs came about in my opinion more to provide an abstraction layer to the hardware, mobility of VMs between servers, or between storage systems. Also as servers got more powerful it became ever more difficult if not impossible to leverage that power with a single application running (My dual socket/384GB vmware servers run anywhere from 30 to 80 VMs/each and are far from fully loaded memory wise, I am very conscious of vCPU allocations). And before you say oh everything is highly available, that is a load of shit. Tons of things are single points of failure, especially in dev/test environments, I probably have 300 VMs that each by themselves are single points of failure (none are critical to production). We haven't "lost" a VM in the 5 years this infrastructure has been running (we were lucky to lose only 1 VM/month when in public cloud). If a host fails the VM is restarted on another host within a couple of minutes, so there is no need to build redundancy at that level it's just a waste of resources. There are a few production single VM single points of failure due to application design, though I protect them with VMware Fault tolerance (they are single CPU VMs), to-date haven't had to have fault tolerance kick in though.
App containers are a totally different use case, they certainly have their advantages but to me anyway they solve a problem that should not exist in the first place. That said there are people at the org I am at that really like containers and want to use them and we probably will use app containers for some things in the future. My boss's boss fought me hard deploying LXC for our newer application stack saying he wanted everything in vmware. I warned him over and over again given the nature of the application they needed to do serious ongoing performance testing to make sure the system could scale correctly. Well after 1 year that never happened and they were constantly worried about scalability. Well that boss is gone now, and one benefit from app containers may be the new app stack gets deployed to bare metal to leverage improved cpu scheduling and scalability.
While I run containers on bare metal, my use case wouldn't really apply if I was running LXC within a VM on a hypervisor, so I don't do that. The only LXC containers we have run on bare metal.
I will second your comments about Type 1 hypervisors (ESX/ESXi)- When I came into the network group at [RedactedCo] in 2009, we had just started to virtualize most of our applications- prior to that, we were buying 1U rackmounts for each application, which was.. not an efficient use of resources.
As far as machines being massively over powered? Spot on- On average, one of our 'production' hosts runs nearly 40 VMs, and sits at around 40% CPU utilization. Memory is about the same, but we performed a massive upgrade in that respect in order to get enough slack to be able to tolerate a host failure in the cluster.
As far as containers? I don't think any of our vendors knows what they are, let alone use them. :)
One of the things that I ran into last week was an ill-behaved VM appliance not just crashing, but causing the host to spontaneously reset at the same time (as if someone had either pulled the power plug out or pressed the hard reset button on the machine). I've seen VMs misbehave, but this is the first time I've seen one take out the host in such a dramatic manner. (I've also seen the hardware cause ESXi to purple screen as well, but that's not entirely unheard of.)
Fortunately, this was not in our production environment, but in our test lab arena, so the only problems it caused were with the other apps we have in that environment. *shrugs* And I was able to find a workaround (i.e., re-deploy the appliance to different hardware), so it wasn't all that bad.
I spend a fair bit of my time these days as an architect for healthcare and bio projects. These aren't greenfield sites - anything but. They have software that can be fifteen years old, designed for operating systems that aren't supported any more. Applications often assume they're interactive and only one instance will ever run on the system at one time. This gets quite entertaining when you're trying to do bulk analysis of (say) 5,000 multi-gigabyte data files in the shortest possible time. But the applications are known to produce "correct" results and nobody wants to fiddle with them, so we have to find a way of running lots of them in parallel on modern tin.
Oh, and these are generally Windows applications, so VMs are far more useful than containers - though if anyone can point me at a system that can emulate Windows XP with Docker-speed process starts and low memory overhead, I'll be very interested!
Time is expensive. Storage is cheap (assuming the flash fabs aren’t flooded out again...)
Containers et al are probably a kludge, but it works. I don’t see a better alternative on the horizon at the moment. I think most organisations would prefer to buy a few extra terabytes of storage than spend extra time standing up/fixing apps. Dependency hell is the worst kind of hell. Containers can help to mitigate that.
Containers and VMs are also much more portable than running directly on iron. That means you can get more flexibility out of your hardware if you design things appropriately. Migrating VMs and containers around (or between) data centres tends to be a simpler affair than moving servers.
Biting the hand that feeds IT © 1998–2019