Containers are the hot topic at many, if not most, of the IT conferences I have attended recently. Containerization has definitely invaded the territory of virtualization, with many saying that containerization wins the beat-down. Does the industry even need VMs anymore?
We decided to offer our take in an 8-part blog series that will examine the advantages of containerization over virtualization. The battle royale will include:
- Image size and overhead
- Start time and setup orchestration
- Image formats
- Application sharing
- Image version control
- Package distribution
- Device access
Our “wrestling match” is actually a friendly contest between Taras Shapovalov, who is on the container team, and Piotr Wachowicz, who counters with some tricky moves outlining VM advantages. Let’s get ready to rumble:
Round 1 – Image Size and Overhead
Taras – Let’s start with image size. A container image is usually pretty light, because it does not need to include a whole operating system. An image can even be smaller than a megabyte and work perfectly well containing only the application and some configuration. Such a small size really saves on image transfer time and also saves space on the host's filesystem. A small image also lets you easily send it to someone else over the network – or even by email.
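As an illustration, a sub-megabyte image can be described in a few lines. Here is a hypothetical Dockerfile for a statically linked binary (the file names are made up for the example), starting from the empty `scratch` base so no OS userland is included at all:

```dockerfile
# Start from the empty image -- no operating system files whatsoever
FROM scratch

# Copy in only the statically linked application and its configuration
COPY hello /hello
COPY hello.conf /etc/hello.conf

# Run the application directly; there is no shell in the image
ENTRYPOINT ["/hello"]
```

The resulting image is essentially just the size of the binary plus the config file, which is what makes emailing a container image plausible in a way it never was for a VM image.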
Image courtesy of stackoverflow.com
By contrast, a VM image needs to contain a copy of a whole operating system: the kernel, all system libraries, system configuration files, all the directories the operating system requires, and all the system and administrator utilities. A copy of almost all of these files has to be present on the host for each running VM. And it is hard to send a huge image to someone else, because you first need to find somewhere to upload it (for example, an FTP server) and then send instructions on how to download it afterwards.
Piotr – Not so fast. With VMs and a proper storage system, you only have to store one, or maybe a few, full VM images, along with differential copy-on-write copies of those images containing the incremental changes. For example, in OpenStack you could use volumes managed by Cinder for that, with Ceph as the storage backend.
Taras – Right, copy-on-write technology does save some space, reducing the container advantage to some degree.
Next, let’s move on to overhead. For high-performance applications that do intensive I/O or make many different system calls, it is important to keep overhead as low as possible. This is critical in HPC, but must also be taken into consideration for other types of applications.
Taras – Because a container (at least on Linux) is basically a cgroup (control group) plus a set of specially configured namespaces, containerization itself adds essentially no overhead compared to bare-metal processes. However, it is true that there is some overhead on networking operations when containers use a dedicated virtualized network. On the other hand, when the host shares its network interfaces with containers, there should be little or no overhead.
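You can see this directly from a shell on any Linux host: every process already belongs to a cgroup and to one namespace of each kind, so "containerizing" a process is kernel bookkeeping rather than an extra software layer. The paths below are standard Linux `/proc` entries:

```shell
# Every process is already assigned to a cgroup hierarchy
cat /proc/self/cgroup

# ...and already lives inside one namespace of each kind (mnt, pid,
# net, uts, ipc, user, ...); a container is just a process placed
# into its own private set of these
ls /proc/self/ns
```

There is no hypervisor in that picture: syscalls from a containerized process hit the host kernel directly, which is why the per-call overhead is essentially that of a normal process.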
With virtual machines, it’s a totally different story. When a process running inside a VM performs an I/O call, or any other operation that involves a system call, that call goes through a layer of virtualization, which always adds some overhead. Of course, modern hypervisors on modern hardware reduce that overhead, but research I’ve seen suggests that VMs still carry meaningfully more overhead than containers.
I also want to point out that each VM consumes some amount of memory, which is another type of overhead. The VM consumes memory even if it isn’t running any user processes. This kind of overhead limits how many VMs you can run on the same host at the same time. You can always start more containers using the same amount of memory.
Piotr – The CPU overhead of a VM can actually be smaller than you might think. It is not uncommon to get down to as little as 2 percent CPU overhead for VMs. For most software, you gain more by taking the time saved by deploying in a virtualized environment, instead of on bare metal or in containers, and investing that saved time in optimizing the few time-critical for-loops in your code.
In other words, CPU virtualization is relatively easy. Virtualizing disk (block) and network I/O is a different story. Here we have to discuss two separate things: contention caused by other tenants on the same hypervisor, and the overhead of virtualization itself.
Overall, I maintain that VMs are not as expensive as one might think. Not as efficient as containers, but not far off. VM overhead can be reduced dramatically when you know exactly how to do it (and when you use the right tools). We’ll go into this more in the next “round.”