Containerization vs. Virtualization – Image Version Control

Our blog smackdown between containerization and virtualization is moving along nicely. We have even extended it by two more “rounds” because our teammates have so much more to say! Our next post is about image version control.

For new readers, we are in the middle of a “wrestling match” – a friendly contest between Taras Shapovalov (team container) and Piotr Wachowicz (team VM). The battle royale includes:

  1. Image size and overhead (part 1)
  2. Overhead (part 2)
  3. Start time and setup orchestration
  4. Image formats
  5. Application sharing
  6. Image version control
  7. Package distribution
  8. Device access

    Round 6 begins – Image Version Control

    Taras – Sometimes it is really important to create several versions of the same application environment. If you need to set up several versions of the GNU Compiler Collection (GCC) and easily switch between them, you can use, for instance, module files or Lmod (a Lua-based environment module system). With container images, it is much easier to support several versions of the same application.
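
    To make that concrete, here is a minimal sketch using the Docker SDK for Python (docker-py). The gcc:11 and gcc:12 tags refer to the official GCC images on Docker Hub; the exact tags available may differ:

        import docker  # pip install docker

        client = docker.from_env()

        # Each compiler version is just a differently tagged image; "switching"
        # versions simply means running a different tag, no module files needed.
        for tag in ("gcc:11", "gcc:12"):
            output = client.containers.run(tag, "gcc --version", remove=True)
            print(output.decode().splitlines()[0])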

    HPC users can pack an entire job environment inside a container. When they want to create a new version of the containerized job, they can simply add a new layer to the image. Or they can run the container locally (say, on their desktop), modify the filesystem from inside the container, and then create a new snapshot; this snapshot becomes a new version of the image and can be used to run new containers. This way, a user can build a hierarchy of images that all contain the same application but, for example, use different third-party libraries. Once the user has tested all of the created images and found the best one (say, from a performance perspective), they can remove the others.
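
    As a rough sketch of that snapshot workflow, again with the Docker SDK for Python (the image name my-hpc-job and the pip command below are hypothetical stand-ins for a real job environment and modification):

        import docker  # pip install docker

        client = docker.from_env()

        # Modify the filesystem from inside a running container, e.g. by
        # installing a different third-party library (hypothetical example).
        container = client.containers.run(
            "my-hpc-job:v1",              # hypothetical base image
            "pip install numpy==1.26",    # hypothetical modification
            detach=True,
        )
        container.wait()

        # Snapshot the result as a new image version; only the changed layer
        # is stored on top of the layers shared with v1.
        container.commit(repository="my-hpc-job", tag="v2")
        container.remove()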

    Let’s compare that with snapshotting modified virtual machine images. The sheer size of a typical VM image makes it difficult to create as many snapshots as most users would likely need. It’s also more difficult to add a new layer (containing, for example, new files) to an existing VM image.

    We can even compare the creation of new versions of container images and VM images to the way branches are created in Subversion (SVN) and Git. While branch creation is possible and well documented in both, Git branches are pretty lightweight: they can be created locally and are easy to manage, modify, or destroy. By contrast, SVN branches are fairly heavyweight and can be harder to work with.
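
    For readers who want to see just how cheap a Git branch is, here is a tiny sketch with the GitPython library (the repository path is a hypothetical placeholder):

        from git import Repo  # pip install GitPython

        repo = Repo("/path/to/repo")             # hypothetical local repository
        branch = repo.create_head("experiment")  # a branch is just a cheap local ref
        branch.checkout()
        # ...experiment, commit, test...
        repo.delete_head(branch, force=True)     # and just as cheap to throw away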

    Piotr – I like your analogy with SVN and Git branches. But managing VM image revisions can be pretty easy – if you have your cloud set up “the right way.” Say your OpenStack cloud is running on top of a storage system like Ceph, which provides support for copy-on-write storage. Then, creating copies of snapshots and volumes is pretty cheap. With copy-on-write, only the actual differences between revision A and revision B are stored. This is not only space-efficient, but also relatively fast, even when working with large volumes containing large datasets.
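
    To illustrate what that looks like in practice, here is a minimal sketch using the python-rbd bindings that ship with Ceph, assuming format-2 RBD images; the pool name volumes and the image names are hypothetical:

        import rados  # python-rados / python-rbd, shipped with Ceph
        import rbd

        cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
        cluster.connect()
        ioctx = cluster.open_ioctx("volumes")  # hypothetical pool name

        # Snapshot revision A; clones require the snapshot to be protected.
        with rbd.Image(ioctx, "vm-image-a") as image:
            image.create_snap("rev-a")
            image.protect_snap("rev-a")

        # Create a copy-on-write clone: revision B initially shares all of
        # its data with revision A, and only the differences get stored.
        rbd.RBD().clone(ioctx, "vm-image-a", "rev-a", ioctx, "vm-image-b")

        ioctx.close()
        cluster.shutdown()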

    Taras – OK, that’s true, but it means Ceph becomes a requirement for all such OpenStack deployments – and installing and managing Ceph is not a trivial matter! It also calls for an experienced administrator.

    Piotr – I agree that deploying Ceph can be non-trivial, but Bright OpenStack comes with Ceph integration built in, which means it’s not only easy to deploy Ceph, but also to manage that deployment throughout its lifetime. Actually, the only tricky part about deploying Ceph is selecting the hardware components that will give you the biggest bang for your buck. However, as long as you stick to some well-established rules of thumb, you should be good.

    (Editor’s note: The rules of thumb Piotr is referring to are: put OSD journals on SSDs; use one journal SSD per 4-5 HDDs; use SSDs that can sustain heavy write workloads over their lifetime; use at least a 10 GigE fabric; and use a dedicated fabric for Ceph data replication.)

    Taras – Agreed.

    Join us again soon for Round 7 – a look at package distribution.