In recent years, docker has created a containerization boom around the world by providing a way to easily create and run application containers. Containers save people from dependency hell by packaging software together with the operating environment it needs. Although docker was designed neither as an operating-system container nor as an operating system running directly on bare metal, its powerful suite of tools can also be tremendously convenient for managing a desktop system that runs on bare metal.
Why is using a docker image as a desktop system a good idea? Let's begin with the inconvenience of the way people normally manage their desktop systems. Nowadays, most of us have more than one computer, and we want these computers to be “consistent”. By “consistent” I mean, for example, that if I begin writing a document on one computer (say, at home) and cannot finish it before having to switch to another (say, at work), I don't want to worry about copying it over manually; I want it to magically appear there so I can access it at any time. This is exactly what cloud sync services like Dropbox do for us. For geeks, however, what cloud sync services do is far from enough. Suppose you are busy with a project that uses a number of programming languages, libraries, and a bunch of GUI and non-GUI tools. As you keep trying new things, you continually install new tools and change configurations on your system. It would be nice if these changes could be synced across your devices automatically, so that you don't have to repeat every installation on each of your computers.
Readers unfamiliar with docker can check out the official tutorial here. Docker is easy to use: we start by writing a Dockerfile containing commands that install and configure the libraries and tools we want. The Dockerfile example below gives a quick idea of what one looks like:
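A minimal sketch of such a Dockerfile (the base image and package names here are only illustrative):

```dockerfile
# Start from an official Arch Linux base image (illustrative choice)
FROM archlinux:latest

# Install the tools a hypothetical project needs
RUN pacman -Syu --noconfirm git python gcc

# Add a configuration file and set a working directory
COPY vimrc /etc/vimrc
WORKDIR /workspace
CMD ["bash"]
```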
With a Dockerfile, a docker image can be created with a single `docker build` command. The Docker company offers a service called DockerHub that hosts public images for free. Use `docker push` to upload an image to DockerHub; to get the latest version of your image on a different computer, a single `docker pull` is enough. DockerHub also supports auto-build: by connecting your DockerHub account with your GitHub account, you can have DockerHub automatically rebuild the image whenever the Dockerfile on GitHub changes.
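The whole cycle can be sketched with these commands (the account and image names are placeholders):

```shell
# Build an image from the Dockerfile in the current directory
docker build -t myaccount/myimage .

# Upload it to DockerHub (requires a prior `docker login`)
docker push myaccount/myimage

# On another computer, fetch the latest version
docker pull myaccount/myimage
```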
Docker actually gives us a solution to the consistency problem mentioned at the beginning of this article: we create a docker image that has everything we need for our project. Then everything related to the project, such as development, testing, and deployment, can be done by first starting a container with `docker run` and doing all the work inside. When we want to install something new or change some settings, we simply make the appropriate changes in the Dockerfile, rebuild the image, and update it with `docker pull` on every machine that uses it. This workflow provides a very elegant solution to the consistency problem through a centralized repository. The only drawback is that not all programs can run in a container, and not all of them are easy to run there: GUI programs and system-level programs in particular can be a lot of hassle. To overcome this drawback, it is natural to ask: can we mount a docker image directly as the root directory at boot, so that we can run the image on the bare machine and use it as our daily desktop system?
There are other benefits to this approach besides the convenience of maintaining consistency:
- The entire system is stored in the cloud; the local content is only a cache of it, so there is no need to back up the system regularly.
- How your system is configured, from scratch to exactly the way you want it, is written down clearly in the Dockerfile. The Dockerfile thus becomes your best note.
- No need to worry about junk files or data corruption after using the system for a long time, because every time you turn your computer on, you are using a brand-new system.
- When you get a new machine, you don’t have to install the operating system from scratch: just pull the image from DockerHub.
- A system update is nothing but rebuilding the image from the latest software repositories. This needs less human intervention than a normal system upgrade, which sometimes runs into file conflicts or dependency problems.
There is an introduction to docker's storage drivers on the official site, so we only discuss them briefly here. Docker uses the concept of layers. When building an image, docker executes the Dockerfile line by line, creating a new layer for each line; only the diff is stored in each layer. When we do a `docker pull` or `docker push`, docker actually downloads or uploads the deltas between layers instead of whole layers. Whenever we do a `docker run`, docker stacks the downloaded deltas together to get a complete image. A new read-write layer (we will call it the rwlayer below) is also created on top, so that all writes to the container go to the rwlayer and the image itself stays read-only. The concept of a “layer” is implemented differently depending on the file system where the docker directory (usually `/var/lib/docker`) resides. These implementations are called graph drivers. Built-in graph drivers include aufs, overlay, btrfs, zfs, devicemapper and so on. Most graph drivers use copy-on-write techniques, so creating a new layer does not require copying the data; the actual copy takes place only when a write happens.
As the author only uses btrfs, this article will focus on it. Btrfs is a copy-on-write file system. To exploit this, whenever docker stacks a new layer, it creates a snapshot of the subvolume of the layer below and writes the diff into the new snapshot. When creating a container from an image, docker creates a snapshot of the top layer's subvolume and uses it as the rwlayer.
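Under the btrfs driver, the layers live as subvolumes inside the docker directory, and the stacking can be mimicked by hand roughly as follows (the paths are illustrative, and the commands require an actual btrfs mount):

```shell
# List the per-layer subvolumes docker created (exact path may vary by version)
sudo btrfs subvolume list /var/lib/docker/btrfs/subvolumes

# What docker does conceptually when stacking a layer or creating a container:
# snapshot the parent layer, then write the changes into the snapshot
sudo btrfs subvolume snapshot /path/to/parent_layer /path/to/new_layer
```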
Besides knowing how docker's storage drivers work, we also need to understand the Linux startup process to achieve our goal. During boot, the boot manager loads the kernel and a ramdisk called the initramfs into memory. After some very basic initialization, the kernel extracts the initramfs to the root directory `/` and starts the init program inside it (usually `/init`). This init program does some further initialization (loading file system drivers, running fsck, etc.), then mounts the real root directory based on kernel options such as `root` and `rootflags`, and uses the `switch_root` program to switch `/` from the initramfs to the newly mounted root. The real init program is then started by `switch_root` to perform the final initialization: mounting the entries in fstab, loading the graphical interface, and so on. Many distributions provide tools to generate the initramfs. These tools are often modular and allow users to add their own hooks; Arch Linux's tool is called mkinitcpio.
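In shell-like pseudocode, the relevant part of the initramfs init amounts to something like this (a simplified sketch, not the actual code of any distribution):

```shell
# Simplified sketch of the hand-off performed by the initramfs init
mount -o "$rootflags" "$root" /new_root   # mount the real root per kernel options
exec switch_root /new_root /sbin/init     # make /new_root the new / and run init
```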
Now we have all the knowledge needed to boot into a docker image, and here is the idea: we add a hook to the initramfs which, before the new root is mounted, creates a snapshot of the desired image's subvolume in docker's local cache to serve as the rwlayer. The real root is set to this rwlayer, which is then mounted and `switch_root`ed into by the init in the initramfs. Concretely, in the boot manager's config file, the kernel option `root` should point to the partition where your `/var/lib/docker` is located, and `rootflags` must contain `subvol=XXXXX`, where XXXXX is the location, relative to the root of that partition, of the rwlayer we plan to create. The most important piece of work is the hook itself, which must find the btrfs subvolume of the desired docker image and create a snapshot of it named XXXXX (the same name as in the kernel option). If you are using Arch Linux like the author, all of this work has already been done, and you can use the author's hook from GitHub directly.
The code is located at: https://github.com/zasdfgbnm/mkinitcpio-docker-hooks . Alternatively, readers can install `mkinitcpio-docker-hooks` directly from the AUR. A step-by-step tutorial for this hook is given in the following section.
The usage of mkinitcpio-docker-hooks is roughly divided into the following steps:
- Make sure your `/var/lib/docker` is on a btrfs partition
- Prepare a docker image suitable for booting on bare metal
- Install and configure mkinitcpio-docker-hooks in this docker image
- Prepare the kernel and initramfs
- Prepare the top layer content
- Setup boot manager
To run a docker image on bare metal, you first need a docker image that is suitable for doing so. Many docker images, in order to reduce their size, do not come with software packages that are only useful when running on bare metal (such as dhcpcd). So we need to install these packages manually in the Dockerfile. For Arch Linux, this can be done simply by installing the `base` group.
Since the next step is to install mkinitcpio-docker-hooks, it is recommended to base your docker image on an image that comes with yaourt. Personally, I use my own archlinux-yaourt image, `zasdfgbnm/archlinux-yaourt`. Therefore, the first few lines of the Dockerfile look like this:
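A sketch of those opening lines (the exact package list is an assumption; adjust to your hardware):

```dockerfile
FROM zasdfgbnm/archlinux-yaourt
USER root

# Packages needed for bare-metal booting (illustrative selection)
RUN pacman -Syu --noconfirm base linux dhcpcd
```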
Since this is just a demo, I will not install any other package here. The readers may want to install other packages according to their needs.
The mkinitcpio-docker-hooks package should be installed inside the docker image that we plan to use as our desktop. The reason is mainly that the Linux kernel does not provide a stable ABI, so kernel modules must match the kernel version exactly, otherwise they cannot be loaded. To maintain this consistency, we install mkinitcpio-docker-hooks and generate the initramfs inside docker, and boot with the kernel from the docker image. This ensures that the kernel, the modules copied into the initramfs, and the `/lib/modules` inside the docker image all belong to the same kernel. To install mkinitcpio-docker-hooks, add the following to the Dockerfile:
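A sketch of the install step, assuming the yaourt-based image above (the unprivileged user name `user` is an assumption about that image):

```dockerfile
# Install mkinitcpio-docker-hooks from the AUR as the unprivileged user
RUN sudo -u user yaourt -S --noconfirm mkinitcpio-docker-hooks
```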
The configuration file for mkinitcpio-docker-hooks is `/etc/docker-btrfs.json`, whose default value reads:
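The file holds the name and tag of the image to boot; the defaults look roughly like this (a reconstruction, so the exact field names and values may differ):

```json
{
    "docker_image": "archlinux",
    "docker_tag": "latest"
}
```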
All we need to do is replace the values of these two variables with the values we want; for example, I am going to name my demo docker image “sample_image”. We also need to add the `docker-btrfs` hook to `/etc/mkinitcpio.conf`. The following two lines in the Dockerfile do both configurations:
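A sketch of those two lines; the sed-based approach here is the author's guess at the edit, not necessarily the original commands:

```dockerfile
# Point docker-btrfs.json at our image, and append the docker-btrfs hook
RUN sed -i 's/"docker_image": *"[^"]*"/"docker_image": "sample_image"/' /etc/docker-btrfs.json
RUN sed -i 's/^HOOKS=(\(.*\))/HOOKS=(\1 docker-btrfs)/' /etc/mkinitcpio.conf
```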
Now we have a ready-to-use demo Dockerfile:
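Putting the steps above together, the demo Dockerfile might read as follows (package names, the AUR user, and the sed edits are assumptions):

```dockerfile
FROM zasdfgbnm/archlinux-yaourt
USER root

# Packages useful for bare-metal booting (illustrative selection)
RUN pacman -Syu --noconfirm base linux dhcpcd

# Install mkinitcpio-docker-hooks from the AUR
RUN sudo -u user yaourt -S --noconfirm mkinitcpio-docker-hooks

# Configure the image name and add the docker-btrfs hook
RUN sed -i 's/"docker_image": *"[^"]*"/"docker_image": "sample_image"/' /etc/docker-btrfs.json
RUN sed -i 's/^HOOKS=(\(.*\))/HOOKS=(\1 docker-btrfs)/' /etc/mkinitcpio.conf
```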
We can then use the command `docker build . -t sample_image` to create our docker image from this Dockerfile.
After the image is built, the next step is to prepare the kernel and initramfs. This step is best done on the machine that you intend to boot the docker image on, because mkinitcpio automatically puts the appropriate kernel modules into the initramfs based on that machine; run on another machine, the wrong drivers may be packed in. As mentioned earlier, this step is done inside a docker container.
First, run the container and open an interactive shell:
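For example, mounting the current directory so that the generated files end up on the host:

```shell
docker run -v $(pwd):/workspace -w /workspace -it sample_image bash
```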
Then, run the following commands inside the shell:
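Presumably something like the following (the preset name `linux` assumes Arch's stock kernel package):

```shell
# Generate the initramfs, which now includes the docker-btrfs hook
mkinitcpio -p linux

# Copy the kernel and the generated initramfs into the mounted workspace
cp /boot/vmlinuz-linux /boot/initramfs-linux.img .
```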
Now you can see `vmlinuz-linux` and the generated `initramfs-linux.img` in the current directory.
The top layer is a new concept introduced by mkinitcpio-docker-hooks. It refers to a directory on some drive that is copied into the rwlayer via busybox's `cp -a` at startup, after the rwlayer is created and before the real root is mounted. Why do we need a top layer? Because when we start the same image on multiple machines, different machines often need different configuration files, such as `/etc/X11/xorg.conf`. In addition, a free DockerHub account can host only very few private images, and `/etc/shadow` and other private files do not belong in a public image.
To prepare the top layer, pick a folder and store the machine-specific configuration files in it, laid out the same way as in the image's file system. For example, if you want a separate `/etc/fstab`, you should put the file at `etc/fstab` inside the top layer's directory. Here is the suggested workflow: first, enter a shell in the container with `docker run -v $(pwd):/workspace -w /workspace -it sample_image bash`; then configure the system in that shell, e.g. `useradd ...`; finally, copy the new configuration files into the top layer folder.
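The effect of the top layer copy can be demonstrated without docker at all; here is a small sketch with made-up paths:

```shell
# Simulate a rwlayer (from the image) and a machine-specific top layer
mkdir -p demo/rwlayer/etc demo/toplayer/etc
echo "image default"    > demo/rwlayer/etc/fstab
echo "machine specific" > demo/toplayer/etc/fstab
echo "secret"           > demo/toplayer/etc/shadow

# What the hook does at boot: overlay the top layer onto the rwlayer
cp -a demo/toplayer/. demo/rwlayer/

cat demo/rwlayer/etc/fstab   # the machine-specific version wins
```

Files present only in the top layer (like `etc/shadow` above) are added, and files present in both are overwritten by the top layer's copy.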
Now the only thing left is to set up the boot manager. Here we take refind as an example. Everything (the docker directory, the top layer, the kernel, and the initramfs) is assumed to be on a btrfs partition labeled “linux”. The docker directory (i.e. `/var/lib/docker`) is in a subvolume called “docker” at the root of this partition. The kernel, initramfs, and top layer are all located in the “boot_docker” folder at the root of the partition. We want to name the rwlayer “docker_rwlayer”. Then the menuentry in `refind.conf` should read:
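A menuentry matching that layout might look like the following; this is a reconstruction from the option names explained below, and the top layer directory name and exact option values are guesses:

```
menuentry "Docker Desktop" {
    volume  linux
    loader  boot_docker/vmlinuz-linux
    initrd  boot_docker/initramfs-linux.img
    options "root=LABEL=linux rootflags=subvol=docker_rwlayer docker_path=docker toplayer=LABEL=linux toplayer_path=boot_docker/toplayer rw"
}
```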
Among the kernel options, `root` specifies the partition where the docker directory is located, and `rootflags` specifies the location of the rwlayer. `docker_path` specifies the position of the docker directory relative to the root of the `root` partition. `toplayer` specifies the partition where the top layer is located, and `toplayerflags` specifies the mount options for that partition. `toplayer_path` specifies the position of the top layer directory relative to the root of the `toplayer` partition.
Everything is done. Reboot and enjoy!
In addition, interested readers can take a look at the docker image that the author is using: