Docker usage in PMDK
In this blog post, I’ll describe why we believe Docker containers are easy to use, time-saving, and valuable for day-to-day programming and debugging. If you have never heard of Docker (or containers in general), please read, for example, this overview. We use Docker in almost all of the repositories in our organization. In this blog post, I will describe how we use Docker based on the PMDK repository. Some of our repositories, like memkind, take a slightly different approach, but it still relies on Docker. The section “Our various solutions” below describes the differences between our repositories. Let’s get to the details!
Docker is easy and maintainable
From our point of view, Docker significantly simplifies part of our work.
We use Docker in our CI for the majority of test executions. We started by preparing Docker images for commonly used Linux distributions. Each of our repositories contains a utils/docker directory, like here. Images are usually stored in a separate sub-directory, unsurprisingly called images. Alongside the Docker ‘recipes’ (Dockerfiles), there are scripts used within containers to install dependencies (e.g., install-valgrind.sh) and a short README file with instructions on manually building and running images from our Dockerfiles. Note that we only have Docker images for freely available Linux distributions; some of our repositories/libraries support Windows builds, and their testing is done purely on Virtual Machines delivered by CI providers (like GitHub Actions).
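For instance, building one of these images locally could look roughly like this (the Dockerfile name and tag below are illustrative; check the images directory of the given repository for the actual file names):

```sh
# Build an image from one of the Dockerfile 'recipes' in utils/docker/images.
# 'Dockerfile.ubuntu-22.04' and the 'pmdk-ubuntu-22.04' tag are assumed names,
# used here only for illustration.
cd utils/docker/images
docker build -t pmdk-ubuntu-22.04 -f Dockerfile.ubuntu-22.04 .
```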
Cleaner workflow and reproducible environment
We figured that keeping the “Linux images” as Dockerfiles simplifies their maintenance and updates, makes them publicly available in the repo (for reviews and contributions), and allows anyone to easily rebuild them on demand. The last part proved to be especially useful for us - developers. Execution of tests is just the beginning. If the CI fails, you have to dig through the CI’s logs and try to guess what happened. With a quickly reproducible environment, the job gets a lot easier. We don’t have to look for a specific machine with some precise version of a Linux distribution. Just read our README ;-) and simply build the Docker image on any developer’s machine and debug the code within the exact same environment.
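A minimal sketch of that debugging flow, assuming you have built (or pulled) an image tagged pmdk-ubuntu-22.04 as in the example above:

```sh
# Start an interactive shell in the CI-like environment,
# with your local source tree mounted inside the container.
docker run --rm -it -v "$(pwd):/pmdk" -w /pmdk pmdk-ubuntu-22.04 /bin/bash
# ...then build and run the failing test inside the container, just as CI would.
```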
To be even lazier… I mean, more productive… we sped up the whole process. GitHub comes with a great feature - “Packages”. It allows us to store built Docker images for later re-use. We use it mainly in CI to save time on re-building (an unchanged image) and get to test execution as soon as possible. If the image(s) or installation scripts are updated, we have to re-build them, but only once. Such ready-to-use Docker images can easily be stored as “packages” alongside the GitHub project - see, e.g., PMDK’s package. Each image is tagged with a release version (because PMDK’s dependencies may have changed over time, in various release branches), the distribution’s name, and the CPU architecture (e.g., 1.12-ubuntu-22.04-x86_64).
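For example, pulling one of these tagged images might look like this (the exact registry path is an assumption here; the real one is shown on the package page):

```sh
# Pull a published image by its release/distro/architecture tag.
# 'ghcr.io/pmem/pmdk' is a hypothetical registry path for illustration.
docker pull ghcr.io/pmem/pmdk:1.12-ubuntu-22.04-x86_64
```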
If you want to download and run one of our published images, it is as simple as visiting any of our public “packages” mentioned above. Each of them provides a straightforward “how to” page, generated by GitHub, about using such an image. You can either pull the image and just run it, or use it as a base image for your own Dockerfile ‘recipe’. Building on top of our image gives you the advantage of having all dependencies already prepared. Such an image would only have to be extended with potential custom packages and files, e.g., for further development or debugging.
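A minimal sketch of such an extension, reusing the hypothetical image path from the previous example:

```sh
# Create a Dockerfile that builds on top of a published image
# and adds extra debugging tools.
cat > Dockerfile <<'EOF'
FROM ghcr.io/pmem/pmdk:1.12-ubuntu-22.04-x86_64
RUN apt-get update && apt-get install -y gdb vim
EOF
docker build -t my-pmdk-dev .
```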
Usage in Continuous Integration
As I mentioned above, in most of our repositories we keep all Docker-related files in the utils/docker part of the tree. For example, in PMDK it is here. The previously mentioned images directory contains:
- README file - explaining the basics of how to use the Docker images
- Dockerfiles, which define the steps to create our complete OS environments - the Docker ‘recipes’
- installation scripts (for various dependencies and, e.g., testing tools), used to build the images
- two helper scripts for building and pushing images to the “registry” (a.k.a. GitHub’s Packages) - see the sketch below
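In essence, the two helpers boil down to something like the following (the registry path and file names are illustrative, not the scripts’ actual contents):

```sh
# Build an image and push it to GitHub's container registry.
docker build -t ghcr.io/pmem/pmdk:1.12-ubuntu-22.04-x86_64 \
    -f Dockerfile.ubuntu-22.04 .
echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin
docker push ghcr.io/pmem/pmdk:1.12-ubuntu-22.04-x86_64
```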
The Dockerfile ‘recipes’ are used by the helper scripts, and they, in turn, are used by the “upper” layer of scripts that reside directly in the utils/docker directory. The pull-or-rebuild-image.sh script makes use of these helpers. It decides whether to re-build the image or just download it from the registry, based on the changes introduced by a user (e.g., in a Pull Request).
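The gist of that decision, as a simplified sketch (the variable and helper names are placeholders, not the script’s real contents):

```sh
# Rebuild the image if the Pull Request touched any Docker-related files;
# otherwise reuse the image already stored in the registry.
if git diff --name-only "$BASE_COMMIT"...HEAD | grep -q '^utils/docker/images/'; then
    ./utils/docker/images/build-image.sh   # hypothetical helper invocation
else
    docker pull "$CONTAINER_REG:$IMG_VER-$OS-$OS_VER-$ARCH"
fi
```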
The most interesting scripts live directly in the docker directory. There are build* scripts (usually just one - build.sh) and several run-* scripts. In the PMDK repository, there are two build scripts. The first one (build-CI.sh) is used by our CI as an entry point to prepare the environment (containers) and execute a selection of our tests. The second one (build-local.sh) is a simplified version for running tests on your local machine manually (but still using Docker).
The second group of scripts (run-*) is prepared for executing specific sets of checks (e.g., run-build-package.sh verifies that preparing PMDK packages works properly).
All GitHub Actions workflows and jobs (using the scripts listed above) are defined in the .github/workflows sub-tree of our repositories. For PMDK it is here.
Our various solutions
As I wrote in the introduction, not all repositories are handled exactly the same way. Each PMem-related repository was developed by different people with heads full of ideas. Some requirements may have forced teams to update their testing workflows according to their needs. Some changes to CI and Docker files were introduced in a rush (e.g., because of deadlines) and were not ported to other repositories. Having said that, we have tried to keep the differences to a minimum.
The main differences are, for sure, in the run-*.sh scripts, which are tailored specifically to execute the tests and checks adequate for the given library. Most repositories, like rpma, provide only one entry-point script (build.sh). In the PMDK repository, building the whole environment and executing the tests requires a significant number of environment variables to be set on the host machine. To ease the process of local re-building, build-local.sh was introduced.
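A hypothetical local invocation might look like this (the variable names are placeholders; the script itself documents the real ones):

```sh
# Pick the target OS environment and run the local build/test script.
export OS=ubuntu
export OS_VER=22.04
./utils/docker/build-local.sh
```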
The PMDK repository is tested in more environments than the other repositories (including various CPU architectures), so some additional files are located in utils/docker to handle the different CIs. An example of such an extra file is arm64.blacklist, which lists the tests not applicable to the arm64 architecture.
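One simple way such a blacklist can be applied (an illustrative sketch, not necessarily PMDK’s exact mechanism):

```sh
# Remove every test listed in arm64.blacklist from the full test list.
grep -vxFf utils/docker/arm64.blacklist all-tests.txt > tests-to-run.txt
```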
As I mentioned in the beginning, the memkind approach is a little simpler and unique. There’s still a utils/docker directory, but it does not contain a separate images sub-directory, and it does not push images to GitHub’s “registry”. It comes with great, pre-defined docker_run_*.sh scripts (used, e.g., in GitHub’s workflow) and an extra run_local.sh script (similar to PMDK’s). There’s also a well-written README file describing all the files and the building process.
Finally, not all repositories share the same number of Docker images. Each repository has its own set of OSes, depending on the requirements of the specific library. We have started some efforts to unify this, but it isn’t a piece of cake, and it will require some time to finish. Currently, some “common” Docker images are located in a separate repository called dev-utils-kit.
Summary
To summarize Docker usage in PMDK, I’d have to say: it’s very nice to have it working in our CI! As I described above, there are multiple benefits of introducing containers into our development process, with “reproducibility” and “portability” being two of the greatest (in my opinion). Overall, they add a little complexity to our workflows, but after you get used to these virtual environments - they are great!
As for our various repositories - the testing environments and Continuous Integration setups come in a few flavors, but they are generally quite similar. The differences result from various needs and different developers, but once you familiarize yourself with any one of the repositories, the others should be just as readable.