Some projects, especially ones utilizing machine learning, require access to the computing power of an attached graphics processing unit (GPU). This is the case for software that depends on CUDA, NVIDIA's parallel computing platform and programming model.
Alternative needed
On a full Linux OS such as Ubuntu, you could simply install the NVIDIA Container Toolkit alongside Docker, run one of NVIDIA's base images with the --gpus flag, and the container would have access to the GPU, assuming the NVIDIA drivers have already been installed on the host.
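For reference, the canonical smoke test on such a system looks something like this (the CUDA image tag is illustrative; any recent CUDA base image works):

```bash
# On a full Linux host with the NVIDIA drivers and Container Toolkit installed,
# a CUDA base image can reach the GPU via the --gpus flag (tag is illustrative).
docker run --rm --gpus all nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
```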
In a minimal OS designed for edge devices, such as balenaOS, this setup is simply not an option, for a few reasons: the OS is read-only, so you can't install the Container Toolkit. Even if you could, the --gpus flag, which exposes the GPU drivers on the host to Docker (in our case, balenaEngine), would not find any GPU drivers. That's because the drivers are too large to include in balenaOS and are not open source, and you can't install them yourself because, remember, the OS is read-only.
You may see some references to the balena docker-compose label io.balena.features.gpu, which was explored for future use but is not functional as of this writing.
A solution
In order to get the NVIDIA drivers we need onto the host OS, we'll build them as kernel modules in our container, and then load those modules from inside the container after it starts. Since kernel modules can be loaded into and unloaded from the kernel on demand, this gets around our read-only host OS situation.
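As a rough sketch of that loading step, the container's start script might do something like the following once the modules are built (the module names follow NVIDIA's driver package; the paths are placeholders):

```bash
# Sketch of the load step, run inside the privileged container at startup.
# Module names come from NVIDIA's driver package; paths are illustrative.
insmod /opt/nvidia/kernel/nvidia.ko
insmod /opt/nvidia/kernel/nvidia-modeset.ko
insmod /opt/nvidia/kernel/nvidia-uvm.ko
```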
Out of tree
Some Linux modules can be compiled along with the kernel because their source code lives in the kernel source tree. These modules are called "in-tree." The NVIDIA driver module we will be building is considered "out-of-tree" because its source code is located elsewhere. (NVIDIA has so far decided not to open source its driver, so it maintains the driver out-of-tree.) We will need to download the kernel source code for our OS, extract the headers, and use them to build our driver module using files provided by NVIDIA.
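A rough sketch of that build, assuming NVIDIA's .run driver package, with the driver version, paths, and make invocation as placeholders rather than values from the repo:

```bash
# Illustrative out-of-tree build against headers that match the host kernel.
# The driver version and kernel source path are placeholders.
./NVIDIA-Linux-x86_64-470.86.run --extract-only
cd NVIDIA-Linux-x86_64-470.86/kernel
make SYSSRC=/usr/src/kernel_source modules   # produces nvidia.ko, nvidia-uvm.ko, etc.
```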
Anti-patterns
In order to build and load our module from within the container that requires GPU access, we’ll need to perform a few actions that run counter to the Docker notions of container portability and security. They are mentioned below so you can decide for yourself if they are acceptable for your project or implementation.
The kernel module we build for the GPU in the container has to match the host's kernel exactly. If you update the host OS and it uses a different kernel version, you'll need to rebuild the module. (To do that, you'll need to adjust the container's Dockerfile and re-push it.) This limits container portability, where a typical goal is for a container to run independently of its host system.
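A quick way to spot a mismatch (illustrative; the module path depends on where your container builds it):

```bash
# The module's "vermagic" string must match the running host kernel exactly.
uname -r                        # kernel version the host OS is running
modinfo -F vermagic nvidia.ko   # kernel version the module was built against
```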
In order to load a kernel module from our container into the host, we will also need to grant that container elevated privileges; in our examples, the containers run with privileged access. Also note that kernel modules can bypass filesystem permissions and other security mechanisms, so use them with caution.
Getting started
You can find the example repository here. The repository's readme file contains all of the necessary details for deploying a few different examples, which we'll outline below.
Minimum example
For a minimal example, simply run the gpu container from the docker-compose file and remove everything else from the file. The gpu container downloads the kernel source for our exact OS version, builds the NVIDIA modules, and loads them into the host OS. This container then provides support for CUDA-compiled applications (for CUDA development support, see the cuda container below).
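A minimal sketch of what that trimmed-down compose file might look like (balena uses the 2.x compose format; the build path is illustrative):

```yaml
version: '2.1'
services:
  gpu:
    build: ./gpu
    # privileged access is needed to load the modules into the host kernel
    privileged: true
```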
You could use the gpu image as-is, or as a base image for adding your own apps or software on top of it. In addition, you can add the gpu container to an existing project to provide GPU access to any other container. (That's how the rest of the examples in this repo work.) The only requirement is that any container requiring GPU access must have its own copy of the NVIDIA drivers installed, and the driver version must exactly match the version in the gpu container.
In the cuda container example below, we use the same driver downloaded from NVIDIA as in the gpu container. In the app container, we download and use the same version of the NVIDIA driver from Ubuntu's official package repository.
Before you deploy the gpu container, be sure to set the variables in the Dockerfile to the proper values for your use case, as outlined in the readme.
CUDA example
While the gpu container on its own provides support for CUDA-compiled applications, the cuda container shows an example of installing and running the CUDA Toolkit. The gpu container must be running before the cuda container starts. (This is accomplished by using the Docker depends_on option in the docker-compose file.)
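In the compose file, that dependency looks roughly like this (a sketch with the other service configuration omitted):

```yaml
services:
  gpu:
    build: ./gpu
    privileged: true
  cuda:
    build: ./cuda
    # start the cuda container only after the gpu container has been started
    depends_on:
      - gpu
```

Keep in mind that depends_on only orders container startup; it does not wait for the gpu container to finish loading the kernel modules.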
Note that although the NVIDIA driver version in the cuda container needs to exactly match the version in the gpu container, the base image or OS does not need to match. (The gpu container runs Debian Bullseye while the cuda container runs Ubuntu Bionic at the time of writing.)
Application example
The app container is another example of how downloading and installing the NVIDIA kernel modules can be separated from an application that requires GPU access. The gpu container must be running before starting the app container. Although we use NVIDIA drivers from a different source than in the gpu container, the versions must still match exactly.
The “application” we install in this example is a simple Python script using PyTorch to confirm that an NVIDIA GPU is present and available to the container.
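A check along those lines looks something like this (a sketch; the script in the repo may differ):

```python
import torch

# If the NVIDIA modules are loaded on the host and the driver versions match,
# PyTorch should report that CUDA is available and be able to name the GPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```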
NVIDIA container
In our final example, nv-pytorch, we start with a container pulled from the NVIDIA NGC Catalog. We then install the same version of the NVIDIA driver as used in the gpu container, which must be running before this container starts. Our container will have GPU access without using the Container Toolkit.
This example uses the NVIDIA PyTorch container and then runs the same Python script as the app container to show that the container is indeed accessing a CUDA-enabled GPU.
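In Dockerfile terms, the starting point is simply an image from the NGC catalog (the tag, script name, and driver install steps below are illustrative placeholders):

```dockerfile
# Illustrative sketch: start from an NGC PyTorch image (tag is a placeholder),
# then add the same NVIDIA driver version that the gpu container builds.
FROM nvcr.io/nvidia/pytorch:21.10-py3
# ...install the matching NVIDIA driver user-space libraries here...
COPY gpu-check.py /app/gpu-check.py
CMD ["python3", "/app/gpu-check.py"]
```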
Final thoughts
The examples provided in this repository are not optimized for size and are huge! In most production cases, you'll want to take advantage of multi-stage builds to reduce container size and bandwidth usage. For example, you could eliminate the samples provided in the CUDA Toolkit to save space.
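A multi-stage sketch might look like the following (stage names and paths are illustrative, not taken from the repo):

```dockerfile
# Illustrative multi-stage sketch: build the modules in one stage and copy only
# the resulting .ko files into the final image, leaving the kernel source behind.
FROM debian:bullseye AS build
# ...download kernel source, extract headers, and build nvidia*.ko here...

FROM debian:bullseye
COPY --from=build /usr/src/nvidia/kernel/*.ko /opt/nvidia/
# ...add an entrypoint script that insmods the modules at container start...
```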
Also keep in mind that you would likely only need to use the gpu container alone or in conjunction with one of the other containers (adjusted for your use case). This project is not meant to be deployed as-is (although you could!) but rather as a few different examples of how you can gain GPU access for your containers. (For instance, the app and nv-pytorch containers accomplish the same result via different methods.)
Feedback
This is just one of many possible ways to use CUDA and NVIDIA GPUs in a container on balenaOS (or Docker). The method outlined here has the advantage of being distribution-agnostic and provides you with the latest drivers. Have you had success or issues with the examples presented here? Do you have suggestions, improvements, or alternate methods? Let us know in the comments below or on the forums.