Some projects, especially ones utilizing machine learning, require access to the computing power of an attached graphics processing unit (GPU). This is the case for software that depends on CUDA, NVIDIA's parallel programming model.
On a full Linux OS such as Ubuntu, you could simply install the NVIDIA Container Toolkit alongside Docker, run one of NVIDIA's base images with the --gpus flag, and have container access to the GPU, assuming the NVIDIA drivers are already installed.
On a minimal OS designed for edge devices, such as balenaOS, this setup is simply not an option for a few reasons. The OS is read-only, so you can't install the Container Toolkit. Even if you could, the --gpus flag, which exposes the host's GPU drivers to Docker (in our case, balenaEngine), would not find any GPU drivers. That's because the drivers are too large to include in balenaOS and are not open source. And you can't install them on the host yourself because, remember, the OS is read-only.
You may see some references to the balena docker-compose label io.balena.features.gpu, which was explored for future use but is not functional as of this writing.
In order to get the NVIDIA drivers we need onto the host OS, we'll build them as kernel modules in our container, and then load those modules from inside the container after it starts. Since kernel modules can be loaded into and unloaded from the kernel on demand, this gets around our read-only host OS situation.
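To make the loading step concrete, here is a minimal sketch of what a container start script might do. The file paths and function name are assumptions for illustration, not the repo's actual script:

```shell
# Hypothetical start-script fragment for the gpu container.
# insmod inserts a module into the running (shared) kernel on demand;
# this only works when the container runs with elevated privileges.
load_nvidia_modules() {
    insmod /nvidia/kernel/nvidia.ko || return 1
    insmod /nvidia/kernel/nvidia-modeset.ko || return 1
    insmod /nvidia/kernel/nvidia-uvm.ko || return 1
    echo "NVIDIA modules loaded"
}

# load_nvidia_modules  # would be called at container start, after the build step
```

Because the container shares the host's kernel, inserting a module here makes it available to the whole device, not just this container.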
Out of tree
Some Linux modules can be compiled with the kernel because their source code lives in the kernel source tree. These modules are called "in-tree." The NVIDIA driver module we will be building is considered "out-of-tree" because its source code lives elsewhere. (NVIDIA has so far decided not to open source its driver, so it maintains the driver out-of-tree.) We will need to download the kernel source code for our OS, extract the headers, and use them to build our driver module using files provided by NVIDIA.
In order to build and load our module from within the container that requires GPU access, we’ll need to perform a few actions that run counter to the Docker notions of container portability and security. They are mentioned below so you can decide for yourself if they are acceptable for your project or implementation.
The kernel module we will build for the GPU in the container will have to match the host's kernel exactly. If you update the host OS and it uses a different kernel version, you'll need to rebuild the module. (For this to happen, you'll need to adjust the container's Dockerfile and re-push it.) This limits container portability, where a typical goal is for a container to be able to run independently of its host system.
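A start script could guard against this mismatch explicitly. This is a hedged sketch (the function name and example version strings are assumptions, not taken from the example repo):

```shell
# Hypothetical guard: refuse to load modules that were built against
# a different kernel than the one the host is currently running.
kernel_matches() {
    # $1 = kernel version the module was built for, $2 = running kernel
    [ "$1" = "$2" ] && echo "match" || echo "mismatch"
}

# At container start, you might compare a baked-in version against `uname -r`:
kernel_matches "5.10.83-yocto-standard" "$(uname -r)"
```

If the result is "mismatch", the safest behavior is to skip the insmod step and log an error rather than attempt to load an incompatible module.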
In order to load a kernel module from our container into the host, we will also need to grant that container elevated privileges; in our examples, the containers run with privileged access. Also note that kernel modules can bypass filesystem permissions and other security mechanisms, so use them with caution.
You can find the example repository here. The repository's readme file contains all of the necessary details for deploying a few different examples, which we'll outline below.
For a minimal example, simply run the gpu container from the docker-compose file and remove everything else from the file. The gpu container downloads the kernel source for our exact OS version, builds the NVIDIA modules, and loads them into the host OS. This container then provides support for CUDA-compiled applications (for CUDA development support, see the cuda container below).
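A minimal docker-compose file for this might look like the following sketch. The build path and compose version are assumptions, so check the actual docker-compose file in the example repository:

```yaml
# Hypothetical minimal docker-compose.yml: only the gpu service,
# running privileged so it can load kernel modules into the host.
version: "2"
services:
  gpu:
    build: ./gpu
    privileged: true
    restart: always
```

The `privileged: true` line is what allows the container to insert modules into the host kernel, as discussed above.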
You could use the gpu image as-is, or as a base image for adding your own apps or software on top of it. In addition, you can add the gpu container to an existing project to provide GPU access to any other container. (That's how the rest of the examples in this repo work.) The only requirement is that any container requiring GPU access must have its own copy of the NVIDIA drivers installed, and the driver version must exactly match the version in the gpu container. In the cuda container example below, we use the same driver downloaded from NVIDIA as in the gpu container. In the app container, we download and use the same version of the NVIDIA driver from Ubuntu's official package repository.
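One way to keep the versions in lockstep is to pin the driver version in each Dockerfile. The version number and download URL pattern below are illustrative assumptions; use the values documented in the example repo's readme:

```dockerfile
# Hypothetical fragment: pin the exact driver version the gpu container
# builds, and reuse the same value in every container needing GPU access.
ARG NVIDIA_DRIVER_VERSION=510.47.03
RUN wget "https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run"
```

Pinning the version in one ARG per Dockerfile makes it harder for the containers to drift apart when you update the driver.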
Before you deploy the gpu container, be sure to set the variables in the Dockerfile to the proper values for your use case, as outlined in the readme. While the gpu container on its own provides support for CUDA-compiled applications, the cuda container shows an example of installing and running the CUDA Toolkit. The gpu container is required to be running before starting the cuda container. (This is accomplished by using the Docker depends_on option in the docker-compose file.)
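In docker-compose, this start ordering is typically expressed with depends_on. A hedged sketch (service names and build paths assumed from the examples in this post):

```yaml
# Hypothetical docker-compose fragment: start cuda only after gpu.
services:
  gpu:
    build: ./gpu
    privileged: true
  cuda:
    build: ./cuda
    depends_on:
      - gpu
```

Note that depends_on only controls start order; it does not wait for the gpu container to finish loading the modules, so the dependent container may still need its own readiness check.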
Note that although the NVIDIA driver version in the cuda container needs to exactly match the version in the gpu container, the base image or OS does not need to match. (The gpu container runs Debian Bullseye while the cuda container runs Ubuntu Bionic at the time of writing.)
The app container is another example of how downloading and installing the NVIDIA kernel modules can be separated from an application that requires GPU access. The gpu container must be running before starting the app container. Although we use NVIDIA drivers from a different source than in the gpu container, the versions must still match exactly.
The “application” we install in this example is a simple Python script using PyTorch to confirm that an NVIDIA GPU is present and available to the container.
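A check along those lines can be sketched as below. This is a minimal version written for illustration; the repo's actual script may differ, and the fallback for a missing PyTorch install is an addition of ours:

```python
# Minimal sketch of a GPU-presence check using PyTorch.
def gpu_report() -> str:
    try:
        import torch  # PyTorch must be installed in the container
    except ImportError:
        return "pytorch-missing"
    if torch.cuda.is_available():
        # Report the name of the first visible CUDA device.
        return "gpu:" + torch.cuda.get_device_name(0)
    return "no-gpu"

print(gpu_report())
```

On a device where the gpu container has loaded the modules and a matching driver is installed, this should report the GPU's name; otherwise it reports "no-gpu".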
In our final example, we start with a container pulled from the NVIDIA NGC Catalog. We then install the same version of the NVIDIA driver as used in the gpu container, which must be running before this container starts. Our container will have GPU access without using the Container Toolkit.
This example uses the NVIDIA PyTorch container and then runs the same Python script as the app container to show that the container is indeed accessing a CUDA-enabled GPU.
The examples provided in this repository are not optimized for size and are huge! In most production cases, you'll want to take advantage of multi-stage builds to reduce container size and bandwidth usage. For example, you could eliminate the samples provided with the CUDA Toolkit to save space.
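A multi-stage build for this setup might follow the sketch below; the stage layout and paths are assumptions, not the repo's actual Dockerfile:

```dockerfile
# Hypothetical multi-stage sketch: build the kernel modules in a full
# build stage, then copy only the resulting .ko files into a slim image.
FROM debian:bullseye AS build
# ...install the toolchain and kernel headers, then build the NVIDIA modules...

FROM debian:bullseye-slim
COPY --from=build /nvidia/kernel/ /opt/nvidia/
# A start script would insmod the modules from /opt/nvidia/ at boot.
```

The build toolchain, kernel source tarball, and headers stay in the discarded build stage, which is where most of the size savings come from.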
Also keep in mind that you would likely only need to use the gpu container alone, or in conjunction with one of the other containers (adjusted for your use case). This project is not meant to be deployed as-is (although you could!) but rather serves as a few different examples of how you can gain GPU access for your containers. (For instance, the cuda, app, and nv-pytorch containers accomplish the same result via different methods.)
This is just one of many possible implementations of CUDA/NVIDIA GPU access in a container on balenaOS (or Docker). The method outlined here has the advantage of being distribution-agnostic and provides you with the latest drivers. Let us know in the comments or forums if you have had success or issues with the examples presented here, and share any suggestions, improvements, or alternate methods.