21 December 2021 / Last updated: 21 Dec 2021

How to use an NVIDIA GPU on an x86 device with balenaOS

Execution time: 30mins - 1hr
Difficulty: Low
Cost: Low
Some projects, especially ones utilizing machine learning, require access to the computing power of an attached graphics processing unit (GPU). This is the case for software that depends on CUDA, NVIDIA's parallel computing platform and programming model.
We've demonstrated how to enable CUDA on Jetson-based devices running balenaOS, but in this guide, we'll outline how to access an NVIDIA GPU on an x86 (PC) device to run CUDA or any other software that requires GPU access.

Alternative needed

On a full Linux OS such as Ubuntu, you could simply install the NVIDIA Container Toolkit alongside Docker, run one of NVIDIA's base images with the --gpus flag, and have GPU access inside the container, assuming the NVIDIA drivers were already installed.
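On such a full OS, the whole flow is only a few commands. Here's a sketch, shown as a dry run since the real commands need root and GPU hardware; the package name and image tag follow NVIDIA's documentation for Debian-style distributions but may differ for yours:

```shell
# Conventional route on a full Linux OS (NOT balenaOS), with drivers already
# installed. We only print the plan here; on a real host you'd run each line
# with sudo. Image tag is illustrative -- check Docker Hub for current tags.
steps='apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
docker run --rm --gpus all nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi'

printf '%s\n' "$steps"
```

The final `docker run` is the key step: the --gpus flag tells the runtime to expose the host's GPU devices and driver libraries inside the container, and `nvidia-smi` should then list the GPU.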
In a minimal OS designed for edge devices such as balenaOS, this setup is simply not an option, for a few reasons. The OS is read-only, so you can't install the Container Toolkit. Even if you could, the --gpus flag, which exposes the host's GPU drivers to Docker (in our case, balenaEngine), would not find any GPU drivers: they are too large to include in balenaOS and are not open source. And you can't install them yourself because, remember, the OS is read-only.
You may see references to the balena docker-compose label io.balena.features.gpu, which was explored for future use but is not functional as of this writing.

A solution

To get the NVIDIA drivers we need onto the host OS, we'll build them as kernel modules inside our container, then load those modules from the container after it starts. Since kernel modules can be loaded into and unloaded from the kernel on demand, this works around the read-only host OS.
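The load/unload mechanics look like this. The only runnable line below reads the running kernel version, which is the value our built module must match exactly; the module commands themselves need root and are shown for reference:

```shell
# Kernel modules can be inserted into and removed from a running kernel,
# which is what lets a privileged container alter a read-only host OS.

# The running kernel version -- the module we build must match this exactly:
kver="$(uname -r)"
echo "running kernel: ${kver}"

# Loading/unloading (root required; reference only):
#   insmod ./nvidia.ko      # load a single module file
#   modprobe nvidia-uvm     # load by name, resolving dependencies
#   rmmod nvidia-uvm        # unload again
```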

Out of tree

Some Linux modules can be compiled with the kernel because their source code lives in the kernel source tree. These modules are called "in-tree." The NVIDIA driver module we will be building is considered "out-of-tree" because the source code is located elsewhere. (NVIDIA has so far decided not to open source its driver, so it is maintained out-of-tree.) We will need to download the kernel source code for our OS, extract the headers, and use them to build our driver module using files provided by NVIDIA.
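The generic shape of an out-of-tree build is worth seeing once: you point `make` at the kernel headers (`-C`) and at the module source (`M=`). NVIDIA's installer automates the same idea, and the example repo's Dockerfile fetches balenaOS kernel headers rather than using the distro path sketched here:

```shell
# Standard out-of-tree module build pattern. On a full distro, the headers
# for the running kernel live under /lib/modules/<version>/build; for
# balenaOS we download and prepare the matching kernel source instead.
kdir="/lib/modules/$(uname -r)/build"
echo "would build against headers in: ${kdir}"

# The actual build (requires headers plus the module source in $PWD):
#   make -C "${kdir}" M="$(pwd)" modules
```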


In order to build and load our module from within the container that requires GPU access, we’ll need to perform a few actions that run counter to the Docker notions of container portability and security. They are mentioned below so you can decide for yourself if they are acceptable for your project or implementation.
The kernel module we build for the GPU in the container will have to match the host's kernel exactly. If you update the host OS and it uses a different kernel version, you'll need to rebuild the module. (To do that, you'll need to adjust the container's Dockerfile and re-push it.) This limits container portability, where a typical goal is for a container to be able to run independently of its host system.
To load a kernel module from our container into the host kernel, we also need to grant the container elevated privileges; in our examples, the containers run in privileged mode. Note too that kernel modules can bypass filesystem permissions and other security measures, so use them with caution.
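The first caveat can at least be guarded against at container start. Here's a sketch of such a guard; `check_kernel_match` is a hypothetical helper, not part of the example repo, and the version strings are placeholders:

```shell
# Start-time guard sketch (hypothetical): refuse to insmod a module that
# was built for a different kernel than the one the host is running.
check_kernel_match() {
  # $1: kernel the module was built for; $2: host's `uname -r`
  [ "$1" = "$2" ]
}

if check_kernel_match "5.10.43" "5.10.43"; then
  result="kernel match: safe to insmod"
else
  result="kernel mismatch: rebuild the module"
fi
echo "$result"
```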

Getting started

You can find the example repository here. The repository’s readme file contains all of the necessary details for deploying a few different examples which we’ll outline below.

Minimum example

For a minimal example, keep only the gpu service in the docker-compose file and remove everything else. The gpu container downloads the kernel source for our exact OS version, builds the NVIDIA modules, and loads them into the host OS. This container then supports running CUDA-compiled applications (for CUDA development support, see the cuda container below).
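Stripped down that far, the compose file is tiny. A sketch, where the service name follows the example repo's layout but the build context and restart policy are typical placeholders, not taken from the repo:

```yaml
# Minimal compose file: only the gpu service.
version: '2.1'
services:
  gpu:
    build: ./gpu
    privileged: true   # needed to load kernel modules into the host
    restart: always
```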
You could use the gpu image as-is, or as a base image for adding your own apps or software on top of it. In addition, you can add the gpu container to an existing project to provide GPU access to any other container. (That’s how the rest of the examples in this repo work.) The only requirement is that the containers requiring GPU access must have their own copy of the NVIDIA drivers installed. The driver version must exactly match the version in the gpu container.
In the cuda container example below, we use the same driver downloaded from NVIDIA as in the gpu container. In the app container, we download and use the same version of the NVIDIA driver from Ubuntu’s official package repository.
Before you deploy the gpu container, be sure to set the variables in the Dockerfile to the proper values for your use case as outlined in the readme.

CUDA example

While the gpu container on its own supports running CUDA-compiled applications, the cuda container shows an example of installing and running the CUDA Toolkit. The gpu container must be running before the cuda container starts. (This is accomplished with the Docker depends_on option in the docker-compose file.)
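The ordering looks like this in the compose file. A sketch with illustrative build contexts; see the example repo's docker-compose file for the real definitions:

```yaml
services:
  gpu:
    build: ./gpu
    privileged: true
  cuda:
    build: ./cuda
    privileged: true
    depends_on:
      - gpu   # start gpu (which loads the modules) before cuda
```

Note that depends_on only orders container startup; it does not wait for the gpu container to finish loading the modules, so an application should still tolerate the GPU appearing shortly after boot.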
Note that although the NVIDIA driver version in the cuda container needs to exactly match the version in the gpu container, the base image or OS does not need to match. (The gpu container runs Debian Bullseye while the cuda container runs Ubuntu Bionic at time of writing.)

Application example

The app container is another example of how downloading and installing the NVIDIA kernel modules can be separated from an application that requires GPU access. The gpu container must be running before the app container starts. Although we get the NVIDIA drivers from a different source than the gpu container does, the versions must still match exactly.
The “application” we install in this example is a simple Python script using PyTorch to confirm that an NVIDIA GPU is present and available to the container.

NVIDIA container

In our final example, nv-pytorch, we start with a container pulled from the NVIDIA NGC Catalog. We then install the same version of the NVIDIA driver as used in the gpu container, which must be running before this container starts. Our container will have GPU access without using the Container Toolkit.
This example uses the NVIDIA PyTorch container and then runs the same Python script as the app container to show that the container is indeed accessing a CUDA-enabled GPU.

Final thoughts

The examples provided in this repository are not optimized for size and are huge! In most production cases, you’ll want to take advantage of multi-stage builds to reduce container size and bandwidth usage. For example, you could eliminate the samples provided in the CUDA Toolkit to save space.
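A multi-stage version of the gpu container might look roughly like this. The stage names and paths are illustrative, not taken from the example repo: the idea is to do the large kernel-source download and build in one stage and copy only the resulting module files into a slim runtime stage.

```dockerfile
# Multi-stage sketch: heavy build stage, slim runtime stage.
FROM debian:bullseye AS build
# ...download kernel source, prepare headers, build nvidia*.ko here...

FROM debian:bullseye-slim
# Copy only the built modules, not the toolchain or kernel source:
COPY --from=build /usr/src/nvidia/*.ko /opt/nvidia/
# entry.sh (hypothetical) insmods the modules when the container starts:
COPY entry.sh /usr/local/bin/entry.sh
CMD ["entry.sh"]
```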
Also keep in mind that you would likely only need to use the gpu container alone or in conjunction with one of the other containers (adjusted for your use case). This project is not meant to be deployed as-is (although you could!) but rather as a few different examples of how you can gain GPU access for your containers. (For instance, the app and nv-pytorch containers accomplish the same result via different methods.)


This is just one of many possible ways to use CUDA/NVIDIA GPUs in a container on balenaOS (or Docker). The method outlined here has the advantage of being distribution-agnostic and provides you with the latest drivers. Let us know in the comments or forums if you've had success or issues with the examples presented here, or if you have suggestions, improvements, or alternate methods.
by Alan Boris, Hardware Hacker in Residence
