Comparing performance of Jetson GPUs on balena

Compare real-world performance of various Jetson devices on balena.

The NVIDIA Jetson is a popular IoT device for machine learning on the edge, and balenaOS supports over a dozen different models, including the new Orin series. There are plenty of benchmarks, specifications, and marketing materials available for gauging the relative computing power of the different NVIDIA Jetson devices. You’ll often find charts that compare AI performance in TOPS, GPU cores, CUDA cores, or Tensor Cores. These can help when choosing a Jetson model for a given project (along with cost, power requirements, and so on), but we thought it would also be useful to provide some real-world performance results from a set of Jetson devices running the same AI software on balenaOS.

The devices

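The devices we used for this comparison are the Jetson Nano 2GB (SD), Jetson Nano 4GB (SD), Jetson TX2, Jetson TX2 NX, Orin Nano 8GB, and AGX Orin.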

The software

Of course all of the devices in this comparison are supported by and running balenaOS! We’ve included the OS version in the results grid below, but generally we used the latest version for each device that was available at the time of this test.

We used our ML example project of choice, OpenDataCam, as the application software. It’s open source, runs on any Jetson device, and includes a real-time frames per second (fps) reading.
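For readers who want a comparable fps reading from other software, a rolling average over recent frame timestamps is one common approach. The sketch below is our own illustration and is not taken from OpenDataCam; the `FpsMeter` class and its `window` parameter are hypothetical names.

```python
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames.

    Illustrative only: this is not how OpenDataCam computes its reading.
    """

    def __init__(self, window: int = 30):
        # Only the most recent `window` timestamps are kept.
        self.timestamps = deque(maxlen=window)

    def tick(self, now: float) -> float:
        """Record a frame at time `now` (in seconds) and return the estimate."""
        self.timestamps.append(now)
        if len(self.timestamps) < 2:
            return 0.0  # not enough frames yet
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self.timestamps) - 1) / elapsed
```

In a real capture loop you would call `meter.tick(time.monotonic())` once per processed frame and display the returned value.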

Let’s take a quick look at how OpenDataCam works and which factors affect the device’s performance.

OpenDataCam

Known as “an open source tool to quantify the world,” OpenDataCam takes a video camera feed as input and performs object detection in real time. The output consists of video with the detected objects outlined and labeled, as well as the detection data, which is available via an API. OpenDataCam uses Darknet (an open source neural network framework written in C and CUDA) and YOLOv4 to perform object detection. (YOLO stands for “You Only Look Once.”)

Trade Offs

YOLO provides two pre-trained “weights” files, which are trained on different numbers of classes (objects). The full file is trained on 9000+ classes, while the “tiny” file is trained on only 80 classes and therefore runs faster on less powerful hardware. (However, it detects fewer objects and appears to be less accurate than the full file.)

Full weights example

As seen above, with the “full” weights file we detect the person walking as well as the cars in the background, but at a lower frame rate. (See the results table below.)

Tiny weights example

The “tiny” weights file detects the cars in the foreground but not the ones in the background or the pedestrian. However, we can achieve 15 to 30 fps on all Jetson devices.

The results

(Higher fps means better performance.)

| Device | GPU cores | OS version | YOLO tiny fps | YOLO full fps |
| --- | --- | --- | --- | --- |
| Jetson Nano 2GB SD | 128 | 2.98.12 | 17 | n/a |
| Jetson Nano 4GB SD | 128 | 2.98.12 | 17 | 2 |
| Jetson TX2 | 256 | 3.1.9 | 29 | 4 |
| Jetson TX2 NX | 256 | 2.113.14 | 29 | 5 |
| Orin Nano 8GB | 1024 | 3.0.8+rev2 | 30 | 12 |
| AGX Orin* | 2048 | 3.0.11 | 30 | 28 |

*Using highest power mode (ID 3)

There are some interesting takeaways in these results:
– The 2GB Nano is not capable of running the full YOLO weights file. Apparently the extra 2GB of RAM makes a difference in this regard!
– All of the Jetson devices are capable of running OpenDataCam at a reasonable frame rate if the appropriate weights file is used.
– The minimum board required to run the full weights file at a usable frame rate appears to be the TX2 NX with the Seeed carrier board.
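These takeaways amount to a simple lookup over the results table. As a rough sketch, the snippet below copies the table into Python and lists the devices that meet a target frame rate. The function name `devices_meeting` and the thresholds in the usage note are our own illustrative choices, not part of OpenDataCam or balena.

```python
from typing import List

# Results copied from the table above:
# (device, GPU cores, YOLO tiny fps, YOLO full fps).
# None marks the one configuration that could not run.
RESULTS = [
    ("Jetson Nano 2GB SD", 128, 17, None),
    ("Jetson Nano 4GB SD", 128, 17, 2),
    ("Jetson TX2", 256, 29, 4),
    ("Jetson TX2 NX", 256, 29, 5),
    ("Orin Nano 8GB", 1024, 30, 12),
    ("AGX Orin", 2048, 30, 28),
]

# Column index of the fps value for each weights file.
FPS_COLUMN = {"tiny": 2, "full": 3}


def devices_meeting(target_fps: float, weights: str = "full") -> List[str]:
    """Return the devices whose measured fps is at least `target_fps`."""
    column = FPS_COLUMN[weights]
    return [
        row[0]
        for row in RESULTS
        if row[column] is not None and row[column] >= target_fps
    ]
```

For example, `devices_meeting(5, "full")` returns the TX2 NX, Orin Nano 8GB, and AGX Orin, while `devices_meeting(15, "tiny")` returns all six devices. What counts as a "usable" frame rate depends on your project, so treat the threshold as an input rather than a fixed number.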

The devices in this comparison were the ones we happened to have available at the time of testing. There’s a bit of a performance gap between the TX2 and Orin Nano that might be well served by the Jetson AGX Xavier. When we are able to test that board, we will add the results here. (If you have one and want to run the test, let us know below!)

In case you’re wondering whether running balenaOS instead of the stock Jetson Linux (Ubuntu 20.04-based) has any effect on performance, we tested that too. The same software actually performed slightly better on balenaOS with the full weights file, and identically with the tiny weights file:

| Device/weights | balenaOS | Ubuntu 20 |
| --- | --- | --- |
| AGX Orin/YOLO full | 13 fps | 10 fps |
| AGX Orin/YOLO tiny | 30 fps | 30 fps |

(with AGX Orin in low power mode)
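To put a number on "slightly better", the gap can be expressed as a relative difference. This is a minimal sketch that uses only the fps values from the table above; the function name `relative_gain_percent` is ours, chosen for illustration.

```python
def relative_gain_percent(new_fps: float, baseline_fps: float) -> float:
    """Percentage change of `new_fps` relative to `baseline_fps`."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return 100.0 * (new_fps - baseline_fps) / baseline_fps


# fps values from the table above (AGX Orin, low power mode),
# stored as (balenaOS, Ubuntu 20).
MEASUREMENTS = {
    "full": (13, 10),
    "tiny": (30, 30),
}

for weights, (balena_fps, ubuntu_fps) in MEASUREMENTS.items():
    gain = relative_gain_percent(balena_fps, ubuntu_fps)
    print(f"{weights}: {gain:+.0f}% on balenaOS relative to Ubuntu")
```

With the full weights file, that works out to a 30% higher frame rate on balenaOS, and no difference with the tiny file. These are single informal measurements, so the percentage should be read as indicative rather than precise.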

Final thoughts

Obviously this is not a scientific experiment but merely an informal comparison to gauge the relative performance of some Jetson devices running the same software on balenaOS. Hopefully this can be helpful as one portion of your evaluation process when determining the right board for the job.

Have you performed any similar comparisons? How did your results compare? Let us know in the comments or the forums.