06 September 2023 / Last updated: 06 Sep 2023

Comparing performance of Jetson GPUs on balena

The NVIDIA Jetson is a popular IoT device for machine learning on the edge, and balenaOS supports over a dozen different models, including the new Orin series. There are plenty of benchmarks, specifications and marketing materials available for gauging the relative computing power of the different NVIDIA Jetson devices. You’ll often find charts that compare AI performance in TOPS, GPU cores, CUDA cores, or Tensor Cores. These can be helpful when choosing the right Jetson model for a given project (along with cost, power requirements, and so on), but we thought it might also be helpful to provide some real-world performance results from a set of Jetson devices running the same AI software on balenaOS.

The devices

The devices we used for this comparison are the six Jetson models listed in the results table below, from the Jetson Nano 2GB to the AGX Orin.

The software

Of course, all of the devices in this comparison are supported by and running balenaOS! We’ve included the OS version in the results table below; in general, we used the latest version available for each device at the time of this test.
We used our ML example project of choice, OpenDataCam, as the application software. It’s open source, runs on any Jetson device, and includes a real-time frames per second (fps) reading.
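To make the fps reading concrete, here is a minimal sketch of how a rolling frames-per-second figure can be derived from frame timestamps. This is not OpenDataCam's actual implementation; the `FpsMeter` class, its `window` parameter, and the simulated timestamps are all hypothetical and only illustrate the idea.

```python
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames.

    Hypothetical helper for illustration; not part of OpenDataCam.
    """

    def __init__(self, window=30):
        # Keep only the most recent timestamps so the reading tracks current speed.
        self.timestamps = deque(maxlen=window)

    def tick(self, now):
        """Record that a frame finished processing at time `now` (in seconds)."""
        self.timestamps.append(now)

    def fps(self):
        """Return frames per second, or 0.0 until two frames have been seen."""
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self.timestamps) - 1) / elapsed


# Simulated timestamps: one frame every 0.25 s, which corresponds to 4 fps.
meter = FpsMeter(window=10)
for i in range(5):
    meter.tick(i * 0.25)
print(meter.fps())
```

In a real pipeline, `tick` would be called with something like `time.monotonic()` after each frame is processed, so a slower model shows up directly as a lower reading.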
Let’s take a quick look at how OpenDataCam works and which factors affect the device’s performance.

OpenDataCam

Known as “an open source tool to quantify the world”, OpenDataCam takes a video camera feed as input and performs object detection in real time. The output consists of video with the detected objects outlined and labeled, as well as the detection data, which is available via an API. OpenDataCam uses Darknet (an open source neural network framework written in C and CUDA) and YOLO v4 to perform object detection. (YOLO stands for “You Only Look Once”.)

Trade-offs

YOLO provides two pre-trained “weights” files: a full one and a “tiny” one. Both standard YOLO v4 files are trained on the same 80 classes (objects) from the COCO dataset; the difference is that the “tiny” file belongs to a much smaller network, which therefore runs faster on less powerful hardware. (However, it detects fewer objects and appears to be less accurate than the full file.)
[Image: full weights example]
As seen above, the “full” weights file detects the person walking as well as the cars in the background, but it runs at a lower fps. (See the results table below.)
[Image: tiny weights example]
The “tiny” weights file detects the cars in the foreground but not the ones in the background or the pedestrian. However, it achieves 15 to 30 fps on all of the Jetson devices.

The results

(Higher fps means better performance.)
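| Device | GPU cores | OS version | YOLO tiny fps | YOLO full fps |
|---|---|---|---|---|
| Jetson Nano 2GB SD | 128 | 2.98.12 | 17 | n/a |
| Jetson Nano 4GB SD | 128 | 2.98.12 | 17 | 2 |
| Jetson TX2 | 256 | 3.1.9 | 29 | 4 |
| Jetson TX2 NX | 256 | 2.113.14 | 29 | 5 |
| Orin Nano 8GB | 1024 | 3.0.8+rev2 | 30 | 12 |
| AGX Orin* | 2048 | 3.0.11 | 30 | 28 |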
*Using highest power mode (ID 3)
There are some interesting takeaways in these results:
  • The 2GB Nano is not capable of running the full YOLO weights file. Apparently the extra 2GB of RAM on the 4GB model makes a difference in this regard!
  • All of the Jetson devices are capable of running OpenDataCam at a reasonable frame rate if the appropriate weights file is used.
  • The minimum board required to run the full weights file at a usable frame rate appears to be the TX2 NX with the Seeed carrier board, at 5 fps.
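As a rough way to apply these takeaways, the sketch below transcribes the results table into a list and filters it by a target frame rate. The `RESULTS` list and the `devices_meeting` helper are hypothetical names introduced for illustration, and what counts as a "usable" frame rate is an assumption you would set for your own project.

```python
# Results transcribed from the table above: (device, GPU cores, tiny fps, full fps).
# None means the full weights file did not run on that device.
RESULTS = [
    ("Jetson Nano 2GB SD", 128, 17, None),
    ("Jetson Nano 4GB SD", 128, 17, 2),
    ("Jetson TX2", 256, 29, 4),
    ("Jetson TX2 NX", 256, 29, 5),
    ("Orin Nano 8GB", 1024, 30, 12),
    ("AGX Orin", 2048, 30, 28),
]


def devices_meeting(min_fps, weights="full"):
    """Return the names of devices whose measured fps is at least `min_fps`."""
    column = 3 if weights == "full" else 2
    return [
        row[0]
        for row in RESULTS
        if row[column] is not None and row[column] >= min_fps
    ]


# Assumed threshold: 5 fps with the full weights file.
print(devices_meeting(5))
# Assumed threshold: 15 fps with the tiny weights file.
print(devices_meeting(15, weights="tiny"))
```

With a 5 fps threshold for the full weights, the TX2 NX is the first board to qualify, which matches the takeaway above; raising the threshold quickly narrows the choice to the Orin boards.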
The devices in this comparison were the ones we happened to have available at the time of testing. There’s a bit of a performance gap between the TX2 and Orin Nano that might be well served by the Jetson AGX Xavier. When we are able to test that board, we will add the results here. (If you have one and want to run the test, let us know below!)
In case you’re wondering if running balenaOS instead of the stock Jetson Linux (Ubuntu 20-based) has any effect on performance, we tested that too. The same software actually performed slightly better on balenaOS:
| Device/weights | balenaOS | Ubuntu 20 |
|---|---|---|
| AGX Orin / YOLO full | 13 fps | 10 fps |
| AGX Orin / YOLO tiny | 30 fps | 30 fps |
(with AGX Orin in low power mode)
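To put a number on that difference, here is a small worked calculation using only the fps figures from the table above. The `percent_faster` helper is a hypothetical name used for illustration, and with a single informal run per configuration the result should be read as indicative only.

```python
def percent_faster(fps_a, fps_b):
    """Return how much faster fps_a is than fps_b, as a percentage of fps_b."""
    return (fps_a - fps_b) * 100 / fps_b


# balenaOS vs. Ubuntu 20 on the AGX Orin in low power mode.
print(percent_faster(13, 10))  # full weights
print(percent_faster(30, 30))  # tiny weights
```

With the full weights file this works out to 30%, while the tiny weights file shows no difference between the two operating systems in this test.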

Final thoughts

Obviously this is not a scientific experiment but merely an informal comparison to gauge the relative performance of some Jetson devices running the same software on balenaOS. Hopefully this can be helpful as one portion of your evaluation process when determining the right board for the job.
Have you performed any similar comparisons? How did your results compare? Let us know in the comments or the forums.
by Alan Boris, Hardware Hacker in Residence