There was a time when IoT devices were merely low-powered boards capable of doing a single chore, albeit in far-flung locales. Today, boards like the balenaFin, Raspberry Pi 3 and 4, and NVIDIA Jetson are powerful multi-core microcomputers with plenty of horsepower to spare. How do you decide which is best for your application?
The capacity question
Figuring out the type of device you need is often about how many resources your applications consume, but it can be difficult to measure the performance of small, often remote devices. But by applying some of the same tools you use to monitor blades in your data center, you can get a good look at just how much CPU, RAM and other resources your applications consume. Armed with that information, you can make better-informed decisions on purchasing the devices you need.
In this blog post, I’ll show you how to use balenaCloud to quickly deploy Prometheus and Grafana, tools that combine to give you excellent graphical dashboards for monitoring your hardware in real time. You can then add a small Node Exporter service to any balenaCloud application you have to monitor its performance and resource consumption in real time.
I created a containerized balenaCloud application to host Prometheus and Grafana, and then added Node Exporter to scrape CPU, RAM, storage (and other) data. In this way, I could see just how much -- or how little -- my applications were consuming.
Install Prometheus and Grafana
This balenaCloud application can be run on any
Prometheus-supported architecture, so you can deploy it on the balenaFin, Raspberry Pis, Intel NUCs and even QEMU virtual machines. Install these monitoring services alongside your primary application to see how it performs.
Start by cloning this repo:
git clone git@github.com:balenalabs-incubator/balena-prometheus-grafana.git
The file tree looks like this:
/balena-prometheus-grafana/
├── grafana
│   ├── Dockerfile.template
│   └── conf
│       ├── dashboard.yml
│       ├── dashboard_node_exporter.json
│       └── datasources.yml
├── node_exporter
│   └── Dockerfile.template
├── prometheus
│   ├── Dockerfile.template
│   ├── prometheus.yml
│   └── start.sh
├── docker-compose.yml
└── README.md
In this example, Prometheus, Grafana and Node Exporter services are all added to the primary application. The docker-compose.yml file shows that you’ll build three services, and persistent volumes are added to the Grafana service to save data (and configuration settings) across reboots:
version: '2.0'
services:
  prometheus:
    build: ./prometheus
    ports:
      - '9090:9090'
      - '9100:9100'
    networks:
      - frontend
      - backend
    depends_on:
      - node_exporter
    container_name: prometheus
  grafana:
    build: ./grafana
    ports:
      - '3000:3000'
    networks:
      - frontend
      - backend
    volumes:
      - grafana_etc:/etc/grafana
      - grafana_usr:/usr/share/grafana
      - grafana_var:/var/lib/grafana
    container_name: grafana
  node_exporter:
    build: ./node_exporter
    networks:
      - backend
    container_name: node_exporter
networks:
  frontend:
  backend:
volumes:
  grafana_etc:
  grafana_usr:
  grafana_var:
Edit ARG ARCH="armv7" in ./prometheus/Dockerfile.template to match the platform on which you’re deploying this application. For example, this might be set to amd64 for a NUC device.
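For reference, the architecture-specific part of that Dockerfile typically looks something like the sketch below; the actual base image, Prometheus version and download steps in the repo may differ:

# A sketch only -- the repo's Dockerfile.template will differ in detail
ARG ARCH="armv7"          # e.g. "amd64" for a NUC, "arm64" for 64-bit ARM boards
ARG PROM_VERSION="2.26.0"

# Prometheus publishes per-architecture release tarballs named linux-<arch>
ADD https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-${ARCH}.tar.gz /tmp/prometheus.tar.gz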
Note that two networks are used. The frontend network exposes ports 3000, 9090 and 9100 externally, enabling requests to and from Grafana (3000), Prometheus (9090) and Node Exporter (9100). The backend network enables local name resolution of node_exporter by the other services. Alternatively, all services could be set to network_mode: host for simplicity, and that’s the mode to use on other devices that aren’t locally resolvable by the Prometheus/Grafana node.
During the Prometheus build, a minimal configuration file, prometheus.yml, is added. It defines a scrape job for Prometheus itself and a node_exporter job that lets the application monitor itself.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
scrape_configs:
  - job_name: prometheus
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
        - prometheus:9090
  - job_name: node_exporter
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['node_exporter:9100']
The final "targets:" entry uses the local service name and port. To add more devices for Prometheus to aggregate, use the balenaCloud Dashboard to create a Device Environment Variable called TARGETS to hold the IP addresses and port numbers of any future nodes you want to monitor. In this way, you never have to manually edit the prometheus.yml config file. Setting the TARGETS value will automatically update the "targets:" entry, appending new nodes like the following:
…
targets: ['node_exporter:9100', '10.128.1.134:9100', '192.168.1.211:9100']
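If you prefer the command line, the same variable can likely be set with the balena CLI (a hedged example; the IP addresses and device UUID below are placeholders, and you should confirm the exact flags with balena env add --help):

# Set the TARGETS device variable on the device running Prometheus/Grafana
balena env add TARGETS "10.128.1.134:9100, 192.168.1.211:9100" --device <device-uuid>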
Deploy the app:
cd balena-prometheus-grafana
balena push <appname>
Once the services are up and running, you can view the Prometheus Targets page (Status > Targets) at http://prometheus-app-IP:9090 to confirm the connections are active.
Now, log in to the Grafana dashboard at http://grafana-device-IP:3000 to start viewing your graphical data. The Prometheus datasource and Node Exporter dashboard were created automatically during the Grafana build, so there’s no need to manually add any initial resources. You can look at those files in:
./balena-prometheus-grafana
├── grafana
│   ├── Dockerfile.template
│   └── conf
│       ├── datasources.yml
│       ├── dashboard.yml
│       └── dashboard_node_exporter.json
The Prometheus datasource (datasources.yml) is configured to point to your Prometheus service, http://prometheus:9090, using the defined service name, which is locally resolvable by the Grafana service because both run on the same device.
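If you’re curious what that file contains, it’s likely something close to Grafana’s standard datasource provisioning format, sketched below (not necessarily the repo’s exact contents):

# Sketch of a Grafana datasource provisioning file; the repo's datasources.yml may differ
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true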
Similarly, the Node Exporter dashboard is preconfigured and available as soon as you log in to the Grafana dashboard. It uses the stock JSON from the Node Exporter dashboard, https://grafana.com/grafana/dashboards/11074, and points to the already-provisioned Prometheus datasource.
To view your real-time application performance data, go to the main Grafana dashboard, click the Dashboards menu item, then click “Node Exporter.” You’ll see live data coming from this single device. You can view all devices at once, just one, or any subset you select from the Host menu.
Add Node Exporter to other devices
With Prometheus and Grafana running, you can monitor more devices by adding Node Exporter to any other balenaCloud application. Just edit that application’s docker-compose.yml file to add the following service:
node_exporter:
  build: ./node_exporter
  ports:
    - '9100:9100'
  network_mode: host
  container_name: node_exporter
Now copy the ./node_exporter folder from this application to that application. It contains a Dockerfile.template that looks like this:
FROM balenalib/%%BALENA_MACHINE_NAME%%-alpine
WORKDIR /app
RUN echo "https://dl-cdn.alpinelinux.org/alpine/edge/testing/" >> /etc/apk/repositories && \
apk add --no-cache prometheus-node-exporter
# Expose the port Prometheus scrapes (optional if using network_mode: host)
EXPOSE 9100
CMD [ "/usr/bin/node_exporter" ]
Finally, be sure to add the new device’s IP address and port to the TARGETS Device Variable for the application running Prometheus and Grafana, as described above. Spaces between entries are optional.
So, what’s the load look like?
Node Exporter’s requirements are low -- the containerized service running on Alpine is less than 75MB -- and you’ll immediately get a sense of the resources your balenaCloud application is consuming. This gives you hard data you can use to more confidently decide which boards to purchase as you scale out, or to turn what you always thought were single-use IoT devices into multi-use ones.
Even with a couple of different apps running, you’ll notice your devices consume different levels of resources. Those resources may even vary by architecture, so this is a good way to run tests and see performance in real time.
If you don’t have an existing application in mind, you can combine Node Exporter with an OctaPi service node to see how your board handles the load. I created a containerized implementation of OctaPi, a Python compute-clustering application originally designed for standalone Raspberry Pis, to add some load. This let me use spare compute cycles on any device in my fleet for OctaPi number-crunching. I converted the OctaPi project to run as a balenaCloud application service, which means any device -- not just a Raspberry Pi -- can run a balena-octapi-worker service alongside whatever else you have running. Rather than committing an entire RPi to the work, you can quickly deploy the same data-crunching functionality in a container and take advantage of all the CPU and RAM the board has to offer. This is a quick way to see how your device performs.
Deploy OctaPi
Start by cloning the balena-octapi-master repo:
git clone git@github.com:balenalabs-incubator/balena-octapi-master.git
The master works by SSH-ing into each OctaPi worker node, so you need to seed the master and workers with SSH keys. You can reuse the provided example keys or generate your own by running ssh-keygen (without a password) and adding the public key’s contents (id_rsa.pub) to authorized_keys on the workers. The .ssh/config file disables StrictHostKeyChecking so the master’s initial logins to the workers are non-interactive.
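If you generate your own keys, the process is roughly as follows (a sketch only; the file names and locations the repo expects may differ, so check its README):

# Create a passwordless key pair for the master (file names here are illustrative)
ssh-keygen -t rsa -N "" -f ./id_rsa

# Append the public key to the authorized_keys file that will be seeded onto the workers
cat ./id_rsa.pub >> ./authorized_keys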
Before pushing this to a balenaCloud app, update the ARG SUBNET in the ./balena-octapi-master/Dockerfile.template to match the subnet on which your devices are attached. The example uses "10.128.1.*". Don’t forget the asterisk.
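The line in question probably looks something like this (a hypothetical excerpt; only the value needs to change):

# Hypothetical excerpt from ./balena-octapi-master/Dockerfile.template
ARG SUBNET="10.128.1.*"   # match your own subnet and keep the trailing asterisk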
Deploy the app:
cd ./balena-octapi-master
balena push <appname>
Now, push the balena-octapi-worker app to another application and its devices. No edits are necessary. The master node will scan your subnet for workers and automatically make them available:
git clone [email protected]:balenalabs-incubator/balena-octapi-worker.git
cd balena-octapi-worker
balena push <appname>
The default OctaPi Python scripts (compute.py, etc.) are placed in the /usr/src/app working directory on the Master and can be run from there with commands like the one below. No commands are run directly on Worker nodes. Just access the octapi-master service shell using balena ssh or directly from the balenaCloud dashboard.
python3 ./compute.py
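Putting it together, a session might look like this (a sketch; substitute your device’s UUID from the balenaCloud dashboard):

# Open a shell in the octapi-master service on the master device
balena ssh <device-uuid> octapi-master

# From the service's working directory, kick off a run
cd /usr/src/app
python3 ./compute.py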
Power to spare
Even with a simple calculation of pi (creating thousands of data points for analysis), I found the balenaFin, Raspberry Pi and QEMU devices all put their spare resources to good use. This follow-up Grafana dashboard screenshot shows one device under load in the midst of a Python dispy number-crunch.
Give it a try to learn more
Deciding which IoT devices are right for your task can be harrowing, but this monitoring example should give you solid data to help you decide the best IoT and edge devices for your applications.