Category: IoT Edge

Azure IoT Edge on constrained devices

Introduction

In this post I want to show some tweaks you can (and might need to) apply to influence how your IoT Edge device retains messages when the device is limited in resources.

The setup in this scenario is a common one: one module retrieves telemetry from machines, another module parses it, and the messages are then sent to an IoT Hub.

The problem

After a while the device stops sending data and can no longer be reached via SSH. The logs reveal lots of messages still in the queue.

Figure: Lots of messages in the queue in edgeHub logs (log lines such as "Cleaned up messages from queue for endpoint iothub and messages from message store")

But why? And how can I find out what causes the problem?
Spoiler: Disk full 🙁

Troubleshoot

Looking at log files helps a lot, if you have access to them. Fortunately, IoT Edge can expose metrics for edgeHub and edgeAgent in the Prometheus exposition format. These endpoints are enabled by default in IoT Edge 1.0.10 (upgrade to this version if you haven't already) and can be enabled in 1.0.9.

With a sample metrics-collector module, the data can then be uploaded to Log Analytics for further analysis and for creating alerts.

To analyze and display the metrics, you can use a Workbook in Azure Monitor.

Figure: Azure Monitor Workbook with edgeHub log extract

In this particular case I could see that the available disk space was going down, down, down until the whole device stopped responding (no SSH access, no data sent to Azure).
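To follow the same trend without a Workbook, here is a minimal sketch that reads edgeAgent's metrics endpoint and computes the free disk space in percent. It assumes the default endpoint http://edgeAgent:9600/metrics (reachable from a module on the azure-iot-edge network) and the built-in gauges edgeAgent_available_disk_space_bytes and edgeAgent_total_disk_space_bytes; verify both against the metrics documentation for your runtime version. The line parser is deliberately simplistic and not a complete implementation of the exposition format.

```python
import re
import urllib.request

# Assumed default endpoint of edgeAgent's built-in Prometheus metrics.
METRICS_URL = "http://edgeAgent:9600/metrics"

# Matches "name{labels} value" or "name value"; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text):
    """Return (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, labels or "", float(value)))
    return samples


def free_disk_percent(text, disk_label=None):
    """Available / total disk space in percent, summed over all reported disks.

    disk_label optionally restricts the sum to samples whose label set
    contains that substring, e.g. 'disk_name="sda1"'.
    """
    available = total = 0.0
    for name, labels, value in parse_samples(text):
        if disk_label and disk_label not in labels:
            continue
        if name == "edgeAgent_available_disk_space_bytes":
            available += value
        elif name == "edgeAgent_total_disk_space_bytes":
            total += value
    if total == 0:
        raise ValueError("no disk metrics found")
    return 100.0 * available / total


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Download the raw metrics text from the endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

On the device, print(f"{free_disk_percent(fetch_metrics()):.1f} % free") prints the current value; sampling it periodically shows the same downward trend as the Workbook.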

What to change?

Adding more disk space was not an option, so another solution was needed. I looked at two settings and adjusted them to better fit the usage scenario and the resource limits:

  1. The time-to-live setting defines how long messages are kept on the device (default: 2 hours, i.e. 7200 seconds); see Operate devices offline – Azure IoT Edge | Microsoft Docs.
  2. The not-so-obvious RocksDB size setting limits the size of RocksDB's log files: https://github.com/Azure/iotedge/issues/2431#issuecomment-582089419
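As a sketch of where these two settings live in a deployment manifest: the time to live is storeAndForwardConfiguration.timeToLiveSecs in the $edgeHub desired properties, and the RocksDB limit is passed to the edgeHub container as an environment variable. The variable name RocksDB_MaxTotalWalSize and its unit (bytes) are assumptions here; check them against the edgeHub environment variable documentation and the linked GitHub issue. The values 1800 s and 128 MB are placeholders, not recommendations.

```python
import copy
import json


def apply_retention_settings(manifest, ttl_secs, wal_size_bytes):
    """Return a copy of an IoT Edge deployment manifest with tighter retention.

    ttl_secs       -> $edgeHub storeAndForwardConfiguration.timeToLiveSecs
    wal_size_bytes -> edgeHub env variable RocksDB_MaxTotalWalSize
                      (name and byte unit are assumptions, see the lead-in)
    """
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    content = patched["modulesContent"]

    # Time to live: how long edgeHub keeps messages it could not deliver.
    hub_desired = content["$edgeHub"]["properties.desired"]
    hub_desired.setdefault("storeAndForwardConfiguration", {})["timeToLiveSecs"] = ttl_secs

    # RocksDB limit: environment variables are set on the edgeHub system module.
    edge_hub = content["$edgeAgent"]["properties.desired"]["systemModules"]["edgeHub"]
    edge_hub.setdefault("env", {})["RocksDB_MaxTotalWalSize"] = {"value": str(wal_size_bytes)}
    return patched


# Heavily trimmed manifest, only the parts the function touches.
minimal_manifest = {
    "modulesContent": {
        "$edgeAgent": {"properties.desired": {"systemModules": {"edgeHub": {"type": "docker"}}}},
        "$edgeHub": {"properties.desired": {"schemaVersion": "1.0", "routes": {}}},
    }
}

patched = apply_retention_settings(minimal_manifest, ttl_secs=1800, wal_size_bytes=128 * 1024 * 1024)
print(json.dumps(patched["modulesContent"]["$edgeHub"], indent=2))
```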

After tweaking the settings, the following graph shows that the device now cleans up data before the disk fills up.

I can't give you values for your particular setup. You'll need to work them out yourself, depending on the number of messages going through the Edge device and on the hardware sizing. If you hit a similar problem on your devices, the two settings above are the ones to investigate.

Figure: RocksDB sizes

The above image shows different RocksDB size settings (orange: 512 MB, blue: 128 MB, green: 256 MB). With the default setting, the device runs out of disk space.
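For a first estimate of a time to live that fits your disk, a back-of-the-envelope calculation helps: while edgeHub cannot deliver upstream, the backlog grows by roughly message rate × average message size every second, for at most the TTL. The sketch ignores RocksDB overhead, log files, container images and other modules; overhead_factor is a hypothetical knob you would have to measure on your own device.

```python
def worst_case_backlog_bytes(msgs_per_sec, avg_msg_bytes, ttl_secs, overhead_factor=1.0):
    """Rough upper bound of queued message data when nothing can be delivered.

    overhead_factor is a hypothetical multiplier for storage overhead;
    measure it on the real device instead of trusting the default of 1.0.
    """
    return msgs_per_sec * avg_msg_bytes * ttl_secs * overhead_factor


def max_ttl_for_budget(msgs_per_sec, avg_msg_bytes, budget_bytes, overhead_factor=1.0):
    """Largest TTL in whole seconds whose worst-case backlog fits into budget_bytes."""
    bytes_per_sec = msgs_per_sec * avg_msg_bytes * overhead_factor
    return int(budget_bytes // bytes_per_sec)
```

For example, 10 messages per second of 2 KiB each with the default TTL of 7200 s can queue about 147 MB (roughly 141 MiB); with a 100 MiB budget for queued messages, the TTL would have to stay at or below 5120 s.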

What can I do to prevent the device from crashing?

Well, it depends 🙂 For a known scenario, you can find settings from the above that prevent the disk from filling up. But what if you don't know which modules are deployed, and with which settings?

In this case, an alert for low disk space is an option. The alert then triggers a function that calls a direct method on the device to restart edgeHub, which clears the cache.
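One way to do the restart is edgeAgent's built-in RestartModule direct method. Below is a minimal sketch of what such a function could call, using the IoT Hub REST API and only the Python standard library. The API version, the policy name "service", the 10 % threshold and all helper names are assumptions for illustration, not part of the setup described above.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request

API_VERSION = "2021-04-12"  # assumed IoT Hub REST API version


def generate_sas_token(uri, key, policy_name, expiry_epoch):
    """IoT Hub shared access signature for a resource URI such as 'myhub.azure-devices.net'."""
    string_to_sign = f"{urllib.parse.quote_plus(uri)}\n{expiry_epoch}".encode("utf-8")
    digest = hmac.new(base64.b64decode(key), string_to_sign, hashlib.sha256).digest()
    fields = {
        "sr": uri,
        "sig": base64.b64encode(digest).decode("ascii"),
        "se": str(expiry_epoch),
        "skn": policy_name,
    }
    return "SharedAccessSignature " + urllib.parse.urlencode(fields)


def build_restart_request(hub_host, device_id, sas_token, module_id="edgeHub"):
    """POST that asks edgeAgent (direct method RestartModule) to restart a module."""
    path = (f"/twins/{urllib.parse.quote(device_id, safe='')}"
            f"/modules/{urllib.parse.quote('$edgeAgent', safe='')}/methods")
    body = {
        "methodName": "RestartModule",
        "payload": {"schemaVersion": "1.0", "id": module_id},
        "responseTimeoutInSeconds": 30,
    }
    return urllib.request.Request(
        f"https://{hub_host}{path}?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": sas_token, "Content-Type": "application/json"},
        method="POST",
    )


def should_restart(free_percent, threshold_percent=10.0):
    """Hypothetical alert rule: restart once free disk space drops below the threshold."""
    return free_percent < threshold_percent


def restart_edge_hub(hub_host, device_id, key, policy_name="service"):
    """Send the request with a token valid for five minutes; returns the parsed response."""
    token = generate_sas_token(hub_host, key, policy_name, int(time.time()) + 300)
    request = build_restart_request(hub_host, device_id, token)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as restart_edge_hub("myhub.azure-devices.net", "iotedgedevice", key) needs the shared access key of a policy with service connect permission.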

Azure IoT Edge not starting

Sometimes a permission denied is a permission denied 🙁

[INFO] - Starting Azure IoT Edge Security Daemon
[INFO] - Version - 1.0.10~rc1
[INFO] - Using config file: /etc/iotedge/config.yaml
[INFO] - Configuring /var/lib/iotedge as the home directory.
[INFO] - Configuring certificates…
[INFO] - Transparent gateway certificates not found, operating in quick start mode…
[INFO] - Finished configuring provisioning environment variables and certificates.
[INFO] - Initializing hsm…
[INFO] - Finished initializing hsm.
[INFO] - Provisioning edge device…
[INFO] - Starting provisioning edge device via manual mode using a device connection string…
[INFO] - Manually provisioning device "iotedgedevice" in hub "iothub.azure-devices.net"
[INFO] - Finished provisioning edge device.
[INFO] - Initializing the module runtime…
[INFO] - Initializing module runtime…
[INFO] - Using runtime network id azure-iot-edge
[WARN] - Could not initialize module runtime
[WARN] - caused by: Container runtime error
[WARN] - caused by: error trying to connect: Permission denied (os error 13)
[ERR!] - The daemon could not start up successfully: Could not initialize module runtime
[ERR!] - caused by: Could not initialize module runtime
[ERR!] - caused by: Container runtime error
[ERR!] - caused by: error trying to connect: Permission denied (os error 13)

This is the output I got via journalctl -u iotedge -f on a test installation.

For troubleshooting, I worked through the https://docs.microsoft.com/en-us/azure/iot-edge/troubleshoot guide, but nothing solved my problem. Then I disabled HTTP and MQTT support as described in https://docs.microsoft.com/en-us/azure/iot-edge/production-checklist. Still not starting.

Finally, I got it up and running by creating a docker group, adding the iotedge user to it, and changing the group ownership of the /var/run/docker.sock file: sudo chown root:docker /var/run/docker.sock
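To check for this situation before changing anything, here is a small diagnostic sketch in the spirit of the fix above: it checks whether the iotedge user can connect to /var/run/docker.sock, based on classic Unix permission bits only. It ignores ACLs and AppArmor confinement (which snap packages use), so a positive result does not rule out other causes. The function names are hypothetical.

```python
import grp
import os
import pwd
import stat


def can_connect(mode, owner_uid, owner_gid, user_uid, user_gids):
    """Permission-bit check for connecting to a socket file.

    On Linux, connect() on a Unix domain socket needs write permission on the
    socket file. ACLs and AppArmor/SELinux confinement are not considered.
    """
    if user_uid == 0:
        return True  # root bypasses the permission bits
    if user_uid == owner_uid:
        needed = stat.S_IWUSR
    elif owner_gid in user_gids:
        needed = stat.S_IWGRP
    else:
        needed = stat.S_IWOTH
    return (mode & needed) == needed


def check_docker_socket(sock_path="/var/run/docker.sock", user="iotedge"):
    """Report socket group, the user's groups and whether the bits allow access."""
    st = os.stat(sock_path)
    account = pwd.getpwnam(user)
    # Primary group plus every group that lists the user as a member.
    gids = {account.pw_gid} | {g.gr_gid for g in grp.getgrall() if user in g.gr_mem}
    return {
        "socket_group": grp.getgrgid(st.st_gid).gr_name,
        "user_groups": sorted(grp.getgrgid(g).gr_name for g in gids),
        "can_connect": can_connect(st.st_mode, st.st_uid, st.st_gid, account.pw_uid, gids),
    }
```

Running print(check_docker_socket()) on the device before and after the chown shows whether the permission bits were the problem. Note that a running daemon only picks up new group membership after it is restarted.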

This post is meant to be found via search engines if you (or I, again) run into the same startup problem.

My context: Ubuntu 20.04 with Docker installed via snap

VisionAI DevKit won’t deploy a module

Today my VisionAI DevKit would not deploy a module. In the logs (sudo journalctl -u iotedge -f) I could see that the deployment was received:

Successfully pulled image machinelearndfd8df7d.azurecr.io/mobilenetimagenet:3
Creating module VisionSampleImagenet…
Could not create module VisionSampleImagenet
caused by: No such image: machinelearndfd8df7d.azurecr.io/mobilenetimagenet:3

Strange. While troubleshooting I ran docker images and saw a lot of older images and versions. After deleting a lot of them with docker image rm xyz, the deployment succeeded and the module started. 🙂
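The cleanup can be scripted. This minimal sketch lists local images with the docker CLI and proposes removing the ones the current deployment does not reference. The images_in_use list is something you would fill in from your deployment manifest yourself, including the edgeAgent and edgeHub images; matching is by exact repository:tag string, so an image referenced without a tag shows up as ...:latest in docker images. The helper names are hypothetical, and the script defaults to a dry run. A blunter alternative is docker image prune -a, which removes all images not used by any container.

```python
import subprocess


def list_local_images():
    """Return 'repository:tag' strings for local images via the docker CLI."""
    out = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Skip untagged (dangling) images; docker image prune handles those.
    return [line for line in out.splitlines() if line and "<none>" not in line]


def removal_candidates(local_images, images_in_use):
    """Images present locally but not referenced by the current deployment."""
    in_use = set(images_in_use)
    return sorted(image for image in local_images if image not in in_use)


def remove_images(images, dry_run=True):
    """Remove the given images; only print the commands when dry_run is True."""
    for image in images:
        cmd = ["docker", "image", "rm", image]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Docker refuses to remove images used by a container; keep going.
            subprocess.run(cmd, check=False)
```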

Learning: Clean up the mess…