This repository provides a ROS 2-based resource monitoring solution that uses Telegraf to collect system metrics and publish them as ROS messages, with the option of also feeding them into ROS 2 diagnostics. It is designed to be easily configurable and extensible, allowing users to monitor system resources such as CPU, memory, disk usage, and more.
Monitoring system resources is important for maintaining the health and assessing the performance of robotic systems. There does not seem to be a well-established solution for this in ROS 2; the existing projects that are easy to find online are:
- https://github.com/AgoraRobotics/ros2-system-monitor
- https://github.com/kei1107/ros2-system-monitor
- https://github.com/ethz-asl/ros-system-monitor
- https://tier4.github.io/autoware.iv/tree/main/system/system_monitor/
This project attempts to fill that gap.
Resource monitoring is not a problem unique to robotics, and there are many existing tools that do it well. A well-established tool within the cloud-native and DevOps communities is Telegraf. Telegraf is an open-source agent for collecting and reporting metrics. It supports a variety of input plugins to gather data from different sources and output plugins to send data to various destinations. By integrating Telegraf with ROS 2, we do not have to reinvent the wheel of resource monitoring and can use its more advanced capabilities, such as aggregators and processors.
Telegraf also presents the opportunity to build out remote monitoring of the same resources over OTLP, which is a common standard for telemetry data. It can be connected to any OpenTelemetry collector, which can then pass the data on to whatever remote monitoring environment you wish.
This repository contains three ROS 2 packages:
- telegraf_resource_monitor: Integrates Telegraf with ROS 2 to monitor system resources and publish them as ROS messages.
- resource_diagnostics_updater: Subscribes to resource topics and updates the ROS 2 diagnostics system with the latest metrics, based on target resources stipulated in a configuration file.
- resource_monitoring_interfaces: Custom message definitions for resource monitoring.
The architecture between the packages is illustrated below:
The telegraf_resource_monitor package consists of:
- Telegraf Configuration: Custom Telegraf config that outputs metrics to a Unix socket
- Unix Socket Manager: Receives JSON data from Telegraf via Unix socket
- Sensor Message Processor: Processes incoming sensor data and manages publishers
- Sensor Message Publisher: Publishes resource data as ROS 2 messages
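The receiving side of the Unix socket could be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: it assumes Telegraf writes one JSON document per line (the layout of its JSON serializer, with `name`, `tags`, `fields` and `timestamp` keys), and `iter_metrics`, `demo` and the sample values are hypothetical names and data chosen for the example.

```python
import json
import socket

# Sample metrics in the layout of Telegraf's JSON serializer; the values are
# made up for illustration.
SAMPLE_METRICS = [
    {"name": "cpu", "tags": {"cpu": "cpu0"},
     "fields": {"usage_idle": 97.5}, "timestamp": 1700000000},
    {"name": "mem", "tags": {},
     "fields": {"used_percent": 41.2}, "timestamp": 1700000000},
]


def iter_metrics(conn, bufsize=4096):
    """Yield one decoded metric dict per newline-terminated JSON document."""
    pending = b""
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:  # the peer closed the connection
            break
        pending += chunk
        # A chunk may hold several metrics or only part of one, so split on
        # newlines and keep the unfinished tail for the next recv().
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # Any unterminated data left in `pending` at end-of-stream is dropped.


def demo():
    """Push the sample metrics through a connected socket pair and read them back."""
    sender, receiver = socket.socketpair()  # a connected pair of local sockets
    with sender:
        for metric in SAMPLE_METRICS:
            sender.sendall(json.dumps(metric).encode("utf-8") + b"\n")
    # Leaving the block closes the sender, so the reader sees end-of-stream.
    with receiver:
        return list(iter_metrics(receiver))
```

In a real node the connected socket would come from the socket file that Telegraf writes to rather than from `socket.socketpair()`; the buffering logic stays the same.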
The package dynamically creates topics based on the metrics collected by Telegraf. Examples include:
- /cpu/cpu0
- /cpu/cpu1
- /cpu/cpu2
- /cpu/cpu3
- /cpu/cpu_total
- /disk/root
- /mem
- /procstat/telegraf_resource_monitor
- /sensors/acpitz_acpi_0/temp1
- /sensors/amdgpu_pci_0400/edge
- /sensors/amdgpu_pci_0400/slowppt
- /sensors/amdgpu_pci_0400/vddgfx
- /sensors/amdgpu_pci_0400/vddnb
- /sensors/bat1_acpi_0/in0
- /sensors/iwlwifi_1_virtual_0/temp1
- /sensors/k10temp_pci_00c3/tctl
- /sensors/nvme_pci_0100/composite
- /sensors/nvme_pci_0100/sensor_1
Each topic publishes Resource messages from the resource_monitoring_interfaces package.
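How a metric might be mapped to a topic name can be sketched as below. This is an assumption about the approach, not the node's actual code: `sanitize_token` and `topic_for` are hypothetical helpers, and the rule applied is the general ROS 2 naming constraint that each name token uses only alphanumerics and underscores, does not start with a digit, and has no repeated underscores.

```python
import re

_INVALID_CHARS = re.compile(r"[^0-9A-Za-z_]")
_REPEATED_UNDERSCORES = re.compile(r"_+")


def sanitize_token(text):
    """Turn an arbitrary metric or tag string into a valid ROS 2 name token."""
    token = _INVALID_CHARS.sub("_", text)           # e.g. "cpu-total" -> "cpu_total"
    token = _REPEATED_UNDERSCORES.sub("_", token)   # collapse runs of underscores
    if not token or token[0].isdigit():
        token = "_" + token                         # tokens may not start with a digit
    return token


def topic_for(*parts):
    """Join the sanitized parts into an absolute topic name."""
    return "/" + "/".join(sanitize_token(part) for part in parts)
```

For example, `topic_for("cpu", "cpu-total")` gives `/cpu/cpu_total`, which matches the shape of the topics listed above.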
Run the following command to launch the Telegraf resource monitor with default settings:
ros2 launch telegraf_resource_monitor telegraf_resource_monitor_launch.py
The following command allows you to specify a custom ROS 2 configuration file and set the logging level:
ros2 launch telegraf_resource_monitor telegraf_resource_monitor_launch.py \
config_file_path:=/path/to/your/config.yaml \
log_level:=DEBUG
The package includes a pre-configured Telegraf configuration file at config/telegraf.conf that:
- Collects metrics every 1 second (configurable per input)
- Outputs data to the Unix socket /tmp/telegraf.sock
- Includes processors for data cleanup and tagging
- Monitors CPU, memory, disk, sensors, and ROS processes
Browse the Telegraf plugins published by InfluxData to find other plugins that can monitor resources relevant to you.
Currently no configuration is needed on the node side, since the node parses the available fields and uses their names to generate the topics accordingly.
The resource_diagnostics_updater package consists of:
- Diagnostics Resource Updater: Subscribes to a specific resource topic and updates the ROS 2 diagnostics system according to the DiagnosedResource it was given at initialization.
- Diagnostics Resource Updater Node: Parses a configuration file to determine which resources to monitor and initializes the Diagnostics Resource Updaters accordingly.
- Diagnostics Publisher: Publishes aggregated diagnostics information to the /diagnostics topic at a regular interval and serves as the updaters' interface to that topic.
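The check an updater applies to each incoming value could look roughly like this. It is a sketch under stated assumptions: the `DiagnosedResource` fields mirror the keys of the configuration file (topic, name, field, warning_threshold, error_threshold), higher values are assumed to be worse, and `classify` is a hypothetical helper. The level numbers follow the OK, WARN and ERROR constants of diagnostic_msgs/msg/DiagnosticStatus.

```python
from dataclasses import dataclass

# Level values mirror the constants of diagnostic_msgs/msg/DiagnosticStatus.
OK, WARN, ERROR = 0, 1, 2


@dataclass(frozen=True)
class DiagnosedResource:
    """Plain-Python stand-in for one entry of the diagnosed_resources list."""
    topic: str
    name: str
    field: str
    warning_threshold: float
    error_threshold: float


def classify(resource, value):
    """Return (level, message) for a value, assuming higher values are worse."""
    if value >= resource.error_threshold:
        return ERROR, f"{resource.name}: {resource.field}={value} at or above error threshold"
    if value >= resource.warning_threshold:
        return WARN, f"{resource.name}: {resource.field}={value} at or above warning threshold"
    return OK, f"{resource.name}: {resource.field}={value} within limits"
```

A resource whose healthy range lies above the thresholds (for example, free disk space) would need the comparison inverted; whether the package supports that is not stated here.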
Run the following command in terminal to launch the diagnostics resource updater with the default configuration file:
ros2 launch resource_diagnostics_updater resource_diagnostics_updater_launch.py
You can specify a custom configuration file and set the logging level using the following command:
ros2 launch resource_diagnostics_updater resource_diagnostics_updater_launch.py \
config_file_path:=custom_path/resource_diagnostics.yaml \
log_level:=DEBUG
There is a sample configuration file at config/resource_diagnostics.yaml that specifies which resources to monitor and their corresponding diagnostic parameters. You can modify this file to suit your monitoring needs, or create your own and specify it at launch.
The configuration file uses the following format:
/resource_diagnostics_updater_node:
  ros__parameters:
    diagnosed_resources: |
      - topic: <topic name of resource to monitor>
        name: <name to show in diagnostics>
        field: <field to monitor>
        warning_threshold: <value for warning threshold>
        error_threshold: <value for error threshold>
The resource_monitoring_interfaces package defines custom ROS 2 message types for the messages sent by telegraf_resource_monitor, including:
- Field.msg: Represents a single metric field with a name and a value
- Resource.msg: Represents a resource with a header and an array of Field messages
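To make the message layout concrete, the sketch below mirrors the two messages with plain Python dataclasses and shows how a subscriber might look up one field by name. These classes are stand-ins, not the generated ROS 2 message classes: the header is omitted, the value is assumed to be numeric, and `find_field` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Stand-in for Field.msg: one named metric value."""
    name: str
    value: float  # assumption: the real message may use a different type


@dataclass
class Resource:
    """Stand-in for Resource.msg, without the header."""
    fields: List[Field] = field(default_factory=list)


def find_field(resource: Resource, name: str) -> Optional[float]:
    """Return the value of the first field with the given name, or None."""
    for entry in resource.fields:
        if entry.name == name:
            return entry.value
    return None
```

A lookup such as `find_field(msg, "usage_idle")` corresponds to what the `field` key in the diagnostics configuration selects from a resource topic.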
- ROS 2 Humble (or compatible)
- lm-sensors (for temperature monitoring)
# Add InfluxDB repository
curl -s https://repos.influxdata.com/influxdata-archive_compat.key | sudo apt-key add -
echo "deb https://repos.influxdata.com/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/influxdata.list
# Install Telegraf
sudo apt update
sudo apt install telegraf
- Clone the repository into your ROS 2 workspace:
  cd ~/ros2_ws/src
  git clone https://github.com/Bart-van-Ingen/ros-telegraf-monitor.git
- Install dependencies:
  cd ~/ros2_ws
  rosdep install --from-paths src --ignore-src -r -y
- Build the package:
  colcon build
- Source the workspace:
  source install/setup.bash
Bart van Ingen
Email: van.ingen.bart@gmail.com
- Built on Telegraf by InfluxData
- Uses ROS 2 for distributed messaging
