This project simulates multi-channel sensor data and stores it as Parquet files for later analysis.
Each generated file contains multiple rows of sensor readings, where each row includes:
- Timestamp: UTC timestamp of the reading
- 5 channels: `channel_0` through `channel_4`, containing random float values
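A single reading of this shape can be sketched in a few lines (an illustrative example only, not the simulator's actual code; the field names follow the schema above):

```python
from datetime import datetime, timezone

import numpy as np

# One simulated reading: a UTC timestamp plus five random float channels.
# This mirrors the documented schema; it is not the simulator's own code.
reading = {
    "Timestamp": datetime.now(timezone.utc),
    **{f"channel_{i}": float(np.random.random()) for i in range(5)},
}
```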
The simulator is controlled by the following parameters:
- `--frequency-hertz` (default: `1`): Sampling frequency in Hertz (positive integer)
- `--sending-rate-seconds` (default: `30`): Sending rate in seconds (positive integer)
- `--n-files` (default: `None`): Number of files to generate (`None` = infinite)
- `--directory` (default: `./simulated_data`): Output directory for Parquet files
- `--realtime`/`--no-realtime` (default: realtime): Enable or disable real-time simulation with delays
The number of entries per file is derived automatically as:
```
messages_per_file = frequency_hertz * sending_rate_seconds
```
By default, the simulator generates files with 30 messages at 1 Hz, which equals a 30-second collection window.
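The relationship between the parameters and the file size can be checked with a small helper (a hypothetical function for illustration, not part of the project's API):

```python
def messages_per_file(frequency_hertz: int, sending_rate_seconds: int) -> int:
    """Number of entries written per file (illustrative helper, not project code)."""
    if frequency_hertz <= 0 or sending_rate_seconds <= 0:
        raise ValueError("both parameters must be positive integers")
    return frequency_hertz * sending_rate_seconds

# Defaults: 1 Hz over a 30-second window -> 30 messages per file.
assert messages_per_file(1, 30) == 30
# The 2 Hz example from the options reference: 60 messages per file.
assert messages_per_file(2, 30) == 60
```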
Build the image from the repository root:
```
docker build -t sensor-data-simulate .
```

Show CLI help via Docker:

```
docker run --rm sensor-data-simulate --help
```

Run a finite simulation, storing sensor data in a host directory:
On Windows PowerShell:

```powershell
mkdir simulated_data
docker run --rm `
  -v "${PWD}/simulated_data:/app/simulated_data" `
  sensor-data-simulate `
  --n-files 10 `
  --frequency-hertz 2
```

On Linux/macOS:

```bash
mkdir -p simulated_data
docker run --rm \
  -v "$(pwd)/simulated_data:/app/simulated_data" \
  sensor-data-simulate \
  --n-files 10 \
  --frequency-hertz 2
```

Run an infinite, real-time simulation and stop it with Ctrl+C or a
SIGTERM (same volume mapping applies, only the arguments change), e.g. on
Linux/macOS:
```bash
docker run --rm \
  -v "$(pwd)/simulated_data:/app/simulated_data" \
  sensor-data-simulate --realtime
```

The container's entrypoint is `sensor-data-simulate`, so any additional arguments
after the image name are passed directly to the simulator.
This project uses Poetry.
1. Make sure Poetry is installed.
2. From the repository root, install dependencies:

   ```
   poetry install
   ```

3. Activate the environment when running commands:

   ```
   poetry run sensor-data-simulate --help
   ```
If you prefer not to use Poetry, you can instead create a virtual environment of your choice and install the project with pip from the repository root (for example on Windows PowerShell):
```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install .
sensor-data-simulate --help
```

The CLI is exposed via the `sensor-data-simulate` script (configured in
`[tool.poetry.scripts]` in `pyproject.toml`).
Basic help:
```
poetry run sensor-data-simulate --help
```

- `--directory PATH`
  - Directory where simulated Parquet files are written.
  - Default: `./simulated_data` (created if it does not exist).
- `--frequency-hertz INT`
  - Sampling frequency of the generated sensor readings in Hz.
  - Default: `1`. Must be a positive integer.
- `--sending-rate-seconds INT`
  - Time interval between file writes in seconds.
  - Default: `30`. Must be a positive integer.
- `messages-per-file` (derived)
  - Computed as `frequency-hertz * sending-rate-seconds`.
  - Example: 2 Hz * 30 s = 60 entries per file.
- `--n-files INT`
  - Total number of files to generate.
  - Default: `None` → run indefinitely until interrupted with `Ctrl+C`.
- `--realtime` / `--no-realtime`
  - Default behavior is realtime mode.
  - Use `--no-realtime` to write files as fast as possible.
- `--log-level LEVEL`
  - Logging verbosity: `DEBUG`, `INFO`, `WARNING`, `ERROR`.
  - Default: `INFO`.
Generate 100 files with default settings (30 messages at 1 Hz each):

```
poetry run sensor-data-simulate --n-files 100
```

Generate 50 files with 2 Hz sampling frequency and 60 messages per file:

```
poetry run sensor-data-simulate --n-files 50 --frequency-hertz 2
```

Generate files indefinitely with real-time delays (30 seconds between files):

```
poetry run sensor-data-simulate
```

Generate data to a custom directory:

```
poetry run sensor-data-simulate --directory data/sensor_logs --n-files 10
```

Stop an infinite simulation with Ctrl+C.
The simulator writes Parquet files to the target directory. Filenames follow the pattern:

```
sensor_data_{YYYYMMDD}T{HHMMSS}_{uuid}.parquet
```
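A filename matching this pattern can be produced like so (an illustrative sketch, not the simulator's own implementation; in particular, rendering the `{uuid}` segment as 32 hex characters is an assumption):

```python
import re
import uuid
from datetime import datetime, timezone

def make_filename(now=None):
    """Build a name following sensor_data_{YYYYMMDD}T{HHMMSS}_{uuid}.parquet."""
    now = now or datetime.now(timezone.utc)
    return f"sensor_data_{now:%Y%m%d}T{now:%H%M%S}_{uuid.uuid4().hex}.parquet"

name = make_filename()
# The result should match the documented pattern.
assert re.fullmatch(r"sensor_data_\d{8}T\d{6}_[0-9a-f]{32}\.parquet", name)
```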
Each file contains multiple rows of sensor readings with the following columns:
- `Timestamp` – `pandas.Timestamp` (UTC) of the reading.
- `channel_0` – float, random sensor value for channel 0.
- `channel_1` – float, random sensor value for channel 1.
- `channel_2` – float, random sensor value for channel 2.
- `channel_3` – float, random sensor value for channel 3.
- `channel_4` – float, random sensor value for channel 4.
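For reference, a frame with this schema can be constructed directly (an illustrative sketch only; real files are produced by the simulator, and the start timestamp below is arbitrary):

```python
import numpy as np
import pandas as pd

# Build a small frame mirroring the documented schema: one UTC timestamp
# column plus five random float channels. Not the simulator's own code.
n_rows = 30  # default messages_per_file: 1 Hz over a 30-second window
df = pd.DataFrame({
    "Timestamp": pd.date_range("2026-04-16T15:37:07", periods=n_rows, freq="1s", tz="UTC"),
    **{f"channel_{i}": np.random.random(n_rows) for i in range(5)},
})
```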
You can load and analyze the data with pandas:
```python
import pandas as pd

df = pd.read_parquet("simulated_data/sensor_data_20260416T153707_....parquet")
timestamp = df.loc[0, "Timestamp"]
channel_0 = df.loc[0, "channel_0"]
```

For downstream analytics or stream-processing pipelines, you can
summarize each sensor channel into a small set of statistics using
`extract_channel_statistics` from `mkp.sensor_data.simulate.features`:
```python
from mkp.sensor_data.simulate.features import extract_channel_statistics

features = extract_channel_statistics(df["channel_0"].to_numpy())
```

The returned dictionary contains the mean, standard deviation, minimum, and maximum values and is suitable for feeding into streaming pipelines or online monitoring dashboards.
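For comparison, the same four statistics can be computed directly with NumPy (a sketch; the key names below are assumptions and may differ from the keys actually returned by `extract_channel_statistics`):

```python
import numpy as np

def channel_statistics(values: np.ndarray) -> dict:
    """Summarize one channel into basic statistics.

    Illustrative stand-in for the project's feature extractor; the key
    names here are assumptions, not the library's documented keys.
    """
    return {
        "mean": float(np.mean(values)),
        "std": float(np.std(values)),   # population std (ddof=0)
        "min": float(np.min(values)),
        "max": float(np.max(values)),
    }

stats = channel_statistics(np.array([1.0, 2.0, 3.0, 4.0]))
# mean = 2.5, min = 1.0, max = 4.0
```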
- Format/lint checks are configured via Ruff.
- Tests are run with pytest:

  ```
  poetry run pytest
  ```

Adjust or extend the simulator logic in `mkp/sensor_data/simulate/simulate.py` and
the CLI in `mkp/sensor_data/simulate/main.py`.