implydata/imply-eventgenerator

Imply event generator

A highly customizable event data generator, created by the team at Imply.

Prerequisites

The data generator requires Python 3.

Create and activate a local virtual environment, then install dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Quickstart

Run the following example to test the generator script:

python generator.py -c presets/configs/ecommerce.json -t access_combined -m 1 -n 10

This command generates logs in the Apache access combined format. It uses a single worker to generate 10 records and writes them to standard output (for example, the terminal window). Status messages are written to stderr, so stdout contains only data and can be piped directly.
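
Because data and status messages go to different streams, a wrapper script can capture them separately. The following is a minimal sketch, not part of the repository: `run_and_split` is a hypothetical helper, and the demo uses a stand-in child process instead of generator.py so that it runs anywhere.

```python
import subprocess
import sys


def run_and_split(cmd):
    """Run a command and return (stdout_lines, stderr_text), captured separately."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines(), result.stderr


# Stand-in for generator.py (an assumption for this demo): it writes two data
# lines to stdout and one status line to stderr.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('record-1'); print('record-2'); "
    "print('status: done', file=sys.stderr)",
]

if __name__ == "__main__":
    records, status = run_and_split(demo_cmd)
    print(len(records), "records; stderr said:", status.strip())
```

To use it with the real script, pass the Quickstart command as a list of arguments in place of `demo_cmd`.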

For more examples and test cases, see test.sh.

The presets/ folder contains ready-to-use configs with embedded output templates — use -t to select an output format by name. See presets/README.md for details.

Documentation

Building your own config? Start here:

Reference — field-level lookup for all config options:

Command-line reference

Run the generator.py script from the command line with Python.

python generator.py \
        -c <generator configuration file> \
        -t <template name> \
        -f <format file> \
        -s <start timestamp> \
        -m <generator workers limit> \
        -n <record limit> \
        -r <duration limit in ISO8601 format> \
        --schedule <schedule file> \
        --debug \
        --seed <integer>

| Argument | Description |
| --- | --- |
| -c | Path to the generator configuration JSON file. See generator configuration reference. |
| -t / --template | A named output template embedded in the generator config. See output templates. |
| -s | Use a simulated clock starting at the specified ISO time, rather than using the system clock. This will cause records to be produced instantaneously (batch) rather than with a real clock (real-time). |
| -m | The maximum number of workers to create. Defaults to 100. |
| -n | The number of records to generate. Must not be used in combination with -r. |
| -r | The length of time to create records for, expressed in ISO8601 format. Must not be used in combination with -n. |
| --schedule | A JSON file that modulates the number of active workers over time, producing time-of-day traffic variation. See the schedule documentation for available schedules and how to write your own. |
| --debug | Enable debug logging. Outputs detailed thread scheduling and event queue information to stderr. |
| --seed | An integer seed for deterministic data generation. Use with -s for fully reproducible output. |
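
The rule that -n and -r must not be combined can be illustrated with a small argparse sketch. This is not the parser used by generator.py; `build_parser` and `parses_cleanly` are hypothetical names, and only a subset of the documented flags is mirrored.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a subset of the documented flags."""
    parser = argparse.ArgumentParser(prog="generator.py")
    parser.add_argument("-c", required=True, help="generator configuration file")
    parser.add_argument("-t", "--template", help="named output template")
    parser.add_argument("-m", type=int, default=100, help="maximum number of workers")
    # -n and -r are alternatives: argparse rejects a command line that sets both.
    limits = parser.add_mutually_exclusive_group()
    limits.add_argument("-n", type=int, help="number of records to generate")
    limits.add_argument("-r", help="duration limit in ISO 8601 format")
    return parser


def parses_cleanly(argv):
    """Return True if argv is accepted, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        return False


if __name__ == "__main__":
    print(parses_cleanly(["-c", "config.json", "-n", "10"]))
    print(parses_cleanly(["-c", "config.json", "-n", "10", "-r", "PT1H"]))
```

A mutually exclusive group keeps the check declarative, so the usage message produced by argparse also documents the restriction.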

Generator configuration

The generator configuration is a JSON document passed via -c. It contains two top-level arrays:

{
  "states": [ ... ],
  "emitters": [ ... ]
}
  • A list of states that each worker traverses. The first state controls interarrival pacing; subsequent states set variables, emit records, route between paths, and terminate.
  • A list of emitters that define output record shape. Each dimension uses a field generator to produce values, controlled by distributions.

Each concurrent worker (-m) runs one independent Actor — one lifecycle from the initial event:start:timer to event:end. For the full design process, see how to build a config.
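
Before running a hand-written config, it can help to confirm that the two top-level arrays are present. The sketch below checks only that outer shape; `check_config` is a hypothetical helper, and the contents of each state and emitter are covered by the reference documentation rather than validated here.

```python
import json


def check_config(config):
    """Return a list of problems found in the top-level config structure."""
    problems = []
    for key in ("states", "emitters"):
        if key not in config:
            problems.append(f"missing top-level key: {key}")
        elif not isinstance(config[key], list):
            problems.append(f"{key} must be an array")
    return problems


# Smallest document that has the documented outer shape (both arrays are empty).
sample = json.loads('{"states": [], "emitters": []}')

if __name__ == "__main__":
    print(check_config(sample) or "top-level structure looks fine")
```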

Output format

Configs that include a templates block (such as those in presets/configs/) support named output templates selected with --template. Templates use Jinja2 and can produce JSON, CSV, NCSA combined logs, and more from a single config. See the output templates reference.

Generation limits

Use -n to stop after a number of records, or -r to stop after a duration (ISO 8601). If neither is set, the generator runs indefinitely.

# 1000 records
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 1000

# One hour of data
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -r PT1H
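
For readers unfamiliar with ISO 8601 durations such as PT1H, the sketch below converts a small subset of the notation (days, hours, minutes, and whole seconds) into a `timedelta`. It is an illustration only; `parse_duration` is a hypothetical helper, and which duration forms the generator itself accepts is not stated here.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: P[nD][T[nH][nM][nS]]
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Convert a duration such as 'PT1H' or 'P1DT30M' into a timedelta."""
    match = _DURATION.match(text)
    parts = {}
    if match:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    # Reject strings with no components ("P", "PT") or a dangling "T" ("P1DT").
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


if __name__ == "__main__":
    print(parse_duration("PT1H"))
    print(parse_duration("P1DT30M"))
```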

Simulated time

By default, timestamps reflect the real system clock. Use -s to start a synthetic clock at a fixed point in time — records are produced instantly rather than in real time, which is recommended for generating large volumes of historical data.

# 1000 records starting 1 Jan 2025
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 1000 -s "2025-01-01T00:00"

# One hour of data starting 1 Jan 2025
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -r PT1H -s "2025-01-01T00:00"
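
The idea behind a simulated clock can be sketched in a few lines: instead of sleeping between events, the program adds each gap to a timestamp that starts at the chosen time. This is an illustration, not the generator's implementation; `simulated_timestamps` is a hypothetical function, and the exponential gaps are an assumption chosen for the demo.

```python
import random
from datetime import datetime, timedelta


def simulated_timestamps(start, count, mean_gap_seconds, seed):
    """Return `count` event times from a synthetic clock that begins at `start`."""
    rng = random.Random(seed)  # seeded, so the same inputs give the same times
    clock = datetime.fromisoformat(start)
    stamps = []
    for _ in range(count):
        # Advance the clock by a random gap; no real waiting takes place.
        clock += timedelta(seconds=rng.expovariate(1.0 / mean_gap_seconds))
        stamps.append(clock)
    return stamps


if __name__ == "__main__":
    for stamp in simulated_timestamps("2025-01-01T00:00", 5, 2.0, seed=42):
        print(stamp.isoformat())
```

Because no time is spent waiting, the loop finishes as fast as the CPU allows, which is why a simulated clock suits bulk historical data. Fixing both the start time and the seed makes repeated runs produce identical timestamps.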

Using the output

The generator always writes to stdout. Pipe it to whatever destination you need.

stdout

The default — useful for inspection or piping to other tools:

python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 100

File

Redirect stdout to a file:

python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 1000 > events.json
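
A file written this way can be read back for inspection. The sketch below assumes the selected template writes one JSON object per line, which is an assumption about the output rather than something stated above; `read_json_lines` is a hypothetical helper, and the sample records are made up.

```python
import io
import json


def read_json_lines(stream):
    """Yield one parsed object per non-empty line of `stream`."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # Made-up records standing in for the contents of events.json.
    sample = io.StringIO('{"status": 200}\n\n{"status": 404}\n')
    for record in read_json_lines(sample):
        print(record["status"])
```

To read a real file, pass an open file object, for example `open("events.json")`, in place of the in-memory sample.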

Apache Kafka

Pipe to kcat:

python generator.py -c presets/configs/ecommerce.json -t apache:access:json \
  | kcat -b localhost:9092 -t my-topic

Confluent Cloud

Use kcat with SASL authentication:

python generator.py -c presets/configs/ecommerce.json -t apache:access:json \
  | kcat -b pkc-example.us-east-1.aws.confluent.cloud:9092 \
         -X security.protocol=SASL_SSL \
         -X sasl.mechanisms=PLAIN \
         -X sasl.username="$CONFLUENT_API_KEY" \
         -X sasl.password="$CONFLUENT_API_SECRET" \
         -t my-topic

Splunk HEC

When the endpoint is able to apply metadata (e.g. sourcetype, index, and host), pipe to services/collector/raw:

python generator.py -c presets/configs/ecommerce.json -t access_combined \
  | curl -s -X POST https://hec.example.com/services/collector/raw \
         -H "Authorization: Splunk $HEC_TOKEN" \
         --data-binary @-

For full control over metadata, use a pipeline tool that wraps each event in a HEC envelope — an OTel Collector with a Splunk HEC exporter, or Cribl or Vector.
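
To make the term "HEC envelope" concrete, the sketch below wraps one raw log line in the JSON structure that Splunk's JSON event endpoint accepts, with the metadata carried alongside the event. `hec_envelope` is a hypothetical helper and the metadata values are placeholders; in practice one of the pipeline tools named above would do this wrapping.

```python
import json


def hec_envelope(raw_event, sourcetype, index, host):
    """Wrap one raw event in a HEC-style JSON envelope and return it as a string."""
    return json.dumps(
        {
            "event": raw_event,        # the generated record, unchanged
            "sourcetype": sourcetype,  # metadata travels next to the event
            "index": index,
            "host": host,
        }
    )


if __name__ == "__main__":
    # Placeholder log line and metadata, for illustration only.
    line = '127.0.0.1 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 512'
    print(hec_envelope(line, "access_combined", "main", "web-01"))
```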

About

A Python module for generating synthetic event data
