A highly customizable event data generator, created by the team at Imply.
The data generator requires Python 3.
Create and activate a local virtual environment, then install dependencies:
```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Run the following example to test the generator script:

```
python generator.py -c presets/configs/ecommerce.json -t access_combined -m 1 -n 10
```

This command generates logs in Apache combined access log format. It uses a single worker to generate 10 records, and it outputs the results to the standard output stream, such as the terminal window. Status messages are written to stderr, so stdout contains only data and can be piped directly.
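The stream separation can be illustrated in miniature with Python (the record and status strings below are hypothetical, not the generator's actual output):

```python
import sys

record = '{"event": "pageview"}'   # data: goes to stdout, safe to pipe
status = "generated 1 record"      # status: goes to stderr, stays visible

print(record)                      # a consumer reading the pipe sees only this
print(status, file=sys.stderr)     # diagnostics never pollute the data stream
```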
For more examples and test cases, see test.sh.
The presets/ folder contains ready-to-use configs with embedded output templates — use -t to select an output format by name. See presets/README.md for details.
Building your own config? Start here:
- How to build a config — step-by-step from concept to tested config, with a worked example
- Common patterns — variable persistence, multi-record sessions, flow duration
- Best practices — naming conventions, the synthetic clock, common pitfalls
Reference — field-level lookup for all config options:
- States — all five state types and their fields
- Emitters — record output configuration
- Field generators — all field generator types
- Distributions — uniform, exponential, normal, gmm_temporal
- Templates — Jinja2 output templates
- Schedules — time-of-day traffic variation
- Deterministic output — reproducible generation with --seed
Run the generator.py script from the command line with Python.
```
python generator.py \
  -c <generator configuration file> \
  -t <template name> \
  -f <format file> \
  -s <start timestamp> \
  -m <generator workers limit> \
  -n <record limit> \
  -r <duration limit in ISO8601 format> \
  --schedule <schedule file> \
  --debug \
  --seed <integer>
```

| Argument | Description |
|---|---|
| -c | Path to the generator configuration JSON file. See the generator configuration reference. |
| -t / --template | A named output template embedded in the generator config. See output templates. |
| -s | Use a simulated clock starting at the specified ISO time, rather than the system clock. This causes records to be produced instantaneously (batch) rather than with a real clock (real-time). |
| -m | The maximum number of workers to create. Defaults to 100. |
| -n | The number of records to generate. Must not be used in combination with -r. |
| -r | The length of time to create records for, expressed in ISO 8601 duration format. Must not be used in combination with -n. |
| --schedule | A JSON file that modulates the number of active workers over time, producing time-of-day traffic variation. See the schedule documentation for available schedules and how to write your own. |
| --debug | Enable debug logging. Outputs detailed thread scheduling and event queue information to stderr. |
| --seed | An integer seed for deterministic data generation. Use with -s for fully reproducible output. |
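The guarantee behind --seed can be sketched with a seeded RNG in Python (this shows the general principle, not the generator's internal random-number handling):

```python
import random

def draw(seed, n=5):
    # One Random instance seeded once, as a deterministic generator would use.
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]

same_a, same_b = draw(42), draw(42)   # identical seed, identical sequence
```

Combine --seed with -s so timestamps come from the synthetic clock as well; with a real clock, wall-clock time still varies between runs.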
The generator configuration is a JSON document passed via -c. It contains two top-level arrays:
```
{
  "states": [ ... ],
  "emitters": [ ... ]
}
```

- A list of states that each worker traverses. The first state controls interarrival pacing; subsequent states set variables, emit records, route between paths, and terminate.
- A list of emitters that define the output record shape. Each dimension uses a field generator to produce values, controlled by distributions.
Each concurrent worker (-m) runs one independent Actor — one lifecycle from the initial event:start:timer to event:end. For the full design process, see how to build a config.
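That lifecycle can be sketched as a tiny state machine (a conceptual illustration only; the state names and the emit/next fields here are invented, not the real config schema):

```python
def run_actor(states, initial="start"):
    """Walk one actor from its initial state until a state with no successor."""
    current, emitted = initial, []
    while current is not None:
        state = states[current]
        emitted.extend(state.get("emit", []))
        current = state.get("next")    # no successor ends the lifecycle
    return emitted

# One worker: pace at "start", emit a record while browsing, then terminate.
lifecycle = {
    "start":  {"next": "browse"},
    "browse": {"emit": ["pageview"], "next": "end"},
    "end":    {},
}
```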
Configs that include a templates block (such as those in presets/configs/) support named output templates selected with --template. Templates use Jinja2 and can produce JSON, CSV, NCSA combined logs, and more from a single config. See the output templates reference.
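The one-config, many-formats idea looks like this in miniature, with the stdlib's string.Template standing in for Jinja2 (the field names are illustrative):

```python
import json
from string import Template

record = {"ip": "203.0.113.7", "path": "/cart", "status": 200}

# The same record rendered two ways: as JSON and as an NCSA-style log line.
as_json = json.dumps(record)
as_log = Template('$ip - - "GET $path HTTP/1.1" $status').substitute(record)
```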
Use -n to stop after a number of records, or -r to stop after a duration (ISO 8601). If neither is set, the generator runs indefinitely.
```
# 1000 records
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 1000

# One hour of data
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -r PT1H
```

By default, timestamps reflect the real system clock. Use -s to start a synthetic clock at a fixed point in time — records are produced instantly rather than in real time, which is recommended for generating large volumes of historical data.
```
# 1000 records starting 1 Jan 2025
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 1000 -s "2025-01-01T00:00"

# One hour of data starting 1 Jan 2025
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -r PT1H -s "2025-01-01T00:00"
```

The generator always writes to stdout. Pipe it to whatever destination you need.
The default — useful for inspection or piping to other tools:
```
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 100
```

Redirect stdout to a file:

```
python generator.py -c presets/configs/ecommerce.json -t apache:access:json -n 1000 > events.json
```

Pipe to kcat:
```
python generator.py -c presets/configs/ecommerce.json -t apache:access:json \
  | kcat -b localhost:9092 -t my-topic
```

Use kcat with SASL authentication:
```
python generator.py -c presets/configs/ecommerce.json -t apache:access:json \
  | kcat -b pkc-example.us-east-1.aws.confluent.cloud:9092 \
    -X security.protocol=SASL_SSL \
    -X sasl.mechanisms=PLAIN \
    -X sasl.username="$CONFLUENT_API_KEY" \
    -X sasl.password="$CONFLUENT_API_SECRET" \
    -t my-topic
```

When the endpoint is able to apply metadata (e.g. sourcetype, index, and host), pipe to services/collector/raw:
```
python generator.py -c presets/configs/ecommerce.json -t access_combined \
  | curl -s -X POST https://hec.example.com/services/collector/raw \
    -H "Authorization: Splunk $HEC_TOKEN" \
    --data-binary @-
```

For full control over metadata, use a pipeline tool that wraps each event in a HEC envelope — an OTel Collector with a Splunk HEC exporter, or Cribl or Vector.
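A minimal sketch of that envelope, using Splunk's documented HEC event keys (the sourcetype and index values here are illustrative):

```python
import json

def hec_envelope(raw_line, sourcetype="access_combined", index="main"):
    """Wrap one generator output line in a Splunk HEC event envelope."""
    return json.dumps({
        "sourcetype": sourcetype,
        "index": index,
        "event": raw_line,     # the raw record, carried verbatim
    })
```

Each wrapped line is then typically POSTed to the services/collector/event endpoint rather than the raw one.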