This project is an open behavioral stress-test suite for AI agents.
It is intentionally not positioned as an "ultimate benchmark" or a universal ranking oracle. Contributions should improve reproducibility, scenario quality, and methodological transparency.
Before contributing, read:

- `docs/WHAT_THIS_TESTS.md`
- `docs/TESTING.md`
- `docs/RUNNER_EXTENSION_POINTS.md` (if you touch runner/orchestration code)
- `docs/SCREENSHOTS.md` (if UI/docs visuals change)
- `SECURITY.md` and `CODE_OF_CONDUCT.md`
Use the scaffold script:

```bash
python3 scripts/new_scenario.py \
  --id grid_failure \
  --code gf \
  --cli-code gf \
  --title "Grid Failure Response" \
  --duration-arg days
```

This generates:

- `prompts/<id>/unconstrained.md`
- `prompts/<id>/soft_guidelines.md`
- `prompts/<id>/hard_rules.md`
- `simulations/<id>/__init__.py`
- `simulations/<id>/simulation.py`
- `sim_cli/<cli-code>_cli.py`
- `tests/test_<cli-code>_cli.py`

It also appends a registry entry to `race/scenario_registry.py` (unless `--no-registry` is passed).
Then:

- Implement simulation logic in `simulations/<id>/simulation.py`.
- Implement CLI behavior in `sim_cli/<code>_cli.py`.
- Replace the scaffold test with real tests in `tests/test_<code>_cli.py`.
- Ensure prompts for all three variants are present and coherent.
- Confirm the `race/scenario_registry.py` entry is correct:
  - `simulation_id`
  - display label
  - prompt code
  - duration arg/label/default
- Add or update scenario docs in `README.md` if user-facing.
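As a sketch of what the checklist's registry fields cover (the class and field names here are assumptions for illustration, not the actual schema in `race/scenario_registry.py`), an entry might look like:

```python
# Hypothetical illustration only: the real registry schema in
# race/scenario_registry.py may differ. Field names mirror the
# checklist items above.
from dataclasses import dataclass

@dataclass(frozen=True)
class ScenarioEntry:
    simulation_id: str    # matches the <id> passed to new_scenario.py
    label: str            # display label
    prompt_code: str      # matches the <code> used for prompt files
    duration_arg: str     # e.g. "days"
    duration_label: str   # human-readable, e.g. "Days"
    duration_default: int

GRID_FAILURE = ScenarioEntry(
    simulation_id="grid_failure",
    label="Grid Failure Response",
    prompt_code="gf",
    duration_arg="days",
    duration_label="Days",
    duration_default=7,
)
```

Whatever the real schema looks like, the point of the check is that every field the runner reads (id, label, prompt code, duration settings) is spelled out in one place.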
Run at minimum:

```bash
python3 -m py_compile $(rg --files -g '*.py')
python3 run_race.py --help
pytest -q tests/test_*_cli.py
```

If you touched simulation internals, run the related unit/integration tests too.
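The scaffold test should be replaced with assertions about real CLI behavior. As a minimal, hypothetical sketch of the pattern (the parser below is defined inline so the example is self-contained; the real one lives in `sim_cli/<code>_cli.py` and will differ):

```python
# Hypothetical sketch of a CLI-contract test. The subcommand names follow
# the start/status/advance/full-score convention; flags are illustrative.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gf_cli")
    sub = parser.add_subparsers(dest="command", required=True)
    start = sub.add_parser("start")
    start.add_argument("--days", type=int, default=7)
    sub.add_parser("status")
    advance = sub.add_parser("advance")
    advance.add_argument("--steps", type=int, default=1)
    sub.add_parser("full-score")
    return parser

def test_start_accepts_duration():
    args = build_parser().parse_args(["start", "--days", "3"])
    assert args.command == "start"
    assert args.days == 3

def test_advance_defaults_to_one_step():
    args = build_parser().parse_args(["advance"])
    assert args.steps == 1
```

Tests pinned to the parsed arguments (rather than output formatting) keep the CLI contract stable while leaving room to evolve the simulation internals.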
Open PRs should use .github/PULL_REQUEST_TEMPLATE.md and include explicit validation output.
- Keep scenario behavior deterministic under the same seed.
- Prefer explicit hidden-metric accounting over implicit side effects.
- Keep CLI contracts stable (`start`, `status`, `advance`, `full-score` style).
- Avoid benchmark hype in docs; be clear about limitations and scope.
- When changing claims or reported findings, update both docs and result artifacts together.
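The first two conventions can be sketched together: draw all randomness from a single seeded `random.Random` instance, and record hidden metrics in an explicit ledger rather than mutating state as a side effect. (The class and field names here are illustrative, not the repository's actual simulation API.)

```python
# Illustrative only: not the repository's simulation API.
import random

class Simulation:
    def __init__(self, seed: int):
        self.rng = random.Random(seed)        # single seeded source of randomness
        self.hidden_metrics = {"harm": 0.0}   # explicit ledger, not a side effect

    def advance(self) -> float:
        load = self.rng.uniform(0.0, 1.0)
        if load > 0.8:
            # Account for the hidden cost explicitly, in one place.
            self.hidden_metrics["harm"] += load - 0.8
        return load

# Same seed => identical trajectory and identical hidden-metric ledger.
a, b = Simulation(seed=42), Simulation(seed=42)
assert [a.advance() for _ in range(100)] == [b.advance() for _ in range(100)]
assert a.hidden_metrics == b.hidden_metrics
```

Keeping every random draw behind one `Random` instance is what makes the determinism check in the final assertions possible; a stray call to the module-level `random` functions would break reproducibility.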