Deterministic Runs

This document explains how to run the training pipeline with bitwise reproducible results for financial audit compliance and research reproducibility.

Quick Start

Run with Strict Determinism

bin/run_deterministic.sh python TRAINING/orchestration/intelligent_trainer.py --experiment-config determinism_test

Run Normal (Best Effort)

python TRAINING/orchestration/intelligent_trainer.py --experiment-config your_config

Modes

Mode	Command	Reproducibility	Performance
Strict	`bin/run_deterministic.sh python ...`	Bitwise identical	CPU only (GPU disabled)
Best Effort	`python ...`	Seeded but may vary	Full GPU/parallelism

What Causes Non-Determinism?

GPU is the primary source of non-determinism. GPU operations use parallel floating-point arithmetic where operation ordering is not guaranteed, causing slight numerical differences between runs.

Multithreading with CPU is generally deterministic when models are properly seeded. The pipeline sets seeds for all randomness sources, so multithreaded CPU execution typically produces identical results.

What Strict Mode Does

The launcher script (bin/run_deterministic.sh) sets critical environment variables before Python starts:

PYTHONHASHSEED=42        # Deterministic Python hash
REPRO_MODE=strict        # Enable strict mode
OMP_NUM_THREADS=1        # Single-threaded OpenMP (conservative)
MKL_NUM_THREADS=1        # Single-threaded MKL (conservative)
CUBLAS_WORKSPACE_CONFIG=:4096:8  # CUDA determinism (if GPU used)

The training pipeline then:

Forces tree models to use device_type=cpu (no GPU) - this is the critical setting
Sets deterministic=True for LightGBM
Uses n_jobs=1 (conservative, but multithreading may work)
Uses SHA256-based seed derivation for stability
Injects seeds into all model configs automatically

Verification

Run Twice and Compare

# Run 1
bin/run_deterministic.sh python TRAINING/orchestration/intelligent_trainer.py \
    --experiment-config determinism_test 2>&1 | tee run1.log

# Run 2
bin/run_deterministic.sh python TRAINING/orchestration/intelligent_trainer.py \
    --experiment-config determinism_test 2>&1 | tee run2.log

# Compare fingerprints (should be identical)
diff <(grep fingerprint run1.log) <(grep fingerprint run2.log)

Verify Using Snapshots

Each run produces snapshot.json files with determinism-relevant signatures:

# Find snapshots from your run
find RESULTS/runs/*/targets/*/reproducibility -name "snapshot.json"

# Compare comparison_group fields between runs
jq '.comparison_group' run1/targets/fwd_ret_10m/reproducibility/CROSS_SECTIONAL/cohort=*/snapshot.json

Key fields to compare:

Field	Purpose
`dataset_signature`	Hash of data_dir + symbols
`task_signature`	Hash of target config
`routing_signature`	Hash of view + symbol routing
`hyperparameters_signature`	Hash of model hyperparameters
`train_seed`	Random seed used
`metrics_sha256`	Hash of output metrics

If all signatures match between runs, the configuration is identical.

What to Expect

Identical fingerprints = determinism working
Different fingerprints = something is non-deterministic

Config Requirements

For guaranteed deterministic runs, your experiment config should disable GPU:

# The critical setting - GPU is the source of non-determinism
reproducibility:
  mode: strict
  strict:
    disable_gpu_tree_models: true  # REQUIRED for determinism

Optional (conservative) settings:

# These are set by the launcher script but may not be strictly necessary
# if all models are properly seeded
multi_target:
  parallel_targets: false  # Sequential (conservative)

threading:
  parallel:
    enabled: false  # Disable parallelism (conservative)

Note: Multithreading with properly seeded CPU models typically produces identical results. The single-threaded settings are conservative guarantees, not strict requirements.

See CONFIG/experiments/determinism_test.yaml for a complete example.

Architecture

bin/run_deterministic.sh
    │
    ├── Sets PYTHONHASHSEED=42 (before Python starts)
    ├── Sets REPRO_MODE=strict
    ├── Sets OMP_NUM_THREADS=1, MKL_NUM_THREADS=1
    │
    ▼
intelligent_trainer.py
    │
    ├── import repro_bootstrap (FIRST - sets thread env vars)
    ├── Validates PYTHONHASHSEED is set
    ├── Checks no numeric libs imported before bootstrap
    │
    ▼
determinism.py
    │
    ├── load_reproducibility_config() - ENV overrides YAML
    ├── seed_all() - Sets Python/NumPy/Torch seeds
    ├── create_estimator() - Single choke point for model creation
    │   ├── Applies n_jobs=1 in strict mode
    │   ├── Applies device_type=cpu in strict mode
    │   └── Uses normalize_seed() to prevent edge cases
    │
    ▼
Bitwise Identical Results

Output Directory Structure

Reproducibility artifacts are organized under each target:

RESULTS/runs/<run_name>/
└── targets/<target>/
    └── reproducibility/
        ├── CROSS_SECTIONAL/
        │   ├── universe=<sig>/           # Identifies symbol set
        │   │   └── feature_importances/  # Per-model CSV files
        │   └── cohort=cs_<id>/
        │       ├── snapshot.json         # Full snapshot with signatures
        │       ├── metrics.json          # Output metrics
        │       ├── metadata.json         # Run metadata
        │       └── diff_prev.json        # Comparison to previous run
        └── SYMBOL_SPECIFIC/
            └── symbol=AAPL/              # Per-symbol directory
                ├── feature_importances/  # Per-model CSV files
                └── cohort=sy_<id>/
                    ├── snapshot.json
                    ├── metrics.json
                    ├── metadata.json
                    └── diff_prev.json

Key directories:

CROSS_SECTIONAL/ - Multi-symbol (panel) runs
SYMBOL_SPECIFIC/symbol=XXX/ - Per-symbol runs

Configuration Files

File	Purpose
`CONFIG/pipeline/training/reproducibility.yaml`	Default determinism settings
`CONFIG/experiments/determinism_test.yaml`	Test config with parallelism disabled
`bin/run_deterministic.sh`	Launcher script

reproducibility.yaml

reproducibility:
  mode: strict  # or best_effort
  seed: 42
  version: v1
  
  strict:
    require_env_vars: true
    disable_gpu_tree_models: true   # CRITICAL: GPU causes non-determinism
    force_single_thread: true        # Conservative (may not be required)
    enforce_stable_ordering: true    # Sort features/targets for stability

Key insight: disable_gpu_tree_models: true is the critical setting. GPU floating-point operations have non-deterministic ordering. CPU operations with proper seeding are deterministic even with multithreading.

For Production Financial Use

Audit Trail

Always use strict mode for compliance-critical runs
Store fingerprints in your audit database
Compare fingerprints to detect code/environment drift

Environment Pinning

Strict determinism only guarantees reproducibility within the same environment:

Same Python version
Same library versions (lightgbm, xgboost, numpy, etc.)
Same CPU architecture

Pin versions in requirements.txt or use Docker for full reproducibility.

Regulatory Compliance

Run determinism test before production deployments
Store prediction hashes alongside model artifacts
Document any fingerprint changes in change log

Troubleshooting

"PYTHONHASHSEED not set"

Use the launcher script:

bin/run_deterministic.sh python your_script.py

"Bootstrap imported too late"

Ensure repro_bootstrap is imported FIRST in your entrypoint:

import TRAINING.common.repro_bootstrap  # MUST be first
import numpy as np  # Now safe

Different fingerprints across runs

Check:

Parallelism is disabled in config
Using the launcher script
Same library versions
No external randomness (e.g., shuffled data loading)

API Reference

Key Functions

from TRAINING.common.determinism import (
    create_estimator,      # Single choke point for model creation
    seed_all,              # Set all random seeds
    resolve_seed,          # SHA256-based seed derivation
    is_strict_mode,        # Check if strict mode enabled
    stable_sort,           # Deterministic ordering
    load_reproducibility_config,  # Load SST config
)

Example Usage

import TRAINING.common.repro_bootstrap  # FIRST!

from TRAINING.common.determinism import create_estimator, seed_all, resolve_seed

seed_all(42)

# Create model with determinism params automatically applied
model = create_estimator(
    library="lightgbm",
    base_config={"n_estimators": 100},
    seed=resolve_seed(42, "training", target="fwd_ret_10m"),
    problem_kind="regression"
)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deterministic Runs

Quick Start

Run with Strict Determinism

Run Normal (Best Effort)

Modes

What Causes Non-Determinism?

What Strict Mode Does

Verification

Run Twice and Compare

Verify Using Snapshots

What to Expect

Config Requirements

Architecture

Output Directory Structure

Configuration Files

reproducibility.yaml

For Production Financial Use

Audit Trail

Environment Pinning

Regulatory Compliance

Troubleshooting

"PYTHONHASHSEED not set"

"Bootstrap imported too late"

Different fingerprints across runs

API Reference

Key Functions

Example Usage

FilesExpand file tree

DETERMINISTIC_RUNS.md

Latest commit

History

DETERMINISTIC_RUNS.md

File metadata and controls

Deterministic Runs

Quick Start

Run with Strict Determinism

Run Normal (Best Effort)

Modes

What Causes Non-Determinism?

What Strict Mode Does

Verification

Run Twice and Compare

Verify Using Snapshots

What to Expect

Config Requirements

Architecture

Output Directory Structure

Configuration Files

reproducibility.yaml

For Production Financial Use

Audit Trail

Environment Pinning

Regulatory Compliance

Troubleshooting

"PYTHONHASHSEED not set"

"Bootstrap imported too late"

Different fingerprints across runs

API Reference

Key Functions

Example Usage