This document explains how to run the training pipeline with bitwise reproducible results for financial audit compliance and research reproducibility.
bin/run_deterministic.sh python TRAINING/orchestration/intelligent_trainer.py --experiment-config determinism_test

python TRAINING/orchestration/intelligent_trainer.py --experiment-config your_config

| Mode | Command | Reproducibility | Performance |
|---|---|---|---|
| Strict | `bin/run_deterministic.sh python ...` | Bitwise identical | CPU only (GPU disabled) |
| Best Effort | `python ...` | Seeded but may vary | Full GPU/parallelism |
GPU execution is the primary source of non-determinism. GPU operations run floating-point arithmetic in parallel with no guaranteed operation ordering, and because floating-point addition is not associative, a different ordering produces slightly different numerical results from run to run.
Multithreaded CPU execution is generally deterministic when models are properly seeded. The pipeline seeds every source of randomness, so multithreaded CPU runs typically produce identical results.
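The ordering sensitivity described above can be seen in plain Python, with no GPU involved. This is only an illustration of why summation order matters for floating-point values; the helper name `sum_in_order` is made up for this example and is not part of the pipeline.

```python
def sum_in_order(values):
    """Accumulate values strictly left to right, as a sequential loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, grouped two different ways.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)  # False

# The same three numbers, accumulated in two different orders.
print(sum_in_order([1e16, 1.0, -1e16]))  # 0.0 (the 1.0 is lost to rounding)
print(sum_in_order([1e16, -1e16, 1.0]))  # 1.0
```

A parallel reduction that does not fix its accumulation order is effectively choosing one of these orderings at run time, which is why two otherwise identical runs can differ in the last bits.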
The launcher script (bin/run_deterministic.sh) sets critical environment variables before Python starts:
PYTHONHASHSEED=42 # Deterministic Python hash
REPRO_MODE=strict # Enable strict mode
OMP_NUM_THREADS=1 # Single-threaded OpenMP (conservative)
MKL_NUM_THREADS=1 # Single-threaded MKL (conservative)
CUBLAS_WORKSPACE_CONFIG=:4096:8 # CUDA determinism (if GPU used)

The training pipeline then:
- Forces tree models to use `device_type=cpu` (no GPU); this is the critical setting
- Sets `deterministic=True` for LightGBM
- Uses `n_jobs=1` (conservative, but multithreading may work)
- Uses SHA256-based seed derivation for stability
- Injects seeds into all model configs automatically
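As a rough illustration of what SHA256-based seed derivation can look like, the sketch below hashes a base seed together with a scope (stage, target, and so on) and folds the digest into a 31-bit integer. The function name `derive_seed`, the `|` separator, and the 31-bit range are assumptions made for this example; the pipeline's own `resolve_seed` may differ in all of these details.

```python
import hashlib


def derive_seed(base_seed: int, *scope: str) -> int:
    """Derive a stable per-scope seed from a base seed (hypothetical helper).

    Unlike the built-in hash(), SHA256 does not depend on PYTHONHASHSEED,
    so the result is the same across processes and machines.
    """
    key = "|".join([str(base_seed), *scope]).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes and keep the value inside a non-negative
    # 31-bit range, which most ML libraries accept as a seed.
    return int.from_bytes(digest[:4], "big") % (2**31 - 1)


seed_a = derive_seed(42, "training", "fwd_ret_10m")
seed_b = derive_seed(42, "training", "fwd_ret_10m")
seed_c = derive_seed(42, "training", "another_target")
print(seed_a == seed_b)  # True: same inputs always give the same seed
```

Deriving seeds this way gives each target or stage its own random stream while keeping every stream a pure function of the base seed and the scope names.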
# Run 1
bin/run_deterministic.sh python TRAINING/orchestration/intelligent_trainer.py \
--experiment-config determinism_test 2>&1 | tee run1.log
# Run 2
bin/run_deterministic.sh python TRAINING/orchestration/intelligent_trainer.py \
--experiment-config determinism_test 2>&1 | tee run2.log
# Compare fingerprints (should be identical)
diff <(grep fingerprint run1.log) <(grep fingerprint run2.log)

Each run produces snapshot.json files with determinism-relevant signatures:
# Find snapshots from your run
find RESULTS/runs/*/targets/*/reproducibility -name "snapshot.json"
# Compare comparison_group fields between runs
jq '.comparison_group' run1/targets/fwd_ret_10m/reproducibility/CROSS_SECTIONAL/cohort=*/snapshot.json

Key fields to compare:

| Field | Purpose |
|---|---|
| `dataset_signature` | Hash of data_dir + symbols |
| `task_signature` | Hash of target config |
| `routing_signature` | Hash of view + symbol routing |
| `hyperparameters_signature` | Hash of model hyperparameters |
| `train_seed` | Random seed used |
| `metrics_sha256` | Hash of output metrics |
If all signatures match between runs, the configuration is identical.
- Identical fingerprints = determinism working
- Different fingerprints = something is non-deterministic
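The comparison can also be scripted. The sketch below assumes each snapshot.json is a JSON object with a top-level `comparison_group` mapping (as the `jq` command suggests) and that the signature fields live inside it; both points are assumptions about the file layout, and the function names are illustrative only.

```python
import json


def load_comparison_group(path):
    """Read snapshot.json and return its comparison_group mapping (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get("comparison_group", {})


def diff_groups(first: dict, second: dict) -> dict:
    """Return {field: (first_value, second_value)} for every field that differs."""
    fields = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in fields
        if first.get(name) != second.get(name)
    }


if __name__ == "__main__":
    import sys

    # Usage: python compare_snapshots.py run1/.../snapshot.json run2/.../snapshot.json
    if len(sys.argv) == 3:
        mismatches = diff_groups(
            load_comparison_group(sys.argv[1]),
            load_comparison_group(sys.argv[2]),
        )
        for name, (a, b) in mismatches.items():
            print(f"{name}: {a!r} != {b!r}")
        sys.exit(1 if mismatches else 0)
```

An empty result means the two snapshots agree on every compared field; a non-empty result names the fields to investigate first.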
For guaranteed deterministic runs, your experiment config should disable GPU:
# The critical setting - GPU is the source of non-determinism
reproducibility:
  mode: strict
  strict:
    disable_gpu_tree_models: true  # REQUIRED for determinism

Optional (conservative) settings:
# These are set by the launcher script but may not be strictly necessary
# if all models are properly seeded
multi_target:
  parallel_targets: false  # Sequential (conservative)
threading:
  parallel:
    enabled: false  # Disable parallelism (conservative)

Note: Multithreading with properly seeded CPU models typically produces identical results. The single-threaded settings are conservative guarantees, not strict requirements.
See CONFIG/experiments/determinism_test.yaml for a complete example.
bin/run_deterministic.sh
│
├── Sets PYTHONHASHSEED=42 (before Python starts)
├── Sets REPRO_MODE=strict
├── Sets OMP_NUM_THREADS=1, MKL_NUM_THREADS=1
│
▼
intelligent_trainer.py
│
├── import repro_bootstrap (FIRST - sets thread env vars)
├── Validates PYTHONHASHSEED is set
├── Checks no numeric libs imported before bootstrap
│
▼
determinism.py
│
├── load_reproducibility_config() - ENV overrides YAML
├── seed_all() - Sets Python/NumPy/Torch seeds
├── create_estimator() - Single choke point for model creation
│ ├── Applies n_jobs=1 in strict mode
│ ├── Applies device_type=cpu in strict mode
│ └── Uses normalize_seed() to prevent edge cases
│
▼
Bitwise Identical Results
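The bootstrap checks in the flow above might look roughly like the following. This is a sketch under stated assumptions, not the contents of `repro_bootstrap`: the function names, the `NUMERIC_LIBS` list, and the message wording are all hypothetical.

```python
import os
import sys

# Hypothetical list of libraries that should not be imported before the bootstrap,
# because they read thread-count environment variables at import time.
NUMERIC_LIBS = ("numpy", "scipy", "pandas", "sklearn", "lightgbm", "xgboost", "torch")


def apply_thread_defaults(environ):
    """Set single-thread defaults without overriding values from the launcher."""
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        environ.setdefault(var, "1")


def check_bootstrap(environ, loaded_modules):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # PYTHONHASHSEED only takes effect if it is set before the interpreter starts,
    # so it can be validated here but not fixed.
    if not environ.get("PYTHONHASHSEED"):
        problems.append("PYTHONHASHSEED is not set; start Python via the launcher script")
    early = sorted(name for name in NUMERIC_LIBS if name in loaded_modules)
    if early:
        problems.append("imported before bootstrap: " + ", ".join(early))
    return problems


if __name__ == "__main__":
    apply_thread_defaults(os.environ)
    for problem in check_bootstrap(os.environ, sys.modules):
        print("WARNING:", problem)
```

Passing the environment and module table in as arguments keeps the checks easy to unit test without touching the real process state.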
Reproducibility artifacts are organized under each target:
RESULTS/runs/<run_name>/
└── targets/<target>/
└── reproducibility/
├── CROSS_SECTIONAL/
│ ├── universe=<sig>/ # Identifies symbol set
│ │ └── feature_importances/ # Per-model CSV files
│ └── cohort=cs_<id>/
│ ├── snapshot.json # Full snapshot with signatures
│ ├── metrics.json # Output metrics
│ ├── metadata.json # Run metadata
│ └── diff_prev.json # Comparison to previous run
└── SYMBOL_SPECIFIC/
└── symbol=AAPL/ # Per-symbol directory
├── feature_importances/ # Per-model CSV files
└── cohort=sy_<id>/
├── snapshot.json
├── metrics.json
├── metadata.json
└── diff_prev.json
Key directories:
- `CROSS_SECTIONAL/` - Multi-symbol (panel) runs
- `SYMBOL_SPECIFIC/symbol=XXX/` - Per-symbol runs
| File | Purpose |
|---|---|
| `CONFIG/pipeline/training/reproducibility.yaml` | Default determinism settings |
| `CONFIG/experiments/determinism_test.yaml` | Test config with parallelism disabled |
| `bin/run_deterministic.sh` | Launcher script |
reproducibility:
  mode: strict  # or best_effort
  seed: 42
  version: v1
  strict:
    require_env_vars: true
    disable_gpu_tree_models: true  # CRITICAL: GPU causes non-determinism
    force_single_thread: true  # Conservative (may not be required)
    enforce_stable_ordering: true  # Sort features/targets for stability

Key insight: `disable_gpu_tree_models: true` is the critical setting. GPU floating-point operations have non-deterministic ordering. CPU operations with proper seeding are deterministic even with multithreading.
- Always use strict mode for compliance-critical runs
- Store fingerprints in your audit database
- Compare fingerprints to detect code/environment drift
Strict determinism only guarantees reproducibility within the same environment:
- Same Python version
- Same library versions (lightgbm, xgboost, numpy, etc.)
- Same CPU architecture
Pin versions in requirements.txt or use Docker for full reproducibility.
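One way to notice environment drift is to record an environment fingerprint next to each run. The sketch below is an assumption-laden example rather than something the pipeline is known to provide: the function name, the package list, and the choice of fields are illustrative.

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_fingerprint(packages=("lightgbm", "xgboost", "numpy")):
    """Return (info, sha256_hex) describing the current environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # Not installed in this environment
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # CPU architecture, e.g. x86_64
        "packages": versions,
    }
    # sort_keys makes the serialized form, and therefore the hash, stable.
    payload = json.dumps(info, sort_keys=True).encode("utf-8")
    return info, hashlib.sha256(payload).hexdigest()


info, digest = environment_fingerprint()
print(digest)
```

If two runs report different digests, their results are not expected to match bitwise even in strict mode, and the `info` dictionary shows which component changed.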
- Run determinism test before production deployments
- Store prediction hashes alongside model artifacts
- Document any fingerprint changes in change log
Use the launcher script:
bin/run_deterministic.sh python your_script.py

Ensure repro_bootstrap is imported FIRST in your entrypoint:

import TRAINING.common.repro_bootstrap  # MUST be first
import numpy as np  # Now safe

Check:
- Parallelism is disabled in config
- Using the launcher script
- Same library versions
- No external randomness (e.g., shuffled data loading)
from TRAINING.common.determinism import (
    create_estimator,  # Single choke point for model creation
    seed_all,  # Set all random seeds
    resolve_seed,  # SHA256-based seed derivation
    is_strict_mode,  # Check if strict mode enabled
    stable_sort,  # Deterministic ordering
    load_reproducibility_config,  # Load SST config
)

import TRAINING.common.repro_bootstrap  # FIRST!
from TRAINING.common.determinism import create_estimator, seed_all, resolve_seed

seed_all(42)

# Create model with determinism params automatically applied
model = create_estimator(
    library="lightgbm",
    base_config={"n_estimators": 100},
    seed=resolve_seed(42, "training", target="fwd_ret_10m"),
    problem_kind="regression",
)