Skip to content

feat(config): add v6.1 ruleset and pin RNG seeds from submission ruleset#389

Open
arekay-nv wants to merge 1 commit into
mainfrom
feat/random-seeds
Open

feat(config): add v6.1 ruleset and pin RNG seeds from submission ruleset#389
arekay-nv wants to merge 1 commit into
mainfrom
feat/random-seeds

Conversation

@arekay-nv

Copy link
Copy Markdown
Collaborator

Adds the MLPerf Inference v6.1 RoundRuleset (schedule/sample_index seeds from loadgen/mlperf.conf; per-model latency targets unchanged from v5.1) and makes it CURRENT. The registry now registers every known round by version, so both v5.1 and v6.1 are resolvable.

When a config carries submission_ref, _resolve_and_validate now pins the runtime RNG seeds from the selected ruleset during construction — before the config is dumped to the report dir, so the persisted config.yaml, RuntimeSettings, and report all reflect the seeds that actually ran. The ruleset value wins over any user-supplied seed (mirrors LoadGen locking core seeds from user.conf). Lenient by design: an unregistered ruleset leaves the config unchanged, so placeholder/non-submission configs are unaffected.

Duration/latency overrides and the runtime-path integration (surfacing ruleset latency targets in the report) are deferred to a follow-up.

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Adds the MLPerf Inference v6.1 RoundRuleset (schedule/sample_index seeds from
loadgen/mlperf.conf; per-model latency targets unchanged from v5.1) and makes
it CURRENT. The registry now registers every known round by version, so both
v5.1 and v6.1 are resolvable.

When a config carries submission_ref, _resolve_and_validate now pins the
runtime RNG seeds from the selected ruleset during construction — before the
config is dumped to the report dir, so the persisted config.yaml,
RuntimeSettings, and report all reflect the seeds that actually ran. The
ruleset value wins over any user-supplied seed (mirrors LoadGen locking core
seeds from user.conf). Lenient by design: an unregistered ruleset leaves the
config unchanged, so placeholder/non-submission configs are unaffected.

Duration/latency overrides and the runtime-path integration (surfacing ruleset
latency targets in the report) are deferred to a follow-up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@arekay-nv arekay-nv requested a review from a team July 2, 2026 19:59
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested a review from nvzhihanj July 2, 2026 19:59

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the MLPerf Inference v6.1 ruleset, including its official RNG seeds, and updates the default submission template to reference v6.1. It also implements a mechanism in BenchmarkConfig to automatically override and pin runtime RNG seeds based on the selected submission ruleset, ensuring compliance. Feedback is provided regarding the definition of _v6_1, where copying the mutable benchmark_rulesets dictionary instead of sharing its reference with _v5_1 is recommended to prevent potential side effects.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

version="v6.1",
scheduler_rng_seed=3936089224930324775,
sample_index_rng_seed=14276810075590677512,
benchmark_rulesets=_v5_1.benchmark_rulesets,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Sharing the mutable benchmark_rulesets dictionary reference between _v5_1 and _v6_1 can lead to unexpected side effects if the rulesets for one round are modified or extended in the future. It is safer to copy the nested dictionary to ensure round isolation.

Suggested change
benchmark_rulesets=_v5_1.benchmark_rulesets,
benchmark_rulesets={model: rules.copy() for model, rules in _v5_1.benchmark_rulesets.items()},

@arekay-nv arekay-nv left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Council — Multi-AI Code Review

Reviewed by: Codex + Claude | Depth: thorough

Codex: no actionable correctness issues. Claude: 5 posted (1 shared-dict finding on rules.py:282 was dropped as a duplicate of the existing gemini-code-assist comment). See the summary comment below for the tiered breakdown.

return

updates: dict[str, int] = {}
if ruleset.scheduler_rng_seed is not None:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude] medium (testing): The two is not None seed branches and the if not updates: return early-exit are never exercised — both registered MLCommons rulesets set both seeds. ruleset_base.py documents None as a valid "unseeded randomization" value, so a partial ruleset (one seed None) or a no-op ruleset (both None) is a supported, reachable state. Add a test that registers a throwaway RoundRuleset(scheduler_rng_seed=None, sample_index_rng_seed=<int>) (and one with both None) and asserts only the set seed is pinned / the config is left unchanged.

"""Only the round seeds rotate; per-model targets are identical."""
v5_1 = get_ruleset("mlperf-inference-v5.1")
v6_1 = get_ruleset("mlperf-inference-v6.1")
assert v6_1.benchmark_rulesets == v5_1.benchmark_rulesets

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude] low (testing): v6_1.benchmark_rulesets == v5_1.benchmark_rulesets is identity-trivially true — _v6_1 is built with benchmark_rulesets=_v5_1.benchmark_rulesets (the same dict object), so this passes without actually verifying the per-model targets. (If the shared reference is replaced by a copy — see the existing suggestion on rules.py:282 — this becomes a real structural check.) Consider asserting a concrete expected value, e.g. a specific model's max_tpot_latency_ms, instead.

# Also register as "mlcommons-current" for convenience
# Register every known round under its version-specific name
for ruleset in mlcommons_rounds:
register_ruleset(f"mlperf-inference-{ruleset.version}", ruleset)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude] low (design): register_ruleset does _RULESET_REGISTRY[name] = ruleset with no duplicate guard, so if two ALL_ROUNDS entries ever share a version string the earlier one is silently overwritten — a subtle trap for whoever adds the next round. Consider raising/logging on an already-present key, and/or a test asserting len(ALL_ROUNDS) == len({r.version for r in ALL_ROUNDS}).

# which the sample-index shuffle already covers).
_v6_1 = RoundRuleset(
version="v6.1",
scheduler_rng_seed=3936089224930324775,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude] low (verify before merge): These seeds are compliance-locked, so it's worth confirming them against the tagged v6.1 loadgen/mlperf.conf. They match the repo's own docs/RANDOM_SEEDS.md table (sourced from mlperf.conf @ v6.0.0pre) and the schedule_rng_seed→scheduler / sample_index_rng_seed→dataloader mapping is consistent — but I can't verify from here that the round didn't rotate seeds between v6.0.0pre and v6.1. A maintainer should confirm against the final round config.

"settings",
self.settings.model_copy(update={"runtime": new_runtime}),
)
logger.info(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude] low (log noise): logger.info("Pinned RNG seeds ...") fires on every construction. with_updates re-runs all validators, and the CLI reconstructs submission configs at least twice (from_yaml_file, then with_updates(timeout=...)/with_updates(datasets=...)), so a single run logs this line multiple times. The override itself is correctly idempotent — consider demoting to debug to cut the noise.

@arekay-nv

Copy link
Copy Markdown
Collaborator Author

Review Council — Multi-AI Code Review

Reviewed by: Codex (gpt-5.5, xhigh) + Claude | Depth: thorough

Codex: "changes consistently register the new MLCommons round and apply the selected ruleset seeds into runtime configuration without introducing an evident functional regression. No actionable correctness issues."

Claude: 6 findings; 1 dropped as a duplicate of the existing gemini-code-assist comment on rules.py:282 (shared mutable-dict reference). 5 posted inline.

No critical/high issues. The core logic (frozen-model mutation via model_copy + object.__setattr__, lenient resolution, ruleset-wins seed lock, before-dump ordering) reviewed clean under both reviewers.

🟡 Should Fix (medium)

# File Line Category Reviewer Summary
1 config/schema.py 1055 testing Claude is not None seed branches + if not updates early-exit are untested — a None-seed ruleset is a documented, supported state

🔵 Consider (low)

# File Line Category Reviewer Summary
2 .../mlcommons/test_rules.py 147 testing Claude benchmark_rulesets == assertion is identity-trivially true (same dict object) — verifies nothing
3 config/ruleset_registry.py 85 design Claude No duplicate-version guard — a future round sharing a version string would be silently overwritten
4 .../mlcommons/rules.py 280 docs/verify Claude Compliance-locked seeds match docs/RANDOM_SEEDS.md (@ v6.0.0pre) but should be confirmed against the tagged v6.1 mlperf.conf
5 config/schema.py 1068 error-handling Claude logger.info("Pinned RNG seeds …") fires multiple times per run (with_updates re-validates) — consider debug

Related: the existing gemini-code-assist comment on rules.py:282 (copy the shared benchmark_rulesets dict) would also resolve finding #2 if applied.

Commit hygiene: 1 commit, 0 fixups — clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant