feat: accuracy issuer inherits perf concurrency in online mode (#357) #379
arekay-nv wants to merge 6 commits into
When the performance phase runs the CONCURRENCY load pattern (online), the accuracy phase now mirrors that same fixed concurrency instead of always bursting at MAX_THROUGHPUT, so evaluation exercises the endpoint the same way as the performance run.
All other patterns are unchanged: POISSON and offline MAX_THROUGHPUT perf phases keep the accuracy phase at MAX_THROUGHPUT, since inheriting POISSON would silently rate-limit evaluation to the perf QPS (no accuracy QPS-budgeting yet). The gate is purely load_pattern.type == CONCURRENCY, which the schema already constrains to online mode.
Also logs the accuracy issuer's chosen load mode (pattern + target_concurrency) per accuracy dataset. Adds unit tests for the concurrency-inheritance, POISSON-stays-max-throughput, offline-stays-max-throughput, and logging cases.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
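A minimal sketch of the selection rule described above, for readers who want to see the gate in isolation. The LoadPatternType and LoadPattern classes here are simplified stand-ins written for this example, and accuracy_load_pattern, target_concurrency as a dataclass field, and target_qps are assumed names, not the project's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LoadPatternType(Enum):
    # Stand-in enum (assumption): only the patterns named in this description.
    MAX_THROUGHPUT = "max_throughput"
    POISSON = "poisson"
    CONCURRENCY = "concurrency"


@dataclass(frozen=True)
class LoadPattern:
    # Stand-in dataclass (assumption): field names are illustrative.
    type: LoadPatternType
    target_concurrency: Optional[int] = None
    target_qps: Optional[float] = None


def accuracy_load_pattern(perf_lp: Optional[LoadPattern]) -> LoadPattern:
    """Pick the accuracy-phase pattern from the perf-phase pattern.

    Hypothetical helper: only CONCURRENCY is inherited; every other case
    (POISSON, MAX_THROUGHPUT, or no perf pattern) uses MAX_THROUGHPUT.
    """
    if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:
        # Mirror the fixed concurrency used by the performance run.
        return LoadPattern(
            type=LoadPatternType.CONCURRENCY,
            target_concurrency=perf_lp.target_concurrency,
        )
    # Inheriting POISSON would cap evaluation at the perf QPS, so fall back.
    return LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)
```

Under these assumptions, a perf pattern of CONCURRENCY with target_concurrency=8 yields the same pattern for accuracy, while a POISSON perf pattern yields MAX_THROUGHPUT with no rate limit carried over.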
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request updates the benchmark execution logic so that the accuracy phase mirrors the fixed concurrency of the performance phase when a CONCURRENCY load pattern is used, while continuing to default to MAX_THROUGHPUT for other patterns (such as POISSON). It also adds logging for the accuracy issuer's load mode and includes comprehensive unit tests to verify these behaviors. There are no review comments, and I have no additional feedback to provide.
Signed-off-by: arekay-nv <230885705+arekay-nv@users.noreply.github.com>
# the (non-agentic) accuracy datasets — create_load_strategy rejects it —
# so it (and a missing perf pattern) falls back to MAX_THROUGHPUT.
perf_lp = ctx.rt_settings.load_pattern
if perf_lp is None or perf_lp.type == LoadPatternType.AGENTIC_INFERENCE:
@hvagadia is this the intended behavior for agentic workloads? I might be missing something here, but I thought we should use the same load pattern there as well.
For our standalone accuracy runs, we will likely have to use much lower concurrency than for performance because of Docker overhead. @tianmu-li is working on the PR and will likely add a separate field to control the accuracy load pattern.
# and QPS-budgeting support are added.
acc_load_pattern: LoadPattern | None = LoadPattern(
    type=LoadPatternType.MAX_THROUGHPUT
if acc_load_pattern.type == LoadPatternType.CONCURRENCY:
Seems like LoadPatternType could define a __str__ (and/or __repr__) method so we can print it directly instead of using an if statement here.
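One way the suggestion above could look, as a rough sketch only. The enum below is a stand-in with assumed member values, and describe_load_mode is a hypothetical helper invented for illustration; the real logging call and message format in the PR may differ.

```python
from enum import Enum
from typing import Optional


class LoadPatternType(Enum):
    # Stand-in enum (assumption): member values chosen for readable logs.
    MAX_THROUGHPUT = "max_throughput"
    CONCURRENCY = "concurrency"

    def __str__(self) -> str:
        # Print the plain value rather than "LoadPatternType.CONCURRENCY".
        return self.value


def describe_load_mode(
    pattern_type: LoadPatternType, target_concurrency: Optional[int] = None
) -> str:
    """Hypothetical helper: format the load mode without branching on type."""
    return f"pattern={pattern_type}, target_concurrency={target_concurrency}"
```

With this shape the log line is built the same way for every pattern, and target_concurrency simply renders as None when the pattern does not use it.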