
fix(weka): guarantee trailing-user turns; drop raw-payload re-validation + fix batch-flush shutdown loss #12

Merged
cquil11 merged 2 commits into SemiAnalysisAI:cquil11/agentx-v0.4-sync from ajcasagrande:ajc/fix-trailing-assistant-roles
Jun 30, 2026


Conversation

@ajcasagrande


Two independent fixes on top of cquil11/agentx-v0.4-sync.

1. Weka: guarantee trailing-user turns (feat(weka))

Reconstructed weka turns could end with a trailing assistant segment when a block-aligned context pull-back truncated onto an assistant block — 376 / 98,827 turns (0.38%) on the 062126 corpus. A chat request must end with a user message.

  • New compute_asst_block_caps: a role-independent forward pass that finds every degenerate pull-back's truncation target and caps the assistant block count of the turn that created that block, so the boundary lands on a user block. The boundary is fixed at creation and never relabeled, so cross-turn KV-cache reuse is preserved (relabeling after emission would change the already-cached prefix).
  • Wired into all five reconstruction loops (serial parent/child/flat-chain + parallel-worker parent/child).
  • Extracted compute_turn_block_geometry as the single source of truth shared by advance_turn and the planner; the planner mutates its block tile in place (O(new) work per turn).
  • The only theoretically-unfixable shape (a turn that appends zero new tokens, e.g. a system-only turn 0) stays flagged on _trailing_non_user_turns; none occur in the corpus.
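
The capping idea in the list above can be illustrated with a toy model. This is a sketch under stated assumptions, not the PR's compute_asst_block_caps: the Turn fields (keep, new_asst, new_user) and the helpers compute_caps and final_roles are hypothetical names, blocks are plain indices, and a degenerate pull-back is modeled as a turn that truncates the context to keep blocks and appends nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    keep: int      # blocks retained from the previous turn's context
    new_asst: int  # new blocks initially attributed to the assistant
    new_user: int  # new blocks attributed to the user


def compute_caps(turns):
    """Forward pass over block geometry only (no roles are consulted).

    Returns {turn index: max assistant blocks that turn may create}.
    """
    caps = {}
    owner = []  # owner[b] = (creating turn, first block index that turn created)
    for i, turn in enumerate(turns):
        del owner[turn.keep:]  # pull-back: drop blocks past the truncation target
        new = turn.new_asst + turn.new_user
        if new == 0 and owner:
            # Nothing is appended, so the last kept block ends this turn.
            # Cap its creator so that this block is labeled "user".
            creator, start = owner[-1]
            limit = (len(owner) - 1) - start
            caps[creator] = min(caps.get(creator, limit), limit)
        start = len(owner)
        owner.extend([(i, start)] * new)
    return caps


def final_roles(turns, caps):
    """Label blocks once, at creation, and report each turn's last role."""
    roles, last = [], []
    for i, turn in enumerate(turns):
        del roles[turn.keep:]
        new = turn.new_asst + turn.new_user
        n_asst = min(turn.new_asst, caps.get(i, turn.new_asst))
        # The total number of new blocks is unchanged; only the split moves.
        roles.extend(["assistant"] * n_asst + ["user"] * (new - n_asst))
        last.append(roles[-1] if roles else None)
    return last


turns = [Turn(0, 0, 2), Turn(2, 3, 1), Turn(4, 0, 0)]
uncapped = final_roles(turns, {})
capped = final_roles(turns, compute_caps(turns))
```

In this toy trace the third turn truncates onto a block that the second turn created as an assistant block, so the uncapped labeling ends that turn on "assistant". Because the cap is computed before any block is labeled, the second turn emits the block as "user" from the start and no later relabeling is needed.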

Validated on the full 393-trace corpus: trailing-assistant 376 → 0, 0 crashes, byte-exact sum(seg.tokens) == in_tokens preserved.
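
The two invariants named in the validation line can be written as a small per-turn check. This is a minimal sketch: the Segment shape (role, tokens) and the turn_problems helper are assumptions made for illustration, not the project's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    role: str    # "system", "user" or "assistant"
    tokens: int  # token count attributed to this segment


def turn_problems(segments, in_tokens):
    """Return the names of the invariants that a reconstructed turn violates."""
    problems = []
    # A chat request must end with a user message.
    if not segments or segments[-1].role != "user":
        problems.append("trailing-non-user")
    # Reconstruction must account for every input token exactly.
    if sum(seg.tokens for seg in segments) != in_tokens:
        problems.append("token-sum-mismatch")
    return problems


bad = turn_problems([Segment("user", 5), Segment("assistant", 3)], in_tokens=8)
good = turn_problems([Segment("assistant", 3), Segment("user", 5)], in_tokens=8)
```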

2. Raw payload: drop per-record orjson re-validation + fix batch-flush shutdown loss (fix(raw-payload))

  • Removed the per-send orjson.loads round-trip in InferenceClient and the per-record orjson.loads validation in RawRecordWriterProcessor. payload_bytes are validated at dataset-load time (or produced by orjson.dumps), so re-parsing every request/record only reintroduced the decode cost that the verbatim / orjson.Fragment fast paths exist to avoid. Invalid bytes are now forwarded or spliced verbatim.
  • Fixed a pre-existing shutdown data-loss race: RawRecordWriterProcessor's batch-trigger flush task was scheduled via execute_async without being registered in _flush_tasks, so _stop_all_tasks cancelled it before _close_file awaited it — losing the whole batch on stop. Now registered like the parent BufferedJSONLWriterMixin.
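
The shutdown race in the last bullet can be reproduced in a toy asyncio model. This is a sketch under assumptions, not the aiperf classes: ToyWriter, its background and flush_tasks sets, and the register_flush switch are hypothetical stand-ins for the unregistered and registered scheduling paths.

```python
import asyncio


class ToyWriter:
    def __init__(self, register_flush):
        self.register_flush = register_flush
        self.background = set()   # tasks that stop() cancels
        self.flush_tasks = set()  # tasks that stop() awaits to completion
        self.written = []

    async def _flush(self, batch):
        await asyncio.sleep(0.01)  # stand-in for file I/O
        self.written.extend(batch)

    def on_batch_full(self, batch):
        task = asyncio.create_task(self._flush(batch))
        # The bug: the flush was tracked only as a generic background task.
        target = self.flush_tasks if self.register_flush else self.background
        target.add(task)

    async def stop(self):
        # Cancel generic tasks first, then wait for registered flushes.
        for task in self.background:
            task.cancel()
        await asyncio.gather(*self.background, return_exceptions=True)
        await asyncio.gather(*self.flush_tasks)


async def _run(register_flush):
    writer = ToyWriter(register_flush)
    writer.on_batch_full(["a", "b", "c"])
    await writer.stop()
    return writer.written


def demo(register_flush):
    return asyncio.run(_run(register_flush))


lost = demo(register_flush=False)
kept = demo(register_flush=True)
```

With register_flush=False the flush task is cancelled before it ever writes, so the whole batch is dropped; registering it makes stop() wait for the write to finish.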

Testing

  • uv run pytest tests/unit/dataset/loader/ — green (new helper / cap / tool-shaping tests added).
  • uv run pytest tests/unit/post_processors/test_raw_record_writer_adversarial.py — green (adversarial tests updated to verbatim-splice behavior; the batch-flush shutdown test now passes).
  • uv run pytest tests/component_integration/dataset/test_raw_payload_replay_adversarial.py -m component_integration — green.
  • Full 393-trace corpus reconstruction: 0 trailing-assistant, no crashes.

Pre-existing unrelated failure, not touched here: tests/unit/dataset/loader/test_weka_trace.py::test_flattened_fanout_logs_detection_summary (a log-string drift in code outside this change).

🤖 Generated with Claude Code

ajcasagrande and others added 2 commits June 30, 2026 10:29
Reconstructed weka turns could end with a trailing assistant segment when a block-aligned context pull-back truncated onto an assistant block (376/98,827 turns on the 062126 corpus). A chat request must end with a user message.

Add compute_asst_block_caps: a role-independent forward pass that finds every degenerate pull-back's truncation target and caps the assistant block count of the turn that created that block, so the boundary lands on a user block. The boundary is fixed at creation and never relabeled, preserving cross-turn KV-cache reuse. Wire the cap into all five reconstruction loops (serial parent/child/flat-chain + parallel parent/child). Extract compute_turn_block_geometry as the single source of truth shared by advance_turn and the planner; the planner mutates its block tile in place to stay O(new) per turn.

Drives trailing-assistant turns to 0 across the full 393-trace corpus with no crashes; byte-exact sum(seg.tokens)==in_tokens preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…sh shutdown loss

Remove the per-send orjson.loads round-trip in InferenceClient and the per-record orjson.loads validation in RawRecordWriterProcessor. payload_bytes are validated at dataset-load time (or produced by orjson.dumps), so re-parsing every request/record only reintroduces the decode cost the verbatim / orjson.Fragment paths exist to avoid. Invalid bytes now forward/splice verbatim.

Also fix a pre-existing shutdown data-loss race: RawRecordWriterProcessor's batch-trigger flush task was scheduled via execute_async without being registered in _flush_tasks, so _stop_all_tasks cancelled it before _close_file awaited it, losing the whole batch on stop. Register the task like the parent BufferedJSONLWriterMixin does.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
@github-actions


Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@e165d787faa8ba8de34feb6e5e3525eaa33a82f0

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@e165d787faa8ba8de34feb6e5e3525eaa33a82f0

Last updated for commit: e165d78

@cquil11 cquil11 merged commit cec702b into SemiAnalysisAI:cquil11/agentx-v0.4-sync Jun 30, 2026
1 of 2 checks passed