Skip to content

[mirror] fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#1

Open
yashwant86 wants to merge 8 commits intomm-base-12918from
mm-pr-12918
Open

[mirror] fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#1
yashwant86 wants to merge 8 commits intomm-base-12918from
mm-pr-12918

Conversation

@yashwant86
Copy link
Copy Markdown

@yashwant86 yashwant86 commented Apr 26, 2026

Mirror of upstream Significant-Gravitas#12918 for benchmark. Do not merge.


Summary by MergeMonkey

  • Paper Trail:
    • Added comprehensive test suite for partial-work preservation on SDK retry failures.
    • Documented error marker stripping and orphan tool-use flushing behavior.
    • Clarified flush_unresolved_tool_calls contract: single invocation, shared event list.
  • New Features:
    • Preserve streamed assistant content when SDK retry loop fails, preventing user-visible data loss.
    • Synthesize tool_result rows for orphan tool_use blocks on final failure.
    • Restore unsent user text to composer on 429 rate-limit, drop optimistic bubble.
  • Squashed Bugs:
    • Fixed regression SECRT-2275: rolled-back partial work now re-attached on final-failure exit.
    • Fixed 429 rate-limit handling: unsent message text now recoverable in composer.
  • Tidying Up:
    • Renamed _flush_unresolved_tool_calls to public flush_unresolved_tool_calls.
    • Extracted TranscriptSnapshot type alias for clarity.
    • Added pyright ignore directive for complex function.

majdyz and others added 8 commits April 25, 2026 08:16
…failure exit

SECRT-2275 — when an SDK turn was interrupted (transient API errors with
exhausted retries, mid-stream LLM exceptions, or context-overflow with all
attempts exhausted) the retry loop's pre-decision rollback discarded the
assistant's partial work (text + tool calls + reasoning) that had been
incrementally appended to session.messages during the failed attempt.

Users described it as "the turn is gone": their UI streamed tokens live, then
a refresh showed an empty turn and the next message would prompt the model
to "continue" with no context, so it picked an unrelated old task.

Fix: capture the rolled-back partial in the retry-loop exception handlers and
re-attach it via a single helper on every final-failure branch (including
the events_yielded > 0 path that previously skipped the error marker entirely
and the non-context-non-transient + attempts-exhausted paths). Synthesize
"interrupted" tool_result rows for any orphan tool_use so the next turn's
LLM context stays API-valid. Successful retry breaks clear the captured
partial so attempt #1's rolled-back content doesn't leak into a successful
attempt #2's history.

Baseline path already preserves partial via its existing finally block; only
SDK was affected.
…block type-check

The inline _restore_partial_with_error_marker calls across five retry-loop
branches pushed stream_chat_completion_sdk past pyright's complexity heuristic
(CI type-check failed on main). Consolidate into a single post-loop block keyed
off ended_with_stream_error + the existing attempts_exhausted / transient_exhausted
/ stream_err flags, plus a new handled_error_info tuple that carries
_HandledStreamError's final-yield decision out of the retry loop.

Behaviour is unchanged — same restore semantics, same client-facing StreamError
sequencing, same transcript-upload skip. Confirmed with 319 existing + new
tests (backend/copilot/sdk + baseline).

Pyright still bails on the function body (1500 LoC — the retry loop with
context-overflow fallback + transient backoff + partial-work preservation
shares too much state across branches to split cleanly without hurting
readability). A file-targeted reportGeneralTypeIssues suppression covers the
complexity bailout while keeping real type errors elsewhere in the file
surfaced.
…ial + carry retryable through _HandledStreamError

Two fixes layered on the partial-restore path introduced by this PR:

1. _rollback_attempt_capturing_partial now drops trailing error markers
   (COPILOT_ERROR_PREFIX / COPILOT_RETRYABLE_ERROR_PREFIX) from the
   captured partial. _run_stream_attempt's idle-timeout and
   circuit-breaker paths append a marker via _append_error_marker BEFORE
   raising _HandledStreamError; without this filter the post-loop
   restore would replay the stale marker and then add a fresh one,
   leaving duplicate error bubbles and pushing any synthetic tool_result
   after an assistant(error) turn that has no matching tool_use.

2. Replace the (msg, code, already_yielded) 3-tuple carrying
   _HandledStreamError state out of the retry loop with a frozen
   _HandledErrorInfo dataclass that also carries `retryable`. The
   post-loop block now uses exc.retryable instead of hardcoding True,
   so a future _HandledStreamError(retryable=False, ...) won't silently
   write the wrong marker prefix.

3 new tests cover the rollback marker-stripping contract.
…o _InterruptedAttempt

Previous revisions carried the failed-attempt state across three separate
function-scope variables (last_attempt_partial, handled_error_info) + four
module-level helpers (_rollback_attempt_capturing_partial,
_restore_partial_with_error_marker, _flush_orphan_tool_uses_to_session,
_append_error_marker). The retry loop mutated all three and the post-loop
block reassembled the pieces by hand. Scattered and hard to follow.

Collapse to one dataclass with capture / clear / finalize + one
_classify_final_failure helper that picks the display message based on
which failure flag the retry loop set (attempts_exhausted,
transient_exhausted, stream_err, handled_error). Call sites:

  - success break:          interrupted.clear()
  - _HandledStreamError:    interrupted.capture(...); interrupted.handled_error = ...
  - Exception:              interrupted.capture(...)
  - post-loop:              final_msg, retryable = _classify_final_failure(interrupted, ...); interrupted.finalize(...)
  - outer except:           interrupted.finalize(...)

Behaviour is unchanged — same restore semantics, same StreamError
sequencing, same transcript-upload skip, same orphan tool_use flush, same
stale-marker stripping from b1172e2 / 5406fe9. The retry-scenarios
suite (48 integration tests) plus the rewritten interrupted_partial_test
(14 unit tests) both pass; the full SDK test suite (1012 tests) is green.
…final-failure emit

CodeRabbit flagged that _flush_orphan_tool_uses_to_session (called from
_InterruptedAttempt.finalize) used state.adapter._flush_unresolved_tool_calls
with a # noqa: SLF001 suppressor. The private call mutates resolved_tool_calls
and flips has_unresolved_tool_calls to False, which caused the downstream
error-cleanup block at lines 4185-4200 to skip its own flush — UI spinners
on the client stayed open until page refresh because no cleanup events were
yielded after the early flush swallowed the unresolved state.

Changes:
- Rename _flush_unresolved_tool_calls → flush_unresolved_tool_calls (public)
  in response_adapter.py; update 3 internal call sites + 2 service.py sites.
  Drops the # noqa: SLF001 suppressor (no longer a private-access violation).
- _flush_orphan_tool_uses_to_session and _InterruptedAttempt.finalize now
  return the list[StreamBaseResponse] produced by the flush so the caller
  yields them to the client instead of re-flushing.
- Replace the three scattered post-loop error blocks (partial restore +
  redundant flush + stream_err yield + handled_error yield) with one
  consolidated block that: (a) calls _classify_final_failure → _FinalFailure,
  (b) yields finalize()'s events + _end_text_if_open, (c) yields one
  StreamError (unless handled_error.already_yielded=True). Fixes the
  double-flush skip-cleanup bug and eliminates duplicated error-text/code
  strings between history marker and SSE yield.
- _classify_final_failure now returns _FinalFailure(display_msg, code,
  retryable) instead of a (msg, retryable) tuple — single source of truth
  for in-history marker + SSE event so they can't drift.

Tests: +5 _classify_final_failure contract tests, +2 return-value assertions
on finalize/orphan-flush. All 1022 SDK tests pass (was 1012).
…-limit

Mode 1 (rate-limit at turn start, user message never persisted): the
backend's `check_rate_limit` raises BEFORE `append_and_save_message`, so
when a 429 fires the user's text only exists in the optimistic `useChat`
bubble — refresh or even a successful retry would lose it.

See `autogpt_platform/backend/backend/api/features/chat/routes.py:916-922`
(rate-limit check) and `routes.py:945` (later append-and-save) — backend
can't recover this on its own.

New flow on 429:
  - drop the optimistic user bubble (since DB has no record of it),
  - push `lastSubmittedMsgRef.current` back into the composer via the
    existing `setInitialPrompt` slot — same path URL pre-fills use, so
    `useChatInput`'s `consumeInitialPrompt` effect picks it up
    automatically,
  - clear `lastSubmittedMsgRef` so dedup doesn't block re-send.

In-memory only; refresh-survival is a separate follow-up.
Adds two cases that the existing 429 test did not exercise so codecov/patch
clears the 80% threshold:
1. The setMessages updater is invoked but no-ops when the trailing message
   is not a user bubble (assistant reply already landed).
2. The composer-restore branch is skipped entirely when no unsent text was
   captured (lastSubmittedMsgRef is null at error time).
Replace `transcript_snap: object` + `# type: ignore[arg-type]` on the
restore() call with a `TranscriptSnapshot` type alias exported from
transcript_builder, so `_InterruptedAttempt.capture` is fully typed.
@bot-mergemonkey
Copy link
Copy Markdown

bot-mergemonkey Bot commented Apr 26, 2026

Risk AssessmentCRITICAL · ~45 min review

Focus areas: Partial-work capture/finalize contract across retry loop · Orphan tool-use flushing and event deduplication · Rate-limit recovery and optimistic bubble rollback · Error classification unification (history + SSE)

Assessment: Fixes data-loss regression (SECRT-2275) affecting user-visible chat history and error handling.

Walkthrough

When an SDK retry loop fails, the system now captures rolled-back messages via _InterruptedAttempt.capture() before the rollback, preserving what the user saw streaming. On final-failure exit, finalize() re-attaches the partial work, flushes any orphan tool_use blocks as synthetic tool_result rows, and appends an error marker — all persisted to session history and yielded to the client. On frontend 429 rate-limit, the unsent message text is restored to the composer and the optimistic user bubble is dropped, allowing the user to retry after the limit resets.

Changes

Files Summary
Partial-Work Preservation on Retry Failure
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
Introduces _InterruptedAttempt dataclass to capture rolled-back messages + error state across retry loop. Implements capture() to preserve partial work before rollback, finalize() to re-attach on final failure, and _flush_orphan_tool_uses_to_session() to synthesize missing tool_result rows. Adds _classify_final_failure() to unify error classification for both history markers and SSE yields. Fixes SECRT-2275 regression where users saw empty turns after refresh.
Tool-Call Flushing Contract & Visibility
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
Renamed _flush_unresolved_tool_calls to public flush_unresolved_tool_calls and updated all call sites. Enhanced docstring to clarify single-invocation contract: method mutates adapter state, so callers must share the returned event list to avoid stale UI elements (spinners) staying open.
Rate-Limit Recovery in Frontend
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts
On 429 usage-limit error, restores unsent message text to composer via setInitialPrompt and drops the optimistic user bubble so user can edit/resend after rate-limit reset. Includes tests for text restoration, bubble removal, and edge cases (no captured text, non-user trailing message).
Type Clarity & Refactoring
autogpt_platform/backend/backend/copilot/transcript_builder.py
Extracted TranscriptSnapshot type alias for snapshot/restore methods. Improves readability and enables reuse in _InterruptedAttempt.capture().

Sequence Diagram

sequenceDiagram
    participant RetryLoop as Retry Loop
    participant Attempt as _InterruptedAttempt
    participant Session as ChatSession
    participant Adapter as ResponseAdapter
    participant Client as Frontend
    
    RetryLoop->>Attempt: capture(session, builder, snap, pre_count)
    Attempt->>Attempt: strip trailing error markers
    Attempt->>Session: roll back messages to pre_count
    Attempt->>Attempt: store partial work
    
    Note over RetryLoop: Attempt fails, decide no retry
    
    RetryLoop->>Attempt: finalize(session, state, msg, retryable)
    Attempt->>Session: re-attach partial messages
    Attempt->>Adapter: flush_unresolved_tool_calls()
    Adapter->>Session: append synthetic tool_result rows
    Adapter-->>Attempt: return events
    Attempt->>Session: append error marker
    Attempt-->>RetryLoop: return events
    
    RetryLoop->>Client: yield events + StreamError
    Client->>Client: render partial work + error bubble
    
    Note over Client: On 429 rate-limit
    Client->>Client: restore unsent text to composer
    Client->>Client: drop optimistic user bubble
Loading

Dig Deeper With Commands

  • /review <file-path> <function-optional>
  • /chat <file-path> "<question>"
  • /roast <file-path>

Runs only when explicitly triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants