[mirror] fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit by yashwant86 · Pull Request #1 · Mergemonkey-demo/AutoGPT

yashwant86 · 2026-04-26T14:30:43Z

Mirror of upstream Significant-Gravitas#12918 for benchmark. Do not merge.

Summary by MergeMonkey

Paper Trail:
- Added comprehensive test suite for partial-work preservation on SDK retry failures.
- Documented error marker stripping and orphan tool-use flushing behavior.
- Clarified flush_unresolved_tool_calls contract: single invocation, shared event list.
New Features:
- Preserve streamed assistant content when SDK retry loop fails, preventing user-visible data loss.
- Synthesize tool_result rows for orphan tool_use blocks on final failure.
- Restore unsent user text to composer on 429 rate-limit, drop optimistic bubble.
Squashed Bugs:
- Fixed regression SECRT-2275: rolled-back partial work now re-attached on final-failure exit.
- Fixed 429 rate-limit handling: unsent message text now recoverable in composer.
Tidying Up:
- Renamed _flush_unresolved_tool_calls to public flush_unresolved_tool_calls.
- Extracted TranscriptSnapshot type alias for clarity.
- Added pyright ignore directive for complex function.

…failure exit SECRT-2275 — when an SDK turn was interrupted (transient API errors with exhausted retries, mid-stream LLM exceptions, or context-overflow with all attempts exhausted) the retry loop's pre-decision rollback discarded the assistant's partial work (text + tool calls + reasoning) that had been incrementally appended to session.messages during the failed attempt. Users described it as "the turn is gone": their UI streamed tokens live, then a refresh showed an empty turn and the next message would prompt the model to "continue" with no context, so it picked an unrelated old task. Fix: capture the rolled-back partial in the retry-loop exception handlers and re-attach it via a single helper on every final-failure branch (including the events_yielded > 0 path that previously skipped the error marker entirely and the non-context-non-transient + attempts-exhausted paths). Synthesize "interrupted" tool_result rows for any orphan tool_use so the next turn's LLM context stays API-valid. Successful retry breaks clear the captured partial so attempt #1's rolled-back content doesn't leak into a successful attempt #2's history. Baseline path already preserves partial via its existing finally block; only SDK was affected.

…block type-check The inline _restore_partial_with_error_marker calls across five retry-loop branches pushed stream_chat_completion_sdk past pyright's complexity heuristic (CI type-check failed on main). Consolidate into a single post-loop block keyed off ended_with_stream_error + the existing attempts_exhausted / transient_exhausted / stream_err flags, plus a new handled_error_info tuple that carries _HandledStreamError's final-yield decision out of the retry loop. Behaviour is unchanged — same restore semantics, same client-facing StreamError sequencing, same transcript-upload skip. Confirmed with 319 existing + new tests (backend/copilot/sdk + baseline). Pyright still bails on the function body (1500 LoC — the retry loop with context-overflow fallback + transient backoff + partial-work preservation shares too much state across branches to split cleanly without hurting readability). A file-targeted reportGeneralTypeIssues suppression covers the complexity bailout while keeping real type errors elsewhere in the file surfaced.

…ial + carry retryable through _HandledStreamError Two fixes layered on the partial-restore path introduced by this PR: 1. _rollback_attempt_capturing_partial now drops trailing error markers (COPILOT_ERROR_PREFIX / COPILOT_RETRYABLE_ERROR_PREFIX) from the captured partial. _run_stream_attempt's idle-timeout and circuit-breaker paths append a marker via _append_error_marker BEFORE raising _HandledStreamError; without this filter the post-loop restore would replay the stale marker and then add a fresh one, leaving duplicate error bubbles and pushing any synthetic tool_result after an assistant(error) turn that has no matching tool_use. 2. Replace the (msg, code, already_yielded) 3-tuple carrying _HandledStreamError state out of the retry loop with a frozen _HandledErrorInfo dataclass that also carries `retryable`. The post-loop block now uses exc.retryable instead of hardcoding True, so a future _HandledStreamError(retryable=False, ...) won't silently write the wrong marker prefix. 3 new tests cover the rollback marker-stripping contract.

…o _InterruptedAttempt Previous revisions carried the failed-attempt state across three separate function-scope variables (last_attempt_partial, handled_error_info) + four module-level helpers (_rollback_attempt_capturing_partial, _restore_partial_with_error_marker, _flush_orphan_tool_uses_to_session, _append_error_marker). The retry loop mutated all three and the post-loop block reassembled the pieces by hand. Scattered and hard to follow. Collapse to one dataclass with capture / clear / finalize + one _classify_final_failure helper that picks the display message based on which failure flag the retry loop set (attempts_exhausted, transient_exhausted, stream_err, handled_error). Call sites: - success break: interrupted.clear() - _HandledStreamError: interrupted.capture(...); interrupted.handled_error = ... - Exception: interrupted.capture(...) - post-loop: final_msg, retryable = _classify_final_failure(interrupted, ...); interrupted.finalize(...) - outer except: interrupted.finalize(...) Behaviour is unchanged — same restore semantics, same StreamError sequencing, same transcript-upload skip, same orphan tool_use flush, same stale-marker stripping from b1172e2 / 5406fe9. The retry-scenarios suite (48 integration tests) plus the rewritten interrupted_partial_test (14 unit tests) both pass; the full SDK test suite (1012 tests) is green.

…final-failure emit CodeRabbit flagged that _flush_orphan_tool_uses_to_session (called from _InterruptedAttempt.finalize) used state.adapter._flush_unresolved_tool_calls with a # noqa: SLF001 suppressor. The private call mutates resolved_tool_calls and flips has_unresolved_tool_calls to False, which caused the downstream error-cleanup block at lines 4185-4200 to skip its own flush — UI spinners on the client stayed open until page refresh because no cleanup events were yielded after the early flush swallowed the unresolved state. Changes: - Rename _flush_unresolved_tool_calls → flush_unresolved_tool_calls (public) in response_adapter.py; update 3 internal call sites + 2 service.py sites. Drops the # noqa: SLF001 suppressor (no longer a private-access violation). - _flush_orphan_tool_uses_to_session and _InterruptedAttempt.finalize now return the list[StreamBaseResponse] produced by the flush so the caller yields them to the client instead of re-flushing. - Replace the three scattered post-loop error blocks (partial restore + redundant flush + stream_err yield + handled_error yield) with one consolidated block that: (a) calls _classify_final_failure → _FinalFailure, (b) yields finalize()'s events + _end_text_if_open, (c) yields one StreamError (unless handled_error.already_yielded=True). Fixes the double-flush skip-cleanup bug and eliminates duplicated error-text/code strings between history marker and SSE yield. - _classify_final_failure now returns _FinalFailure(display_msg, code, retryable) instead of a (msg, retryable) tuple — single source of truth for in-history marker + SSE event so they can't drift. Tests: +5 _classify_final_failure contract tests, +2 return-value assertions on finalize/orphan-flush. All 1022 SDK tests pass (was 1012).

…-limit Mode 1 (rate-limit at turn start, user message never persisted): the backend's `check_rate_limit` raises BEFORE `append_and_save_message`, so when a 429 fires the user's text only exists in the optimistic `useChat` bubble — refresh or even a successful retry would lose it. See `autogpt_platform/backend/backend/api/features/chat/routes.py:916-922` (rate-limit check) and `routes.py:945` (later append-and-save) — backend can't recover this on its own. New flow on 429: - drop the optimistic user bubble (since DB has no record of it), - push `lastSubmittedMsgRef.current` back into the composer via the existing `setInitialPrompt` slot — same path URL pre-fills use, so `useChatInput`'s `consumeInitialPrompt` effect picks it up automatically, - clear `lastSubmittedMsgRef` so dedup doesn't block re-send. In-memory only; refresh-survival is a separate follow-up.

Adds two cases that the existing 429 test did not exercise so codecov/patch clears the 80% threshold: 1. The setMessages updater is invoked but no-ops when the trailing message is not a user bubble (assistant reply already landed). 2. The composer-restore branch is skipped entirely when no unsent text was captured (lastSubmittedMsgRef is null at error time).

Replace `transcript_snap: object` + `# type: ignore[arg-type]` on the restore() call with a `TranscriptSnapshot` type alias exported from transcript_builder, so `_InterruptedAttempt.capture` is fully typed.

bot-mergemonkey · 2026-04-26T14:57:45Z

⚡ Risk Assessment — CRITICAL · ~45 min review

Focus areas: Partial-work capture/finalize contract across retry loop · Orphan tool-use flushing and event deduplication · Rate-limit recovery and optimistic bubble rollback · Error classification unification (history + SSE)

Assessment: Fixes data-loss regression (SECRT-2275) affecting user-visible chat history and error handling.

Walkthrough

When an SDK retry loop fails, the system now captures rolled-back messages via _InterruptedAttempt.capture() before the rollback, preserving what the user saw streaming. On final-failure exit, finalize() re-attaches the partial work, flushes any orphan tool_use blocks as synthetic tool_result rows, and appends an error marker — all persisted to session history and yielded to the client. On frontend 429 rate-limit, the unsent message text is restored to the composer and the optimistic user bubble is dropped, allowing the user to retry after the limit resets.

Changes

Files	Summary
Partial-Work Preservation on Retry Failure `autogpt_platform/backend/backend/copilot/sdk/service.py` `autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py`	Introduces _InterruptedAttempt dataclass to capture rolled-back messages + error state across retry loop. Implements capture() to preserve partial work before rollback, finalize() to re-attach on final failure, and _flush_orphan_tool_uses_to_session() to synthesize missing tool_result rows. Adds _classify_final_failure() to unify error classification for both history markers and SSE yields. Fixes SECRT-2275 regression where users saw empty turns after refresh.
Tool-Call Flushing Contract & Visibility `autogpt_platform/backend/backend/copilot/sdk/response_adapter.py`	Renamed _flush_unresolved_tool_calls to public flush_unresolved_tool_calls and updated all call sites. Enhanced docstring to clarify single-invocation contract: method mutates adapter state, so callers must share the returned event list to avoid stale UI elements (spinners) staying open.
Rate-Limit Recovery in Frontend `autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts` `autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts`	On 429 usage-limit error, restores unsent message text to composer via setInitialPrompt and drops the optimistic user bubble so user can edit/resend after rate-limit reset. Includes tests for text restoration, bubble removal, and edge cases (no captured text, non-user trailing message).
Type Clarity & Refactoring `autogpt_platform/backend/backend/copilot/transcript_builder.py`	Extracted TranscriptSnapshot type alias for snapshot/restore methods. Improves readability and enables reuse in _InterruptedAttempt.capture().

Sequence Diagram

sequenceDiagram
    participant RetryLoop as Retry Loop
    participant Attempt as _InterruptedAttempt
    participant Session as ChatSession
    participant Adapter as ResponseAdapter
    participant Client as Frontend
    
    RetryLoop->>Attempt: capture(session, builder, snap, pre_count)
    Attempt->>Attempt: strip trailing error markers
    Attempt->>Session: roll back messages to pre_count
    Attempt->>Attempt: store partial work
    
    Note over RetryLoop: Attempt fails, decide no retry
    
    RetryLoop->>Attempt: finalize(session, state, msg, retryable)
    Attempt->>Session: re-attach partial messages
    Attempt->>Adapter: flush_unresolved_tool_calls()
    Adapter->>Session: append synthetic tool_result rows
    Adapter-->>Attempt: return events
    Attempt->>Session: append error marker
    Attempt-->>RetryLoop: return events
    
    RetryLoop->>Client: yield events + StreamError
    Client->>Client: render partial work + error bubble
    
    Note over Client: On 429 rate-limit
    Client->>Client: restore unsent text to composer
    Client->>Client: drop optimistic user bubble

Dig Deeper With Commands

/review <file-path> <function-optional>
/chat <file-path> "<question>"
/roast <file-path>

Runs only when explicitly triggered.

majdyz and others added 8 commits April 25, 2026 08:16

refactor(backend/copilot): type transcript snapshot to drop type: ignore

220ca3d

Replace `transcript_snap: object` + `# type: ignore[arg-type]` on the restore() call with a `TranscriptSnapshot` type alias exported from transcript_builder, so `_InterruptedAttempt.capture` is fully typed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mirror] fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#1

[mirror] fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#1
yashwant86 wants to merge 8 commits intomm-base-12918from
mm-pr-12918

yashwant86 commented Apr 26, 2026 •

edited by bot-mergemonkey Bot

Loading

Uh oh!

bot-mergemonkey Bot commented Apr 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yashwant86 commented Apr 26, 2026 • edited by bot-mergemonkey Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by MergeMonkey

Uh oh!

bot-mergemonkey Bot commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Dig Deeper With Commands

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yashwant86 commented Apr 26, 2026 •

edited by bot-mergemonkey Bot

Loading

bot-mergemonkey Bot commented Apr 26, 2026 •

edited

Loading