[mirror] fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#1
[mirror] fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#1yashwant86 wants to merge 8 commits intomm-base-12918from
Conversation
…failure exit SECRT-2275 — when an SDK turn was interrupted (transient API errors with exhausted retries, mid-stream LLM exceptions, or context-overflow with all attempts exhausted) the retry loop's pre-decision rollback discarded the assistant's partial work (text + tool calls + reasoning) that had been incrementally appended to session.messages during the failed attempt. Users described it as "the turn is gone": their UI streamed tokens live, then a refresh showed an empty turn and the next message would prompt the model to "continue" with no context, so it picked an unrelated old task. Fix: capture the rolled-back partial in the retry-loop exception handlers and re-attach it via a single helper on every final-failure branch (including the events_yielded > 0 path that previously skipped the error marker entirely and the non-context-non-transient + attempts-exhausted paths). Synthesize "interrupted" tool_result rows for any orphan tool_use so the next turn's LLM context stays API-valid. Successful retry breaks clear the captured partial so attempt #1's rolled-back content doesn't leak into a successful attempt #2's history. Baseline path already preserves partial via its existing finally block; only SDK was affected.
…block type-check The inline _restore_partial_with_error_marker calls across five retry-loop branches pushed stream_chat_completion_sdk past pyright's complexity heuristic (CI type-check failed on main). Consolidate into a single post-loop block keyed off ended_with_stream_error + the existing attempts_exhausted / transient_exhausted / stream_err flags, plus a new handled_error_info tuple that carries _HandledStreamError's final-yield decision out of the retry loop. Behaviour is unchanged — same restore semantics, same client-facing StreamError sequencing, same transcript-upload skip. Confirmed with 319 existing + new tests (backend/copilot/sdk + baseline). Pyright still bails on the function body (1500 LoC — the retry loop with context-overflow fallback + transient backoff + partial-work preservation shares too much state across branches to split cleanly without hurting readability). A file-targeted reportGeneralTypeIssues suppression covers the complexity bailout while keeping real type errors elsewhere in the file surfaced.
…ial + carry retryable through _HandledStreamError Two fixes layered on the partial-restore path introduced by this PR: 1. _rollback_attempt_capturing_partial now drops trailing error markers (COPILOT_ERROR_PREFIX / COPILOT_RETRYABLE_ERROR_PREFIX) from the captured partial. _run_stream_attempt's idle-timeout and circuit-breaker paths append a marker via _append_error_marker BEFORE raising _HandledStreamError; without this filter the post-loop restore would replay the stale marker and then add a fresh one, leaving duplicate error bubbles and pushing any synthetic tool_result after an assistant(error) turn that has no matching tool_use. 2. Replace the (msg, code, already_yielded) 3-tuple carrying _HandledStreamError state out of the retry loop with a frozen _HandledErrorInfo dataclass that also carries `retryable`. The post-loop block now uses exc.retryable instead of hardcoding True, so a future _HandledStreamError(retryable=False, ...) won't silently write the wrong marker prefix. 3 new tests cover the rollback marker-stripping contract.
…o _InterruptedAttempt Previous revisions carried the failed-attempt state across three separate function-scope variables (last_attempt_partial, handled_error_info) + four module-level helpers (_rollback_attempt_capturing_partial, _restore_partial_with_error_marker, _flush_orphan_tool_uses_to_session, _append_error_marker). The retry loop mutated all three and the post-loop block reassembled the pieces by hand. Scattered and hard to follow. Collapse to one dataclass with capture / clear / finalize + one _classify_final_failure helper that picks the display message based on which failure flag the retry loop set (attempts_exhausted, transient_exhausted, stream_err, handled_error). Call sites: - success break: interrupted.clear() - _HandledStreamError: interrupted.capture(...); interrupted.handled_error = ... - Exception: interrupted.capture(...) - post-loop: final_msg, retryable = _classify_final_failure(interrupted, ...); interrupted.finalize(...) - outer except: interrupted.finalize(...) Behaviour is unchanged — same restore semantics, same StreamError sequencing, same transcript-upload skip, same orphan tool_use flush, same stale-marker stripping from b1172e2 / 5406fe9. The retry-scenarios suite (48 integration tests) plus the rewritten interrupted_partial_test (14 unit tests) both pass; the full SDK test suite (1012 tests) is green.
…final-failure emit CodeRabbit flagged that _flush_orphan_tool_uses_to_session (called from _InterruptedAttempt.finalize) used state.adapter._flush_unresolved_tool_calls with a # noqa: SLF001 suppressor. The private call mutates resolved_tool_calls and flips has_unresolved_tool_calls to False, which caused the downstream error-cleanup block at lines 4185-4200 to skip its own flush — UI spinners on the client stayed open until page refresh because no cleanup events were yielded after the early flush swallowed the unresolved state. Changes: - Rename _flush_unresolved_tool_calls → flush_unresolved_tool_calls (public) in response_adapter.py; update 3 internal call sites + 2 service.py sites. Drops the # noqa: SLF001 suppressor (no longer a private-access violation). - _flush_orphan_tool_uses_to_session and _InterruptedAttempt.finalize now return the list[StreamBaseResponse] produced by the flush so the caller yields them to the client instead of re-flushing. - Replace the three scattered post-loop error blocks (partial restore + redundant flush + stream_err yield + handled_error yield) with one consolidated block that: (a) calls _classify_final_failure → _FinalFailure, (b) yields finalize()'s events + _end_text_if_open, (c) yields one StreamError (unless handled_error.already_yielded=True). Fixes the double-flush skip-cleanup bug and eliminates duplicated error-text/code strings between history marker and SSE yield. - _classify_final_failure now returns _FinalFailure(display_msg, code, retryable) instead of a (msg, retryable) tuple — single source of truth for in-history marker + SSE event so they can't drift. Tests: +5 _classify_final_failure contract tests, +2 return-value assertions on finalize/orphan-flush. All 1022 SDK tests pass (was 1012).
…-limit
Mode 1 (rate-limit at turn start, user message never persisted): the
backend's `check_rate_limit` raises BEFORE `append_and_save_message`, so
when a 429 fires the user's text only exists in the optimistic `useChat`
bubble — refresh or even a successful retry would lose it.
See `autogpt_platform/backend/backend/api/features/chat/routes.py:916-922`
(rate-limit check) and `routes.py:945` (later append-and-save) — backend
can't recover this on its own.
New flow on 429:
- drop the optimistic user bubble (since DB has no record of it),
- push `lastSubmittedMsgRef.current` back into the composer via the
existing `setInitialPrompt` slot — same path URL pre-fills use, so
`useChatInput`'s `consumeInitialPrompt` effect picks it up
automatically,
- clear `lastSubmittedMsgRef` so dedup doesn't block re-send.
In-memory only; refresh-survival is a separate follow-up.
Adds two cases that the existing 429 test did not exercise so codecov/patch clears the 80% threshold: 1. The setMessages updater is invoked but no-ops when the trailing message is not a user bubble (assistant reply already landed). 2. The composer-restore branch is skipped entirely when no unsent text was captured (lastSubmittedMsgRef is null at error time).
Replace `transcript_snap: object` + `# type: ignore[arg-type]` on the restore() call with a `TranscriptSnapshot` type alias exported from transcript_builder, so `_InterruptedAttempt.capture` is fully typed.
⚡ Risk Assessment —
|
| Files | Summary |
|---|---|
Partial-Work Preservation on Retry Failureautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py |
Introduces _InterruptedAttempt dataclass to capture rolled-back messages + error state across retry loop. Implements capture() to preserve partial work before rollback, finalize() to re-attach on final failure, and _flush_orphan_tool_uses_to_session() to synthesize missing tool_result rows. Adds _classify_final_failure() to unify error classification for both history markers and SSE yields. Fixes SECRT-2275 regression where users saw empty turns after refresh. |
Tool-Call Flushing Contract & Visibilityautogpt_platform/backend/backend/copilot/sdk/response_adapter.py |
Renamed _flush_unresolved_tool_calls to public flush_unresolved_tool_calls and updated all call sites. Enhanced docstring to clarify single-invocation contract: method mutates adapter state, so callers must share the returned event list to avoid stale UI elements (spinners) staying open. |
Rate-Limit Recovery in Frontendautogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.tsautogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts |
On 429 usage-limit error, restores unsent message text to composer via setInitialPrompt and drops the optimistic user bubble so user can edit/resend after rate-limit reset. Includes tests for text restoration, bubble removal, and edge cases (no captured text, non-user trailing message). |
Type Clarity & Refactoringautogpt_platform/backend/backend/copilot/transcript_builder.py |
Extracted TranscriptSnapshot type alias for snapshot/restore methods. Improves readability and enables reuse in _InterruptedAttempt.capture(). |
Sequence Diagram
sequenceDiagram
participant RetryLoop as Retry Loop
participant Attempt as _InterruptedAttempt
participant Session as ChatSession
participant Adapter as ResponseAdapter
participant Client as Frontend
RetryLoop->>Attempt: capture(session, builder, snap, pre_count)
Attempt->>Attempt: strip trailing error markers
Attempt->>Session: roll back messages to pre_count
Attempt->>Attempt: store partial work
Note over RetryLoop: Attempt fails, decide no retry
RetryLoop->>Attempt: finalize(session, state, msg, retryable)
Attempt->>Session: re-attach partial messages
Attempt->>Adapter: flush_unresolved_tool_calls()
Adapter->>Session: append synthetic tool_result rows
Adapter-->>Attempt: return events
Attempt->>Session: append error marker
Attempt-->>RetryLoop: return events
RetryLoop->>Client: yield events + StreamError
Client->>Client: render partial work + error bubble
Note over Client: On 429 rate-limit
Client->>Client: restore unsent text to composer
Client->>Client: drop optimistic user bubble
Dig Deeper With Commands
/review <file-path> <function-optional>/chat <file-path> "<question>"/roast <file-path>
Runs only when explicitly triggered.
Mirror of upstream Significant-Gravitas#12918 for benchmark. Do not merge.
Summary by MergeMonkey