fix(chat): guard tool-output-denied clobber + harden nightly e2e jobs by threepointone · Pull Request #1818 · cloudflare/agents

threepointone · 2026-06-26T06:21:47Z

Summary

This PR fixes one real SDK bug surfaced by the nightly e2e suite and hardens three nightly e2e jobs that were failing or hanging in CI. The work came out of debugging three distinct nightly failures.

1. `fix(chat)`: don't let `tool-output-denied` clobber an approved/terminal tool part

applyChunkToParts in packages/agents/src/chat/message-builder.ts unconditionally flipped a tool part to output-denied on a tool-output-denied chunk. During an auto-continuation that re-validates the transcript, the AI SDK can legitimately emit tool-output-denied for an approval it now deems unneeded (e.g. a tool without needsApproval). That silently turned a granted approval (approval-responded) — or an already-settled part — into a denial in the persisted message.

The handler is now first-write-wins: a part already in output-available / output-error / output-denied / approval-responded is left untouched. This matches the guards already on the tool-input-* handlers and benefits both @cloudflare/ai-chat and @cloudflare/think. Ships as a patch changeset (agents).

To exercise a realistic approval flow, the ai-chat e2e gains a dedicated ClientToolApprovalAgent (with needsApproval: true) so the approval test is isolated from the plain getUserLocation client-tool test. New unit tests cover the guard for both approval-responded and terminal output-available parts.

2. `ci`: stop the browser-connector nightly from looping on a corrupted Chrome cache

The E2E: agents (browser connector) job kept hitting its 30-minute cap. Root cause: wrangler dev's Browser Rendering simulator repeatedly hit a corrupted Chrome cache caused by an extract-zip bug on Node 24.16+. Wrangler would clear + re-download, the Node bug re-corrupted it, and the loop exhausted the job.

Pin the job to Node 24.15.0 (via a new node-version input on the shared install action).
Cache ~/.cache/.wrangler/chrome to avoid repeated downloads.
bail: 1 in the browser-tests vitest config so the suite stops after the first full failure (incl. retries).
Wrap the run in timeout --kill-after=30s 20m as a wall-clock guard below the job cap.

3. `ci(codemode)`: split e2e into offline + Workers-AI jobs

A wrangler bump (4.105.0) changed remote-binding init from lazy to eager, so wrangler dev now opens a remote proxy session at startup. The codemode e2e's ai: { remote: true } binding therefore needed Cloudflare credentials the deterministic CI job didn't have, failing at boot.

Mirrors the ai-chat split:

wrangler.mock.jsonc (no ai binding) + playwright.config.ts runs the deterministic executor.spec.ts — no credentials needed (e2e-codemode).
playwright.llm.config.ts (remote ai binding, higher timeouts, CI retries) runs codemode.spec.ts + llm-codemode.spec.ts against real Workers AI in a new credentialed e2e-codemode-llm job.

Test plan

pnpm run typecheck — all 113 projects pass
ai-chat guard unit tests (client-tool-duplicate-message.test.ts) — 37/37 pass
LLM codemode e2e (test:e2e:llm) — 8/8 pass against real Workers AI locally
Nightly: e2e-agents-browser completes without looping on Chrome cache
Nightly: e2e-codemode (deterministic) runs without credentials
Nightly: e2e-codemode-llm authenticates and passes
Nightly: ai-chat tool-approval e2e passes against ClientToolApprovalAgent

Made with Cursor

…tool part An autoContinue continuation re-validates the transcript through the AI SDK, which denies an approval for a tool that doesn't declare needsApproval and emits a tool-output-denied chunk. applyChunkToParts then unconditionally flipped the persisted part to output-denied, silently turning a granted approval into a denial. Make the tool-output-denied handler first-write-wins: preserve parts already in output-available / output-error / output-denied, and never regress an approval-responded (user-approved) part. Matches the guards already on the tool-input-* handlers; benefits both @cloudflare/ai-chat and @cloudflare/think. Also harden the ai-chat e2e: add a dedicated ClientToolApprovalAgent whose tool declares needsApproval so the approval auto-continuation test exercises a real approval flow (and assert the SDK-issued approvalId), leaving the existing tool-result flow tests on the unchanged ClientToolAgent untouched. Co-authored-by: Cursor <cursoragent@cursor.com>

…rome cache The e2e-agents-browser job was cancelled at the 30m cap (no test failures reported) because Wrangler's local Browser Rendering simulator downloads its own Chrome via @puppeteer/browsers, whose extract-zip dependency silently aborts mid-extraction on Node 24.16+ (puppeteer/puppeteer#14957). The partial cache made Wrangler "clear and retry" the install on every test, and with retry:3 + a 120s test timeout the suite re-corrupted and re-downloaded in a loop until the job timed out. Fix it in layers: - Pin the job to Node 24.15.0 (pre-bug) via a new node-version input on the shared install action — the root cause. Other jobs keep the latest 24.x. - Cache Wrangler's ~/.cache/.wrangler/chrome so the one-time download is reused and stays out of the timed test window. - bail:1 in the browser vitest config + a 20m wall-clock guard on the step, so a future environmental stall fails fast WITH logs instead of a silent 30m cancel (mirrors the Playwright suites' globalTimeout convention). Co-authored-by: Cursor <cursoragent@cursor.com>

The e2e-codemode nightly job ran with no Cloudflare credentials, but the e2e wrangler.jsonc declares a remote `ai` binding. Wrangler 4.105.0 (recent bump) eagerly opens a remote proxy session at `wrangler dev` startup for any remote: true binding, which needs credentials — so the dev server exited 1 and the whole Playwright run failed before any test, with no actionable report. Older Wrangler connected the binding lazily, so the offline executor specs that never call AI used to boot fine. Finish the split that ai-chat already has: - wrangler.mock.jsonc: AI-free config for the offline executor.spec.ts. - playwright.config.ts: deterministic suite (executor.spec.ts) on the mock config — stays the credential-free gating e2e-codemode job. - playwright.llm.config.ts + test:e2e:llm: the Workers-AI specs (codemode.spec.ts via /run, llm-codemode.spec.ts via /run-multi) on wrangler.jsonc, with a globalTimeout and CI retries, run by a new e2e-codemode-llm job that gets CLOUDFLARE_* credentials. Verified: the deterministic suite passes 21/21 offline with CI=1 and no creds. Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot · 2026-06-26T06:21:51Z

🦋 Changeset detected

Latest commit: 2a27101

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
agents	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

devin-ai-integration

Devin Review found 3 potential issues.

pkg-pr-new · 2026-06-26T06:31:56Z

Open in StackBlitz

agents

npm i https://pkg.pr.new/agents@1818

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1818

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1818

create-think

npm i https://pkg.pr.new/create-think@1818

hono-agents

npm i https://pkg.pr.new/hono-agents@1818

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1818

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1818

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1818

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1818

commit: 2a27101

…tool-approval-request The server-side applyChunkToParts guard only protected the persisted message; a replayed tool-output-denied still flowed through _storeStreamChunk + _broadcastChatMessage, where AI SDK v6's in-place updateToolPart would regress an approval-responded/terminal tool part to output-denied on the client (and replay it on reconnect). Extend isReplayChunk to drop tool-output-denied for parts already settled or approval-responded, mirroring the apply-side guard. Also add the missing first-write-wins guard to the tool-approval-request handler: a continuation replaying a prior tool round-trip could regress an already-approval-responded (or settled) part back to approval-requested, discarding the user's decision. Harden the approval e2e to send CF_AGENT_TOOL_APPROVAL on tool-approval-request (carrying the real approvalId) rather than racing it on tool-input-available. Adds unit coverage for both guards. Co-authored-by: Cursor <cursoragent@cursor.com>

…y buffer Same broadcast-path gap as tool-output-denied: the applyChunkToParts guard only protects the persisted message, so a replayed tool-approval-request still flowed through _storeStreamChunk + _broadcastChatMessage and reverted an approved tool back to approval-requested on the client (re-showing Approve/Reject), replaying on reconnect. Extend isReplayChunk to drop tool-approval-request for parts already approval-responded or settled, mirroring the tool-output-denied branch. Adds unit coverage for the new isReplayChunk branch. Co-authored-by: Cursor <cursoragent@cursor.com>

threepointone and others added 3 commits June 26, 2026 07:19

threepointone mentioned this pull request Jun 26, 2026

failing nightly e2e tests #1810

Closed

devin-ai-integration Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread packages/agents/src/chat/message-builder.ts

Comment thread packages/agents/src/chat/message-builder.ts

Comment thread packages/ai-chat/e2e/client-tools.spec.ts Outdated

threepointone and others added 2 commits June 26, 2026 08:01

threepointone merged commit fa7bfec into main Jun 26, 2026
7 checks passed

threepointone deleted the fix/tool-output-denied-clobber-approval branch June 26, 2026 07:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(chat): guard tool-output-denied clobber + harden nightly e2e jobs#1818

fix(chat): guard tool-output-denied clobber + harden nightly e2e jobs#1818
threepointone merged 5 commits into
mainfrom
fix/tool-output-denied-clobber-approval

threepointone commented Jun 26, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

changeset-bot Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pkg-pr-new Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

threepointone commented Jun 26, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. fix(chat): don't let tool-output-denied clobber an approved/terminal tool part

2. ci: stop the browser-connector nightly from looping on a corrupted Chrome cache

3. ci(codemode): split e2e into offline + Workers-AI jobs

Test plan

Uh oh!

changeset-bot Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pkg-pr-new Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

threepointone commented Jun 26, 2026 •

edited by devin-ai-integration Bot

Loading

1. `fix(chat)`: don't let `tool-output-denied` clobber an approved/terminal tool part

2. `ci`: stop the browser-connector nightly from looping on a corrupted Chrome cache

3. `ci(codemode)`: split e2e into offline + Workers-AI jobs

changeset-bot Bot commented Jun 26, 2026 •

edited

Loading

pkg-pr-new Bot commented Jun 26, 2026 •

edited

Loading