fix(voice): rebuild playback bridge after idle to stop slow new turns#1816

Merged
cjol merged 2 commits into main from fix/voice-playback-bridge-reuse on Jun 26, 2026

Conversation

cjol (Contributor) commented Jun 25, 2026

This PR fixes assistant speech playing back slowly on a new turn after an idle gap in @cloudflare/voice. It was reported against a booth demo built on the voice agent: ask two separate questions with a pause between them, and the second reply plays in slow motion, then re-converges to normal speed over the course of the turn.

Why

  • VoiceClient routes assistant playback through a MediaStreamAudioDestinationNode -> HTMLAudioElement bridge so a selected output device can be honored via HTMLMediaElement.setSinkId. That bridge is a live playout path with its own clock.
  • Reusing the same element for a new turn after it has sat idle through the gap between turns makes it resume the fresh burst at the wrong rate, audible as slow-motion that re-converges over the turn. A freshly created element does not show this. Decode and scheduling were ruled out: the decoded buffer is correct and scheduled at the right time. The artifact lives in the element's playout and is not observable from JS.
  • Keeping the element warm with continuous silence between turns was considered: it removed the slow-down but introduced its own start-of-turn artifacts, so rebuilding a fresh element per turn was chosen instead.
  • Routing the default sink straight to ctx.destination (skipping the bridge) was also considered and rejected here: the existing design intentionally routes all playback through the bridge so runtime setOutputDevice() keeps working, and the per-turn rebuild fixes every sink on its own. That lower-latency routing can be a separate change.
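
The bridge in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the VoiceClient code: `createPlaybackBridge` and `teardownPlaybackBridge` are hypothetical helpers, and the `*Like` interfaces are structural stand-ins for `AudioContext`, `MediaStreamAudioDestinationNode` and `HTMLAudioElement` so the sketch runs outside a browser. In a browser you would pass an `AudioContext` and `() => new Audio()`, connect scheduled `AudioBufferSourceNode`s to `bridge.destination`, and `setSinkId` is the standard `HTMLMediaElement.setSinkId`.

```typescript
// Structural stand-ins (assumption) for the DOM types involved, so the sketch
// type-checks and runs without a browser.
interface StreamDestinationLike {
  stream: unknown; // a MediaStream in the browser
  disconnect(): void;
}

interface AudioContextLike {
  createMediaStreamDestination(): StreamDestinationLike;
}

interface AudioElementLike {
  srcObject: unknown;
  play(): Promise<void>;
  pause(): void;
  setSinkId?(sinkId: string): Promise<void>;
}

interface PlaybackBridge {
  destination: StreamDestinationLike;
  element: AudioElementLike;
}

// Hypothetical helper: route Web Audio output into a media element so a chosen
// output device can be applied with setSinkId.
async function createPlaybackBridge(
  ctx: AudioContextLike,
  createElement: () => AudioElementLike,
  sinkId?: string
): Promise<PlaybackBridge> {
  const destination = ctx.createMediaStreamDestination();
  const element = createElement(); // a fresh element for every bridge
  element.srcObject = destination.stream;
  if (sinkId !== undefined && element.setSinkId) {
    await element.setSinkId(sinkId);
  }
  await element.play();
  return { destination, element };
}

// Hypothetical helper: stop the element and release the stream so the next
// turn builds a brand-new bridge instead of resuming the idle one.
function teardownPlaybackBridge(bridge: PlaybackBridge): void {
  bridge.element.pause();
  bridge.element.srcObject = null;
  bridge.destination.disconnect();
}
```

Rebuilding per turn then amounts to calling the teardown helper and creating a new bridge, so the element that sat idle through the gap is never asked to resume.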

Code Changes

  • packages/voice/src/voice-client.ts: track the end time of the last scheduled chunk in #lastPlaybackEnd. In #playAudio, before resolving the destination, tear the bridge down if it exists, has fully drained (#scheduledSources.size === 0), and has been idle past 0.3s; #getPlaybackDestination then builds a fresh element for the turn. The drained guard means this never fires mid-turn, since chunks within a turn keep a source scheduled on the cursor. #lastPlaybackEnd is reset in #stopPlayback.
  • Regression tests cover within-turn reuse, after-idle rebuild, and sub-threshold reuse. They assert the rebuild mechanism, since the audible speed itself is not observable from JS.
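
The rebuild condition in the first bullet, together with the cursor-based scheduling it relies on, can be modeled as plain state plus a pure predicate. The sketch assumes times are `AudioContext.currentTime` seconds; `PlaybackState`, `scheduleChunk`, `chunkEnded`, `shouldRebuildBridge` and `IDLE_REBUILD_THRESHOLD_S` are hypothetical names loosely mirroring #scheduledSources, #playbackCursor and #lastPlaybackEnd, and a strict `>` comparison is assumed for "idle past 0.3s".

```typescript
// Idle gap (audio-clock seconds) after which a drained bridge is rebuilt.
const IDLE_REBUILD_THRESHOLD_S = 0.3;

// Hypothetical model of the playback fields involved in the guard.
interface PlaybackState {
  hasBridge: boolean;             // a MediaStream -> <audio> bridge exists
  scheduledSources: number;       // sources scheduled but not yet ended
  playbackCursor: number;         // audio-clock time where the next chunk starts
  lastPlaybackEnd: number | null; // end time of the last scheduled chunk
}

// Hypothetical: queue a chunk back-to-back on the cursor and record its end.
function scheduleChunk(s: PlaybackState, now: number, duration: number): number {
  const start = Math.max(s.playbackCursor, now);
  s.playbackCursor = start + duration;
  s.lastPlaybackEnd = s.playbackCursor;
  s.scheduledSources += 1;
  return start;
}

// Hypothetical: what a source's onended handler would do.
function chunkEnded(s: PlaybackState): void {
  s.scheduledSources = Math.max(0, s.scheduledSources - 1);
}

// Tear down only if a bridge exists, nothing is still scheduled (so never
// mid-turn), and the clock has passed the last chunk's end by the threshold.
function shouldRebuildBridge(s: PlaybackState, now: number): boolean {
  return (
    s.hasBridge &&
    s.scheduledSources === 0 &&
    s.lastPlaybackEnd !== null &&
    now - s.lastPlaybackEnd > IDLE_REBUILD_THRESHOLD_S
  );
}
```

As long as each chunk is scheduled before the previous one ends, `scheduledSources` stays above zero for the whole turn, which is the "never fires mid-turn" property the guard relies on.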

Assistant playback routes through a MediaStreamAudioDestinationNode ->
HTMLAudioElement bridge so a selected output device can be honored. That
bridge is a live playout path, and reusing the element for a new turn after
it sat idle between turns made the new turn resume at the wrong rate (audible
as slow-motion that re-converges over the turn).

Tear the bridge down and rebuild it once it has fully drained and been idle
past a short threshold, so each turn plays through a fresh element. The
scheduledSources.size === 0 guard keeps this from firing mid-turn.

Adds regression tests covering within-turn reuse, after-idle rebuild, and
sub-threshold reuse.

changeset-bot (bot) commented Jun 25, 2026

🦋 Changeset detected

Latest commit: 5ee11a8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name               Type
@cloudflare/voice  Patch

pkg-pr-new (bot) commented Jun 25, 2026

agents

npm i https://pkg.pr.new/agents@1816

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1816

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1816

create-think

npm i https://pkg.pr.new/create-think@1816

hono-agents

npm i https://pkg.pr.new/hono-agents@1816

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1816

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1816

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1816

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1816

commit: 5ee11a8

devin-ai-integration (bot) left a comment

Devin Review found 1 potential issue.

Comment thread on packages/voice/src/voice-client.ts (outdated):
this.#isPlaying = false;
this.#isScheduling = false;
this.#playbackCursor = 0;
this.#lastPlaybackEnd = null;

🚩 Bridge rebuild is skipped after playback interrupts, which may reproduce the same slow-playback symptom

After a playback_interrupt or user-transcript interrupt, #stopPlayback() resets #lastPlaybackEnd to null (voice-client.ts:1003). This means the idle rebuild condition at voice-client.ts:928 can never fire for the next turn, because the third guard (this.#lastPlaybackEnd !== null) is false. The bridge stays alive.

If the HTMLAudioElement slow-playback issue also manifests when the bridge goes idle after an abrupt source.stop() (interrupt) rather than a natural drain, then the same audible symptom could reappear after: interrupt → long pause → new assistant turn.

This may be intentional — interrupts typically lead to new audio quickly, and the abrupt stop may not trigger the same internal buffering drift. But it's worth verifying empirically that the fix covers the interrupt-then-long-pause scenario.

cjol (Contributor, Author) replied:

Good catch, and it's a real gap rather than intentional. An interrupt leaves the bridge idle the same way a natural drain does, so interrupt -> long pause -> new turn could reproduce the slow playback. Resetting #lastPlaybackEnd to null in #stopPlayback defeated the rebuild guard for that next turn.

Fixed in 5ee11a8: instead of nulling it, #stopPlayback now records the audio-clock time at the interrupt as the idle start (#lastPlaybackEnd = this.#audioContext?.currentTime ?? null), so the next turn's rebuild check still fires. Using the interrupt time rather than the cut chunk's scheduled end matters too, since the chunk was stopped early. Added a regression test covering interrupt -> idle gap -> new turn.
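
The difference between the original interrupt handling and the fix can be sketched with a small model. Only the `currentTime ?? null` expression is taken from the reply above; `stopPlaybackBefore`, `stopPlaybackAfter`, `shouldRebuildBridge` and the state shape are hypothetical, and clearing the scheduled-source count on stop is an assumption about what an interrupt does.

```typescript
// Same assumed threshold and guard shape as the idle-rebuild check.
const IDLE_REBUILD_THRESHOLD_S = 0.3;

interface PlaybackState {
  hasBridge: boolean;
  scheduledSources: number;
  lastPlaybackEnd: number | null; // audio-clock time the bridge went idle
}

function shouldRebuildBridge(s: PlaybackState, now: number): boolean {
  return (
    s.hasBridge &&
    s.scheduledSources === 0 &&
    s.lastPlaybackEnd !== null &&
    now - s.lastPlaybackEnd > IDLE_REBUILD_THRESHOLD_S
  );
}

// Before the fix (per the review thread): the interrupt cleared the idle
// marker, so the lastPlaybackEnd !== null guard could never pass next turn.
function stopPlaybackBefore(s: PlaybackState): void {
  s.scheduledSources = 0; // assumption: stopped sources are dropped
  s.lastPlaybackEnd = null;
}

// After the fix: record the audio-clock time of the interrupt as the idle
// start; null only when there is no AudioContext yet.
function stopPlaybackAfter(
  s: PlaybackState,
  ctx: { currentTime: number } | null
): void {
  s.scheduledSources = 0;
  s.lastPlaybackEnd = ctx?.currentTime ?? null;
}
```

Using the interrupt time also avoids a subtler miss: if the cut chunk's scheduled end were kept instead, a new turn arriving shortly after that end would look sub-threshold even though the bridge had been silent since the interrupt.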

#stopPlayback reset #lastPlaybackEnd to null, which disabled the post-idle
bridge rebuild for the next turn (the guard requires a non-null value). An
interrupt leaves the bridge idle just like a natural drain, so interrupt ->
long pause -> new turn could reproduce the slow playback. Record the audio
clock time at interrupt as the idle start instead, so the next turn still
rebuilds. Adds a regression test for the interrupt-then-idle path.

threepointone (Contributor) left a comment

nice catch! land when you feel ready, want this to go out today.

cjol merged commit f18ff01 into main on Jun 26, 2026
5 checks passed
cjol deleted the fix/voice-playback-bridge-reuse branch on June 26, 2026 at 08:09