
Feat(ai-gemini): Gemini Realtime Adapter #405

Open
nikas-belogolov wants to merge 11 commits into TanStack:main from nikas-belogolov:feat/gemini-realtime-chat

Conversation

@nikas-belogolov
Contributor

@nikas-belogolov nikas-belogolov commented Mar 29, 2026

🎯 Changes

  • Added realtime ephemeral token generation

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • New Features
    • Gemini realtime provider added and selectable for live audio conversations with voice support and a new default voice.
    • Tools UI now enabled for Gemini as well as OpenAI.
    • Realtime sessions surface new status/usage/go_away events and accept provider-specific session options (voice, modalities, tuning).
    • Built-in microphone capture, audio playback, interruption controls, and audio visualization for realtime sessions.
  • Chores
    • Package dependency updates for Gemini integration.

@coderabbitai
Contributor

coderabbitai bot commented Mar 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Adds Gemini realtime support: new ai-gemini realtime token/adapter/types and MediaHandler, updates shared realtime APIs (connect signature, session config), introduces a realtime event emitter, wires examples to Gemini, and updates package metadata and exports.

Changes

Cohort / File(s) Summary
Changeset
.changeset/huge-lizards-admire.md
Adds a changeset bumping @tanstack/ai-gemini (minor) noting Gemini Realtime Adapter addition.
Example app
examples/ts-react-chat/src/lib/use-realtime.ts, examples/ts-react-chat/src/routes/realtime.tsx
Add a 'gemini' provider option, wire geminiRealtime / geminiRealtimeToken, extend the Provider type, accept a Gemini voice option, and default the voice for Gemini to Puck.
ai-client runtime & types
packages/typescript/ai-client/src/realtime-client.ts, packages/typescript/ai-client/src/realtime-types.ts
Make adapter.connect accept required config: RealtimeSessionConfig; pass rich session config into adapter.connect; add providerOptions, onUsage, and onGoAway to RealtimeClientOptions and propagate usage/go_away events.
ai package — event types & emitter
packages/typescript/ai/src/realtime/types.ts, packages/typescript/ai/src/realtime/event-emitter.ts, packages/typescript/ai/src/realtime/index.ts, packages/typescript/ai/src/index.ts
Add outputSchema to RealtimeToolConfig; add go_away and usage events/payloads; implement createRealtimeEventEmitter() and export it from realtime index and package root.
ai-gemini package metadata
packages/typescript/ai-gemini/package.json
Bump @google/genai version, add @tanstack/ai-client as peer/dev dependency, and bump vite.
ai-gemini public exports
packages/typescript/ai-gemini/src/index.ts, packages/typescript/ai-gemini/src/realtime/index.ts
Re-export geminiRealtime, geminiRealtimeToken, and realtime types from package root and realtime entrypoint.
ai-gemini realtime types & token
packages/typescript/ai-gemini/src/realtime/types.ts, packages/typescript/ai-gemini/src/realtime/token.ts
Add Gemini realtime types (models, voices, provider options) and geminiRealtimeToken factory creating ephemeral tokens via @google/genai.
ai-gemini realtime adapter
packages/typescript/ai-gemini/src/realtime/adapter.ts
New Gemini RealtimeAdapter implementing connect, audio capture/streaming, session event translation (status, go_away, usage, transcripts, tool_call, message_complete, interrupted), sendText/sendImage/sendToolResult/interrupt, and getAudioVisualization.
ai-gemini media handling
packages/typescript/ai-gemini/src/realtime/media-handler.ts
New MediaHandler for microphone capture, AudioWorklet-based PCM framing, downsampling to 16kHz, playback scheduling, and input/output analysers/visualization.
Other adapters updated
packages/typescript/ai-openai/src/realtime/adapter.ts, packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
Normalize connect signature to accept RealtimeSessionConfig and wire createRealtimeEventEmitter() for event handling; ElevenLabs drops unused options param.
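The walkthrough introduces createRealtimeEventEmitter() without showing it. As a rough illustration of the shape such a typed emitter could take for the new usage and go_away events, here is a minimal sketch; the payload fields are illustrative assumptions, not the package's actual types.

```typescript
// Sketch of a typed realtime event emitter. The event names ('usage',
// 'go_away') come from the walkthrough above; payload shapes below are
// illustrative assumptions, not the actual @tanstack/ai types.
type RealtimeEventsSketch = {
  usage: { inputTokens: number; outputTokens: number }
  go_away: { timeLeftMs?: number }
}

function createRealtimeEventEmitterSketch() {
  const listeners = new Map<keyof RealtimeEventsSketch, Set<(payload: any) => void>>()
  return {
    // Subscribe to an event; returns an unsubscribe function.
    on<K extends keyof RealtimeEventsSketch>(
      event: K,
      fn: (payload: RealtimeEventsSketch[K]) => void,
    ): () => void {
      const set = listeners.get(event) ?? new Set()
      listeners.set(event, set)
      set.add(fn)
      return () => {
        set.delete(fn)
      }
    },
    // Deliver a payload to all current subscribers of the event.
    emit<K extends keyof RealtimeEventsSketch>(
      event: K,
      payload: RealtimeEventsSketch[K],
    ): void {
      listeners.get(event)?.forEach((fn) => fn(payload))
    },
  }
}
```

An adapter would emit('usage', …) when the provider reports token counts and emit('go_away', …) before the server closes the connection; the client maps those onto onUsage / onGoAway callbacks.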

Sequence Diagram(s)

sequenceDiagram
  participant Client as Browser Client
  participant Server as App Server
  participant GeminiAuth as Google GenAI Auth
  participant GeminiLive as Google GenAI Live
  participant AudioHW as Microphone/Audio

  Client->>Server: Request realtime token (provider=gemini, options)
  Server->>GeminiAuth: client.authTokens.create(model, expireTime, modalities)
  GeminiAuth-->>Server: ephemeral token (name, expireTime)
  Server-->>Client: RealtimeToken (token, expiresAt, config)

  Client->>GeminiLive: ai.live.connect(token, RealtimeSessionConfig)
  GeminiLive-->>Client: session open / status events
  AudioHW->>Client: capture PCM frames (AudioWorklet → MediaHandler)
  Client->>GeminiLive: sendRealtimeInput({ audio: { data, mimeType } })
  GeminiLive-->>Client: transcript / tool_call / message parts
  Client->>Client: assemble message, emit message_complete / usage / go_away
  GeminiLive-->>Client: binary audio chunks
  Client->>AudioHW: decode & schedule playback (AudioBufferSourceNode)
  Client->>GeminiLive: sendText / sendImage / sendToolResult / interrupt
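The capture path in the diagram (AudioWorklet → MediaHandler) depends on downsampling microphone input to 16 kHz before framing. A naive linear-interpolation sketch of that step, not the PR's MediaHandler implementation:

```typescript
// Naive linear-interpolation downsampler, e.g. 48000 Hz -> 16000 Hz.
// A sketch of the idea only; real implementations typically low-pass
// filter first to avoid aliasing.
function downsample(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (toRate >= fromRate) return input
  const ratio = fromRate / toRate
  const outLength = Math.floor(input.length / ratio)
  const out = new Float32Array(outLength)
  for (let i = 0; i < outLength; i++) {
    // Map each output sample to a fractional input position and
    // linearly interpolate between the two neighboring samples.
    const pos = i * ratio
    const i0 = Math.floor(pos)
    const i1 = Math.min(i0 + 1, input.length - 1)
    const frac = pos - i0
    out[i] = input[i0]! * (1 - frac) + input[i1]! * frac
  }
  return out
}
```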

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

"A rabbit tapped keys in the moonlit glen,
Spun tokens and voices to stitch streams again.
Mics hummed softly, messages leapt,
Puck sang, the playback gently kept.
Hop, code, hop — realtime dreams begin!"

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.33%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check | ❓ Inconclusive | The description is incomplete; it only lists 'Added realtime ephemeral token generation' without explaining the broader Gemini Realtime Adapter implementation or addressing the core changes. | Expand the description to detail the adapter implementation, media handling, event emission, integration with ai-client, and any architectural decisions discussed (e.g. websocket connection patterns).
✅ Passed checks (1 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'Feat(ai-gemini): Gemini Realtime Adapter' accurately and concisely summarizes the main change: adding Gemini realtime support.



@nikas-belogolov
Contributor Author

nikas-belogolov commented Mar 29, 2026

There should be some discussion around how to implement the websocket connection: it could be client-server (straight to the realtime API), or client-server-server, with the web server (e.g. TanStack Start) acting as a proxy.

@nx-cloud

nx-cloud bot commented Mar 29, 2026

🤖 Nx Cloud AI Fix Eligible

An automatically generated fix could have helped fix failing tasks for this run, but Self-healing CI is disabled for this workspace. Visit workspace settings to enable it and get automatic fixes in future runs.

To disable these notifications, a workspace admin can disable them in workspace settings.


View your CI Pipeline Execution ↗ for commit f40f0c8

Command Status Duration Result
nx affected --targets=test:sherif,test:knip,tes... ❌ Failed 2m 47s View ↗
nx run-many --targets=build --exclude=examples/** ❌ Failed 48s View ↗

☁️ Nx Cloud last updated this comment at 2026-03-29 10:28:30 UTC

@nikas-belogolov nikas-belogolov marked this pull request as ready for review April 13, 2026 19:46
@nikas-belogolov nikas-belogolov requested a review from a team April 13, 2026 19:46
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/typescript/ai-client/src/realtime-client.ts (1)

532-540: ⚠️ Potential issue | 🟡 Minor

Inconsistent tool mapping: applySessionConfig omits outputSchema.

The tool mapping in connect() (lines 104-114) includes both inputSchema and outputSchema, but applySessionConfig() only includes inputSchema. This inconsistency could lead to missing output schemas when the session is updated after initial connection.

🐛 Proposed fix for consistency
     const toolsConfig = tools
       ? Array.from(this.clientTools.values()).map((t) => ({
           name: t.name,
           description: t.description,
           inputSchema: t.inputSchema
             ? convertSchemaToJsonSchema(t.inputSchema)
             : undefined,
+          outputSchema: t.outputSchema
+            ? convertSchemaToJsonSchema(t.outputSchema)
+            : undefined,
         }))
       : undefined
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 532 - 540,
The applySessionConfig mapping for tools omits outputSchema, causing
inconsistency with connect(); update applySessionConfig (the code building
toolsConfig from this.clientTools.values()) to include outputSchema the same way
inputSchema is handled by calling convertSchemaToJsonSchema on t.outputSchema
when present, so toolsConfig contains both inputSchema and outputSchema
(mirroring the mapping in connect()).
🧹 Nitpick comments (3)
packages/typescript/ai-client/src/realtime-client.ts (1)

99-102: Remove commented-out code.

This dead code should be removed to keep the codebase clean.

🧹 Proposed removal
-      // const toolsList =
-      //   this.clientTools.size > 0
-      //     ? Array.from(this.clientTools.values())
-      //     : undefined
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-client/src/realtime-client.ts` around lines 99 - 102,
Remove the dead/commented-out block that defines toolsList in realtime-client.ts
(the lines referencing toolsList and this.clientTools) — delete the commented
code entirely so no leftover commented declarations remain; ensure there are no
other references to the removed snippet (search for toolsList and usages of
this.clientTools in the surrounding code) and run lint/format to keep the file
clean.
packages/typescript/ai-gemini/src/index.ts (1)

86-90: Consider exporting Gemini realtime types from the main entry point.

The ./realtime/index module exports types (GeminiRealtimeModel, GeminiRealtimeTokenOptions, GeminiRealtimeOptions) that aren't re-exported here. Other adapters (text, summarize, image, tts) export their configuration types from the main entry point for consumer convenience.

♻️ Proposed addition for type exports
 // Realtime adapter
 export {
   geminiRealtime,
   geminiRealtimeToken,
 } from './realtime/index'
+export type {
+  GeminiRealtimeModel,
+  GeminiRealtimeTokenOptions,
+  GeminiRealtimeOptions,
+} from './realtime/index'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/index.ts` around lines 86 - 90, Export the
realtime types from the main entry point so consumers can import them alongside
geminiRealtime and geminiRealtimeToken; add a type-only re-export like "export
type { GeminiRealtimeModel, GeminiRealtimeTokenOptions, GeminiRealtimeOptions }
from './realtime/index'" in the same file that currently exports geminiRealtime
and geminiRealtimeToken. Ensure you use a type-only export to avoid runtime
bundle changes and reference the exact type names GeminiRealtimeModel,
GeminiRealtimeTokenOptions, and GeminiRealtimeOptions so IDEs and consumers can
import them directly.
examples/ts-react-chat/src/routes/realtime.tsx (1)

279-293: Consider enabling additional configuration options for Gemini.

The tools indicator now correctly appears for Gemini. However, other session configuration options (output mode, temperature, semantic eagerness) remain OpenAI-only in the UI, but the underlying RealtimeSessionConfig supports these for all providers. Consider enabling some of these controls for Gemini if the Gemini Live API supports them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/ts-react-chat/src/routes/realtime.tsx` around lines 279 - 293, The
UI currently gates several session configuration controls to provider ===
'openai'; update those conditionals to also include 'gemini' so that controls
for output mode, temperature, and semanticEagerness are shown when provider ===
'gemini' (same as how you added the tools indicator for Gemini). Locate the
conditional checks and UI blocks in the realtime component that reference
provider (and the controls bound to RealtimeSessionConfig such as outputMode,
temperature, semanticEagerness) and extend their logic to allow 'gemini'; ensure
the form bindings still map to RealtimeSessionConfig fields so changes propagate
to the session payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/ts-react-chat/src/lib/use-realtime.ts`:
- Around line 65-69: The Gemini branch is using shared OpenAI defaults (voice:
'alloy' and unconstrained responseModalities) which Gemini Live rejects; update
the geminiRealtime() path so it supplies Gemini-compatible defaults: set voice
to a valid Gemini prebuilt name (e.g., 'Puck' or another supported name) instead
of 'alloy', and set responseModalities to an explicit single-element array
(e.g., [Modality.AUDIO]) for the session; ensure geminiRealtime (or its caller
in useRealtime) overrides the shared defaults and enforces only one modality so
the Gemini connection will succeed.

In `@packages/typescript/ai-client/src/realtime-types.ts`:
- Around line 28-34: The ElevenLabs realtime adapter's connect function still
uses the old signature with clientToolDefs?: ReadonlyArray<AnyClientTool>, which
no longer matches the RealtimeAdapter.connect signature (token: RealtimeToken,
config: RealtimeSessionConfig) and causes type errors; update the connect method
in packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (the function named
connect or the class implementing RealtimeAdapter) to accept (token:
RealtimeToken, config: RealtimeSessionConfig) and remove the legacy
clientToolDefs parameter, update any internal usages/calls within that adapter
to read configuration from the provided RealtimeSessionConfig, and ensure the
exported adapter type still satisfies RealtimeAdapter.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 96-112: liveConfig is missing transcription flags so Gemini won't
populate inputTranscription/outputTranscription; update the LiveConnectConfig
object built in the liveConfig variable to include inputAudioTranscription: {}
and outputAudioTranscription: {} (use the same config.providerOptions merge
pattern so you don't overwrite existing keys) so the API will emit transcription
data that the adapter reads when handling inputTranscription/outputTranscription
events.
- Around line 466-473: The sendImage function currently calls
session.sendRealtimeInput with a non-supported media property; update sendImage
to pass the image under the modality-specific video field instead of media or
deprecated mediaChunks (e.g., call session.sendRealtimeInput({ video: { /*
include the image bytes and mimeType as the frame payload */ } })). Locate
sendImage and replace the media object with a video object containing the image
data and mimeType in the API's expected frame/payload shape so the
session.sendRealtimeInput call uses video rather than media or mediaChunks.
- Around line 179-185: The convertFloat32ToInt16 function currently returns
buf.toString() (comma-separated integers); change it to produce a base64-encoded
string of the raw 16-bit PCM bytes suitable for the Gemini Live API audio.data
field (e.g., MIME audio/pcm;rate=16000). Convert the Float32Array to an
Int16Array (clamping and scaling as done now), then create a Uint8Array view
over the Int16Array's buffer (ensuring correct endianness), encode that byte
array to base64, and return the base64 string so audio.data contains raw PCM
bytes rather than CSV integers.
- Around line 188-190: The GoogleGenAI client is being constructed with an
ephemeral token (token.token) but missing the v1alpha API version; update the
initialization of GoogleGenAI (the ai instance) to include httpOptions: {
apiVersion: 'v1alpha' } when using ephemeral tokens so Live API requests use the
v1alpha endpoint, e.g., detect where new GoogleGenAI({ apiKey: token.token }) is
created and add the httpOptions.apiVersion flag.

In `@packages/typescript/ai-gemini/src/realtime/token.ts`:
- Around line 41-42: The default model string is invalid for the
GeminiRealtimeModel union: change the default assigned to the local variable
model (currently set via options.model ?? 'gemini-live-2.5-flash-native-audio')
to one of the allowed union values (e.g., 'gemini-3.1-flash-live-preview' or
'gemini-2.5-flash-native-audio-preview-12-2025') so the inferred type matches
GeminiRealtimeModel; update the expression using options.model to fall back to a
valid GeminiRealtimeModel literal.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 26ce7bc3-6e10-47ce-adc7-516c0d88c602

📥 Commits

Reviewing files that changed from the base of the PR and between a8a4465 and dadfae9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (13)
  • .changeset/huge-lizards-admire.md
  • examples/ts-react-chat/src/lib/use-realtime.ts
  • examples/ts-react-chat/src/routes/realtime.tsx
  • packages/typescript/ai-client/src/realtime-client.ts
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai-gemini/package.json
  • packages/typescript/ai-gemini/src/index.ts
  • packages/typescript/ai-gemini/src/realtime/adapter.ts
  • packages/typescript/ai-gemini/src/realtime/index.ts
  • packages/typescript/ai-gemini/src/realtime/token.ts
  • packages/typescript/ai-gemini/src/realtime/types.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts
  • packages/typescript/ai/src/realtime/types.ts

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/typescript/ai-elevenlabs/src/realtime/adapter.ts (1)

35-47: ⚠️ Potential issue | 🟠 Major

Wire options and config parameters to session initialization.

The connect() method accepts both options (from elevenlabsRealtime()) and _config (RealtimeSessionConfig) but passes neither to createElevenLabsConnection(), causing caller-provided configuration to be silently dropped. While ElevenLabsRealtimeOptions includes meaningful fields (connectionMode, debug), and the Gemini adapter demonstrates proper config wiring, the ElevenLabs implementation ignores both parameters. At minimum, rename _config to config to reflect its intended use and thread it through to session creation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-elevenlabs/src/realtime/adapter.ts` around lines 35 -
47, The connect implementation of elevenlabsRealtime is dropping the
caller-provided ElevenLabsRealtimeOptions and RealtimeSessionConfig; rename the
unused parameter _config to config in the connect signature and thread both
options (from elevenlabsRealtime) and config (RealtimeSessionConfig) into
createElevenLabsConnection so session initialization receives
connectionMode/debug and session config; update the call sites inside
elevenlabsRealtime.connect to pass (token, config, options, clientToolDefs) or
the equivalent parameter order expected by createElevenLabsConnection and adjust
createElevenLabsConnection invocation accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9ecd1b6-aed5-4bb5-81bc-c287adfd2e67

📥 Commits

Reviewing files that changed from the base of the PR and between dadfae9 and 0711441.

📒 Files selected for processing (5)
  • packages/typescript/ai-client/src/realtime-client.ts
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
  • packages/typescript/ai-gemini/src/realtime/adapter.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/typescript/ai-client/src/realtime-client.ts
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (2)
packages/typescript/ai-gemini/src/realtime/adapter.ts (2)

218-220: ⚠️ Potential issue | 🟠 Major

Initialize GoogleGenAI with v1alpha for ephemeral tokens.

Gemini’s ephemeral tokens are Live-only and the official docs require the v1alpha API version when using them. Without that, the SDK can hit the wrong websocket/API path and fail to connect. (ai.google.dev)

🔧 Proposed fix
   const ai = new GoogleGenAI({
-    apiKey: token.token
-  });
+    apiKey: token.token,
+    httpOptions: {
+      apiVersion: 'v1alpha',
+    },
+  })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 218 -
220, The GoogleGenAI client is initialized without the API version so ephemeral
tokens can target the wrong websocket/API path; update the GoogleGenAI
instantiation (the new GoogleGenAI({ apiKey: token.token }) call) to include the
v1alpha API version required for ephemeral tokens (e.g., add the
apiVersion/version option set to "v1alpha") so the SDK uses the correct
live-only endpoint when using token.token.

209-215: ⚠️ Potential issue | 🔴 Critical

Encode PCM as base64 bytes, not CSV.

buf.toString() produces comma-separated integers, but Gemini Live expects audio.data to be base64-encoded raw 16-bit PCM bytes. In the current form microphone audio will be rejected or decoded as garbage. (ai.google.dev)

🐛 Proposed fix
 function convertFloat32ToInt16(buffer: Float32Array) {
-  let l = buffer.length;
-  const buf = new Int16Array(l);
-  while (l--) {
-    buf[l] = Math.min(1, Math.max(-1, buffer[l]!)) * 0x7fff;
-  }
-  return buf.toString();
+  const pcm = new Int16Array(buffer.length)
+  for (let i = 0; i < buffer.length; i++) {
+    pcm[i] = Math.min(1, Math.max(-1, buffer[i]!)) * 0x7fff
+  }
+
+  const bytes = new Uint8Array(pcm.buffer)
+  let binary = ''
+  for (const byte of bytes) {
+    binary += String.fromCharCode(byte)
+  }
+  return btoa(binary)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 209 -
215, convertFloat32ToInt16 currently returns CSV via buf.toString(); update it
to produce base64-encoded raw 16-bit PCM bytes instead: convert the Float32Array
samples into an Int16Array (clamped to -1..1 and scaled by 0x7fff) as done in
convertFloat32ToInt16, then create a Uint8Array view over the Int16Array.buffer
(ensure little-endian PCM ordering) and return a base64 string of those raw
bytes (e.g. Buffer.from(uint8Array).toString('base64') in Node or equivalent in
browsers); ensure the function still accepts a Float32Array and returns the
base64 audio.data string expected by Gemini Live.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 107-113: The code destructures provider-specific fields from
config.providerOptions without guarding for it; update the destructuring in
adapter.ts so it handles an undefined RealtimeSessionConfig.providerOptions by
using a safe fallback (e.g., null-coalescing or an early guard) before
extracting languageCode, contextWindowCompression, proactivity,
enableAffectiveDialog, and thinkingConfig; ensure the change references
config.providerOptions and GeminiRealtimeProviderOptions and preserves types
while providing sensible defaults or returning early when providerOptions is
absent.
- Around line 254-281: The transcription guard incorrectly checks
inputTranscription.finished/outputTranscription.finished (Gemini never sends
`finished`), causing transcript events to be skipped; update the checks to only
verify presence of text (inputTranscription.text/outputTranscription.text) and
use response.serverContent?.turnComplete to determine finality (pass that value
as isFinal), and preserve the currentMode switch to 'thinking' and
emit('mode_change') when appropriate; locate this logic around
inputTranscription/outputTranscription and emit('transcript') to apply the
change.
- Around line 512-520: The sendToolResult implementation in sendToolResult is
sending functionResponses as a single object but the Gemini Live API expects an
array of FunctionResponse objects; update the call to session.sendToolResponse
so that the functionResponses property is an array (e.g., [ { id: callId,
response: { result } } ]) and adjust typing if necessary to match the
FunctionResponse[] shape used elsewhere.
- Around line 242-247: The code is incorrectly encoding response.data as UTF-8;
instead extract the base64 PCM from
response.serverContent.modelTurn.parts[].inlineData.data, decode it to binary
and pass the resulting ArrayBuffer to playIncomingAudioChunk; implement a helper
like base64ToArrayBuffer(base64: string) that uses atob and Uint8Array to
produce an ArrayBuffer, and replace the textEncoder.encode(response.data).buffer
call in the adapter (where playIncomingAudioChunk is invoked) with
base64ToArrayBuffer(response.serverContent.modelTurn.parts[i].inlineData.data)
(ensuring you handle the correct part index and null checks).
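The transcription-finality comment above can be illustrated with a small pure function. The message shape here is a guess based only on the fields the comment names (inputTranscription, outputTranscription, turnComplete), not the SDK's actual types:

```typescript
// Illustrative shape only: the comment above says Gemini Live transcription
// messages carry text but no `finished` flag, and finality comes from
// serverContent.turnComplete.
interface ServerContentSketch {
  inputTranscription?: { text?: string }
  outputTranscription?: { text?: string }
  turnComplete?: boolean
}

function toTranscriptEvents(content: ServerContentSketch) {
  const events: Array<{ role: 'user' | 'assistant'; text: string; isFinal: boolean }> = []
  // Finality is derived from turnComplete, not from a per-message flag.
  const isFinal = content.turnComplete === true
  if (content.inputTranscription?.text) {
    events.push({ role: 'user', text: content.inputTranscription.text, isFinal })
  }
  if (content.outputTranscription?.text) {
    events.push({ role: 'assistant', text: content.outputTranscription.text, isFinal })
  }
  return events
}
```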

---

Duplicate comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 218-220: The GoogleGenAI client is initialized without the API
version so ephemeral tokens can target the wrong websocket/API path; update the
GoogleGenAI instantiation (the new GoogleGenAI({ apiKey: token.token }) call) to
include the v1alpha API version required for ephemeral tokens (e.g., add the
apiVersion/version option set to "v1alpha") so the SDK uses the correct
live-only endpoint when using token.token.
- Around line 209-215: convertFloat32ToInt16 currently returns CSV via
buf.toString(); update it to produce base64-encoded raw 16-bit PCM bytes
instead: convert the Float32Array samples into an Int16Array (clamped to -1..1
and scaled by 0x7fff) as done in convertFloat32ToInt16, then create a Uint8Array
view over the Int16Array.buffer (ensure little-endian PCM ordering) and return a
base64 string of those raw bytes (e.g.
Buffer.from(uint8Array).toString('base64') in Node or equivalent in browsers);
ensure the function still accepts a Float32Array and returns the base64
audio.data string expected by Gemini Live.
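A browser-side sketch of that conversion (names assumed from the review; the review suggests Buffer in Node or an equivalent in browsers, and this version uses btoa):

```typescript
// Convert Float32 samples to base64-encoded raw 16-bit little-endian PCM.
function convertFloat32ToInt16(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])) // clamp to [-1, 1]
    pcm[i] = s * 0x7fff // scale to 16-bit range; Int16Array truncates toward zero
  }
  // Int16Array uses the platform's byte order, which is little-endian on
  // mainstream CPUs/browsers, matching the PCM layout Gemini Live expects.
  const bytes = new Uint8Array(pcm.buffer)
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return btoa(binary)
}
```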

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90dd861d-bc24-41c4-b3c2-d27c79ea301b

📥 Commits

Reviewing files that changed from the base of the PR and between 0711441 and 5bd8d46.

📒 Files selected for processing (7)
  • examples/ts-react-chat/src/lib/use-realtime.ts
  • packages/typescript/ai-gemini/src/index.ts
  • packages/typescript/ai-gemini/src/realtime/adapter.ts
  • packages/typescript/ai-gemini/src/realtime/index.ts
  • packages/typescript/ai-gemini/src/realtime/token.ts
  • packages/typescript/ai-gemini/src/realtime/types.ts
  • packages/typescript/ai/src/realtime/types.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/typescript/ai-gemini/src/realtime/index.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/typescript/ai-gemini/src/index.ts
  • examples/ts-react-chat/src/lib/use-realtime.ts
  • packages/typescript/ai-gemini/src/realtime/token.ts
  • packages/typescript/ai-gemini/src/realtime/types.ts

Comment on lines +254 to +281
if (
inputTranscription &&
inputTranscription.text != undefined &&
inputTranscription.finished != undefined
) {
if (inputTranscription.finished && currentMode !== 'thinking') {
currentMode = 'thinking'
emit('mode_change', { mode: 'thinking' })
}

emit('transcript', {
isFinal: inputTranscription.finished,
transcript: inputTranscription.text,
role: 'user',
})
}

if (
outputTranscription &&
outputTranscription.text != undefined &&
outputTranscription.finished != undefined
) {
emit('transcript', {
isFinal: outputTranscription.finished,
transcript: outputTranscription.text,
role: 'assistant',
})
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat packages/typescript/ai-gemini/src/realtime/adapter.ts | head -300 | tail -100

Repository: TanStack/ai

Length of output: 2809


🏁 Script executed:

wc -l packages/typescript/ai-gemini/src/realtime/adapter.ts

Repository: TanStack/ai

Length of output: 112


🏁 Script executed:

sed -n '240,290p' packages/typescript/ai-gemini/src/realtime/adapter.ts

Repository: TanStack/ai

Length of output: 1560


🏁 Script executed:

# Find type definition for BidiGenerateContentTranscription
rg "BidiGenerateContentTranscription" --type ts -A 5

Repository: TanStack/ai

Length of output: 37


🏁 Script executed:

# Check imports to understand where the type comes from
sed -n '1,50p' packages/typescript/ai-gemini/src/realtime/adapter.ts

Repository: TanStack/ai

Length of output: 1310


🏁 Script executed:

# Find `@google/genai` types
find . -name "*.d.ts" -o -name "*.ts" | xargs rg "inputTranscription|outputTranscription" -B 3 -A 3 2>/dev/null | head -100

Repository: TanStack/ai

Length of output: 3365


🏁 Script executed:

# Check if there are any type definition files or documentation in node_modules
find node_modules/@google/genai -name "*.d.ts" 2>/dev/null | head -5

Repository: TanStack/ai

Length of output: 37


🏁 Script executed:

# Look for type definitions in the codebase
rg "type.*Transcription|interface.*Transcription" --type ts

Repository: TanStack/ai

Length of output: 8312


🏁 Script executed:

# Search for any documentation or comments about the transcription payload
rg "finished" packages/typescript/ai-gemini/src/realtime/adapter.ts -B 3 -A 3

Repository: TanStack/ai

Length of output: 900


🏁 Script executed:

# Check package.json to see the Google GenAI version
cat packages/typescript/ai-gemini/package.json | grep -A 2 -B 2 "@google/genai"

Repository: TanStack/ai

Length of output: 139


🌐 Web query:

Google GenAI live API BidiGenerateContentTranscription payload schema finished field

💡 Result:

The BidiGenerateContentTranscription message in Google's Gemini Live API (part of the GenAI bidirectional streaming API) is used for transcribing audio input or output in real-time conversations over WebSockets. It appears in BidiGenerateContentServerContent as inputTranscription or outputTranscription.

Schema (from official protobuf definitions and API reference):
  • text: string (Transcription text. Output only.)

No "finished" field exists in the core protobuf schema for BidiGenerateContentTranscription across the Gemini Developer API, Vertex AI, and client libraries (Dart, JS SDKs). Official docs confirm only the 'text' field.

Note: Some SDKs (e.g., JS genai) define a 'finished: bool' field expecting it to signal transcription turn completion, but it's not sent by the server (per open GitHub issue #1429, dated 2026-03-23). Developers must use alternative signals like turnComplete: true or outputTranscription arrival. Vertex AI docs mention a 'finished' bool ("The bool indicates the end of the transcription"), but this appears undocumented and inconsistent with the protobuf; the primary Gemini API docs do not include it.

To enable transcriptions, include empty AudioTranscriptionConfig objects in BidiGenerateContentSetup: input_audio_transcription: {} or output_audio_transcription: {} (no fields; languageCodes is supported in some backends like Vertex AI).

Primary sources: Google AI for Developers Live API reference (ai.google.dev/api/live), protobuf (github.com/googleapis/googleapis), Vertex AI multimodal-live reference.

Citations:


🌐 Web query:

site:ai.google.dev live API transcription payload structure

💡 Result:

The most prominent and up-to-date (as of 2026) live API for transcription is Google's Gemini Live API, which supports real-time audio input over WebSockets and provides input/output transcriptions. It uses a stateful WebSocket connection at wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent.

Key payload structures from official docs:

1. Initial setup message (BidiGenerateContentSetup, sent first after connection):

{
  "model": "string", // e.g., "gemini-3.1-flash-live-preview"
  "generationConfig": {
    "candidateCount": integer,
    "maxOutputTokens": integer,
    "temperature": number,
    "topP": number,
    "topK": integer,
    "presencePenalty": number,
    "frequencyPenalty": number,
    "responseModalities": ["AUDIO"], // or others
    "speechConfig": object
  },
  "inputAudioTranscription": {},  // Enables input transcription (empty config)
  "outputAudioTranscription": {}, // Enables output transcription
  // Optional: systemInstruction, tools, sessionResumption, etc.
}

Clients wait for the server's SetupComplete before sending more.

2. Real-time input for live audio (BidiGenerateContentRealtimeInput), sent as JSON over the WebSocket for streaming audio chunks:

{
  "realtimeInput": {
    "mediaChunks": [
      {
        "data": "base64-encoded-raw-16bit-PCM-audio-chunk",
        "mimeType": "audio/pcm;rate=16000" // Specify sample rate
      }
    ]
  }
}

Audio must be raw little-endian 16-bit PCM (input up to 16 kHz, output 24 kHz). Multiple chunks can be sent concurrently with text/video.

3. Server responses include transcriptions independently:
  • inputTranscription: { "text": "transcribed user input" }
  • outputTranscription: { "text": "transcribed model output" }

Other messages: clientContent for text/context, toolResponse for functions. For dedicated STT (not conversational), Google Cloud Speech-to-Text uses gRPC/REST streamingRecognize with audio chunks in StreamingRecognizeRequest, but the Gemini Live API is the current "live API transcription" focus per recent docs.

Examples:
  • Python SDK: session.send_realtime_input(audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000"))
  • JS: session.sendRealtimeInput({ audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' } })

Citations:


🏁 Script executed:

rg "turnComplete|finished" packages/typescript/ai-gemini/src/realtime/adapter.ts

Repository: TanStack/ai

Length of output: 385


🏁 Script executed:

# Check if there are any test files or issues documented
find packages/typescript/ai-gemini -name "*.test.ts" -o -name "*.spec.ts" | head -5

Repository: TanStack/ai

Length of output: 226


Remove the finished field checks—Gemini never sends this field in transcription payloads.

The server's BidiGenerateContentTranscription message contains only a text field. The finished field does not exist in the official Google GenAI API schema, so the guards at lines 254–281 always fail and transcript events never fire.

Use response.serverContent?.turnComplete to signal transcription completion instead, as already done elsewhere in the code.
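A minimal, self-contained sketch of that approach (the message shapes here are assumed from this review discussion, not taken from the package's actual types):

```typescript
type TranscriptEvent = {
  isFinal: boolean
  transcript: string
  role: 'user' | 'assistant'
}

// Derive transcript events from a server message without relying on the
// absent `finished` field: emit on text presence, finalize on turnComplete.
function toTranscriptEvents(serverContent: {
  inputTranscription?: { text?: string }
  outputTranscription?: { text?: string }
  turnComplete?: boolean
}): TranscriptEvent[] {
  const isFinal = serverContent.turnComplete ?? false
  const events: TranscriptEvent[] = []
  if (serverContent.inputTranscription?.text) {
    events.push({ isFinal, transcript: serverContent.inputTranscription.text, role: 'user' })
  }
  if (serverContent.outputTranscription?.text) {
    events.push({ isFinal, transcript: serverContent.outputTranscription.text, role: 'assistant' })
  }
  return events
}
```

The adapter would still flip currentMode to 'thinking' and emit('mode_change') alongside the user-role event, as the existing code does.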

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 254 -
281, The transcription guard incorrectly checks
inputTranscription.finished/outputTranscription.finished (Gemini never sends
`finished`), causing transcript events to be skipped; update the checks to only
verify presence of text (inputTranscription.text/outputTranscription.text) and
use response.serverContent?.turnComplete to determine finality (pass that value
as isFinal), and preserve the currentMode switch to 'thinking' and
emit('mode_change') when appropriate; locate this logic around
inputTranscription/outputTranscription and emit('transcript') to apply the
change.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (4)
packages/typescript/ai/src/realtime/event-emitter.ts (1)

27-29: Optional: drop empty Sets from the map on the last unsubscribe.

Once every handler for an event is removed, an empty Set stays in eventHandlers forever. Not a real leak in practice, but easy to clean up while you're here.

♻️ Proposed tweak
       return () => {
-        eventHandlers.get(event)!.delete(handler)
+        const handlers = eventHandlers.get(event)
+        if (!handlers) return
+        handlers.delete(handler)
+        if (handlers.size === 0) {
+          eventHandlers.delete(event)
+        }
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/realtime/event-emitter.ts` around lines 27 - 29,
The unsubscribe lambda currently removes the handler from the Set but leaves an
empty Set in eventHandlers; update the returned function in the
subscribe/unsubscribe logic to check eventHandlers.get(event) after deletion and
if the Set is empty call eventHandlers.delete(event) so the map doesn't retain
empty Sets (refer to the eventHandlers map and the returned () => {
eventHandlers.get(event)!.delete(handler) } closure).
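The proposed tweak can be shown end to end in a miniature emitter (a sketch whose names mirror the review's eventHandlers map, not the package's real implementation):

```typescript
type Handler = (payload: unknown) => void

const eventHandlers = new Map<string, Set<Handler>>()

// Subscribe and return an unsubscribe closure that also drops the
// Set from the map once the last handler is removed.
function on(event: string, handler: Handler): () => void {
  let handlers = eventHandlers.get(event)
  if (!handlers) {
    handlers = new Set()
    eventHandlers.set(event, handlers)
  }
  handlers.add(handler)
  return () => {
    const set = eventHandlers.get(event)
    if (!set) return
    set.delete(handler)
    if (set.size === 0) eventHandlers.delete(event) // no empty Sets retained
  }
}
```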
packages/typescript/ai-gemini/src/realtime/media-handler.ts (2)

211-221: Silently swallowing stop() errors is fine, but tighten the catch.

Empty catch clauses that bind an unused error variable trip common lint rules. Prefer catch { /* already stopped */ }:

     this.scheduledSources.forEach((s) => {
       try {
         s.stop();
-      } catch (e) { }
+      } catch { /* already stopped */ }
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/media-handler.ts` around lines 211
- 221, In stopAudioPlayback(), replace the empty catch with a catch block that
has no binding to avoid lint errors: when iterating over scheduledSources and
calling s.stop(), change catch (e) { } to catch { /* already stopped */ } so you
still swallow errors but satisfy linters; keep the rest of the logic (clearing
scheduledSources and updating this.nextStartTime from
this.audioContext.currentTime) unchanged.

226-237: Dead code: getInputLevel() is unused.

The inputLevel getter (Line 262) uses calculateLevel(this.inputAnalyser) instead of this method, so getInputLevel() is never called. Drop it or consolidate with calculateLevel to avoid two divergent level formulas.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/media-handler.ts` around lines 226
- 237, The getInputLevel() method is dead code because the inputLevel getter
uses calculateLevel(this.inputAnalyser); either remove getInputLevel() or
consolidate its logic into calculateLevel to avoid divergent formulas—update
calculateLevel to use the same averaging logic (Uint8Array frequencyBinCount,
sum / (length * 255)) if you prefer that implementation and then delete
getInputLevel(), or replace the inputLevel getter to call getInputLevel()
instead; adjust/remove the unused getInputLevel symbol accordingly so only one
canonical level-calculation function remains.
packages/typescript/ai-gemini/src/realtime/adapter.ts (1)

118-118: sessionResumptionUpdate is written but never read.

let sessionResumptionUpdate is assigned inside onmessage but nothing ever consumes it — the session resumption TODO implies it's a placeholder. Either drop it until the feature lands or expose it through the connection so consumers can persist it.

Also applies to: 151-154

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` at line 118, The
variable sessionResumptionUpdate (type LiveServerSessionResumptionUpdate) is
assigned inside the onmessage handler but never consumed; either remove it until
the feature is implemented or surface it to consumers so it can be persisted.
Fix by either deleting the declaration and assignments around
sessionResumptionUpdate in adapter.ts (and related unused code at the other
locations), or attach the value to the connection object (e.g.,
connection.sessionResumptionUpdate) or emit it from the onmessage handler via an
existing event/callback so callers can read/persist it; update references in
onmessage and any helper functions that set sessionResumptionUpdate accordingly
to use the chosen approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 342-375: The output analyser is never created because
MediaHandler.setupOutputAudioAnalysis() is not invoked, so
outputAnalyser/outputGainNode remain null and playAudio bypasses the analyser;
update the adapter to call mediaHandler.setupOutputAudioAnalysis() during the
connection setup (the same place setupInputAudioAnalysis() is called) so the
output AnalyserNode and gain node are created before playback; ensure this
invocation occurs before any playAudio() calls that rely on outputAnalyser.
- Line 231: Replace the loose equality check in the realtime adapter where you
compare the model turn's role—change the condition using
response.serverContent.modelTurn?.role == 'model' to use strict equality (===)
so the check in the adapter.ts (the branch that examines
response.serverContent.modelTurn?.role) is consistent with the codebase and
avoids lint warnings.
- Around line 264-274: The race condition occurs because
mediaHandler.startAudio(...) is not awaited before calling
mediaHandler.setupInputAudioAnalysis(), causing setupInputAudioAnalysis() to
early-return when this.mediaStream is still null; update the flow so
startAudio(...) is awaited (await mediaHandler.startAudio(...)) before calling
await mediaHandler.setupInputAudioAnalysis(), and move the initial microphone
start from connect() into an explicit startAudioCapture() method so microphone
permission is only requested when startAudioCapture()/startAudio() is invoked;
ensure getAudioVisualization() consumers run after setupInputAudioAnalysis()
completes so inputLevel/inputFrequencyData/inputTimeDomainData are valid.
- Line 241: Remove the stray debug console.log(part) call—locate the
console.log(part) line in the realtime adapter's incoming-part handler (the
function handling model turn parts / transcription parts) and delete it so the
browser console is no longer spammed; leave the surrounding logic (including the
outputTranscription.finished / text checks) intact and just remove that debug
print statement.
- Around line 265-272: The browser runtime throws because Buffer is Node-only;
replace the Buffer.from(...).toString("base64") call in the
mediaHandler.startAudio callback with a browser-safe ArrayBuffer→base64 helper
(e.g., add arrayBufferToBase64(buffer: ArrayBuffer): string to media-handler.ts
or a util and import it) and call session.sendRealtimeInput({ audio: { data:
arrayBufferToBase64(data), mimeType: 'audio/pcm;rate=16000' } }); reuse the
existing symmetry with convertBase64ToArrayBuffer and ensure the helper uses
Uint8Array + chunked String.fromCharCode + btoa to avoid TOOBIG issues in
browsers.
- Around line 156-158: Inside the if (response.usageMetadata) block, emit the
realtime 'usage' event using the mapped token counts: read
response.usageMetadata.promptTokenCount,
response.usageMetadata.responseTokenCount, and
response.usageMetadata.totalTokenCount and call the adapter's emitter (e.g.,
realtime.emit or this.emit depending on the surrounding code) with an object
shaped { promptTokens: promptTokenCount, completionTokens: responseTokenCount,
totalTokens: totalTokenCount }; ensure you reference the existing symbols
response and response.usageMetadata and emit the event name 'usage' so the base
realtime types receive the expected payload.
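The usage mapping in the last adapter item above can be sketched as a small pure function; the field names follow this review's text, the metadata shape is assumed, and the emit call itself is left out:

```typescript
// Map Gemini's usageMetadata fields onto the adapter's 'usage' event payload,
// defaulting missing counts to 0.
interface GeminiUsageMetadata {
  promptTokenCount?: number
  responseTokenCount?: number
  totalTokenCount?: number
}

function toUsagePayload(usage: GeminiUsageMetadata) {
  return {
    promptTokens: usage.promptTokenCount ?? 0,
    completionTokens: usage.responseTokenCount ?? 0,
    totalTokens: usage.totalTokenCount ?? 0,
  }
}
```

Inside the if (response.usageMetadata) block the adapter would then call emit('usage', toUsagePayload(response.usageMetadata)).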

In `@packages/typescript/ai-gemini/src/realtime/media-handler.ts`:
- Around line 1-32: The module currently creates workletBlob and workletUrl at
import time (workletCode -> new Blob(...) and URL.createObjectURL(...)), which
throws in non-browser environments; move creation of the Blob and the call to
URL.createObjectURL into a new helper (e.g., getWorkletUrl()) that constructs
and returns the object URL lazily, and call getWorkletUrl() from
initializeAudio() so workletBlob/workletUrl are only created when running in a
browser runtime where Blob and URL are defined.
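The browser-safe base64 helper suggested for the adapter above might look like the following sketch, using chunked String.fromCharCode plus btoa so large audio buffers don't exceed the argument-count limit (the helper name comes from the review suggestion):

```typescript
// Encode an ArrayBuffer as base64 without Node's Buffer.
// Chunking keeps each String.fromCharCode call's argument count bounded.
function arrayBufferToBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer)
  const chunkSize = 0x8000 // 32 KiB of bytes per fromCharCode call
  let binary = ''
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...Array.from(bytes.subarray(i, i + chunkSize)))
  }
  return btoa(binary)
}
```

The microphone callback could then send session.sendRealtimeInput({ audio: { data: arrayBufferToBase64(data), mimeType: 'audio/pcm;rate=16000' } }) as described in the comment.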

---

Nitpick comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Line 118: The variable sessionResumptionUpdate (type
LiveServerSessionResumptionUpdate) is assigned inside the onmessage handler but
never consumed; either remove it until the feature is implemented or surface it
to consumers so it can be persisted. Fix by either deleting the declaration and
assignments around sessionResumptionUpdate in adapter.ts (and related unused
code at the other locations), or attach the value to the connection object
(e.g., connection.sessionResumptionUpdate) or emit it from the onmessage handler
via an existing event/callback so callers can read/persist it; update references
in onmessage and any helper functions that set sessionResumptionUpdate
accordingly to use the chosen approach.

In `@packages/typescript/ai-gemini/src/realtime/media-handler.ts`:
- Around line 211-221: In stopAudioPlayback(), replace the empty catch with a
catch block that has no binding to avoid lint errors: when iterating over
scheduledSources and calling s.stop(), change catch (e) { } to catch { /*
already stopped */ } so you still swallow errors but satisfy linters; keep the
rest of the logic (clearing scheduledSources and updating this.nextStartTime
from this.audioContext.currentTime) unchanged.
- Around line 226-237: The getInputLevel() method is dead code because the
inputLevel getter uses calculateLevel(this.inputAnalyser); either remove
getInputLevel() or consolidate its logic into calculateLevel to avoid divergent
formulas—update calculateLevel to use the same averaging logic (Uint8Array
frequencyBinCount, sum / (length * 255)) if you prefer that implementation and
then delete getInputLevel(), or replace the inputLevel getter to call
getInputLevel() instead; adjust/remove the unused getInputLevel symbol
accordingly so only one canonical level-calculation function remains.

In `@packages/typescript/ai/src/realtime/event-emitter.ts`:
- Around line 27-29: The unsubscribe lambda currently removes the handler from
the Set but leaves an empty Set in eventHandlers; update the returned function
in the subscribe/unsubscribe logic to check eventHandlers.get(event) after
deletion and if the Set is empty call eventHandlers.delete(event) so the map
doesn't retain empty Sets (refer to the eventHandlers map and the returned () =>
{ eventHandlers.get(event)!.delete(handler) } closure).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc0e9977-a973-4d8e-b952-e8572ac35784

📥 Commits

Reviewing files that changed from the base of the PR and between 5bd8d46 and 9287a62.

📒 Files selected for processing (9)
  • packages/typescript/ai-elevenlabs/src/realtime/adapter.ts
  • packages/typescript/ai-gemini/src/index.ts
  • packages/typescript/ai-gemini/src/realtime/adapter.ts
  • packages/typescript/ai-gemini/src/realtime/media-handler.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts
  • packages/typescript/ai/src/index.ts
  • packages/typescript/ai/src/realtime/event-emitter.ts
  • packages/typescript/ai/src/realtime/index.ts
  • packages/typescript/ai/src/realtime/types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/typescript/ai-gemini/src/index.ts
  • packages/typescript/ai-openai/src/realtime/adapter.ts

}

for (const part of response.serverContent.modelTurn.parts || []) {
console.log(part)

⚠️ Potential issue | 🟡 Minor

Remove debug console.log.

Leftover console.log(part) will spam the browser console for every model turn part.

🧹 Proposed fix
-            for (const part of response.serverContent.modelTurn.parts || []) {
-              console.log(part)
-
-              if (part.inlineData?.data && outputTranscription?.finished && outputTranscription.text) {
+            for (const part of response.serverContent.modelTurn.parts || []) {
+              if (part.inlineData?.data && outputTranscription?.text) {

Note: the outputTranscription.finished check here has the same issue as the transcription guards above — once that's fixed, this condition simplifies to just checking for text.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` at line 241, Remove
the stray debug console.log(part) call—locate the console.log(part) line in the
realtime adapter's incoming-part handler (the function handling model turn parts
/ transcription parts) and delete it so the browser console is no longer
spammed; leave the surrounding logic (including the outputTranscription.finished
/ text checks) intact and just remove that debug print statement.

Comment on lines +1 to +32
const workletCode = `
class PCMProcessor extends AudioWorkletProcessor {
constructor() {
super();
this.bufferSize = 4096;
this.buffer = new Float32Array(this.bufferSize);
this.bufferIndex = 0;
}

process(inputs, outputs, parameters) {
const input = inputs[0];
if (!input || !input.length) return true;

const channelData = input[0];

for (let i = 0; i < channelData.length; i++) {
this.buffer[this.bufferIndex++] = channelData[i];

if (this.bufferIndex >= this.bufferSize) {
this.port.postMessage(this.buffer);
this.bufferIndex = 0;
}
}

return true;
}
}

registerProcessor("pcm-processor", PCMProcessor);
`
const workletBlob = new Blob([workletCode], { type: 'application/javascript' })
const workletUrl = URL.createObjectURL(workletBlob)

⚠️ Potential issue | 🟠 Major

Module-level Blob + URL.createObjectURL breaks non-browser evaluation.

These lines execute at import time. If this module is ever imported in an environment without DOM globals — SSR (TanStack Start, Next RSC), Node test runners, static analysis — it will throw ReferenceError: Blob is not defined / URL.createObjectURL is not defined before any guard has a chance to run. Defer both to initializeAudio().

🛠️ Proposed fix
 const workletCode = `…`
-const workletBlob = new Blob([workletCode], { type: 'application/javascript' })
-const workletUrl = URL.createObjectURL(workletBlob)
+
+let workletUrl: string | null = null
+function getWorkletUrl(): string {
+  if (!workletUrl) {
+    const blob = new Blob([workletCode], { type: 'application/javascript' })
+    workletUrl = URL.createObjectURL(blob)
+  }
+  return workletUrl
+}

Then use getWorkletUrl() inside initializeAudio().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/media-handler.ts` around lines 1 -
32, The module currently creates workletBlob and workletUrl at import time
(workletCode -> new Blob(...) and URL.createObjectURL(...)), which throws in
non-browser environments; move creation of the Blob and the call to
URL.createObjectURL into a new helper (e.g., getWorkletUrl()) that constructs
and returns the object URL lazily, and call getWorkletUrl() from
initializeAudio() so workletBlob/workletUrl are only created when running in a
browser runtime where Blob and URL are defined.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (5)
packages/typescript/ai-gemini/src/realtime/adapter.ts (5)

237-237: ⚠️ Potential issue | 🟡 Minor

Use strict equality (===).

-          if (response.serverContent.modelTurn?.role == 'model') {
+          if (response.serverContent.modelTurn?.role === 'model') {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` at line 237, Replace
the loose equality in the conditional that checks
response.serverContent.modelTurn?.role == 'model' with strict equality (use ===)
so the check becomes response.serverContent.modelTurn?.role === 'model'; update
the conditional inside the adapter handling code where
response.serverContent.modelTurn is inspected to use === to avoid type-coercion
issues.

246-256: ⚠️ Potential issue | 🟡 Minor

Remove console.log(part) and the outputTranscription?.finished dependency.

Leftover debug log on line 247 will spam the console for every part. Additionally, line 249 still gates on outputTranscription?.finished, inheriting the same reliability issue noted above for transcription; once finished is removed, this simplifies to checking outputTranscription?.text.

🧹 Proposed fix
             for (const part of response.serverContent.modelTurn.parts || []) {
-              console.log(part)
-
-              if (part.inlineData?.data && outputTranscription?.finished && outputTranscription.text) {
+              if (part.inlineData?.data && outputTranscription?.text) {
                 message.parts.push({
                   type: "audio",
                   transcript: outputTranscription.text,
                   audioData: mediaHandler.convertBase64ToArrayBuffer(part.inlineData.data),
                 })
               }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 246 -
256, Remove the leftover debug console.log in the loop over
response.serverContent.modelTurn.parts and stop gating audio part creation on
outputTranscription?.finished; instead, when iterating parts
(response.serverContent.modelTurn.parts) check that part.inlineData?.data and
outputTranscription?.text are present, then push the audio part into
message.parts using
mediaHandler.convertBase64ToArrayBuffer(part.inlineData.data) and transcript
outputTranscription.text (update the conditional used in the block that
currently references outputTranscription?.finished).

174-201: ⚠️ Potential issue | 🟠 Major

Transcription guard still depends on the non-guaranteed finished field.

The Gemini server does not reliably populate finished on BidiGenerateContentTranscription (only text is in the official schema). The current inputTranscription.finished != undefined / outputTranscription.finished != undefined gate will skip transcript emission entirely when the server omits that field. Consider emitting on text presence and using response.serverContent?.turnComplete to drive isFinal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 174 -
201, The current guards in adapter.ts rely on inputTranscription.finished /
outputTranscription.finished which the Gemini server may omit; instead, emit
transcripts whenever inputTranscription.text or outputTranscription.text exists
and determine isFinal from response.serverContent?.turnComplete; for input
handling, keep the currentMode switch (set currentMode = 'thinking' and
emit('mode_change', { mode: 'thinking' })) only when
response.serverContent?.turnComplete is true (or when turnComplete indicates
completion) and emit('transcript', { isFinal:
Boolean(response.serverContent?.turnComplete), transcript:
inputTranscription.text, role: 'user' }) (and similarly for outputTranscription
with role: 'assistant') so transcripts are not skipped when finished is absent.

325-334: ⚠️ Potential issue | 🔴 Critical

functionResponses must be an array per the Gemini Live API schema.

BidiGenerateContentToolResponse.functionResponses is typed as FunctionResponse[]. Sending a single object will fail server-side tool result matching. Also consider including name — some SDK paths require it.

     sendToolResult(callId: string, result: string) {
       session.sendToolResponse({
-        functionResponses: {
-          id: callId,
-          response: {
-            result
-          }
-        }
+        functionResponses: [
+          {
+            id: callId,
+            response: { result },
+          },
+        ],
       })
     },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 325 -
334, sendToolResult constructs a tool response as a single object but Gemini
Live expects BidiGenerateContentToolResponse.functionResponses to be an array;
update sendToolResult to call session.sendToolResponse with functionResponses:
[{ id: callId, name: <toolName>, response: { result } }] (add a toolName
parameter to sendToolResult if not available) so the payload is an array and
includes the name field required by some SDK paths; ensure the signature of
sendToolResult and any callers are updated to provide the tool name and that
types align with FunctionResponse[].

270-280: ⚠️ Potential issue | 🔴 Critical

Critical: startAudio isn't awaited and setupOutputAudioAnalysis is never invoked; Buffer is Node-only.

Three still-unresolved problems in this block:

  1. mediaHandler.startAudio(...) is async — not awaiting it means setupInputAudioAnalysis() on line 280 runs before this.mediaStream is set and early-returns, leaving inputAnalyser permanently null.
  2. setupOutputAudioAnalysis() is never called, so outputAnalyser/outputGainNode remain null and every playAudio() call bypasses the analyser chain — output visualization will always be empty.
  3. Buffer.from(data).toString("base64") will throw ReferenceError: Buffer is not defined in browsers (Vite doesn't polyfill Node globals). This adapter is browser-only (uses getUserMedia/AudioContext).
🛠️ Proposed fix
-  // Request microphone access
-  mediaHandler.startAudio((data) => {
-    session.sendRealtimeInput({
-      audio: {
-        data: Buffer.from(data).toString("base64"),
-        mimeType: 'audio/pcm;rate=16000'
-      }
-    })
-  })
-
-  await mediaHandler.setupInputAudioAnalysis()
+  // Request microphone access
+  await mediaHandler.startAudio((data) => {
+    session.sendRealtimeInput({
+      audio: {
+        data: arrayBufferToBase64(data),
+        mimeType: 'audio/pcm;rate=16000',
+      },
+    })
+  })
+
+  await mediaHandler.setupInputAudioAnalysis()
+  await mediaHandler.setupOutputAudioAnalysis()

Add a browser-safe helper (e.g., in media-handler.ts):

export function arrayBufferToBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer)
  let binary = ''
  const chunk = 0x8000
  for (let i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode.apply(
      null,
      bytes.subarray(i, i + chunk) as unknown as number[],
    )
  }
  return btoa(binary)
}

Consider also deferring the startAudio kickoff into startAudioCapture() so the mic-permission prompt is gated by the RealtimeConnection contract rather than firing immediately on connect().
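One way to sketch that deferral: `connect()` only records what to do with audio data, and the mic-permission prompt fires when the caller invokes `startAudioCapture()`. The `DeferredCapture` class and method names here are hypothetical; the repo's actual RealtimeConnection contract may differ.

```typescript
type AudioCallback = (data: ArrayBuffer) => void

class DeferredCapture {
  private onData: AudioCallback | null = null
  private started = false

  // Called from connect(): remember the handler, but do not touch the mic yet.
  prepare(onData: AudioCallback): void {
    this.onData = onData
  }

  // Called from startAudioCapture(): actually start capture (and trigger the
  // permission prompt). Returns false if already started or never prepared.
  async startAudioCapture(
    start: (cb: AudioCallback) => Promise<void>,
  ): Promise<boolean> {
    if (this.started || !this.onData) return false
    this.started = true
    await start(this.onData)
    return true
  }
}
```

The `start` parameter stands in for `mediaHandler.startAudio`; injecting it keeps the gating logic testable without a real microphone.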

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 270 -
280, Await the async mediaHandler.startAudio(...) call so
setupInputAudioAnalysis() runs after this.mediaStream is set (e.g., await
mediaHandler.startAudio(...)); also invoke
mediaHandler.setupOutputAudioAnalysis() after starting audio so
outputAnalyser/outputGainNode are initialized (call setupOutputAudioAnalysis()
in the same startup sequence). Replace the Node Buffer conversion with a
browser-safe conversion (use a helper like arrayBufferToBase64 that converts an
ArrayBuffer/Uint8Array to base64 via btoa) instead of
Buffer.from(...).toString('base64'). Optionally move the initial startAudio call
into a startAudioCapture() flow on the RealtimeConnection contract to defer the
mic-permission prompt.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Around line 336-339: The adapter's updateSession() is a silent no-op so calls
from RealtimeClient.applySessionConfig() (which calls
connection.updateSession(...)) are ignored; update updateSession() in
packages/typescript/ai-gemini/src/realtime/adapter.ts to (1) add JSDoc above
updateSession() clearly stating Gemini Live has no mid-session update equivalent
and that this method is intentionally a no-op, and (2) emit a one-time warning
when updateSession() is invoked (use the adapter's existing logger or a
module-level once-flag to avoid spamming) so callers see that their runtime
tool/instruction updates will be ignored. Ensure the doc references that
RealtimeClient.applySessionConfig() calls connection.updateSession and that
behavior differs for this adapter.
- Around line 118-154: sessionResumptionUpdate is dead state: remove the
declaration `let sessionResumptionUpdate: LiveServerSessionResumptionUpdate |
null = null` and delete the `// TODO: implement session resumption` block inside
the `onmessage` callback that assigns `sessionResumptionUpdate =
response.sessionResumptionUpdate`; keep behavior unchanged by eliminating the
unused variable and assignment (or if you prefer to preserve signaling instead,
replace the assignment with an emit like `emit("session_resumption_update",
response.sessionResumptionUpdate)` inside the same `onmessage` handler).
- Around line 156-164: The usage event is currently skipped when any token count
is 0 because of truthy checks; change the guard to check only that
response.usageMetadata exists, then extract totalTokenCount, promptTokenCount,
and responseTokenCount using nullish fallback (or use them directly) so zero is
allowed, and always call emit("usage", { completionTokens: responseTokenCount,
promptTokens: promptTokenCount, totalTokens: totalTokenCount }) when
usageMetadata is present; also remove the unused responseTokensDetails from the
destructuring to avoid dead code.
- Around line 374-380: The inputSampleRate getter currently returns 24000 but
the microphone PCM sent to Gemini is 16000, so update the inputSampleRate
accessor to return 16000 (leave outputSampleRate as 24000); change the
inputSampleRate implementation in the same module (the inputSampleRate getter
used by getAudioVisualization()) so consumers and resamplers receive the correct
16kHz value.

---

Duplicate comments:
In `@packages/typescript/ai-gemini/src/realtime/adapter.ts`:
- Line 237: Replace the loose equality in the conditional that checks
response.serverContent.modelTurn?.role == 'model' with strict equality (use ===)
so the check becomes response.serverContent.modelTurn?.role === 'model'; update
the conditional inside the adapter handling code where
response.serverContent.modelTurn is inspected to use === to avoid type-coercion
issues.
- Around line 246-256: Remove the leftover debug console.log in the loop over
response.serverContent.modelTurn.parts and stop gating audio part creation on
outputTranscription?.finished; instead, when iterating parts
(response.serverContent.modelTurn.parts) check that part.inlineData?.data and
outputTranscription?.text are present, then push the audio part into
message.parts using
mediaHandler.convertBase64ToArrayBuffer(part.inlineData.data) and transcript
outputTranscription.text (update the conditional used in the block that
currently references outputTranscription?.finished).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c2e20e14-bcc3-4179-9122-4780fa40c181

📥 Commits

Reviewing files that changed from the base of the PR and between 9287a62 and 97500d3.

📒 Files selected for processing (4)
  • packages/typescript/ai-client/src/realtime-client.ts
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai-gemini/src/realtime/adapter.ts
  • packages/typescript/ai/src/realtime/types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/typescript/ai-client/src/realtime-types.ts
  • packages/typescript/ai/src/realtime/types.ts

Comment on lines +118 to +154
let sessionResumptionUpdate: LiveServerSessionResumptionUpdate | null = null

function generateMessageId(): string {
  return `gemini-msg-${Date.now()}-${++messageIdCounter}`
}

const ai = new GoogleGenAI({
  apiKey: token.token,
  httpOptions: {
    apiVersion: 'v1alpha'
  }
});

const session = await ai.live.connect({
  model: model,
  config: liveConfig,
  callbacks: {
    onopen() {
      emit("status_change", { status: "connected" })
    },
    onclose() {
      emit("status_change", { status: "idle" })
    },
    onmessage(response) {

      const content = response.serverContent;
      const inputTranscription = content?.inputTranscription;
      const outputTranscription = content?.outputTranscription;

      if (response.goAway) {
        emit("go_away", { timeLeft: response.goAway.timeLeft })
      }

      // TODO: implement session resumption
      if (response.sessionResumptionUpdate) {
        sessionResumptionUpdate = response.sessionResumptionUpdate
      }

⚠️ Potential issue | 🟡 Minor

sessionResumptionUpdate is captured but never used.

The variable is written in the onmessage handler (and has a // TODO: implement session resumption comment), but nothing ever reads it. If session resumption isn't in scope for this PR, consider dropping the dead state to avoid confusion, or open a tracking issue.

Would you like me to open a follow-up issue to track session resumption support?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 118 -
154, sessionResumptionUpdate is dead state: remove the declaration `let
sessionResumptionUpdate: LiveServerSessionResumptionUpdate | null = null` and
delete the `// TODO: implement session resumption` block inside the `onmessage`
callback that assigns `sessionResumptionUpdate =
response.sessionResumptionUpdate`; keep behavior unchanged by eliminating the
unused variable and assignment (or if you prefer to preserve signaling instead,
replace the assignment with an emit like `emit("session_resumption_update",
response.sessionResumptionUpdate)` inside the same `onmessage` handler).
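If the signal is worth preserving rather than deleting, the forwarding option mentioned above could look like this. The `session_resumption_update` event name, the `Emit` type, and the update shape are assumptions for illustration, not part of the current adapter.

```typescript
type Emit = (event: string, payload: unknown) => void

function handleSessionResumption(
  emit: Emit,
  update: { newHandle?: string; resumable?: boolean } | undefined,
): boolean {
  if (!update) return false
  // Forward the update instead of storing it in a never-read local variable.
  emit('session_resumption_update', update)
  return true
}
```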

Comment thread packages/typescript/ai-gemini/src/realtime/adapter.ts
Comment on lines +336 to +339
updateSession() {
  // No equivalent of updateSession() exists dynamically as it does in OpenAI
  // for updating system instructions, tools, etc mid-session.
},

⚠️ Potential issue | 🟡 Minor

updateSession is a silent no-op.

The RealtimeClient.applySessionConfig() path unconditionally calls connection.updateSession(...) after connect. With this adapter, settings like runtime tool/instruction updates will silently be ignored. At minimum, document this in the adapter's JSDoc (and ideally log a one-time warning) so consumers don't get surprised when mid-session updateSession calls from higher up the stack do nothing. Since Gemini Live has no equivalent operation, this is expected, but callers need visibility.
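One possible shape for the documented no-op plus one-time warning. `console.warn` stands in for whatever logger the adapter actually uses, and the message text is illustrative.

```typescript
// Module-level once-flag so repeated updateSession calls don't spam the console.
let warnedUpdateSession = false

/**
 * Intentionally a no-op: Gemini Live has no mid-session equivalent of
 * OpenAI's session.update. Calls from RealtimeClient.applySessionConfig()
 * (runtime tool/instruction updates) are ignored by this adapter.
 */
function updateSession(): void {
  if (!warnedUpdateSession) {
    warnedUpdateSession = true
    console.warn(
      '[ai-gemini] updateSession() is a no-op: Gemini Live cannot update session config mid-session.',
    )
  }
}
```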

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 336 -
339, The adapter's updateSession() is a silent no-op so calls from
RealtimeClient.applySessionConfig() (which calls connection.updateSession(...))
are ignored; update updateSession() in
packages/typescript/ai-gemini/src/realtime/adapter.ts to (1) add JSDoc above
updateSession() clearly stating Gemini Live has no mid-session update equivalent
and that this method is intentionally a no-op, and (2) emit a one-time warning
when updateSession() is invoked (use the adapter's existing logger or a
module-level once-flag to avoid spamming) so callers see that their runtime
tool/instruction updates will be ignored. Ensure the doc references that
RealtimeClient.applySessionConfig() calls connection.updateSession and that
behavior differs for this adapter.

Comment on lines +374 to +380
get inputSampleRate() {
  return 24000
},

get outputSampleRate() {
  return 24000
},

⚠️ Potential issue | 🟡 Minor

inputSampleRate is wrong — should be 16000 to match the PCM sent to Gemini.

Microphone input is sent as audio/pcm;rate=16000 on line 275, but getAudioVisualization().inputSampleRate reports 24000. This will give consumers wrong data for frequency-domain visualization math and for any downstream resampling. Gemini Live's output PCM is 24 kHz (so outputSampleRate: 24000 is correct), but input is 16 kHz.

         get inputSampleRate() {
-          return 24000
+          return 16000
         },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-gemini/src/realtime/adapter.ts` around lines 374 -
380, The inputSampleRate getter currently returns 24000 but the microphone PCM
sent to Gemini is 16000, so update the inputSampleRate accessor to return 16000
(leave outputSampleRate as 24000); change the inputSampleRate implementation in
the same module (the inputSampleRate getter used by getAudioVisualization()) so
consumers and resamplers receive the correct 16kHz value.
