Open
105 commits
dd09c7d
docs: design for assembly live tool-call UX (detail + spacing)
alexkroman-assembly Jun 22, 2026
5441756
docs: implementation plan for assembly live tool-call UX
alexkroman-assembly Jun 22, 2026
934b685
docs: design for flat paused voice-bar meter
alexkroman-assembly Jun 22, 2026
f02cd2e
docs: design for live weather tool in assembly live
alexkroman-assembly Jun 22, 2026
62727ae
docs: design for assembly live streaming reply pipeline
alexkroman-assembly Jun 22, 2026
657e410
docs: implementation plan for live weather tool
alexkroman-assembly Jun 22, 2026
3bd1a29
feat: keyless Open-Meteo weather tool for assembly live
alexkroman-assembly Jun 22, 2026
c89e5bf
docs: design for assembly live file read/write in launch dir
alexkroman-assembly Jun 22, 2026
23971fa
docs: implementation plan for assembly live streaming reply pipeline
alexkroman-assembly Jun 22, 2026
482553a
fix: annotate test fixtures as dict[str, object] to satisfy pyright d…
alexkroman-assembly Jun 22, 2026
184c074
docs: make grep/search an explicit requirement in live file design
alexkroman-assembly Jun 22, 2026
407746d
docs: design for read-url (web + PDF) tool in assembly live
alexkroman-assembly Jun 22, 2026
e254e23
wip: in-flight live tool-call UX work (checkpoint before streaming-pi…
alexkroman-assembly Jun 22, 2026
d4a1aef
feat(live): add pop_clauses incremental clause splitter
alexkroman-assembly Jun 22, 2026
453ee01
feat: bind the live weather tool into the assembly live agent
alexkroman-assembly Jun 22, 2026
db21551
docs: implementation plan for read-url tool in assembly live
alexkroman-assembly Jun 22, 2026
03283de
docs: correct live file design — tools already bound, swap backend + …
alexkroman-assembly Jun 22, 2026
1af1a54
feat(live): add build_streamer token-streaming reply leg
alexkroman-assembly Jun 22, 2026
75c060a
docs: implementation plan for assembly live file read/write
alexkroman-assembly Jun 22, 2026
46f5748
test: pin weather capability ordering and exact keyed toolset
alexkroman-assembly Jun 22, 2026
84687d7
docs: mark live file-readwrite plan blocked on streaming-pipeline rebase
alexkroman-assembly Jun 22, 2026
83e6501
fix(live): satisfy ruff/mypy for the streaming reply leg
alexkroman-assembly Jun 22, 2026
68ffad0
test: kill weather tool mutation survivors (count, length guard, WMO …
alexkroman-assembly Jun 22, 2026
a25f83e
feat: read-a-URL (web + PDF) tool module for assembly live
alexkroman-assembly Jun 22, 2026
ab51e73
fix error
alexkroman-assembly Jun 22, 2026
f3491d1
fix(live): scope streaming graph check to brain, not the shared protocol
alexkroman-assembly Jun 22, 2026
5a6a88c
fix(live): drop pragma escape hatch and dead kwargs path in streamer …
alexkroman-assembly Jun 22, 2026
f92a973
feat: wire read_url tool into assembly live
alexkroman-assembly Jun 22, 2026
6f46db5
docs: design + plan for live date/time tool
alexkroman-assembly Jun 22, 2026
22ea40d
feat: local date/time tool module for assembly live
alexkroman-assembly Jun 22, 2026
0581dbf
feat: wire get_current_datetime tool into assembly live
alexkroman-assembly Jun 22, 2026
044164a
docs: spec for spoken tool-call filler in live voice agent
alexkroman-assembly Jun 22, 2026
ed267f4
Merge live read_url (web + PDF) tool into live-tool-call-ux
alexkroman-assembly Jun 22, 2026
c212acb
Merge live get_current_datetime tool into live-tool-call-ux
alexkroman-assembly Jun 22, 2026
e88a05e
docs: spec for half-duplex echo guard in live voice agent
alexkroman-assembly Jun 22, 2026
d0654e2
feat(live): stream the reply through clause-level streaming TTS
alexkroman-assembly Jun 22, 2026
4a51f25
fix(live): narrow stream_reply events to SpeechDelta in the deps test
alexkroman-assembly Jun 22, 2026
cf92e58
test(live): pin _MIN_CLAUSE_CHARS with a soft-separator clause test
alexkroman-assembly Jun 22, 2026
087969a
refactor(live): drop the superseded build_completer reply path
alexkroman-assembly Jun 22, 2026
bbe5f21
docs(live): describe the streaming reply pipeline
alexkroman-assembly Jun 22, 2026
153b1c9
feat(live): real-cwd filesystem backend + write-gating behind files c…
alexkroman-assembly Jun 22, 2026
a8b7655
feat(live): advertise file capability + speakable file tool labels
alexkroman-assembly Jun 22, 2026
e6fb13f
feat(live): write-approval streaming loop in build_streamer
alexkroman-assembly Jun 22, 2026
4fd8a77
feat(live): thread write approver through engine; pause reply deadlin…
alexkroman-assembly Jun 22, 2026
0515664
feat(live): TUI write-approval modal reusing code agent's ApprovalScreen
alexkroman-assembly Jun 22, 2026
dea8620
feat(live): --files flag wiring (TUI approver + headless deny)
alexkroman-assembly Jun 22, 2026
1fc52a7
docs(live): document --files; keyword-only verbose flag + PERF401 fix
alexkroman-assembly Jun 22, 2026
7722989
docs: design spec for removing assembly code (keep live)
alexkroman-assembly Jun 22, 2026
e541b24
fix(live): narrow gated graph to a _GatedGraph protocol for mypy (str…
alexkroman-assembly Jun 22, 2026
677d426
docs: implementation plan for removing assembly code
alexkroman-assembly Jun 22, 2026
e19ba15
fix(tests): mypy narrowing in live TUI toggle test; regenerate code-T…
alexkroman-assembly Jun 22, 2026
2f80d16
docs: design for five keyless tools for assembly live
alexkroman-assembly Jun 22, 2026
a492799
refactor(live): extract prompt.py from brain.py; split tests under 50…
alexkroman-assembly Jun 22, 2026
e874437
docs(plan): re-point agent_cascade/prompt.py firecrawl import in remo…
alexkroman-assembly Jun 22, 2026
dce1551
refactor(live): relocate shared agent modules from code_agent into ag…
alexkroman-assembly Jun 22, 2026
840ddf5
feat(models): default streaming, live, and batch to universal-3-5-pro
alexkroman-assembly Jun 22, 2026
5857c88
feat(code): remove the assembly code command and its code_agent slice
alexkroman-assembly Jun 22, 2026
e2d4e48
docs: switch calculate to simpleeval with model-facing usage in tool …
alexkroman-assembly Jun 22, 2026
40c6b75
docs: add three offline-library tools (date_math, check_holiday, sun_…
alexkroman-assembly Jun 22, 2026
2188049
chore(help): drop the Coding Agent panel after removing assembly code
alexkroman-assembly Jun 22, 2026
e585f08
chore(deps): drop langgraph-checkpoint-sqlite + clean code_agent lint…
alexkroman-assembly Jun 22, 2026
bb1d3a8
chore(deptry): exclude .claude worktrees from dependency scan
alexkroman-assembly Jun 22, 2026
7ea9e48
docs: drop assembly code from README and architecture guide
alexkroman-assembly Jun 22, 2026
a7518a7
fix(pyright): exempt model.py from strict, annotate summarize list, d…
alexkroman-assembly Jun 22, 2026
a513096
docs: implementation plan for eight keyless live tools
alexkroman-assembly Jun 22, 2026
7683295
chore(pyright): update tests-pyright ignore list for renamed/removed …
alexkroman-assembly Jun 22, 2026
d458a7a
fix(live): narrow speech_model Optional in test + vulture-ignore Comp…
alexkroman-assembly Jun 22, 2026
ef47d19
docs: broaden calculate with curated math/statistics functions
alexkroman-assembly Jun 22, 2026
8793b51
chore(sandbox): let safe-chain-wrapped uv run inside the sandbox
alexkroman-assembly Jun 22, 2026
685fa0a
refactor(live): extract reply-runtime primitives from engine.py into …
alexkroman-assembly Jun 22, 2026
d4467d2
test(live): split test_agent_cascade_engine.py reply tests into test_…
alexkroman-assembly Jun 22, 2026
3c1e1ad
test(live): split run_agent_cascade wiring tests into test_live_tui_w…
alexkroman-assembly Jun 22, 2026
9ae478f
docs: design for sandboxed execute in assembly live
alexkroman-assembly Jun 22, 2026
dce478b
docs: adopt sandbox-runtime read posture in execute design
alexkroman-assembly Jun 22, 2026
04a2a00
chore(sandbox): allow uvx tool dirs (~/.local/share|state/uv) for in-…
alexkroman-assembly Jun 22, 2026
d089a5f
feat(live): speak a filler during tool calls and discard interim plan…
alexkroman-assembly Jun 22, 2026
67aca91
test(smoke): drop the removed `assembly code` command from the workfl…
alexkroman-assembly Jun 22, 2026
35bc0fe
Merge remote-tracking branch 'origin/main' into live-tool-call-ux
alexkroman-assembly Jun 22, 2026
47ba899
docs: cwd-scoped cowork, y/n-gated execute, durable memory
alexkroman-assembly Jun 22, 2026
8511eff
docs: wire up subagents (task tool) in execute design
alexkroman-assembly Jun 22, 2026
66478d3
refactor(live): extract _io.py + split filler tests to stay under the…
alexkroman-assembly Jun 23, 2026
26b0425
docs: add spoken approval + consistency pass on execute design
alexkroman-assembly Jun 23, 2026
d8652d9
docs(live): M1 implementation plan for sandboxed execute + memory
alexkroman-assembly Jun 23, 2026
3a85723
fix(live): unbreak branch gate baseline (re-exports, coverage, mutati…
alexkroman-assembly Jun 23, 2026
8dfb982
feat(live): seatbelt sandbox profile renderer + denylist constants
alexkroman-assembly Jun 23, 2026
5e63753
feat(live): bwrap argv builder + renderer parity test
alexkroman-assembly Jun 23, 2026
2c7a17d
feat(live): sandbox capability probe + default subprocess runner
alexkroman-assembly Jun 23, 2026
247add9
feat(live): SandboxedShellBackend.execute confines to cwd or refuses
alexkroman-assembly Jun 23, 2026
686556f
feat(live): sandbox-capable backend, gated execute, durable memory
alexkroman-assembly Jun 23, 2026
18969e2
feat(live): document sandboxed execute + memory; --files help + mutat…
alexkroman-assembly Jun 23, 2026
80388b2
fix(live): order tool affordances above the answer; graceful tool-cal…
alexkroman-assembly Jun 23, 2026
b76ed28
Add honesty and file-safety guidance to live agent prompt
alexkroman-assembly Jun 23, 2026
bc4fb7d
fix(live): reset the reply widget per turn so the answer isn't glued …
alexkroman-assembly Jun 23, 2026
d7e9d77
feat(live): general-purpose subagent spec for the task tool (M2)
alexkroman-assembly Jun 23, 2026
fa666a4
feat(live): wire the gated general-purpose subagent + task label (M2)
alexkroman-assembly Jun 23, 2026
3426fa2
test(live): lock subagent write surfacing through the parent gate (M2)
alexkroman-assembly Jun 23, 2026
c570081
feat(live): advertise delegation under --files; document the task sub…
alexkroman-assembly Jun 23, 2026
c68703c
feat(live): spoken-approval grammar (fail-safe to reject) (M3)
alexkroman-assembly Jun 23, 2026
34833b2
docs(live): M3 spoken-approval plan (grammar done; engine race designed)
alexkroman-assembly Jun 23, 2026
ea710ec
feat(live): voice-or-keyboard approval resolution core (M3)
alexkroman-assembly Jun 23, 2026
a3cca03
feat(live): hands-free spoken approval for --files (M3)
alexkroman-assembly Jun 23, 2026
8e2ba5c
docs(live): document hands-free spoken approval for --files (M3)
alexkroman-assembly Jun 23, 2026
3aeb45e
feat(live): harden cascade prompt with borrowed openclaw techniques
alexkroman-assembly Jun 23, 2026
86e7981
feat(live): return comprehensive weather data from the Open-Meteo tool
alexkroman-assembly Jun 23, 2026
e8c9fa9
test(live): split brain tests under the 500-line gate; green the diff…
claude Jun 23, 2026
13 changes: 13 additions & 0 deletions .claude/settings.json
@@ -73,6 +73,19 @@
"Read(**/*.p12)"
]
},
+"sandbox": {
+"network": {
+"allowLocalBinding": true,
+"allowMachLookup": ["com.apple.SystemConfiguration.configd"]
+},
+"filesystem": {
+"allowWrite": [
+"~/.cache/uv",
+"~/.local/share/uv",
+"~/.local/state/uv"
+]
+}
+},
"hooks": {
"SessionStart": [
{
3 changes: 1 addition & 2 deletions .importlinter
@@ -13,7 +13,7 @@ type = layers
; assembles the command layer — main, command_registry, help_panels, options —
; stays at the package root, above `commands`, and is intentionally unlisted
; (it legitimately imports the command modules to discover/register them).
-; Feature slices (agent, tts, streaming, code_agent, code_gen, init, auth, onboard) are
+; Feature slices (agent, tts, streaming, agent_cascade, code_gen, init, auth, onboard) are
; likewise unlisted vertical slices governed by contract 2.
layers =
commands
@@ -34,7 +34,6 @@ source_modules =
aai_cli.agent
aai_cli.agent_cascade
aai_cli.auth
-aai_cli.code_agent
aai_cli.code_gen
aai_cli.init
aai_cli.onboard
1 change: 0 additions & 1 deletion README.md
@@ -51,7 +51,6 @@ That's it. Run `assembly onboard` for a guided tour, or see [Installation](#-ins
| `assembly live` | Talk live to a tool-using voice agent, wired client-side from Streaming STT + a deepagents brain on the LLM Gateway + streaming TTS — it can web-search, fetch URLs, and read the docs mid-conversation, like the `agent-cascade` starter (sandbox-only) |
| `assembly speak` | Synthesize text to speech over the streaming-TTS WebSocket (sandbox-only) |
| `assembly llm` | Prompt the LLM Gateway over a transcript, files, stdin, or a live stream |
-| `assembly code` | Terminal coding agent (deepagents SDK) backed only by the LLM Gateway — reads/writes/edits files, runs shell, searches the docs MCP, and can invoke the `assembly` CLI itself; mutating actions ask for approval. Defaults to voice in a terminal (speak your request, replies read back via streaming TTS in the sandbox); pass `--no-voice` for the keyboard TUI |
| `assembly clip` | Cut audio/video with ffmpeg by diarized speaker, text match, LLM pick, or time range (`--video` keeps the picture for URL sources) — clip boundaries snap into nearby silence |
| `assembly dub` | Re-voice an audio/video file or URL in another language: transcription, LLM translation, per-speaker TTS, ffmpeg track-swap (sandbox-only) |
| `assembly caption` | Burn always-visible captions into a video: transcribe (or reuse a transcript), fetch SRT, ffmpeg burns it in — audio untouched |
15 changes: 15 additions & 0 deletions REFERENCE.md
@@ -159,3 +159,18 @@ Each server is launched independently and best-effort: one that won't start (a
missing `npx`/`uvx`, an offline host) drops only its own tools, so a single broken
tool never sinks the session. MCP tools are a live-run feature and are not
reflected in `--show-code` output.
+
+`--files` lets the agent read, write, and run code in the directory you launch
+it from (off by default). Reads run immediately; a write, edit, or command run pauses
+the turn for confirmation in the voice TUI — press `y`/`n` (`a` approves the rest of the
+session) or just say it ("approve" / "run it" / "go ahead"; anything unclear is treated as
+a no). Destructive commands (e.g. `rm -rf`, `sudo`) ignore the spoken answer and require a
+keypress. Commands run OS-sandboxed in that directory — confined to it, with no network
+access — on macOS (`sandbox-exec`) and Linux (`bwrap`); on any other platform, or if the
+sandbox tool is missing, running code is refused rather than run unconfined. Access is
+rooted at the launch directory — the agent can't escape it. It can also delegate a
+focused subtask to a helper (a sandboxed general-purpose subagent), whose own writes and
+runs need the same confirmation. The agent also keeps a per-project memory file
+(`./.deepagents/AGENTS.md`) so it resumes knowing what it was working on. A non-interactive
+run (a file/URL source, `--json`, `-o text`, or a non-TTY) has no way to confirm a write or
+run, so those are declined there while reads still work.
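The spoken half of that confirmation rule (an unambiguous yes approves, anything unclear is a no, and a destructive command ignores speech and waits for a key) fits in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's `spoken_approval.spoken_decision`: the function names, the phrase sets, and the three-way `True`/`False`/`None` result are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical grammar: the three phrases quoted in the --files docs plus a few obvious
# synonyms. The project's real accepted set may differ.
AFFIRMATIVE_PHRASES = frozenset(
    {"approve", "approved", "run it", "go ahead", "yes", "yes please", "do it"}
)
# Any of these words anywhere in the reply makes it ambiguous, so it is rejected.
NEGATING_WORDS = frozenset({"no", "not", "don't", "dont", "stop", "cancel", "wait", "never"})


def normalize(transcript: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace: "Go ahead!" -> "go ahead"."""
    return " ".join(re.findall(r"[a-z']+", transcript.lower()))


def spoken_verdict(transcript: str, *, destructive: bool) -> bool | None:
    """Map one final transcript to approve (True), reject (False), or undecided (None).

    None means speech cannot settle it: a destructive command waits for a keypress.
    """
    if destructive:
        return None
    phrase = normalize(transcript)
    if any(word in NEGATING_WORDS for word in phrase.split()):
        return False
    # Fail-safe: only an exact match on a known affirmative approves.
    return phrase in AFFIRMATIVE_PHRASES
```

Exact matching is what makes the sketch fail-safe: a hedged reply such as "sure, I guess" falls through to a rejection, and a rejected write can simply be requested again, whereas an unwanted one cannot be taken back.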
5 changes: 2 additions & 3 deletions aai_cli/AGENTS.md
@@ -44,7 +44,7 @@ contract:
`help_panels`, `options`. They assemble/define the command layer (and
`command_registry` imports the command modules to discover them), so they live
*above* `commands` and stay at the root.
-- **Feature slices** — `agent/`, `tts/`, `streaming/`, `code_agent/`, `code_gen/`,
+- **Feature slices** — `agent/`, `tts/`, `streaming/`, `code_gen/`,
`init/`, `auth/`, `onboard/`. These are cohesive vertical slices that internally mix
protocol + rendering, so they aren't a single horizontal layer; contract 2
forbids them from importing `commands`.
@@ -151,9 +151,8 @@ heavily-reworked commands with long bodies; small commands keep the inline
- **`streaming/`** + `client.stream_audio` — v3 realtime API. Event callbacks run on the SDK reader thread and guard against `BrokenPipeError` (`stdio.silence_stdout()`) so a closed pipe never dumps a thread traceback.
- **`core/sync_stt.py`** + **`core/signals.py`** + `commands/dictate/` — `assembly dictate`: headless dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). It needs no terminal: recording starts immediately and `dictate_exec._record` polls `signals.stop_on_terminate` between ~100 ms mic chunks for a SIGTERM, which finishes the utterance (clean exit 0) — so a hotkey tool like Hammerspoon can launch it as a background task and `kill -TERM`/`task:terminate()` to transcribe. SIGINT (Ctrl-C) still cancels (exit 130). Both boundaries (the stop latch, mic, HTTP) are injectable, so the suite never needs a real signal or microphone (`tests/test_dictate_exec.py` scripts the SIGTERM latch). Contrast `signals.terminate_as_interrupt` (used by `stream`/`agent`/`speak`), which routes SIGTERM into the *cancel* path instead.
- **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
-- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, per-sentence TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`); under `-v` (`debuglog.active()`) `brain._run_graph` *streams* that graph instead of `invoke`-ing it and logs each tool call/result/interim line as it lands (reusing `code_agent.events.message_events`), so a spoken turn that stalls mid-tool is debuggable — plain `invoke` runs the whole loop internally and `-v` would otherwise show only the httpx lines. **Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. It shares the `assembly code` TUI's chrome (`code_agent.banner` wordmark, `code_agent.messages` widgets, `code_agent.tui_status.voicebar_markup`/`VOICE_FRAMES`); the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output.
+- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): the engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. **Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. **`--files`** (off by default) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. `write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). Headless runs auto-deny (`_exec._deny_writes`). `--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`, and binds one gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool; spec in `agent_cascade/subagents.py`, omitting `model`/`tools` so it inherits both) for delegating a focused subtask. The subagent's own `interrupt_on` mirrors `_WRITE_TOOLS`, and a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated.
- **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`). The single-voice default-playback path **streams**: `synthesize`'s `on_audio(chunk, sample_rate)` callback is wired to `audio.PcmPlayer.feed`, so speech starts on the first Audio frame (it opens the device lazily, since the rate is only known at Begin) instead of after the whole text — the win for a long `--url` page. `--out` (needs the full buffer) and the multi-voice dialogue path (`synthesize_dialogue` → `_output_audio` → buffered `play_pcm`) stay buffered; `synthesize` still returns the complete PCM for the summary regardless.
-- **`code_agent/`** + `commands/code/` — `assembly code`: a terminal coding agent (a bespoke port of langchain-ai/deepagents' `code` agent) that talks **only** to the LLM Gateway. `model.py` pins the model to `ChatOpenAI` against `llm_gateway_base`; `agent.py` builds the deepagents graph over a cwd-scoped `LocalShellBackend` (filesystem + shell tools), plus extra tools: the custom `assembly` CLI tool (`cli_tool.py`, runs `python -m aai_cli` with the key via child env, never argv), a URL `fetch_url` tool (`fetch_tool.py`), Firecrawl web search when `FIRECRAWL_API_KEY` is set (`firecrawl_search.py`, shared with the live voice agent), an `ask_user` tool routed through an `AskBridge` to the front-end (`ask_tool.py`), and best-effort docs MCP tools (`docs_mcp.py`). Middleware adds installed skills (`skills.py`) and long-term memory (`memory.py`), each over its own dedicated backend. Sessions persist via a SQLite checkpointer (`store.py`) keyed by `--session`, so conversations resume. Approval gates the mutating tools (write/edit/execute/`assembly`/`fetch_url`); the general-purpose `task` subagent comes from deepagents by default. `session.py` drives the graph turn-by-turn (interrupt/resume = human approval), emitting framework-agnostic `events.py` to either the Textual TUI (`tui.py`, modeled on deepagents-code: transcript + input + approval/ask modals + clipboard copy) or the Rich fallback (`render.py`). The whole orchestration is tested by driving the **real** graph with a fake `BaseChatModel` (`tests/test_code_agent.py`), so no network/TTY is needed. **Voice is the default front-end in an interactive TTY** (`voice.py` + `_exec._run_voice`): `VoiceSession.listen` captures one spoken turn over Streaming STT (gating the mic shut the instant a turn finalizes) and `VoiceSession.speak` reads each assistant reply back over streaming TTS. It runs the **Rich REPL** loop (not the keyboard TUI) with a voice `read_line` + a reply-speaking sink. Readback needs streaming TTS, so it's **sandbox-only** (`tts.session.is_available`); in production the mic input still works and replies stay on screen. A mic-less box degrades to typed input on the first `AUDIO_ERROR_TYPES` `CLIError`; `--no-voice` selects the TUI, and a non-TTY (pipe/CI) the headless loop. Both legs (STT/TTS) are injected like the cascade's, so `tests/test_code_voice.py` drives it with fakes — no mic/speaker/socket.
- **`code_gen/`** — backs `--show-code` on `transcribe`/`stream`/`agent`: builds a ready-to-run Python SDK script from exactly the flags passed (no API key needed; generated code reads `ASSEMBLYAI_API_KEY`).
- **`auth/`** — browser-assisted `assembly login` via AMS + **Stytch B2B OAuth discovery** (`discovery.py`, `flow.py`, `loopback.py`, `ams.py`). Not Stytch Connected Apps.
- **`init/`** — scaffolds a self-contained FastAPI + HTML starter (`audio-transcription`/`live-captions`/`voice-agent` templates), optionally installs deps and opens the browser; writes the key to a git-ignored `.env`.
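The clause-level streaming described in the `agent_cascade/` entry (buffer `SpeechDelta` text, flush complete clauses with `text.pop_clauses`, and gate soft-separator clauses on `engine._MIN_CLAUSE_CHARS`) can be illustrated with a small incremental splitter. This is a sketch under its own assumptions: `split_ready_clauses` and `stream_clauses` are hypothetical names, and the `(clauses, remainder)` return shape, the separator sets, and the 20-character default are illustrative guesses rather than the project's actual values.

```python
from collections.abc import Iterable, Iterator

HARD_SEPARATORS = ".!?"  # always end a clause
SOFT_SEPARATORS = ",;:"  # end a clause only once it is long enough to speak on its own


def split_ready_clauses(buffer: str, min_chars: int = 20) -> tuple[list[str], str]:
    """Split complete clauses off the front of ``buffer``; return them and the remainder.

    A separator only counts once whitespace follows it, so a trailing "3." that may
    still become "3.5" on the next token stays in the buffer.
    """
    clauses: list[str] = []
    start = 0
    for i, ch in enumerate(buffer[:-1]):
        if ch not in HARD_SEPARATORS + SOFT_SEPARATORS or not buffer[i + 1].isspace():
            continue
        candidate = buffer[start : i + 1].strip()
        if not candidate:
            continue
        # Hard separators always flush; soft ones only past the minimum length.
        if ch in HARD_SEPARATORS or len(candidate) >= min_chars:
            clauses.append(candidate)
            start = i + 1
    return clauses, buffer[start:]


def stream_clauses(deltas: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Feed streamed text deltas through the splitter; flush the tail when the stream ends."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        ready, buffer = split_ready_clauses(buffer, min_chars)
        yield from ready
    tail = buffer.strip()
    if tail:
        yield tail
```

Holding a separator back until whitespace follows it matters for streamed tokens: the model may emit "3." and then "5", and speaking "3." as its own clause would be wrong. The minimum length keeps a short lead-in such as "Sure," from becoming its own TTS request.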
33 changes: 33 additions & 0 deletions aai_cli/agent/audio.py
@@ -102,6 +102,11 @@ def __init__(
# access goes through `_lock`. `_out_state` (the target->device ratecv state)
# is touched ONLY by feed(), never the callback, so it needs no lock.
self._in: queue.Queue[bytes | None] = queue.Queue()
+# The mic gate: set = listening (real audio), clear = muted (silence to STT). Flipped
+# from the UI thread (start/stop listening), read on the capture thread, so it's an
+# Event rather than a bare bool. Starts open — a session listens as soon as it connects.
+self._listening = threading.Event()
+self._listening.set()
# How long capture_frames() waits for a chunk before checking whether the
# device stream silently died (e.g. unplugged); injectable for fast tests.
self._poll_timeout = poll_timeout
@@ -179,12 +184,40 @@ def capture_frames(self) -> Iterator[bytes]:
continue
if chunk is None:
return
+if not self._listening.is_set():
+# Muted: feed silence of the same length so the recognizer keeps receiving
+# audio (the socket stays alive) but hears nothing, instead of stalling the
+# stream. Resampling zeros still yields zeros, so gate before the resample.
+chunk = bytes(len(chunk))
if self._device_rate != self._target:
chunk, state = resample_pcm16(
chunk, state, src_rate=self._device_rate, dst_rate=self._target
)
yield chunk

+def set_listening(self, *, on: bool) -> None:
+"""Open or mute the mic in place, without tearing down the stream.
+
+Muting keeps the full-duplex stream and the live STT/TTS session alive — captured
+frames are zeroed to silence (see :meth:`capture_frames`) — so toggling back on
+resumes listening instantly, with no socket reconnect.
+"""
+if on:
+self._listening.set()
+else:
+self._listening.clear()
+
+def toggle_listening(self) -> bool:
+"""Flip the mic between listening and muted; return the resulting listening state."""
+on = not self._listening.is_set()
+self.set_listening(on=on)
+return on
+
+@property
+def listening(self) -> bool:
+"""Whether the mic is feeding real audio to STT (vs muted silence)."""
+return self._listening.is_set()
+
def close(self) -> None:
self._in.put(None) # end capture_frames()
if self._stream is not None:
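The mute-by-silence gate added to `audio.py` does not depend on a sound device, so the technique can be exercised on its own: an `Event` flipped from one thread and read on another, with muted chunks replaced by zero bytes of the same length so the STT stream keeps receiving frames at its usual size. `MicGate` and `gate` below are hypothetical names for a standalone sketch, not part of `DuplexAudio`, and the resampling step is omitted.

```python
import threading
from collections.abc import Iterable, Iterator


class MicGate:
    """Hypothetical standalone version of the listening gate (not DuplexAudio itself)."""

    def __init__(self) -> None:
        # Set = listening (real audio), clear = muted. An Event rather than a bare bool
        # because the UI thread flips it while the capture thread reads it.
        self._listening = threading.Event()
        self._listening.set()  # start open, like a freshly connected session

    def set_listening(self, *, on: bool) -> None:
        if on:
            self._listening.set()
        else:
            self._listening.clear()

    def toggle_listening(self) -> bool:
        """Flip the gate and return the resulting listening state."""
        on = not self._listening.is_set()
        self.set_listening(on=on)
        return on

    @property
    def listening(self) -> bool:
        return self._listening.is_set()

    def gate(self, frames: Iterable[bytes]) -> Iterator[bytes]:
        """Pass frames through, replacing each one with same-length silence while muted."""
        for chunk in frames:
            if not self._listening.is_set():
                chunk = bytes(len(chunk))  # zero-filled PCM16 is silence
            yield chunk


# Usage: the generator is lazy, so a toggle affects only frames pulled after it.
mic = MicGate()
frames = mic.gate([b"\x01\x02", b"\x03\x04"])
first = next(frames)  # b"\x01\x02" (listening)
mic.toggle_listening()  # now muted
second = next(frames)  # b"\x00\x00" (same length, silent)
```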