Add voice-controlled computer use: assembly control command#271
Conversation
A hands-free, voice-in/voice-out terminal agent that turns spoken instructions into real macOS UI actions — the "voice control plane" a browser/web service can't be, because it drives the actual desktop. Architecture (a `control/` feature slice with every external leg behind an injected seam, so the loop is hermetically testable with no mic, network, subprocess, or macOS): - actions/tools: the action vocabulary + its OpenAI function-calling schema. - engine: the pure observe/act loop (transcript -> LLM tool calls -> execute). - bridge: adapts the LLM Gateway into the engine's Responder seam. - listen: mic Streaming STT -> finalized utterances. - helper: spawns/talks JSON to a bundled Swift helper (CGEvent + the Accessibility API + NSWorkspace) — the "hands". - macos_ui_control.swift: the native helper (Codable JSON-lines protocol). `--dry-run` refuses every UI-mutating action (observe-only). macOS-only; fails fast elsewhere. Registered additively via SPEC; full gate green (100% patch coverage, mutation, types, lint, architecture contracts). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PiUeSiTo5aV99PPfEQkuNc
| if self._json: | ||
| self._event("user", text=text) | ||
| else: | ||
| output.error_console.print(output.muted(f"you: {text}")) |
There was a problem hiding this comment.
ControlRenderer.on_user prints user speech verbatim to stderr; avoid logging unsanitized user-controlled text (mask, truncate, or omit sensitive data).
Details
✨ AI Reasoning
The renderer's on_user implementation prints the finalized spoken instruction directly to stderr (error_console.print) in human mode. This logs unsanitized user-controlled speech (potential PII or CR/LF log injection) with no masking or sanitization.
🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| hands = deps.helper() | ||
| try: | ||
| api_key = state.resolve_api_key() | ||
| respond = deps.responder(api_key, opts) |
| try: | ||
| api_key = state.resolve_api_key() | ||
| respond = deps.responder(api_key, opts) | ||
| transcripts = deps.transcripts(api_key, opts) |
Implements a new
assembly controlcommand that enables hands-free macOS UI automation through voice instructions. Users speak commands, which are transcribed via Streaming STT and executed by an LLM agent that decides which UI actions to take (typing, key chords, clicking elements, launching apps) through a native Swift helper.Key changes
New
aai_cli/control/module — The core agent loop and supporting infrastructure:engine.py: Pure observe/act loop with injected responder/executor/renderer seams for testabilityactions.py: Action vocabulary (type_text, key_combo, click, launch_app, focus_app, get_ui_tree, screenshot)bridge.py: Adapts LLM Gateway (OpenAI-compatible) into the engine's responder interfacetools.py: Exposes actions as OpenAI function-calling tool definitionshelper.py: Manages the native Swift helper process (compile-once, run-long-lived, JSON-lines protocol)listen.py: Converts mic Streaming STT into an utterance stream (queue + worker thread)render.py: Surfaces loop progress (human stderr narration or NDJSON events)prompt.py: System prompt briefing the model on the voice-control loopNative macOS helper —
aai_cli/control/macos_ui_control.swift:Command wiring —
aai_cli/commands/control/:__init__.py: Typer command with options (device, sample_rate, model, max_tokens, max_steps, dry_run, json)_exec.py: Run logic with injectable dependencies (transcripts, responder, helper) for testabilityComprehensive test coverage:
tests/test_control.py: Pure loop, actions, engine, bridge, rendering (all external legs faked)tests/test_control_exec.py: Helper transport, build, mic listener, command wiring (macOS paths mocked)tests/_control_helpers.py: Shared fakes (RecordingRenderer, FakeProc, scripted responder, etc.)Integration:
Notable implementation details
--dry-runmode refuses mutating actions but runs observe actions so the model can still "see"https://claude.ai/code/session_01PiUeSiTo5aV99PPfEQkuNc