diff --git a/REFERENCE.md b/REFERENCE.md
index f052b6ce..df511524 100644
--- a/REFERENCE.md
+++ b/REFERENCE.md
@@ -160,6 +160,14 @@ missing `npx`/`uvx`, an offline host) drops only its own tools, so a single brok
 tool never sinks the session. MCP tools are a live-run feature and are not reflected in
 `--show-code` output.
 
+If the directory you launch from has an `AGENTS.md` or `CLAUDE.md`, `assembly live`
+reads it into the agent's context — the same convention coding agents follow — so
+spoken answers are grounded in the project at hand. `AGENTS.md` takes precedence
+(and identical content, e.g. a `CLAUDE.md` symlinked to it, is included once); an
+oversized file is truncated so it can't crowd out the conversation. This is
+independent of `--files` (it happens even under `--no-files`, when the agent can't
+touch the filesystem) and is not reflected in `--show-code` output.
+
 The agent reads, writes, and runs code in the directory you launch it from (on by default; pass
 `--no-files` to disable). Reads run immediately; a write, edit, or command run pauses the turn for
 confirmation in the voice TUI — press `y`/`n` (`a` approves the rest of the
diff --git a/aai_cli/AGENTS.md b/aai_cli/AGENTS.md
index 7dd31818..926605d1 100644
--- a/aai_cli/AGENTS.md
+++ b/aai_cli/AGENTS.md
@@ -153,7 +153,7 @@ heavily-reworked commands with long bodies; small commands keep the inline
 - **`streaming/`** + `client.stream_audio` — v3 realtime API. Event callbacks run on the SDK reader thread and guard against `BrokenPipeError` (`stdio.silence_stdout()`) so a closed pipe never dumps a thread traceback.
 - **`core/sync_stt.py`** + **`core/signals.py`** + `commands/dictate/` — `assembly dictate`: headless dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV).
It needs no terminal: recording starts immediately and `dictate_exec._record` polls `signals.stop_on_terminate` between ~100 ms mic chunks for a SIGTERM, which finishes the utterance (clean exit 0) — so a hotkey tool like Hammerspoon can launch it as a background task and `kill -TERM`/`task:terminate()` to transcribe. SIGINT (Ctrl-C) still cancels (exit 130). Both boundaries (the stop latch, mic, HTTP) are injectable, so the suite never needs a real signal or microphone (`tests/test_dictate_exec.py` scripts the SIGTERM latch). Contrast `signals.terminate_as_interrupt` (used by `stream`/`agent`/`speak`), which routes SIGTERM into the *cancel* path instead.
 - **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
-- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in — is unit-tested against fakes with no sockets/mic/speaker.
The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): **context-window management is the brain's job, not the engine's** — `create_deep_agent` wires deepagents' own `SummarizationMiddleware` into the stack (summarize the oldest turns, offload the evicted history to a file), so the engine feeds the *full* untrimmed running history each turn and lets the graph compact it; the old client-side `text.trim_history`/`config.max_history` sliding window is gone from this path (`max_history` now only drives the hand-rolled `--show-code`/`assembly init` cascade, which doesn't use deepagents). The engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. **Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. 
It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. **`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. `write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). 
The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). Headless runs auto-deny (`_exec._deny_writes`). `--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`, and binds one gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool; spec in `agent_cascade/subagents.py`, omitting `model`/`tools` so it inherits both) for delegating a focused subtask. The subagent's own `interrupt_on` mirrors `_WRITE_TOOLS`, and a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated.
+- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`).
Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): **context-window management is the brain's job, not the engine's** — `create_deep_agent` wires deepagents' own `SummarizationMiddleware` into the stack (summarize the oldest turns, offload the evicted history to a file), so the engine feeds the *full* untrimmed running history each turn and lets the graph compact it; the old client-side `text.trim_history`/`config.max_history` sliding window is gone from this path (`max_history` now only drives the hand-rolled `--show-code`/`assembly init` cascade, which doesn't use deepagents). The engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. 
**Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. **`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. 
`write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). **Project grounding (independent of `--files`):** `_exec.run_agent_cascade` reads the launch directory's `AGENTS.md`/`CLAUDE.md` via `agent_cascade/project_context.load_project_context()` into `CascadeConfig.project_context`, which `brain.build_graph` threads into `prompt.build_system_prompt(..., project_context=…)` (appended as project background after the persona/tool guidance). `AGENTS.md` wins precedence, identical content (a symlinked `CLAUDE.md`) is de-duplicated, and the total is capped at `project_context.MAX_CONTEXT_CHARS`. It's read at the command boundary (not in `build_graph`) so the brain stays hermetic, and the `--show-code` path builds its own config without it. Headless runs auto-deny (`_exec._deny_writes`). 
`--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`, and binds one gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool; spec in `agent_cascade/subagents.py`, omitting `model`/`tools` so it inherits both) for delegating a focused subtask. The subagent's own `interrupt_on` mirrors `_WRITE_TOOLS`, and a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated.
 - **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`). The single-voice default-playback path **streams**: `synthesize`'s `on_audio(chunk, sample_rate)` callback is wired to `audio.PcmPlayer.feed`, so speech starts on the first Audio frame (it opens the device lazily, since the rate is only known at Begin) instead of after the whole text — the win for a long `--url` page. `--out` (needs the full buffer) and the multi-voice dialogue path (`synthesize_dialogue` → `_output_audio` → buffered `play_pcm`) stay buffered; `synthesize` still returns the complete PCM for the summary regardless.
 - **`code_gen/`** — backs `--show-code` on `transcribe`/`stream`/`agent`: builds a ready-to-run Python SDK script from exactly the flags passed (no API key needed; generated code reads `ASSEMBLYAI_API_KEY`).
 - **`auth/`** — browser-assisted `assembly login` via AMS + **Stytch B2B OAuth discovery** (`discovery.py`, `flow.py`, `loopback.py`, `ams.py`). Not Stytch Connected Apps.
diff --git a/aai_cli/agent_cascade/brain.py b/aai_cli/agent_cascade/brain.py
index da55e18a..b5ebc246 100644
--- a/aai_cli/agent_cascade/brain.py
+++ b/aai_cli/agent_cascade/brain.py
@@ -51,15 +51,12 @@ def invoke(
         """Run one step of the graph, returning the updated state (incl. messages)."""
 
 
-# Verbose (`-v`) flow logging for the agent's tool loop. `invoke` runs the whole loop
-# internally, so without this `-v` only shows the httpx request lines and never which
-# tools the agent reached for or what they returned — exactly what you need to see when
-# a spoken turn stalls mid-tool. Logged at INFO so plain `-v` surfaces it.
+# Verbose (`-v`) flow logging for the agent's tool loop: `invoke` runs the whole loop internally,
+# so without this `-v` never shows which tools the agent reached for when a spoken turn stalls.
 _FLOW_LOG = logging.getLogger("aai_cli.agent_cascade.brain")
 
-# Tool outputs (a fetched page, a search payload) can be huge; cap what we log per result
-# so a single tool call doesn't bury the rest of the flow in stderr. The exact cap is an
-# arbitrary tuning knob — a +-1 shift is behaviorally equivalent, so no test can kill it.
+# Tool outputs (a fetched page, a search payload) can be huge; cap what we log per result so a
+# single tool call doesn't bury the flow. The exact cap is an arbitrary knob (no test can kill it).
 _RESULT_LOG_CAP = 500  # pragma: no mutate
 
 # Human, speakable labels for the tool affordance the live UI shows while a tool runs (so a
@@ -89,8 +86,7 @@ def _tool_label(name: str) -> str:
 # Spoken filler the agent says aloud when it pauses for a tool, so a hands-free turn fills the
 # silent tool round-trip with *why* it paused instead of dead air (the audible counterpart to the
 # visual `_TOOL_LABELS` affordance). Each tool gets a few short, speakable variants the engine
-# rotates across turns; unknown/MCP tools fall back to `_GENERIC_FILLERS`. Spoken-style only — no
-# markdown, no trailing detail — since they're synthesized straight to TTS ahead of the answer.
+# rotates across turns; unknown/MCP tools fall back to `_GENERIC_FILLERS` (spoken-style, no markdown).
 _GENERIC_FILLERS: tuple[str, ...] = ("One sec.", "Let me check.")
 
 _TOOL_FILLERS: dict[str, tuple[str, ...]] = {
@@ -281,7 +277,11 @@ def build_graph(
         model=model,
         tools=builtin + extra,
         system_prompt=build_system_prompt(
-            config.system_prompt, tools=builtin, extra_tools=extra, files=config.files
+            config.system_prompt,
+            tools=builtin,
+            extra_tools=extra,
+            files=config.files,
+            project_context=config.project_context,
         ),
         middleware=_build_middleware(config),
         **_graph_kwargs(config),
diff --git a/aai_cli/agent_cascade/config.py b/aai_cli/agent_cascade/config.py
index b06bb0ff..fd038d89 100644
--- a/aai_cli/agent_cascade/config.py
+++ b/aai_cli/agent_cascade/config.py
@@ -73,3 +73,6 @@ class CascadeConfig:
     # behavior unchanged (the default in-memory backend, no gating, nothing advertised); on
     # swaps to a real-cwd FilesystemBackend and gates writes behind human approval.
     files: bool = False
+    # The launch directory's AGENTS.md/CLAUDE.md, read into the system prompt so the agent
+    # answers grounded in the project it's run from (None when no instruction file is present).
+    project_context: str | None = None
diff --git a/aai_cli/agent_cascade/project_context.py b/aai_cli/agent_cascade/project_context.py
new file mode 100644
index 00000000..490959e4
--- /dev/null
+++ b/aai_cli/agent_cascade/project_context.py
@@ -0,0 +1,67 @@
+"""Read project-instruction files (``AGENTS.md``/``CLAUDE.md``) into the live agent's context.
+
+`assembly live` runs in the user's working directory, so — like a coding agent — it reads the
+project's instruction files into its system prompt when present, giving spoken answers grounded
+in the project it's launched from. ``AGENTS.md`` is the cross-agent standard and ``CLAUDE.md`` is
+frequently a symlink to it, so identical content is included once, and the total is capped so an
+oversized instructions file can't crowd the conversation out of the model's window.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+# The instruction files an agentic CLI reads into context, highest precedence first.
+CONTEXT_FILENAMES = ("AGENTS.md", "CLAUDE.md")
+
+# Cap the injected context: the spoken agent only needs the project's gist, and an unusually
+# large instructions file would otherwise crowd the live conversation out of the model's window.
+# A +-1 shift in the budget is behaviorally equivalent, so no test can kill a mutant on it.
+MAX_CONTEXT_CHARS = 16000  # pragma: no mutate
+
+# Appended when the content is truncated, so the model knows it's seeing only the head of the file.
+_TRUNCATION_MARKER = "\n\n[project context truncated]"
+
+
+def _read_instructions(path: Path) -> str | None:
+    """The stripped contents of one instruction file, or ``None`` if absent/unreadable/empty."""
+    try:
+        text = path.read_text(encoding="utf-8").strip()
+    except OSError:
+        return None
+    return text or None
+
+
+def _truncate(combined: str) -> str:
+    """Cap the combined context at :data:`MAX_CONTEXT_CHARS`, marking it when truncated.
+
+    The marker is counted against the budget (the slice leaves room for it), so the returned
+    string never exceeds :data:`MAX_CONTEXT_CHARS` — the cap is a true upper bound, not a target
+    the marker then overshoots.
+    """
+    if len(combined) > MAX_CONTEXT_CHARS:
+        return combined[: MAX_CONTEXT_CHARS - len(_TRUNCATION_MARKER)] + _TRUNCATION_MARKER
+    return combined
+
+
+def load_project_context(directory: Path | None = None) -> str | None:
+    """Read the project-instruction files in *directory* into one de-duplicated string.
+
+    Looks for each name in :data:`CONTEXT_FILENAMES` under *directory* (the current working
+    directory by default), returning their stripped contents joined under a per-file heading —
+    or ``None`` when none are present, readable, or non-empty. Identical files (``CLAUDE.md`` is
+    commonly a symlink to ``AGENTS.md``) are included once, and the combined text is truncated to
+    :data:`MAX_CONTEXT_CHARS` so a huge file can't crowd out the live conversation.
+    """
+    base = Path.cwd() if directory is None else directory
+    sections: list[str] = []
+    seen: set[str] = set()
+    for name in CONTEXT_FILENAMES:
+        text = _read_instructions(base / name)
+        if text is None or text in seen:
+            continue
+        seen.add(text)
+        sections.append(f"# {name}\n\n{text}")
+    if not sections:
+        return None
+    return _truncate("\n\n".join(sections))
diff --git a/aai_cli/agent_cascade/prompt.py b/aai_cli/agent_cascade/prompt.py
index 27c94e80..af1f7190 100644
--- a/aai_cli/agent_cascade/prompt.py
+++ b/aai_cli/agent_cascade/prompt.py
@@ -69,6 +69,20 @@
     "replacing the whole file unless asked."
 )
 
+# Introduces the launch directory's AGENTS.md/CLAUDE.md when one is present, so the model treats
+# it as project background to ground its answers rather than as another instruction to recite.
+_PROJECT_CONTEXT_INTRO = (
+    "The following is background on the project in your working directory, taken from its "
+    "AGENTS.md/CLAUDE.md. Use it to ground your answers, but keep your reply short and spoken."
+)
+
+
+def _append_project_context(prompt: str, project_context: str | None) -> str:
+    """Append the launch directory's instruction files to the prompt as project background."""
+    if not project_context:
+        return prompt
+    return f"{prompt}\n\n{_PROJECT_CONTEXT_INTRO}\n\n{project_context}"
+
 
 def _join_clause(parts: list[str]) -> str:
     """Join capability phrases into a readable clause: ``a``, ``a and b``, ``a, b, and c``."""
@@ -121,6 +135,7 @@ def build_system_prompt(
     tools: Sequence[BaseTool],
     extra_tools: Sequence[BaseTool] = (),
     files: bool = False,
+    project_context: str | None = None,
 ) -> str:
     """The live agent's system prompt: the user's persona plus tool guidance.
 
@@ -134,6 +149,8 @@ def build_system_prompt(
     its own knowledge. Whenever tools are bound the guidance also tells the model to report tool
     outcomes honestly (never narrate a success the tool didn't return), and the ``--files`` path
     adds a warning to confirm before irreversible writes or code execution.
+    ``project_context`` (the launch directory's AGENTS.md/CLAUDE.md) is appended as project
+    background when present, so the agent's answers are grounded in the project it's run from.
     """
     capabilities = _tool_capabilities(tools)
     extra = _extra_capability(extra_tools)
@@ -142,7 +159,9 @@ def build_system_prompt(
     if files:
         capabilities.append(_FILE_CAPABILITY)
     if not capabilities:
-        return f"{persona}\n\n{_PERSONA_LATCH} {_NO_TOOLS_GUIDANCE}"
+        return _append_project_context(
+            f"{persona}\n\n{_PERSONA_LATCH} {_NO_TOOLS_GUIDANCE}", project_context
+        )
     guidance = (
         f"You can use tools to help answer: {_join_clause(capabilities)}. Reach for a "
         "tool when a question needs fresh or external information; answer directly and "
@@ -151,4 +170,6 @@ def build_system_prompt(
     )
     if files:
         guidance = f"{guidance} {_FILE_SAFETY_GUIDANCE}"
-    return f"{persona}\n\n{_PERSONA_LATCH} {guidance} {_SPOKEN_TAIL}"
+    return _append_project_context(
+        f"{persona}\n\n{_PERSONA_LATCH} {guidance} {_SPOKEN_TAIL}", project_context
+    )
diff --git a/aai_cli/commands/agent_cascade/_exec.py b/aai_cli/commands/agent_cascade/_exec.py
index 1b8e2567..ff978493 100644
--- a/aai_cli/commands/agent_cascade/_exec.py
+++ b/aai_cli/commands/agent_cascade/_exec.py
@@ -20,6 +20,7 @@
 from aai_cli.agent.render import AgentRenderer
 from aai_cli.agent_cascade import engine, firecrawl_search, mcp_tools, voices
 from aai_cli.agent_cascade.config import DEFAULT_MAX_HISTORY, CascadeConfig
+from aai_cli.agent_cascade.project_context import load_project_context
 from aai_cli.app.agent_shared import resolve_system_prompt as _resolve_system_prompt
 from aai_cli.app.agent_shared import validate_voice
 from aai_cli.app.context import AppState
@@ -331,6 +332,9 @@ def run_agent_cascade(opts: AgentCascadeOptions, state: AppState, *, json_mode:
         tts_extra=tts_extra,
         mcp_servers=mcp_servers,
         files=opts.files,
+        # Read the launch directory's AGENTS.md/CLAUDE.md into context, so the agent answers
+        # grounded in the project it's run from (like a coding agent).
+        project_context=load_project_context(),
     )
 
     if _should_use_tui(from_file=from_file, json_mode=json_mode, text_mode=text_mode):
diff --git a/tests/test_agent_cascade_brain.py b/tests/test_agent_cascade_brain.py
index 7a12f9e5..7f83e452 100644
--- a/tests/test_agent_cascade_brain.py
+++ b/tests/test_agent_cascade_brain.py
@@ -223,6 +223,24 @@ def fake_create(*, model, tools, system_prompt, middleware):
     assert any(isinstance(mw, ToolCallLimitMiddleware) for mw in captured["middleware"])
 
 
+def test_build_graph_threads_project_context_into_system_prompt(monkeypatch):
+    import deepagents
+
+    captured = {}
+
+    def fake_create(*, model, tools, system_prompt, middleware):
+        del model, tools, middleware
+        captured["system_prompt"] = system_prompt
+        return "graph"
+
+    monkeypatch.setattr(deepagents, "create_deep_agent", fake_create)
+    monkeypatch.setattr(model_mod, "build_model", lambda *a, **k: object())
+    cfg = CascadeConfig(project_context="# AGENTS.md\n\nRun uv sync first.")
+    brain.build_graph("k", cfg, tools=[], mcp_tools=[])
+    # The launch directory's instruction file rides into the live agent's system prompt.
+    assert "Run uv sync first." in captured["system_prompt"]
+
+
 def test_build_graph_loads_mcp_tools_from_config_when_not_injected(monkeypatch):
     import deepagents
 
diff --git a/tests/test_agent_cascade_project_context.py b/tests/test_agent_cascade_project_context.py
new file mode 100644
index 00000000..088395af
--- /dev/null
+++ b/tests/test_agent_cascade_project_context.py
@@ -0,0 +1,124 @@
+"""Tests for the live agent's project-context loader (aai_cli.agent_cascade.project_context).
+
+`assembly live` reads the launch directory's AGENTS.md/CLAUDE.md into its system prompt so a
+spoken answer is grounded in the project it's run from — the same convention coding agents follow.
+"""
+
+from __future__ import annotations
+
+import types
+
+from aai_cli.agent_cascade import project_context
+from aai_cli.app.context import AppState
+from aai_cli.commands.agent_cascade import _exec
+from aai_cli.commands.agent_cascade._exec import run_agent_cascade
+from aai_cli.core import config
+from tests.test_agent_cascade_command import _opts
+
+
+def test_returns_none_when_no_instruction_files(tmp_path):
+    # An empty directory has nothing to inject, so the prompt stays the plain persona.
+    assert project_context.load_project_context(tmp_path) is None
+
+
+def test_reads_agents_md_under_a_heading(tmp_path):
+    (tmp_path / "AGENTS.md").write_text("Use uv run for everything.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    # The content is included verbatim under a per-file heading naming its source.
+    assert loaded == "# AGENTS.md\n\nUse uv run for everything."
+
+
+def test_reads_claude_md_when_agents_md_absent(tmp_path):
+    (tmp_path / "CLAUDE.md").write_text("Project rules here.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded == "# CLAUDE.md\n\nProject rules here."
+
+
+def test_includes_both_files_in_precedence_order_when_they_differ(tmp_path):
+    (tmp_path / "AGENTS.md").write_text("Agents rules.", encoding="utf-8")
+    (tmp_path / "CLAUDE.md").write_text("Claude rules.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    # Both distinct files are present, AGENTS.md first (its precedence), then CLAUDE.md.
+    assert loaded == "# AGENTS.md\n\nAgents rules.\n\n# CLAUDE.md\n\nClaude rules."
+
+
+def test_identical_content_is_included_once(tmp_path):
+    # CLAUDE.md is commonly a symlink to AGENTS.md (as in this repo); identical content must not
+    # be duplicated into the prompt. We assert the dedup on content, so it covers the symlink case
+    # without depending on symlink support being available on the test platform.
+    (tmp_path / "AGENTS.md").write_text("Same guidance.", encoding="utf-8")
+    (tmp_path / "CLAUDE.md").write_text("Same guidance.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded == "# AGENTS.md\n\nSame guidance."
+    assert loaded.count("Same guidance.") == 1
+
+
+def test_whitespace_only_file_is_skipped(tmp_path):
+    # A blank instruction file carries no guidance, so it's treated as absent (None, not an
+    # empty heading) — the stripped-empty branch.
+    (tmp_path / "AGENTS.md").write_text(" \n\t\n", encoding="utf-8")
+    assert project_context.load_project_context(tmp_path) is None
+
+
+def test_oversized_content_is_truncated_to_the_budget(tmp_path):
+    body = "x" * (project_context.MAX_CONTEXT_CHARS + 5000)
+    (tmp_path / "AGENTS.md").write_text(body, encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded is not None
+    # The marker is counted against the budget, so the total never exceeds the cap — it's a true
+    # upper bound, not a target the marker overshoots.
+    assert loaded.endswith("[project context truncated]")
+    assert len(loaded) == project_context.MAX_CONTEXT_CHARS
+    assert len(loaded) < len(body)
+
+
+def test_content_at_the_budget_is_left_whole(tmp_path):
+    # A file exactly at the cap is included untruncated (the boundary is inclusive).
+    # Account for the "# AGENTS.md\n\n" heading so the combined string lands exactly at the cap.
+    heading = "# AGENTS.md\n\n"
+    body = "y" * (project_context.MAX_CONTEXT_CHARS - len(heading))
+    (tmp_path / "AGENTS.md").write_text(body, encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded is not None
+    assert "truncated" not in loaded
+    assert len(loaded) == project_context.MAX_CONTEXT_CHARS
+
+
+def test_defaults_to_the_current_working_directory(tmp_path, monkeypatch):
+    (tmp_path / "AGENTS.md").write_text("cwd guidance", encoding="utf-8")
+    monkeypatch.chdir(tmp_path)
+    # No directory argument -> reads cwd, so the live command picks up the project it's launched in.
+    assert project_context.load_project_context() == "# AGENTS.md\n\ncwd guidance"
+
+
+def test_missing_directory_reads_as_no_context(tmp_path):
+    # A nonexistent base directory raises OSError per candidate, which is swallowed -> None.
+    assert project_context.load_project_context(tmp_path / "does-not-exist") is None
+
+
+def test_context_filenames_order():
+    # AGENTS.md (the cross-agent standard) takes precedence over CLAUDE.md.
+    assert project_context.CONTEXT_FILENAMES == ("AGENTS.md", "CLAUDE.md")
+
+
+# --- command wiring: run_agent_cascade reads the loader into the config ------
+
+
+def test_run_reads_project_context_into_config(monkeypatch):
+    monkeypatch.setattr(_exec.tts_session, "require_available", lambda _c: None)
+    monkeypatch.setattr(config, "resolve_api_key", lambda **_: "k")
+    monkeypatch.setattr(_exec, "FileSource", lambda src: types.SimpleNamespace(sample_rate=16000))
+    monkeypatch.setattr(_exec.client, "resolve_audio_source", lambda source, sample: "clip.wav")
+    # Stub the loader so the assertion doesn't depend on the repo's own (large) instruction file.
+    monkeypatch.setattr(_exec, "load_project_context", lambda: "# AGENTS.md\n\nProject background.")
+    captured = {}
+
+    def fake_real(api_key, config, *, audio, stt_params, approver=None):
+        captured["config"] = config
+        return "deps"
+
+    monkeypatch.setattr(_exec.engine.CascadeDeps, "real", fake_real)
+    monkeypatch.setattr(_exec.engine, "run_cascade", lambda **kwargs: None)
+    run_agent_cascade(_opts(source="clip.wav"), AppState(), json_mode=False)
+    # The launch directory's AGENTS.md/CLAUDE.md rides into the cascade config.
+    assert captured["config"].project_context == "# AGENTS.md\n\nProject background."
diff --git a/tests/test_agent_cascade_prompt.py b/tests/test_agent_cascade_prompt.py
index 7c45161b..b1b88098 100644
--- a/tests/test_agent_cascade_prompt.py
+++ b/tests/test_agent_cascade_prompt.py
@@ -200,3 +200,41 @@ def test_datetime_tool_advertised_in_system_prompt():
         "persona", tools=[_NamedTool(datetime_tool.DATETIME_TOOL_NAME)]
     )
     assert "current date and time" in text
+
+
+# --- project context (AGENTS.md/CLAUDE.md) -----------------------------------
+
+
+def test_system_prompt_appends_project_context_when_present():
+    # The launch directory's instruction file is appended as project background, introduced so
+    # the model treats it as grounding rather than another instruction to recite.
+    text = prompt.build_system_prompt(
+        "persona", tools=[], project_context="# AGENTS.md\n\nUse uv run."
+    )
+    assert "background on the project in your working directory" in text
+    assert "# AGENTS.md\n\nUse uv run." in text
+    # It lands after the persona/guidance, not before it.
+    assert text.index("persona") < text.index("Use uv run.")
+
+
+def test_system_prompt_appends_project_context_on_the_tools_path():
+    # The append happens whether or not tools are bound (the capabilities branch too).
+    text = prompt.build_system_prompt(
+        "persona",
+        tools=[_NamedTool(prompt.WEB_SEARCH_TOOL_NAME)],
+        project_context="# AGENTS.md\n\nProject facts.",
+    )
+    assert "search the web" in text
+    assert "Project facts." in text
+
+
+def test_system_prompt_omits_project_context_section_when_absent():
+    # With no instruction file the prompt is unchanged — no dangling background heading.
+    text = prompt.build_system_prompt("persona", tools=[], project_context=None)
+    assert "background on the project in your working directory" not in text
+
+
+def test_system_prompt_treats_empty_project_context_as_absent():
+    # An empty string is falsy, so no background section is appended.
+    text = prompt.build_system_prompt("persona", tools=[], project_context="")
+    assert "background on the project in your working directory" not in text