diff --git a/REFERENCE.md b/REFERENCE.md
index f052b6ce..df511524 100644
--- a/REFERENCE.md
+++ b/REFERENCE.md
@@ -160,6 +160,14 @@ missing `npx`/`uvx`, an offline host) drops only its own tools, so a single brok
 tool never sinks the session. MCP tools are a live-run feature and are not
 reflected in `--show-code` output.
 
+If the directory you launch from has an `AGENTS.md` or `CLAUDE.md`, `assembly live`
+reads it into the agent's context — the same convention coding agents follow — so
+spoken answers are grounded in the project at hand. When both files exist, `AGENTS.md`
+comes first and `CLAUDE.md` follows (identical content, e.g. a symlink, is included once);
+oversized content is truncated so it can't crowd out the conversation. This is
+independent of `--files` (it happens even under `--no-files`, when the agent can't
+touch the filesystem) and is not reflected in `--show-code` output.
+
 The agent reads, writes, and runs code in the directory you launch it from (on by
 default; pass `--no-files` to disable). Reads run immediately; a write, edit, or command run pauses
 the turn for confirmation in the voice TUI — press `y`/`n` (`a` approves the rest of the
diff --git a/aai_cli/AGENTS.md b/aai_cli/AGENTS.md
index 7dd31818..926605d1 100644
--- a/aai_cli/AGENTS.md
+++ b/aai_cli/AGENTS.md
@@ -153,7 +153,7 @@ heavily-reworked commands with long bodies; small commands keep the inline
 - **`streaming/`** + `client.stream_audio` — v3 realtime API. Event callbacks run on the SDK reader thread and guard against `BrokenPipeError` (`stdio.silence_stdout()`) so a closed pipe never dumps a thread traceback.
 - **`core/sync_stt.py`** + **`core/signals.py`** + `commands/dictate/` — `assembly dictate`: headless dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). It needs no terminal: recording starts immediately and `dictate_exec._record` polls `signals.stop_on_terminate` between ~100 ms mic chunks for a SIGTERM, which finishes the utterance (clean exit 0) — so a hotkey tool like Hammerspoon can launch it as a background task and `kill -TERM`/`task:terminate()` to transcribe. SIGINT (Ctrl-C) still cancels (exit 130). Both boundaries (the stop latch, mic, HTTP) are injectable, so the suite never needs a real signal or microphone (`tests/test_dictate_exec.py` scripts the SIGTERM latch). Contrast `signals.terminate_as_interrupt` (used by `stream`/`agent`/`speak`), which routes SIGTERM into the *cancel* path instead.
 - **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
-- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): **context-window management is the brain's job, not the engine's** — `create_deep_agent` wires deepagents' own `SummarizationMiddleware` into the stack (summarize the oldest turns, offload the evicted history to a file), so the engine feeds the *full* untrimmed running history each turn and lets the graph compact it; the old client-side `text.trim_history`/`config.max_history` sliding window is gone from this path (`max_history` now only drives the hand-rolled `--show-code`/`assembly init` cascade, which doesn't use deepagents). The engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. 
The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. **Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. 
**`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. `write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). Headless runs auto-deny (`_exec._deny_writes`). 
`--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`, and binds one gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool; spec in `agent_cascade/subagents.py`, omitting `model`/`tools` so it inherits both) for delegating a focused subtask. The subagent's own `interrupt_on` mirrors `_WRITE_TOOLS`, and a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated.
+- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): **context-window management is the brain's job, not the engine's** — `create_deep_agent` wires deepagents' own `SummarizationMiddleware` into the stack (summarize the oldest turns, offload the evicted history to a file), so the engine feeds the *full* untrimmed running history each turn and lets the graph compact it; the old client-side `text.trim_history`/`config.max_history` sliding window is gone from this path (`max_history` now only drives the hand-rolled `--show-code`/`assembly init` cascade, which doesn't use deepagents). The engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. 
The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. **Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. 
**`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. `write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). 
Headless runs auto-deny (`_exec._deny_writes`). `--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`, and binds one gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool; spec in `agent_cascade/subagents.py`, omitting `model`/`tools` so it inherits both) for delegating a focused subtask. The subagent's own `interrupt_on` mirrors `_WRITE_TOOLS`, and a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated. **Project grounding (independent of `--files`):** `_exec.run_agent_cascade` reads the launch directory's `AGENTS.md`/`CLAUDE.md` via `agent_cascade/project_context.load_project_context()` into `CascadeConfig.project_context`, which `brain.build_graph` threads into `prompt.build_system_prompt(..., project_context=…)` (appended as project background after the persona/tool guidance). When both files exist `AGENTS.md` comes first, identical content (a symlinked `CLAUDE.md`) is de-duplicated, and the total is capped at `project_context.MAX_CONTEXT_CHARS`. It's read at the command boundary (not in `build_graph`) so the brain stays hermetic, and the `--show-code` path builds its own config without it.
 - **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`). The single-voice default-playback path **streams**: `synthesize`'s `on_audio(chunk, sample_rate)` callback is wired to `audio.PcmPlayer.feed`, so speech starts on the first Audio frame (it opens the device lazily, since the rate is only known at Begin) instead of after the whole text — the win for a long `--url` page. `--out` (needs the full buffer) and the multi-voice dialogue path (`synthesize_dialogue` → `_output_audio` → buffered `play_pcm`) stay buffered; `synthesize` still returns the complete PCM for the summary regardless.
 - **`code_gen/`** — backs `--show-code` on `transcribe`/`stream`/`agent`: builds a ready-to-run Python SDK script from exactly the flags passed (no API key needed; generated code reads `ASSEMBLYAI_API_KEY`).
 - **`auth/`** — browser-assisted `assembly login` via AMS + **Stytch B2B OAuth discovery** (`discovery.py`, `flow.py`, `loopback.py`, `ams.py`). Not Stytch Connected Apps.
diff --git a/aai_cli/agent_cascade/brain.py b/aai_cli/agent_cascade/brain.py
index da55e18a..b5ebc246 100644
--- a/aai_cli/agent_cascade/brain.py
+++ b/aai_cli/agent_cascade/brain.py
@@ -51,15 +51,12 @@ def invoke(
         """Run one step of the graph, returning the updated state (incl. messages)."""
 
 
-# Verbose (`-v`) flow logging for the agent's tool loop. `invoke` runs the whole loop
-# internally, so without this `-v` only shows the httpx request lines and never which
-# tools the agent reached for or what they returned — exactly what you need to see when
-# a spoken turn stalls mid-tool. Logged at INFO so plain `-v` surfaces it.
+# Verbose (`-v`) flow logging for the agent's tool loop: `invoke` runs the whole loop internally,
+# so without this `-v` never shows which tools the agent reached for when a spoken turn stalls.
 _FLOW_LOG = logging.getLogger("aai_cli.agent_cascade.brain")
 
-# Tool outputs (a fetched page, a search payload) can be huge; cap what we log per result
-# so a single tool call doesn't bury the rest of the flow in stderr. The exact cap is an
-# arbitrary tuning knob — a +-1 shift is behaviorally equivalent, so no test can kill it.
+# Tool outputs (a fetched page, a search payload) can be huge; cap what we log per result so a
+# single tool call doesn't bury the flow. The exact cap is an arbitrary knob (no test can kill it).
 _RESULT_LOG_CAP = 500  # pragma: no mutate
 
 # Human, speakable labels for the tool affordance the live UI shows while a tool runs (so a
@@ -89,8 +86,7 @@ def _tool_label(name: str) -> str:
 # Spoken filler the agent says aloud when it pauses for a tool, so a hands-free turn fills the
 # silent tool round-trip with *why* it paused instead of dead air (the audible counterpart to the
 # visual `_TOOL_LABELS` affordance). Each tool gets a few short, speakable variants the engine
-# rotates across turns; unknown/MCP tools fall back to `_GENERIC_FILLERS`. Spoken-style only — no
-# markdown, no trailing detail — since they're synthesized straight to TTS ahead of the answer.
+# rotates across turns; unknown/MCP tools fall back to `_GENERIC_FILLERS` (spoken-style, no markdown).
 _GENERIC_FILLERS: tuple[str, ...] = ("One sec.", "Let me check.")
 
 _TOOL_FILLERS: dict[str, tuple[str, ...]] = {
@@ -281,7 +277,11 @@ def build_graph(
         model=model,
         tools=builtin + extra,
         system_prompt=build_system_prompt(
-            config.system_prompt, tools=builtin, extra_tools=extra, files=config.files
+            config.system_prompt,
+            tools=builtin,
+            extra_tools=extra,
+            files=config.files,
+            project_context=config.project_context,
         ),
         middleware=_build_middleware(config),
         **_graph_kwargs(config),
diff --git a/aai_cli/agent_cascade/config.py b/aai_cli/agent_cascade/config.py
index b06bb0ff..fd038d89 100644
--- a/aai_cli/agent_cascade/config.py
+++ b/aai_cli/agent_cascade/config.py
@@ -73,3 +73,6 @@ class CascadeConfig:
     # behavior unchanged (the default in-memory backend, no gating, nothing advertised); on
     # swaps to a real-cwd FilesystemBackend and gates writes behind human approval.
     files: bool = False
+    # The launch directory's AGENTS.md/CLAUDE.md, read into the system prompt so the agent
+    # answers grounded in the project it's run from (None when no instruction file is present).
+    project_context: str | None = None
diff --git a/aai_cli/agent_cascade/project_context.py b/aai_cli/agent_cascade/project_context.py
new file mode 100644
index 00000000..490959e4
--- /dev/null
+++ b/aai_cli/agent_cascade/project_context.py
@@ -0,0 +1,67 @@
+"""Read project-instruction files (``AGENTS.md``/``CLAUDE.md``) into the live agent's context.
+
+`assembly live` runs in the user's working directory, so — like a coding agent — it reads the
+project's instruction files into its system prompt when present, giving spoken answers grounded
+in the project it's launched from. ``AGENTS.md`` is the cross-agent standard and ``CLAUDE.md`` is
+frequently a symlink to it, so identical content is included once, and the total is capped so an
+oversized instructions file can't crowd the conversation out of the model's window.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+# The instruction files an agentic CLI reads into context, highest precedence first.
+CONTEXT_FILENAMES = ("AGENTS.md", "CLAUDE.md")
+
+# Cap the injected context: the spoken agent only needs the project's gist, and an unusually
+# large instructions file would otherwise crowd the live conversation out of the model's window.
+# A +-1 shift in the budget is behaviorally equivalent, so no test can kill a mutant on it.
+MAX_CONTEXT_CHARS = 16000  # pragma: no mutate
+
+# Appended when the content is truncated, so the model knows it's seeing only the head of the file.
+_TRUNCATION_MARKER = "\n\n[project context truncated]"
+
+
+def _read_instructions(path: Path) -> str | None:
+    """The stripped contents of one instruction file, or ``None`` if absent/unreadable/empty."""
+    try:
+        text = path.read_text(encoding="utf-8").strip()
+    except (OSError, UnicodeDecodeError):
+        return None
+    return text or None
+
+
+def _truncate(combined: str) -> str:
+    """Cap the combined context at :data:`MAX_CONTEXT_CHARS`, marking it when truncated.
+
+    The marker is counted against the budget (the slice leaves room for it), so the returned
+    string never exceeds :data:`MAX_CONTEXT_CHARS` — the cap is a true upper bound, not a target
+    the marker then overshoots.
+    """
+    if len(combined) > MAX_CONTEXT_CHARS:
+        return combined[: MAX_CONTEXT_CHARS - len(_TRUNCATION_MARKER)] + _TRUNCATION_MARKER
+    return combined
+
+
+def load_project_context(directory: Path | None = None) -> str | None:
+    """Read the project-instruction files in *directory* into one de-duplicated string.
+
+    Looks for each name in :data:`CONTEXT_FILENAMES` under *directory* (the current working
+    directory by default), returning their stripped contents joined under a per-file heading —
+    or ``None`` when none are present, readable, or non-empty. Identical files (``CLAUDE.md`` is
+    commonly a symlink to ``AGENTS.md``) are included once, and the combined text is truncated to
+    :data:`MAX_CONTEXT_CHARS` so a huge file can't crowd out the live conversation.
+    """
+    base = Path.cwd() if directory is None else directory
+    sections: list[str] = []
+    seen: set[str] = set()
+    for name in CONTEXT_FILENAMES:
+        text = _read_instructions(base / name)
+        if text is None or text in seen:
+            continue
+        seen.add(text)
+        sections.append(f"# {name}\n\n{text}")
+    if not sections:
+        return None
+    return _truncate("\n\n".join(sections))
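Reviewer note: the loader's three behaviours (no files yields `None`, identical content is included once, and the cap is a hard upper bound) can be tried outside the package with a condensed standalone restatement. This is a sketch, not the module itself: `load_context`, `demo`, `FILENAMES`, `LIMIT`, and `MARKER` are illustrative names, and the sketch skips undecodable files as well as missing ones.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Illustrative stand-ins for the module constants in the diff above.
FILENAMES = ("AGENTS.md", "CLAUDE.md")
LIMIT = 16000
MARKER = "\n\n[project context truncated]"


def load_context(directory: Path) -> str | None:
    """Join the instruction files under per-file headings, de-duplicated and capped."""
    sections: list[str] = []
    seen: set[str] = set()
    for name in FILENAMES:
        try:
            text = (directory / name).read_text(encoding="utf-8").strip()
        except (OSError, UnicodeError):
            # Missing, unreadable, or undecodable files are skipped in this sketch.
            continue
        if not text or text in seen:
            continue
        seen.add(text)
        sections.append(f"# {name}\n\n{text}")
    if not sections:
        return None
    combined = "\n\n".join(sections)
    if len(combined) > LIMIT:
        # The marker is counted against the budget, so the result never exceeds LIMIT.
        combined = combined[: LIMIT - len(MARKER)] + MARKER
    return combined


def demo() -> dict[str, str | None]:
    """Exercise the empty, duplicate, and oversized cases in a temporary directory."""
    results: dict[str, str | None] = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        results["empty"] = load_context(root)
        (root / "AGENTS.md").write_text("Use uv run.", encoding="utf-8")
        # Same content after stripping, so it should appear only once.
        (root / "CLAUDE.md").write_text("Use uv run.\n", encoding="utf-8")
        results["duplicate"] = load_context(root)
        # A distinct, very large CLAUDE.md pushes the combined text over the cap.
        (root / "CLAUDE.md").write_text("x" * (LIMIT * 2), encoding="utf-8")
        results["oversized"] = load_context(root)
    return results


if __name__ == "__main__":
    for case, value in demo().items():
        print(case, None if value is None else len(value))
```

Because de-duplication compares stripped content rather than paths, it covers the symlink case without needing symlink support on the host.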
diff --git a/aai_cli/agent_cascade/prompt.py b/aai_cli/agent_cascade/prompt.py
index 27c94e80..af1f7190 100644
--- a/aai_cli/agent_cascade/prompt.py
+++ b/aai_cli/agent_cascade/prompt.py
@@ -69,6 +69,20 @@
     "replacing the whole file unless asked."
 )
 
+# Introduces the launch directory's AGENTS.md/CLAUDE.md when one is present, so the model treats
+# it as project background to ground its answers rather than as another instruction to recite.
+_PROJECT_CONTEXT_INTRO = (
+    "The following is background on the project in your working directory, taken from its "
+    "AGENTS.md/CLAUDE.md. Use it to ground your answers, but keep your reply short and spoken."
+)
+
+
+def _append_project_context(prompt: str, project_context: str | None) -> str:
+    """Append the launch directory's instruction files to the prompt as project background."""
+    if not project_context:
+        return prompt
+    return f"{prompt}\n\n{_PROJECT_CONTEXT_INTRO}\n\n{project_context}"
+
 
 def _join_clause(parts: list[str]) -> str:
     """Join capability phrases into a readable clause: ``a``, ``a and b``, ``a, b, and c``."""
@@ -121,6 +135,7 @@ def build_system_prompt(
     tools: Sequence[BaseTool],
     extra_tools: Sequence[BaseTool] = (),
     files: bool = False,
+    project_context: str | None = None,
 ) -> str:
     """The live agent's system prompt: the user's persona plus tool guidance.
 
@@ -134,6 +149,8 @@ def build_system_prompt(
     its own knowledge. Whenever tools are bound the guidance also tells the model to report
     tool outcomes honestly (never narrate a success the tool didn't return), and the
     ``--files`` path adds a warning to confirm before irreversible writes or code execution.
+    ``project_context`` (the launch directory's AGENTS.md/CLAUDE.md) is appended as project
+    background when present, so the agent's answers are grounded in the project it's run from.
     """
     capabilities = _tool_capabilities(tools)
     extra = _extra_capability(extra_tools)
@@ -142,7 +159,9 @@ def build_system_prompt(
     if files:
         capabilities.append(_FILE_CAPABILITY)
     if not capabilities:
-        return f"{persona}\n\n{_PERSONA_LATCH} {_NO_TOOLS_GUIDANCE}"
+        return _append_project_context(
+            f"{persona}\n\n{_PERSONA_LATCH} {_NO_TOOLS_GUIDANCE}", project_context
+        )
     guidance = (
         f"You can use tools to help answer: {_join_clause(capabilities)}. Reach for a "
         "tool when a question needs fresh or external information; answer directly and "
@@ -151,4 +170,6 @@ def build_system_prompt(
     )
     if files:
         guidance = f"{guidance} {_FILE_SAFETY_GUIDANCE}"
-    return f"{persona}\n\n{_PERSONA_LATCH} {guidance} {_SPOKEN_TAIL}"
+    return _append_project_context(
+        f"{persona}\n\n{_PERSONA_LATCH} {guidance} {_SPOKEN_TAIL}", project_context
+    )
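Reviewer note: the resulting prompt layout is easy to misread from the hunks alone, so here is a small standalone sketch of the append step. The intro text is copied from the diff; the persona string and the public-style names (`PROJECT_CONTEXT_INTRO`, `append_project_context`) are placeholders for illustration, not the package's API.

```python
from __future__ import annotations

# Intro text copied from the diff above.
PROJECT_CONTEXT_INTRO = (
    "The following is background on the project in your working directory, taken from its "
    "AGENTS.md/CLAUDE.md. Use it to ground your answers, but keep your reply short and spoken."
)


def append_project_context(prompt: str, project_context: str | None) -> str:
    """Append project background after the prompt; leave the prompt alone when there is none."""
    if not project_context:
        return prompt
    return f"{prompt}\n\n{PROJECT_CONTEXT_INTRO}\n\n{project_context}"


base = "You are a helpful voice assistant."  # hypothetical persona, not from the source
with_context = append_project_context(base, "# AGENTS.md\n\nRun uv sync first.")
without_context = append_project_context(base, None)

if __name__ == "__main__":
    print(with_context)
```

The falsy check means both `None` and an empty string leave the prompt byte-for-byte unchanged, so a directory without instruction files behaves the same as it did before this change.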
diff --git a/aai_cli/commands/agent_cascade/_exec.py b/aai_cli/commands/agent_cascade/_exec.py
index 1b8e2567..ff978493 100644
--- a/aai_cli/commands/agent_cascade/_exec.py
+++ b/aai_cli/commands/agent_cascade/_exec.py
@@ -20,6 +20,7 @@
 from aai_cli.agent.render import AgentRenderer
 from aai_cli.agent_cascade import engine, firecrawl_search, mcp_tools, voices
 from aai_cli.agent_cascade.config import DEFAULT_MAX_HISTORY, CascadeConfig
+from aai_cli.agent_cascade.project_context import load_project_context
 from aai_cli.app.agent_shared import resolve_system_prompt as _resolve_system_prompt
 from aai_cli.app.agent_shared import validate_voice
 from aai_cli.app.context import AppState
@@ -331,6 +332,9 @@ def run_agent_cascade(opts: AgentCascadeOptions, state: AppState, *, json_mode:
         tts_extra=tts_extra,
         mcp_servers=mcp_servers,
         files=opts.files,
+        # Read the launch directory's AGENTS.md/CLAUDE.md into context, so the agent answers
+        # grounded in the project it's run from (like a coding agent).
+        project_context=load_project_context(),
     )
 
     if _should_use_tui(from_file=from_file, json_mode=json_mode, text_mode=text_mode):
diff --git a/tests/test_agent_cascade_brain.py b/tests/test_agent_cascade_brain.py
index 7a12f9e5..7f83e452 100644
--- a/tests/test_agent_cascade_brain.py
+++ b/tests/test_agent_cascade_brain.py
@@ -223,6 +223,24 @@ def fake_create(*, model, tools, system_prompt, middleware):
     assert any(isinstance(mw, ToolCallLimitMiddleware) for mw in captured["middleware"])
 
 
+def test_build_graph_threads_project_context_into_system_prompt(monkeypatch):
+    import deepagents
+
+    captured = {}
+
+    def fake_create(*, model, tools, system_prompt, middleware):
+        del model, tools, middleware
+        captured["system_prompt"] = system_prompt
+        return "graph"
+
+    monkeypatch.setattr(deepagents, "create_deep_agent", fake_create)
+    monkeypatch.setattr(model_mod, "build_model", lambda *a, **k: object())
+    cfg = CascadeConfig(project_context="# AGENTS.md\n\nRun uv sync first.")
+    brain.build_graph("k", cfg, tools=[], mcp_tools=[])
+    # The launch directory's instruction file rides into the live agent's system prompt.
+    assert "Run uv sync first." in captured["system_prompt"]
+
+
 def test_build_graph_loads_mcp_tools_from_config_when_not_injected(monkeypatch):
     import deepagents
 
diff --git a/tests/test_agent_cascade_project_context.py b/tests/test_agent_cascade_project_context.py
new file mode 100644
index 00000000..088395af
--- /dev/null
+++ b/tests/test_agent_cascade_project_context.py
@@ -0,0 +1,124 @@
+"""Tests for the live agent's project-context loader (aai_cli.agent_cascade.project_context).
+
+`assembly live` reads the launch directory's AGENTS.md/CLAUDE.md into its system prompt so a
+spoken answer is grounded in the project it's run from — the same convention coding agents follow.
+"""
+
+from __future__ import annotations
+
+import types
+
+from aai_cli.agent_cascade import project_context
+from aai_cli.app.context import AppState
+from aai_cli.commands.agent_cascade import _exec
+from aai_cli.commands.agent_cascade._exec import run_agent_cascade
+from aai_cli.core import config
+from tests.test_agent_cascade_command import _opts
+
+
+def test_returns_none_when_no_instruction_files(tmp_path):
+    # An empty directory has nothing to inject, so the prompt stays the plain persona.
+    assert project_context.load_project_context(tmp_path) is None
+
+
+def test_reads_agents_md_under_a_heading(tmp_path):
+    (tmp_path / "AGENTS.md").write_text("Use uv run for everything.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    # The content is included verbatim under a per-file heading naming its source.
+    assert loaded == "# AGENTS.md\n\nUse uv run for everything."
+
+
+def test_reads_claude_md_when_agents_md_absent(tmp_path):
+    (tmp_path / "CLAUDE.md").write_text("Project rules here.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded == "# CLAUDE.md\n\nProject rules here."
+
+
+def test_includes_both_files_in_precedence_order_when_they_differ(tmp_path):
+    (tmp_path / "AGENTS.md").write_text("Agents rules.", encoding="utf-8")
+    (tmp_path / "CLAUDE.md").write_text("Claude rules.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    # Both distinct files are present, AGENTS.md first (its precedence), then CLAUDE.md.
+    assert loaded == "# AGENTS.md\n\nAgents rules.\n\n# CLAUDE.md\n\nClaude rules."
+
+
+def test_identical_content_is_included_once(tmp_path):
+    # CLAUDE.md is commonly a symlink to AGENTS.md (as in this repo); identical content must not
+    # be duplicated into the prompt. We assert the dedup on content, so it covers the symlink case
+    # without depending on symlink support being available on the test platform.
+    (tmp_path / "AGENTS.md").write_text("Same guidance.", encoding="utf-8")
+    (tmp_path / "CLAUDE.md").write_text("Same guidance.", encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded == "# AGENTS.md\n\nSame guidance."
+    assert loaded.count("Same guidance.") == 1
+
+
+def test_whitespace_only_file_is_skipped(tmp_path):
+    # A blank instruction file carries no guidance, so it's treated as absent (None, not an
+    # empty heading) — the stripped-empty branch.
+    (tmp_path / "AGENTS.md").write_text("   \n\t\n", encoding="utf-8")
+    assert project_context.load_project_context(tmp_path) is None
+
+
+def test_oversized_content_is_truncated_to_the_budget(tmp_path):
+    body = "x" * (project_context.MAX_CONTEXT_CHARS + 5000)
+    (tmp_path / "AGENTS.md").write_text(body, encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded is not None
+    # The marker is counted against the budget, so the total never exceeds the cap — it's a true
+    # upper bound, not a target the marker overshoots.
+    assert loaded.endswith("[project context truncated]")
+    assert len(loaded) == project_context.MAX_CONTEXT_CHARS
+    assert len(loaded) < len(body)
+
+
+def test_content_at_the_budget_is_left_whole(tmp_path):
+    # A file exactly at the cap is included untruncated (the boundary is inclusive).
+    # Account for the "# AGENTS.md\n\n" heading so the combined string lands exactly at the cap.
+    heading = "# AGENTS.md\n\n"
+    body = "y" * (project_context.MAX_CONTEXT_CHARS - len(heading))
+    (tmp_path / "AGENTS.md").write_text(body, encoding="utf-8")
+    loaded = project_context.load_project_context(tmp_path)
+    assert loaded is not None
+    assert "truncated" not in loaded
+    assert len(loaded) == project_context.MAX_CONTEXT_CHARS
+
+
+def test_defaults_to_the_current_working_directory(tmp_path, monkeypatch):
+    (tmp_path / "AGENTS.md").write_text("cwd guidance", encoding="utf-8")
+    monkeypatch.chdir(tmp_path)
+    # No directory argument -> reads cwd, so the live command picks up the project it's launched in.
+    assert project_context.load_project_context() == "# AGENTS.md\n\ncwd guidance"
+
+
+def test_missing_directory_reads_as_no_context(tmp_path):
+    # Under a nonexistent base directory every candidate read raises OSError, which is swallowed, so the result is None.
+    assert project_context.load_project_context(tmp_path / "does-not-exist") is None
+
+
+def test_context_filenames_order():
+    # AGENTS.md (the cross-agent standard) takes precedence over CLAUDE.md.
+    assert project_context.CONTEXT_FILENAMES == ("AGENTS.md", "CLAUDE.md")
+
+
+# --- command wiring: run_agent_cascade reads the loader into the config ------
+
+
+def test_run_reads_project_context_into_config(monkeypatch):
+    monkeypatch.setattr(_exec.tts_session, "require_available", lambda _c: None)
+    monkeypatch.setattr(config, "resolve_api_key", lambda **_: "k")
+    monkeypatch.setattr(_exec, "FileSource", lambda src: types.SimpleNamespace(sample_rate=16000))
+    monkeypatch.setattr(_exec.client, "resolve_audio_source", lambda source, sample: "clip.wav")
+    # Stub the loader so the assertion doesn't depend on the repo's own (large) instruction file.
+    monkeypatch.setattr(_exec, "load_project_context", lambda: "# AGENTS.md\n\nProject background.")
+    captured = {}
+
+    def fake_real(api_key, config, *, audio, stt_params, approver=None):
+        captured["config"] = config
+        return "deps"
+
+    monkeypatch.setattr(_exec.engine.CascadeDeps, "real", fake_real)
+    monkeypatch.setattr(_exec.engine, "run_cascade", lambda **kwargs: None)
+    run_agent_cascade(_opts(source="clip.wav"), AppState(), json_mode=False)
+    # The launch directory's AGENTS.md/CLAUDE.md rides into the cascade config.
+    assert captured["config"].project_context == "# AGENTS.md\n\nProject background."
diff --git a/tests/test_agent_cascade_prompt.py b/tests/test_agent_cascade_prompt.py
index 7c45161b..b1b88098 100644
--- a/tests/test_agent_cascade_prompt.py
+++ b/tests/test_agent_cascade_prompt.py
@@ -200,3 +200,41 @@ def test_datetime_tool_advertised_in_system_prompt():
         "persona", tools=[_NamedTool(datetime_tool.DATETIME_TOOL_NAME)]
     )
     assert "current date and time" in text
+
+
+# --- project context (AGENTS.md/CLAUDE.md) -----------------------------------
+
+
+def test_system_prompt_appends_project_context_when_present():
+    # The launch directory's instruction file is appended as project background, introduced so
+    # the model treats it as grounding rather than another instruction to recite.
+    text = prompt.build_system_prompt(
+        "persona", tools=[], project_context="# AGENTS.md\n\nUse uv run."
+    )
+    assert "background on the project in your working directory" in text
+    assert "# AGENTS.md\n\nUse uv run." in text
+    # It lands after the persona/guidance, not before it.
+    assert text.index("persona") < text.index("Use uv run.")
+
+
+def test_system_prompt_appends_project_context_on_the_tools_path():
+    # The append happens whether or not tools are bound (the capabilities branch too).
+    text = prompt.build_system_prompt(
+        "persona",
+        tools=[_NamedTool(prompt.WEB_SEARCH_TOOL_NAME)],
+        project_context="# AGENTS.md\n\nProject facts.",
+    )
+    assert "search the web" in text
+    assert "Project facts." in text
+
+
+def test_system_prompt_omits_project_context_section_when_absent():
+    # With no instruction file the prompt is unchanged — no dangling background heading.
+    text = prompt.build_system_prompt("persona", tools=[], project_context=None)
+    assert "background on the project in your working directory" not in text
+
+
+def test_system_prompt_treats_empty_project_context_as_absent():
+    # An empty string is falsy, so no background section is appended.
+    text = prompt.build_system_prompt("persona", tools=[], project_context="")
+    assert "background on the project in your working directory" not in text