tangle-network · drewstone · Jul 2, 2026 · Jul 2, 2026
diff --git a/README.md b/README.md
@@ -96,5 +96,9 @@ All 28 live in [`examples/`](./examples).
 
 - New here? [`docs/concepts.md`](./docs/concepts.md) — the mental model in plain terms.
 - [`docs/canonical-api.md`](./docs/canonical-api.md) — find the primitive: "I want to ___ → use ___".
+- [`docs/api/primitive-catalog.md`](./docs/api/primitive-catalog.md) — every export in one generated, never-stale list with its import path. Check it before building anything new.
+- Import subpaths: the root export is the product surface (`handleChatTurn`, `improve`); deeper capabilities ship as subpaths — `/loops` (multi-agent + the loop kernel), `/mcp` (tool servers), `/intelligence` (observability drop-in), `/lifecycle`, `/agent`, `/profiles`, `/platform`, `/analyst-loop`, `/environment-provider`.
 - [`docs/architecture.md`](./docs/architecture.md) — the design, end to end.
 - [`bench/HARNESS.md`](./bench/HARNESS.md) — the experiment harness and how to run a benchmark.
+
+**Contributing:** `pnpm i && pnpm test` gets you running; the full local gate is the `lint`, `typecheck`, and `docs:check` scripts in [`package.json`](./package.json).
diff --git a/bench/HARNESS.md b/bench/HARNESS.md
@@ -124,6 +124,7 @@ the gate + measurement tools:
   corpus-replay.mts  --selector: selector@k vs random@k vs oracle@k over a corpus (THE offline gate)
   corpus-report.mts  paired-bootstrap CI + Benjamini-Hochberg over corpora
   gate-cli.mts  the recursive diverse-vs-blind gate through `runGate` (Supervisor)
+  run-benchmarks-cli.mts  runBenchmarks: any subset of the ADAPTERS registry × model/harness cells, one combined ranked report (#420)
   commit0-env-run.mts  the HARD domain through `runBenchmark` (the optimization suite)
   terminal-compare.ts  Terminal-Bench compare (own main)
 unit tests (the only fully-green, cred-free runnable surface besides offline replay):

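Reviewer note on the `corpus-report.mts` entry in the hunk above: for readers unfamiliar with the Benjamini-Hochberg step it names, here is a minimal TypeScript sketch of the standard step-up procedure. It is not the repo's implementation. The function name `benjaminiHochberg`, its signature, and the default `alpha` are assumptions made for illustration only.

```typescript
// Hypothetical helper for illustration; NOT the code in bench/ corpus-report.mts.
// Benjamini-Hochberg step-up: controls the false discovery rate at level `alpha`.
function benjaminiHochberg(pValues: number[], alpha = 0.05): boolean[] {
  const m = pValues.length;

  // Sort by ascending p-value while remembering each original position.
  const order = pValues
    .map((p, index) => ({ p, index }))
    .sort((a, b) => a.p - b.p);

  // Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
  let cutoff = 0;
  order.forEach(({ p }, i) => {
    const rank = i + 1;
    if (p <= (rank / m) * alpha) cutoff = rank;
  });

  // Reject every hypothesis ranked at or below the cutoff,
  // including ones that missed their own per-rank threshold.
  const rejected = new Array<boolean>(m).fill(false);
  for (const { index } of order.slice(0, cutoff)) rejected[index] = true;
  return rejected;
}

// Usage: 0.03 misses its own threshold (0.025) but is still rejected,
// because 0.035 at rank 3 passes its threshold (0.0375).
console.log(benjaminiHochberg([0.03, 0.001, 0.035, 0.5])); // [ true, true, true, false ]
```

The step-up behaviour is the point of the example: the cutoff is the largest passing rank, not the first failing one, so a comparison can be declared significant on the strength of a later one.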
diff --git a/docs/README.md b/docs/README.md
@@ -33,7 +33,7 @@ The map of every doc. **Start here** if you're new; the deeper tracks follow.
 | [execution-model.md](./execution-model.md) | the picture | The unified `Executor` port (router/bridge/cli/sandbox/BYO) + two engines, driver vs worker, spawn mechanics. |
 | [agent-bus-protocol.md](./agent-bus-protocol.md) | normative protocol | The multi-agent call bus — depth limits, headers, refusal contract. |
 | [durability-adapters.md](./durability-adapters.md) | subsystem | Journal + durability for resumable conversations + supervisor trees. |
-| [intelligence-sdk.md](./intelligence-sdk.md) | subsystem | The product intelligence drop-in (`withTangleIntelligence`). |
+| [intelligence-sdk.md](./intelligence-sdk.md) | product SDK | The Intelligence SDK reference — the `/intelligence` subpath (observe, effort tiers, certified delivery). |
 | [BUILDING.md](./BUILDING.md) | process | Building discipline: goal first, cheapest decisive proof, verification rules. |
 | [ANTI_PATTERNS.md](./ANTI_PATTERNS.md) | process | Named failure modes. |
 | [MAINTAINING.md](./MAINTAINING.md) | process | How the generated API reference + the docs-freshness gate stay honest. |
@@ -44,7 +44,7 @@ The map of every doc. **Start here** if you're new; the deeper tracks follow.
 |---|---|---|
 | [simplification-plan.md](./research/simplification-plan.md) | **live tracker** | The in-flight simplification/rearchitecture: the converged design, the scratch list, the doc/module inventory, the workstreams + completion criteria. |
 | [research/README.md](./research/README.md) | research index | Forward-looking design threads + decision log. Not the canonical spine. |
-| [archive/](./archive/) | retired notes | Superseded/niche docs kept for history (delivery manifest, conversation economics, artifact-lifecycle, go-live, results). |
+| [archive/](./archive/) | retired notes | Superseded/niche docs kept for history (delivery manifest, conversation economics, artifact-lifecycle, go-live, results, benchmark-matrix consolidation). |
 
 ## Conventions
 

diff --git a/docs/benchmark-matrix-consolidation.md → ...archive/benchmark-matrix-consolidation.md b/docs/benchmark-matrix-consolidation.md → ...archive/benchmark-matrix-consolidation.md
@@ -1,3 +1,5 @@
+> Track: Archive · Status: the agent-runtime portion SHIPPED — external benchmark adapters (`5e2e81a0`) + DABStep (`5d610e78`) are on main, and `runBenchmarks` (the registry subset sweep, plan step 2) landed as #420 (`bench/src/run-benchmarks.ts`, `run-benchmarks-cli.mts`, `run-benchmarks-report.ts`). Plan steps 3–5 (agent-lab external-bench/product adapters; tax/legal/gtm folds) are cross-repo work owned by those repos — the agent-lab branches named below no longer exist. The living map of this repo's bench surface is `bench/HARNESS.md`.
+
 # Benchmark matrix consolidation
 
 How to run any subset of `{harnesses × models × personas × scenarios × external-benchmarks}` and rank the cells, using the existing library primitives — and the plan to fold the per-product matrices onto them.

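Reviewer note on the archived doc's opening sentence: the "any subset of the matrix, then rank the cells" idea can be sketched as a Cartesian expansion followed by a sort. This is a toy illustration under stated assumptions, not the `runBenchmarks` API that landed in #420. The names `Cell`, `expandCells`, `rankCells`, the placeholder ids, and the fake scorer are all hypothetical.

```typescript
// Hypothetical shapes for illustration; NOT the runBenchmarks API from the repo.
type Cell = { benchmark: string; model: string; harness: string };
type ScoredCell = Cell & { score: number };

// Expand the chosen subsets into every (benchmark, model, harness) cell.
function expandCells(benchmarks: string[], models: string[], harnesses: string[]): Cell[] {
  return benchmarks.flatMap((benchmark) =>
    models.flatMap((model) =>
      harnesses.map((harness) => ({ benchmark, model, harness })),
    ),
  );
}

// Score each cell with a caller-supplied function and rank best-first.
function rankCells(cells: Cell[], score: (cell: Cell) => number): ScoredCell[] {
  return cells
    .map((cell) => ({ ...cell, score: score(cell) }))
    .sort((a, b) => b.score - a.score);
}

// Usage with placeholder ids and a fake, deterministic scorer (assumption:
// a real run would execute the benchmark and return its measured score).
const cells = expandCells(["bench-a", "bench-b"], ["model-a"], ["h1", "h2"]);
const fakeScore = (cell: Cell): number =>
  (cell.benchmark === "bench-b" ? 0.2 : 0.1) + (cell.harness === "h2" ? 0.5 : 0);
const ranked = rankCells(cells, fakeScore);
console.log(ranked.map((c) => `${c.benchmark}/${c.model}/${c.harness}`));
```

Keeping expansion and ranking as separate functions means any axis can be narrowed to a subset before scoring, which is the property the archived plan describes.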
diff --git a/docs/canonical-api.md b/docs/canonical-api.md
@@ -39,7 +39,7 @@ This table is judgment-only: it maps an intent to the ONE primitive to reach for
 
 | I want to… | Use (import) | Do NOT build |
 |---|---|---|
-| **Just run a supervisor to a goal (one call, scaffolding defaulted)** — START HERE | `supervise(profile, task, { budget, backend? })` — `/loops` | hand-wiring `createSupervisor().run` + `blobs`/`perWorker`/`journal`/`executors`; reaching for the lower-level run-verbs below before you need a specific counterparty |
+| **Just run a supervisor to a goal (one call, scaffolding defaulted)** — START HERE (running agents) | `supervise(profile, task, { budget, backend? })` — `/loops` | hand-wiring `createSupervisor().run` + `blobs`/`perWorker`/`journal`/`executors`; reaching for the lower-level run-verbs below before you need a specific counterparty |
 | **Supervise agents to solve a graded `AgenticSurface` task** (workers `runAgentic` the surface, settle on its own check, driver self-improves from the failing tests) | `superviseSurface(profile, task, { surface, worker })` — `/loops` | a worker-seam + a "self-improving supervisor" wrapper around `supervise()`; passing a custom `makeWorkerAgent` that runs `runAgentic` |
 | Run a genome through a topology shape over the keystone Supervisor, end-to-end | `runPersonified({ persona, shape, task, budget })` — `/loops` | a hand-rolled `createSupervisor().run` + seam-wiring helper |
 | Loop a worker over one evolving artifact, K rounds, stop-when-good | `loopUntil(seed, spec)` as the `shape` — `/loops` | a `while(!done){runWorker();decide()}` hand-loop or "multi-attempt refine driver" |
@@ -65,7 +65,7 @@ This table is judgment-only: it maps an intent to the ONE primitive to reach for
 | Run + **resume** ONE persistent box across turns | `openSandboxRun(client, opts, deliverable)` — `/loops` | a per-domain `new Sandbox`+`box.fs.read`+delete copy |
 | Pick / register a leaf backend, or bring your own agent | `createExecutor({ backend })` / `createExecutorRegistry()` / implement `Executor` — `/loops` | a per-vendor adapter or closed `inline\|sandbox\|cli` switch (won't report through the `UsageEvent` channel) |
 | Evolve a **prompt/string** surface | `gepaProposer({ llm, model, target })` (default inside `selfImprove`; the skill-surface twin is `skillOptProposer`, same source) — `agent-eval/campaign` | a hand-rolled prompt-mutation reflection loop with its own Pareto bookkeeping |
-| Self-improve a profile (one pluggable verb) — START HERE | `improve(profile, findings, { surface, gate })` — root `.` (the RSI verb; defaults the generator from `surface`, wraps `selfImprove`) | a bespoke optimize loop, or calling `selfImprove`/a skill-optimizer directly for the common case |
+| Self-improve a profile (one pluggable verb) — START HERE (self-improvement) | `improve(profile, findings, { surface, gate })` — root `.` (the RSI verb; defaults the generator from `surface`, wraps `selfImprove`) | a bespoke optimize loop, or calling `selfImprove`/a skill-optimizer directly for the common case |
 | Measure **one profile artifact's marginal lift** (with-vs-without, score+cost) / catalog artifacts | `measureMarginalLift(...)` / `ArtifactRegistry` (`applyArtifact` is the one `ArtifactKind`→`AgentProfile`-field bridge) — `/lifecycle` | a hand-rolled with/without ablation loop, or a per-kind `if kind==='skill'…` profile-field switch |
 | Run the **whole artifact lifecycle** — generate→measure→promote→store→compose, then drift-watch/dedupe the live set — over ANY profile surface (skill/prompt/tool/MCP) | `runLifecycle({ baseline, generators, evalRunner, gate })` then `composeProfile(registry, base, query)`; maintain with `driftWatch(...)` / `dedupeArtifacts(...)` — `/lifecycle` | a per-surface improve loop, a hand-rolled promote→compose step, or re-running `measureMarginalLift` without the registry/gate spine. The ONLY per-surface code is a thin `CandidateGenerator` (`skillGenerator` distills, `promptGenerator`/`buildableGenerator` for the rest) |
 | Run the self-improvement loop with full substrate control | `selfImprove({ agent, scenarios, judge, baselineSurface })` — `agent-eval/contract` | a bespoke optimize loop or a parallel skill-optimizer |

diff --git a/examples/README.md b/examples/README.md
@@ -1,26 +1,23 @@
 # agent-runtime examples
 
-A learning path. Read the examples in order — each one adds a single concept on top of the last.
-The fastest way to feel the package is to read **ONE** example: [`driver-loop/`](./driver-loop/)
-(below), which shows the move every supervisor is built on.
+Start by reading ONE example: [`driver-loop/`](./driver-loop/) — the move every supervisor is built on.
+The catalog below is a learning path for when you want more: each example adds a single concept on top of the last.
 
 Every example imports from `@tangle-network/agent-runtime` (the surface consumers use), not from
 relative paths, and they are typechecked by `pnpm run typecheck:examples` — except `researcher-loop`,
 which needs the optional `@tangle-network/agent-knowledge` peer that agent-runtime doesn't depend on
 and CI doesn't install, so it is excluded from that typecheck (run it with `agent-knowledge` installed).
 
-## Quickstart — run these three (≈5 min, two run offline)
-
-Get the feel before reading the full map. In order:
+## Quickstart — the golden path (≈5 min; the first two are $0, offline)
 
 ```bash
-pnpm tsx examples/driver-loop/driver-loop.ts                  # SEE THE FOLD — offline, no creds
-TANGLE_API_KEY=... pnpm tsx examples/supervise/supervise.ts   # one-call supervisor over real workers
-pnpm tsx examples/improve/improve.ts                          # the gated self-improvement verb — offline
+pnpm tsx examples/driver-loop/driver-loop.ts                  # 1. SEE THE FOLD — offline, no creds
+pnpm tsx examples/improve/improve.ts                          # 2. the gated self-improvement verb — offline
+TANGLE_API_KEY=... pnpm tsx examples/supervise/supervise.ts   # 3. one-call supervisor over real workers (router key)
 ```
 
-`driver-loop` is the one move everything else is built on; `supervise` is the one-call product entry;
-`improve` is the one self-improvement verb. The full learning path is below.
+`driver-loop` is the one move everything else is built on; `improve` is the one self-improvement
+verb; `supervise` is the one-call product entry. The full learning path is below.
 
 ## Vocabulary
 
@@ -94,6 +91,12 @@ purpose — read [`driver-loop/`](./driver-loop/) for the contrast (a driver tha
 | 22 | [`product-eval/`](./product-eval/) | You want user-sim product evals: a persona over a multi-round conversation via `runPersonaConversation`, then score the transcript (`maxTurns` is a ceiling, not a target). Needs `TANGLE_API_KEY` (the engine takes a `backendFor` override, but this example wires the live router). |
 | 23 | [`agentic-data-creation/`](./agentic-data-creation/) | You want the **Autodata inner loop**: an agent manufactures HARD training examples from a doc and keeps only the ones that DISCRIMINATE a strong solver from a weak one. Composes the fold (`runLoop`+refine driver), N× sampling (`runLoop`+fanout driver), `llmJudge`, `CostLedger`, and `Corpus`; the one new piece is `discriminativeAcceptRule`. Shows the calibration (plain gap ≈ 0.02 vs agentic ≈ 0.31). Offline. |
 
+## Research harnesses (not on the learning path)
+
+| Example | Use this when… |
+|---|---|
+| [`ablation-suite/`](./ablation-suite/) | You want the coordination-vs-raw-compute ablation (continuous vs ralph vs supervisor, cost-aware, paired-bootstrap Δ) — the suite behind the supervisor +20.8pp result. Needs `TANGLE_API_KEY`; run `ARMS=cal` first (its README explains why). |
+
 ## Conventions
 
 - Examples are synthetic unless noted. `strategy-evolution`, `product-eval`, `supervise`, and