|
17 | 17 |
|
18 | 18 | - NEVER mock external services in tests — use real implementations (Docker containers for databases/services, real HTTP servers for network tests, real binaries for CLI tool tests) |
19 | 19 | - tests that validate sandbox behavior MUST run code through the secure-exec sandbox (NodeRuntime/proc.exec()), never directly on the host |
20 | | -- CLI tool tests (Pi, Claude Code, OpenCode) must execute inside the sandbox: Pi runs as JS in the VM, Claude Code and OpenCode spawn their binaries via the sandbox's child_process.spawn bridge |
| 20 | +- NOTHING runs on the host except Docker containers for e2e-docker tests — all CLI tool code executes inside the sandbox VM |
| 21 | +- Pi, Claude Code, and OpenCode are ALL pure JavaScript/TypeScript — they ALL run in-VM via `import()` through `kernel.openShell()` or `kernel.spawn()` |
| 22 | +- Claude Code is a bundled ESM Node.js script (`@anthropic-ai/claude-code/cli.js`), not a native binary — its `.node` addons (tree-sitter, audio-capture) are optional and gracefully degrade |
| 23 | +- OpenCode is TypeScript (https://github.com/anomalyco/opencode) — the distributed `opencode` command is a Bun-compiled binary but the source is vanilla TS/JS; build the JS bundle from source and run it in-VM |
| 24 | +- if the sandbox can't run something, that is a secure-exec bug to fix, not a reason to spawn on the host |
| 25 | +- NEVER work around sandbox limitations with host-side execution — this is the #1 rule for CLI tool tests: |
| 26 | + - do NOT use `child_process.spawn` or `child_process.spawnSync` from INSIDE sandbox code to run a tool binary on the host (e.g. `spawnSync('claude', [...args])` routing through the child_process bridge); this is still host execution, because the tool's JS runs on the host rather than in the V8 isolate |
| 27 | + - do NOT use `node:child_process.spawn` from TEST code to run tools on the host |
| 28 | + - do NOT create `HostBinaryDriver` classes that spawn binaries on the host |
| 29 | + - do NOT use `script -qefc` or `python3 pty.spawn` to give host processes a PTY |
| 30 | + - do NOT add `sandboxSkip` / probe-based skip logic that silently skips when the sandbox can't do something |
| 31 | + - do NOT mark a story as passing if the tool runs on the host instead of in the V8 isolate |
| 32 | + - the ONLY correct pattern is: `kernel.spawn('node', ['-e', 'import("tool-entry.js")'])` or equivalent — the tool's JavaScript executes inside the V8 sandbox isolate |
| 33 | + - if `import()` hangs, if ESM loading fails, if the TUI crashes — those are secure-exec bugs to fix in packages/nodejs/src/, packages/core/src/, or native/v8-runtime/src/ |
21 | 34 | - e2e-docker fixtures connect to real Docker containers (Postgres, MySQL, Redis, SSH/SFTP) — skip gracefully via `skipUnlessDocker()` when Docker is unavailable |
22 | 35 | - interactive/PTY tests must use `kernel.openShell()` with `@xterm/headless`, not host PTY via `script -qefc` |
| 36 | +- CLI tool tests (Pi, Claude Code, OpenCode) must support both mock and real LLM API tokens: |
| 37 | + - check `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` env vars at test startup |
| 38 | + - if a real token is present, use it instead of the mock LLM server — this validates true e2e behavior |
| 39 | + - Pi supports both Anthropic and OpenAI tokens; OpenCode uses OpenAI; Claude Code uses Anthropic |
| 40 | + - log which mode each test suite is using at startup: `"Using real ANTHROPIC_API_KEY"`, `"Using real OPENAI_API_KEY"`, or `"Using mock LLM server"` |
| 41 | + - tests must pass with both mock and real tokens — mock is the fallback, real is preferred |
| 42 | + - to run with real tokens locally: `source ~/misc/env.txt` before running tests |
| 43 | + - real-token tests may use longer timeouts (up to 60s) since they hit external APIs |
23 | 44 |
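The mock/real token rules above could be sketched roughly as follows. This is a minimal illustration and not code from the repository: `selectLlmMode`, `describeMode`, `timeoutMs`, the `Tool` names, the order in which Pi tries its two providers, and the 10s mock timeout are all assumptions made for the example. Only the env var names, the log strings, and the 60s ceiling come from the rules themselves.

```typescript
// Hypothetical helper types; none of these names exist in the repository.
type Provider = "anthropic" | "openai";
type Tool = "pi" | "claude-code" | "opencode";

type LlmMode =
  | { kind: "real"; provider: Provider; apiKey: string }
  | { kind: "mock" };

// Providers each tool accepts, per the rules above.
// Trying Anthropic before OpenAI for Pi is an assumption.
const PROVIDERS: Record<Tool, Provider[]> = {
  "pi": ["anthropic", "openai"],
  "claude-code": ["anthropic"],
  "opencode": ["openai"],
};

const ENV_VAR: Record<Provider, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

// Prefer a real token when one is set for a provider the tool supports;
// otherwise fall back to the mock LLM server. Empty strings count as unset.
function selectLlmMode(
  tool: Tool,
  env: Record<string, string | undefined>,
): LlmMode {
  for (const provider of PROVIDERS[tool]) {
    const apiKey = env[ENV_VAR[provider]];
    if (apiKey) return { kind: "real", provider, apiKey };
  }
  return { kind: "mock" };
}

// Produces the startup log line named in the rules.
function describeMode(mode: LlmMode): string {
  return mode.kind === "real"
    ? `Using real ${ENV_VAR[mode.provider]}`
    : "Using mock LLM server";
}

// Real-token runs may use up to 60s; the 10s mock value is an assumption.
function timeoutMs(mode: LlmMode): number {
  return mode.kind === "real" ? 60_000 : 10_000;
}
```

A suite would call `selectLlmMode(tool, process.env)` once at startup, log `describeMode(mode)`, and pass `timeoutMs(mode)` to its test runner. Taking the environment as a parameter keeps the selection logic testable without mutating `process.env`.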
|
24 | 45 | ## Tooling |
25 | 46 |
|
|