CLAUDE.md: 21 additions & 1 deletion
@@ -17,9 +17,29 @@
 
 - NEVER mock external services in tests — use real implementations (Docker containers for databases/services, real HTTP servers for network tests, real binaries for CLI tool tests)
 - tests that validate sandbox behavior MUST run code through the secure-exec sandbox (NodeRuntime/proc.exec()), never directly on the host
-- CLI tool tests (Pi, Claude Code, OpenCode) must execute inside the sandbox: Pi runs as JS in the VM, Claude Code and OpenCode spawn their binaries via the sandbox's child_process.spawn bridge
+- NOTHING runs on the host except Docker containers for e2e-docker tests — all CLI tool code executes inside the sandbox VM
+- Pi and Claude Code are pure JS — they run in-VM via `import()` or `node -e` through `kernel.openShell()`, NOT via host `child_process.spawn`
+- Claude Code is a bundled ESM Node.js script (`@anthropic-ai/claude-code/cli.js`), not a native binary — its `.node` addons (tree-sitter, audio-capture) are optional and gracefully degrade
+- OpenCode is a compiled Bun binary — it runs via the sandbox's `child_process.spawn` bridge (the bridge routes to the host, but from the sandbox's perspective it's a normal spawn)
+- NEVER use `HostBinaryDriver` or `script -qefc` to bypass the sandbox — if the sandbox can't run something, fix the sandbox
+- if a test skips because of a sandbox limitation (module resolution, stdin streaming, isTTY), that is a secure-exec bug to fix, not a test to skip
+- NEVER work around sandbox limitations with host-side execution:
+  - do NOT create `HostBinaryDriver` classes that spawn binaries on the host
+  - do NOT use `script -qefc` to give host processes a PTY
+  - do NOT use `node:child_process.spawn` from test code to run tools on the host when the tool should run in-VM
+  - do NOT add `sandboxSkip` / probe-based skip logic that silently skips when the sandbox can't do something
+  - do NOT mark a story as passing if tests are skipping due to sandbox limitations
+  - if the sandbox can't run something, the correct action is to FIX the sandbox code in packages/nodejs/src/ or packages/core/src/
 - e2e-docker fixtures connect to real Docker containers (Postgres, MySQL, Redis, SSH/SFTP) — skip gracefully via `skipUnlessDocker()` when Docker is unavailable
 - interactive/PTY tests must use `kernel.openShell()` with `@xterm/headless`, not host PTY via `script -qefc`
+- CLI tool tests (Pi, Claude Code, OpenCode) must support both mock and real LLM API tokens:
+  - check `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` env vars at test startup
+  - if a real token is present, use it instead of the mock LLM server — this validates true e2e behavior
+  - Pi supports both Anthropic and OpenAI tokens; OpenCode uses OpenAI; Claude Code uses Anthropic
+  - log which mode each test suite is using at startup: `"Using real ANTHROPIC_API_KEY"`, `"Using real OPENAI_API_KEY"`, or `"Using mock LLM server"`
+  - tests must pass with both mock and real tokens — mock is the fallback, real is preferred
+  - to run with real tokens locally: `source ~/misc/env.txt` before running tests
+  - real-token tests may use longer timeouts (up to 60s) since they hit external APIs
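
The token rules added at the end of this hunk (check the env vars at startup, prefer a real token, fall back to the mock server, log the mode, allow up to 60s for real calls) could be sketched roughly as below. This is only an illustration, not code from the repository: `selectLlmMode`, `LlmMode`, `PROVIDERS`, and the 10s mock timeout are hypothetical names and values, and preferring Anthropic over OpenAI for Pi when both keys are set is an assumption the rules do not state.

```typescript
// Hypothetical helper for illustration; none of these names come from the repo.
type Tool = "pi" | "claude-code" | "opencode";
type Provider = "anthropic" | "openai";

interface LlmMode {
  kind: "real" | "mock";
  provider?: Provider;
  apiKey?: string;
  timeoutMs: number;
  logLine: string;
}

// Provider support per tool, as listed in the CLAUDE.md rules.
// Order for "pi" (Anthropic first) is an assumption.
const PROVIDERS: Record<Tool, Provider[]> = {
  pi: ["anthropic", "openai"],
  "claude-code": ["anthropic"],
  opencode: ["openai"],
};

const ENV_VAR: Record<Provider, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

const MOCK_TIMEOUT_MS = 10_000; // assumed default, not from the source
const REAL_TIMEOUT_MS = 60_000; // "up to 60s" per the rules

// Pick the real token if one usable by this tool is set; otherwise use the mock.
function selectLlmMode(
  tool: Tool,
  env: Record<string, string | undefined>,
): LlmMode {
  for (const provider of PROVIDERS[tool]) {
    const name = ENV_VAR[provider];
    const apiKey = env[name];
    if (apiKey) {
      return {
        kind: "real",
        provider,
        apiKey,
        timeoutMs: REAL_TIMEOUT_MS,
        logLine: `Using real ${name}`,
      };
    }
  }
  return {
    kind: "mock",
    timeoutMs: MOCK_TIMEOUT_MS,
    logLine: "Using mock LLM server",
  };
}
```

A suite would presumably call `selectLlmMode("pi", process.env)` once at startup, print `logLine`, and pass `timeoutMs` to whatever per-test timeout option the test runner exposes. An empty-string key is treated as unset here, so the mock server remains the fallback.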