Commit c955684

Merge pull request #44 from rivet-dev/nathan/prd-in-vm-enforcement
Enforce in-VM execution for all CLI tool tests
2 parents bfe637a + aa5e8b3 commit c955684

34 files changed

Lines changed: 2860 additions & 718 deletions

CLAUDE.md

Lines changed: 22 additions & 1 deletion
@@ -17,9 +17,30 @@
 
 - NEVER mock external services in tests — use real implementations (Docker containers for databases/services, real HTTP servers for network tests, real binaries for CLI tool tests)
 - tests that validate sandbox behavior MUST run code through the secure-exec sandbox (NodeRuntime/proc.exec()), never directly on the host
-- CLI tool tests (Pi, Claude Code, OpenCode) must execute inside the sandbox: Pi runs as JS in the VM, Claude Code and OpenCode spawn their binaries via the sandbox's child_process.spawn bridge
+- NOTHING runs on the host except Docker containers for e2e-docker tests — all CLI tool code executes inside the sandbox VM
+- Pi, Claude Code, and OpenCode are ALL pure JavaScript/TypeScript — they ALL run in-VM via `import()` through `kernel.openShell()` or `kernel.spawn()`
+- Claude Code is a bundled ESM Node.js script (`@anthropic-ai/claude-code/cli.js`), not a native binary — its `.node` addons (tree-sitter, audio-capture) are optional and gracefully degrade
+- OpenCode is TypeScript (https://github.com/anomalyco/opencode) — the distributed `opencode` command is a Bun-compiled binary but the source is vanilla TS/JS; build the JS bundle from source and run it in-VM
+- if the sandbox can't run something, that is a secure-exec bug to fix, not a reason to spawn on the host
+- NEVER work around sandbox limitations with host-side execution — this is the #1 rule for CLI tool tests:
+- do NOT use `child_process.spawn` or `child_process.spawnSync` from INSIDE sandbox code to run a tool binary on the host (e.g. `spawnSync('claude', [...args])` routing through the child_process bridge) — this is still host execution, the tool's JS runs on the host not in the V8 isolate
+- do NOT use `node:child_process.spawn` from TEST code to run tools on the host
+- do NOT create `HostBinaryDriver` classes that spawn binaries on the host
+- do NOT use `script -qefc` or `python3 pty.spawn` to give host processes a PTY
+- do NOT add `sandboxSkip` / probe-based skip logic that silently skips when the sandbox can't do something
+- do NOT mark a story as passing if the tool runs on the host instead of in the V8 isolate
+- the ONLY correct pattern is: `kernel.spawn('node', ['-e', 'import("tool-entry.js")'])` or equivalent — the tool's JavaScript executes inside the V8 sandbox isolate
+- if `import()` hangs, if ESM loading fails, if the TUI crashes — those are secure-exec bugs to fix in packages/nodejs/src/, packages/core/src/, or native/v8-runtime/src/
 - e2e-docker fixtures connect to real Docker containers (Postgres, MySQL, Redis, SSH/SFTP) — skip gracefully via `skipUnlessDocker()` when Docker is unavailable
 - interactive/PTY tests must use `kernel.openShell()` with `@xterm/headless`, not host PTY via `script -qefc`
+- CLI tool tests (Pi, Claude Code, OpenCode) must support both mock and real LLM API tokens:
+- check `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` env vars at test startup
+- if a real token is present, use it instead of the mock LLM server — this validates true e2e behavior
+- Pi supports both Anthropic and OpenAI tokens; OpenCode uses OpenAI; Claude Code uses Anthropic
+- log which mode each test suite is using at startup: `"Using real ANTHROPIC_API_KEY"`, `"Using real OPENAI_API_KEY"`, or `"Using mock LLM server"`
+- tests must pass with both mock and real tokens — mock is the fallback, real is preferred
+- to run with real tokens locally: `source ~/misc/env.txt` before running tests
+- real-token tests may use longer timeouts (up to 60s) since they hit external APIs
 
 ## Tooling
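
Review note: the token rules added in this hunk (check the env vars at startup, prefer a real token, fall back to the mock server, log the mode, allow up to 60s for real-token runs) could be captured in one small helper. The following is a minimal sketch, not code from this commit. `selectLlmMode`, `LlmMode`, the provider order for Pi, and the 10s mock timeout are all assumptions made for illustration; only the env var names, the log strings, and the 60s ceiling come from the rules above.

```typescript
// Hypothetical helper: names and shapes are illustrative, not from the repository.
type Tool = "pi" | "claude-code" | "opencode";
type Provider = "anthropic" | "openai";

interface LlmMode {
  kind: "real" | "mock";
  provider?: Provider;
  apiKey?: string;
  timeoutMs: number;
  logLine: string;
}

// Providers each tool accepts, per the CLAUDE.md rules
// (Pi: Anthropic and OpenAI; Claude Code: Anthropic; OpenCode: OpenAI).
// Trying Anthropic first for Pi is an assumption.
const PROVIDERS: Record<Tool, Provider[]> = {
  pi: ["anthropic", "openai"],
  "claude-code": ["anthropic"],
  opencode: ["openai"],
};

const ENV_VAR: Record<Provider, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

// Prefer a real token when one is set for a provider the tool supports;
// otherwise fall back to the mock LLM server.
function selectLlmMode(
  tool: Tool,
  env: Record<string, string | undefined>,
): LlmMode {
  for (const provider of PROVIDERS[tool]) {
    const name = ENV_VAR[provider];
    const apiKey = env[name];
    if (apiKey) {
      return {
        kind: "real",
        provider,
        apiKey,
        timeoutMs: 60_000, // upper bound stated in the rules for real-token tests
        logLine: `Using real ${name}`,
      };
    }
  }
  // 10s is an assumed default for mock runs, not a value from the commit.
  return { kind: "mock", timeoutMs: 10_000, logLine: "Using mock LLM server" };
}
```

A suite would presumably call this once at startup with `process.env`, print `logLine`, and pass `timeoutMs` to the test runner. Taking the environment as a parameter keeps the selection logic testable without mutating global state.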

native/v8-runtime/Cargo.lock

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

native/v8-runtime/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ name = "secure-exec-v8"
 path = "src/main.rs"
 
 [dependencies]
-v8 = "130"
+v8 = "134"
 crossbeam-channel = "0.5"
 signal-hook = "0.3"
 libc = "0.2"

native/v8-runtime/src/bridge.rs

Lines changed: 1 addition & 2 deletions
@@ -528,8 +528,7 @@ pub fn resolve_pending_promise(
         resolver.resolve(scope, undef.into());
     }
 
-    // Flush microtasks after resolution
-    scope.perform_microtask_checkpoint();
+    // Microtask checkpoint is the caller's responsibility (explicit policy).
 
     Ok(())
 }
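
Review note: this hunk stops flushing microtasks inside `resolve_pending_promise` and leaves the checkpoint to the caller. The reason placement matters is that resolving a promise only queues its reactions; they run later, when the microtask queue is drained. The sketch below illustrates that ordering in plain TypeScript. It is an analogy using standard JavaScript semantics, not the bridge code, and `demoDeferredReactions` is a hypothetical name.

```typescript
// Illustration only: standard promise semantics, not the secure-exec bridge.
async function demoDeferredReactions(): Promise<string[]> {
  const order: string[] = [];

  // Capture the resolve function so the promise can be settled from outside,
  // loosely analogous to holding a resolver on the embedder side.
  let resolve!: (value: string) => void;
  const pending = new Promise<string>((r) => {
    resolve = r;
  });
  pending.then((v) => {
    order.push(`reaction:${v}`);
  });

  resolve("done"); // queues the reaction; does not run it
  order.push("after-resolve"); // still runs before the reaction

  // Yielding lets the queued microtasks run, playing the role of the checkpoint.
  await Promise.resolve();
  order.push("after-checkpoint");
  return order;
}
```

Under these semantics the returned order is `after-resolve`, then `reaction:done`, then `after-checkpoint`. In the embedder the same gap exists between `resolver.resolve(...)` and the checkpoint, so callers of `resolve_pending_promise` now need to run the checkpoint themselves at the point their policy dictates.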
