Commit 1c86f3d

feat: US-083 - Fix cd test, add VFS lazy directory population for WasmVM
1 parent 65a6cb8 commit 1c86f3d

9 files changed

Lines changed: 1485 additions & 38 deletions

docs-internal/specs/cli-tool-e2e.md

Lines changed: 677 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 212 additions & 0 deletions

# Spec: Terminal E2E Testing with Headless xterm

## Status

Draft

## Motivation

The interactive shell (`kernel.openShell()` / `kernel.connectTerminal()`) has
integration tests that assert on raw byte streams: substring checks on PTY
output that ignore escape sequences, cursor movement, and screen layout. This
means:

- Tests can't verify what the user actually sees. A command could produce
  correct bytes but render incorrectly (wrong line, overwritten output,
  missing newline).
- PTY line discipline behavior (echo, canonical buffering, signal chars) is
  only partially tested; current tests check that bytes pass through, not
  that the screen state is correct after a sequence of interactions.
- WasmVM shell commands (`ls`, `echo`, `cat`) have no terminal-level tests
  at all. The existing `driver.test.ts` tests use `kernel.exec()` (non-interactive),
  so brush-shell interactive behavior is untested.
- Cross-runtime spawning from the shell (e.g. `node -e "..."` from brush-shell)
  has no output verification.

The goal is exact-match testing of the full terminal screen after each
interaction, so that any rendering regression is caught.

## Approach

### Headless terminal emulator

Use `@xterm/headless` (the headless build of xterm.js) as a virtual terminal
in tests. It parses escape sequences and maintains a screen buffer identical
to what a real terminal UI would show. No DOM, no browser; it runs in Node/vitest.

Data flow:

```
shell.write(input)
    │
    ▼
PTY master → line discipline → PTY slave → shell process (WasmVM/mock)
    │                                           │
    │◄──────────── shell output ◄───────────────┘
    │
    ▼
shell.onData(bytes) → term.write(bytes) → xterm screen buffer
                                                  │
                                                  ▼
                                          screenshotTrimmed() → deterministic string for assertions
```
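
The output half of this flow can be sketched as a small pump. This is a sketch under assumptions: `ShellLike`, `TermLike`, and `OutputPump` are hypothetical names (the real `ShellHandle` may register listeners differently), and "settled" is taken to mean that every chunk passed to `term.write()` has been parsed and no new output arrived for `quietMs`. The callback argument mirrors xterm.js's `write(data, callback)`; xterm parses written data asynchronously, so reading the buffer immediately after `write()` can miss output.

```typescript
// Hypothetical minimal shapes; the real ShellHandle and Terminal expose more.
interface ShellLike {
  onData(listener: (bytes: Uint8Array) => void): void;
}
interface TermLike {
  // Same shape as xterm.js write(data, callback?): the callback runs once
  // the chunk has been parsed into the screen buffer.
  write(data: Uint8Array | string, callback?: () => void): void;
}

/** Forwards shell output into the terminal and reports when it goes quiet. */
class OutputPump {
  private pending = 0; // chunks handed to write() but not yet parsed
  private lastActivity = Date.now();
  private readonly term: TermLike;

  constructor(shell: ShellLike, term: TermLike) {
    this.term = term;
    shell.onData((bytes) => {
      this.pending++;
      this.lastActivity = Date.now();
      this.term.write(bytes, () => {
        this.pending--;
        this.lastActivity = Date.now();
      });
    });
  }

  /** Resolves once all chunks are parsed and no output arrived for quietMs. */
  async settled(quietMs = 20, timeoutMs = 2000): Promise<void> {
    const deadline = Date.now() + timeoutMs;
    for (;;) {
      if (this.pending === 0 && Date.now() - this.lastActivity >= quietMs) return;
      if (Date.now() > deadline) throw new Error('shell output did not settle');
      await new Promise((resolve) => setTimeout(resolve, 5));
    }
  }
}
```

In a harness built this way, `type()` would call `shell.write(input)` and then `await pump.settled()`.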

### Test helper: `TerminalHarness`

A small class that wires `openShell()` to an `@xterm/headless` Terminal:

```typescript
import type { Terminal } from '@xterm/headless';

// ShellHandle: the handle type returned by kernel.openShell().
// Declaration only; the implementation goes in
// packages/kernel/test/terminal-harness.ts (Phase 1).
declare class TerminalHarness {
  readonly term: Terminal;
  readonly shell: ShellHandle;

  /** Send input through the PTY. Resolves after data settles. */
  type(input: string): Promise<void>;

  /**
   * Full screen as a string: every row from the xterm buffer, trailing
   * whitespace trimmed per line, trailing empty lines dropped, joined
   * with '\n'. This is the canonical representation for assertions.
   */
  screenshotTrimmed(): string;

  /** Single row from the screen buffer (0-indexed), trimmed. */
  line(row: number): string;

  /**
   * Wait until screenshotTrimmed() contains `text`. Polls the screen
   * buffer; rejects after timeoutMs. Use `occurrence` to wait for the
   * Nth match (e.g. wait for the 2nd prompt after a command completes).
   */
  waitFor(text: string, occurrence?: number, timeoutMs?: number): Promise<void>;

  /** Send ^D on an empty line and await shell exit. Returns exit code. */
  exit(): Promise<number>;

  /** Kill shell and dispose terminal. */
  dispose(): Promise<void>;
}
```
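
`screenshotTrimmed()` and `line()` can be derived directly from the xterm buffer. The sketch below assumes the harness passes `term.buffer.active` (xterm.js's active `IBuffer`); `BufferLike` and `BufferLineLike` are hypothetical structural subsets so the same functions also accept a fake buffer in unit tests. It walks every buffer row, including any scrollback, which is one reading of "every row from the xterm buffer".

```typescript
// Structural subset of xterm.js IBuffer / IBufferLine, so term.buffer.active
// can be passed in directly and a fake satisfies it in unit tests.
interface BufferLineLike {
  translateToString(trimRight?: boolean): string;
}
interface BufferLike {
  readonly length: number;
  getLine(y: number): BufferLineLike | undefined;
}

/** Every buffer row, right-trimmed, trailing empty rows dropped, joined by '\n'. */
function screenshotTrimmed(buf: BufferLike): string {
  const rows: string[] = [];
  for (let y = 0; y < buf.length; y++) {
    // translateToString(true) drops trailing empty cells; trimEnd() also
    // removes trailing spaces the program actually wrote (e.g. after '$'),
    // which translateToString(true) may keep.
    rows.push((buf.getLine(y)?.translateToString(true) ?? '').trimEnd());
  }
  while (rows.length > 0 && rows[rows.length - 1] === '') rows.pop();
  return rows.join('\n');
}

/** One row (0-indexed), trimmed; '' for rows outside the buffer. */
function line(buf: BufferLike, row: number): string {
  return (buf.getLine(row)?.translateToString(true) ?? '').trimEnd();
}
```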

### Assertion style

Every output assertion MUST be an exact match on the full screen state.
No `toContain()`, no substring checks. The test specifies exactly what
every visible line should be:

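```typescript
// The last row is '$', not '$ ': screenshotTrimmed() strips trailing
// whitespace from every row, including the space after the prompt.
expect(h.screenshotTrimmed()).toBe([
  '$ echo hello',
  'hello',
  '$',
].join('\n'));
```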

This ensures that:

- The typed command appears (PTY echo)
- The output is on the correct line
- The prompt returns after the command
- No extra/missing lines exist
- Previous output is preserved across commands

When exact matching is impractical (e.g. timestamps, PIDs), individual lines
can be matched with `line(row)` + regex, but this should be the exception.
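
One way to keep that exception narrow is a row-by-row matcher that still pins the row count and requires exact strings everywhere except rows given as regexes. `matchScreen` is a hypothetical helper, not part of the harness API above; each regex is anchored so it must match the whole row.

```typescript
/**
 * Hypothetical helper: compare a screenshotTrimmed() string row by row.
 * Strings must match exactly; a RegExp is reserved for rows with
 * nondeterministic content and must match the entire row.
 * Returns null on success, otherwise a description of the first mismatch.
 */
function matchScreen(screen: string, expected: Array<string | RegExp>): string | null {
  const rows = screen.split('\n');
  if (rows.length !== expected.length) {
    return `expected ${expected.length} rows, got ${rows.length}`;
  }
  for (let i = 0; i < rows.length; i++) {
    const want = expected[i];
    const ok =
      typeof want === 'string'
        ? rows[i] === want
        : new RegExp(`^(?:${want.source})$`, want.flags).test(rows[i]);
    if (!ok) return `row ${i}: ${JSON.stringify(rows[i])} does not match ${String(want)}`;
  }
  return null;
}
```

For example, `expect(matchScreen(h.screenshotTrimmed(), ['$ echo $$', /\d+/, '$'])).toBeNull()` checks the PID row loosely and every other row exactly.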

## Test locations

Two test files, testing different layers:

### `packages/kernel/test/shell-terminal.test.ts`

Tests the PTY and terminal plumbing using `MockRuntimeDriver`. No WASM binary
required. These tests verify that the kernel's line discipline, echo, signal
handling, and screen rendering work correctly.

| Test | What it verifies |
|------|-----------------|
| Clean initial state | Shell opens, screen is empty or shows prompt |
| Echo on input | Typed text appears on screen via PTY echo |
| Command output on correct line | Mock echo-back appears below the input line |
| Output preservation | Multiple commands, all previous output stays visible |
| `^C` sends SIGINT | Screen shows `^C`, shell stays alive, can type more |
| `^D` exits cleanly | Shell exits with code 0, no extra output |
| Backspace erases character | `helo` + BS + `lo\n` → screen shows `hello` |
| Long line wrapping | Input exceeding cols wraps to next row |
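
The backspace row depends on canonical-mode line editing: the erase character removes a character from the pending line before the shell ever sees it, and the echo visually rubs it out. Below is a toy model of that behavior, assuming DEL (`0x7f`) or BS (`0x08`) as the erase character and the conventional `\b \b` rub-out echo used when `ECHOE` is set. `canonicalLine` is a hypothetical name; it is not the kernel's line discipline and handles ASCII only.

```typescript
/** Toy canonical-mode editor: returns the delivered line and the echoed bytes. */
function canonicalLine(input: string): { line: string | null; echo: string } {
  let buf = '';
  let echo = '';
  for (const ch of input) {
    if (ch === '\x7f' || ch === '\b') {
      // Erase: drop one char from the pending line and rub it out on screen
      // (move left, overwrite with a space, move left again).
      if (buf.length > 0) {
        buf = buf.slice(0, -1);
        echo += '\b \b';
      }
    } else if (ch === '\n' || ch === '\r') {
      echo += '\r\n'; // newline echoed as CRLF, as with ONLCR
      return { line: buf, echo };
    } else {
      buf += ch;
      echo += ch;
    }
  }
  return { line: null, echo }; // no newline yet: nothing delivered to the shell
}
```

Replaying the echoed bytes `helo\b \blo\r\n` on a terminal leaves `hello` on the row, which is what the exact-match screen assertion checks.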

### `packages/runtime/wasmvm/test/shell-terminal.test.ts`

Tests real WasmVM shell commands through the terminal. Requires the WASM
binary (guarded with `skipIf(!hasWasmBinary)`). These tests verify that
brush-shell and the WasmVM command dispatch produce correct interactive output.

| Test | What it verifies |
|------|-----------------|
| `echo` prints output | `echo hello` → "hello" on next line, prompt returns |
| `ls /` shows listing | Directory entries rendered correctly |
| Output preserved across commands | `echo AAA` then `echo BBB`, both visible |
| `cat` reads VFS file | Write file to VFS, `cat` it, content appears |
| Pipe works | `echo foo \| cat` → "foo" |
| Exit code on bad command | `nonexistent` → error message on screen |
| `node -e` cross-runtime | `node -e "console.log(42)"` → "42" on screen |
| `python3 -c` cross-runtime | `python3 -c "print(99)"` → "99" on screen |

## Dependency

Add `@xterm/headless` as a devDependency to both `packages/kernel` and
`packages/runtime/wasmvm`. It is a pure JavaScript package with no native
addons or DOM dependency.

```
pnpm -F @secure-exec/kernel add -D @xterm/headless
pnpm -F @anthropic-ai/wasmvm add -D @xterm/headless
```

## Implementation phases

### Phase 1: Kernel terminal tests (mock driver)

1. Add `@xterm/headless` devDependency to `packages/kernel`.
2. Create `TerminalHarness` utility in `packages/kernel/test/terminal-harness.ts`.
3. Create `packages/kernel/test/shell-terminal.test.ts` with mock driver tests.
4. Verify all tests pass with `pnpm vitest run` in `packages/kernel`.

### Phase 2: WasmVM terminal tests (real shell)

1. Add `@xterm/headless` devDependency to `packages/runtime/wasmvm`.
2. Import or duplicate `TerminalHarness` in `packages/runtime/wasmvm/test/`.
3. Create `packages/runtime/wasmvm/test/shell-terminal.test.ts` with real
   command tests (gated behind `hasWasmBinary`).
4. Verify tests pass with the WASM binary built.

### Phase 3: Cross-runtime tests

1. Add Node and Python runtime mounting in the WasmVM terminal tests.
2. Test that `node -e` and `python3 -c` output appears correctly on screen.
3. These tests require all three runtimes mounted into the same kernel.

## Risks and open questions

### Prompt format

brush-shell's interactive prompt format (`$ `, `bash-5.2$ `, or something
else) needs to be captured empirically. Tests will break if the prompt
changes. Mitigation: define the expected prompt as a constant at the top
of the test file.
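
A sketch of that mitigation, with the prompt value left as a placeholder until it is captured. `PROMPT` and `expectedScreen` are hypothetical names; `expectedScreen` applies the same right-trim as `screenshotTrimmed()`, so a row that is only the prompt compares equal to the trimmed `$` on screen.

```typescript
// Placeholder: replace with brush-shell's actual prompt once captured.
const PROMPT = '$ ';

/**
 * Build the expected screenshotTrimmed() value from rows, right-trimming
 * each row and dropping trailing empty rows exactly as the harness does.
 *
 * Usage in a test:
 *   expect(h.screenshotTrimmed()).toBe(
 *     expectedScreen(`${PROMPT}echo hello`, 'hello', PROMPT));
 */
function expectedScreen(...rows: string[]): string {
  const trimmed = rows.map((r) => r.trimEnd());
  while (trimmed.length > 0 && trimmed[trimmed.length - 1] === '') trimmed.pop();
  return trimmed.join('\n');
}
```

If the prompt changes, only the constant needs updating.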

### Timing

`waitFor()` polls the screen buffer. Commands that take a long time (e.g.
Python startup via Pyodide) need generous timeouts. Keep the default timeout
low (2s) for fast commands, and allow per-call overrides.
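
A polling `waitFor()` along these lines would satisfy both constraints. This is a sketch: `snapshot` stands in for the harness's `screenshotTrimmed()`, occurrences are counted without overlap, the 2s default comes from this section, and the 10ms poll interval is an arbitrary assumption. Including the final screen in the error message makes timeouts easier to diagnose.

```typescript
/** Count non-overlapping occurrences of `text` in `haystack`. */
function countOccurrences(haystack: string, text: string): number {
  if (text === '') return 0;
  let count = 0;
  for (let i = haystack.indexOf(text); i !== -1; i = haystack.indexOf(text, i + text.length)) {
    count++;
  }
  return count;
}

/** Poll until `text` appears at least `occurrence` times, else reject. */
async function waitFor(
  snapshot: () => string,
  text: string,
  occurrence = 1,
  timeoutMs = 2000,
  pollMs = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const screen = snapshot();
    if (countOccurrences(screen, text) >= occurrence) return;
    if (Date.now() >= deadline) {
      throw new Error(
        `waitFor(${JSON.stringify(text)}, ${occurrence}) timed out after ${timeoutMs}ms; screen:\n${screen}`,
      );
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Because `screenshotTrimmed()` strips trailing whitespace, a prompt on the last row reads `$`, not `$ `; in this sketch, waiting for the 2nd prompt after `echo hello` would be `waitFor(() => h.screenshotTrimmed(), '$', 2)`.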

### Cross-runtime stdout routing

`node` and `python3` spawned from brush-shell route through `proc_spawn` →
kernel → runtime driver → back through the PTY. This path has known issues
(stdout may not flow back through the PTY slave). If cross-runtime tests
fail, the fix belongs in the spawn/stdio wiring, not the test infrastructure.

### Terminal dimensions

Tests should use a fixed size (e.g. 80x24) so line wrapping is deterministic.
The harness constructor should enforce this.
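
A sketch of pinning the geometry and computing the expected rows for the long-line wrapping test. `COLS`, `ROWS`, `TERMINAL_OPTIONS`, and `wrapRows` are hypothetical names; `cols` and `rows` are standard xterm.js terminal options, so the harness could pass `TERMINAL_OPTIONS` to `new Terminal(...)`. `wrapRows` assumes a single unbroken run of printable ASCII with no escape sequences or wide characters.

```typescript
// Fixed geometry shared by every test (80x24, per this spec).
const COLS = 80;
const ROWS = 24;

// Options in the shape the xterm.js Terminal constructor accepts.
const TERMINAL_OPTIONS = { cols: COLS, rows: ROWS } as const;

/** Expected screen rows for one unbroken ASCII line auto-wrapped at `cols`. */
function wrapRows(text: string, cols: number = COLS): string[] {
  if (text.length === 0) return [''];
  const rows: string[] = [];
  for (let i = 0; i < text.length; i += cols) rows.push(text.slice(i, i + cols));
  return rows;
}
```

For instance, typing `echo ` plus 100 `x` characters after a `$ ` prompt yields a 107-character line, which should occupy two rows of 80 and 27 characters.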
