| 1 | +# Spec: Terminal E2E Testing with Headless xterm |
| 2 | + |
| 3 | +## Status |
| 4 | + |
| 5 | +Draft |
| 6 | + |
| 7 | +## Motivation |
| 8 | + |
| 9 | +The interactive shell (`kernel.openShell()` / `kernel.connectTerminal()`) has |
| 10 | +integration tests that assert on raw byte streams — substring checks on PTY |
| 11 | +output that ignore escape sequences, cursor movement, and screen layout. This |
| 12 | +means: |
| 13 | + |
| 14 | +- Tests can't verify what the user actually sees. A command could produce |
| 15 | + correct bytes but render incorrectly (wrong line, overwritten output, |
| 16 | + missing newline). |
| 17 | +- PTY line discipline behavior (echo, canonical buffering, signal chars) is |
| 18 | + only partially tested — current tests check that bytes pass through, not |
| 19 | + that the screen state is correct after a sequence of interactions. |
| 20 | +- WasmVM shell commands (`ls`, `echo`, `cat`) have no terminal-level tests |
| 21 | + at all. The existing `driver.test.ts` tests use `kernel.exec()` (non-interactive), |
| 22 | + so brush-shell interactive behavior is untested. |
| 23 | +- Cross-runtime spawning from the shell (e.g. `node -e "..."` from brush-shell) |
| 24 | + has no output verification. |
| 25 | + |
| 26 | +The goal is exact-match testing of the full terminal screen after each |
| 27 | +interaction, so that any rendering regression is caught. |
| 28 | + |
| 29 | +## Approach |
| 30 | + |
| 31 | +### Headless terminal emulator |
| 32 | + |
| 33 | +Use `@xterm/headless` — the headless build of xterm.js — as a virtual terminal |
| 34 | +in tests. It parses escape sequences and maintains a screen buffer identical |
| 35 | +to what a real terminal UI would show. No DOM, no browser, runs in Node/vitest. |
| 36 | + |
| 37 | +Data flow: |
| 38 | + |
| 39 | +``` |
| 40 | +shell.write(input) |
| 41 | +  │
| 42 | +  ▼
| 43 | +PTY master → line discipline → PTY slave → shell process (WasmVM/mock)
| 44 | +  │                                           │
| 45 | +  │◄──────────── shell output ◄───────────────┘
| 46 | +  │
| 47 | +  ▼
| 48 | +shell.onData(bytes) → term.write(bytes) → xterm screen buffer
| 49 | +  │
| 50 | +  ▼
| 51 | +screenshotTrimmed() → deterministic string for assertions |
| 52 | +``` |
| 53 | + |
| 54 | +### Test helper: `TerminalHarness` |
| 55 | + |
| 56 | +A small class that wires `openShell()` to an `@xterm/headless` Terminal: |
| 57 | + |
| 58 | +```typescript |
| 59 | +declare class TerminalHarness {
| 60 | +  readonly term: Terminal;
| 61 | +  readonly shell: ShellHandle;
| 62 | +
| 63 | +  /** Send input through the PTY. Resolves after data settles. */
| 64 | +  type(input: string): Promise<void>;
| 65 | +
| 66 | +  /**
| 67 | +   * Full screen as a string: every row from the xterm buffer, trailing
| 68 | +   * whitespace trimmed per line, trailing empty lines dropped, joined
| 69 | +   * with '\n'. This is the canonical representation for assertions.
| 70 | +   */
| 71 | +  screenshotTrimmed(): string;
| 72 | +
| 73 | +  /** Single row from the screen buffer (0-indexed), trimmed. */
| 74 | +  line(row: number): string;
| 75 | +
| 76 | +  /**
| 77 | +   * Wait until screenshotTrimmed() contains `text`. Polls the screen
| 78 | +   * buffer; throws after timeoutMs. Use `occurrence` to wait for the
| 79 | +   * Nth match (e.g. wait for the 2nd prompt after a command completes).
| 80 | +   */
| 81 | +  waitFor(text: string, occurrence?: number, timeoutMs?: number): Promise<void>;
| 82 | +
| 83 | +  /** Send ^D on empty line and await shell exit. Returns exit code. */
| 84 | +  exit(): Promise<number>;
| 85 | +
| 86 | +  /** Kill shell and dispose terminal. */
| 87 | +  dispose(): Promise<void>;
| 88 | +}
| 89 | +``` |
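
The trimming rules in the `screenshotTrimmed()` doc comment can be sketched as a pure function over row strings. This is only an illustration: the name `trimScreen` and the `string[]` input are assumptions made here, and a real harness would read its rows from the xterm buffer instead.

```typescript
// Hypothetical helper (not part of the spec): applies the trimming rules
// described for screenshotTrimmed() to an array of raw screen rows.
function trimScreen(rows: string[]): string {
  // Trim trailing whitespace on each row.
  const trimmed = rows.map((row) => row.replace(/\s+$/, ""));

  // Drop trailing empty rows, but keep blank rows that sit between content.
  let end = trimmed.length;
  while (end > 0 && trimmed[end - 1] === "") {
    end--;
  }

  return trimmed.slice(0, end).join("\n");
}

// Example: a 5-row screen where only the first three rows hold content.
const example = trimScreen(["$ echo hello   ", "hello", "$ ", "", ""]);
console.log(JSON.stringify(example));
```

One consequence worth noting: an idle prompt such as `$ ` loses its trailing space under these rules, so the last row of the example is `$`.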
| 90 | + |
| 91 | +### Assertion style |
| 92 | + |
| 93 | +Every output assertion MUST be an exact match on the full screen state. |
| 94 | +No `toContain()`, no substring checks. The test specifies exactly what |
| 95 | +every visible line should be: |
| 96 | + |
| 97 | +```typescript |
| 98 | +expect(h.screenshotTrimmed()).toBe([
| 99 | +  '$ echo hello',
| 100 | +  'hello',
| 101 | +  '$', // prompt is '$ ', but screenshotTrimmed() trims trailing whitespace
| 102 | +].join('\n'));
| 103 | +``` |
| 104 | + |
| 105 | +This ensures that: |
| 106 | +- The typed command appears (PTY echo) |
| 107 | +- The output is on the correct line |
| 108 | +- The prompt returns after the command |
| 109 | +- No extra/missing lines exist |
| 110 | +- Previous output is preserved across commands |
| 111 | + |
| 112 | +When exact matching is impractical (e.g. timestamps, PIDs), individual lines |
| 113 | +can be matched with `line(row)` + regex, but this should be the exception. |
| 114 | + |
| 115 | +## Test locations |
| 116 | + |
| 117 | +Two test files, testing different layers: |
| 118 | + |
| 119 | +### `packages/kernel/test/shell-terminal.test.ts` |
| 120 | + |
| 121 | +Tests the PTY and terminal plumbing using `MockRuntimeDriver`. No WASM binary |
| 122 | +required. These tests verify the kernel's line discipline, echo, signal |
| 123 | +handling, and screen rendering work correctly. |
| 124 | + |
| 125 | +| Test | What it verifies | |
| 126 | +|------|-----------------| |
| 127 | +| Clean initial state | Shell opens, screen is empty or shows prompt | |
| 128 | +| Echo on input | Typed text appears on screen via PTY echo | |
| 129 | +| Command output on correct line | Mock echo-back appears below the input line | |
| 130 | +| Output preservation | Multiple commands — all previous output stays visible | |
| 131 | +| `^C` sends SIGINT | Screen shows `^C`, shell stays alive, can type more | |
| 132 | +| `^D` exits cleanly | Shell exits with code 0, no extra output | |
| 133 | +| Backspace erases character | `helo` + BS + `lo\n` → screen shows `hello` | |
| 134 | +| Long line wrapping | Input exceeding cols wraps to next row | |
| 135 | + |
| 136 | +### `packages/runtime/wasmvm/test/shell-terminal.test.ts` |
| 137 | + |
| 138 | +Tests real WasmVM shell commands through the terminal. Requires the WASM |
| 139 | +binary (guarded with `skipIf(!hasWasmBinary)`). These tests verify that |
| 140 | +brush-shell and the WasmVM command dispatch produce correct interactive output. |
| 141 | + |
| 142 | +| Test | What it verifies | |
| 143 | +|------|-----------------| |
| 144 | +| `echo` prints output | `echo hello` → "hello" on next line, prompt returns | |
| 145 | +| `ls /` shows listing | Directory entries rendered correctly | |
| 146 | +| Output preserved across commands | `echo AAA` then `echo BBB` — both visible | |
| 147 | +| `cat` reads VFS file | Write file to VFS, `cat` it, content appears | |
| 148 | +| Pipe works | `echo foo \| cat` → "foo" | |
| 149 | +| Error on unknown command | `nonexistent` → error message on screen |
| 150 | +| `node -e` cross-runtime | `node -e "console.log(42)"` → "42" on screen | |
| 151 | +| `python3 -c` cross-runtime | `python3 -c "print(99)"` → "99" on screen | |
| 152 | + |
| 153 | +## Dependency |
| 154 | + |
| 155 | +Add `@xterm/headless` as a devDependency to both `packages/kernel` and |
| 156 | +`packages/runtime/wasmvm`. It is a pure JavaScript package with no native |
| 157 | +addons or DOM dependency. |
| 158 | + |
| 159 | +``` |
| 160 | +pnpm -F @secure-exec/kernel add -D @xterm/headless |
| 161 | +pnpm -F @anthropic-ai/wasmvm add -D @xterm/headless |
| 162 | +``` |
| 163 | + |
| 164 | +## Implementation phases |
| 165 | + |
| 166 | +### Phase 1: Kernel terminal tests (mock driver) |
| 167 | + |
| 168 | +1. Add `@xterm/headless` devDependency to `packages/kernel`. |
| 169 | +2. Create `TerminalHarness` utility in `packages/kernel/test/terminal-harness.ts`. |
| 170 | +3. Create `packages/kernel/test/shell-terminal.test.ts` with mock driver tests. |
| 171 | +4. Verify all tests pass with `pnpm vitest run` in `packages/kernel`. |
| 172 | + |
| 173 | +### Phase 2: WasmVM terminal tests (real shell) |
| 174 | + |
| 175 | +1. Add `@xterm/headless` devDependency to `packages/runtime/wasmvm`. |
| 176 | +2. Import or duplicate `TerminalHarness` in `packages/runtime/wasmvm/test/`. |
| 177 | +3. Create `packages/runtime/wasmvm/test/shell-terminal.test.ts` with real |
| 178 | + command tests (gated behind `hasWasmBinary`). |
| 179 | +4. Verify tests pass with the WASM binary built. |
| 180 | + |
| 181 | +### Phase 3: Cross-runtime tests |
| 182 | + |
| 183 | +1. Add Node and Python runtime mounting in the WasmVM terminal tests. |
| 184 | +2. Test `node -e` and `python3 -c` output appears correctly on screen. |
| 185 | +3. These tests require all three runtimes mounted into the same kernel. |
| 186 | + |
| 187 | +## Risks and open questions |
| 188 | + |
| 189 | +### Prompt format |
| 190 | + |
| 191 | +brush-shell's interactive prompt format (`$ `, `bash-5.2$ `, or something |
| 192 | +else) needs to be captured empirically. Tests will break if the prompt |
| 193 | +changes. Mitigation: define the expected prompt as a constant at the top |
| 194 | +of the test file. |
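
As a rough sketch of that mitigation, the prompt can live in one constant, with a small builder deriving the expected screen from it. The `PROMPT` value and the `expectedScreen` helper are assumptions for illustration; the real prompt string still has to be captured from brush-shell.

```typescript
// Assumed prompt; replace with the value captured empirically from the shell.
const PROMPT = "$ ";

interface Step {
  command: string; // what the test typed
  output: string[]; // rows the command is expected to print
}

// Hypothetical helper: builds the full expected screen for a sequence of
// commands, ending with an idle prompt. Rows are right-trimmed because
// screenshotTrimmed() trims trailing whitespace per line.
function expectedScreen(steps: Step[]): string {
  const lines: string[] = [];
  for (const step of steps) {
    lines.push((PROMPT + step.command).trimEnd(), ...step.output);
  }
  lines.push(PROMPT.trimEnd()); // idle prompt after the last command
  return lines.join("\n");
}

const screen = expectedScreen([{ command: "echo hello", output: ["hello"] }]);
console.log(JSON.stringify(screen));
```

With this shape, a prompt change means editing one line rather than every expected screen in the file.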
| 195 | + |
| 196 | +### Timing |
| 197 | + |
| 198 | +`waitFor()` polls the screen buffer. Commands that take a long time (e.g. |
| 199 | +Python startup via Pyodide) need generous timeouts. Keep default timeout |
| 200 | +low (2s) for fast commands, allow per-call override. |
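
A minimal sketch of the polling loop, assuming the screen is read through a callback. The names `countOccurrences` and `waitForText`, the 20 ms poll interval, and the error message format are illustrative choices, not part of the spec.

```typescript
// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0; // avoid an infinite loop on empty input
  let count = 0;
  let from = 0;
  for (;;) {
    const index = haystack.indexOf(needle, from);
    if (index === -1) return count;
    count++;
    from = index + needle.length;
  }
}

// Hypothetical stand-in for TerminalHarness.waitFor(): polls `readScreen`
// until `text` appears at least `occurrence` times, or throws on timeout.
async function waitForText(
  readScreen: () => string,
  text: string,
  occurrence = 1,
  timeoutMs = 2000, // low default for fast commands; override per call
  intervalMs = 20, // assumed poll interval
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (countOccurrences(readScreen(), text) >= occurrence) return;
    if (Date.now() >= deadline) {
      throw new Error(
        `waitFor timed out after ${timeoutMs}ms waiting for ` +
          `${JSON.stringify(text)} (occurrence ${occurrence})\n` +
          `Screen:\n${readScreen()}`,
      );
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A slow command would then pass a larger timeout explicitly, for example `waitForText(read, "99", 1, 30_000)`, while fast commands keep the default. Including the screen contents in the timeout error tends to make failures much easier to diagnose.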
| 201 | + |
| 202 | +### Cross-runtime stdout routing |
| 203 | + |
| 204 | +`node` and `python3` spawned from brush-shell route through `proc_spawn` → |
| 205 | +kernel → runtime driver → back through PTY. This path has known issues |
| 206 | +(stdout may not flow back through the PTY slave). If cross-runtime tests |
| 207 | +fail, the fix is in the spawn/stdio wiring, not the test infrastructure. |
| 208 | + |
| 209 | +### Terminal dimensions |
| 210 | + |
| 211 | +Tests should use a fixed size (e.g. 80x24) so line wrapping is deterministic. |
| 212 | +The harness constructor should enforce this. |
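
With a fixed width, the expected rows for the long-line wrapping test can be computed rather than written by hand. The sketch below assumes single-width characters and plain wrapping at the column boundary; `wrapLine` and the constants are hypothetical names.

```typescript
// Assumed fixed dimensions for all terminal tests.
const COLS = 80;
const ROWS = 24;

// Hypothetical helper: split one logical line into the rows it occupies
// on a terminal `cols` wide. Assumes every character is one cell wide.
function wrapLine(text: string, cols: number): string[] {
  if (cols <= 0) throw new RangeError("cols must be positive");
  const rows: string[] = [];
  for (let i = 0; i < text.length; i += cols) {
    rows.push(text.slice(i, i + cols));
  }
  return rows.length > 0 ? rows : [""];
}

// A 2-character prompt plus 100 characters of input spans two rows at 80 cols.
const wrapped = wrapLine("$ " + "x".repeat(100), COLS);
console.log(wrapped.length, wrapped.map((row) => row.length), ROWS);
```

Wide characters and escape sequences would need the emulator's own accounting, so this helper only suits plain ASCII test input.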