
Commit 92e9ed9

NathanFlurry and claude committed
docs: add missing kernel spec sections from adversarial review
Addresses all 8 use cases:

- K-10: Unified blocking I/O wait system (WaitQueue/WaitHandle)
- K-11: Inode layer (allocator, refcount, deferred unlink, hard links)
- K-8 expanded: Full sigaction (SA_RESTART, sigprocmask, coalescing)
- Socket API: add shutdown(), socketpair(), sendTo(), recvFrom(), poll()
- Socket flags: MSG_PEEK, MSG_DONTWAIT, MSG_NOSIGNAL, O_NONBLOCK
- Socket states: add read-closed/write-closed for half-close
- HostSocket: add setOption(), shutdown()
- Wildcard address matching, error semantics, datagram limits
- Updated test file list and migration order

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 9d03f2f commit 92e9ed9

1 file changed

Lines changed: 184 additions & 26 deletions

docs-internal/specs/kernel-consolidation.md
@@ -36,36 +36,59 @@ KernelSocket {
   domain: AF_INET | AF_INET6 | AF_UNIX
   type: SOCK_STREAM | SOCK_DGRAM
   protocol: number
-  state: 'created' | 'bound' | 'listening' | 'connected' | 'closed'
+  state: 'created' | 'bound' | 'listening' | 'connected' | 'read-closed' | 'write-closed' | 'closed'
+  nonBlocking: boolean // O_NONBLOCK
   localAddr?: { host: string, port: number } | { path: string }
   remoteAddr?: { host: string, port: number } | { path: string }
   options: Map<number, number> // SO_REUSEADDR, TCP_NODELAY, etc.
   pid: number // owning process
-  readBuffer: Uint8Array[] // incoming data queue
-  readWaiters: Array<(data: Uint8Array) => void>
+  readBuffer: Uint8Array[] // incoming data queue (SOCK_DGRAM: each element = one datagram)
+  readWaiters: WaitHandle[] // unified wait/wake (see K-10)
   writeBuffer: Uint8Array[] // outgoing data queue (for non-blocking)
   backlog: KernelSocket[] // pending connections (listening sockets only)
-  acceptWaiters: Array<(socket: KernelSocket) => void>
+  acceptWaiters: WaitHandle[]
 }

 SocketTable {
   private sockets: Map<number, KernelSocket>
   private nextSocketId: number
-  private listeners: Map<string, KernelSocket> // "host:port" → listening socket
+  private listeners: Map<string, KernelSocket> // "host:port" OR "/vfs/path" → listening socket

   create(domain, type, protocol, pid): number // returns socket ID
+  socketpair(domain, type, protocol, pid): [number, number] // returns two connected socket IDs
   bind(socketId, addr): void
   listen(socketId, backlog): void
   accept(socketId): KernelSocket | null // null = EAGAIN
   connect(socketId, addr): void // in-kernel for loopback, host adapter for external
-  send(socketId, data, flags): number // bytes sent
-  recv(socketId, maxBytes, flags): Uint8Array | null
+  shutdown(socketId, how: 'read' | 'write' | 'both'): void // half-close
+  send(socketId, data, flags): number // bytes sent (SOCK_STREAM)
+  sendTo(socketId, data, flags, destAddr): number // bytes sent (SOCK_DGRAM)
+  recv(socketId, maxBytes, flags): Uint8Array | null // SOCK_STREAM
+  recvFrom(socketId, maxBytes, flags): { data: Uint8Array, srcAddr: SockAddr } | null // SOCK_DGRAM
   close(socketId): void
+  poll(socketId): { readable: boolean, writable: boolean, hangup: boolean }
   setsockopt(socketId, level, optname, optval): void
   getsockopt(socketId, level, optname): number
-  getLocalAddr(socketId): SockAddr
-  getRemoteAddr(socketId): SockAddr
+  getLocalAddr(socketId): SockAddr // getsockname()
+  getRemoteAddr(socketId): SockAddr // getpeername()
 }
+
+// Flags for send/recv:
+// MSG_PEEK — read without consuming from buffer
+// MSG_DONTWAIT — non-blocking for this single call (regardless of O_NONBLOCK)
+// MSG_NOSIGNAL — don't raise SIGPIPE on broken connection
+
+// For SOCK_DGRAM readBuffer: each Uint8Array element is one complete datagram.
+// Message boundaries are preserved — two 100-byte sends produce two 100-byte recvs.
+// For SOCK_STREAM readBuffer: elements may be coalesced or split at arbitrary boundaries.
+// Max UDP datagram size: 65535 bytes. Max receive queue depth: 128 datagrams.
+
+// Wildcard address matching: connect('127.0.0.1', 8080) matches a listener
+// bound to '0.0.0.0:8080'. The listeners map must check both exact and wildcard.
+
+// Error semantics for send() on closed connection: EPIPE (+ SIGPIPE unless MSG_NOSIGNAL).
+// Error semantics for send() on reset connection: ECONNRESET.
+// Error semantics for send() on unconnected SOCK_STREAM: ENOTCONN.
 ```

 **Testing:** Standalone test in `packages/core/test/kernel/socket-table.test.ts`:
@@ -258,6 +281,130 @@ Runtimes call kernel DNS before falling through to host adapter.
 - TTL expiry → host adapter called again
 - Flush → all entries cleared

+### 2.4 Unified Blocking I/O Wait System (K-10)
+
+Currently each blocking operation (pipe read, socket recv, flock, poll) implements its own wait/wake logic. Add a unified `WaitHandle` primitive in `packages/core/src/kernel/wait.ts`:
+
+```
+WaitHandle {
+  wait(timeoutMs?: number): Promise<void> // suspends caller until woken or timeout
+  wake(): void // wakes one waiter
+  wakeAll(): void // wakes all waiters
+}
+
+WaitQueue {
+  private waiters: WaitHandle[]
+  enqueue(): WaitHandle // creates and enqueues a new WaitHandle
+  wakeOne(): void
+  wakeAll(): void
+}
+```
+
+All kernel subsystems use `WaitQueue` for blocking:
+- **Pipe read** (buffer empty) → `pipeState.readWaiters.enqueue().wait()`
+- **Pipe write** (buffer full) → `pipeState.writeWaiters.enqueue().wait()`
+- **Socket accept** (no pending connection) → `socket.acceptWaiters.enqueue().wait()`
+- **Socket recv** (no data) → `socket.readWaiters.enqueue().wait()`
+- **flock** (lock held by another process) → `fileLock.waiters.enqueue().wait()`
+- **poll() with timeout -1** → `waitQueue.enqueue().wait()` on each polled FD, race with timeout
+
+**WasmVM integration:** The WasmVM worker thread blocks on `Atomics.wait()` during any syscall. The main thread handler calls `waitQueue.enqueue().wait()` (which is a JS Promise). When the condition is met, `wake()` resolves the Promise, the main thread writes the response to the signal buffer, and `Atomics.notify()` wakes the worker. The existing 30s `RPC_WAIT_TIMEOUT_MS` applies — for indefinite waits (poll timeout -1), the main thread handler loops: wait → timeout → check condition → re-wait.
+
+**Node.js integration:** The Node.js bridge is async. Blocking semantics are implemented via `applySyncPromise` (V8's synchronous Promise resolution). `recv()` returns a Promise that resolves when the WaitHandle is woken. The isolate event loop pumps until the Promise settles.
+
+**Testing:** Standalone test in `packages/core/test/kernel/wait-queue.test.ts`:
+- Create WaitHandle, wake it — verify wait() resolves
+- Create WaitHandle with timeout — verify it times out
+- Multiple waiters, wakeOne — verify only one wakes
+- wakeAll — verify all wake
+- Wait on pipe read with empty buffer — write data — verify read unblocks
+- Wait on flock held by process A — process A unlocks — verify process B unblocks
+
+### 2.5 Inode Layer (K-11)
+
+Add `packages/core/src/kernel/inode-table.ts`:
+
+```
+Inode {
+  ino: number // unique inode number
+  nlink: number // hard link count
+  openRefCount: number // number of open FDs referencing this inode
+  mode: number // file type + permissions (S_IFREG, S_IFDIR, etc.)
+  uid: number
+  gid: number
+  size: number
+  atime: Date
+  mtime: Date
+  ctime: Date
+  birthtime: Date
+}
+
+InodeTable {
+  private inodes: Map<number, Inode>
+  private nextIno: number
+
+  allocate(mode, uid, gid): Inode
+  get(ino: number): Inode | null
+  incrementLinks(ino): void // hard link created
+  decrementLinks(ino): void // hard link or directory entry removed
+  incrementOpenRefs(ino): void // FD opened
+  decrementOpenRefs(ino): void // FD closed — if nlink=0 and openRefCount=0, delete data
+  shouldDelete(ino): boolean // nlink=0 && openRefCount=0
+}
+```
+
+VFS nodes reference inodes by `ino` number. Multiple directory entries (hard links) share the same inode. `stat()` returns inode metadata.
+
+**Deferred deletion:** When `unlink()` removes the last directory entry (`nlink → 0`) but FDs are still open (`openRefCount > 0`), the inode and its data persist. The file disappears from directory listings but remains accessible via open FDs. When the last FD is closed (`openRefCount → 0`), the inode and data are deleted. `stat()` on an open FD to an unlinked file returns `nlink: 0`.
+
+**Hard links:** `link(existingPath, newPath)` creates a new directory entry pointing to the same inode. `incrementLinks()` bumps `nlink`. Both paths return the same `ino` from `stat()`.
+
+**Integration with FD table:** `ProcessFDTable.open()` calls `inodeTable.incrementOpenRefs(ino)`. `ProcessFDTable.close()` calls `inodeTable.decrementOpenRefs(ino)` and checks `shouldDelete()`.
+
+**Testing:** Standalone test in `packages/core/test/kernel/inode-table.test.ts`:
+- Allocate inode, verify ino is unique
+- Create hard link — verify nlink increments, both paths return same ino
+- Unlink file with open FD — verify data persists, stat returns nlink=0
+- Close last FD on unlinked file — verify inode and data are deleted
+- stat() on unlinked-but-open file — verify correct metadata
+
+### 2.6 Signal Handler Registry (K-8, expanded)
+
+Expand beyond section 4.8's basic signal delivery to full POSIX sigaction semantics:
+
+```
+SignalHandler {
+  handler: 'default' | 'ignore' | FunctionPointer // SIG_DFL, SIG_IGN, or user function
+  mask: Set<number> // signals blocked during handler execution (sa_mask)
+  flags: number // SA_RESTART, SA_NOCLDSTOP, etc.
+}
+
+ProcessSignalState {
+  handlers: Map<number, SignalHandler> // signal number → handler
+  blockedSignals: Set<number> // sigprocmask: currently blocked signals
+  pendingSignals: Map<number, number> // signal → count (queued while blocked)
+}
+```
+
+**sigaction(signal, handler, mask, flags):** Registers a handler for `signal`. When the signal is delivered:
+1. If handler is `'ignore'` → signal is discarded
+2. If handler is `'default'` → kernel applies default action (SIGTERM→exit, SIGINT→exit, SIGCHLD→ignore, etc.)
+3. If handler is a function pointer → kernel invokes it with `sa_mask` signals temporarily blocked
+
+**SA_RESTART:** If a signal interrupts a blocking syscall (recv, accept, read, wait, poll) and SA_RESTART is set, the syscall is restarted automatically after the handler returns. Without SA_RESTART, the syscall returns EINTR.
+
+**sigprocmask(how, set):** `SIG_BLOCK` adds signals to `blockedSignals`, `SIG_UNBLOCK` removes them, `SIG_SETMASK` replaces. Signals delivered while blocked are queued in `pendingSignals`. When unblocked, pending signals are delivered in order (lowest signal number first, per POSIX).
+
+**Signal coalescing:** Standard signals (1-31) are coalesced — if SIGINT is delivered twice while blocked, only one instance is queued. The `pendingSignals` count is capped at 1 for standard signals.
+
+**Testing:** Standalone test in `packages/core/test/kernel/signal-handlers.test.ts`:
+- Register SIGINT handler, deliver SIGINT — verify handler called instead of default exit
+- SA_RESTART: handler interrupts blocking recv, verify recv restarts
+- No SA_RESTART: handler interrupts blocking recv, verify EINTR returned
+- sigprocmask SIG_BLOCK SIGINT, deliver SIGINT, verify not delivered until SIG_UNBLOCK
+- Two SIGINTs while blocked — verify only one delivered (coalescing)
+- SIG_IGN for SIGCHLD — verify child exit doesn't invoke handler
+
 ---

 ## Part 3: Node.js Bridge Migration
@@ -579,6 +726,8 @@ interface HostSocket {
   write(data: Uint8Array): Promise<void>
   read(): Promise<Uint8Array | null> // null = EOF
   close(): Promise<void>
+  setOption(level: number, optname: number, optval: number): void // forward kernel socket options
+  shutdown(how: 'read' | 'write' | 'both'): void // TCP FIN
 }

 interface HostListener {
@@ -605,14 +754,19 @@ All kernel components are tested standalone — no Node.js runtime, no WasmVM, n

 ```
 packages/core/test/kernel/
-  socket-table.test.ts        # K-1: Socket lifecycle, state transitions, EMFILE
-  loopback.test.ts            # K-2: In-kernel client↔server routing
+  socket-table.test.ts        # K-1: Socket lifecycle, state transitions, EMFILE, socketpair
+  loopback.test.ts            # K-2: In-kernel client↔server routing, wildcard address matching
   server-socket.test.ts       # K-3: listen/accept, backlog, EADDRINUSE
-  udp-socket.test.ts          # K-4: Datagram send/recv, message boundaries
-  unix-socket.test.ts         # K-5: VFS-path binding, stream + dgram modes
+  udp-socket.test.ts          # K-4: Datagram send/recv, message boundaries, max dgram size
+  unix-socket.test.ts         # K-5: VFS-path binding, stream + dgram modes, socketpair
   network-permissions.test.ts # K-7: Deny-by-default, loopback exemption
+  wait-queue.test.ts          # K-10: Unified wait/wake, pipe blocking, flock blocking
+  inode-table.test.ts         # K-11: Inode alloc, hard links, deferred unlink, refcount
+  signal-handlers.test.ts     # K-8: sigaction, SA_RESTART, sigprocmask, coalescing
   timer-table.test.ts         # Timer lifecycle, budgets, process cleanup
   dns-cache.test.ts           # Cache hit/miss, TTL, flush
+  socket-shutdown.test.ts     # shutdown() half-close, read-closed/write-closed states
+  socket-flags.test.ts        # MSG_PEEK, MSG_DONTWAIT, MSG_NOSIGNAL, O_NONBLOCK
 ```

 ### Test pattern:
@@ -691,19 +845,23 @@ it('WasmVM server accepts Node.js client connection', async () => {

 ## Migration Order

-1. **Socket table + loopback** (K-1, K-2, K-3) — core abstraction, everything depends on it
-2. **Network permissions** (K-7) — must exist before exposing sockets to runtimes
-3. **Node.js HTTP server migration** (N-2, N-3) — highest ROI, unlocks 492 tests
-4. **Node.js net socket migration** (N-4) — needed for HTTP server
-5. **UDP sockets** (K-4) — unlocks 76 dgram tests + WasmVM #17
-6. **Unix domain sockets** (K-5) — unlocks WasmVM #2
-7. **WasmVM syscall wiring** — expose socket table via RPC
-8. **Signal handlers** (K-8) — independent, can parallel with above
-9. **Timer/handle migration** (N-5, N-7, N-8) — lower priority, mainly cleanup
-10. **VFS change notifications** (K-9) — independent, lower priority
-11. **DNS cache** (N-10) — nice-to-have
-12. **FD table unification** (N-1) — important but risky, do after networking stabilizes
-13. **Crypto session cleanup** (N-12) — lowest priority
+1. **Unified wait/wake system** (K-10) — foundation for all blocking I/O
+2. **Inode layer** (K-11) — foundation for correct VFS semantics (deferred unlink, hard links)
+3. **Socket table + loopback + shutdown** (K-1, K-2, K-3) — core networking, depends on K-10 for blocking
+4. **Network permissions** (K-7) — must exist before exposing sockets to runtimes
+5. **FD table unification** (N-1) — sockets need to share the FD number space with files/pipes
+6. **Node.js net socket migration** (N-4) — migrate existing Node.js sockets to kernel
+7. **Node.js HTTP server migration** (N-2, N-3) — highest ROI, unlocks 492 tests
+8. **WasmVM socket migration** — route existing WasmVM sockets through kernel
+9. **WasmVM server sockets** — add bind/listen/accept WASI extensions
+10. **UDP sockets** (K-4) — unlocks 76 dgram tests + WasmVM #17
+11. **Unix domain sockets + socketpair** (K-5) — unlocks WasmVM #2
+12. **Signal handler registry** (K-8) — sigaction, SA_RESTART, sigprocmask, cooperative WASM delivery
+13. **Socket flags** — MSG_PEEK, MSG_DONTWAIT, MSG_NOSIGNAL, expanded setsockopt
+14. **Timer/handle migration** (N-5, N-7, N-8) — cleanup, kernel-enforced budgets
+15. **VFS change notifications** (K-9) — fs.watch support
+16. **DNS cache** (N-10) — shared across runtimes
+17. **Crypto session cleanup** (N-12) — lowest priority

 ---
0 commit comments
