|
181 | 181 | - `AF_UNIX` sockets are local IPC, not host networking: `SocketTable` bind/listen/connect for path sockets must stay fully in-kernel, bypass `permissions.network`, and only use the VFS/listener registry for reachability and socket-file state |
182 | 182 | - kernel-owned `SocketTable` instances must validate owner PIDs against the shared process table at allocation time; only standalone/internal socket tables should omit that validator |
183 | 183 | - when kernel `bind()` assigns an internal ephemeral port for `port: 0`, preserve that original ephemeral intent on the socket so external host-backed listeners can still call the host adapter with `port: 0` and then rewrite `localAddr` to the real host-assigned port |
184 | | -- **the VFS is not the host file system** — files written by sandbox code live in the VFS (in-memory by default); host filesystem is accessible only through explicit read-only overlays (e.g., `node_modules`) configured by the embedder |
185 | | -- when the kernel uses `InMemoryFileSystem`, rebind it to the shared `kernel.inodeTable` before wrapping it with devices/permissions; deferred-unlink FD I/O must use inode-based helpers on the raw in-memory FS, not pathname lookups |
186 | | -- `InMemoryFileSystem` directory metadata must stay POSIX-shaped: directory `nlink` is `2 + immediate child directory count`, `readDir*()` must synthesize `.`/`..`, and symlink `lstat()` / typed readdir entries should expose the symlink's own stable inode instead of `ino: 0` |
| 184 | +- **the VFS is not the host file system** — files written by sandbox code live in the VFS (ChunkedVFS by default); host filesystem is accessible only through explicit read-only overlays (e.g., `node_modules`) configured by the embedder |
| 185 | +- the default in-memory VFS is `ChunkedVFS(InMemoryMetadataStore + InMemoryBlockStore)` created via `createInMemoryFileSystem()` from `@secure-exec/core`. The old monolithic `InMemoryFileSystem` class was removed. |
187 | 186 | - deferred unlink must stay inode-backed: once a pathname is removed, new path lookups must fail immediately, but existing FDs must keep working through `FileDescription.inode` until the last reference closes |
188 | 187 | - `KernelInterface.fdOpen()` is synchronous, so open-time file semantics (`O_CREAT`, `O_EXCL`, `O_TRUNC`) must go through sync-capable VFS hooks threaded through the device and permission wrappers — do not move those checks into async read/write paths |
189 | 188 | - **embedders provide host adapters** that implement actual I/O — a Node.js embedder provides real `fs` and `net`; a browser embedder provides `fetch`-based networking and no file system; sandbox code doesn't know which adapter backs the kernel |
190 | 189 | - when implementing new I/O features (e.g., UDP, TCP servers, fs.watch), they MUST route through the kernel — never bypass it to hit the host directly |
191 | 190 | - see `docs/nodejs-compatibility.mdx` for the architecture diagram |
192 | 191 |
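The deferred-unlink rule above (path lookups fail immediately, existing FDs keep working until the last reference closes) can be illustrated with a miniature model. This is only a sketch: `TinyFs`, its fields, and its method names are hypothetical and do not mirror the kernel's actual inode table or `FileDescription` type.

```typescript
// Hypothetical miniature model of inode-backed deferred unlink.
// Not the kernel's real data structures.
interface Inode {
  data: string;
  nlink: number;    // number of directory entries pointing at this inode
  openRefs: number; // number of open descriptions holding this inode
}

class TinyFs {
  private paths = new Map<string, number>();
  private inodes = new Map<number, Inode>();
  private nextIno = 1;

  create(path: string, data: string): void {
    const ino = this.nextIno++;
    this.inodes.set(ino, { data, nlink: 1, openRefs: 0 });
    this.paths.set(path, ino);
  }

  // Returns the inode number; stands in for FileDescription.inode.
  open(path: string): number {
    const ino = this.paths.get(path);
    if (ino === undefined) throw new Error(`ENOENT: ${path}`);
    this.inodes.get(ino)!.openRefs++;
    return ino;
  }

  // Removes the pathname right away; the inode survives while FDs are open.
  unlink(path: string): void {
    const ino = this.paths.get(path);
    if (ino === undefined) throw new Error(`ENOENT: ${path}`);
    this.paths.delete(path);
    this.inodes.get(ino)!.nlink--;
    this.reap(ino);
  }

  // I/O on an open FD goes through the inode, never through the pathname.
  readByInode(ino: number): string {
    const node = this.inodes.get(ino);
    if (!node) throw new Error("EBADF");
    return node.data;
  }

  close(ino: number): void {
    const node = this.inodes.get(ino);
    if (!node) throw new Error("EBADF");
    node.openRefs--;
    this.reap(ino);
  }

  hasInode(ino: number): boolean {
    return this.inodes.has(ino);
  }

  // Frees the inode only when no links and no open references remain.
  private reap(ino: number): void {
    const node = this.inodes.get(ino);
    if (node && node.nlink === 0 && node.openRefs === 0) this.inodes.delete(ino);
  }
}
```

In this model, `open()` after `unlink()` throws, `readByInode()` on the previously opened inode still returns the data, and the inode is freed only on the final `close()`.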
|
| 192 | +## Virtual Filesystem (VFS) Architecture |
| 193 | + |
| 194 | +The VFS uses a layered chunked architecture: `VirtualFileSystem` (kernel interface) is implemented by `ChunkedVFS`, which composes `FsMetadataStore` (directory tree, inodes, chunk mapping) + `FsBlockStore` (dumb key-value blob store). |
| 195 | + |
| 196 | +- **ChunkedVFS** (`packages/core/src/vfs/chunked-vfs.ts`): composes a metadata store and block store into a full `VirtualFileSystem`. Created via `createChunkedVfs(options)`. |
| 197 | +- **Tiered storage**: files <= `inlineThreshold` (default 64 KB) are stored inline in the metadata store. Larger files are split into fixed-size chunks (default 4 MB) in the block store. Automatic promotion/demotion when files cross the threshold. |
| 198 | +- **Per-inode async mutex**: prevents interleaved read-modify-write corruption on concurrent async operations (pwrite, writeFile, truncate, removeFile, rename). Read-only ops (pread, readFile, stat) do not acquire the mutex. |
| 199 | +- **Optional write buffering**: when `writeBuffering: true`, pwrite buffers dirty chunks in memory and flushes on fsync or auto-flush interval. Reads always see buffered data. |
| 200 | +- **Optional versioning**: when `versioning: true`, block keys include a random ID to avoid overwrites. Exposes the `createVersion`, `listVersions`, `restoreVersion`, and `pruneVersions` APIs. |
| 201 | +- **Block key format**: `{ino}/{chunkIndex}` (or `{ino}/{chunkIndex}/{randomId}` with versioning). |
| 202 | + |
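The tiering rule and block key format described above can be sketched as a few pure helpers. The function names (`isInline`, `blockKey`, `chunksTouched`) are hypothetical and are not taken from `chunked-vfs.ts`; only the defaults (64 KB inline threshold, 4 MB chunks) and the key shapes come from the text.

```typescript
// Hypothetical helpers illustrating the documented rules; not ChunkedVFS internals.
const INLINE_THRESHOLD = 64 * 1024;  // default inline threshold from the text
const CHUNK_SIZE = 4 * 1024 * 1024;  // default chunk size from the text

// Files at or below the threshold stay inline in the metadata store.
function isInline(size: number, inlineThreshold: number = INLINE_THRESHOLD): boolean {
  return size <= inlineThreshold;
}

// `{ino}/{chunkIndex}`, or `{ino}/{chunkIndex}/{randomId}` with versioning.
function blockKey(ino: number, chunkIndex: number, randomId?: string): string {
  return randomId === undefined
    ? `${ino}/${chunkIndex}`
    : `${ino}/${chunkIndex}/${randomId}`;
}

// Chunk indices covered by a write or read of `length` bytes at `offset`.
function chunksTouched(offset: number, length: number, chunkSize: number = CHUNK_SIZE): number[] {
  if (length <= 0) return [];
  const first = Math.floor(offset / chunkSize);
  const last = Math.floor((offset + length - 1) / chunkSize);
  const indices: number[] = [];
  for (let i = first; i <= last; i++) indices.push(i);
  return indices;
}
```

For example, a 2-byte write starting one byte before the 4 MB boundary spans chunks 0 and 1, so under these assumptions it would touch keys `{ino}/0` and `{ino}/1`.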
| 203 | +### Available implementations |
| 204 | + |
| 205 | +- **InMemoryMetadataStore** (`packages/core/src/vfs/memory-metadata.ts`): pure JS Map-based. For ephemeral VMs and tests. |
| 206 | +- **SqliteMetadataStore** (`packages/core/src/vfs/sqlite-metadata.ts`): SQLite-backed via `better-sqlite3`. Supports versioning. Constructor accepts `{ dbPath }` where `:memory:` creates an in-memory database. |
| 207 | +- **InMemoryBlockStore** (`packages/core/src/vfs/memory-block-store.ts`): pure JS Map-based. |
| 208 | +- **HostBlockStore** (`packages/core/src/vfs/host-block-store.ts`): persists blocks as files on the host filesystem. For local dev environments. |
| 209 | +- **S3BlockStore** (in agent-os `packages/fs-s3/`): S3-compatible object storage. Supports server-side copy. |
| 210 | + |
| 211 | +### Kernel VFS integration |
| 212 | + |
| 213 | +- The kernel delegates `pwrite` to the VFS interface instead of doing read-modify-write internally. |
| 214 | +- The kernel calls `vfs.fsync?.(path)` fire-and-forget in `releaseDescriptionInode` when the last FD for a file is closed. |
| 215 | +- All kernel I/O goes through the `VirtualFileSystem` interface only. The old `InMemoryFileSystem`-specific fast paths (`readFileByInode`, `preadByInode`, `writeFileByInode`, `statByInode`) and `rawInMemoryFs` field were removed. |
| 216 | +- `fdOpen` is synchronous. Open-time semantics (`O_CREAT`, `O_EXCL`, `O_TRUNC`) use `prepareOpenSync` on the VFS for sync-capable handling. |
| 217 | + |
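Because the kernel no longer performs read-modify-write itself, serializing mutating operations falls to the VFS. A minimal sketch of a per-inode async mutex follows, assuming a promise-chain design; `InodeMutex` and `runExclusive` are hypothetical names, and the real `ChunkedVFS` implementation may differ.

```typescript
// Hypothetical sketch: a promise-chain mutex keyed by inode number.
class InodeMutex {
  // Tail of the wait queue for each inode that currently has holders or waiters.
  private tails = new Map<number, Promise<void>>();

  async runExclusive<T>(ino: number, fn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(ino) ?? Promise.resolve();
    let release!: () => void;
    const gate = new Promise<void>((resolve) => {
      release = resolve;
    });
    // Later callers wait on this tail, which settles only after we release.
    const tail = prev.then(() => gate);
    this.tails.set(ino, tail);

    await prev; // wait for every earlier holder on this inode
    try {
      return await fn();
    } finally {
      release();
      // Drop the entry if nobody queued behind us, so the map does not grow.
      if (this.tails.get(ino) === tail) this.tails.delete(ino);
    }
  }
}
```

Under this design, mutating operations (pwrite, writeFile, truncate, removeFile, rename) would wrap their bodies in `runExclusive(ino, ...)`, while read-only operations skip the lock entirely. Operations on different inodes never block each other.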
| 218 | +## VFS Conformance Test Suites |
| 219 | + |
| 220 | +Three conformance test suites are exported from `@secure-exec/core` for external VFS implementations to validate against: |
| 221 | + |
| 222 | +- **VFS conformance** (`packages/core/src/test/vfs-conformance.ts`): tests the full `VirtualFileSystem` contract. Register with `defineVfsConformanceTests({ name, createFs, cleanup, capabilities })`. Capability flags gate optional test groups: `symlinks`, `hardLinks`, `permissions`, `utimes`, `truncate`, `pread`, `pwrite`, `mkdir`, `removeDir`, `fsync`, `copy`, `readDirStat`. |
| 223 | +- **Block store conformance** (`packages/core/src/test/block-store-conformance.ts`): tests the `FsBlockStore` contract. Register with `defineBlockStoreTests({ name, createStore, capabilities })`. Capability flag: `copy`. |
| 224 | +- **Metadata store conformance** (`packages/core/src/test/metadata-store-conformance.ts`): tests the `FsMetadataStore` contract. Register with `defineMetadataStoreTests({ name, createStore, capabilities })`. Capability flag: `versioning`. |
| 225 | + |
| 226 | +Test registration files go in `packages/core/test/vfs/`. Use small thresholds (e.g., 256 bytes inline, 1024 bytes chunk) so edge-case tests run fast. |
| 227 | + |
193 | 228 | ## Code Transformation Policy |
194 | 229 |
|
195 | 230 | - NEVER use regex-based source code transformation for JavaScript/TypeScript (e.g., converting ESM to CJS, rewriting imports, extracting exports) |
|