Start shared sidecar containers before the harness #12
Merged
The launcher will soon start tongs before the anvil, which means running docker commands and waiting for a sidecar to accept connections. Route every docker call through a DockerCLI object so the orchestration logic can be exercised against an in-process fake instead of a live daemon, and add the readiness prober that decides a tong is up.

DockerCLI wraps the handful of docker verbs the launch depends on: removing a container, starting one detached, inspecting its running state and config-hash label, reading an image healthcheck, running an exec, dialing a TCP port from a throwaway container on the network, and running the anvil in the foreground so the launcher regains control when it exits.

wait_ready dispatches on a tong's resolved readiness declaration -- a TCP dial of its canonical alias, an image healthcheck or exec command, or an immediate pass -- and degrades a TCP probe to a container-running check when no probe image is available.

Nothing calls these yet; they are the seam the shared-tong launch path builds on.
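A minimal sketch of the seam idea, not the actual `DockerCLI`: the method names (`remove`, `run_detached`, `is_running`) and the `restart` helper are hypothetical, chosen only to show how orchestration code can take the seam as a parameter and run against a fake.

```python
import subprocess
from typing import Protocol


class DockerLike(Protocol):
    """Hypothetical subset of the seam; the real DockerCLI methods may differ."""

    def remove(self, name: str) -> None: ...
    def run_detached(self, name: str, image: str) -> None: ...
    def is_running(self, name: str) -> bool: ...


class SubprocessDocker:
    """Shells out to the docker binary, so it needs a live daemon."""

    def remove(self, name: str) -> None:
        # Force-remove; a missing container is tolerated (check=False).
        subprocess.run(["docker", "rm", "-f", name], check=False, capture_output=True)

    def run_detached(self, name: str, image: str) -> None:
        subprocess.run(
            ["docker", "run", "-d", "--name", name, image],
            check=True, capture_output=True,
        )

    def is_running(self, name: str) -> bool:
        result = subprocess.run(
            ["docker", "inspect", "-f", "{{.State.Running}}", name],
            check=False, capture_output=True, text=True,
        )
        return result.returncode == 0 and result.stdout.strip() == "true"


class FakeDocker:
    """In-process stand-in: records mutating calls and tracks running names."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, ...]] = []
        self.running: set[str] = set()

    def remove(self, name: str) -> None:
        self.calls.append(("remove", name))
        self.running.discard(name)

    def run_detached(self, name: str, image: str) -> None:
        self.calls.append(("run_detached", name, image))
        self.running.add(name)

    def is_running(self, name: str) -> bool:
        return name in self.running


def restart(docker: DockerLike, name: str, image: str) -> None:
    """Orchestration logic that only ever sees the seam."""
    docker.remove(name)
    docker.run_detached(name, image)
```

Passing `FakeDocker()` to `restart` exercises the ordering of calls without touching a daemon; passing `SubprocessDocker()` would issue the real commands.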
When a tong is discovered, the launcher now starts it, waits for it to report ready, makes it reachable from the anvil, runs the anvil in the foreground, and leaves the tong running afterwards -- the first time the launcher touches the live launch path beyond passing the anvil through.

This first cut handles only `shared` tongs reached over the anvil's existing network. A `shared` tong is one long-lived container keyed by a stable name: a running one whose config-hash label still matches is reused untouched, and a missing, stopped, or stale one is (re)started. A `port` or `volume` tong's reachability is injected into the anvil as environment, plus a shared mount for `volume`. Anything that needs machinery not wired here -- a `session` lifecycle, a secret reference, or an `mcp` interface -- is refused with a clear message rather than started half-wired.

The passthrough invariant is unchanged and still tested byte-for-byte: with no tong discovered the launcher execs the anvil argv verbatim, and only a present tong drives the start/ready/inject path. Validation runs before any docker call so an invalid definition stops the launch cleanly. The anvil image is threaded in as `--anvil-image` so a TCP readiness probe can dial a tong's network-internal port from a throwaway container.
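The reuse rule for a `shared` tong can be sketched as a pure function. The `ContainerState` type and its field names are illustrative assumptions, not the launcher's real inspect result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerState:
    """Hypothetical summary of an inspect call."""
    running: bool
    config_hash: Optional[str]  # value of the config-hash label, if present


def should_reuse(state: Optional[ContainerState], wanted_hash: str) -> bool:
    """Reuse only a running container whose config-hash label still matches."""
    if state is None:        # missing: nothing to reuse
        return False
    if not state.running:    # stopped: recreate rather than reuse
        return False
    return state.config_hash == wanted_hash  # a mismatch means stale
```

Every `False` outcome corresponds to the "(re)started" branch; only the single `True` case leaves the container untouched.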
Decide the TCP-probe degrade (no anvil image, or no network to dial on) once before the readiness loop and warn a single time, instead of re-warning on every poll -- a long readiness wait no longer floods stderr. Folding the missing network into the same condition also avoids handing the probe a None network. Extend the tests to cover a present-but-stopped shared container (recreated, not reused), two shared tongs started and injected in one launch, and a Ctrl-C during the run reporting 130 while leaving the shared tongs up.
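A rough sketch of the "decide once, warn once" shape, under assumptions: the function name `wait_tcp_ready`, its parameters, and the warning text are all hypothetical, and the two probes are passed in as plain callables so the loop stays testable.

```python
import sys
import time
from typing import Callable, Optional


def wait_tcp_ready(
    probe_tcp: Callable[[], bool],
    is_running: Callable[[], bool],
    probe_image: Optional[str],
    network: Optional[str],
    timeout: float = 30.0,
    interval: float = 0.5,
    warn: Callable[[str], None] = lambda msg: print(msg, file=sys.stderr),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    # Decide the degrade once, before the loop, so the warning fires a single
    # time and the TCP probe is never handed a None network.
    degraded = probe_image is None or network is None
    if degraded:
        warn("no probe image or network; falling back to a container-running check")
    check = is_running if degraded else probe_tcp

    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Because the branch is chosen outside the loop, the number of polls has no effect on how much is written to stderr.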
The `volume` interface (a named volume shared between a tong and the harness) has no consumer: the credential, broker, and shared-service tongs are all network or side-effect tongs, and a file-watcher reaches the tree through a `workspace` mount, not a shared volume. Rather than ship the half-wired kind (the named volume was injected into the harness but never mounted into the tong), refuse a `volume` interface here and revisit if a real case appears.

Also refuse a `shared` tong that mounts the `workspace`. A `shared` tong is one long-lived container reused across sessions, so binding one session's workspace into it would expose that workspace to every later session that reuses the container. A per-workspace mount belongs on a `session` tong; the docker-socket mount (the broker pattern) stays allowed on a shared tong.

The startable set is now `shared` `port`/`none` tongs without secrets
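The refusal rules above might look roughly like this as a pre-flight check. The `TongDef` shape, its field names, and the mount identifiers (`"workspace"`, `"docker-socket"`) are assumptions for illustration; the real definition schema is not shown in this PR text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TongDef:
    """Illustrative shape only; not the launcher's real schema."""
    name: str
    lifecycle: str = "shared"   # "shared" or "session"
    interface: str = "none"     # "port", "none", "volume", or "mcp"
    secrets: List[str] = field(default_factory=list)
    mounts: List[str] = field(default_factory=list)


def refusal_reason(tong: TongDef) -> Optional[str]:
    """Return why a tong cannot be started yet, or None if it is startable."""
    if tong.lifecycle != "shared":
        return f"{tong.name}: '{tong.lifecycle}' lifecycle is not supported yet"
    if tong.secrets:
        return f"{tong.name}: secret references are not supported yet"
    if tong.interface not in ("port", "none"):
        return f"{tong.name}: '{tong.interface}' interface is not supported yet"
    if "workspace" in tong.mounts:
        # A reused container would expose this workspace to later sessions.
        return f"{tong.name}: a shared tong may not mount the workspace"
    return None
```

Running a check like this before any docker call keeps an unsupported definition from leaving a half-started container behind.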
Summary
The host launcher (`scripts/run_anvil.py`) wraps the harness container run and,
until now, only discovered sidecar definitions without starting them. This is
the first change where a discovered sidecar actually runs.
What this does
When a sidecar is discovered, the launcher now starts it, waits for it to report
ready, makes it reachable from the harness, runs the harness in the foreground,
and leaves the sidecar running afterwards.
This first cut handles only long-lived (`shared`) sidecars reached over the
harness's existing network:

- A `shared` sidecar is one container keyed by a stable name. A running one whose
  config-hash label still matches is reused untouched; a missing, stopped, or
  stale one is (re)started.
- A `port` sidecar's reachability is injected into the harness as
  `SWARMFORGE_TONG_<NAME>_HOST`/`_PORT` environment. A `none` sidecar is started
  but has no harness-facing surface.
- Readiness follows the sidecar's declaration: a TCP dial of its canonical
  alias, an image healthcheck or exec command, or an explicit skip. The TCP probe
  runs from a throwaway container on the network, so `--anvil-image` is threaded
  in to give it an image with `python3`; without one it degrades to a
  container-running check.
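As a sketch of the environment injection for a `port` sidecar: the helper names are hypothetical, and the way `<NAME>` is derived (upper-cased, non-alphanumerics replaced by underscores) is an assumption, since the actual normalisation is not spelled out here.

```python
import re
from typing import Dict, List


def tong_env(name: str, host: str, port: int) -> Dict[str, str]:
    """Build the HOST/PORT pair for one port sidecar."""
    # Assumption: <NAME> is the sidecar name upper-cased, with anything that is
    # not a letter or digit turned into an underscore.
    key = re.sub(r"[^A-Za-z0-9]", "_", name).upper()
    prefix = f"SWARMFORGE_TONG_{key}"
    return {f"{prefix}_HOST": host, f"{prefix}_PORT": str(port)}


def env_flags(env: Dict[str, str]) -> List[str]:
    """Render the mapping as `docker run -e KEY=VALUE` arguments."""
    flags: List[str] = []
    for key, value in sorted(env.items()):
        flags += ["-e", f"{key}={value}"]
    return flags
```

For example, `env_flags(tong_env("cred-broker", "cred-broker", 8443))` yields two `-e` pairs that could be spliced into the harness `docker run` argv, with the host here assumed to be the sidecar's network alias.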
Anything needing machinery not wired here is refused with a clear message rather
than started half-configured: a per-session lifecycle, a secret reference, an MCP
interface, a `volume` interface (a shared named volume, which has no consumer
yet), or a `shared` sidecar that mounts the workspace (a reused container would
leak one session's workspace into the next; a per-workspace mount belongs on a
per-session sidecar).
Structure
Every docker call goes through a small `DockerCLI` seam so the start / ready /
inject / run sequence is unit-tested against an in-process fake -- covering
reuse, recreate (absent/stopped/stale), readiness timeout, port/volume injection,
multiple sidecars, the refusal of unsupported sidecars, and Ctrl-C leaving the
shared sidecars up.
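One of those cases, Ctrl-C during the harness run, could be exercised roughly as follows. `RecordingDocker` and `launch` are hypothetical stand-ins written for this sketch, not the code under test; the point is only that a fake can raise `KeyboardInterrupt` on demand and that no removal is issued afterwards.

```python
from typing import List, Sequence, Tuple


class RecordingDocker:
    """Fake seam that can simulate Ctrl-C during the foreground run."""

    def __init__(self, interrupt: bool = False) -> None:
        self.calls: List[Tuple] = []
        self.interrupt = interrupt

    def run_detached(self, name: str) -> None:
        self.calls.append(("run_detached", name))

    def remove(self, name: str) -> None:
        self.calls.append(("remove", name))

    def run_foreground(self, argv: Sequence[str]) -> int:
        self.calls.append(("run_foreground", tuple(argv)))
        if self.interrupt:
            raise KeyboardInterrupt
        return 0


def launch(docker: RecordingDocker, sidecars: Sequence[str], harness_argv: Sequence[str]) -> int:
    """Start sidecars, run the harness, and leave the sidecars up afterwards."""
    for name in sidecars:
        docker.run_detached(name)
    # No cleanup on any path: shared sidecars are meant to outlive the harness.
    try:
        return docker.run_foreground(harness_argv)
    except KeyboardInterrupt:
        return 130  # 128 + SIGINT (2), the conventional interrupted status


def test_ctrl_c_reports_130_and_keeps_sidecars() -> None:
    fake = RecordingDocker(interrupt=True)
    assert launch(fake, ["a", "b"], ["harness"]) == 130
    assert not any(call[0] == "remove" for call in fake.calls)
```

The test function is written in plain-assert style so it can be collected by pytest or called directly.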