improve docker-in-docker opts #1697

Merged
luke-lombardi merged 13 commits into main from ll/improve-docker-opts on Jun 18, 2026

@luke-lombardi luke-lombardi commented Jun 17, 2026

Summary by cubic

Improves Docker‑in‑Docker reliability across gvisor and runc, hardens checkpoint create/restore/prune, tightens Python image build rules, and expands CPU‑only image build e2e tests. Adds graceful shutdown/cleanup, cancellable gRPC retries with backoff, quieter logs, clearer snapshot errors, and better killed‑process handling.

  • Refactors

    • Docker‑in‑Docker: added runtime.AddDockerInDockerCapabilities(); runsc probes flags to set the packet‑write option; scheduler allows Docker‑enabled requests on both runc and gvisor; sandbox pre‑stop kills inner containers and stops dockerd/containerd; SDK runs inner Docker with host network/PID; Docker list helpers return stdout strings.
    • Exec/streams: switched to a non‑streaming exec with readiness retries; kill waits for disappearance and marks gone PIDs; status/list return a synthetic exit code for missing processes until cleared; SDK stdout/stderr RPCs validate ok; combined stream preserves buffered partial lines.
    • Images/cache/repo: apply base Python requirements when verifying existing images; stricter IgnorePython; checkpoint pruning respects recent TTL/mtime and DB cutoff; repository GetOrCreateStub is workspace‑scoped and touches updated_at; cache manager starts the standby cache server immediately and stops registration on canceled contexts.
    • Build tests/tooling: added CPU‑only Python image e2e checks with a base Dockerfile; make build-test runs via uv with the SDK project path and supports MODE filters.
  • Bug Fixes

    • Checkpoints: require both runtime and filesystem payloads; reject filesystem‑only archives.
    • Pod sandbox: fixed stderr error label; stdout/stderr/list propagate ok and error messages; memory snapshots validate ok and require a checkpoint ID; better cancellation classification during Docker startup.
    • Reliability/logging: gRPC retry interceptor stops on canceled contexts and respects backoff; cache host discovery “no hosts” logged at debug; CacheFS read‑ahead warning downgraded; registry copy logs missing source at debug; image pull logs are debug for build‑path fallbacks; network slot cleanup uses RPC timeouts; shutdown suppresses cleanup errors after cancellation; forceSymlink creates parent dirs; tar creation retries transient failures.
    • Object storage: HeadObject/CreateObject handle workspace storage availability correctly and avoid false existence checks when not using workspace storage.

Written for commit a6313df. Summary will update on new commits.

@luke-lombardi luke-lombardi merged commit 42a5505 into main Jun 18, 2026
3 of 4 checks passed
@luke-lombardi luke-lombardi deleted the ll/improve-docker-opts branch June 18, 2026 04:04