bd init --shared-server: 10s waitForReady timeout too short for first-run Dolt superuser creation on Windows #3142

@ghbaud

Summary

bd init --force --shared-server reliably fails on Windows 11 with "timeout after 10s waiting for server at 127.0.0.1:3308". The hardcoded 10-second waitForReady budget at internal/doltserver/doltserver.go:762 is insufficient because Dolt's first-run SQL engine initialization (between "Creating root@localhost superuser" and "Server ready. Accepting connections.") takes ~63 seconds on this hardware, as verified by running dolt sql-server manually with identical configuration. bd kills its child well before Dolt finishes initializing, then reports the child as "not accepting connections."

Environment

  • Windows 11 Pro 10.0.26200 (native, MSYS2/mingw64 bash)
  • bd 1.0.0 (commit 9ab8a79)
  • dolt 1.85.0 (standalone from official installer)

Reproduction

From a clean state (no existing ~/.beads/shared-server/):

rm -rf ~/.beads/shared-server/
bd init --force --shared-server --non-interactive --role=maintainer \
  --prefix=wh --from-jsonl --destroy-token=DESTROY-wh

Error:

Error: failed to start shared Dolt server: server started (PID NNNNN) but not
accepting connections on port 3308: timeout after 10s waiting for server at 127.0.0.1:3308
Check logs: C:\Users\<user>\.beads\shared-server\dolt-server.log

dolt-server.log contains only two lines — no error, no shutdown message, no "Server ready":

Starting server with Config HP="127.0.0.1:3308"|T="28800000"|R="false"|L="info"
time="..." level=info msg="Creating root@localhost superuser"

Confirmed via tasklist that the dolt.exe process is gone post-timeout. This is bd killing its child at the 10s mark, not Dolt crashing — the absence of any shutdown notice in the log is consistent with TerminateProcess, not an orderly exit.

Reproduced twice from fully clean state (both ~/.beads/shared-server/ nuked and a fresh retry). Same failure every time.

Diagnostic — Dolt's real time-to-ready on this machine

Running dolt sql-server manually with an identical config file (and identical cwd), waiting patiently instead of timing out:

21:45:00  Starting server with Config HP="127.0.0.1:3307"
21:45:00  Creating root@localhost superuser
21:46:03  Server ready. Accepting connections.

63 seconds end-to-end. I also polled both TCP bind and MySQL query responsiveness during startup:

Time    TCP bound?    MySQL query works?
2s      no            no
4s      yes           no
...     yes           no
63s     yes           yes

TCP bind happens at ~4 s (fast), but the MySQL listener is not query-ready until 63 s. The bottleneck is first-run bootstrap work: privileges.db creation, initialization of the nested stats subrepo at .dolt/stats/.dolt/, and whatever other work Dolt does before the SQL engine accepts connections.

On subsequent restarts with a warm privileges.db, the 10 s budget is probably fine. But bd init --force on a fresh ~/.beads/shared-server/ always hits the cold path.
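For anyone who wants to reproduce the measurement, here is a rough sketch of the kind of polling loop behind the table above. It only covers the TCP column using the Go standard library; tcpBound and timeUntil are names I made up for illustration and are not part of bd. The SQL column would need a real query through a MySQL client, which is left out here.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpBound reports whether anything accepts TCP connections at addr.
// This is the "TCP bound?" column only; it says nothing about whether
// the server can answer SQL queries yet.
func tcpBound(addr string, dialTimeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// timeUntil calls probe every interval until it returns true or the
// budget is spent. It returns the elapsed time and whether probe succeeded.
func timeUntil(probe func() bool, budget, interval time.Duration) (time.Duration, bool) {
	start := time.Now()
	for {
		if probe() {
			return time.Since(start), true
		}
		if time.Since(start) >= budget {
			return time.Since(start), false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Demo: a bare listener that never speaks the MySQL protocol still
	// passes the TCP probe, which is the state seen between 4 s and 63 s.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()

	addr := ln.Addr().String()
	elapsed, ok := timeUntil(func() bool {
		return tcpBound(addr, time.Second)
	}, 5*time.Second, 200*time.Millisecond)
	fmt.Printf("TCP bound=%v after %s (SQL readiness not checked)\n", ok, elapsed)
}
```

The point of the demo is that a successful TCP connect is not a usable readiness signal on the cold path: the port is bound almost a minute before queries work.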

Relationship to existing shared-server issues

I reviewed the shared-server lifecycle bug cluster before filing. None of the existing issues describes the 10-second waitForReady timeout failing specifically on first-run Dolt bootstrap with the exact log signature above. The ~63-second measurement and the "Creating root@localhost superuser" → silence → hard-kill (TerminateProcess) pattern are new data points for this cluster.

PR #3139's new --external flag (merged 2026-04-08) would let me sidestep this entirely by pre-starting the server and handing bd the port — but it's on main, not in v1.0.0.

Suggested fixes (in order of invasiveness)

  1. Env var override. BEADS_DOLT_READY_TIMEOUT (seconds, default 10) passed through to waitForReady. Minimally invasive, lets affected users self-mitigate immediately. The change should be ~5 lines.
  2. Longer default on first-run path. Detect fresh initialization (no existing .bd-dolt-ok marker and/or no privileges.db) and use a larger budget (~120 s) only on that branch. Subsequent starts keep the current fast path. Better UX than option 1 because users don't need to know about a knob.
  3. Progressive log polling. Instead of a fixed timeout, tail dolt-server.log looking for the "Server ready. Accepting connections." line as the readiness signal. Same information Dolt is already emitting; bd can consume it directly. Most robust, most refactoring.
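A rough sketch of how Options 1 and 2 could combine, not a patch against the real code. BEADS_DOLT_READY_TIMEOUT is the variable proposed above; parseReadyTimeout, isFirstRun, readyTimeout and both constants are hypothetical names, and I am assuming privileges.db sits directly inside the shared-server directory, which may not match bd's actual layout.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

const (
	defaultReadyTimeout  = 10 * time.Second  // the current hardcoded budget
	firstRunReadyTimeout = 120 * time.Second // proposed cold-start budget (Option 2)
)

// parseReadyTimeout interprets raw as a whole number of seconds.
// Empty, non-numeric, or non-positive values yield fallback.
func parseReadyTimeout(raw string, fallback time.Duration) time.Duration {
	secs, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil || secs <= 0 {
		return fallback
	}
	return time.Duration(secs) * time.Second
}

// isFirstRun reports whether serverDir has no privileges.db yet.
// ASSUMPTION: the file lives directly under serverDir.
func isFirstRun(serverDir string) bool {
	_, err := os.Stat(filepath.Join(serverDir, "privileges.db"))
	return errors.Is(err, fs.ErrNotExist)
}

// readyTimeout picks the budget: an explicit env override wins (Option 1),
// otherwise the cold path gets the larger default (Option 2).
func readyTimeout(serverDir string) time.Duration {
	fallback := defaultReadyTimeout
	if isFirstRun(serverDir) {
		fallback = firstRunReadyTimeout
	}
	return parseReadyTimeout(os.Getenv("BEADS_DOLT_READY_TIMEOUT"), fallback)
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot resolve home directory:", err)
		return
	}
	dir := filepath.Join(home, ".beads", "shared-server")
	fmt.Printf("waitForReady budget for %s: %s\n", dir, readyTimeout(dir))
}
```

With this shape, warm restarts keep the 10 s fast path, fresh initializations get 120 s without the user touching anything, and BEADS_DOLT_READY_TIMEOUT=180 would still override both.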

Option 1 is the cheapest change and unblocks affected users immediately. Option 2 is the cleanest long-term UX. Option 3 is the most robust against future changes in Dolt's startup cost.
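For Option 3, a minimal sketch of log-based readiness detection, assuming the readiness line quoted in this issue is stable across Dolt versions (I have not verified that). logShowsReady and waitForLogReady are hypothetical helpers. The sketch re-reads the whole file on each poll, which is fine for a log this small; a real implementation would probably want to start scanning from the file offset at spawn time so a "Server ready" line left over from a previous run cannot produce a false positive.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"time"
)

// readyMarker is the line Dolt printed once it accepted queries in my run.
const readyMarker = "Server ready. Accepting connections."

// logShowsReady reports whether the log contents include the readiness line.
func logShowsReady(logData []byte) bool {
	return bytes.Contains(logData, []byte(readyMarker))
}

// waitForLogReady polls logPath every interval until the readiness line
// appears or timeout elapses. A log file that does not exist yet is
// treated as "not ready" rather than as an error.
func waitForLogReady(logPath string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		data, err := os.ReadFile(logPath)
		if err != nil && !errors.Is(err, fs.ErrNotExist) {
			return fmt.Errorf("read %s: %w", logPath, err)
		}
		if logShowsReady(data) {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("no %q line in %s after %s", readyMarker, logPath, timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Demo with a simulated log: the readiness line shows up after a delay.
	f, err := os.CreateTemp("", "dolt-server-*.log")
	if err != nil {
		fmt.Println("cannot create temp log:", err)
		return
	}
	defer os.Remove(f.Name())
	defer f.Close()

	fmt.Fprintln(f, "Creating root@localhost superuser")
	go func() {
		time.Sleep(300 * time.Millisecond)
		fmt.Fprintln(f, readyMarker)
	}()

	err = waitForLogReady(f.Name(), 5*time.Second, 100*time.Millisecond)
	fmt.Println("ready:", err == nil)
}
```

A timeout is still needed as a backstop in case Dolt exits or hangs without ever printing the line, so this complements Options 1 and 2 rather than replacing them.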

Workaround I'm using

--server mode with an externally-managed dolt sql-server, pre-started manually and kept alive across sessions via a project-local SessionStart hook. The migration itself took 5 seconds once the daemon was already running (bd connected immediately via its built-in MySQL client).

I'd prefer --shared-server because it has less setup burden — beads would manage the daemon lifecycle transparently. That's why I'm filing this rather than keeping the workaround silently: the clean architectural answer is blocked by a ~5-line fix.

Willingness to contribute

Happy to test a patch against my environment (same repro every time) and report startup timings. If preferred, I can draft a PR for Option 1 (env var override) — it should be a trivial change and would give affected users an immediate escape hatch while the team decides on Option 2 or 3.

/cc @maphew @coffeegoddd
