
feat(service): emit resume-on-restart wrapper alongside service unit #7

Merged
crisandrews merged 1 commit into crisandrews:main from JD2005L:feat/resume-on-restart on Apr 16, 2026


JD2005L commented Apr 15, 2026

What

/agent:service install now writes a small bash shim at ~/.clawcode/service/<slug>-resume-wrapper.sh and points the unit's ExecStart (or plist ProgramArguments) at it. The wrapper runs claude --continue so the service rehydrates the prior session on restart, preserving conversation history instead of starting fresh.

Why

Pairs naturally with the new opt-in watchdog from v1.3.0: when an external probe detects a silent failure and triggers a restart, the replacement process resumes the prior conversation instead of losing all context. Previously, every restart — whether from a crash, a deploy, or a watchdog-initiated recovery — meant a clean slate. For agents running long conversations on messaging channels (Telegram, WhatsApp, etc.), context loss across restarts is the kind of papercut that accumulates fast.

The motivating case for me: on my fork, a stalled inference → watchdog kills the process → service restarts → user's next message lands in a brand-new session that has no idea what they were just talking about. With this wrapper the user barely notices the restart happened.

Wrapper behavior

  • Runs claude --continue by default.
  • Falls back to a plain start when there is no prior session jsonl (first boot — --continue would otherwise error).
  • Falls back to a plain start when the last session jsonl is more than 7 days old. Long-stale resumes can behave oddly, and in practice the user usually wants a fresh start after a week off anyway.

All three paths exec the same underlying claude --dangerously-skip-permissions <extraArgs> — the wrapper only decides whether --continue is on the command line.
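The decision the wrapper makes can be modeled in a few lines. This is a minimal TypeScript sketch, not the shipped shim: the real wrapper decides in bash at startup, and `buildClaudeArgs` / `newestMtimeMs` are hypothetical names. It assumes the "last session" means the newest session jsonl by modification time.

```typescript
// Hypothetical model of the wrapper's --continue decision.
// The shipped wrapper makes this choice in bash; names here are illustrative.

const STALE_AFTER_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// The "last session" is the most recently modified session jsonl, if any.
function newestMtimeMs(sessionMtimesMs: readonly number[]): number | null {
  return sessionMtimesMs.length === 0 ? null : Math.max(...sessionMtimesMs);
}

function buildClaudeArgs(
  lastSessionMtimeMs: number | null,
  nowMs: number,
  extraArgs: readonly string[],
): string[] {
  // Every path runs the same base command line.
  const base = ["--dangerously-skip-permissions", ...extraArgs];
  // First boot: no session jsonl exists yet, so --continue would error.
  if (lastSessionMtimeMs === null) return base;
  // Stale: the newest session is more than 7 days old, so start fresh.
  if (nowMs - lastSessionMtimeMs > STALE_AFTER_MS) return base;
  // Default: resume the prior conversation.
  return ["--continue", ...base];
}
```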

Opt out

service_plan({ action: "install", resumeOnRestart: false })

Returns the pre-change plan: unit/plist invokes claude directly, no wrapper is written, no extraFiles in the plan. Intended for users who explicitly want a fresh session on every restart.
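Here is a minimal sketch of how the opt-out could select the exec target. It uses a hypothetical `resolveExecTarget` helper and a simplified options shape. Only `resumeOnRestart`, `claudeBin`, and `extraArgs` come from this PR; `wrapperPath` and `writeWrapper` are illustrative.

```typescript
// Hypothetical, simplified stand-in for ServiceOptions.
interface ServiceOptionsSketch {
  claudeBin: string;
  extraArgs: string[];
  wrapperPath: string;
  resumeOnRestart?: boolean; // defaults to true
}

interface ExecTarget {
  argv: string[];        // what ExecStart / ProgramArguments would run
  writeWrapper: boolean; // whether the plan carries the wrapper in extraFiles
}

function resolveExecTarget(opts: ServiceOptionsSketch): ExecTarget {
  // `??` rather than `||`: only an omitted option falls back to the default.
  const resume = opts.resumeOnRestart ?? true;
  if (resume) {
    // The wrapper already embeds the flags, so the unit invokes it bare.
    return { argv: [opts.wrapperPath], writeWrapper: true };
  }
  // Opt-out: the pre-change plan, invoking claude directly.
  return {
    argv: [opts.claudeBin, "--dangerously-skip-permissions", ...opts.extraArgs],
    writeWrapper: false,
  };
}
```

Using `??` matters here. With `||`, an explicit `false` would also be treated as "not set" and fall back to the default of `true`, silently re-enabling the wrapper for users who asked to opt out.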

Changes

lib/service-generator.ts

  • New ExtraFile type + ServicePlan.extraFiles: lets the plan request that auxiliary files be written before install commands run. Used here for the wrapper; general enough for future auxiliary files (e.g. a loader, a pre-flight probe) without needing to extend ServicePlan again.
  • generateResumeWrapper(): pure generator for the bash shim. Takes claudeBin + workspace + extraArgs and bakes them into a self-contained script.
  • resumeWrapperPath(slug): canonical per-slug install path under ~/.clawcode/service/.
  • generateSystemdUnit + generatePlist: new skipDefaultArgs flag. When the wrapper is in use, it already embeds --dangerously-skip-permissions + extraArgs, so the unit/plist invoke the wrapper with no arguments.
  • ServiceOptions.resumeOnRestart (default true).
  • uninstall action appends a best-effort rm -f for the wrapper so teardown leaves no stragglers.
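The generator pieces listed above can be sketched as follows. Several details are assumptions: the `ExtraFile` field names (`path`, `content`, `mode`), the 0o755 mode, the `sessionDir` parameter, and the `wrapperExtraFile` / `withWrapperCleanup` helpers are all hypothetical. This PR does not say where Claude Code keeps its session jsonl files, so the sketch takes that directory as a parameter instead of computing it.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Assumed shape: the PR names ExtraFile and a "declared mode", not the fields.
interface ExtraFile {
  path: string;
  content: string;
  mode: number;
}

// POSIX single-quote escaping: close the quote, emit \', reopen.
function shq(s: string): string {
  return "'" + s.split("'").join("'\\''") + "'";
}

function resumeWrapperPath(slug: string, home: string = os.homedir()): string {
  return path.join(home, ".clawcode", "service", `${slug}-resume-wrapper.sh`);
}

// sessionDir is hypothetical: where the shim looks for session jsonl files.
function generateResumeWrapper(
  claudeBin: string,
  workspace: string,
  extraArgs: readonly string[],
  sessionDir: string,
): string {
  const bin = shq(claudeBin);
  const flags = ["--dangerously-skip-permissions", ...extraArgs.map(shq)].join(" ");
  return [
    "#!/usr/bin/env bash",
    "set -u",
    `cd ${shq(workspace)} || exit 1`,
    `SESSION_DIR=${shq(sessionDir)}`,
    "# Resume only if some session jsonl was modified within the last 7 days;",
    "# otherwise (first boot or stale session) start fresh.",
    `if [ -d "$SESSION_DIR" ] && [ -n "$(find "$SESSION_DIR" -maxdepth 1 -name '*.jsonl' -mtime -7 2>/dev/null)" ]; then`,
    `  exec ${bin} --continue ${flags}`,
    "fi",
    `exec ${bin} ${flags}`,
  ].join("\n") + "\n";
}

// Hypothetical helper bundling the wrapper as a plan.extraFiles entry.
function wrapperExtraFile(
  slug: string,
  claudeBin: string,
  workspace: string,
  extraArgs: readonly string[],
  sessionDir: string,
  home: string = os.homedir(),
): ExtraFile {
  return {
    path: resumeWrapperPath(slug, home),
    content: generateResumeWrapper(claudeBin, workspace, extraArgs, sessionDir),
    mode: 0o755,
  };
}

// Uninstall: rm -f succeeds even if the wrapper is already gone.
function withWrapperCleanup(commands: readonly string[], wrapperPath: string): string[] {
  return [...commands, `rm -f ${shq(wrapperPath)}`];
}
```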

skills/service/SKILL.md

  • Install flow writes each plan.extraFiles entry (mkdir parent, Write content, chmod to declared mode) after the settings.json pre-check from v1.3.0 and before writing the unit/plist.
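The install-flow step can be expressed as a small Node script. This is a hedged equivalent of what the skill does with its own tools, not the skill itself. It assumes the same hypothetical `ExtraFile` shape and that paths may start with `~/`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface ExtraFile { path: string; content: string; mode: number } // assumed shape

function expandHome(p: string): string {
  if (p === "~") return os.homedir();
  if (p.startsWith("~/")) return path.join(os.homedir(), p.slice(2));
  return p;
}

// Writes each entry: mkdir parent, write content, chmod to declared mode.
function writeExtraFiles(files: readonly ExtraFile[]): string[] {
  const written: string[] = [];
  for (const f of files) {
    const target = expandHome(f.path);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, f.content);
    // Set the mode explicitly; see the note below on writeFileSync's `mode`.
    fs.chmodSync(target, f.mode);
    written.push(target);
  }
  return written;
}
```

The explicit `chmodSync` is deliberate. A `mode` passed to `writeFileSync` only applies when the file is created, and it is reduced by the umask. Without the explicit call, re-running install over an existing wrapper would keep its old permissions.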

docs/service.md

  • New "Resume-on-restart wrapper" section explaining the default behavior, fallbacks, and the opt-out.

Verification

Sanity-tested the generator directly via bun -e:

| Case | Result |
|---|---|
| install / linux | wrapper emitted at ~/.clawcode/service/<slug>-resume-wrapper.sh, unit ExecStart points at it |
| install / darwin | wrapper emitted, plist ProgramArguments points at it |
| install / resumeOnRestart: false | stock claude invocation, no extraFiles |
| uninstall | wrapper-cleanup rm -f appended to commands |

Also verified end-to-end on my agent: killed the service while mid-conversation, waited for auto-restart, observed that the new process picked up with full prior context.

Interactions with the watchdog recipe

No code coupling — they are orthogonal features. The watchdog detects "service is up but silently broken" and issues a restart; the wrapper controls what happens on restart regardless of who triggered it. Running both gives the best recovery experience but neither requires the other.

Not changed

  • generatePlist on macOS still relies on launchd's ProcessType=Interactive to provide a TTY — no systemd-style script(1) wrapping here.
  • pkill ExecStartPre from v1.3.0 left alone; remains the right answer for the multi-instance race.
  • Existing installs are unaffected until the user re-runs /agent:service install.

Alternative considered

Embedding the --continue logic directly in generateSystemdUnit via ExecStartPre + conditional ExecStart. Rejected because launchd has no ExecStartPre equivalent and we'd end up with two different code paths. A wrapper script generalizes cleanly across both platforms.
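The single-code-path argument can be illustrated by rendering one wrapper path into both service formats. This is a minimal sketch: `systemdExecStart` and `plistProgramArguments` are hypothetical helpers. The systemd side assumes the path contains no whitespace, since a path with spaces would need quoting.

```typescript
// systemd expands %-specifiers (%h, %u, ...) in ExecStart, so a literal % is doubled.
function systemdExecStart(wrapperPath: string): string {
  return `ExecStart=${wrapperPath.replace(/%/g, "%%")}`;
}

// Minimal XML escaping for a plist <string> element.
function xmlEscape(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// launchd takes argv as an array; the wrapper needs no extra arguments.
function plistProgramArguments(wrapperPath: string): string {
  return [
    "<key>ProgramArguments</key>",
    "<array>",
    `  <string>${xmlEscape(wrapperPath)}</string>`,
    "</array>",
  ].join("\n");
}
```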

/agent:service install now writes a small bash shim at
~/.clawcode/service/<slug>-resume-wrapper.sh and points the unit's
ExecStart (or plist ProgramArguments) at it. The wrapper runs
`claude --continue` so the service rehydrates the prior session on
restart, preserving conversation history instead of starting fresh.

This is especially useful with the new opt-in watchdog (v1.3.0) —
when the watchdog detects a silent failure and restarts the service,
the replacement process resumes the prior conversation instead of
losing all context.

Behavior (all inside the wrapper):
- Runs `claude --continue` by default.
- Falls back to a plain start when there is no prior session jsonl
  (first boot — --continue would error).
- Falls back to a plain start when the last session is more than 7
  days old. Long-stale resumes can behave oddly, and in practice the
  user usually wants a fresh start after a week off anyway.

Opt out via `service_plan({ action: 'install', resumeOnRestart: false })`.
That returns the pre-change plan — unit/plist invokes `claude` directly.

Changes:

- lib/service-generator.ts
  - New ExtraFile type + ServicePlan.extraFiles: lets the plan
    request that auxiliary files be written before install commands
    run. Used for the wrapper; general enough for future auxiliary files.
  - generateResumeWrapper(): pure generator for the bash shim.
    Takes claudeBin + workspace + extraArgs; bakes them into a
    self-contained script.
  - resumeWrapperPath(slug): canonical per-slug install path
    under ~/.clawcode/service/.
  - generateSystemdUnit + generatePlist: new skipDefaultArgs flag.
    When the wrapper is in use it already includes
    --dangerously-skip-permissions + extraArgs, so the unit/plist
    point at the wrapper bare.
  - ServiceOptions.resumeOnRestart (default true).
  - uninstall appends a best-effort `rm -f` for the wrapper so
    teardown leaves no stragglers.

- skills/service/SKILL.md: install flow writes each plan.extraFiles
  entry (mkdir parent, Write content, chmod to declared mode) after
  the settings.json pre-check and before writing the unit/plist.

- docs/service.md: new "Resume-on-restart wrapper" section explaining
  the default behavior, fallbacks, and the opt-out.

Sanity-tested via `bun -e` on the generator directly:
install / linux     -> wrapper emitted, unit points at it
install / darwin    -> wrapper emitted, plist ProgramArguments points at it
install / opt-out   -> stock claude invocation, no extraFiles
uninstall           -> wrapper-cleanup command appended

JD2005L commented Apr 15, 2026

Heads-up: this PR should land together with or after #9. #7 alone emits a resume-wrapper whose exec "$CLAUDE_BIN" ... runs claude in daemon stdio without a PTY. On systemd, that combination reproduces a permanent restart loop after Claude Code's in-process auto-updater regenerates files mid-run (graceful exit returns 1, Restart=on-failure fires forever). #9 wraps ExecStart in /usr/bin/script -q -c '...' /dev/null and pins DISABLE_AUTOUPDATER=1, which fixes the loop and makes the wrapper safe to emit. Reproduced and patched live on my fork tonight.
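The PTY wrap described above can be sketched as building an argv for util-linux `script(1)`. The sketch assumes Linux with `/usr/bin/script` from util-linux; BSD/macOS `script` takes different arguments. `wrapInScript` is a hypothetical helper. Because `-c` hands its argument to a shell, each word is single-quoted first.

```typescript
// POSIX single-quote escaping for the shell that script(1) spawns.
function shq(s: string): string {
  return "'" + s.split("'").join("'\\''") + "'";
}

// -q: quiet; -c: run this command instead of an interactive shell;
// /dev/null: discard the typescript (session transcript) file.
function wrapInScript(inner: readonly string[]): string[] {
  const command = inner.map(shq).join(" ");
  return ["/usr/bin/script", "-q", "-c", command, "/dev/null"];
}
```

Rendering that argv into an actual ExecStart= line also requires systemd's own quoting rules, which this sketch leaves out.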

crisandrews merged commit c34a1d4 into crisandrews:main on Apr 16, 2026
JD2005L added a commit to JD2005L/ClawCode that referenced this pull request Apr 17, 2026
Follow-up to crisandrews#16 with a clarified rationale for keeping the env var.

The PTY wrap from crisandrews#9 fixes the SessionEnd-hook failure that turns graceful
exit into code 1 and causes restart churn. DISABLE_AUTOUPDATER=1 addresses
a different problem: Claude Code's in-process auto-updater regenerates
files it manages mid-run, including the resume-on-restart wrapper script
generated by crisandrews#7. A long-running daemon rewriting its own ExecStart target
while live is a file-integrity issue, separate from the crash loop, and
the PTY wrap does nothing for it.

On the "don't modify Claude Code internals" principle from crisandrews#16: the
principle stands, but DISABLE_AUTOUPDATER is a documented env var Claude
Code exposes for this use case. Setting it is a supported interface, not
a monkey-patch. The restored comment names the env var and the specific
file-regeneration scenario inline so future readers see the intent.

Also relevant to crisandrews#12 (/agent:update skill): the skill's explicit
manual-update flow assumes in-process auto-update is off in service mode.
With auto-update running again, the skill competes with an updater that
may silently rewrite service files behind the operator.
crisandrews added a commit that referenced this pull request Apr 17, 2026
Supersedes #17 (conflict resolution against post-#8 main). JD's #17
targeted pre-#8 main, so the block collided with the newly added HOME/TERM
Environment lines. This commit places the same 8-line block immediately
after HOME/TERM so systemd sees all three env vars side by side.

The rationale (verbatim from #17, which is correct):

PTY wrap and DISABLE_AUTOUPDATER fix different problems:
- PTY wrap: SessionEnd hook needs a controlling terminal to spawn
  /bin/sh at graceful shutdown.
- DISABLE_AUTOUPDATER: prevents Claude Code's in-process auto-updater
  from regenerating daemon-relevant files (including the
  resume-on-restart wrapper from #7) while the daemon is running.

#16 incorrectly treated DISABLE_AUTOUPDATER as redundant
defense-in-depth against the crash loop. It is actually addressing the
file-integrity scenario, which the PTY wrap does not cover. Setting
DISABLE_AUTOUPDATER is also using a documented Claude Code env var,
not monkey-patching internals.

macOS plist remains unchanged (JD's #17 scope; parity can follow once
the systemd-side default settles).

Co-Authored-By: JD2005L <34459020+JD2005L@users.noreply.github.com>
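The placement fix amounts to inserting a block right after the last HOME/TERM `Environment=` line. Here is a minimal sketch using a hypothetical `insertAfterHomeTerm` helper. It assumes unquoted `Environment=KEY=value` lines. Falling back to the line after `[Service]` when no HOME/TERM lines exist is this sketch's own choice. The block contents are up to the caller; the actual 8-line block is not reproduced here.

```typescript
// Insert `block` immediately after the last HOME/TERM Environment= line,
// or after the [Service] header if there are none.
function insertAfterHomeTerm(unitLines: readonly string[], block: readonly string[]): string[] {
  let anchor = -1;
  unitLines.forEach((line, i) => {
    if (/^Environment=(HOME|TERM)=/.test(line)) anchor = i;
  });
  if (anchor === -1) anchor = unitLines.indexOf("[Service]");
  if (anchor === -1) throw new Error("unit has no [Service] section");
  return [...unitLines.slice(0, anchor + 1), ...block, ...unitLines.slice(anchor + 1)];
}
```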
crisandrews added a commit that referenced this pull request Apr 17, 2026
Summary of changes in this release (full detail in CHANGELOG.md):

Added
- Resume-on-restart wrapper for service mode (#7)
- Service hardening defaults: HOME/TERM env, StartLimitBurst guard, persistent log path (#8)
- /agent:update skill + heartbeat version-check with day-gate and per-version dedupe (#12)

Fixed
- WORKSPACE resolution so memory_search hits user's project dir, not plugin dir (#6, closes #5)
- Linux systemd crash loop after Claude Code auto-updates mid-run — PTY wrap in ExecStart + DISABLE_AUTOUPDATER=1 for file-integrity (#9, #17/#18)
- macOS launchd PTY wrap parity (#16)
- Cross-user /agent:import discovery + post-import path sanity check (#10)

Performance
- reconcile-crons.sh fast-path on steady-state sessions (#11)

Thanks to @JD2005L for the whole batch.