
feat(service): harden defaults — env, crash-loop guard, persistent logs #8

Open
JD2005L wants to merge 1 commit into crisandrews:main from JD2005L:feat/service-hardening

JD2005L commented Apr 15, 2026

What

Three orthogonal install-hardening items that each stand on their own:

  1. Inject HOME and TERM explicitly into both systemd unit and launchd plist.
  2. Crash-loop guard on systemd via StartLimitIntervalSec=300 + StartLimitBurst=5.
  3. Persistent default log path: /tmp/clawcode-<slug>.log → ~/.clawcode/logs/<slug>.log, with the install plan creating the directory.

Why

1. Explicit HOME + TERM

systemd user services and launchd both start with a largely empty environment. Anything in Claude Code or a plugin that reads $HOME (resolving ~), or probes $TERM (color output, TUI detection, several Node libraries), behaves unpredictably when those are unset. Some combinations fail loudly; more often they fail subtly: for example, a skill that works interactively and then no-ops under the service with no clear error. Setting them at the unit/plist level is cheap and universally safe.
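As a rough illustration of the unit-level injection, the sketch below renders the two Environment= lines. `systemdEnvLines` is a hypothetical helper, not the code in lib/service-generator.ts, and it assumes the home path contains no whitespace (systemd would otherwise need the assignment quoted). The home directory is passed in rather than read from os.homedir() so the example stays self-contained.

```typescript
// Hypothetical helper: render Environment= lines for a systemd unit.
// Assumes `home` has no whitespace; the real generator may differ.
function systemdEnvLines(home: string, term: string = "xterm-256color"): string[] {
  return [`Environment=HOME=${home}`, `Environment=TERM=${term}`];
}

console.log(systemdEnvLines("/home/claude").join("\n"));
// Environment=HOME=/home/claude
// Environment=TERM=xterm-256color
```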

2. Crash-loop guard

With Restart=always and RestartSec=10, a deterministic boot-time error (bad config, missing binary, malformed extraArgs) churns forever at 10 s intervals, flooding logs and journald. StartLimitIntervalSec=300 + StartLimitBurst=5 tells systemd to give up after 5 restarts in 5 minutes so the failure surfaces in systemctl status instead of hiding in noise. No change on healthy services.

macOS launchd already throttles respawns on its own (ThrottleInterval, which by default allows at most one spawn every 10 seconds), so no plist change is needed for this item.
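To make the arithmetic concrete, here is a simplified model of the start limit. It is a sliding-window approximation with a constant process lifetime, not systemd's actual accounting, and `simulateCrashLoop` and its parameter names are made up for this sketch. For reference, systemd.unit(5) documents StartLimitIntervalSec= and StartLimitBurst= as [Unit] section options.

```typescript
interface LoopParams {
  restartSec: number;  // RestartSec=
  intervalSec: number; // StartLimitIntervalSec=
  burst: number;       // StartLimitBurst=
  runSec: number;      // assumed constant lifetime of the process before it exits
  maxAttempts: number; // safety cap so the simulation always terminates
}

// Simplified model: a start is refused once `burst` starts already
// fall inside the trailing `intervalSec` window.
function simulateCrashLoop(p: LoopParams): { starts: number[]; gaveUpAt: number | null } {
  const starts: number[] = [];
  let t = 0;
  for (let i = 0; i < p.maxAttempts; i++) {
    const recent = starts.filter((s) => t - s < p.intervalSec).length;
    if (recent >= p.burst) return { starts, gaveUpAt: t };
    starts.push(t);
    t += p.runSec + p.restartSec;
  }
  return { starts, gaveUpAt: null };
}

// Immediate failure: starts at t = 0, 10, 20, 30, 40; the attempt at t = 50 is refused.
const fast = simulateCrashLoop({ restartSec: 10, intervalSec: 300, burst: 5, runSec: 0, maxAttempts: 50 });
console.log(fast.starts.join(","), fast.gaveUpAt);

// A process that survives 60 s per run never has 5 starts in any 300 s window.
const slow = simulateCrashLoop({ restartSec: 10, intervalSec: 300, burst: 5, runSec: 60, maxAttempts: 50 });
console.log(slow.gaveUpAt); // null
```

Under this model, a deterministic boot-time failure is cut off in under a minute, while a service that runs for a while between crashes is not affected by the guard.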

3. Persistent log path

/tmp is wiped on reboot, so a service that auto-restarts through a reboot loses the log that explains why it was failing in the first place. Moving to ~/.clawcode/logs/ keeps logs available for post-mortem and matches where the rest of the agent's per-user state sits.

systemd's append: and launchd's StandardOutPath do NOT create missing parent dirs and the service will silently refuse to start without them, so the install plan now includes a "Create log directory" command up front.
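A minimal sketch of how the new default and the extra install step fit together is shown below. The function names echo the PR description, but the bodies, the `createLogDirCommand` helper, and the `{ label, command }` shape are assumptions for illustration, not the code in lib/service-generator.ts.

```typescript
// Assumed shape of an install-plan step (hypothetical).
interface PlanCommand {
  label: string;
  command: string;
}

// Sketch of the new default: logs live under the per-user clawcode directory.
function defaultLogPath(home: string, slug: string): string {
  return `${home}/.clawcode/logs/${slug}.log`;
}

// Hypothetical helper: build the step that creates the log's parent directory.
function createLogDirCommand(logPath: string): PlanCommand {
  const dir = logPath.slice(0, logPath.lastIndexOf("/"));
  return { label: "Create log directory", command: `mkdir -p "${dir}"` };
}

const logPath = defaultLogPath("/home/claude", "claude");
console.log(logPath); // /home/claude/.clawcode/logs/claude.log
console.log(createLogDirCommand(logPath).command); // mkdir -p "/home/claude/.clawcode/logs"
```

Running this step first means the directory exists before the unit or plist that points at it is loaded.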

Changes

lib/service-generator.ts

  • defaultLogPath now returns ~/.clawcode/logs/<slug>.log (was /tmp/clawcode-<slug>.log).
  • generateSystemdUnit emits Environment=HOME=..., Environment=TERM=xterm-256color, StartLimitIntervalSec=300, StartLimitBurst=5.
  • generatePlist emits EnvironmentVariables with HOME and TERM.
  • buildPlan install action prepends a "Create log directory" command.
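For the plist side, the EnvironmentVariables entry might be rendered roughly as follows. `plistEnvDict` and `escapeXml` are hypothetical helpers written for this sketch, and the /Users/claude path is only an example value; the real generatePlist may build its XML differently.

```typescript
// Escape the characters that are unsafe in XML text content.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: render an EnvironmentVariables dict for a launchd plist.
function plistEnvDict(env: Record<string, string>): string {
  const entries = Object.entries(env)
    .map(([k, v]) => `    <key>${escapeXml(k)}</key>\n    <string>${escapeXml(v)}</string>`)
    .join("\n");
  return `  <key>EnvironmentVariables</key>\n  <dict>\n${entries}\n  </dict>`;
}

console.log(plistEnvDict({ HOME: "/Users/claude", TERM: "xterm-256color" }));
```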

docs/service.md

  • Example systemd unit and plist updated.
  • Logs section reflects new default + explains why the log dir is created at install.
  • Restart-loop troubleshooting row points at the new log location and mentions StartLimitBurst so users know to check systemctl status when systemd has given up.

Verification

bun -e on the generator directly:

--- unit ---
[Unit]
Description=ClawCode Agent (claude)
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/claude
Environment=HOME=/home/claude
Environment=TERM=xterm-256color
ExecStartPre=-/usr/bin/pkill -f "claude.*--dangerously-skip-permissions"
ExecStart=/usr/local/bin/claude --dangerously-skip-permissions
Restart=always
RestartSec=10
# Crash-loop guard: stop restarting after 5 failures within 5 minutes
# so a deterministic boot-time error doesn't churn forever.
StartLimitIntervalSec=300
StartLimitBurst=5
StandardOutput=append:/home/claude/.clawcode/logs/claude.log
StandardError=append:/home/claude/.clawcode/logs/claude.log

--- commands ---
  Create log directory: mkdir -p "/home/claude/.clawcode/logs"
  Create systemd user directory: mkdir -p "/home/claude/.config/systemd/user"
  Reload systemd: systemctl --user daemon-reload
  Enable + start the service: systemctl --user enable --now clawcode-claude.service

Plist output similarly verified — EnvironmentVariables dict with HOME and TERM, persistent StandardOutPath.

Not changed (explicitly)

  • ExecStartPre=-/usr/bin/pkill ... from v1.3.0 left alone — it's the right answer for the multi-instance race.
  • Restart=always and RestartSec=10 kept. These are opinionated and the current defaults are reasonable; crash-loop guard is the right fix for their only real downside.
  • No script(1)-wrap of ExecStart. I run this locally and it's arguably helpful for some skills that expect a PTY, but the value is niche enough that it doesn't belong in a default.

Relationship to other PRs

Three orthogonal install hardening items that stand on their own:

1. Inject HOME and TERM explicitly.
   systemd user services and launchd both start with a largely empty
   environment. Anything in Claude Code or a plugin that reads $HOME
   (e.g. resolving `~`), or probes $TERM (color output, TUI detection,
   some node libraries), behaves unpredictably when those are unset.
   Setting HOME=<os.homedir()> and TERM=xterm-256color at the unit /
   plist level is a cheap, universally safe default.

2. Crash-loop guard on systemd (StartLimitIntervalSec + StartLimitBurst).
   With `Restart=always` and `RestartSec=10`, a deterministic boot-time
   error (bad config, missing binary, malformed extraArgs) churns
   forever at 10 s intervals, flooding logs and journald. Adding
   `StartLimitIntervalSec=300` + `StartLimitBurst=5` tells systemd to
   give up after 5 restarts in 5 minutes so the failure surfaces in
   `systemctl status` instead of hiding in noise. No behavior change
   on healthy services. macOS launchd already has its own throttling
   (ThrottleInterval / ExitTimeOut), so no plist change needed.

3. Persistent log path: /tmp → ~/.clawcode/logs/<slug>.log.
   /tmp is wiped on reboot, so a service that auto-restarts through a
   reboot loses the log that explains why it was failing. Moving under
   ~/.clawcode/logs/ matches where the service's other per-agent state
   lives (~/.claude/... for Claude Code, ~/.clawcode/ for clawcode
   scaffolding) and keeps logs around for post-mortem. The install
   plan now includes a `Create log directory` command up front —
   systemd's `append:` and launchd's StandardOutPath do NOT create
   missing parent dirs and the service silently refuses to start
   without them.

Docs updated:
- The example systemd unit and plist in `docs/service.md` reflect
  the new env lines, crash-loop guard, and log path.
- The "Logs" section mentions the persistent default + why the log
  directory is created at install time.
- The restart-loop troubleshooting row now points at the new log
  location and mentions StartLimitBurst so users know to check
  `systemctl status` when the service has given up.

Not changed:
- `pkill` ExecStartPre (v1.3.0 already has the right version).
- Restart policy kept as `always` + `RestartSec=10` — these are
  opinionated and the defaults are reasonable; crash-loop guard
  is the right fix for their downside.
- macOS launchd's EnvironmentVariables is the plist-level equivalent
  of systemd's `Environment=` lines — added for parity.