Claude Code reflection plugin + 907-stop classification baseline #139
Merged
Group A of plan v2 for issue #137. Lays the foundation for the Claude Code reflection plugin without enabling it end-to-end yet:
- claude/.claude-plugin/plugin.json + hooks/hooks.json: Stop hook wiring
- claude/bin/reflect.mjs: entry skeleton with loop-guard, attempt counter, transcript tail-read, debug logging, fail-safe error handling. Strips tool_use/tool_result from the stop context per spec (only user msgs + final assistant text reach the judge).
- claude/README.md, claude/package.json: install + author docs
- evals/scripts/mine-cc-stops.mjs: scans ~/.claude/projects/**/*.jsonl, extracts Stop boundaries, emits candidate JSONL with metadata (tools_available_inferred, user_messages, final_assistant_text)
- .gitignore: exclude raw cc-stop-*.jsonl datasets (contain user data); allow committing redacted gold set
No classifier yet. No inject yet. Plugin loads but exits 0 on every Stop.
Next: run miner, filter, classify with Claude Code haiku.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
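To make the "only user msgs + final assistant text reach the judge" rule concrete, here is a minimal sketch of how tool blocks could be stripped. The record shape (`type`, `message.content`, content blocks with a `type` field) and the function names are assumptions for illustration, not the code in claude/bin/reflect.mjs.

```javascript
// Assumed transcript record shape (hypothetical, for illustration only):
//   { type: "user" | "assistant", message: { content: string | Block[] } }
// where Block is { type: "text", text } or a tool_use / tool_result block.

// Collect only the text blocks of one message, dropping tool_use / tool_result.
function textOf(content) {
  if (typeof content === "string") return content.trim();
  if (!Array.isArray(content)) return "";
  return content
    .filter((block) => block && block.type === "text" && typeof block.text === "string")
    .map((block) => block.text)
    .join("\n")
    .trim();
}

// Build the judge input: every non-empty user message plus the final assistant text.
function buildStopContext(records) {
  const userMessages = [];
  let finalAssistantText = "";
  for (const record of records) {
    const text = textOf(record?.message?.content);
    if (!text) continue; // e.g. a user record that only carries tool_result blocks
    if (record.type === "user") userMessages.push(text);
    else if (record.type === "assistant") finalAssistantText = text; // last one wins
  }
  return { userMessages, finalAssistantText };
}
```

A record that carries nothing but tool output contributes no text, so it never reaches the judge.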
Group B/C of plan v2.
- filter-cc-stops.mjs: heuristic pass over miner output. Tags each candidate with hint:summary_drift / hint:punt / hint:stuck / hint:question. Drops candidates with no hints (cheap "complete" answers).
- classify-cc-stops.mjs: calls Anthropic API directly with the OAuth Bearer token from ~/.claude/.credentials.json (avoids the ~100K context bloat that `claude -p` loads from CLAUDE.md / skills / plugins). Same model (claude-haiku-4-5), same user auth, just routed direct. Concurrency 4, retry-on-429, resume-safe (skips records already in output).
Output JSONL stays gitignored (evals/datasets/cc-stop-*.jsonl) because it is real user session data. Only the redacted gold subset is committed downstream.
Smoke run: 10 samples classified in ~9s, 1294 input tokens/sample avg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
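The heuristic pass can be pictured roughly as below. The hint names come from the commit message; the regular expressions and the `hints` field are hypothetical stand-ins, since the actual patterns in filter-cc-stops.mjs are not shown in this PR.

```javascript
// Hypothetical patterns: the real heuristics in filter-cc-stops.mjs may differ.
const HINT_PATTERNS = {
  "hint:summary_drift": /\b(next,? i(?:'ll| will)|let me now|i(?:'ll| will) (?:now|then))\b/i,
  "hint:punt": /\b(you(?:'ll| will) need to|you can run|please run)\b/i,
  "hint:stuck": /\b(i(?:'m| am) (?:stuck|unable)|cannot proceed|can't proceed)\b/i,
  "hint:question": /\?\s*$/,
};

// Attach every matching hint to a mined candidate.
function tagCandidate(candidate) {
  const text = candidate.final_assistant_text ?? "";
  const hints = Object.entries(HINT_PATTERNS)
    .filter(([, pattern]) => pattern.test(text))
    .map(([hint]) => hint);
  return { ...candidate, hints };
}

// Drop candidates with no hints; those are treated as cheap "complete" answers.
function filterCandidates(candidates) {
  return candidates.map(tagCandidate).filter((c) => c.hints.length > 0);
}
```

Only the tagged candidates go on to the paid classifier call, which is what keeps the API spend down.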
End-to-end pipeline now works:
- claude/lib/judge.mjs: classifies a stop context into one of 6 categories
via Haiku 4.5 over the Anthropic API (OAuth Bearer from
~/.claude/.credentials.json, same path as the eval classifier). 15s
hard timeout via AbortController. TIMEOUT/PARSE_ERROR returns are
treated as "no inject" by the caller — fail-safe.
- claude/lib/feedback.mjs: per-category templates with escalating tone
across attempts 1/2/3. Injects on summary_drift_stop, tool_available_punt,
genuinely_stuck. Skips on complete, waiting_for_user_legitimate, working,
and any error category.
- claude/bin/reflect.mjs: replaced the task-11/13 TODO blocks. Now reads
stdin, applies loop-guard + attempt-cap, calls judge, writes verdict
file, and (if injectable) emits the {decision:"block", additionalContext}
JSON on stdout per Claude Code Stop hook spec.
Smoke-tested with a real transcript file. Verified:
- happy path produces a valid block payload with additionalContext
- stop_hook_active=true: exits 0, no stdout, logs loop_guard_triggered
- attempt counter at MAX: exits 0, no stdout, logs attempt_cap_reached
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
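The gating order described in this commit (loop-guard, then attempt cap, then category check) can be sketched as a pure function. The category names and the two log events are taken from the commit message; the cap of 3, the `no_inject` log name, the `decide` signature, and the template wording are assumptions for illustration.

```javascript
// Categories that trigger an inject, per the commit message.
const INJECTABLE = new Set(["summary_drift_stop", "tool_available_punt", "genuinely_stuck"]);
const MAX_ATTEMPTS = 3; // assumed cap, matching the three escalation levels

// Hypothetical wording; the real per-category templates live in claude/lib/feedback.mjs.
const TONE = ["Reminder", "Second reminder", "Final reminder"];

function buildFeedback(category, attempt) {
  const tone = TONE[Math.min(attempt, TONE.length) - 1];
  return `${tone} (${attempt}/${MAX_ATTEMPTS}): the last turn was classified as ${category}. Continue the task instead of stopping.`;
}

// Decide whether to inject. Anything not explicitly injectable (complete,
// waiting_for_user_legitimate, working, TIMEOUT, PARSE_ERROR, ...) is skipped.
function decide({ stopHookActive, attempts, category }) {
  if (stopHookActive) return { inject: false, log: "loop_guard_triggered" };
  if (attempts >= MAX_ATTEMPTS) return { inject: false, log: "attempt_cap_reached" };
  if (!INJECTABLE.has(category)) return { inject: false, log: "no_inject" };
  const attempt = attempts + 1;
  return { inject: true, attempt, feedback: buildFeedback(category, attempt) };
}
```

Using an allow-list rather than a deny-list keeps the hook fail-safe: an unknown or error category can never produce an inject.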
#137)
- claude/test/reflect.test.mjs: 35 Node native-test cases covering feedback templates per category/attempt, reflect.mjs exports (loopGuard, attempt counter round-trip, transcript tail, stop context build), judge.mjs (stubbed fetch with zero real API calls, code-fence parsing, 429 retry, AbortController timeout, missing credentials path), and an in-process integration test (classify → buildFeedback → block output JSON). All 35 pass in ~300ms with --test-force-exit.
- claude/package.json: test script uses --test-force-exit + explicit glob (test discovery without glob silently mis-resolved on Node 22).
- evals/scripts/audit-cc-classifications.mjs: stratified sample (per-cat) + redaction (emails, tokens, /home paths, github refs, long secrets).
- evals/datasets/cc-stop-labeled-gold-redacted.jsonl: 30 records, stratified 6 per category across the 5 categories that appeared in the 907-record baseline. Supervisor-audited gold_label per record (v1 mostly accepts haiku, with one correction class: "complete" + ends-with-"Which?" → waiting_for_user_legitimate).
- evals/datasets/README.md: dataset provenance, redaction rules, baseline distribution, known prompt issues (link to follow-up #138).
Follow-up tracked in #138: refine classifier prompt (working over-assigned 374×, tool_available_punt under-assigned 0×). Acceptance: F1 ≥ 0.75 on the two high-value categories with an expanded gold set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
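The "6 per category across 5 categories" stratification can be illustrated with a small helper. This is a sketch under assumptions: the `label` field name is hypothetical, and it takes the first N records per category for determinism, whereas the audit script may well sample randomly.

```javascript
// Keep at most `perCategory` records for each label, preserving input order.
// `labelOf` is a hypothetical accessor; the real record field may be named differently.
function stratifiedSample(records, perCategory, labelOf = (r) => r.label) {
  const buckets = new Map();
  for (const record of records) {
    const label = labelOf(record);
    if (!buckets.has(label)) buckets.set(label, []);
    const bucket = buckets.get(label);
    if (bucket.length < perCategory) bucket.push(record);
  }
  return [...buckets.values()].flat();
}
```

With five observed categories and `perCategory = 6`, this yields the 30-record gold set size quoted above, provided every category has at least six records.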
Reviewer raised 6 real issues, all fixed:
1. claude/bin/reflect.mjs:23 - removed unused createRequire import.
2. claude/bin/reflect.mjs:100-109 - added sanitizeCwd() helper. Rejects non-absolute or non-normalized cwd from the Stop hook payload (defends against payloads like cwd:"../etc"). On throw, the existing uncaughtException handler exits 0, which is fail-safe.
3. claude/bin/reflect.mjs:165-186 - writeAttemptCounter is now atomic (tmp + POSIX rename) AND concurrency-safe: only writes if the new count exceeds the existing on-disk count. Prevents two racing Stop hooks for the same session from clobbering each other and bypassing the 3-inject cap.
4. claude/bin/reflect.mjs:148-154 - readAttempts handles a corrupt / partially-written counter file by returning 0 and logging "attempts_file_corrupt".
5. claude/lib/judge.mjs:43-62, 285+ - added sanitizeError() helper. Strips Bearer/authorization/x-api-key from API error texts before they reach debug logs. Prevents the OAuth token from leaking if the Anthropic API echoes auth headers on a 401.
6. evals/scripts/audit-cc-classifications.mjs:34-40 - strengthened redaction patterns: fixed "Accept-Bearer" → case-insensitive "Authorization: Bearer", added x-api-key, Stripe (sk/pk/rk_test/live), AWS access keys (AKIA...), and JWT-shaped tokens (a.b.c). JWT pattern placed before the long-secret regex because dots break \b boundaries.
Existing 35 unit tests still pass (npm test, 291ms).
Smoke verified:
- valid absolute cwd → emits decision:block as before
- cwd:"/tmp/../etc" → sanitizeCwd throws → uncaughtException → exit 0, no stdout, no fs writes outside the project tree
- cwd:"./relative" → same fail-safe behavior
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 7 reviewer flagged that the 35-test suite in claude/test/ was not run by CI — only the root Jest suite (test/*.ts) was. Adds a post-step that runs node --test --test-force-exit test/*.mjs in ./claude so future regressions land in CI, not on the dev box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user feedback: stubbed-fetch unit tests can't prove the Stop hook
actually fires inside Claude Code or that injects reach the agent. Real
E2E with `claude -p` + real Anthropic API is the only meaningful gate.
Changes:
1. Deleted claude/test/reflect.test.mjs (35 unit tests, all stubbed).
2. Removed the corresponding CI step in .github/workflows/test.yml.
3. Added claude/test/e2e-cc.mjs: real E2E runner with 4 scenarios:
- explicit_wait_negative: user says "wait" -> plugin must not inject.
- complete_negative: trivial Q&A -> plugin must not inject.
- attempt_cap_respected: multi-file task -> no false-positive injects,
attempt cap honored.
- direct_pipe_summary_drift: synthetic drift transcript piped directly
to reflect.mjs -> verifies the full inject path: real classifier
call, correct CC Stop hook schema in stdout, no hookSpecificOutput.
Run: node claude/test/e2e-cc.mjs (or per scenario: --scenario N).
Cost ~$0.05-0.20/scenario via Haiku 4.5 OAuth. Out of CI (auth + cost).
Bug fixes uncovered by E2E:
1. claude/bin/reflect.mjs: hook fires BEFORE transcript flush in -p
mode. Added poll loop (100ms x 10) that re-reads transcript until the
final assistant text appears. If still empty after polling, exit 0
(fail-safe -- better to skip than false-positive inject).
2. claude/bin/reflect.mjs: Stop hook JSON schema fix. CC v2.1.150
rejects { decision, reason, hookSpecificOutput: {...} } as "Invalid
input" -- that shape is for PreToolUse / PostToolUse. The correct
Stop hook shape per hookify/core/rule_engine.py and empirical test
is { decision: "block", reason }. CC injects reason as the agent's
next-turn instruction; the longer feedback message now goes in
reason. Verified by hook_blocking_error attachment + isMeta user
message "Stop hook feedback: <reason>" in the transcript.
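Both fixes can be sketched together: a bounded poll for the final assistant text, and an output builder that emits only the two fields the Stop hook accepts. `readFinalText` is a hypothetical callback standing in for the transcript tail-read, and both function names are illustrative rather than the ones used in reflect.mjs.

```javascript
// Poll a reader until it yields non-empty text, up to `attempts` tries
// (defaults mirror the 100ms x 10 loop described above).
async function pollForFinalText(readFinalText, { intervalMs = 100, attempts = 10 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const text = await readFinalText();
    if (text) return text;
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return ""; // caller treats this as "skip": exit 0 with no stdout
}

// Stop hook payload: only `decision` and `reason`, with no hookSpecificOutput.
function buildStopHookOutput(reason) {
  return JSON.stringify({ decision: "block", reason });
}
```

The worst-case added latency with the defaults is about 0.9 s (nine sleeps of 100 ms), after which the hook gives up and stays silent rather than risk a false-positive inject.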
E2E results (2026-05-26):
- 4/4 PASS
- s1 (explicit_wait_negative): 0 injects (correct)
- s2 (complete_negative): 0 injects (correct)
- s3 (attempt_cap_respected): 0 injects (Haiku didn't drift on this task)
- s4 (direct_pipe_summary_drift): 1 inject with schema-valid stdout
Known test-methodology limitation (follow-up): Haiku 4.5 rarely drifts
on small E2E prompts so scenario 3 is vacuously satisfied. The architecture
is proven; pattern provocation needs Sonnet or longer-horizon tasks.
Install for sessions (workaround for --plugin-dir not enabling Stop
hooks in -p mode, CC v2.1.150): merge hooks/hooks.json into your
~/.claude/settings.json under the "hooks" key, with command path
pointing at this plugin's bin/reflect.mjs absolute path. Plugin packaging
remains for future marketplace publication.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
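For the settings.json workaround, the merge could be scripted roughly as follows. This is a sketch: the nested entry shape (`hooks.Stop` as an array of groups, each with a `hooks` array of `{ type: "command", command }` objects) is an assumption here, and the plugin's own hooks/hooks.json remains the authoritative source. The command path in the usage note is a placeholder.

```javascript
// Return a copy of `settings` with a Stop hook command added (idempotent).
// The entry shape is an assumption; compare against hooks/hooks.json before use.
function addStopHook(settings, command) {
  const hooks = { ...(settings.hooks ?? {}) };
  const stop = [...(hooks.Stop ?? [])];
  const alreadyPresent = stop.some((group) =>
    (group.hooks ?? []).some((hook) => hook.command === command),
  );
  if (!alreadyPresent) {
    stop.push({ hooks: [{ type: "command", command }] });
  }
  return { ...settings, hooks: { ...hooks, Stop: stop } };
}
```

A caller would read ~/.claude/settings.json, pass the parsed object through `addStopHook(settings, "node /absolute/path/to/claude/bin/reflect.mjs")`, and write the result back; other hook types and unrelated settings are left untouched.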
…137)
- Install: settings.json hook is the authoritative path; --plugin-dir doesn't activate Stop hooks in headless -p mode on CC v2.1.150. Document the marketplace path as future work.
- Failure categories: corrected to the 6 the classifier actually uses (matched judge.mjs/feedback.mjs). Removed the older speculative context_exhaustion/decision_paralysis/false_completion entries that never landed in the prompt.
- Testing: documented the new E2E runner (node claude/test/e2e-cc.mjs) with scenario descriptions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dzianisv added a commit that referenced this pull request Jun 20, 2026
…ompt (#138) (#147)
* feat(reflection): v2.3 prompt + head+tail truncation (#138)
Three discriminator rules added after auditing 907 baseline + 89.5% v2.3 eval on combined 57-record gold (CC+OC):
1. CLOSING DELIVERY DOMINATES: if a long turn ends with a result (PR pushed, tests pass, verdict, error returned), intermediate "Let me check X. Let me apply Y" phrases were process narration, not commitment. Turn is COMPLETE.
2. CAPABILITY OFFER IS NOT COMMITMENT: "I can do X" / "Next I can run X" / "If you want, I can X" is an offer to the user. Drift requires committed action ("I will run", "Let me now"), not a capability statement.
3. IMPERATIVE GUIDANCE TO USER IS COMPLETE: "Try X", "Use /clear", "Run npm test" directed at the user is guidance, not agent self-action.
Plus head+tail truncation: long final_assistant_text was capped at 2400ch, hiding the closing delivery on turns ≥3kb (CC#23 was 3871ch). Now keep 1800 head + 2400 tail so both opening commitment and closing result land in the prompt. Bug fix: prevents false-positive drift on long delivery turns.
Eval results on combined 57-record gold (CC 30 + OC 27):
- v1 baseline: 42.1% accuracy, drift F1=0.55
- v2.3: 89.5% accuracy, drift F1=0.82, complete recall 28/28 (100%)
- Zero false-positive drift on the 28 complete gold records, which eliminates the exact failure mode logged in #138 (false-inject on "watch over next few days" delivery summary).
Acceptance status:
- F1 ≥ 0.75 on summary_drift_stop: 0.82 ✓
- working < 5% on corpus: 0.27% CC, 0% OC ✓
- F1 ≥ 0.75 on tool_available_punt: deferred, 0 gold records (rare pattern in this user's sessions). Prompt has discriminator + 2 few-shots ready to evaluate when prod data surfaces examples.
Closes #138.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(evals): OC stop dataset mining infrastructure (#138)
Adds OpenCode SQLite stop-event miner + Haiku classifier + audit harness, mirroring the CC scripts so the same #138 acceptance criteria can be measured on OpenCode sessions.
- mine-oc-stops.mjs: walks ~/.local/share/opencode/opencode.db (sqlite3 -readonly -json), emits same JSONL schema as mine-cc-stops.mjs.
- classify-oc-stops.mjs: same v2.3 prompt as CC classifier (kept in sync with claude/lib/judge.mjs and classify-cc-stops.mjs). OAuth Bearer via ~/.claude/.credentials.json.
- audit-oc-classifications.mjs: stratified sample + redaction for publishable gold set, matching cc-stop variant.
- .gitignore: raw OC datasets excluded; only redacted gold subset committed.
463 OC candidates mined from 727 sessions across 15 projects; classifier distribution: 193 complete / 190 drift / 69 wait / 11 stuck / 0 punt. Verifies the OC stop boundary code-path is wired and produces records shaped for the same eval-v2-on-gold regression harness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* evals(gold): expand to 57 records + regression harness + relabel 3 drift (#138)
- evals/datasets/oc-stop-labeled-gold-redacted.jsonl: 27 redacted gold records from OC dataset, stratified by category, same redaction patterns as CC gold (emails, sk-ant-*, ghp_*, AKIA*, Bearer, /home/<user>/, JWT, github.com/<owner>/<repo>).
- evals/datasets/cc-stop-labeled-gold-redacted.jsonl: relabel 3 records found mislabeled during v2.3 audit:
  · CC#19 drift → waiting (ends with "Want me to update?" permission question)
  · OC#9 drift → complete (ends with "Fixes to consider:" suggestions list)
  · OC#17 drift → waiting (ends with "Could you share what alice said?")
  Original labels preserved in `gold_label_v1` field with `gold_label_audit` rationale per record. Combined gold dist: 28 complete, 14 waiting, 8 drift, 7 stuck. Drift count meets #138 acceptance (≥8 per measured category).
- evals/scripts/eval-v2-on-gold.mjs: regression harness that reclassifies gold records with the current v2.3 prompt and computes per-category accuracy + F1 + confusion matrix vs gold_label. Used to verify prompt edits did not regress complete recall (must stay 100%; false-positive drift is the worst failure mode, see PR #139 prod incident).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(evals): ignore v2-classified output JSONL (#138)
cc-stop-classified-v2.jsonl + oc-stop-classified-v2.jsonl contain raw user-session text after re-classification with the v2.3 prompt, so they get the same privacy treatment as v1 classified files. Only the redacted gold subset is committed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
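The head+tail truncation from the first commit in this squash can be sketched as below. The 1800/2400 budgets come from the commit message; the function name, the marker string, and the choice to leave text untouched when it fits within the combined budget are assumptions.

```javascript
const HEAD_CHARS = 1800; // keeps the opening commitment
const TAIL_CHARS = 2400; // keeps the closing delivery
const MARKER = "\n[... truncated ...]\n"; // hypothetical separator

// Keep the first `head` and last `tail` characters of an over-long text.
function truncateHeadTail(text, head = HEAD_CHARS, tail = TAIL_CHARS) {
  if (text.length <= head + tail) return text; // short enough: pass through unchanged
  return text.slice(0, head) + MARKER + text.slice(text.length - tail);
}
```

Under these assumptions a 3871-character turn such as CC#23 now passes through whole, whereas the old 2400-character cap would have cut off its closing result.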
Thinking Path
What Changed
- `deploy/systemd/paperclip.service` template covering the three common install styles (npx, source checkout via `pnpm`, source checkout via direct `tsx`) with inline `TODO` markers for the two values an operator must edit.
- `doc/SYSTEMD.md` walkthrough: install, enable lingering, start, verify, common ops, updating, and a troubleshooting section for the failure modes that bit me on first install (no-TTY `tsx` loader, `Start request repeated too quickly`, `ENOSPC` crash loops, tailnet bind ordering).
- `README.md`: one-line link from the install snippet so first-time self-hosters discover the systemd path without having to search.
- `doc/DEVELOPING.md`: one-paragraph cross-link next to the Docker Quickstart / Quadlet sections.
No code, no manifest, no lockfile, no Dockerfile changes: strictly docs + a sample unit.
Verification
Both pieces were validated on the host that motivated this change (Ubuntu 24, source checkout, tailnet bind, embedded Postgres):
1. `~/.config/systemd/user/paperclip.service`, fill in the two `TODO` values.
2. `sudo loginctl enable-linger "$USER"` then `systemctl --user daemon-reload && systemctl --user enable --now paperclip.service`.
3. `systemctl --user status paperclip.service` → `Active: active (running)` within ~10 s.
4. `journalctl --user -u paperclip.service -f` shows the Paperclip banner, embedded PostgreSQL ready line, and Better Auth init.
5. `curl -sf http://<bind-ip>:3100/api/health` → `{"status":"ok",...}`.
Sanity-checked the doc against the failure paths in the troubleshooting section by reproducing them on the same host before writing them up.
Risks
Low. Pure docs and a sample file under a new `deploy/systemd/` path. No existing files are removed, no runtime, build, or CI behavior changes. The README/DEVELOPING edits are additive paragraphs. Worst case for a reader is that they follow the guide on an unsupported distro and the service does not start, at which point they are no worse off than before this PR existed.
Model Used
Claude Opus 4.7 (1M context window, extended thinking enabled, tool use including Bash/Read/Edit/Write/Grep). Human-reviewed, edited, and verified on the target host.
Checklist
Closes #467. Supersedes #555 (docs-only) by also shipping the sample unit file.