diff --git a/README.md b/README.md index 97dc35b..05080ec 100644 --- a/README.md +++ b/README.md @@ -115,11 +115,21 @@ Save what we just learned as a skill Not every task produces a skill. It only extracts knowledge that required actual discovery (not just reading docs), will help with future tasks, has clear trigger conditions, and has been verified to work. +### Tentative Knowledge + +Not all patterns are ready for full skill extraction. When a discovery meets some but not all quality criteria (e.g., observed once but not yet verified across contexts), Claudeception saves it as a **tentative note** — a lightweight YAML file in `memory/tentative/` with a confidence score. + +- **Confidence scoring**: starts at 0.4, increases with repeated observations (+0.15 same context, +0.20 different context) and user confirmation (+0.30), decreases with counter-examples (−0.20) +- **Automatic promotion**: when confidence reaches 0.7 with 2+ observations from distinct sessions, the note is suggested for promotion to a full skill +- **Expiry**: notes that go 180 days without new observations are auto-deleted; low-confidence notes expire sooner + +See `resources/instinct-template.yaml` for the YAML schema and `resources/tentative-knowledge.md` for detailed rules. + ## Research The idea comes from academic work on skill libraries for AI agents. -[Voyager](https://arxiv.org/abs/2305.16291) (Wang et al., 2023) showed that game-playing agents can build up libraries of reusable skills over time, and that this helps them avoid re-learning things they already figured out. [CASCADE](https://arxiv.org/abs/2512.23880) (2024) introduced "meta-skills" (skills for acquiring skills), which is what this is. [SEAgent](https://arxiv.org/abs/2508.04700) (2025) showed agents can learn new software environments through trial and error, which inspired the retrospective feature. [Reflexion](https://arxiv.org/abs/2303.11366) (Shinn et al., 2023) showed that self-reflection helps. 
+[Voyager](https://arxiv.org/abs/2305.16291) (Wang et al., 2023) showed that game-playing agents can build up libraries of reusable skills over time, and that this helps them avoid re-learning things they already figured out. [CASCADE](https://arxiv.org/abs/2512.23880) (2025) introduced "meta-skills" (skills for acquiring skills), which is what this is. [SEAgent](https://arxiv.org/abs/2508.04700) (2025) showed agents can learn new software environments through trial and error, which inspired the retrospective feature. [Reflexion](https://arxiv.org/abs/2303.11366) (Shinn et al., 2023) showed that self-reflection helps. Agents that persist what they learn do better than agents that start fresh. diff --git a/SKILL.md b/SKILL.md index 69db71b..33ea18d 100644 --- a/SKILL.md +++ b/SKILL.md @@ -5,9 +5,11 @@ description: | Triggers: (1) /claudeception command to review session learnings, (2) "save this as a skill" or "extract a skill from this", (3) "what did we learn?", (4) After any task involving non-obvious debugging, workarounds, or trial-and-error discovery. Creates new Claude Code - skills when valuable, reusable knowledge is identified. + skills when valuable, reusable knowledge is identified. Also captures tentative notes + for emerging patterns not yet ready for full skill extraction — lightweight YAML notes + with confidence scoring in memory/tentative/. author: Claude Code -version: 3.0.0 +version: 3.1.0 allowed-tools: - Read - Write @@ -110,6 +112,29 @@ Analyze what was learned: - What would someone need to know to solve this faster next time? - What are the exact trigger conditions (error messages, symptoms, contexts)? 
+### Step 2.5: Triage — Full Skill vs Tentative Note + +After identifying the knowledge, decide which extraction path to take: + +| Criteria Check | Path | +|---------------|------| +| All 4 Quality Criteria met (Reusable + Non-trivial + Specific + Verified) | **Full skill** → continue to Step 3 | +| Specific is met, plus at least 1 other criterion has partial evidence | **Tentative note** → see below | +| Specific is met but no other criterion shows even partial evidence | **Discard** — too thin to be useful | +| Cannot describe a clear trigger + action (Specific not met) | **Discard** — not worth capturing | + +"Partial evidence" means at least Non-trivial or Reusable shows initial signs (e.g., "this pattern +likely applies elsewhere but hasn't been verified across contexts"). A 3-of-4 case (e.g., Reusable + +Non-trivial + Specific but not Verified) takes the tentative path — missing any criterion disqualifies +from full skill extraction. + +**Tentative note path** (skips Steps 3-6): +1. Ensure `memory/tentative/` directory exists (Write tool creates parent directories automatically) +2. Check for existing notes: match by filename `{name}.yaml`, fall back to LLM-assessed trigger similarity +3. If match found: update `confidence`, add observation entry, update `last_seen` +4. If new: create YAML note from `resources/instinct-template.yaml` with initial confidence 0.4 +5. 
See `resources/tentative-knowledge.md` for detailed confidence rules and edge cases + ### Step 3: Research Best Practices (When Appropriate) Before creating the skill, search the web for current information when: @@ -124,7 +149,7 @@ Before creating the skill, search the web for current information when: **When to search:** - The topic involves specific technologies, frameworks, or tools - You're uncertain about current best practices -- The solution might have changed after January 2025 (knowledge cutoff) +- The solution might have changed after May 2025 (knowledge cutoff) - There might be official documentation or community standards - You want to verify your understanding is current @@ -227,10 +252,15 @@ executable helpers. When `/claudeception` is invoked at the end of a session: 1. **Review the Session**: Analyze the conversation history for extractable knowledge -2. **Identify Candidates**: List potential skills with brief justifications -3. **Prioritize**: Focus on the highest-value, most reusable knowledge -4. **Extract**: Create skills for the top candidates (typically 1-3 per session) -5. **Summarize**: Report what skills were created and why +2. **Scan Tentative Notes**: Check `memory/tentative/*.yaml` for: + - Existing notes that match observations from this session (update confidence) + - Notes meeting promotion threshold (confidence >= 0.7, observations >= 2 from distinct sessions) + - Notes declined for promotion twice in separate sessions (skip auto-suggest; see `resources/tentative-knowledge.md` § Promotion Declined Twice) + - Stale notes past expiry thresholds (flag for cleanup or auto-delete per expiry rules; promotion takes precedence over stale) +3. **Identify Candidates**: List potential skills (from session + promoted tentative notes) +4. **Prioritize**: Focus on the highest-value, most reusable knowledge +5. **Extract**: Create skills for the top candidates (typically 1-3 per session) +6. 
**Summarize**: Report what skills were created, tentative notes updated, and promotions suggested ## Self-Reflection Prompts @@ -267,6 +297,8 @@ Before finalizing a skill, verify: - [ ] Web research conducted when appropriate (for technology-specific topics) - [ ] References section included if web sources were consulted - [ ] Current best practices (post-2025) incorporated when relevant +- [ ] If knowledge doesn't meet all 4 Quality Criteria: considered tentative note path before discarding +- [ ] Tentative notes contain valid trigger condition and confidence in [0.1, 0.95] ## Anti-Patterns to Avoid @@ -280,29 +312,76 @@ Before finalizing a skill, verify: Skills should evolve: +0. **Tentative**: Lightweight YAML note with confidence scoring; may be promoted or expire 1. **Creation**: Initial extraction with documented verification 2. **Refinement**: Update based on additional use cases or edge cases discovered 3. **Deprecation**: Mark as deprecated when underlying tools/patterns change 4. **Archival**: Remove or archive skills that are no longer relevant +## Tentative Knowledge Management + +Tentative notes capture emerging patterns that don't yet meet all 4 Quality Criteria. They live +in `memory/tentative/` as lightweight YAML files, accumulating confidence through repeated +observations until they are promoted to full skills or expire. + +**Schema**: See `resources/instinct-template.yaml` for the YAML template with field documentation. + +### Confidence Rules (Summary) + +| Event | Delta | +|-------|-------| +| Initial observation | 0.4 (starting value) | +| Re-observed in same context | +0.15 | +| Observed in different context | +0.20 | +| User explicit confirmation | +0.30 (confidence only; does NOT count as an observation) | +| Counter-example observed | −0.20 | + +Confidence is clamped to [0.1, 0.95] after each adjustment. 
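The confidence table above can be read as a small update rule. The following is a minimal Python sketch of that rule, not part of the skill itself: the names `DELTAS`, `clamp`, and `apply_events` are hypothetical, and rounding to two decimals is an added assumption to keep floating-point noise out of later threshold comparisons.

```python
# Illustrative sketch only: the event keys and function names are hypothetical.
INITIAL_CONFIDENCE = 0.4
FLOOR, CEILING = 0.1, 0.95

DELTAS = {
    "same_context": 0.15,
    "different_context": 0.20,
    "user_confirmation": 0.30,  # adjusts confidence only; not an observation
    "counter_example": -0.20,
}


def clamp(value, low=FLOOR, high=CEILING):
    """Keep a confidence value inside [low, high]."""
    return max(low, min(high, value))


def apply_events(confidence, events):
    """Apply each event's delta in the order given, clamping after each step."""
    for event in events:
        # Rounding is an assumption of this sketch, not a rule from the skill.
        confidence = round(clamp(confidence + DELTAS[event]), 2)
    return confidence


# New context, then user confirmation: 0.4 -> 0.6 -> 0.9
print(apply_events(INITIAL_CONFIDENCE, ["different_context", "user_confirmation"]))
```

Because clamping happens after every step, a sequence that hits a bound can give a different result depending on event order, so the caller decides the order in which events are passed.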
+ +### Promotion + +A tentative note is eligible for promotion when **both** conditions are met: +- `confidence >= 0.7` +- `observations >= 2` from **>= 2 distinct sessions or dates** + +During Retrospective Mode, eligible notes are presented for user confirmation. If confirmed, +the note's content pre-fills Steps 3-6 to create a full skill; the YAML file is then deleted. + +### Expiry + +| Condition | Action | +|-----------|--------| +| 90 days since `last_seen`, no new observation | Mark stale; prompt for cleanup | +| 180 days since `last_seen` | Auto-delete | +| `confidence < 0.3` AND 60 days since `last_seen` | Early delete | + +If a note simultaneously meets promotion thresholds and stale criteria, **promotion takes precedence**. + +### Detailed Rules + +See `resources/tentative-knowledge.md` for complete rules on confidence arithmetic edge cases, +promotion protocol, expiry details, deduplication strategy, and cross-project aggregation placeholder. + ## Example: Complete Extraction Flow **Scenario**: While debugging a Next.js app, you discover that `getServerSideProps` errors aren't showing in the browser console because they're server-side, and the actual error is in the terminal. -**Step 1 - Identify the Knowledge**: +**Step 1 - Check for Existing Skills**: No matching skills found. + +**Step 2 - Identify the Knowledge**: - Problem: Server-side errors don't appear in browser console - Non-obvious aspect: Expected behavior for server-side code in Next.js - Trigger: Generic error page with empty browser console -**Step 2 - Research Best Practices**: +**Step 3 - Research Best Practices**: Search: "Next.js getServerSideProps error handling best practices 2026" - Found official docs on error handling - Discovered recommended patterns for try-catch in data fetching - Learned about error boundaries for server components -**Step 3-5 - Structure and Save**: +**Steps 4-6 - Structure and Save**: **Extraction**: @@ -353,6 +432,9 @@ and line numbers. 
- [Next.js Error Handling](https://nextjs.org/docs/pages/building-your-application/routing/error-handling) ``` +> **Tentative path**: If Step 2.5 routes to tentative, skip Steps 3-6 and create a YAML note per +> `resources/instinct-template.yaml` with initial confidence 0.4. See Step 2.5 for the full flow. + ## Integration with Workflow ### Automatic Trigger Conditions diff --git a/WARP.md b/WARP.md index e6ec70c..6db5f68 100644 --- a/WARP.md +++ b/WARP.md @@ -4,12 +4,14 @@ This file provides guidance to WARP (warp.dev) when working with code in this re ## Project Overview -Claudeception is a **Claude Code skill** for continuous learning—it enables Claude Code to autonomously extract and preserve learned knowledge into reusable skills. It is not an application codebase but rather a skill definition with documentation and examples. +Claudeception is a **Claude Code skill** for continuous learning—it enables Claude Code to autonomously extract and preserve learned knowledge into reusable skills. It also captures emerging patterns as tentative YAML notes with confidence scoring, promoting them to full skills after repeated observations. It is not an application codebase but rather a skill definition with documentation and examples. ## Key Files - `SKILL.md` — The main skill definition (YAML frontmatter + instructions). This is what Claude Code loads. 
- `resources/skill-template.md` — Template for creating new skills +- `resources/instinct-template.yaml` — YAML template for tentative knowledge notes +- `resources/tentative-knowledge.md` — Detailed rules for confidence scoring, promotion, and expiry - `examples/` — Sample extracted skills demonstrating proper format ## Skill File Format diff --git a/resources/instinct-template.yaml b/resources/instinct-template.yaml new file mode 100644 index 0000000..49e98f7 --- /dev/null +++ b/resources/instinct-template.yaml @@ -0,0 +1,27 @@ +# TEMPLATE — do not place in memory/tentative/ +# Tentative Knowledge Note Template for Claudeception +# Storage: ~/.claude/projects//memory/tentative/.yaml +# Created by claudeception when knowledge doesn't yet meet all 4 Quality Criteria + +name: example-pattern-name # kebab-case, descriptive; used as filename +trigger: | + When [specific condition/symptom/error message] +action: | + Then [what to do / what the pattern suggests] +confidence: 0.4 # float [0.1, 0.95]; initial = 0.4 +observations: # each real observation event (NOT user confirmations) + - date: "YYYY-MM-DD" # ISO 8601 + summary: "Observed X while doing Y" + session: "brief session context" +first_seen: "YYYY-MM-DD" # ISO 8601; set on creation +last_seen: "YYYY-MM-DD" # ISO 8601; updated on each observation + +tags: # categorization for retrieval + - domain-tag # e.g., "r-tidyverse", "git", "windows" + - context-tag # e.g., "debugging", "configuration" + +# --- Optional fields --- +source: "session" # how created: session | manual | retrospective +related_skills: [] # links to existing skills if partially overlapping +counter_examples: [] # observations contradicting this pattern (triggers -0.20) +promotion_declined: [] # ISO 8601 dates when user declined promotion; >= 2 entries → auto-skip diff --git a/resources/research-references.md b/resources/research-references.md index 175b216..cd92763 100644 --- a/resources/research-references.md +++ 
b/resources/research-references.md @@ -26,8 +26,8 @@ This document compiles the academic research that informed the design of Claudec ### CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution -**Authors**: [Research Team] -**Published**: December 2024 +**Authors**: Huang, Chen, Fei, Li, Schwaller, Ceder +**Published**: December 2025 **URL**: https://arxiv.org/abs/2512.23880 **Key Contribution**: Self-evolving agentic framework demonstrating the transition from "LLM + tool use" to "LLM + skill acquisition." @@ -80,8 +80,9 @@ This document compiles the academic research that informed the design of Claudec ### EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines -**Authors**: [Research Team] -**Published**: 2024 +**Authors**: Zhang, Yuan, Guo, Yu, Xu, Chen, Li, Yang, Guan, Tang, Hu, Zhang, Chen, Wang +**Published**: January 2026 +**URL**: https://arxiv.org/abs/2601.09465 **Key Contribution**: Self-evolving framework with experience pools for continuous learning. diff --git a/resources/tentative-knowledge.md b/resources/tentative-knowledge.md new file mode 100644 index 0000000..b48c012 --- /dev/null +++ b/resources/tentative-knowledge.md @@ -0,0 +1,112 @@ +# Tentative Knowledge — Detailed Rules + +Reference document for the Claudeception Tentative Knowledge Layer. +SKILL.md contains summaries; this file has the complete rules and edge cases. 
+ +## Confidence Arithmetic + +### Initial Assignment + +- **Single observation, unverified**: 0.4 +- **Single observation + user verbal confirmation in same session**: 0.4 + 0.30 = 0.70 + (but promotion is still blocked; see Promotion Protocol below) + +### Adjustment Events + +| Event | Delta | Example | +|-------|-------|---------| +| Re-observed in same context | +0.15 | Same project, same problem domain, same tool | +| Observed in different context | +0.20 | Different project OR different problem domain | +| User explicit confirmation | +0.30 | User says "yes that's a real pattern" | +| Counter-example observed | −0.20 | Pattern fails or contradicts in a new scenario | + +### Clamping + +After every adjustment: `confidence = clamp(confidence + delta, 0.1, 0.95)` + +**Simultaneous events in one session**: Apply deltas sequentially in observation order, +clamping after each. Example: observe in new context (+0.20) then user confirms (+0.30): +0.4 → 0.60 → 0.90. When all deltas are positive (or all negative), order does not affect +the final result. However, if a counter-example (−0.20) is mixed with positive deltas in +the same session, order CAN matter because clamping at either bound discards part of a delta. In that case, apply the +counter-example deltas last to avoid premature floor-clamping that inflates the result. + +### Context Differentiation + +- **Same context**: same project (by working directory) + same problem domain + same tool +- **Different context**: different project OR different problem domain +- Ambiguous cases: use LLM judgment; when uncertain, treat as same context (+0.15, conservative) + +## Promotion Protocol + +### Eligibility + +Both conditions must be met: +1. `confidence >= 0.7` +2. `observations.length >= 2`, from **>= 2 distinct sessions or dates** + +**Critical rule**: User explicit confirmation (+0.30) adjusts confidence only. +It does **NOT** add an entry to the `observations` list. 
This prevents a single-observation +note from being promoted via confirmation alone (the "fast-track loophole"). + +### Promotion Steps (during Retrospective Mode) + +1. Scan `memory/tentative/*.yaml` for eligible notes +2. Display each candidate: name, trigger, action, confidence, observation count +3. Ask user: "This pattern has been observed N times (confidence C). Promote to full skill?" +4. **If confirmed**: run Extraction Process Steps 3-6, pre-filling from the note: + - `trigger` → Context / Trigger Conditions section + - `action` → Solution section (starting point) + - `observations` → Example section (pick most illustrative) + - `tags` → description keywords +5. **If declined**: keep as tentative; append current date to `promotion_declined` list in the YAML file +6. After successful promotion: delete the YAML file + +### Promotion Declined Twice + +If the same note's `promotion_declined` list contains >= 2 entries from distinct sessions, +stop suggesting it automatically during Retrospective Mode scans. The note remains in +`memory/tentative/` and can still be manually promoted if the user explicitly requests it +(e.g., "promote [note-name] to skill"), which bypasses the auto-skip filter. + +### Stale-but-Promotable Precedence + +If a note simultaneously meets promotion thresholds AND 90-day stale criteria, +**promotion takes precedence**. Rationale: the note has enough evidence to be useful; +staleness just means the pattern hasn't recurred recently, not that it's invalid. 
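The eligibility and declined-twice rules above can be sketched as two predicates. This is a hedged illustration rather than the skill's implementation: the `Observation` and `Note` classes are hypothetical containers that reuse field names from `resources/instinct-template.yaml`. Two readings are assumptions of the sketch: "distinct sessions or dates" means at least two distinct `session` values or at least two distinct `date` values, and distinct decline dates stand in for distinct sessions.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    date: str      # ISO 8601, e.g. "2026-01-15"
    summary: str
    session: str


@dataclass
class Note:
    name: str
    confidence: float
    observations: list = field(default_factory=list)
    promotion_declined: list = field(default_factory=list)  # ISO 8601 dates


def is_eligible(note):
    """Both promotion conditions must hold; confirmations never add observations."""
    distinct_sessions = {obs.session for obs in note.observations}
    distinct_dates = {obs.date for obs in note.observations}
    # Assumption: either two distinct sessions or two distinct dates is enough.
    spread = max(len(distinct_sessions), len(distinct_dates))
    return (
        note.confidence >= 0.7
        and len(note.observations) >= 2
        and spread >= 2
    )


def should_auto_suggest(note):
    """Skip automatic suggestion once promotion was declined on two distinct dates."""
    return is_eligible(note) and len(set(note.promotion_declined)) < 2
```

A note with one observation and a user confirmation reaches confidence 0.70 but fails `is_eligible`, which is the fast-track loophole the critical rule closes. A manual promotion request would bypass `should_auto_suggest` entirely.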
+ +## Expiry Rules + +| Condition | Action | Rationale | +|-----------|--------|-----------| +| 90 days since `last_seen`, no new observation | Mark stale; prompt user for cleanup | Knowledge may be outdated | +| 180 days since `last_seen` | Auto-delete the YAML file | Definitely stale; reduce clutter | +| `confidence < 0.3` AND 60 days since `last_seen` | Early delete | Low confidence + idle = noise | + +"Days since last_seen" is calculated from `last_seen` field in the YAML, not from file modification time. + +## Deduplication Strategy + +Tentative notes use `{name}.yaml` as filename (e.g., `memory/tentative/git-rebase-conflict-resolution.yaml`). + +**Matching priority when creating a new note**: +1. **Filename exact match**: if `memory/tentative/{name}.yaml` already exists, update it +2. **Trigger semantic similarity**: read existing YAML files, compare trigger descriptions + using LLM judgment (not a programmatic similarity metric). If a semantically similar + note exists, update that note instead of creating a duplicate +3. **No match**: create new file + +## Edge Cases + +- **Note contradicts existing skill**: Add counter-example to the note; do NOT modify the skill. + Flag for review during next Retrospective Mode. +- **Two notes should merge**: During Retrospective, if two notes have overlapping triggers, + merge into one: combine observations lists, average confidences, keep the more descriptive trigger. +- **Namespace collision**: If two different patterns naturally produce the same kebab-case name, + append a numeric suffix (e.g., `pattern-name-2.yaml`). + +## Future: Cross-Project Aggregation + +**Not yet implemented.** Planned feature: when the same pattern appears in `memory/tentative/` +across 2+ distinct projects with average confidence >= 0.8, suggest global promotion to a +user-level skill (installed in `~/.claude/skills/`). Design details pending future development.
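The Expiry Rules table in `resources/tentative-knowledge.md` can be sketched as a single classification function. This is a minimal sketch under stated assumptions, not the skill's implementation: the function name `expiry_action` and its return labels are hypothetical, the day thresholds are treated as inclusive, and the 180-day rule is checked before the promotion precedence, since the rules only state that promotion outranks the 90-day stale condition.

```python
from datetime import date


def expiry_action(last_seen, confidence, today, promotable=False):
    """Classify a note from its `last_seen` field, not from file modification time."""
    idle_days = (today - date.fromisoformat(last_seen)).days
    if idle_days >= 180:
        return "delete"          # auto-delete the YAML file
    if confidence < 0.3 and idle_days >= 60:
        return "early-delete"    # low confidence and idle
    if idle_days >= 90:
        # Promotion takes precedence over the 90-day stale condition.
        return "promote" if promotable else "stale"
    return "keep"


# 90 idle days with mid-range confidence: marked stale and flagged for cleanup
print(expiry_action("2026-01-01", 0.5, date(2026, 4, 1)))
```

Passing `today` explicitly keeps the function deterministic, which makes the thresholds easy to check against fixed dates.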