blader · peiyuan-ran-huang · Apr 5, 2026
diff --git a/README.md b/README.md
@@ -115,11 +115,21 @@ Save what we just learned as a skill
 
 Not every task produces a skill. It only extracts knowledge that required actual discovery (not just reading docs), will help with future tasks, has clear trigger conditions, and has been verified to work.
 
+### Tentative Knowledge
+
+Not all patterns are ready for full skill extraction. When a discovery meets some but not all quality criteria (e.g., observed once but not yet verified across contexts), Claudeception saves it as a **tentative note** — a lightweight YAML file in `memory/tentative/` with a confidence score.
+
+- **Confidence scoring**: starts at 0.4, increases with repeated observations (+0.15 same context, +0.20 different context) and user confirmation (+0.30), decreases with counter-examples (−0.20), and is clamped to [0.1, 0.95]
+- **Automatic promotion**: when confidence reaches 0.7 with 2+ observations from distinct sessions, the note is suggested for promotion to a full skill; user confirmations can raise confidence but do **not** count as observations toward this requirement
+- **Expiry**: notes with no new observations are marked stale after 90 days and auto-deleted after 180 days; notes with confidence below 0.3 are deleted early, after 60 days
+
+See `resources/instinct-template.yaml` for the YAML schema and `resources/tentative-knowledge.md` for detailed rules.
+
 ## Research
 
 The idea comes from academic work on skill libraries for AI agents.
 
-[Voyager](https://arxiv.org/abs/2305.16291) (Wang et al., 2023) showed that game-playing agents can build up libraries of reusable skills over time, and that this helps them avoid re-learning things they already figured out. [CASCADE](https://arxiv.org/abs/2512.23880) (2024) introduced "meta-skills" (skills for acquiring skills), which is what this is. [SEAgent](https://arxiv.org/abs/2508.04700) (2025) showed agents can learn new software environments through trial and error, which inspired the retrospective feature. [Reflexion](https://arxiv.org/abs/2303.11366) (Shinn et al., 2023) showed that self-reflection helps.
+[Voyager](https://arxiv.org/abs/2305.16291) (Wang et al., 2023) showed that game-playing agents can build up libraries of reusable skills over time, and that this helps them avoid re-learning things they already figured out. [CASCADE](https://arxiv.org/abs/2512.23880) (2025) introduced "meta-skills" (skills for acquiring skills), which is what this is. [SEAgent](https://arxiv.org/abs/2508.04700) (2025) showed agents can learn new software environments through trial and error, which inspired the retrospective feature. [Reflexion](https://arxiv.org/abs/2303.11366) (Shinn et al., 2023) showed that self-reflection helps.
 
 Agents that persist what they learn do better than agents that start fresh.
 

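The confidence rules summarized in the README (and tabulated in SKILL.md) can be illustrated with a short sketch. This is not code from the repository: the `Note` class, the event names, the use of a session or date string as the "distinct session" key, and rounding to two decimals are all assumptions made for the example. Only the numeric values and the promotion condition come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Numeric values from the confidence table; the event names are hypothetical.
INITIAL_CONFIDENCE = 0.4
DELTAS = {
    "same_context": 0.15,       # re-observed in the same context
    "different_context": 0.20,  # observed in a different context
    "user_confirmation": 0.30,  # raises confidence only, not an observation
    "counter_example": -0.20,   # contradicting observation
}
OBSERVATION_EVENTS = {"same_context", "different_context"}  # count toward promotion
MIN_CONF, MAX_CONF = 0.1, 0.95


@dataclass
class Note:
    confidence: float = INITIAL_CONFIDENCE
    sessions: list[str] = field(default_factory=list)  # one key per observation

    @classmethod
    def new(cls, session: str) -> Note:
        # The initial observation sets confidence to 0.4 and counts as observation #1.
        return cls(INITIAL_CONFIDENCE, [session])

    def apply(self, event: str, session: str) -> None:
        raw = self.confidence + DELTAS[event]
        # Clamp to [0.1, 0.95], then round so repeated float additions do not drift.
        self.confidence = round(min(MAX_CONF, max(MIN_CONF, raw)), 2)
        if event in OBSERVATION_EVENTS:
            self.sessions.append(session)

    def promotable(self) -> bool:
        # confidence >= 0.7 AND 2+ observations from 2+ distinct sessions/dates
        return (
            self.confidence >= 0.7
            and len(self.sessions) >= 2
            and len(set(self.sessions)) >= 2
        )


note = Note.new("2026-04-01")
note.apply("user_confirmation", "2026-04-01")  # confidence 0.7, still one observation
print(note.promotable())                         # False
note.apply("different_context", "2026-04-03")  # confidence 0.9, two distinct sessions
print(note.promotable())                         # True
```

Keeping confirmations out of `sessions` is what enforces the README's caveat: a note confirmed repeatedly in a single session can reach 0.95 but still has only one observation and is never suggested for promotion.
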
diff --git a/SKILL.md b/SKILL.md
@@ -5,9 +5,11 @@ description: |
   Triggers: (1) /claudeception command to review session learnings, (2) "save this as a skill"
   or "extract a skill from this", (3) "what did we learn?", (4) After any task involving
   non-obvious debugging, workarounds, or trial-and-error discovery. Creates new Claude Code
-  skills when valuable, reusable knowledge is identified.
+  skills when valuable, reusable knowledge is identified. Also captures tentative notes
+  for emerging patterns not yet ready for full skill extraction — lightweight YAML notes
+  with confidence scoring in memory/tentative/.
 author: Claude Code
-version: 3.0.0
+version: 3.1.0
 allowed-tools:
   - Read
   - Write
@@ -110,6 +112,29 @@ Analyze what was learned:
 - What would someone need to know to solve this faster next time?
 - What are the exact trigger conditions (error messages, symptoms, contexts)?
 
+### Step 2.5: Triage — Full Skill vs Tentative Note
+
+After identifying the knowledge, decide which extraction path to take:
+
+| Criteria Check | Path |
+|---------------|------|
+| All 4 Quality Criteria met (Reusable + Non-trivial + Specific + Verified) | **Full skill** → continue to Step 3 |
+| Specific is met, plus Non-trivial or Reusable has partial evidence | **Tentative note** → see below |
+| Specific is met but neither Non-trivial nor Reusable shows even partial evidence | **Discard**: too thin to be useful |
+| Cannot describe a clear trigger + action (Specific not met) | **Discard**: not worth capturing |
+
+"Partial evidence" means early signs of Non-trivial or Reusable (e.g., "this pattern
+likely applies elsewhere but hasn't been verified across contexts"). Verified alone does not qualify
+a note for the tentative path. A 3-of-4 case (e.g., Reusable + Non-trivial + Specific but not Verified)
+takes the tentative path, since missing any criterion disqualifies it from full skill extraction.
+
+**Tentative note path** (skips Steps 3-6):
+1. Ensure the `memory/tentative/` directory exists (the Write tool creates parent directories automatically)
+2. Check for existing notes: match by filename `{name}.yaml`, falling back to LLM-assessed trigger similarity
+3. If match found: update `confidence`, add observation entry, update `last_seen`
+4. If new: create YAML note from `resources/instinct-template.yaml` with initial confidence 0.4
+5. See `resources/tentative-knowledge.md` for detailed confidence rules and edge cases
+
 ### Step 3: Research Best Practices (When Appropriate)
 
 Before creating the skill, search the web for current information when:
@@ -124,7 +149,7 @@ Before creating the skill, search the web for current information when:
 **When to search:**
 - The topic involves specific technologies, frameworks, or tools
 - You're uncertain about current best practices
-- The solution might have changed after January 2025 (knowledge cutoff)
+- The solution might have changed after May 2025 (knowledge cutoff)
 - There might be official documentation or community standards
 - You want to verify your understanding is current
 
@@ -227,10 +252,15 @@ executable helpers.
 When `/claudeception` is invoked at the end of a session:
 
 1. **Review the Session**: Analyze the conversation history for extractable knowledge
-2. **Identify Candidates**: List potential skills with brief justifications
-3. **Prioritize**: Focus on the highest-value, most reusable knowledge
-4. **Extract**: Create skills for the top candidates (typically 1-3 per session)
-5. **Summarize**: Report what skills were created and why
+2. **Scan Tentative Notes**: Check `memory/tentative/*.yaml` for:
+   - Existing notes that match observations from this session (update confidence)
+   - Notes meeting promotion threshold (confidence >= 0.7, observations >= 2 from distinct sessions)
+   - Notes declined for promotion twice in separate sessions (skip auto-suggest; see `resources/tentative-knowledge.md` § Promotion Declined Twice)
+   - Stale notes past expiry thresholds (flag for cleanup or auto-delete per expiry rules; promotion takes precedence over stale)
+3. **Identify Candidates**: List potential skills (from session + promoted tentative notes)
+4. **Prioritize**: Focus on the highest-value, most reusable knowledge
+5. **Extract**: Create skills for the top candidates (typically 1-3 per session)
+6. **Summarize**: Report what skills were created, tentative notes updated, and promotions suggested
 
 ## Self-Reflection Prompts
 
@@ -267,6 +297,8 @@ Before finalizing a skill, verify:
 - [ ] Web research conducted when appropriate (for technology-specific topics)
 - [ ] References section included if web sources were consulted
 - [ ] Current best practices (post-2025) incorporated when relevant
+- [ ] If knowledge doesn't meet all 4 Quality Criteria: considered tentative note path before discarding
+- [ ] Tentative notes contain valid trigger condition and confidence in [0.1, 0.95]
 
 ## Anti-Patterns to Avoid
 
@@ -280,29 +312,76 @@ Before finalizing a skill, verify:
 
 Skills should evolve:
 
+0. **Tentative**: Lightweight YAML note with confidence scoring; may be promoted or expire
 1. **Creation**: Initial extraction with documented verification
 2. **Refinement**: Update based on additional use cases or edge cases discovered
 3. **Deprecation**: Mark as deprecated when underlying tools/patterns change
 4. **Archival**: Remove or archive skills that are no longer relevant
 
+## Tentative Knowledge Management
+
+Tentative notes capture emerging patterns that don't yet meet all 4 Quality Criteria. They live
+in `memory/tentative/` as lightweight YAML files, accumulating confidence through repeated
+observations until they are promoted to full skills or expire.
+
+**Schema**: See `resources/instinct-template.yaml` for the YAML template with field documentation.
+
+### Confidence Rules (Summary)
+
+| Event | Delta |
+|-------|-------|
+| Initial observation | 0.4 (starting value) |
+| Re-observed in same context | +0.15 |
+| Observed in different context | +0.20 |
+| User explicit confirmation | +0.30 (confidence only; does NOT count as an observation) |
+| Counter-example observed | −0.20 |
+
+Confidence is clamped to [0.1, 0.95] after each adjustment.
+
+### Promotion
+
+A tentative note is eligible for promotion when **both** conditions are met:
+- `confidence >= 0.7`
+- `observations >= 2` from **>= 2 distinct sessions or dates**
+
+During Retrospective Mode, eligible notes are presented for user confirmation. If confirmed,
+the note's content pre-fills Steps 3-6 to create a full skill; the YAML file is then deleted.
+
+### Expiry
+
+| Condition | Action |
+|-----------|--------|
+| 90 days since `last_seen`, no new observation | Mark stale; prompt for cleanup |
+| 180 days since `last_seen` | Auto-delete |
+| `confidence < 0.3` AND 60 days since `last_seen` | Early delete |
+
+If a note simultaneously meets promotion thresholds and stale criteria, **promotion takes precedence**.
+
+### Detailed Rules
+
+See `resources/tentative-knowledge.md` for complete rules on confidence arithmetic edge cases,
+promotion protocol, expiry details, deduplication strategy, and the cross-project aggregation placeholder.
+
 ## Example: Complete Extraction Flow
 
 **Scenario**: While debugging a Next.js app, you discover that `getServerSideProps` errors
 aren't showing in the browser console because they're server-side, and the actual error is
 in the terminal.
 
-**Step 1 - Identify the Knowledge**:
+**Step 1 - Check for Existing Skills**: No matching skills found.
+
+**Step 2 - Identify the Knowledge**:
 - Problem: Server-side errors don't appear in browser console
 - Non-obvious aspect: Expected behavior for server-side code in Next.js
 - Trigger: Generic error page with empty browser console
 
-**Step 2 - Research Best Practices**:
+**Step 3 - Research Best Practices**:
 Search: "Next.js getServerSideProps error handling best practices 2026"
 - Found official docs on error handling
 - Discovered recommended patterns for try-catch in data fetching
 - Learned about error boundaries for server components
 
-**Step 3-5 - Structure and Save**:
+**Steps 4-6 - Structure and Save**:
 
 **Extraction**:
 
@@ -353,6 +432,9 @@ and line numbers.
 - [Next.js Error Handling](https://nextjs.org/docs/pages/building-your-application/routing/error-handling)
 ```
 
+> **Tentative path**: If Step 2.5 routes to tentative, skip Steps 3-6 and create a YAML note per
+> `resources/instinct-template.yaml` with initial confidence 0.4. See Step 2.5 for the full flow.
+
 ## Integration with Workflow
 
 ### Automatic Trigger Conditions

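The Step 2.5 routing table reduces to a small decision function. The sketch below assumes each criterion has already been judged (in practice by the model, not by code) and encodes that judgment with a hypothetical three-level `Evidence` enum; the `triage` name and its return strings are also illustrative. Only the branch order and outcomes come from the table and the "partial evidence" paragraph.

```python
from enum import Enum


class Evidence(Enum):
    """Hypothetical encoding of how strongly a Quality Criterion is supported."""
    NONE = 0
    PARTIAL = 1
    MET = 2


def triage(specific: bool, non_trivial: Evidence,
           reusable: Evidence, verified: Evidence) -> str:
    """Route a candidate per Step 2.5: 'full_skill', 'tentative', or 'discard'."""
    if not specific:
        # No clear trigger + action: not worth capturing.
        return "discard"
    if all(e is Evidence.MET for e in (non_trivial, reusable, verified)):
        # All 4 Quality Criteria met: continue to Step 3.
        return "full_skill"
    if non_trivial is not Evidence.NONE or reusable is not Evidence.NONE:
        # Specific plus (at least partial) Non-trivial or Reusable.
        return "tentative"
    # Only Specific (and possibly Verified): too thin to be useful.
    return "discard"


# A 3-of-4 case (not Verified) takes the tentative path.
print(triage(True, Evidence.MET, Evidence.MET, Evidence.NONE))   # tentative
# Verified alone does not qualify a note for the tentative path.
print(triage(True, Evidence.NONE, Evidence.NONE, Evidence.MET))  # discard
```

Checking Specific first mirrors the table's last row: without a describable trigger and action, nothing else matters.
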
diff --git a/WARP.md b/WARP.md
@@ -4,12 +4,14 @@ This file provides guidance to WARP (warp.dev) when working with code in this re
 
 ## Project Overview
 
-Claudeception is a **Claude Code skill** for continuous learning—it enables Claude Code to autonomously extract and preserve learned knowledge into reusable skills. It is not an application codebase but rather a skill definition with documentation and examples.
+Claudeception is a **Claude Code skill** for continuous learning—it enables Claude Code to autonomously extract and preserve learned knowledge into reusable skills. It also captures emerging patterns as tentative YAML notes with confidence scoring, promoting them to full skills after repeated observations. It is not an application codebase but rather a skill definition with documentation and examples.
 
 ## Key Files
 
 - `SKILL.md` — The main skill definition (YAML frontmatter + instructions). This is what Claude Code loads.
 - `resources/skill-template.md` — Template for creating new skills
+- `resources/instinct-template.yaml` — YAML template for tentative knowledge notes
+- `resources/tentative-knowledge.md` — Detailed rules for confidence scoring, promotion, and expiry
 - `examples/` — Sample extracted skills demonstrating proper format
 
 ## Skill File Format

diff --git a/resources/instinct-template.yaml b/resources/instinct-template.yaml
@@ -0,0 +1,27 @@
+# TEMPLATE — do not place in memory/tentative/
+# Tentative Knowledge Note Template for Claudeception
+# Storage: ~/.claude/projects/<project>/memory/tentative/<name>.yaml
+# Created by Claudeception when knowledge doesn't yet meet all 4 Quality Criteria
+
+name: example-pattern-name          # kebab-case, descriptive; used as filename
+trigger: |
+  When [specific condition/symptom/error message]
+action: |
+  Then [what to do / what the pattern suggests]
+confidence: 0.4                     # float [0.1, 0.95]; initial = 0.4
+observations:                       # each real observation event (NOT user confirmations)
+  - date: "YYYY-MM-DD"             # ISO 8601
+    summary: "Observed X while doing Y"
+    session: "brief session context"
+first_seen: "YYYY-MM-DD"           # ISO 8601; set on creation
+last_seen: "YYYY-MM-DD"            # ISO 8601; updated on each observation
+
+tags:                               # categorization for retrieval
+  - domain-tag                      # e.g., "r-tidyverse", "git", "windows"
+  - context-tag                     # e.g., "debugging", "configuration"
+
+# --- Optional fields ---
+source: "session"                   # how created: session | manual | retrospective
+related_skills: []                  # links to existing skills if partially overlapping
+counter_examples: []                # observations contradicting this pattern (triggers -0.20)
+promotion_declined: []              # ISO 8601 dates when user declined promotion; >= 2 entries → auto-skip
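
The template's fields carry everything the Retrospective Mode scan needs (promotion eligibility, the declined-twice skip, and expiry). A minimal sketch, assuming the YAML has already been loaded into a plain dataclass (parsing is omitted); the `TentativeNote` and `scan` names, the inclusive `>=` day boundaries, counting `promotion_declined` entries without checking they come from separate sessions, and letting promotion take precedence over every expiry outcome (SKILL.md states this only for stale notes) are interpretations, not part of the PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class TentativeNote:
    # Field names mirror resources/instinct-template.yaml (subset).
    name: str
    confidence: float
    observations: list[dict[str, str]]  # each has "date", "summary", "session"
    last_seen: str                      # ISO 8601 "YYYY-MM-DD"
    promotion_declined: list[str] = field(default_factory=list)

    def promotion_eligible(self) -> bool:
        # confidence >= 0.7 and 2+ observations from 2+ distinct sessions or dates
        dates = {o["date"] for o in self.observations}
        sessions = {o["session"] for o in self.observations}
        return (self.confidence >= 0.7
                and len(self.observations) >= 2
                and max(len(dates), len(sessions)) >= 2)


def scan(note: TentativeNote, today: date) -> str:
    """Return 'suggest_promotion', 'delete', 'stale', or 'keep' for one note."""
    idle_days = (today - date.fromisoformat(note.last_seen)).days
    # Declined twice: no auto-suggest; the note falls through to expiry checks.
    if note.promotion_eligible() and len(note.promotion_declined) < 2:
        return "suggest_promotion"  # promotion takes precedence over expiry
    if idle_days >= 180:
        return "delete"             # auto-delete
    if note.confidence < 0.3 and idle_days >= 60:
        return "delete"             # early delete for low-confidence notes
    if idle_days >= 90:
        return "stale"              # mark stale; prompt for cleanup
    return "keep"
```
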
diff --git a/resources/research-references.md b/resources/research-references.md
@@ -26,8 +26,8 @@ This document compiles the academic research that informed the design of Claudec
 
 ### CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution
 
-**Authors**: [Research Team]  
-**Published**: December 2024  
+**Authors**: Huang, Chen, Fei, Li, Schwaller, Ceder  
+**Published**: December 2025  
 **URL**: https://arxiv.org/abs/2512.23880
 
 **Key Contribution**: Self-evolving agentic framework demonstrating the transition from "LLM + tool use" to "LLM + skill acquisition."
@@ -80,8 +80,9 @@ This document compiles the academic research that informed the design of Claudec
 
 ### EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
 
-**Authors**: [Research Team]  
-**Published**: 2024
+**Authors**: Zhang, Yuan, Guo, Yu, Xu, Chen, Li, Yang, Guan, Tang, Hu, Zhang, Chen, Wang  
+**Published**: January 2026  
+**URL**: https://arxiv.org/abs/2601.09465
 
 **Key Contribution**: Self-evolving framework with experience pools for continuous learning.