12 changes: 11 additions & 1 deletion README.md
@@ -115,11 +115,21 @@ Save what we just learned as a skill

Not every task produces a skill. It only extracts knowledge that required actual discovery (not just reading docs), will help with future tasks, has clear trigger conditions, and has been verified to work.

### Tentative Knowledge

Not all patterns are ready for full skill extraction. When a discovery meets some but not all quality criteria (e.g., observed once but not yet verified across contexts), Claudeception saves it as a **tentative note** — a lightweight YAML file in `memory/tentative/` with a confidence score.

- **Confidence scoring**: starts at 0.4, increases with repeated observations (+0.15 same context, +0.20 different context) and user confirmation (+0.30), decreases with counter-examples (−0.20)
- **Automatic promotion**: when confidence reaches 0.7 with 2+ observations from distinct sessions, the note is suggested for promotion to a full skill

Copilot AI Apr 5, 2026

This section summarizes confidence/promotion, but it doesn’t mention the anti-gaming rule that user confirmations adjust confidence but do not count as observations. Since confirmation affects promotion eligibility in the rest of the docs, it’d help to add a short note here so readers don’t assume “1 observation + confirmation” satisfies the “2+ observations” requirement.

Suggested change
- **Automatic promotion**: when confidence reaches 0.7 with 2+ observations from distinct sessions, the note is suggested for promotion to a full skill
- **Automatic promotion**: when confidence reaches 0.7 with 2+ observations from distinct sessions, the note is suggested for promotion to a full skill; user confirmations can raise confidence but do **not** count as observations toward this requirement

- **Expiry**: notes that go 180 days without new observations are auto-deleted; low-confidence notes expire sooner

See `resources/instinct-template.yaml` for the YAML schema and `resources/tentative-knowledge.md` for detailed rules.

## Research

The idea comes from academic work on skill libraries for AI agents.

[Voyager](https://arxiv.org/abs/2305.16291) (Wang et al., 2023) showed that game-playing agents can build up libraries of reusable skills over time, and that this helps them avoid re-learning things they already figured out. [CASCADE](https://arxiv.org/abs/2512.23880) (2024) introduced "meta-skills" (skills for acquiring skills), which is what this is. [SEAgent](https://arxiv.org/abs/2508.04700) (2025) showed agents can learn new software environments through trial and error, which inspired the retrospective feature. [Reflexion](https://arxiv.org/abs/2303.11366) (Shinn et al., 2023) showed that self-reflection helps.
[Voyager](https://arxiv.org/abs/2305.16291) (Wang et al., 2023) showed that game-playing agents can build up libraries of reusable skills over time, and that this helps them avoid re-learning things they already figured out. [CASCADE](https://arxiv.org/abs/2512.23880) (2025) introduced "meta-skills" (skills for acquiring skills), which is what this is. [SEAgent](https://arxiv.org/abs/2508.04700) (2025) showed agents can learn new software environments through trial and error, which inspired the retrospective feature. [Reflexion](https://arxiv.org/abs/2303.11366) (Shinn et al., 2023) showed that self-reflection helps.

Agents that persist what they learn do better than agents that start fresh.

102 changes: 92 additions & 10 deletions SKILL.md
@@ -5,9 +5,11 @@ description: |
Triggers: (1) /claudeception command to review session learnings, (2) "save this as a skill"
or "extract a skill from this", (3) "what did we learn?", (4) After any task involving
non-obvious debugging, workarounds, or trial-and-error discovery. Creates new Claude Code
skills when valuable, reusable knowledge is identified.
skills when valuable, reusable knowledge is identified. Also captures tentative notes
for emerging patterns not yet ready for full skill extraction — lightweight YAML notes
with confidence scoring in memory/tentative/.
author: Claude Code
version: 3.0.0
version: 3.1.0
allowed-tools:
- Read
- Write
@@ -110,6 +112,29 @@ Analyze what was learned:
- What would someone need to know to solve this faster next time?
- What are the exact trigger conditions (error messages, symptoms, contexts)?

### Step 2.5: Triage — Full Skill vs Tentative Note

After identifying the knowledge, decide which extraction path to take:

| Criteria Check | Path |
|---------------|------|
| All 4 Quality Criteria met (Reusable + Non-trivial + Specific + Verified) | **Full skill** → continue to Step 3 |
| Specific is met, plus at least 1 other criterion has partial evidence | **Tentative note** → see below |
| Specific is met but no other criterion shows even partial evidence | **Discard** — too thin to be useful |
| Cannot describe a clear trigger + action (Specific not met) | **Discard** — not worth capturing |

"Partial evidence" means at least Non-trivial or Reusable shows initial signs (e.g., "this pattern
likely applies elsewhere but hasn't been verified across contexts"). A 3-of-4 case (e.g., Reusable +
Non-trivial + Specific but not Verified) takes the tentative path — missing any criterion disqualifies
from full skill extraction.
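
The triage table can be read as a small decision function. The sketch below is one interpretation and not part of the PR: it assumes each criterion is rated `met`, `partial`, or `none` (a hypothetical three-level scale), and it follows the paragraph above in letting only Reusable or Non-trivial supply the partial evidence. `Path` and `triage` are illustrative names.

```python
from enum import Enum


class Path(Enum):
    FULL_SKILL = "full skill"
    TENTATIVE = "tentative note"
    DISCARD = "discard"


def triage(levels: dict) -> Path:
    """Map criterion ratings ('met', 'partial', 'none') to an extraction path.

    Expected keys (assumed): reusable, non_trivial, specific, verified.
    """
    # Without a clear trigger + action, nothing is worth capturing.
    if levels["specific"] != "met":
        return Path.DISCARD
    # All four criteria met: continue to Step 3.
    if all(level == "met" for level in levels.values()):
        return Path.FULL_SKILL
    # Assumption: only Reusable or Non-trivial can provide the partial evidence.
    if any(levels[name] in ("met", "partial") for name in ("reusable", "non_trivial")):
        return Path.TENTATIVE
    return Path.DISCARD
```

Under this reading, a discovery that is Specific and Verified but neither Reusable nor Non-trivial is discarded. If Verified is meant to count as the partial criterion, the third branch would need to include it.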
Comment on lines +122 to +129

Copilot AI Apr 5, 2026

In Step 2.5, the “Tentative note” row says “Specific is met, plus at least 1 other criterion has partial evidence”, but the paragraph below defines “partial evidence” specifically as early signs of Non-trivial or Reusable. To avoid mis-triage (e.g., treating “Specific + Verified” as tentative even if it’s trivial/non-reusable), consider tightening the table row wording to explicitly reference partial evidence for Non-trivial/Reusable (or clarify whether Verified can be the partial criterion).

Suggested change
| Specific is met, plus at least 1 other criterion has partial evidence | **Tentative note** → see below |
| Cannot describe a clear trigger + action (Specific not met) | **Discard** — not worth capturing |
"Partial evidence" means at least Non-trivial or Reusable shows initial signs (e.g., "this pattern
likely applies elsewhere but hasn't been verified across contexts"). A 3-of-4 case (e.g., Reusable +
Non-trivial + Specific but not Verified) takes the tentative path — missing any criterion disqualifies
from full skill extraction.
| Specific is met, plus Non-trivial or Reusable has partial evidence | **Tentative note** → see below |
| Cannot describe a clear trigger + action (Specific not met) | **Discard** — not worth capturing |
"Partial evidence" means early signs of Non-trivial or Reusable (e.g., "this pattern
likely applies elsewhere but hasn't been verified across contexts"). Verified alone does not qualify
a note for the tentative path. A 3-of-4 case (e.g., Reusable + Non-trivial + Specific but not Verified)
takes the tentative path — missing any criterion disqualifies from full skill extraction.

**Tentative note path** (skips Steps 3-6):
1. Ensure `memory/tentative/` directory exists (Write tool creates parent directories automatically)
2. Check for existing notes: match by filename `{name}.yaml`, fall back to LLM-assessed trigger similarity
3. If match found: update `confidence`, add observation entry, update `last_seen`
4. If new: create YAML note from `resources/instinct-template.yaml` with initial confidence 0.4
5. See `resources/tentative-knowledge.md` for detailed confidence rules and edge cases
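
As a rough illustration of steps 2-4, the sketch below keeps notes in an in-memory dict keyed by name instead of reading `memory/tentative/*.yaml`, so it runs without a YAML library. The function name, its parameters, and the default delta are assumptions for illustration; the LLM-assessed trigger-similarity fallback is left out because it cannot be expressed as plain code.

```python
from datetime import date


def record_observation(notes, name, trigger, action, summary, session, today, delta=0.15):
    """Create a tentative note, or update the existing one with the same name.

    `notes` stands in for memory/tentative/ (assumption: one dict entry per file).
    """
    entry = {"date": today.isoformat(), "summary": summary, "session": session}
    note = notes.get(name)
    if note is None:
        # Step 4: new note, initial confidence 0.4.
        note = {
            "name": name,
            "trigger": trigger,
            "action": action,
            "confidence": 0.4,
            "observations": [entry],
            "first_seen": today.isoformat(),
            "last_seen": today.isoformat(),
        }
        notes[name] = note
        return note
    # Step 3: update confidence (clamped to [0.1, 0.95]), add the observation,
    # and refresh last_seen. Rounding to 2 decimals is an assumption.
    note["confidence"] = round(min(0.95, max(0.1, note["confidence"] + delta)), 2)
    note["observations"].append(entry)
    note["last_seen"] = today.isoformat()
    return note
```

A caller would pass `delta=0.20` when the repeat observation comes from a different context, per the confidence rules.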

### Step 3: Research Best Practices (When Appropriate)

Before creating the skill, search the web for current information when:
@@ -124,7 +149,7 @@ Before creating the skill, search the web for current information when:
**When to search:**
- The topic involves specific technologies, frameworks, or tools
- You're uncertain about current best practices
- The solution might have changed after January 2025 (knowledge cutoff)
- The solution might have changed after May 2025 (knowledge cutoff)
- There might be official documentation or community standards
- You want to verify your understanding is current

@@ -227,10 +252,15 @@ executable helpers.
When `/claudeception` is invoked at the end of a session:

1. **Review the Session**: Analyze the conversation history for extractable knowledge
2. **Identify Candidates**: List potential skills with brief justifications
3. **Prioritize**: Focus on the highest-value, most reusable knowledge
4. **Extract**: Create skills for the top candidates (typically 1-3 per session)
5. **Summarize**: Report what skills were created and why
2. **Scan Tentative Notes**: Check `memory/tentative/*.yaml` for:
- Existing notes that match observations from this session (update confidence)
- Notes meeting promotion threshold (confidence >= 0.7, observations >= 2 from distinct sessions)
- Notes declined for promotion twice in separate sessions (skip auto-suggest; see `resources/tentative-knowledge.md` § Promotion Declined Twice)
- Stale notes past expiry thresholds (flag for cleanup or auto-delete per expiry rules; promotion takes precedence over stale)
3. **Identify Candidates**: List potential skills (from session + promoted tentative notes)
4. **Prioritize**: Focus on the highest-value, most reusable knowledge
5. **Extract**: Create skills for the top candidates (typically 1-3 per session)
6. **Summarize**: Report what skills were created, tentative notes updated, and promotions suggested

## Self-Reflection Prompts

@@ -267,6 +297,8 @@ Before finalizing a skill, verify:
- [ ] Web research conducted when appropriate (for technology-specific topics)
- [ ] References section included if web sources were consulted
- [ ] Current best practices (post-2025) incorporated when relevant
- [ ] If knowledge doesn't meet all 4 Quality Criteria: considered tentative note path before discarding
- [ ] Tentative notes contain valid trigger condition and confidence in [0.1, 0.95]

## Anti-Patterns to Avoid

@@ -280,29 +312,76 @@ Before finalizing a skill, verify:

Skills should evolve:

0. **Tentative**: Lightweight YAML note with confidence scoring; may be promoted or expire
1. **Creation**: Initial extraction with documented verification
2. **Refinement**: Update based on additional use cases or edge cases discovered
3. **Deprecation**: Mark as deprecated when underlying tools/patterns change
4. **Archival**: Remove or archive skills that are no longer relevant

## Tentative Knowledge Management

Tentative notes capture emerging patterns that don't yet meet all 4 Quality Criteria. They live
in `memory/tentative/` as lightweight YAML files, accumulating confidence through repeated
observations until they are promoted to full skills or expire.

**Schema**: See `resources/instinct-template.yaml` for the YAML template with field documentation.

### Confidence Rules (Summary)

| Event | Delta |
|-------|-------|
| Initial observation | 0.4 (starting value) |
| Re-observed in same context | +0.15 |
| Observed in different context | +0.20 |
| User explicit confirmation | +0.30 (confidence only; does NOT count as an observation) |
| Counter-example observed | −0.20 |

Confidence is clamped to [0.1, 0.95] after each adjustment.
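
The arithmetic in the table can be sketched as a single update function. The deltas and bounds are taken from the table; the event names, the function name, and the rounding to two decimals (to avoid floating-point drift) are assumptions made for this example.

```python
# Deltas copied from the Confidence Rules table; the event keys are hypothetical.
DELTAS = {
    "same_context": 0.15,
    "different_context": 0.20,
    "user_confirmation": 0.30,  # adjusts confidence only; not an observation
    "counter_example": -0.20,
}
INITIAL_CONFIDENCE = 0.4
MIN_CONFIDENCE, MAX_CONFIDENCE = 0.1, 0.95


def apply_event(confidence: float, event: str) -> float:
    """Apply one event's delta, then clamp the result to [0.1, 0.95]."""
    adjusted = confidence + DELTAS[event]
    clamped = max(MIN_CONFIDENCE, min(MAX_CONFIDENCE, adjusted))
    return round(clamped, 2)  # rounding is an assumption, not a documented rule
```

For example, a note that starts at 0.4 and is re-observed once in the same context and once in a different context ends at 0.75, which crosses the 0.7 promotion threshold.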

### Promotion

A tentative note is eligible for promotion when **both** conditions are met:
- `confidence >= 0.7`
- `observations >= 2` from **>= 2 distinct sessions or dates**

Comment on lines +343 to +346

Copilot AI Apr 5, 2026

Promotion eligibility currently allows observations from “>= 2 distinct sessions or dates”. The PR description/test plan emphasizes distinct sessions (not just different dates), and using dates can be gamed (or accidentally satisfied by crossing midnight) without truly distinct sessions. Consider requiring distinct session identifiers/contexts, with “date” only as a fallback when session data is unavailable (and document that fallback explicitly).

During Retrospective Mode, eligible notes are presented for user confirmation. If confirmed,
the note's content pre-fills Steps 3-6 to create a full skill; the YAML file is then deleted.
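
A minimal sketch of the eligibility check, assuming a note is held in memory as a dataclass whose `observations` list contains only real observations (user confirmations are excluded, as the confidence table states). The class and function names are hypothetical. The check follows the "distinct sessions or dates" wording as written; requiring distinct sessions only, as the review comment suggests, would mean dropping the date set.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    date: str      # ISO 8601, e.g. "2026-04-05"
    summary: str
    session: str   # brief session context, used here as a session identifier


@dataclass
class Note:
    confidence: float
    observations: list = field(default_factory=list)


def is_eligible_for_promotion(note: Note) -> bool:
    """Both conditions must hold: confidence >= 0.7 and 2+ distinct observations."""
    sessions = {obs.session for obs in note.observations}
    dates = {obs.date for obs in note.observations}
    # "distinct sessions or dates": either set reaching 2 satisfies the rule as written.
    distinct_enough = max(len(sessions), len(dates)) >= 2
    return note.confidence >= 0.7 and len(note.observations) >= 2 and distinct_enough
```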

### Expiry

| Condition | Action |
|-----------|--------|
| 90 days since `last_seen`, no new observation | Mark stale; prompt for cleanup |
| 180 days since `last_seen` | Auto-delete |
| `confidence < 0.3` AND 60 days since `last_seen` | Early delete |

If a note simultaneously meets promotion thresholds and stale criteria, **promotion takes precedence**.
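
The expiry table and the precedence rule could be combined roughly as follows. This is a sketch under stated assumptions: thresholds are treated as inclusive (`>=`), promotion is assumed to override only the stale state and not the 180-day deletion, and `expiry_action` with its string results is a hypothetical interface.

```python
from datetime import date


def expiry_action(confidence: float, last_seen: date, today: date,
                  promotable: bool = False) -> str:
    """Return 'delete', 'promote', 'stale', or 'keep' for one tentative note."""
    idle_days = (today - last_seen).days
    if idle_days >= 180:
        return "delete"   # auto-delete
    if confidence < 0.3 and idle_days >= 60:
        return "delete"   # early delete for low-confidence notes
    if promotable:
        return "promote"  # promotion takes precedence over stale
    if idle_days >= 90:
        return "stale"    # mark stale; prompt for cleanup
    return "keep"
```

Whether a promotable note should survive past 180 days is not stated in this summary; `resources/tentative-knowledge.md` is the place to confirm that edge case.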

### Detailed Rules

See `resources/tentative-knowledge.md` for complete rules on confidence arithmetic edge cases,
promotion protocol, expiry details, deduplication strategy, and cross-project aggregation placeholder.

## Example: Complete Extraction Flow

**Scenario**: While debugging a Next.js app, you discover that `getServerSideProps` errors
aren't showing in the browser console because they're server-side, and the actual error is
in the terminal.

**Step 1 - Identify the Knowledge**:
**Step 1 - Check for Existing Skills**: No matching skills found.

**Step 2 - Identify the Knowledge**:
- Problem: Server-side errors don't appear in browser console
- Non-obvious aspect: Expected behavior for server-side code in Next.js
- Trigger: Generic error page with empty browser console

**Step 2 - Research Best Practices**:
**Step 3 - Research Best Practices**:
Search: "Next.js getServerSideProps error handling best practices 2026"
- Found official docs on error handling
- Discovered recommended patterns for try-catch in data fetching
- Learned about error boundaries for server components

**Step 3-5 - Structure and Save**:
**Steps 4-6 - Structure and Save**:

**Extraction**:

@@ -353,6 +432,9 @@ and line numbers.
- [Next.js Error Handling](https://nextjs.org/docs/pages/building-your-application/routing/error-handling)
```

> **Tentative path**: If Step 2.5 routes to tentative, skip Steps 3-6 and create a YAML note per
> `resources/instinct-template.yaml` with initial confidence 0.4. See Step 2.5 for the full flow.

## Integration with Workflow

### Automatic Trigger Conditions
4 changes: 3 additions & 1 deletion WARP.md
@@ -4,12 +4,14 @@ This file provides guidance to WARP (warp.dev) when working with code in this re

## Project Overview

Claudeception is a **Claude Code skill** for continuous learning—it enables Claude Code to autonomously extract and preserve learned knowledge into reusable skills. It is not an application codebase but rather a skill definition with documentation and examples.
Claudeception is a **Claude Code skill** for continuous learning—it enables Claude Code to autonomously extract and preserve learned knowledge into reusable skills. It also captures emerging patterns as tentative YAML notes with confidence scoring, promoting them to full skills after repeated observations. It is not an application codebase but rather a skill definition with documentation and examples.

## Key Files

- `SKILL.md` — The main skill definition (YAML frontmatter + instructions). This is what Claude Code loads.
- `resources/skill-template.md` — Template for creating new skills
- `resources/instinct-template.yaml` — YAML template for tentative knowledge notes
- `resources/tentative-knowledge.md` — Detailed rules for confidence scoring, promotion, and expiry
- `examples/` — Sample extracted skills demonstrating proper format

## Skill File Format
27 changes: 27 additions & 0 deletions resources/instinct-template.yaml
@@ -0,0 +1,27 @@
# TEMPLATE — do not place in memory/tentative/
# Tentative Knowledge Note Template for Claudeception
# Storage: ~/.claude/projects/<project>/memory/tentative/<name>.yaml
# Created by claudeception when knowledge doesn't yet meet all 4 Quality Criteria

Copilot AI Apr 5, 2026

The header comment uses lowercase “claudeception” (“Created by claudeception…”), while the project/skill name is consistently “Claudeception” elsewhere in the docs. Consider capitalizing it here for consistency and to avoid confusion when users grep for the name.

Suggested change
# Created by claudeception when knowledge doesn't yet meet all 4 Quality Criteria
# Created by Claudeception when knowledge doesn't yet meet all 4 Quality Criteria

name: example-pattern-name # kebab-case, descriptive; used as filename
trigger: |
When [specific condition/symptom/error message]
action: |
Then [what to do / what the pattern suggests]
confidence: 0.4 # float [0.1, 0.95]; initial = 0.4
observations: # each real observation event (NOT user confirmations)
- date: "YYYY-MM-DD" # ISO 8601
summary: "Observed X while doing Y"
session: "brief session context"
first_seen: "YYYY-MM-DD" # ISO 8601; set on creation
last_seen: "YYYY-MM-DD" # ISO 8601; updated on each observation

tags: # categorization for retrieval
- domain-tag # e.g., "r-tidyverse", "git", "windows"
- context-tag # e.g., "debugging", "configuration"

# --- Optional fields ---
source: "session" # how created: session | manual | retrospective
related_skills: [] # links to existing skills if partially overlapping
counter_examples: [] # observations contradicting this pattern (triggers -0.20)
promotion_declined: [] # ISO 8601 dates when user declined promotion; >= 2 entries → auto-skip
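
A note that follows this template could be checked before it is written, in the spirit of the SKILL.md checklist item on valid triggers and confidence bounds. The sketch below assumes the note has already been parsed into a dict and that every field above the optional section is required; `validate_note` and its messages are illustrative and not part of the PR.

```python
import re
from datetime import date

# Fields listed above the "Optional fields" marker (assumed to be required).
REQUIRED = ("name", "trigger", "action", "confidence",
            "observations", "first_seen", "last_seen", "tags")
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_note(note: dict) -> list:
    """Return a list of problems; an empty list means the note looks valid."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in note]
    if problems:
        return problems
    if not KEBAB_CASE.match(str(note["name"])):
        problems.append("name is not kebab-case")
    if not str(note["trigger"]).strip():
        problems.append("trigger is empty")
    conf = note["confidence"]
    if not isinstance(conf, (int, float)) or not 0.1 <= conf <= 0.95:
        problems.append("confidence outside [0.1, 0.95]")
    for key in ("first_seen", "last_seen"):
        try:
            date.fromisoformat(note[key])
        except (TypeError, ValueError):
            problems.append(f"{key} is not an ISO 8601 date")
    return problems
```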
9 changes: 5 additions & 4 deletions resources/research-references.md
@@ -26,8 +26,8 @@ This document compiles the academic research that informed the design of Claudec

### CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution

**Authors**: [Research Team]
**Published**: December 2024
**Authors**: Huang, Chen, Fei, Li, Schwaller, Ceder
**Published**: December 2025
**URL**: https://arxiv.org/abs/2512.23880

**Key Contribution**: Self-evolving agentic framework demonstrating the transition from "LLM + tool use" to "LLM + skill acquisition."
@@ -80,8 +80,9 @@ This document compiles the academic research that informed the design of Claudec

### EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

**Authors**: [Research Team]
**Published**: 2024
**Authors**: Zhang, Yuan, Guo, Yu, Xu, Chen, Li, Yang, Guan, Tang, Hu, Zhang, Chen, Wang
**Published**: January 2026
**URL**: https://arxiv.org/abs/2601.09465

**Key Contribution**: Self-evolving framework with experience pools for continuous learning.
