Commit ac5aa83

feat(skill): add verify skill
1 parent f714964 commit ac5aa83

2 files changed

Lines changed: 71 additions & 0 deletions

skills/verify/SKILL.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
---
name: verify
description: Enforce evidence-based completion claims — require fresh command output before reporting success. Use when completing any task, fixing a bug, finishing a phase, running tests, building, deploying, or making any "it works" claim.
---

# Verify

Prove it works before saying it works.

## Hard Rules

- Do not claim completion without fresh terminal evidence from this session.
- Forbidden words in completion claims: "should", "probably", "seems to", "likely", "I believe", "I think it works". These signal unverified assertions.
- Cached, remembered, or previous-session output is not evidence. Run it again.

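The forbidden-words rule can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the function name `find_hedges` and the constant `FORBIDDEN_PHRASES` are hypothetical, and the phrase list is copied from the rule above.

```python
import re

# Hedge phrases copied from the Hard Rules list (names here are illustrative).
FORBIDDEN_PHRASES = (
    "should",
    "probably",
    "seems to",
    "likely",
    "I believe",
    "I think it works",
)


def find_hedges(claim: str) -> list[str]:
    """Return the forbidden phrases that appear in a completion claim."""
    found = []
    for phrase in FORBIDDEN_PHRASES:
        # Word boundaries keep "likely" from matching inside "unlikely".
        pattern = rf"\b{re.escape(phrase)}\b"
        if re.search(pattern, claim, flags=re.IGNORECASE):
            found.append(phrase)
    return found
```

An empty result only means the wording is free of hedges; the claim still needs fresh command output behind it.
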
## Gate Function

Every completion claim must pass all 5 steps in order:

1. **Identify**: What command proves this claim? If multiple commands are needed, run the gate once per command.
2. **Run**: Execute the full command now. No partial runs, no skipping.
3. **Read**: Read the complete output. Check the exit code. Count passes and failures.
4. **Confirm**: Does the output prove the exact claim?
5. **Report**: State the result and cite the command, exit code, and key output.

If any step fails, stop. Fix the issue and restart from step 1.

If no verification command exists (e.g., no test suite), tell the user and ask them how to verify before claiming done.

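The Run, Read, and Report steps can be sketched with Python's standard `subprocess` module. This is an illustration under stated assumptions rather than the skill's implementation: `Evidence`, `run_gate`, and `report` are hypothetical names, and the Confirm step is deliberately left to the reader, since an exit code of 0 does not by itself prove the exact claim.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Evidence:
    """Everything the Report step needs to cite (hypothetical structure)."""
    command: list[str]
    exit_code: int
    output: str


def run_gate(command: list[str]) -> Evidence:
    """Run the proof command fresh and capture its full output."""
    # Step 2 (Run): execute the whole command now, never reuse old results.
    proc = subprocess.run(command, capture_output=True, text=True)
    # Step 3 (Read): keep complete stdout and stderr plus the exit code.
    return Evidence(command, proc.returncode, proc.stdout + proc.stderr)


def report(evidence: Evidence) -> str:
    """Step 5 (Report): cite the command, exit code, and key output."""
    lines = evidence.output.strip().splitlines()
    last_line = lines[-1] if lines else "(no output)"
    cmd = " ".join(evidence.command)
    return f"`{cmd}` exited {evidence.exit_code}; last line: {last_line}"
```

For example, `report(run_gate(["pytest", "-q"]))` would produce a one-line citation for a test run, assuming `pytest` is the project's test command.
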
## Verification Patterns

| Claim | Required Evidence | Not Sufficient |
|---|---|---|
| Tests pass | Test output: 0 failures, exit 0 | Previous run, "should pass now" |
| Build succeeds | Build output: exit 0 | Linter passing, partial build |
| Bug is fixed | Reproduce symptom → now passes | "Changed code, should be fixed" |
| Linter clean | Linter output: 0 errors | Single file check |
| Phase complete | Each criterion verified individually | "Tests pass, so done" |
| Feature works | E2E test or manual walkthrough | Unit tests alone |

## Regression Verification

For bug fixes, a single pass is not enough:

1. Write a test covering the bug.
2. Run → **must pass** (fix in place).
3. Revert the fix.
4. Run → **must fail** (proves test catches the bug).
5. Restore the fix.
6. Run → **must pass**.

If step 4 passes, the test is wrong. Rewrite it.

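The pass, fail, pass sequence above can be expressed as a small check over the three exit codes. This is a sketch only: `check_regression_runs` is a hypothetical helper, and it assumes the test command exits 0 on pass and non-zero on fail.

```python
def check_regression_runs(with_fix: int, reverted: int, restored: int) -> str:
    """Judge the exit codes from steps 2, 4, and 6 of the regression check.

    Assumes exit code 0 means the test passed and anything else means it failed.
    """
    if with_fix != 0:
        return "step 2 failed: the test does not pass with the fix in place"
    if reverted == 0:
        return "step 4 passed: the test does not catch the bug, rewrite it"
    if restored != 0:
        return "step 6 failed: the test does not pass after restoring the fix"
    return "verified"
```

Only a result of `"verified"` supports the claim that the bug is fixed and covered by a test.
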
## Red Flags and Rationalizations

| Rationalization | Why It's Wrong | Do Instead |
|---|---|---|
| "This change is trivial" | Trivial changes break things constantly | Run the check |
| "I ran it earlier" | Code changed since then | Run it again now |
| "The test is flaky" | Flaky ≠ ignorable | Fix the flake first |
| "It compiles, so it works" | Compilation ≠ correctness | Run the tests |
| "The CI will catch it" | CI is a safety net, not a substitute | Verify locally first |
| "The agent said it's done" | Agent claims need verification too | Check diff and run tests |

## Memory Integration

After a failed verification, store the failure pattern via `$memory` with tags `verify,failure-pattern` so future sessions can avoid the same mistake.

skills/verify/agents/openai.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
interface:
  display_name: "Verify"
  short_description: "Enforce evidence-based completion claims before reporting success"
  default_prompt: "Use $verify before claiming any task is done: run the proof command fresh, read the full output, and only report the result with evidence."
