| 1 | +--- |
| 2 | +name: verify |
| 3 | +description: Enforce evidence-based completion claims — require fresh command output before reporting success. Use when completing any task, fixing a bug, finishing a phase, running tests, building, deploying, or making any "it works" claim. |
| 4 | +--- |
| 5 | + |
| 6 | +# Verify |
| 7 | + |
| 8 | +Prove it works before saying it works. |
| 9 | + |
| 10 | +## Hard Rules |
| 11 | + |
| 12 | +- Do not claim completion without fresh terminal evidence from this session. |
| 13 | +- Forbidden words in completion claims: "should", "probably", "seems to", "likely", "I believe", "I think it works". These signal unverified assertions. |
| 14 | +- Cached, remembered, or previous-session output is not evidence. Run it again. |
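The forbidden-words rule above can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the names `FORBIDDEN` and `find_hedges` are hypothetical, and the phrase list is copied from the rule as written.

```python
import re

# Phrases taken from the Hard Rules list; the constant name is hypothetical.
FORBIDDEN = (
    "should",
    "probably",
    "seems to",
    "likely",
    "I believe",
    "I think it works",
)

# Match whole phrases only, case-insensitively, so "unlikely" is not flagged.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in FORBIDDEN) + r")\b",
    re.IGNORECASE,
)


def find_hedges(claim: str) -> list[str]:
    """Return every forbidden phrase found in a completion claim, lowercased."""
    return [match.group(0).lower() for match in _PATTERN.finditer(claim)]
```

An empty result does not make a claim verified; it only shows the wording is free of the listed hedges. The claim still needs fresh command output behind it.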
| 15 | + |
| 16 | +## Gate Function |
| 17 | + |
| 18 | +Every completion claim must pass all 5 steps in order: |
| 19 | + |
| 20 | +1. **Identify** — What command proves this claim? If multiple commands are needed, run the gate once per command. |
| 21 | +2. **Run** — Execute the full command now. No partial runs, no skipping. |
| 22 | +3. **Read** — Read complete output. Check exit code. Count pass/fail. |
| 23 | +4. **Confirm** — Does the output prove the exact claim? |
| 24 | +5. **Report** — State the result, cite command, exit code, and key output. |
| 25 | + |
| 26 | +If any step fails, stop. Fix the issue and restart from step 1. |
| 27 | + |
| 28 | +If no verification command exists (e.g., no test suite), tell the user and ask them how to verify before claiming done. |
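The five gate steps could be sketched as follows, assuming the verification command is an argument list and that the claim can be confirmed by a marker string in the output. `Evidence`, `run_gate`, `report`, and `must_contain` are hypothetical names for illustration, not part of the skill.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    command: list[str]
    exit_code: int
    output: str


def run_gate(command: list[str]) -> Evidence:
    """Steps 2 and 3: run the full command now and keep all output and the exit code."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return Evidence(command, proc.returncode, proc.stdout + proc.stderr)


def report(evidence: Evidence, must_contain: str) -> str:
    """Steps 4 and 5: confirm the output supports the claim, then cite the evidence."""
    confirmed = evidence.exit_code == 0 and must_contain in evidence.output
    status = "VERIFIED" if confirmed else "NOT VERIFIED"
    return f"{status}: {' '.join(evidence.command)} (exit {evidence.exit_code})"
```

A caller would pick the command in step 1, for example `report(run_gate([sys.executable, "-m", "pytest"]), "passed")`, assuming pytest is the project's test runner. A substring match is a weak form of confirmation; reading the complete output is still required.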
| 29 | + |
| 30 | +## Verification Patterns |
| 31 | + |
| 32 | +| Claim | Required Evidence | Not Sufficient | |
| 33 | +|---|---|---| |
| 34 | +| Tests pass | Test output: 0 failures, exit 0 | Previous run, "should pass now" | |
| 35 | +| Build succeeds | Build output: exit 0 | Linter passing, partial build | |
| 36 | +| Bug is fixed | Rerun the original reproduction → symptom no longer occurs | "Changed code, should be fixed" |
| 37 | +| Linter clean | Linter output: 0 errors | Single file check | |
| 38 | +| Phase complete | Each criterion verified individually | "Tests pass, so done" | |
| 39 | +| Feature works | E2E test or manual walkthrough | Unit tests alone | |
| 40 | + |
| 41 | +## Regression Verification |
| 42 | + |
| 43 | +For bug fixes, a single pass is not enough: |
| 44 | + |
| 45 | +1. Write a test covering the bug. |
| 46 | +2. Run → **must pass** (fix in place). |
| 47 | +3. Revert the fix. |
| 48 | +4. Run → **must fail** (proves test catches the bug). |
| 49 | +5. Restore the fix. |
| 50 | +6. Run → **must pass**. |
| 51 | + |
| 52 | +If step 4 passes, the test is wrong. Rewrite it. |
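The pass, fail, pass sequence above reduces to a small decision over three exit codes. This is a sketch under the assumption that exit code 0 means the test passed; `regression_verdict` is a hypothetical helper, and reverting and restoring the fix is left to the caller.

```python
def regression_verdict(with_fix: int, reverted: int, restored: int) -> str:
    """Classify the exit codes from steps 2, 4, and 6 of the regression check."""
    if with_fix != 0:
        return "fix does not make the test pass"
    if reverted == 0:
        # Step 4 passed without the fix, so the test does not catch the bug.
        return "test is wrong: rewrite it"
    if restored != 0:
        return "fix was not restored correctly"
    return "regression verified"
```

Only the final result counts as evidence for a "bug is fixed" claim; every other result sends the work back to an earlier step.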
| 53 | + |
| 54 | +## Red Flags and Rationalizations |
| 55 | + |
| 56 | +| Rationalization | Why It's Wrong | Do Instead | |
| 57 | +|---|---|---| |
| 58 | +| "This change is trivial" | Trivial changes break things constantly | Run the check | |
| 59 | "I ran it earlier" | Code may have changed since then, so that output is stale | Run it again now |
| 60 | +| "The test is flaky" | Flaky ≠ ignorable | Fix the flake first | |
| 61 | +| "It compiles, so it works" | Compilation ≠ correctness | Run the tests | |
| 62 | +| "The CI will catch it" | CI is a safety net, not a substitute | Verify locally first | |
| 63 | +| "The agent said it's done" | Agent claims need verification too | Check diff and run tests | |
| 64 | + |
| 65 | +## Memory Integration |
| 66 | + |
| 67 | +After a failed verification, store the failure pattern via `$memory` with tags `verify,failure-pattern` so future sessions can avoid the same mistake. |