Skip to content

Commit 7c7feb6

Browse files
committed
update workflows
1 parent 4b17e80 commit 7c7feb6

2 files changed

Lines changed: 217 additions & 11 deletions

File tree

docs/lean-squad.md

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,8 @@ graph LR
2525
A --> T4[Task 4: Implementation Extraction]
2626
A --> T5[Task 5: Proof Assistance]
2727
A --> T6[Task 6: Maintain Open Lean Squad PRs]
28-
T1 & T2 & T3 & T4 & T5 & T6 --> T7[Task 7: Update FV Status Issue]
28+
A --> T10[Task 10: Project Report]
29+
T1 & T2 & T3 & T4 & T5 & T6 & T10 --> T7[Task 7: Update FV Status Issue]
2930
T7 --> M[Save repo-memory]
3031
````
3132

@@ -37,7 +38,7 @@ The weighting scheme adapts automatically: when no FV work exists Task 1 dominat
3738

3839
Default weighting: dominates when no FV work exists yet.
3940

40-
Surveys the codebase to identify 3–5 functions, data structures, or algorithms that are strong formal verification candidates. For each target documents: expected benefit, rough specification size, proof tractability (`decide` / routine tactics / deep proof engineering), approximations needed, and recommended approach (model checking, inductive invariant, equational proof). Consults Lean 4 / Mathlib documentation and FV literature. Produces `formal-verification/RESEARCH.md` and `formal-verification/TARGETS.md` as a PR, and optionally a tracking issue inviting maintainer input on priorities.
41+
Surveys the codebase to identify 3–5 functions, data structures, or algorithms that are strong formal verification candidates. If prior FV work exists, reads the latest `formal-verification/CRITIQUE.md` to incorporate feedback — adjusting target priorities, revising approaches, and addressing high-value gaps flagged by the critique. For each target documents: expected benefit, rough specification size, proof tractability (`decide` / routine tactics / deep proof engineering), approximations needed, and recommended approach (model checking, inductive invariant, equational proof). Consults Lean 4 / Mathlib documentation and FV literature. Produces `formal-verification/RESEARCH.md` and `formal-verification/TARGETS.md` as a PR, and optionally a tracking issue inviting maintainer input on priorities.
4142

4243
### Task 2: Informal Spec Extraction
4344

@@ -73,6 +74,12 @@ Reviews open `[Lean Squad]` PRs, fixes CI failures (Lean syntax errors, `lake bu
7374

7475
Maintains a single `[Lean Squad] Formal Verification Status` issue as a continuously-updated dashboard with an at-a-glance table (one row per target, showing current phase and status), summary narrative, findings section (bugs found, counterexamples), approach notes, and a prepended run history entry for every run.
7576

77+
### Task 10: Project Report
78+
79+
Default weighting: important once proofs exist; available once Lean specs exist.
80+
81+
Creates and incrementally maintains `formal-verification/REPORT.md` — a comprehensive, reader-friendly project report summarising the entire FV effort. Uses mermaid diagrams extensively to visualise proof architecture, dependency layers, modelling choices, the main proof chain, and project timeline. Includes a mandatory Findings section documenting any bugs found (with counterexamples and issue links), formulation issues caught during development, and interesting structural discoveries. The report is updated incrementally each run rather than rewritten from scratch.
82+
7683
## What Gets Created
7784

7885
| Artifact | Location | Description |
@@ -81,6 +88,7 @@ Maintains a single `[Lean Squad] Formal Verification Status` issue as a continuo
8188
| Target list | `formal-verification/TARGETS.md` | Prioritised targets with phase status |
8289
| Informal specs | `formal-verification/specs/<name>_informal.md` | Per-target: contracts, examples, intent |
8390
| Lean specs | `formal-verification/lean/FVSquad/<Name>.lean` | Lean 4 types, propositions, proofs |
91+
| Project report | `formal-verification/REPORT.md` | Comprehensive FV report with mermaid diagrams |
8492
| Status issue | GitHub issue `[FV Squad] Formal Verification Status` | Rolling dashboard |
8593
| Bug reports | GitHub issues `[FV Squad] ...` | Properties violated, with counterexample |
8694

workflows/lean-squad.md

Lines changed: 207 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ description: |
1414
7. Proof Utility Critique — assess the value and coverage of what has been proven so far
1515
8. Aeneas Extraction (optional, Rust only) — use Charon+Aeneas to auto-generate Lean from Rust
1616
9. CI Automation — set up and maintain CI workflows that verify proofs on every PR
17+
10. Project Report — create and incrementally maintain REPORT.md with mermaid diagrams
1718
1819
Phases are sequentially weighted: Task 1 dominates until research is done,
1920
then Task 2 rises, and so on up to proofs. Each run builds on prior runs
@@ -100,9 +101,10 @@ steps:
100101
[ -f ".github/workflows/lean-ci.yml" ] && echo 1 > /tmp/gh-aw/has_lean_ci.txt || echo 0 > /tmp/gh-aw/has_lean_ci.txt
101102
[ -f ".github/workflows/aeneas-generate.yml" ] && echo 1 > /tmp/gh-aw/has_aeneas_ci.txt || echo 0 > /tmp/gh-aw/has_aeneas_ci.txt
102103
103-
# Detect CORRESPONDENCE.md and CRITIQUE.md
104+
# Detect CORRESPONDENCE.md, CRITIQUE.md, and REPORT.md
104105
[ -f "formal-verification/CORRESPONDENCE.md" ] && echo 1 > /tmp/gh-aw/has_correspondence.txt || echo 0 > /tmp/gh-aw/has_correspondence.txt
105106
[ -f "formal-verification/CRITIQUE.md" ] && echo 1 > /tmp/gh-aw/has_critique.txt || echo 0 > /tmp/gh-aw/has_critique.txt
107+
[ -f "formal-verification/REPORT.md" ] && echo 1 > /tmp/gh-aw/has_report.txt || echo 0 > /tmp/gh-aw/has_report.txt
106108
107109
# Detect formal-verification directory
108110
[ -d "formal-verification" ] && echo 1 > /tmp/gh-aw/fv_dir.txt || echo 0 > /tmp/gh-aw/fv_dir.txt
@@ -132,6 +134,7 @@ steps:
132134
has_aeneas_ci = int(open('/tmp/gh-aw/has_aeneas_ci.txt').read().strip() or 0)
133135
has_correspondence = int(open('/tmp/gh-aw/has_correspondence.txt').read().strip() or 0)
134136
has_critique = int(open('/tmp/gh-aw/has_critique.txt').read().strip() or 0)
137+
has_report = int(open('/tmp/gh-aw/has_report.txt').read().strip() or 0)
135138
fv_dir = int(open('/tmp/gh-aw/fv_dir.txt').read().strip() or 0)
136139
fv_docs = int(open('/tmp/gh-aw/fv_docs.txt').read().strip() or 0)
137140
fv_issues = json.load(open('/tmp/gh-aw/fv_issues.json'))
@@ -150,6 +153,7 @@ steps:
150153
7: 'Proof Utility Critique',
151154
8: 'Aeneas Extraction (Rust only)',
152155
9: 'CI Automation',
156+
10: 'Project Report',
153157
}
154158
155159
# Phase progress heuristics derived from repo state
@@ -172,6 +176,7 @@ steps:
172176
7: (10.0 if not has_critique else 3.0) if has_proofs else 0.0, # critique: critical when proofs exist but no doc
173177
8: (3.0 if has_lean_specs else 1.0) if (has_rust and has_research) else 0.0, # aeneas: only for Rust codebases with research done
174178
9: 12.0 if (has_lean_specs and not has_ci) else 2.0, # CI: critical when lean files exist but no CI; regular check otherwise
179+
10: (8.0 if not has_report else 3.0) if has_proofs else (2.0 if has_lean_specs else 0.0), # report: important when proofs exist but no report; available once lean specs exist
175180
}
176181
177182
run_id = int(os.environ.get('GITHUB_RUN_ID', '0'))
@@ -198,7 +203,8 @@ steps:
198203
print(f'Phase flags : research={has_research}, inf_specs={has_inf_specs}, '
199204
f'lean_specs={has_lean_specs}, impl={has_impl}, proofs={has_proofs}, '
200205
f'rust={has_rust}, ci={has_ci}, '
201-
f'correspondence={bool(has_correspondence)}, critique={bool(has_critique)}')
206+
f'correspondence={bool(has_correspondence)}, critique={bool(has_critique)}, '
207+
f'report={bool(has_report)}')
202208
print()
203209
print('Task weights:')
204210
for t, w in weights.items():
@@ -221,6 +227,7 @@ steps:
221227
'has_ci': has_ci,
222228
'has_correspondence': bool(has_correspondence),
223229
'has_critique': bool(has_critique),
230+
'has_report': bool(has_report),
224231
},
225232
'task_names': task_names,
226233
'weights': {str(k): round(v, 2) for k, v in weights.items()},
@@ -408,6 +415,7 @@ formal-verification/
408415
TARGETS.md # Prioritised target list with current phase per target
409416
CORRESPONDENCE.md # How each Lean implementation model maps to the Rust source
410417
CRITIQUE.md # Ongoing assessment of proof utility and coverage
418+
REPORT.md # Ongoing latest project report
411419
specs/
412420
<name>_informal.md # Informal specification per target
413421
lean/
@@ -421,26 +429,28 @@ formal-verification/
421429

422430
### Task 1: Research & Target Identification
423431

424-
**Goal**: Survey the codebase and identify 3–5 functions, data structures, or algorithms that are strong candidates for formal verification. Document the approach, expected benefits, likely spec sizes, and proof tractability.
432+
**Goal**: Survey the codebase and identify 3–5 functions, data structures, or algorithms that are strong candidates for formal verification. Document the approach, expected benefits, likely spec sizes, and proof tractability. If prior FV work exists, incorporate feedback from the latest critique to adjust priorities and approach.
425433

426434
1. Read the repository: explore the structure, primary language(s), key modules. Read README, CONTRIBUTING, and any architecture docs.
427-
2. Identify **FV-amenable targets** — look for:
435+
2. **Read the latest critique** (if `formal-verification/CRITIQUE.md` exists): review its assessments of proof utility, identified gaps, concerns about vacuous proofs, and recommended next targets. Use these findings to adjust which targets to prioritise, which approaches to revise, and which high-value gaps to address. If the critique flags theorems as weak or models as mismatched, factor that into the research plan — either by re-prioritising targets, recommending spec revisions, or noting that certain areas need deeper modelling before further proof work.
436+
3. Identify **FV-amenable targets** — look for:
428437
- Pure or nearly-pure functions with clear inputs/outputs
429438
- Data structure invariants (e.g., sorted lists, balanced trees, valid state machines)
430439
- Algorithms with textbook correctness criteria (sorting, searching, parsing, hashing)
431440
- Security-sensitive logic (authentication, authorisation, cryptographic primitives)
432441
- Protocol or state machine logic with finite state spaces
433442
- Existing tests that implicitly document specification — these are specification hints
434-
3. For each candidate, document:
443+
- **Gaps identified by the critique**: targets or properties that the critique flagged as high-value but not yet attempted
444+
4. For each candidate, document:
435445
- **Benefit**: what property would we verify? What bugs could this catch?
436446
- **Specification size**: roughly how many Lean lines to state the key properties?
437447
- **Proof tractability**: likely `decide` / routine `simp`+`omega`, or requires substantial proof engineering?
438448
- **Approximations needed**: what aspects of the original code can't be directly modelled in Lean (e.g., I/O, side effects, memory layout)? Document these clearly.
439449
- **Approach**: enumeration/`decide`, inductive invariant, equational proof, model checking via bounded `decide`?
440-
4. Search the web (`web-fetch`) for Lean 4 FV patterns relevant to the language/domain. Check Mathlib for relevant existing lemmas and automation.
441-
5. Create or update `formal-verification/RESEARCH.md` and `formal-verification/TARGETS.md`. Create a PR.
442-
6. Optionally, open an issue summarising the survey and inviting maintainer input on priorities.
443-
7. Update memory with identified targets, approach choices, and rationale.
450+
5. Search the web (`web-fetch`) for Lean 4 FV patterns relevant to the language/domain. Check Mathlib for relevant existing lemmas and automation.
451+
6. Create or update `formal-verification/RESEARCH.md` and `formal-verification/TARGETS.md`. If updating, include a section noting how critique feedback was incorporated (e.g., re-prioritised targets, revised approaches, new targets added from gap analysis). Create a PR.
452+
7. Optionally, open an issue summarising the survey and inviting maintainer input on priorities.
453+
8. Update memory with identified targets, approach choices, rationale, and any critique-driven adjustments.
444454

445455
---
446456

@@ -878,6 +888,194 @@ Record in memory:
878888

879889
---
880890

891+
### Task 10: Project Report
892+
893+
**Goal**: Create and incrementally maintain `formal-verification/REPORT.md` — a comprehensive, reader-friendly project report that summarises the entire formal verification effort, including proof architecture, what was verified, findings (including bugs), modelling choices, and project timeline. The report uses mermaid diagrams extensively to visualise proof architecture, dependency layers, modelling choices, and timeline.
894+
895+
This task produces a living document. Each run updates the report to reflect the current state of the project rather than rewriting it from scratch.
896+
897+
1. Read all existing FV artifacts: Lean files, informal specs, CORRESPONDENCE.md, CRITIQUE.md, TARGETS.md, RESEARCH.md, memory, open issues, and merged PRs.
898+
2. **Create or update** `formal-verification/REPORT.md` with the following structure:
899+
900+
#### Report Structure
901+
902+
```markdown
903+
> 🔬 *Lean Squad — automated formal verification for `<owner>/<repo>`.*
904+
905+
**Status**: <emoji> <STATUS> — <N> theorems, <M> Lean files, <S> `sorry`, <tool version>.
906+
907+
---
908+
909+
## Executive Summary
910+
911+
{3–5 sentences: what the project has achieved so far, key numbers (theorems proved,
912+
files, sorry count), headline results (bugs found, key properties verified),
913+
and current direction.}
914+
915+
---
916+
917+
## Proof Architecture
918+
919+
{Describe how the proof is organised — e.g. layers or modules. Include a mermaid
920+
diagram showing the dependency structure between proof files/layers.}
921+
922+
```mermaid
923+
graph TD
924+
A["Layer 1: ..."]
925+
B["Layer 2: ..."]
926+
A --> B
927+
```
928+
929+
---
930+
931+
## What Was Verified
932+
933+
{For each layer or group of proof files, describe what was verified and highlight
934+
key results. Include a mermaid diagram per layer showing the files and their
935+
theorem counts.}
936+
937+
### Layer N — <Name> (<M> files, ~<T> theorems)
938+
939+
{Description of this layer.}
940+
941+
```mermaid
942+
graph LR
943+
F1["File1.lean<br/>N theorems<br/>Key property"]
944+
F2["File2.lean<br/>M theorems<br/>Key property"]
945+
```
946+
947+
**Key results**:
948+
- `theorem_name`: description of what it proves
949+
950+
---
951+
952+
## File Inventory
953+
954+
| File | Theorems | Phase | Key result |
955+
|------|----------|-------|------------|
956+
| `Name.lean` | N | Phase ✅/🔄 | Description |
957+
| **Total** | **N** || **S sorry** |
958+
959+
---
960+
961+
## The Main Proof Chain
962+
963+
{If there is a top-level or headline theorem, show the chain of lemmas
964+
leading to it as a mermaid diagram.}
965+
966+
```mermaid
967+
graph LR
968+
A["lemma1"] --> B["lemma2"] --> C["main_theorem ✅"]
969+
```
970+
971+
{State the top-level theorem in Lean syntax if applicable.}
972+
973+
---
974+
975+
## Modelling Choices and Known Limitations
976+
977+
{Describe what is modelled, what is abstracted, and what is omitted.
978+
Include a mermaid diagram showing the relationship between the real
979+
implementation, the Lean model, and the proofs.}
980+
981+
```mermaid
982+
graph TD
983+
REAL["Real Implementation"]
984+
MODEL["Lean 4 Model"]
985+
PROOF["Lean Proofs"]
986+
REAL -->|"Modelled as"| MODEL
987+
MODEL -->|"Proved in"| PROOF
988+
NOTE1["✅ Included: ..."]
989+
NOTE2["⚠️ Abstracted: ..."]
990+
NOTE3["❌ Omitted: ..."]
991+
MODEL --- NOTE1
992+
MODEL --- NOTE2
993+
MODEL --- NOTE3
994+
```
995+
996+
| Category | What's covered | What's abstracted/omitted |
997+
|----------|---------------|--------------------------|
998+
| ... | ... | ... |
999+
1000+
---
1001+
1002+
## Findings
1003+
1004+
### Bugs Found
1005+
1006+
{List any implementation bugs discovered through formal verification.
1007+
For each bug, include: the property that was expected to hold, the
1008+
counterexample or proof failure, severity, and link to the filed issue.
1009+
1010+
If no bugs found, state this explicitly — it is itself a positive finding.}
1011+
1012+
### Formulation Issues
1013+
1014+
{Any spec or proof formulation bugs caught during development (e.g.
1015+
over-general propositions that turned out to be false).}
1016+
1017+
### Interesting Structural Discoveries
1018+
1019+
{Properties that turned out to be stronger or weaker than expected,
1020+
surprising equivalences, or non-obvious invariants.}
1021+
1022+
---
1023+
1024+
## Project Timeline
1025+
1026+
{Use a mermaid timeline diagram to show the progression of the project.}
1027+
1028+
```mermaid
1029+
timeline
1030+
title FV Project Development
1031+
section Phase 1
1032+
Target A : N theorems
1033+
section Phase 2
1034+
Target B : M theorems
1035+
```
1036+
1037+
---
1038+
1039+
## Toolchain
1040+
1041+
- **Prover**: Lean 4 (version X.Y.Z)
1042+
- **Libraries**: Mathlib / stdlib only
1043+
- **CI**: description of CI setup
1044+
- **Build system**: Lake
1045+
1046+
{Include tactic inventory table if proofs exist.}
1047+
1048+
| Tactic | Usage |
1049+
|--------|-------|
1050+
| `omega` | Integer/natural-number arithmetic |
1051+
| ... | ... |
1052+
```
1053+
1054+
3. **Mermaid diagrams are mandatory** for:
1055+
- Proof architecture / dependency layers
1056+
- Each verification layer's file structure
1057+
- The main proof chain (if a headline theorem exists)
1058+
- Modelling choices (real code → model → proofs)
1059+
- Project timeline
1060+
4. **Findings section is mandatory**: always include a Findings section, even when no bugs have been found. If no bugs were found, state this explicitly as a positive finding. If bugs were found, include for each:
1061+
- The property that was expected to hold
1062+
- The counterexample or proof failure that refuted it
1063+
- The affected function/file and impact
1064+
- Link to the GitHub issue filed (from Task 5)
1065+
5. The report should be **incremental**: read the existing REPORT.md (if any), update sections that have changed, add new layers/files/theorems, and update the timeline. Do not delete prior content unless it has become incorrect.
1066+
6. **Always** include a `## Last Updated` section near the top with the current UTC date/time and the HEAD commit SHA:
1067+
```
1068+
## Last Updated
1069+
- **Date**: YYYY-MM-DD HH:MM UTC
1070+
- **Commit**: `<SHA>`
1071+
```
1072+
7. Count theorems, `sorry`s, and files by inspecting the actual Lean sources — do not guess from memory alone.
1073+
8. Cross-reference CORRESPONDENCE.md and CRITIQUE.md when describing modelling choices, proof utility, and known limitations.
1074+
9. Create a PR with the updated REPORT.md.
1075+
10. Update memory: note that the report exists and what state it covers.
1076+
1077+
---
1078+
8811079
### Task Final: Update Lean Squad Status Issue *(ALWAYS DO THIS EVERY RUN)*
8821080

8831081
Maintain a single open issue titled `[Lean Squad] Formal Verification Status` as a continuously-updated dashboard for maintainers.

0 commit comments

Comments
 (0)