Commit 96d2532

Author: jammy
feat: add ai-literacy pilot course and improve lesson UX
1 parent 0da048a commit 96d2532

21 files changed

Lines changed: 461 additions & 161 deletions
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
# AI Failure Case Collection Pipeline
2+
3+
## Goal
4+
Collect real-world "AI wrote it, but it failed" cases and convert them into lesson-ready assets.
5+
6+
## Output
7+
- Source backlog: `docs/templates/ai-failure-cases-template.csv` (or copied to a working file)
8+
- Weekly triage list: 10 candidate cases
9+
- Lesson-ready cases: at least 3 per week
10+
11+
## Case Definition
12+
A case is valid only when all three conditions hold:
13+
1. Reproducible: a clear input or condition reliably triggers the failure.
14+
2. Explainable: root cause can be described in 1-3 sentences.
15+
3. Teachable: can be turned into a short exercise with a concrete fix.
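
A minimal sketch of how the three conditions could be checked against a backlog row. The mapping from conditions to CSV columns (`repro_steps`, `root_cause`, `fix_summary`) is an assumption, as are the helper names `count_sentences` and `is_valid_case`; the sentence counter is a rough heuristic, not a real tokenizer.

```python
import re


def count_sentences(text: str) -> int:
    """Rough sentence count: split on ., ! or ? followed by whitespace or end of text."""
    parts = [p for p in re.split(r"[.!?]+(?:\s+|$)", text.strip()) if p]
    return len(parts)


def is_valid_case(case: dict) -> bool:
    """Return True only when the row looks reproducible, explainable, and teachable."""
    # Reproducible: assume non-empty repro_steps is the signal.
    reproducible = bool(case.get("repro_steps", "").strip())
    # Explainable: root cause present and described in 1-3 sentences.
    root_cause = case.get("root_cause", "").strip()
    explainable = bool(root_cause) and 1 <= count_sentences(root_cause) <= 3
    # Teachable: assume a concrete fix_summary is the signal.
    teachable = bool(case.get("fix_summary", "").strip())
    return reproducible and explainable and teachable
```

A reviewer would still need to confirm reproducibility by hand; this check only filters out rows with obviously missing fields.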
16+
17+
## Source Channels (Priority)
18+
1. Internal user logs/errors (highest)
19+
2. GitHub issues/PRs mentioning AI-generated code
20+
3. Stack Overflow / Reddit reports
21+
4. Public benchmark failures (SWE-bench style)
22+
23+
## Pipeline Stages
24+
1. Ingest
25+
- Add raw case into CSV with minimal fields:
26+
- `source_type`, `source_url`, `language`, `domain`, `symptom`, `raw_snippet`
27+
28+
2. Normalize
29+
- Fill these fields:
30+
- `repro_steps`, `expected_behavior`, `actual_behavior`, `root_cause`
31+
- Remove sensitive data from snippet/logs.
32+
33+
3. Score
34+
- Score each case from 1-5:
35+
- `impact_score` (how damaging in real work)
36+
- `frequency_score` (how often it appears)
37+
- `clarity_score` (how easy to teach)
38+
- Compute priority: `priority_score = impact_score * 0.5 + frequency_score * 0.3 + clarity_score * 0.2` (range 1.0-5.0)
39+
40+
4. Triage
41+
- Keep top 10 by priority each week.
42+
- Mark status:
43+
- `new`, `triaged`, `lesson_candidate`, `lesson_published`, `rejected`
44+
45+
5. Lesson Mapping
46+
- Map each candidate to one of the tracks:
47+
- `verification`, `debugging`, `automation_literacy`
48+
- Define lesson format:
49+
- bug prompt -> reproduction -> diagnosis -> minimal patch -> regression check
50+
51+
6. QA Gate
52+
- Must pass:
53+
- Reproduction works locally
54+
- Fix solves failure and does not break baseline check
55+
- Explanation is under 180 words for step-level content
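
The Score and Triage stages above can be sketched as follows. This is an illustration under assumptions, not the project's tooling: the function names `priority_score` and `triage` are hypothetical, and rows are assumed to be dicts keyed by the CSV column names. Weights are kept in tenths so the arithmetic stays exact before the final division.

```python
# Weights from the Score stage, expressed in tenths (0.5, 0.3, 0.2).
WEIGHTS_TENTHS = {"impact_score": 5, "frequency_score": 3, "clarity_score": 2}


def priority_score(case: dict) -> float:
    """Weighted priority on a 1.0-5.0 scale; each input score must be 1-5."""
    total_tenths = 0
    for field, weight in WEIGHTS_TENTHS.items():
        score = int(case[field])  # CSV values arrive as strings
        if not 1 <= score <= 5:
            raise ValueError(f"{field} must be between 1 and 5, got {score}")
        total_tenths += score * weight
    return total_tenths / 10


def triage(cases: list[dict], keep: int = 10) -> list[dict]:
    """Return copies of the top `keep` cases by priority, marked as triaged."""
    ranked = sorted(cases, key=priority_score, reverse=True)
    top = []
    for case in ranked[:keep]:
        updated = dict(case)  # leave the input rows untouched
        updated["priority_score"] = priority_score(case)
        updated["status"] = "triaged"
        top.append(updated)
    return top
```

Recomputing `priority_score` from the three component scores, instead of trusting a hand-entered value, keeps the backlog consistent with the formula.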
56+
57+
## Weekly Operating Rhythm
58+
1. Monday: ingest + normalize (30-50 raw cases)
59+
2. Tuesday: score + triage (top 10)
60+
3. Wednesday-Thursday: convert top 3 into lesson drafts
61+
4. Friday: publish 1-3 validated cases and review metrics
62+
63+
## Ownership
64+
- Collector: gathers raw cases and fills ingest fields
65+
- Reviewer: validates reproducibility + root cause
66+
- Lesson editor: converts into course JSON steps
67+
68+
## Metrics
69+
- `new_cases_per_week`
70+
- `triaged_cases_per_week`
71+
- `published_lessons_per_week`
72+
- `rejection_rate`
73+
- `time_to_publish` (ingest -> lesson_published)
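
Two of these metrics can be sketched directly from backlog rows. The definitions here are assumptions: `rejection_rate` is taken as rejected rows over all rows passed in, and `time_to_publish` relies on a hypothetical `published_at` column, since the template only records `created_at`.

```python
from datetime import date
from typing import Optional


def rejection_rate(cases: list[dict]) -> float:
    """Share of cases whose status is `rejected` (0.0 for an empty backlog)."""
    if not cases:
        return 0.0
    rejected = sum(1 for c in cases if c["status"] == "rejected")
    return rejected / len(cases)


def time_to_publish_days(case: dict) -> Optional[int]:
    """Days from ingest to publication, or None if the case is not published.

    `published_at` is a hypothetical ISO-date column, not part of the template.
    """
    if case.get("status") != "lesson_published" or not case.get("published_at"):
        return None
    created = date.fromisoformat(case["created_at"])
    published = date.fromisoformat(case["published_at"])
    return (published - created).days
```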
74+
75+
## Rejection Rules
76+
Reject if any applies:
77+
1. Not reproducible
78+
2. Root cause is still unclear after 20 minutes of investigation
79+
3. Case is too niche and not generalizable
80+
4. Legal/privacy risk in source content
81+
82+
## First Week Bootstrap
83+
1. Create working backlog file from template.
84+
2. Add 20 cases (target mix: JS 8, Java 6, Python 6).
85+
3. Publish 2 pilot lessons from highest-priority cases.
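
The first two bootstrap steps could be scripted roughly like this. The function names and the destination path are hypothetical, and mapping "JS" to the `language` value `javascript` is an assumption.

```python
import shutil
from collections import Counter
from pathlib import Path

# Target mix from step 2; "JS" is assumed to map to the value `javascript`.
TARGET_MIX = {"javascript": 8, "java": 6, "python": 6}


def create_working_backlog(template: Path, dest: Path) -> Path:
    """Copy the template to a working file, refusing to overwrite existing work."""
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; refusing to overwrite")
    shutil.copyfile(template, dest)
    return dest


def mix_gaps(cases: list[dict]) -> dict[str, int]:
    """Return how many more cases each language needs to reach the target mix."""
    have = Counter(c["language"].strip().lower() for c in cases)
    return {
        lang: target - have[lang]
        for lang, target in TARGET_MIX.items()
        if have[lang] < target
    }
```

An empty dict from `mix_gaps` means the target mix has been reached.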
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
case_id,created_at,status,source_type,source_url,language,domain,track,symptom,raw_snippet,repro_steps,expected_behavior,actual_behavior,root_cause,fix_summary,impact_score,frequency_score,clarity_score,priority_score,owner,reviewer,lesson_id,notes
2+
CASE-2026-03-001,2026-03-09,new,article,https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/,python,dependency_security,verification,"AI suggests non-existent package; developers install it","pip install huggingface-cli","1) Ask LLM for Hugging Face CLI install command 2) Run suggested pip command","Tool should suggest official package/install path","Suggested package name is fake and can be attacker-controlled","LLM dependency hallucination + public registry namespace abuse","Add package verification step (registry check + maintainer/date checks) before install",5,4,5,4.7,,,,"Good anchor case for slopsquatting lesson"
3+
CASE-2026-03-002,2026-03-09,new,article,https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/,javascript,dependency_security,verification,"Hallucinated npm/python package names repeat across runs","Repeated fake package names from model outputs","1) Query model repeatedly with same coding tasks 2) Track suggested packages","Low/no fake dependency rate","Persistent hallucinations observed; exploitable in Python/npm","Repetitive hallucinated names create predictable attack surface","Introduce dependency allowlist and block newly published unknown packages",5,4,4,4.5,,,,"Use as dataset-driven verification exercise"
4+
CASE-2026-03-003,2026-03-09,new,official_docs,https://docs.github.com/en/copilot/responsible-use/copilot-coding-agent,polyglot,inaccurate_generation,verification,"Generated code appears valid but can be semantically/syntactically wrong","Copilot docs: generated code may be inaccurate","1) Use coding agent output without tests 2) Merge directly","Agent output should be trustworthy by default","Output can violate intent or contain errors","Probabilistic generation without full correctness guarantees","Require test gate + review checklist before merge",4,5,5,4.5,,,,"Policy lesson: never trust untested AI output"
5+
CASE-2026-03-004,2026-03-09,new,research_paper,https://www.citedrive.com/en/discovery/security-weaknesses-of-copilot-generated-code-in-github-projects-an-empirical-study/,python,security_cwe,verification,"Copilot-generated snippets include high-rate security weaknesses","29.5% Python snippets affected; multiple CWE categories","1) Collect Copilot-tagged snippets 2) Run static analysis","Most generated snippets should pass basic security checks","High proportion contain CWE-level weaknesses","Model optimization for plausibility/functionality over secure defaults","Add SAST scan and security linting into acceptance criteria",5,4,4,4.5,,,,"Use CWE mapping mini-labs"
6+
CASE-2026-03-005,2026-03-09,new,research_paper,https://www.citedrive.com/en/discovery/security-weaknesses-of-copilot-generated-code-in-github-projects-an-empirical-study/,javascript,security_cwe,verification,"JS generated snippets contain exploitable patterns","24.2% JavaScript snippets affected","1) Generate JS snippets with assistant 2) Scan with eslint/security tooling","Generated code should be safe-by-default","Frequent injection/XSS-style weakness patterns","Training data + weak secure-by-default prompting","Teach threat-model checklist for every AI patch",5,4,4,4.5,,,,"Pair with frontend security chapter"
7+
CASE-2026-03-006,2026-03-09,new,press_release,https://cyber.nyu.edu/2021/10/15/ccs-researchers-find-github-copilot-generates-vulnerable-code-40-of-the-time/,polyglot,security_baseline,verification,"High vulnerability rate in generated code","~40% generated samples vulnerable in study","1) Recreate prompt scenarios 2) Analyze generated implementations","Low critical-vuln rate in suggestions","Large fraction vulnerable/buggy","LLM completion optimized for matching patterns, not security proofs","Adopt secure template snippets and mandatory security review",5,3,4,4.2,,,,"Historical baseline for why verification matters"
8+
CASE-2026-03-007,2026-03-09,new,stackoverflow,https://stackoverflow.com/questions/77812049/openai-api-error-choice-object-has-no-attribute-text,python,api_mismatch,debugging,"Code uses wrong response field and crashes","'Choice' object has no attribute 'text'","1) Use Chat Completions API 2) Access response.choices[0].text","Should return content string","Runtime attribute error occurs","Outdated API usage pattern mixed with current SDK","Replace with response.choices[0].message.content",3,5,5,4.0,,,,"Great beginner 'AI outdated code' lesson"
9+
CASE-2026-03-008,2026-03-09,new,github_issue,https://github.com/microsoft/vscode-copilot-release/issues/9940,typescript,path_handling,automation_literacy,"Agent writes invalid path separators in WSL context","Expected /home/... but generated \\home\\...","1) Use Copilot agent in WSL 2) Ask for file generation 3) Try save","Generated file path should match environment","Save/keep action fails due invalid path format","Environment-aware path normalization failure","Add environment detector + path canonicalization checks",4,4,4,4.0,,,,"Good 'verify before apply' file-ops case"
10+
CASE-2026-03-009,2026-03-09,new,github_issue,https://github.com/microsoft/vscode/issues/265794,typescript,patch_regression,debugging,"Agent reapplies superseded edits and corrupts file state","Keep Changes applies older intermediate diffs","1) Long multi-step session 2) Manual edits between agent runs 3) Keep changes","Final diff should reflect latest accepted state","Older edits are reintroduced; regression appears","Diff application pipeline mixes stale intermediate states","Require final-state diff preview + conflict detection before apply",5,3,4,4.2,,,,"Strong advanced debugging scenario"
11+
CASE-2026-03-010,2026-03-09,new,github_issue,https://github.com/microsoft/vscode/issues/271620,markdown,instruction_following,automation_literacy,"Commit message generation ignores custom instructions","Configured instructions in settings are not followed","1) Configure instruction file 2) Generate commit message","Output should follow instruction format","Generic output ignores constraints","Instruction ingestion path/feature instability","Add validation: reject output when policy markers absent",3,4,4,3.5,,,,"Can be used for prompt-contract lesson"
12+
CASE-2026-03-011,2026-03-09,new,github_issue,https://github.com/microsoft/vscode/issues/270772,dart,file_generation,automation_literacy,"Agent reports files created, but files missing","Agent confirms create; explorer shows nothing","1) Ask agent to generate multiple files 2) Open explorer","Files should exist after success confirmation","No files appear; links fail","False-positive task completion reporting","Add post-action filesystem verification and retry logic",4,4,4,4.0,,,,"Teaches trust-but-verify workflow"
13+
CASE-2026-03-012,2026-03-09,new,github_issue,https://github.com/microsoft/vscode-copilot-release/issues/5184,mixed,session_reliability,automation_literacy,"Agent enters no-response crash loop during iterative coding","""Sorry, no response was returned"" repeatedly","1) Multi-turn iterative asks 2) Continue/refine repeatedly","Agent should preserve progress and continue","Crash loop + generated changes lost","Session/state handling and rate-limit recovery weakness","Checkpoint generated patches per step before next turn",4,3,4,3.7,,,,"Useful for resilient workflow lesson"
14+
CASE-2026-03-013,2026-03-09,new,github_issue,https://github.com/microsoft/vscode-copilot-release/issues/3927,typescript,context_window,automation_literacy,"Large-file edit fails due length limit","Response hit length limit on ~7000-line edit","1) Request broad refactor in large file","Agent should chunk edits safely","Single-pass fails with non-actionable error","No automatic chunking strategy for long outputs","Split task into bounded file regions + batch apply",3,4,5,3.7,,,,"Maps to practical chunking techniques"
15+
CASE-2026-03-014,2026-03-09,new,github_issue,https://github.com/microsoft/vscode/issues/261555,python,notebook_editing,automation_literacy,"Inline chat edits notebook in read-only preview, not active cell","Immutable preview cells appear","1) Use Ctrl+I in notebook cell 2) Request modification","Cell document should mutate in place","Preview-only read-only output generated","Notebook editor integration mismatch","Require explicit apply-to-cell step + diff check",3,3,4,3.2,,,,"Notebook-specific AI editing pitfall"
16+
CASE-2026-03-015,2026-03-09,new,github_issue,https://github.com/microsoft/vscode/issues/275672,typescript,agent_regression,debugging,"Model iteration introduces repeated regressions","Report notes code breakage and infernal iteration","1) Let agent perform repeated auto-fixes 2) run tests each loop","Each iteration should monotonically improve","Later iterations re-break previous fixes","Unstable planning and missing regression guardrails","Add per-iteration regression test gate",4,3,3,3.5,,,,"Great for iterative debugging lesson"
17+
CASE-2026-03-016,2026-03-09,new,github_issue,https://github.com/github/github-mcp-server/issues/937,python,environment_limits,automation_literacy,"Agent promises artifact generation but cannot persist file","PPT generation blocked by environment limitations","1) Request PPT generation in Copilot context 2) expect local save","Artifact should save or fail early with actionable fallback","Late-stage failure after generation","Capability mismatch not surfaced upfront","Require capability preflight + fallback script generation",3,3,4,3.2,,,,"Automation literacy: environment-aware planning"
18+
CASE-2026-03-017,2026-03-09,new,stackoverflow,https://stackoverflow.com/questions/79084529/streamlit-javascript-integration,python,javascript_bridge,debugging,"Assistant advice still leaves integration returning wrong value type","return 2 gives unexpected behavior","1) Use st_javascript with numeric return 2) inspect output","Expected numeric value pass-through","Buggy behavior unless string workaround used","Tooling/runtime quirk not handled by generic AI advice","Add type normalization and minimal repro debugging steps",2,3,4,2.7,,,,"Small repro-first debugging case"
19+
CASE-2026-03-018,2026-03-09,new,official_blog,https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/,polyglot,evaluation_trust,verification,"Benchmark pass does not always imply true bug-fix quality","Audit found many unresolved benchmark quality issues","1) Compare repeated runs on same issue 2) inspect failing subsets","Benchmark should reliably reflect engineering ability","Significant portion had test/problem-design issues","Over-reliance on imperfect benchmark labels","Teach multi-metric validation beyond single benchmark score",3,3,5,3.4,,,,"Meta-lesson for AI evaluation literacy"
20+
CASE-2026-03-019,2026-03-09,new,official_blog,https://openai.com/index/introducing-swe-bench-verified/,python,error_handling,debugging,"Model struggles with simple but brittle bug contexts without proper tests","Example: 'kern referenced before assignment' issue framing","1) Reproduce issue from problem statement 2) write focused test 3) patch","Patch should satisfy explicit fail-to-pass test","Without targeted test, fix attempts drift","Lack of test-first debugging discipline","Force test-first workflow in lesson template",3,4,5,3.7,,,,"Good bridge from benchmark to lesson design"
21+
CASE-2026-03-020,2026-03-09,new,official_docs,https://docs.github.com/en/copilot/responsible-use/copilot-coding-agent,polyglot,public_code_match,verification,"Generated code may match public code unexpectedly","Copilot docs note near-match/public code risk","1) Generate code for common task 2) similarity-check output","Low accidental copy risk with clear provenance","Possible near-match output without references","Model can reproduce training-like patterns","Add provenance/license check step before merge",4,3,4,3.7,,,,"Useful for compliance + review checklist"
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# AI Failure Sources (2026-03-09)
2+
3+
Collected for initial pipeline bootstrap.
4+
5+
## Research / Official
6+
- https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/
7+
- https://www.citedrive.com/en/discovery/security-weaknesses-of-copilot-generated-code-in-github-projects-an-empirical-study/
8+
- https://cyber.nyu.edu/2021/10/15/ccs-researchers-find-github-copilot-generates-vulnerable-code-40-of-the-time/
9+
- https://openai.com/index/introducing-swe-bench-verified/
10+
- https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
11+
- https://docs.github.com/en/copilot/responsible-use/copilot-coding-agent
12+
13+
## GitHub Issues (Tooling + Agent Failure)
14+
- https://github.com/microsoft/vscode-copilot-release/issues/9940
15+
- https://github.com/microsoft/vscode/issues/265794
16+
- https://github.com/microsoft/vscode/issues/271620
17+
- https://github.com/microsoft/vscode/issues/270772
18+
- https://github.com/microsoft/vscode-copilot-release/issues/5184
19+
- https://github.com/microsoft/vscode-copilot-release/issues/3927
20+
- https://github.com/microsoft/vscode/issues/261555
21+
- https://github.com/microsoft/vscode/issues/275672
22+
- https://github.com/github/github-mcp-server/issues/937
23+
24+
## Q&A
25+
- https://stackoverflow.com/questions/77812049/openai-api-error-choice-object-has-no-attribute-text
26+
- https://stackoverflow.com/questions/79084529/streamlit-javascript-integration
27+
28+
## Notes
29+
- Selection rule: reproducible + explainable + teachable.
30+
- Prioritized cases with concrete symptom and fix path.
31+
- Converted into backlog CSV with triage fields and scoring.
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
case_id,created_at,status,source_type,source_url,language,domain,track,symptom,raw_snippet,repro_steps,expected_behavior,actual_behavior,root_cause,fix_summary,impact_score,frequency_score,clarity_score,priority_score,owner,reviewer,lesson_id,notes
2+
CASE-0001,2026-03-09,new,github_issue,https://example.com,java,input_validation,verification,"accepts invalid edge input","if (n >= 0) ...","1) run with n=0","reject input 0","continues flow","boundary check includes zero","change condition to n >= 1",4,4,5,4.2,,,,
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
{
2+
"language": {
3+
"id": "ai-literacy",
4+
"name": "AI Code Literacy",
5+
"description": "A hands-on track for understanding, verifying, and fixing AI-written code",
6+
"icon": "🛡️",
7+
"color": "#0EA5E9"
8+
},
9+
"chapters": [
10+
{
11+
"id": "ai-ch1",
12+
"order": 1,
13+
"title": "Introduction to AI Code Verification",
14+
"description": "Learn the minimal loop for reproducing and fixing the ways AI-provided code goes wrong.",
15+
"part": "verification",
16+
"partLabel": "AI Verification",
17+
"lessons": [
18+
{
19+
"id": "ai-1-1",
20+
"order": 1,
21+
"title": "Catching a Misread API Response Field",
22+
"description": "Reproduce and fix a response-field access error in outdated AI example code.",
23+
"difficulty": "basic",
24+
"estimatedTime": 12
25+
}
26+
]
27+
}
28+
]
29+
}
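
Lesson `ai-1-1` appears to line up with backlog case CASE-2026-03-007, where code reads `response.choices[0].text` from a chat-style response. The sketch below reproduces that failure shape and the fix recorded in the backlog. The `Message`, `Choice`, and `Response` classes are hypothetical stand-ins that only mimic the attribute layout described in the case; they are not the real SDK types.

```python
from dataclasses import dataclass


# Hypothetical stand-ins that mimic the attribute layout from the case,
# so the failure can be reproduced without network access or an API key.
@dataclass
class Message:
    content: str


@dataclass
class Choice:
    message: Message


@dataclass
class Response:
    choices: list


def extract_text(response: Response) -> str:
    """Fix from the backlog: read message.content instead of the missing text field."""
    return response.choices[0].message.content


response = Response(choices=[Choice(message=Message(content="hello"))])

# Reproduction: the outdated access pattern raises AttributeError.
try:
    response.choices[0].text
except AttributeError as exc:
    print(f"reproduced: {exc}")

# Minimal patch and regression check.
print(extract_text(response))
```

This follows the lesson format from the pipeline document: reproduction first, then the minimal patch, then a check that the patched access returns the content string.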
