Inflated Code Output LOC from VS Code edit-session payloads
The Output → Code Output view can significantly overcount AI-generated lines of code for VS Code Copilot agent sessions by summing persisted edit operation payloads instead of estimating unique produced code.
Summary
In one investigated local dataset, the chart showed 91,583 LOC for 2026-06-08. Almost all of it came from VS Code chatEditingSessions edit-state operations, not from AI response code blocks.
| Measure | LOC |
| --- | --- |
| Total shown for 2026-06-08 | 91,583 |
| AI response code blocks | 58 |
| VS Code edit-state payloads | 91,525 |
Root Cause
The LOC metric sums every persisted VS Code edit operation payload. For multi-round Copilot agent sessions, those payloads include repeated whole-file replacements, so the same file's lines are counted many times.
Example:
- Agent writes a 1,200-line file → VS Code stores a whole-file snapshot.
- Agent revises it → VS Code stores the whole file again.
- Agent revises it again → VS Code stores it again.
A user would expect ~1,200 produced lines. The current counter reports 3,600.
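The arithmetic behind this example can be sketched as follows. This is an illustration only: the file content and the `countLines` helper are made up, and the three snapshots are assumed to be identical in length for simplicity.

```typescript
// Hypothetical illustration: three whole-file snapshots of one 1,200-line file.
function countLines(text: string): number {
  if (text.length === 0) return 0;
  let newlines = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) newlines++; // 10 is "\n"
  }
  // A final line without a trailing newline still counts as a line.
  return text.endsWith("\n") ? newlines : newlines + 1;
}

const file = "x = 1\n".repeat(1200);
const snapshots = [file, file, file]; // initial write plus two revisions

// What the current counter does: sum every stored payload.
const summed = snapshots.reduce((acc, s) => acc + countLines(s), 0);

// What a user would expect: roughly the size of the final file.
const finalSize = countLines(snapshots[snapshots.length - 1]);

console.log({ summed, finalSize });
```

Here `summed` is three times `finalSize`, and the ratio grows with every additional revision of the same file.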
Why It Depends on the Model
The overcount tracks which edit tool the model uses:
| Model family | Requests | Counted | Real | Inflation | apply_patch | string-edit |
| --- | --- | --- | --- | --- | --- | --- |
| OpenAI (gpt-5.5, etc.) | 85 | 171,027 | 115,632 | 1.48× | 96% | 0% |
| Anthropic (Claude) | 1,213 | 160,402 | 184,919 | 0.87× | 0% | 90% |
| Mixed / copilot-auto | 211 | 52,904 | 50,046 | 1.06× | 68% | 27% |
- OpenAI / apply_patch: the model emits a small diff, but applying it writes back the entire file, and VS Code records each apply_patch as a whole-file textEdit. Repeated applies on the same file inflate the count.
- Anthropic / replace_string_in_file: targeted search-and-replace. VS Code records only the changed region, so there is little or no inflation (a slight undercount instead).
- Mixed: inflation is proportional to how often apply_patch is chosen.
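To make the difference concrete, the sketch below records the same one-line change in both styles and applies the current newline-counting rule to each. The `RecordedEdit` shape is hypothetical and is not the actual chatEditingSessions schema.

```typescript
// Hypothetical shape of a persisted edit; NOT the real VS Code schema.
interface RecordedEdit {
  tool: string;
  insertedText: string;
}

// A 500-line file in which exactly one line changes.
const original = Array.from({ length: 500 }, (_, i) => `line ${i}`);
const revised = [...original];
revised[42] = "line 42 (changed)";

// Whole-file style: the stored payload is the complete revised file.
const wholeFileEdit: RecordedEdit = {
  tool: "apply_patch",
  insertedText: revised.join("\n") + "\n",
};

// Region style: the stored payload is only the replaced text.
const regionEdit: RecordedEdit = {
  tool: "replace_string_in_file",
  insertedText: "line 42 (changed)\n",
};

// The current metric counts newline characters in each inserted payload.
const newlineCount = (s: string): number => s.split("\n").length - 1;

const wholeFileLoc = newlineCount(wholeFileEdit.insertedText);
const regionLoc = newlineCount(regionEdit.insertedText);
console.log({ wholeFileLoc, regionLoc });
```

Under these assumptions the whole-file record contributes the full file length for a one-line change, while the region record contributes a single line.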
Deep-Dive Verification (one request)
The largest request ("Start implementation", request_78fc0984) was reconstructed from its session JSONL and cross-checked three ways:
| Measure | LOC |
| --- | --- |
| Current counter (sum of every edit payload) | 16,727 |
| Genuinely new lines (diff vs previous version) | ~5,294 |
| Lines the model actually emitted (patch + create payloads) | ~1,124 |
The model ran 59 tool rounds and edited almost entirely via 23 small apply_patch diffs — it never re-emitted whole files. Yet VS Code persisted 25 whole-file snapshots and the counter summed all of them.
haco/orchestrator.py — 9 whole-file snapshots:
| Snapshot | Stored file lines (counted) |
| --- | --- |
| 1 | 916 |
| 2 | 1,190 |
| … | … |
| 9 | 1,207 |
| Sum (current) | 10,517 |
| Final file | ~1,207 |
Token usage corroborates: 32,558 real output tokens is consistent with ~1,124 lines of patch text, not 16,727 produced lines.
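The token check can be reproduced with simple division. The figures below come from the tables above; the only assumption is that tokens-per-line is a reasonable plausibility measure.

```typescript
// Figures taken from the investigated request.
const outputTokens = 32558;
const emittedPatchLines = 1124; // patch + create payloads
const countedLines = 16727; // current counter

const tokensPerLine = (lines: number): number => outputTokens / lines;

const perPatchLine = tokensPerLine(emittedPatchLines); // about 29 tokens per line
const perCountedLine = tokensPerLine(countedLines); // about 1.9 tokens per line

console.log(perPatchLine.toFixed(1), perCountedLine.toFixed(1));
```

Fewer than two output tokens per line is implausibly low for source code, so the model cannot have produced 16,727 lines in this request.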
Actual vs Expected Behavior
Actual: AI-Generated LoC = sum of all edit operation payload sizes. Whole-file replacements and repeated revisions are summed as if each were new output.
Expected: Estimate unique or net AI-produced LOC. Repeated whole-file rewrites of the same file within one request should not multiply the file's line count.
Proposed Fix: Incremental Per-File Diff (Variant D)
Within each request, keep a running copy of each file's content. For every textEdit, reconstruct the resulting file state and count only the lines that are new relative to the previous version of that file.
```typescript
// per (requestId, fileUri), operations sorted by epoch
let prev = seedFromInitialContents(fileUri) ?? ""; // "" for newly created files
let produced = 0;
for (const op of fileOps) {
  const next = applyEdits(prev, op.edits); // reconstruct file state after this op
  produced += addedLineCount(prev, next); // line-level diff: only new/changed lines
  prev = next;
}
editLocIndex.set(requestId, fileUri, produced);
```
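The applyEdits step is not specified in this issue. A minimal sketch follows, assuming each edit is a non-overlapping character-offset replacement expressed against the previous file state. The `OffsetEdit` shape is hypothetical; the persisted edits may use line/column ranges instead, in which case a conversion step would be needed first.

```typescript
// Hypothetical edit shape: replace prev[start, end) with text.
interface OffsetEdit {
  start: number;
  end: number;
  text: string;
}

// Reconstruct the file state after one operation.
// Assumes edits do not overlap and all offsets refer to `prev`.
function applyEdits(prev: string, edits: OffsetEdit[]): string {
  // Apply from the end of the file backwards so that earlier offsets
  // are not shifted by replacements made later in the text.
  const sorted = [...edits].sort((a, b) => b.start - a.start);
  let out = prev;
  for (const e of sorted) {
    out = out.slice(0, e.start) + e.text + out.slice(e.end);
  }
  return out;
}

// A targeted edit and a whole-file replacement use the same code path.
const before = "a\nb\nc\n";
const targeted = applyEdits(before, [{ start: 2, end: 3, text: "B" }]);
const wholeFile = applyEdits(before, [{ start: 0, end: before.length, text: "z\n" }]);
console.log(JSON.stringify(targeted), JSON.stringify(wholeFile));
```

Treating a whole-file replacement as an edit spanning offset 0 to the end of the file is what lets one routine serve both tools.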
addedLineCount uses a linear multiset difference: hash each line with a 32-bit charCodeAt scan, tally the previous version's hashes in a Map<number, count>, and count each line of the new version whose hash has no remaining match. There is no LCS/Myers diff, which keeps the step O(C) in payload characters.
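One way to implement that description is sketched below. The issue does not name the hash function, so the FNV-1a-style mix over UTF-16 code units is an assumption, as is the helper name `forEachLineHash`.

```typescript
// Walk the text once and report a 32-bit hash per line, without split().
// The FNV-1a-style mixing is an assumed choice, not the project's.
function forEachLineHash(text: string, visit: (hash: number) => void): void {
  const BASIS = 0x811c9dc5;
  let h = BASIS;
  let pending = false; // true when characters were seen since the last "\n"
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    if (c === 10) {
      visit(h >>> 0);
      h = BASIS;
      pending = false;
    } else {
      h = Math.imul(h ^ c, 0x01000193);
      pending = true;
    }
  }
  if (pending) visit(h >>> 0); // final line without a trailing newline
}

// Multiset difference: lines of `next` with no unmatched copy left in `prev`.
function addedLineCount(prev: string, next: string): number {
  const remaining = new Map<number, number>();
  forEachLineHash(prev, (h) => remaining.set(h, (remaining.get(h) ?? 0) + 1));

  let added = 0;
  forEachLineHash(next, (h) => {
    const left = remaining.get(h) ?? 0;
    if (left > 0) {
      remaining.set(h, left - 1); // consume one matching previous line
    } else {
      added++;
    }
  });
  return added;
}

console.log(addedLineCount("a\nb\nc\n", "a\nB\nc\nd\n"));
```

Two properties of this approach are worth keeping in mind: a moved line is not counted as new, and a 32-bit hash collision can hide a genuinely new line, so the result is an estimate rather than an exact diff.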
Key fast paths:
- Single-write files (74% here) skip the diff entirely — count newlines directly.
- First snapshot of a new file — count its line total without a diff.
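The two fast paths could be wired in roughly as follows. The `FileOp` shape and the `producedLines` name are hypothetical, and the line diff is passed in as a parameter; `setDiff` is only a simple stand-in for the hash-based routine.

```typescript
type LineDiff = (prev: string, next: string) => number;

// Hypothetical per-operation record: the inserted payload and the
// reconstructed file state after the operation.
interface FileOp {
  payload: string;
  next: string;
}

function countNewlines(text: string): number {
  let n = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) n++;
  }
  return n;
}

// `initial` is undefined for files created during the request.
function producedLines(initial: string | undefined, ops: FileOp[], diff: LineDiff): number {
  if (ops.length === 0) return 0;
  // Fast path 1: a file written once cannot be double counted.
  if (ops.length === 1) return countNewlines(ops[0].payload);

  let prev = initial ?? "";
  let produced = 0;
  ops.forEach((op, i) => {
    if (i === 0 && initial === undefined) {
      // Fast path 2: first snapshot of a new file, every line is new.
      produced += countNewlines(op.next);
    } else {
      produced += diff(prev, op.next);
    }
    prev = op.next;
  });
  return produced;
}

// Stand-in diff for the example only (ignores blank lines for brevity).
const setDiff: LineDiff = (prev, next) => {
  const seen = new Set(prev.split("\n"));
  return next.split("\n").filter((l) => l !== "" && !seen.has(l)).length;
};

const v1 = "a\nb\n";
const v2 = "a\nb\nc\n";
const total = producedLines(undefined, [{ payload: v1, next: v1 }, { payload: v2, next: v2 }], setDiff);
console.log(total);
```

In this example the summing counter would report 5 lines (2 + 3), whereas the incremental count reports the 3 lines that actually exist in the final file.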
Effect on the investigated request:
| Measure | Current | With fix |
| --- | --- | --- |
| haco/orchestrator.py | 10,517 | ~1,550 |
| Whole request | 16,727 | ~5,294 |
| Day total, edit-state payloads (2026-06-08) | 91,525 | substantially lower |
This is tool-agnostic: it removes the apply_patch inflation and also corrects the slight string-replace undercount.
Performance
Benchmarked on the heaviest local workspace (53 edit-state files, 761 operations, 12.1 MB), 60 iterations:
| Variant | Time | vs current | LOC |
| --- | --- | --- | --- |
| A: current (newline sum, the bug) | 15.9 ms | 1.0× | 149,321 |
| B: naive diff (split + Map<string>) | 31.8 ms | 2.0× | 94,136 |
| D: fast-path + split-free hash Map<number> | 23.0 ms | 1.5× | 94,155 |
| E: net-growth proxy (near-free) | 12.1 ms | 0.8× | 86,674 |
+7 ms on the heaviest workspace; ~57 ms projected across all 82 workspaces (~1 GB). Against a multi-second parse of 953 MB of session JSONL, this is not perceptible.
Relevant Code
- src/webview/page-output.ts: renders getCodeProduction / summary.totalAiLoc
- src/core/analyzer-production.ts: adds AI response code blocks + edit-session LOC
- src/core/parser-vscode.ts: populates the edit LOC index by counting newline characters in every inserted edit payload (the fix goes here)
Acceptance Criteria