# Inflated Code Output LOC from VS Code edit-session payloads

The **Output → Code Output** view can significantly overcount AI-generated lines of code for VS Code Copilot agent sessions by summing persisted edit operation payloads instead of estimating unique produced code.

---

## Summary

In one investigated local dataset, the chart showed **91,583 LOC** for `2026-06-08`. Almost all of it came from VS Code `chatEditingSessions` edit-state operations, not from AI response code blocks.

| Measure | LOC |
|---|---:|
| Total shown for 2026-06-08 | 91,583 |
| AI response code blocks | 58 |
| VS Code edit-state payloads | 91,525 |

---

## Root Cause

> The LOC metric sums every persisted VS Code edit operation payload. For multi-round Copilot agent sessions, those payloads include repeated whole-file replacements, so the same file's lines are counted many times.

**Example:**
1. Agent writes a 1,200-line file → VS Code stores a whole-file snapshot.
2. Agent revises it → VS Code stores the whole file again.
3. Agent revises it again → VS Code stores it again.

A user would expect ~1,200 produced lines. The current counter reports about **3,600** (three stored snapshots × 1,200 lines).
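The arithmetic in this example can be reproduced with a small sketch. The `EditOp` shape and the `naiveLoc` helper are hypothetical stand-ins, not the parser's actual types; the only behavior taken from this issue is that the counter sums newline characters across every persisted payload.

```ts
// Hypothetical shape: each persisted edit operation carries the text it inserted.
interface EditOp {
  fileUri: string;
  text: string;
}

// Count newline characters in one payload.
function countNewlines(text: string): number {
  let n = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) n++;
  }
  return n;
}

// The behavior described above: every payload is summed as if it were new output.
function naiveLoc(ops: EditOp[]): number {
  return ops.reduce((sum, op) => sum + countNewlines(op.text), 0);
}

// A 1,200-line file stored three times as a whole-file snapshot.
const wholeFile = Array.from({ length: 1200 }, (_, i) => `line ${i}\n`).join("");
const snapshots: EditOp[] = [1, 2, 3].map(() => ({
  fileUri: "file:///example.py", // illustrative URI
  text: wholeFile,
}));

console.log(naiveLoc(snapshots)); // 3600
```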

---

## Why It Depends on the Model

The overcount tracks which **edit tool** the model uses:

| Model family | Requests | Counted LOC | Real LOC | Inflation (counted ÷ real) | apply_patch | string-edit |
|---|---:|---:|---:|---:|---:|---:|
| OpenAI (gpt-5.5, etc.) | 85 | 171,027 | 115,632 | **1.48×** | 96% | 0% |
| Anthropic (Claude) | 1,213 | 160,402 | 184,919 | **0.87×** | 0% | 90% |
| Mixed / copilot-auto | 211 | 52,904 | 50,046 | 1.06× | 68% | 27% |

- **OpenAI / `apply_patch`**: the model emits a small patch, but VS Code records each applied patch as a whole-file `textEdit`. Repeated patches to the same file therefore add the full file length to the count each time.
- **Anthropic / `replace_string_in_file`**: targeted search-and-replace. VS Code records only the changed region, so there is little or no inflation (the table shows a slight undercount instead).
- **Mixed**: inflation is proportional to how often `apply_patch` is chosen.
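The inflation column is simply counted LOC divided by real LOC. The following sketch recomputes it from the table's own figures; the `FamilyRow` type and `inflation` helper are illustrative names only.

```ts
// Figures copied from the table above; the type and helper names are illustrative.
interface FamilyRow {
  family: string;
  counted: number;
  real: number;
}

const rows: FamilyRow[] = [
  { family: "OpenAI", counted: 171_027, real: 115_632 },
  { family: "Anthropic", counted: 160_402, real: 184_919 },
  { family: "Mixed / copilot-auto", counted: 52_904, real: 50_046 },
];

// Inflation factor: values above 1 mean overcounting, below 1 undercounting.
function inflation(row: FamilyRow): number {
  return row.counted / row.real;
}

for (const row of rows) {
  console.log(`${row.family}: ${inflation(row).toFixed(2)}x`);
}
// OpenAI: 1.48x
// Anthropic: 0.87x
// Mixed / copilot-auto: 1.06x
```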

---

## Deep-Dive Verification (one request)

The largest request (`"Start implementation"`, `request_78fc0984`) was reconstructed from its session JSONL and cross-checked three ways:

| Measure | LOC |
|---|---:|
| Current counter (sum of every edit payload) | 16,727 |
| Genuinely new lines (diff vs previous version) | ~5,294 |
| Lines the model actually emitted (patch + create payloads) | ~1,124 |

The model ran 59 tool rounds and edited almost entirely via **23 small `apply_patch` diffs** — it never re-emitted whole files. Yet VS Code persisted **25 whole-file snapshots** and the counter summed all of them.

**`haco/orchestrator.py` — 9 whole-file snapshots:**

| Snapshot | Stored file lines (counted) |
|---:|---:|
| 1 | 916 |
| 2 | 1,190 |
| … | … |
| 9 | 1,207 |
| **Sum (current)** | **10,517** |
| **Final file** | **~1,207** |

Token usage corroborates: **32,558 real output tokens** is consistent with ~1,124 lines of patch text, not 16,727 produced lines.
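As a rough plausibility check, dividing the output tokens by each candidate line count gives the implied tokens per line. The sketch below only does that division on the figures above; the helper name is illustrative, and what counts as a plausible tokens-per-line value for code is a judgment call rather than something measured here.

```ts
// Figures from this section; the helper name is illustrative.
const outputTokens = 32_558;
const emittedLines = 1_124;  // patch + create payloads
const countedLines = 16_727; // current counter

function tokensPerLine(tokens: number, lines: number): number {
  return tokens / lines;
}

console.log(tokensPerLine(outputTokens, emittedLines).toFixed(1)); // "29.0"
console.log(tokensPerLine(outputTokens, countedLines).toFixed(1)); // "1.9"
```

Under 2 tokens per line would be implausibly low for source code, which is why the token count points to the ~1,124-line figure.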

---

## Actual vs Expected Behavior

**Actual:** `AI-Generated LoC` = sum of all edit operation payload sizes. Whole-file replacements and repeated revisions are summed as if each were new output.

**Expected:** Estimate unique or net AI-produced LOC. Repeated whole-file rewrites of the same file within one request should not multiply the file's line count.

---

## Proposed Fix: Incremental Per-File Diff (Variant D)

Within each request, keep a running copy of each file's content. For every `textEdit`, reconstruct the resulting file state and count only lines **new compared to the previous version** of that file.

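```ts
// Helpers assumed to exist elsewhere in the parser; signatures are illustrative.
declare function seedFromInitialContents(fileUri: string): string | undefined;
declare function applyEdits(prev: string, edits: unknown[]): string;
declare function addedLineCount(prev: string, next: string): number;

// Call once per (requestId, fileUri), with that file's operations sorted by epoch.
function producedLocForFile(fileUri: string, fileOps: { edits: unknown[] }[]): number {
  let prev = seedFromInitialContents(fileUri) ?? ""; // "" for newly created files
  let produced = 0;
  for (const op of fileOps) {
    const next = applyEdits(prev, op.edits);  // reconstruct file state after this op
    produced += addedLineCount(prev, next);   // line-level diff: only new/changed lines
    prev = next;
  }
  return produced;
}
// The result is then stored in the edit LOC index under (requestId, fileUri).
```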
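The state-reconstruction step could look roughly like the sketch below. It assumes a simplified, hypothetical edit shape (`OffsetEdit`, character offsets into the previous content, non-overlapping ranges); the edits actually persisted by VS Code may be shaped differently and would need to be converted first.

```ts
// Hypothetical edit shape: replace prev[start, end) with `text`.
interface OffsetEdit {
  start: number;
  end: number;
  text: string;
}

// Apply non-overlapping edits to the previous content.
// Edits are applied from the highest offset down so earlier offsets stay valid.
function applyOffsetEdits(prev: string, edits: OffsetEdit[]): string {
  const sorted = [...edits].sort((a, b) => b.start - a.start);
  let out = prev;
  for (const e of sorted) {
    out = out.slice(0, e.start) + e.text + out.slice(e.end);
  }
  return out;
}

const before = "a\nb\nc\n";
// Targeted edit: replace the line "b\n" only.
console.log(JSON.stringify(applyOffsetEdits(before, [{ start: 2, end: 4, text: "B\n" }])));
// "a\nB\nc\n"
// Whole-file replacement: one edit spanning the entire previous content.
console.log(JSON.stringify(applyOffsetEdits(before, [{ start: 0, end: before.length, text: "x\n" }])));
// "x\n"
```

A whole-file snapshot and a targeted replacement both reduce to the same operation here, which is what lets the per-file diff stay tool-agnostic.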

`addedLineCount` uses a **linear multiset difference**: hash each line with a 32-bit `charCodeAt` scan, tally the previous version's hashes in a `Map<number, count>`, then count each line of the new version whose hash has no unmatched occurrence left in that tally. There is no LCS/Myers diff, which keeps the step O(C) in payload characters.
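A minimal sketch of such a multiset difference follows. The hash is an FNV-1a-style mix over UTF-16 code units, chosen here only for illustration; the issue does not specify which 32-bit hash is used, and `forEachLineHash` is a hypothetical helper. Hash collisions can cause a new line to be treated as already present, so the result is an estimate.

```ts
// Visit a 32-bit hash for every line of `text` without calling split().
function forEachLineHash(text: string, visit: (hash: number) => void): void {
  let h = 0x811c9dc5; // FNV offset basis (assumed hash choice)
  let pending = false; // true if the current line has characters not yet flushed
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    if (c === 10) {
      visit(h >>> 0); // end of line: emit hash, reset state
      h = 0x811c9dc5;
      pending = false;
    } else {
      h = Math.imul(h ^ c, 0x01000193); // FNV prime, 32-bit multiply
      pending = true;
    }
  }
  if (pending) visit(h >>> 0); // final line without a trailing newline
}

// Count lines of `next` that have no unmatched counterpart in `prev`.
function addedLineCount(prev: string, next: string): number {
  const remaining = new Map<number, number>();
  forEachLineHash(prev, (h) => remaining.set(h, (remaining.get(h) ?? 0) + 1));
  let added = 0;
  forEachLineHash(next, (h) => {
    const n = remaining.get(h) ?? 0;
    if (n > 0) remaining.set(h, n - 1); // matched: consume one occurrence
    else added++;                       // nothing left to match: counts as new
  });
  return added;
}

console.log(addedLineCount("a\nb\nc\n", "a\nB\nc\nd\n")); // 2 ("B" and "d")
console.log(addedLineCount("x\n", "x\nx\nx\n"));          // 2 (one "x" is matched)
console.log(addedLineCount("a\nb\n", "a\nb\n"));          // 0 (identical rewrite)
```

Because the tally is a multiset, re-storing an unchanged file contributes zero lines, and moving a line within a file does not count it as new.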

**Key fast paths:**
- Single-write files (74% here) skip the diff entirely — count newlines directly.
- First snapshot of a new file — count its line total without a diff.
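The fast paths can be illustrated for the newly created file case, where the first stored state is counted by newlines and the diff only runs from the second state onward. In this sketch `producedForNewFile` is a hypothetical helper, and `stubDiff` is a deliberately crude stand-in for the real line diff that exists only to count how often the diff is invoked.

```ts
type LineDiff = (prev: string, next: string) => number;

function countNewlines(text: string): number {
  let n = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) n++;
  }
  return n;
}

// `states` holds the reconstructed content of a newly created file after each op.
function producedForNewFile(states: string[], diff: LineDiff): number {
  if (states.length === 0) return 0;
  let produced = countNewlines(states[0]); // first snapshot: every line is new, no diff
  for (let i = 1; i < states.length; i++) {
    produced += diff(states[i - 1], states[i]); // later snapshots: diff against previous
  }
  return produced;
}

// Stand-in diff (net growth only), used here just to observe call counts.
let diffCalls = 0;
const stubDiff: LineDiff = (prev, next) => {
  diffCalls++;
  return Math.max(0, countNewlines(next) - countNewlines(prev));
};

const singleWrite = producedForNewFile(["a\nb\nc\n"], stubDiff);
console.log(singleWrite, diffCalls); // 3 0  (single write never reaches the diff)

const twoWrites = producedForNewFile(["a\nb\nc\n", "a\nb\nc\nd\n"], stubDiff);
console.log(twoWrites, diffCalls);   // 4 1  (one diff for the second snapshot)
```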

**Effect on the investigated request:**

| Measure | Current | With fix |
|---|---:|---:|
| `haco/orchestrator.py` | 10,517 | ~1,550 |
| Whole request | 16,727 | ~5,294 |
| Edit-state total for 2026-06-08 | 91,525 | substantially lower |

This is **tool-agnostic**: it removes the `apply_patch` inflation and also corrects the slight string-replace undercount.

---

## Performance

Benchmarked on the heaviest local workspace (53 edit-state files, 761 operations, 12.1 MB), 60 iterations:

| Variant | Time | vs current | LOC |
|---|---:|---:|---:|
| A — current (newline sum, the bug) | 15.9 ms | 1.0× | 149,321 |
| B — naive diff (`split` + `Map<string>`) | 31.8 ms | 2.0× | 94,136 |
| **D — fast-path + split-free hash `Map<number>`** | **23.0 ms** | **1.5×** | **94,155** |
| E — net-growth proxy (near-free) | 12.1 ms | 0.8× | 86,674 |

Variant D adds about **7 ms** on the heaviest workspace (23.0 ms vs 15.9 ms), with **~57 ms** projected across all 82 workspaces (~1 GB). Against a multi-second parse of 953 MB of session JSONL, this is not perceptible.

---

## Relevant Code

- [`src/webview/page-output.ts`](../src/webview/page-output.ts) — renders `getCodeProduction` / `summary.totalAiLoc`
- [`src/core/analyzer-production.ts`](../src/core/analyzer-production.ts) — adds AI response code blocks + edit-session LOC
- [`src/core/parser-vscode.ts`](../src/core/parser-vscode.ts) — populates edit LOC index by counting newline characters in every inserted edit payload (**the fix goes here**)

---

## Acceptance Criteria

- [ ] Repeated whole-file replacements of the same file in one request do not multiply the file's LOC.
- [ ] A request that rewrites a 1,200-line file three times should not report ~3,600 unique generated LOC.
- [ ] Existing code-block-based counting continues to work for harnesses without edit-session data.
- [ ] The Code Output view makes clear whether it reports raw edit-operation volume or estimated unique produced code.

