fix(judge): record unmeasured instead of fabricated zeros on checker failure by drewstone · Pull Request #301 · tangle-network/agent-eval

drewstone · 2026-07-02T02:09:29Z

Problem

A correctness-checker failure fabricated zeros. parseCorrectnessResponse required a closing }; a checker LLM reply truncated by the 200-token cap (e.g. {"correct": false, ") threw, the throw aborted the entire completion check, and downstream judge dimensions were recorded as 0 — violating the package's own JudgeParseError contract ("a synthetic zero is indistinguishable from a real low score"). Verified real-world harm: a legal-agent Kimi row with deterministic 0.96 and 26 valid proposals recorded 0 across all 7 judge dimensions because the checker's OWN reply truncated at 20 chars. Checker LLM calls were also captured nowhere, making these failures unauditable.

Change

src/json-recovery.ts (new): truncation-tolerant JSON recovery (auto-close unclosed strings/objects/arrays, drop dangling members), lifted from the inline logic in reflective-mutation.ts which now imports it.
parseCorrectnessResponse: strict parse first; on failure, recover the truncated prefix — a recovered boolean verdict is a real measurement (the Kimi-row reply {"correct": false, " now parses instead of zeroing the run). No recoverable boolean → JudgeParseError, never a guess.
createLlmCorrectnessChecker: one retry on parse/call failure (maxAttempts, default 2); optional rawSink records every checker request/response/error (provider: 'correctness-checker') for forensics.
verifyCompletion / completionVerdict: a checker failure marks the requirement unmeasured + unmeasuredReason — correct stays null, the row leaves completionRate's denominator, fullyComplete cannot be claimed, and unmeasuredCount is reported. All checks failing → throw (infrastructure failure, not a scored run).

Verification

npx vitest run src/completion-verifier.test.ts src/json-recovery.test.ts src/reflective-mutation.test.ts → 51 pass / 0 fail (incl. Kimi-shaped truncation recovery, cut-boolean non-fabrication, retry, raw-capture ordering, unmeasured denominator, all-failed throw)
Full suite pnpm test → 2671 pass / 2 skipped / 0 fail
npx tsc --noEmit → clean

🤖 Generated with Claude Code

…failure

tangletools

✅ Auto-approved drewstone PR — `5d129947`

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: drewstone_author · 2026-07-02T02:09:38Z}

tangletools

🟡 Value Audit — sound-with-nits


Verdict	sound-with-nits
Concerns	1 (1 weak-concern)
Heuristic	0.0s
Duplication	0.1s
Interrogation	262.5s (2 bridge agents)
Total	262.6s

💰 Value — sound-with-nits

Stops the correctness checker from fabricating zeros on truncation/infra failure by recovering truncated JSON and modeling checker failures as unmeasured; a correct, in-grain fix with one minor dedup opportunity in the hand-rolled retry/rawSink loop.

What it does: Three coupled changes. (1) New shared src/json-recovery.ts lifts truncation-tolerant JSON recovery out of reflective-mutation.ts (clean extraction, now imports autoCloseTruncatedJson) and extends it with recoverTruncatedJson (auto-close unclosed strings/objects, drop dangling members, iterate at comma boundaries). (2) parseCorrectnessResponse strict-parses first, then on failure recovers
Goals it achieves: Eliminate fabricated zeros — a checker LLM reply truncated by the 200-token cap (e.g. {"correct": false, ") used to throw, aborting the completion check and recording 0 across all judge dimensions for a row whose deterministic score was 0.96. This directly violates the package's own JudgeParseError contract (judges.ts:6-11: 'a synthetic zero is indistinguishable from a real low score'). Second
Assessment: Sound and well-scoped. The unmeasured modeling is the correct third state for an eval oracle — distinct from both 'passed' and 'failed' — and its propagation is coherent: denominator exclusion in completionVerdict (completion-verifier.ts:126-136), fullyComplete suppression (line 139), and the throw-when-all-failed guard (line 128-134) all read as deliberate, matching the repo's fail-loud doc
Better / existing approach: Looked for existing machinery the checker should reuse instead of hand-rolling. Two candidates exist but neither is a drop-in: (a) recordRaw+cryptoEventId in llm-client.ts:653-678 do best-effort rawSink recording + event-id generation identical in spirit to the PR's inline record helper (completion-verifier.ts:494-501), but they are private and fetch-coupled — the checker uses the TCloud/`
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 2
Bridge warning: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content

🎯 Usefulness — sound

Fixes fabricated-zero scoring on checker truncation/failure via a shared JSON-recovery module, the established JudgeParseError type, an additive forensic sink opt, and a new unmeasured state that correctly excludes failed checks from the completion denominator — all wired through the existing verify

Integration: Fully reachable. verifyCompletion (completion-verifier.ts:340) is the completion oracle called by the agent tool eval-tools.ts:163, the playback scoreUserStory (playback.ts:97), and exported from index.ts:488-492. createLlmCorrectnessChecker (completion-verifier.ts:486) is the production CorrectnessChecker injected into it, so the new retry/recovery/rawSink behavior is on every real scoring path.
Fit with existing patterns: Fits the codebase's grain. json-recovery.ts is a clean extraction of the autoCloseTruncatedJson logic reflective-mutation.ts already needed (now imports it at reflective-mutation.ts:20, deduplicating the inline copy). The checker now throws JudgeParseError (judges.ts:12) — the exact type parseJudgeResponse (judges.ts:294) uses for the same 'never fabricate a zero' doctrine. The rawSink opt reuses
Real-world viability: Holds up off the happy path. parseCorrectnessResponse (completion-verifier.ts:453) orders strict-parse → truncation-recovery → throw, and readVerdict requires a boolean correct, so {"correct": false, "} recovers (real measurement) while {"correct": } throws (no fabrication) — the core correctness property. recoverTruncatedJson is bounded (64 iterations) and returns null when unrecoverable. The ret
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

💰 Value Audit

🟡 Hand-rolled retry + rawSink loop parallels llm-client.ts internals [duplication] ``

The record helper (completion-verifier.ts:494-501) and the per-attempt RawProviderEvent construction (lines 524-569) re-implement recordRaw (llm-client.ts:653-665, same best-effort-swallow semantics) and cryptoEventId (llm-client.ts:675-678, now randomUUID() inline). Neither is exported, so reuse wasn't available, but extracting a shared recordRaw/event-id helper for the TCloud path would prevent a second copy from drifting. Also the retry loop has no backoff (immediate re-call) — fine

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass	What it asks
Heuristic	Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication	Do added function/class names already exist elsewhere in the repo?
Value Audit	What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit	Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260702T023244Z}

tangletools · 2026-07-02T03:22:08Z

✅ No Blockers — `5d129947`

Review health 100/100 · Reviewer score 47/100 · Confidence 85/100 · 30 findings (4 medium, 26 low)

	opencode-kimi	glm	deepseek	aggregate
Readiness	47	59	61	47
Confidence	85	85	85	85
Correctness	47	59	61	47
Security	47	59	61	47
Testing	47	59	61	47
Architecture	47	59	61	47

Reviewer score is advisory once the run is complete and the verdict has no blockers.

Full multi-shot audit completed 5/5 planned shots over 5 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 5 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 5 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM Stale file-level doc comment contradicts new completionRate denominator — src/completion-verifier.ts

The header comment still says 'completionRate is satisfied / total', but completionVerdict() now divides by measurable requirements only (line 136) and the CompletionVerdict interface doc at line 102 says 'satisfied / MEASURABLE requirements'. This misleads consumers about how unmeasured rows affect the score. Fix: update line 18 to 'satisfied / measurable requirements'.

🟠 MEDIUM Earliest {/[ heuristic mis-slices prose that contains a bracket before the JSON — src/json-recovery.ts

Line 118-122 pick Math.min(objStart, arrStart), so any [ before the actual JSON object wins. For input [note] {"score":3} the candidate becomes [note] {"score":3}, the strict parse fails, auto-close fails, and the function returns null even though a complete object is present. Reproduction: recoverTruncatedJson('[note] {"score":3}') -> null. This breaks the documented tolerance for prose/markdown and can miss recoverable judge outputs. Fix: prefer the first {; only fall back to [ when no object is recoverable, or validate that the chosen slice can be made to parse.

🟠 MEDIUM Incomplete backslash escape at truncation boundary produces unparseable JSON — src/json-recovery.ts

When a truncated string value ends on a backslash (escaped=true at end-of-loop), the suffix builder appends '"' (line 59) which JSON.parse consumes as the escape target rather than the string terminator. Result: {"a": "val"} still has an open string and fails JSON.parse. Verified: autoCloseTruncatedJson('{"a": "val\') → '{"a": "val\"}' → JSON.parse throws 'Unterminated string in JSON at position 13'. Fix: before appending '"' on line 59, check escaped and prepend 'n' to produce \n (valid escape completion): if (escaped) suffix += 'n' then `suffix +=

🟠 MEDIUM Trailing unescaped backslash in open string breaks recovery — src/json-recovery.ts

When a truncated string ends with a bare backslash, e.g. raw input {"a":"x\\, the loop exits with escaped=true and inString=true. The suffix then appends "} directly after the backslash, yielding {"a":"x\"} where the added quote is escaped by the trailing backslash. JSON.parse fails, so recoverTruncatedJson returns null. Reproduction: autoCloseTruncatedJson('{"a":"x\') -> '{"a":"x"}', JSON.parse throws 'Unterminated string'. This leaves path/regex-like cap-hit output unrecovered and still produces the fabricated-zero outcome the module exists to prevent. Fix: detect and neutralize a trailing unescaped backslash before closing the string (drop it or add an extra escape so the closing quote is literal).

🟡 LOW Best-effort sink failure path is untested — src/completion-verifier.test.ts

The implementation swallows sink.record errors silently (impl lines 494-501: try/catch with empty catch). No test verifies that a throwing sink does not corrupt the verdict. If someone removes the try/catch, a flaky sink would break every eval run and no test would catch the regression. Low severity — this is implementation resilience, not test correctness — but worth a one-line test: sink.record = async () => { throw new Error('disk full') } and assert the checker still returns a verdict.

🟡 LOW Greedy assignment ordering assumption in unmeasured test — src/completion-verifier.test.ts

The test's flaky checker throws on its FIRST call (r1) and succeeds on the second (r2), but the assignment order depends on greedy sort order (r2 scores 0.80 vs r1's 0.67). If token recall scores ever change (e.g., if REQUIREMENT_FORM_STOPWORDS or tokenization changes), the assignment order could swap, causing r2 to be the unmeasured row. The test's first flag assumption (r1 fails, r2 succeeds) is implicitly coupled to implementation details of artifactCandidates. A more robust test would set up requirements with predictable, non-interchangeable scores, or would inspect the unmeasured row's reqId.

🟡 LOW Minor coverage gap: non-default maxAttempts not tested — src/completion-verifier.test.ts

Tests assume the default maxAttempts=2 but do not exercise a custom maxAttempts option. Since the option exists in LlmCorrectnessCheckerOpts, a test for maxAttempts=1 or maxAttempts=3 would harden the contract.

🟡 LOW Minor coverage gap: raw sink parse-error path not exercised — src/completion-verifier.test.ts

The raw-sink test records an 'error' event only when tc.chat() throws. It does not verify that a parse error from parseCorrectnessResponse is also recorded as an error event with the correct errorMessage. The implementation does record it (lines 555-570), so this is a coverage nit, not a bug.

🟡 LOW No test coverage for custom maxAttempts option — src/completion-verifier.test.ts

The new maxAttempts option defaults to 2 but is configurable. Tests only exercise the default path. A test with maxAttempts: 3 would catch off-by-one errors in the retry loop (e.g., attempt < maxAttempts vs attempt <= maxAttempts). Not a correctness bug on the default path, but uncovered configuration surface.

🟡 LOW No test for custom maxAttempts value (only default of 2 exercised) — src/completion-verifier.test.ts

The retry tests (lines 479-512) use the default maxAttempts=2. The LlmCorrectnessCheckerOpts.maxAttempts field (impl line 437) is never tested with a non-default value. If someone changes the default or breaks the maxAttempts override, no test catches it. Minor coverage gap — the production path (default) is well covered. Fix: add one test passing { maxAttempts: 3 } and assert calls === 3 on all-fail.

🟡 LOW Missing baseUrl in forensic provider events — src/completion-verifier.ts

All RawProviderEvents record baseUrl: '' (hardcoded at lines 529, 545, 561). The TCloud instance knows the provider endpoint but its baseUrl is not threaded through to the sink. This means forensic replay of checker calls can't confirm which provider/endpoint served each response. The RawProviderEvent interface documents baseUrl as 'Base URL used for the call' — an empty string loses this signal. Not a correctness bug; just incomplete forensic capture.

🟡 LOW No backoff between checker retry attempts — src/completion-verifier.ts

The retry loop at line 522 sends attempts back-to-back with no delay. For transport errors (503, ECONNRESET), both attempts fire within milliseconds, making the retry unlikely to help with rate-limiting or server-overload scenarios. For parse failures (JudgeParseError), temperature 0 means the model produces the same output on retry, so backoff wouldn't help there. Impact is low because maxAttempts defaults to 2 and parse-failure retries are the dominant failure mode. Add a small exponential backoff (e.g., 100ms * 2^attempt) to give transport errors a chance to resolve.

🟡 LOW Recorded rawSink events carry inaccurate baseUrl/endpoint forensic metadata — src/completion-verifier.ts

Every recorded RawProviderEvent hardcodes endpoint: '/chat' and baseUrl: ''. The RawProviderSink contract (src/trace/raw-provider-sink.ts:46-49) defines endpoint as the real path (e.g. '/v1/chat/completions') and baseUrl as the normalized call URL with no trailing slash. Empty baseUrl violates the contract; '/chat' is a guess. Forensic-only so not load-bearing, but degrades audit value — a reviewer reconstructing a failed call cannot tell which provider/URL actually answered. Either omit the fields (sink treats undefined as unknown) or pull them from the TCloud client when exposed. Same shape repeats on lines 545, 562 for response/error events.

🟡 LOW Retrying on JudgeParseError with temperature=0 is wasted compute — src/completion-verifier.ts

The retry loop treats parse failures and network errors identically — both consume an attempt. With temperature: 0 (line 519) the model is deterministic, so a parse failure on attempt 0 will almost certainly reproduce on attempt 1 because the request is byte-identical. Retries remain useful for transient network/5xx errors. The docstring at line 432-436 acknowledges both consume attempts but doesn't justify why parse-failure retry is worthwhile at temp 0. Either (a) short-circuit parse failures (don't retry), or (b) document the reason

🟡 LOW completionRate can report 1.0 during a partial checker outage — src/completion-verifier.ts

When some requirements are unmeasured and all measurable ones pass, completionRate = satisfiedCount / measurable.length = 1.0 while fullyComplete = false and valid = false. This is the documented design choice (see comments at lines 102-107, 137-138) and the alternative (zero-folding) is exactly the bug class being fixed. Flag for DOWNSTREAM awareness: any consumer that aggregates completionRate (mean across runs, trend lines) rather than gating on fullyComplete/valid will see inflated numbers during partial outages. The unmeasuredCount field is exported precisely to detect this; recommend the README/concepts doc call out the inflation-when-aggregat

🟡 LOW maxAttempts <= 0 throws 'undefined' instead of a clear misuse error — src/completion-verifier.ts

If a caller passes { maxAttempts: 0 } (or negative), the for-loop never executes, lastErr stays undefined, and the function throws new Error(String(undefined)) = Error('undefined'). The default is 2 so this only triggers under misuse, but a one-line guard (if (maxAttempts < 1) throw new Error('maxAttempts must be >= 1')) at construction time would surface the misconfiguration loudly instead of as a cryptic runtime error.

🟡 LOW maxAttempts=0 produces confusing Error('undefined') — src/completion-verifier.ts

The retry loop runs for (let attempt = 0; attempt < maxAttempts; attempt++) with no validation. If a caller passes maxAttempts=0, the loop never executes, lastErr stays undefined, and the function throws new Error(String(undefined)) i.e. Error('undefined'). Default is 2, so this is only a misconfiguration edge case. Fix: either guard maxAttempts > 0 or assert a clear message.

🟡 LOW rawSink failures are silently swallowed — src/completion-verifier.ts

The record() helper catches and swallows all sink errors. The inline comment acknowledges this is intentional ('Forensic capture is best-effort'), but it means a misconfigured sink (bad path, permissions, disk full) silently loses audit data with no log or metric. Consider emitting a warning or surfacing a non-fatal diagnostic so operators notice capture is broken.

🟡 LOW Missing coverage for arrays, nested truncation, and markdown fences — src/json-recovery.test.ts

The tests cover object truncation well but do not exercise array truncation (e.g. [1, 2, ), nested object/array cut-offs, or JSON inside markdown fences. These shapes are reachable from completion-verifier.ts and reflective-mutation.ts callers; adding them would guard against regressions in the recovery loop.

🟡 LOW No test coverage for incomplete-escape truncation pattern — src/json-recovery.test.ts

The escape-awareness test at line 24 covers escape-awareness inside strings but only tests a completed escape sequence ("hi). No test exercises the incomplete-escape case where the input ends on a bare backslash. Adding a test like autoCloseTruncatedJson('{"a": "val\\') would catch the bug described above and serve as a regression gate.

🟡 LOW No test for array-rooted recovery in recoverTruncatedJson — src/json-recovery.test.ts

recoverTruncatedJson searches for both '{' and '[' openers (json-recovery.ts:119) and can recover truncated arrays like [1, 2, 3, but no test exercises this path. The current callers only pass object-rooted input (completion-verifier.ts:472 slices from '{'), so this is an untested public API surface.

🟡 LOW No test for autoCloseTruncatedJson on top-level array input — src/json-recovery.test.ts

The function handles both '{' and '[' structurally (json-recovery.ts:50), but no test verifies closing an unclosed top-level array like '[1, 2' → '[1, 2]'. All existing tests use object-rooted input only.

🟡 LOW No test for literal braces inside strings treated as non-structural — src/json-recovery.test.ts

When input contains literal braces inside string values (e.g. '{"a": "{"}'), the state machine must treat them as non-structural content (inString=true skips the brace push). No test verifies this: a balanced-but-brace-containing input like '{"a": "{"}' would be returned unchanged, but a truncated version like '{"a": "{' could theoretically confuse the depth counter if the implementation had a bug. Current implementation is correct — this is a coverage gap, not a bug.

🟡 LOW No test pins LIFO closure at nesting depth > 2 — src/json-recovery.test.ts

The only nesting test is {"a": [1, 2 → {"a": [1, 2]} (depth 2). Real judge payloads nest deeper (objects-in-arrays-in-objects). The LIFO suffix builder (while stack.length > 0: pop + append closer) is exercised only at depth 2, so a regression that closes in FIFO order or caps at one level would slip. Add a depth-3+ case like {"a": [{"b": [1, "x' → {"a": [{"b": [1, "x"]}]}].

🟡 LOW No test pins array-as-top-level recovery in recoverTruncatedJson — src/json-recovery.test.ts

Every recoverTruncatedJson case uses '{' as the opener. The implementation explicitly handles arrays (text.indexOf('['), Math.min(objStart, arrStart) slicing) but no test exercises e.g. recoverTruncatedJson('[1, 2, {"a": 3') → [1, 2, {a: 3}]. An array-truncation regression (e.g. dropping the array branch in the slice) would pass silently. Add one array-top-level recovery case to lock the branch.

🟡 LOW No test pins behavior when last trailing comma sits inside an array context — src/json-recovery.test.ts

lastCommaOutsideString does not track array-vs-object context — it finds the last comma anywhere outside a string. For input like {"a": [1, 2' the last comma is the one inside the array (after 1), so cutting there yields {"a": [1] which silently drops the 2. This is conservative-recovery behavior worth pinning either way (current behavior: keep prefix, drop partial member) so a future refactor that adds context-awareness doesn't change recovery shape without a test catching it.

🟡 LOW Overly permissive assertion masks exact recovery behavior — src/json-recovery.test.ts

The assertion r === null || !('correct' in r) would still pass if the function returned an empty object {} instead of null for input '{"correct": '. Since the input contains no complete member to recover, the precise contract is null; tighten to expect(r).toBeNull() or use a separate case where a previous complete member exists and assert the dangling key is dropped.

🟡 LOW String/escape walker duplicated between two functions — src/json-recovery.ts

autoCloseTruncatedJson (lines 30-56) and lastCommaOutsideString (lines 84-98) reimplement the same in-string / escaped / quote state-machine. They must stay byte-for-byte consistent or the comma-cut and the auto-close will disagree on what is 'inside a string' -- a subtle drift risk if either is edited alone. Consider extracting a small forEachNonStringChar(s, visit) helper that both consume. Pure maintainability nit; no behavior defect today.

🟡 LOW Truncation on a dangling backslash inside an open string is not recoverable — src/json-recovery.ts

When the cap-hit lands immediately after a single backslash inside an open string (e.g. {"path":"C:\ -- a Windows path, regex, or LaTeX fragment cut mid-escape), the loop exits with escaped=true and inString=true, so line 59 appends """ to close the string. But that appended quote is consumed as part of the dangling escape (" = escaped quote), so JSON.parse still sees the string as open and rejects the candidate. Confirmed empirically with the exact char sequence: autoClose('{"path":"C:') returns '{"path":"C:"}' which JSON.parse rejects, and recoverTruncatedJson returns null because lastCommaOutsideString finds no comma outside a string to cut at. Blast radius is bound

🟡 LOW parseReflectionResponse lacks a dedicated unit test — src/reflective-mutation.ts

Pre-existing gap (not introduced by this PR): no reflective-mutation.test.ts exists, so the truncation-tolerant fallback path at lines 267-279 — the only in-file consumer of autoCloseTruncatedJson — has no direct coverage of its integration with the slice/candidate selection logic above it. The moved helper itself IS now well-tested in json-recovery.test.ts. Suggest adding a parseReflectionResponse test that feeds a cap-hit-truncated {"proposals":[{...}, payload to lock the end-to-end recovery path. No merge blocker.

_{tangletools · 2026-07-02T03:22:05Z · trace}

tangletools

✅ Approved — 30 non-blocking findings — `5d129947`

Full multi-shot audit completed 5/5 planned shots over 5 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 5 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 5/5 planned shots over 5 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary

_{tangletools · 2026-07-02T03:22:05Z · immutable trace}

…not-zero

…start slice fallback, doc truth - autoCloseTruncatedJson completes an incomplete trailing escape to a literal backslash before closing the string, so the appended quote is a terminator, not an escape target - recoverTruncatedJson tries each opener position (earliest first) and falls back to the next when a prose bracket poisons the slice - completionRate header comment states the measurable-only denominator - coverage: prose-bracket + trailing-backslash reproductions, custom maxAttempts, throwing raw sink; two pre-existing useLiteralKeys lints

tangletools

✅ Auto-approved drewstone PR — `3713ffaa`

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: drewstone_author · 2026-07-02T04:58:42Z}

drewstone · 2026-07-02T04:58:50Z

All four MEDIUMs fixed (plus the cheap LOWs), latest push:

Stale doc: header now states the measurable-only denominator (satisfied / MEASURABLE requirements).
Earliest-[ mis-slice: recoverTruncatedJson now tries each opener position in order and falls back when a prose bracket poisons the slice — [note] {"score":3} recovers to {score:3} (test added), top-level arrays still recover (test added).
Trailing-backslash truncation (both findings): autoCloseTruncatedJson completes an incomplete trailing escape to a literal backslash before appending the closing quote — {"a": "val\ now recovers (tests added at both layers).
LOWs: custom maxAttempts: 3 all-fail test (asserts 3 calls), throwing-raw-sink test (verdict unaffected), plus two pre-existing useLiteralKeys lint findings in untouched files fixed while passing.

Full suite 2682 passed / 2 skipped; rtk lint + typecheck clean; branch merged with main (was 2 behind).

drewstone added 2 commits July 1, 2026 20:08

feat(json-recovery): shared truncation-tolerant JSON recovery module

3c3abff

fix(judge): record unmeasured instead of fabricated zeros on checker …

5d12994

…failure

tangletools approved these changes Jul 2, 2026

View reviewed changes

tangletools reviewed Jul 2, 2026

View reviewed changes

tangletools previously approved these changes Jul 2, 2026

View reviewed changes

drewstone added 2 commits July 1, 2026 22:56

Merge remote-tracking branch 'origin/main' into fix/judge-unmeasured-…

050da23

…not-zero

drewstone dismissed tangletools’s stale review via 3713ffa July 2, 2026 04:58

tangletools approved these changes Jul 2, 2026

View reviewed changes

drewstone merged commit e69c0f1 into main Jul 2, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(judge): record unmeasured instead of fabricated zeros on checker failure#301

fix(judge): record unmeasured instead of fabricated zeros on checker failure#301
drewstone merged 4 commits into
mainfrom
fix/judge-unmeasured-not-zero

drewstone commented Jul 2, 2026

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools commented Jul 2, 2026

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

drewstone commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

drewstone commented Jul 2, 2026

Problem

Change

Verification

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved drewstone PR — 5d129947

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

🟡 Value Audit — sound-with-nits

💰 Value — sound-with-nits

🎯 Usefulness — sound

💰 Value Audit

Uh oh!

tangletools commented Jul 2, 2026

✅ No Blockers — 5d129947

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Approved — 30 non-blocking findings — 5d129947

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved drewstone PR — 3713ffaa

Uh oh!

drewstone commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

✅ Auto-approved drewstone PR — `5d129947`

✅ No Blockers — `5d129947`

✅ Approved — 30 non-blocking findings — `5d129947`

✅ Auto-approved drewstone PR — `3713ffaa`