feat(index): catalog sessions with repo context #13

Merged
drewstone merged 1 commit into main from feat/session-index-context on Jul 2, 2026
Conversation

@drewstone

Contributor

Summary

  • add a general traces session index JSON with per-session rows and aggregate totals
  • improve repo/cwd resolution from repaired paths, span paths, and explicit tool workdirs
  • include nearby local context files for joins: agent docs, .evolve JSONL files, reflections, and handoffs
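
The catalog described above (per-session rows plus aggregate totals) can be pictured with a minimal sketch. The field names here are assumptions for illustration, apart from `totals.sessions`, which appears in the CLI log line quoted later in this thread; the real shapes in src/session-index.ts may differ.

```typescript
// Hypothetical row shape: only a few of the per-session fields named in the review.
interface SessionRow {
  harness: string;
  sessionId: string;
  cwd: string | null;
  repo: string | null; // null when no repo could be resolved
  toolCalls: number;
  errors: number;
}

// Hypothetical catalog shape: rows plus aggregate totals.
interface SessionIndexSketch {
  rows: SessionRow[];
  totals: {
    sessions: number;
    toolCalls: number;
    errors: number;
    byRepo: Record<string, number>;
  };
}

// Fold per-session rows into one catalog with totals.
function buildIndexSketch(rows: SessionRow[]): SessionIndexSketch {
  const byRepo: Record<string, number> = {};
  let toolCalls = 0;
  let errors = 0;
  for (const row of rows) {
    toolCalls += row.toolCalls;
    errors += row.errors;
    const key = row.repo ?? "(unresolved)";
    byRepo[key] = (byRepo[key] ?? 0) + 1;
  }
  return { rows, totals: { sessions: rows.length, toolCalls, errors, byRepo } };
}
```

Keeping the rows alongside the totals means a downstream tool can join on `sessionId` or `repo` without re-scanning transcripts.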

Verification

  • pnpm typecheck
  • pnpm test (78/78)
  • pnpm build
  • node dist/cli.js index --harness codex --cwd /home/drew/code/traces --last 1 --out /tmp/traces-session-index-context-smoke.json

@tangletools tangletools left a comment

✅ Auto-approved drewstone PR — 52011d91

This PR was opened by the trusted drewstone account.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: drewstone_author · 2026-07-02T01:44:08Z

@tangletools tangletools left a comment

🟡 Value Audit — sound-with-nits

Verdict: sound-with-nits
Concerns: 4 (1 low, 3 weak-concern)
Heuristic: 0.0s
Duplication: 0.1s
Interrogation: 360.7s (2 bridge agents)
Total: 360.8s

💰 Value — sound-with-nits

Adds a traces index command and SDK that builds a reusable JSON session catalog with aggregate totals and nearby context files, plus stronger repo/cwd resolution heuristics; it's a coherent, useful extension with only minor rough edges. Ship.

  • What it does: Introduces a new index command (src/cli.ts:264-285) and collectSessionIndex / buildSessionIndex SDK surface (src/session-index.ts:293-298, src/session-index.ts:118-146) that scans sessions, builds one JSON catalog with one row per session (harness, session id, cwd, repo labels, time bounds, metrics, models, tools, loop/error signals), aggregate totals, and a sidecar context index of local
  • Goals it achieves: 1) Give operators and downstream tools a general, joinable session inventory rather than only a markdown report or per-row JSONL. 2) Fix incorrect per-repo grouping when harness transcript directories mangle dashes into slashes or omit cwd entirely. 3) Surface how the cwd was recovered so callers can trust or filter rows by provenance. 4) Anchor sessions to local agent docs and .evolve artifacts
  • Assessment: Good change. It reuses the existing scanSessions / buildPolicyEvidenceRecord pipeline instead of inventing new metrics (src/session-index.ts:283-288), keeps the new module focused, and places the cwd/repo inference exactly where the codebase already centralizes that concern (repo.ts). The CLI follows the same pattern as evidence/convert/analyze. The only real cost is added complexity i
  • Better / existing approach: none — this is the right approach. I searched the repo for existing session catalogs or aggregate JSON producers (collectPolicyEvidence, collectSessions, analyzeSpans, renderReport, toRuntimeStore) and none emit a reusable per-session index with totals/context. The repo-resolution heuristics are pragmatic because harness transcripts do not reliably carry structured cwd metadata; centrali
  • Model: opencode/kimi-for-coding/k2p7
  • Bridge attempts: 1

🎯 Usefulness — sound

A well-integrated session-catalog command plus a richer repo resolver that flows through the shared parse funnel to every command; reuses the evidence pattern rather than duplicating it.

  • Integration: Fully reachable. The new index command is dispatched in main() (src/cli.ts:539) and cmdIndex (src/cli.ts:264) reuses the existing collectSessionRows→buildPolicyEvidenceRecord funnel. The richer resolveSessionRepoAttrs is wired into parseSession (src/session-source.ts:22), which is THE shared locate→parse path called by convert, evidence, index, and scanSessions (cli.ts:195,206,233,238; session
  • Fit with existing patterns: Excellent fit. cmdIndex mirrors cmdEvidence exactly (src/cli.ts:243 vs 264): same collectSessionRows filter, same buildPolicyEvidenceRecord-per-row, same --out/stdout dual path. session-index.ts wraps PolicyEvidenceRecord rows into an aggregate catalog with totals + a context index rather than reimplementing metric collection — it delegates to buildPolicyEvidenceRecord (session-index.ts:284) and o
  • Real-world viability: Holds up. The path-scanning resolver is bounded (MAX_SPAN_PATH_CANDIDATES=64, MAX_SPAN_TEXT_CHARS=200_000 at repo.ts), deduplicates candidates by path in a Map, and every fs/git op is wrapped fail-safe (pathStat returns null, readGit try/catch, resolveRepoAttrs never throws). Context walking (collectContextRoot) caps markdown walks at 100 files and only stats named files under .evolve/. The eviden
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 1
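
The bounded scan described under real-world viability (a character budget, a candidate cap, and deduplication by path in a Map) might look roughly like the sketch below. The two constants and their values come from the review; the regex, the `hits` counter, and the function name are assumptions, not the code in repo.ts.

```typescript
const MAX_SPAN_PATH_CANDIDATES = 64;
const MAX_SPAN_TEXT_CHARS = 200_000;

// Assumed pattern: absolute paths under a few well-known roots.
const PATH_RE_SOURCE = "/(?:home|tmp|Users|work)/[\\w.\\-/]+";

interface PathCandidate {
  path: string;
  hits: number; // how often the path was seen (hypothetical field)
}

function extractCandidatesSketch(texts: string[]): PathCandidate[] {
  const seen = new Map<string, PathCandidate>();
  const re = new RegExp(PATH_RE_SOURCE, "g");
  let budget = MAX_SPAN_TEXT_CHARS;
  for (const text of texts) {
    if (budget <= 0) break; // stop once the character budget is spent
    const slice = text.slice(0, budget);
    budget -= slice.length;
    let m: RegExpExecArray | null;
    while ((m = re.exec(slice)) !== null) {
      const path = m[0];
      const existing = seen.get(path);
      if (existing) {
        existing.hits += 1; // dedupe: count repeats instead of re-adding
        continue;
      }
      if (seen.size >= MAX_SPAN_PATH_CANDIDATES) continue; // cap new candidates
      seen.set(path, { path, hits: 1 });
    }
  }
  return Array.from(seen.values());
}
```

Both limits bound the work per session regardless of transcript size, which is the property the reviewer relies on.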

🔎 Heuristic Signals

🟡 Cruft: console debug added src/cli.ts

  • console.log(`session index → ${path} (${index.totals.sessions} session rows)`)

💰 Value Audit

🟡 Repo resolver scans every session's full span text even when cwd is already good [maintenance]

resolveSessionRepoAttrs always calls extractAbsolutePaths (repo.ts:266-287) across span names, status messages, and attributes, recursively parsing JSON strings, even when a usable ref-cwd exists. The scan is bounded (200k chars, 64 candidates at repo.ts:199-200), but it adds per-session overhead to analyze/evidence/convert/upload/index. Consider short-circuiting when the recorded cwd already resolves to a git repo.
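
One way the suggested short-circuit could be structured is sketched below. Everything here is hypothetical: `isGitRepo` and `scanSpans` are injected stand-ins for the git probe and the span-text scan, and the return shape is invented for illustration rather than taken from repo.ts.

```typescript
interface ResolvedCwd {
  cwd: string;
  source: "recorded-cwd" | "span-scan"; // provenance of the chosen cwd
}

function resolveWithShortCircuit(
  recordedCwd: string | undefined,
  isGitRepo: (dir: string) => boolean, // hypothetical cheap probe
  scanSpans: () => string | undefined, // hypothetical expensive scan
): ResolvedCwd | undefined {
  // Fast path: the recorded cwd already resolves to a repo, so skip the scan.
  if (recordedCwd && isGitRepo(recordedCwd)) {
    return { cwd: recordedCwd, source: "recorded-cwd" };
  }
  // Slow path: infer a cwd from span text.
  const inferred = scanSpans();
  if (inferred) return { cwd: inferred, source: "span-scan" };
  // Fall back to whatever was recorded, if anything.
  return recordedCwd ? { cwd: recordedCwd, source: "recorded-cwd" } : undefined;
}
```

Passing the scan as a callback keeps the expensive work lazy, so sessions with a good recorded cwd never pay for it.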

🟡 Context file discovery partially overlaps adoption's .evolve reading [duplication]

session-index.ts walks .evolve/skill-runs.jsonl (session-index.ts:243-246) for file metadata while adoption.ts already reads the same path for skill counts (adoption.ts:111-128). The goals differ (catalog vs. tally), but if more .evolve file kinds are added the two lists could drift. A shared context-root helper would keep them consistent.
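
A shared helper of the kind suggested here could be as small as one list of known .evolve file names that both callers import. This is a sketch under assumptions: `EVOLVE_FILES` and `evolveContextPaths` are hypothetical names, and only skill-runs.jsonl is named in this review.

```typescript
// Single source of truth for .evolve file kinds (only skill-runs.jsonl is named in the review).
const EVOLVE_FILES = ["skill-runs.jsonl"] as const;

// Build the candidate paths under a context root; callers decide whether to stat or read them.
function evolveContextPaths(root: string): string[] {
  const base = root.replace(/\/+$/, ""); // drop trailing slashes
  return EVOLVE_FILES.map((name) => `${base}/.evolve/${name}`);
}
```

With this in place, the catalog would stat the returned paths for metadata while the adoption tally reads them, and adding a new file kind touches one list.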

🟡 Absolute-path regex uses a hardcoded root allow-list [maintenance]

ABSOLUTE_PATH_RE at repo.ts:198 only matches paths under /home, /tmp, /Users, /work, etc. Sessions in other mount points (/data, /project, container bind mounts) won't be inferred and will fall back to the recorded cwd. The list will need ongoing maintenance as new environments appear.
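
To reduce the maintenance burden, the allow-list could be made extensible instead of fixed. The sketch below is an assumption about how that might look: the default roots are the four named in this finding (the real list is longer), and `buildAbsolutePathRe` and its `extraRoots` parameter are hypothetical.

```typescript
// Roots named in the finding; the actual list in repo.ts includes more.
const DEFAULT_ROOTS = ["home", "tmp", "Users", "work"];

// Build the path regex from the defaults plus caller-supplied roots.
function buildAbsolutePathRe(extraRoots: string[] = []): RegExp {
  const roots = Array.from(new Set(DEFAULT_ROOTS.concat(extraRoots)));
  // Escape regex metacharacters so a root like "my.data" matches literally.
  const alternation = roots
    .map((root) => root.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");
  return new RegExp("/(?:" + alternation + ")/[\\w.\\-/]+", "g");
}
```

A caller in a container environment could then pass something like `["data", "project"]`, sourced from configuration, without editing the resolver.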


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260702T020637Z

@drewstone drewstone merged commit 8cbe6e5 into main Jul 2, 2026
1 check passed

2 participants