Draft
111 commits
0e35e5f
feat: agentic benchmark ingest + UI with offload-mode halo
cquil11 Apr 23, 2026
9c43a76
fix: agentic offload variants — render both halos + map renamed fields
cquil11 May 1, 2026
07ba106
fix: render offload halo on every offload-on point, not just frontier
cquil11 May 1, 2026
95e9dc7
fix: strip runner-pool suffix (-p1, -p2, ...) from hw identifier
cquil11 May 1, 2026
982106d
feat: bold scatter labels with concurrency tag + collision avoidance
cquil11 May 1, 2026
9572b95
fix: stack multi-line point labels upward so they don't overlap the p…
cquil11 May 1, 2026
37eecc6
fix: anchor multi-line labels via first tspan + tspan-aware collision…
cquil11 May 1, 2026
f317377
fix: dedupe artifacts by logical name + skip 0-successful agg rows
cquil11 May 1, 2026
52d35ba
Merge remote-tracking branch 'origin/master' into feat/agentx
cquil11 May 1, 2026
c2f66f6
feat: add AIPerf to FRAMEWORK_LABELS
cquil11 May 7, 2026
024797a
fix(changelog): coerce ids to string when filtering changelog by run
cquil11 May 12, 2026
aa15419
feat: default sequence to Agentic Traces when available
cquil11 May 12, 2026
cb4e87c
Merge remote-tracking branch 'origin/master' into feat/agentx
cquil11 May 14, 2026
099a33e
fix(agentic): respect percentile selector for input-throughput x axis
cquil11 May 15, 2026
50a06d1
fix(agentic): default percentile to p99 and drop median option
cquil11 May 15, 2026
25305dc
Merge remote-tracking branch 'origin/master' into feat/agentx
cquil11 May 15, 2026
3c96e91
fix(agentic): keep only p90 as the percentile option
cquil11 May 15, 2026
642081a
fix(agentic): default percentile to p90, surface only p90/p99
functionstackx May 15, 2026
3f45f4d
fix(agentic): drop p99 + median TTFT, p90 only across selectors
functionstackx May 15, 2026
03c775a
fix(agentic): honor e2e TTFT override in agentic mode too
functionstackx May 15, 2026
49f2b27
fix(agentic): default e2e chart x-axis to p90 TTFT
functionstackx May 15, 2026
9e2c532
fix(tooltip): cap data-point numeric values at 3 decimal places
cquil11 May 15, 2026
50ed25f
fix(agentic): relabel x-axis title for natural-x case too
cquil11 May 15, 2026
e9d8e3f
fix(agentic): include percentile word in chart heading
cquil11 May 15, 2026
2046282
fix(agentic): include percentile in e2e chart heading dropdown
cquil11 May 15, 2026
9957f19
feat(agentic): per-point trace_replay storage + detail page POC
cquil11 May 20, 2026
0067bfc
feat(agentic): hover crosshair + expand-to-dialog on detail charts
cquil11 May 21, 2026
1d502ac
feat(inference): one chart with TTFT / E2E / Interactivity x-axis picker
cquil11 May 21, 2026
965c862
fix(inference): TTFT/E2E pick metric by sequence kind + add P75 option
cquil11 May 21, 2026
e4d97f2
feat(metrics): wire P75/P95 through frontend + register new aiperf keys
cquil11 May 21, 2026
a7a1354
fix(inference): don't drop agentic TTFT points over 60s as outliers
cquil11 May 21, 2026
07194de
fix(trace-histograms): chunk DB query + blob-cache to escape size caps
cquil11 May 21, 2026
a1e594b
feat(inference): run selector actually filters chart data
cquil11 May 21, 2026
b0d228a
feat(inference): Session Time + Prefill TPS x-axis (live from trace b…
cquil11 May 21, 2026
8af1f5c
fix(inference): show Mean Normalized Session Time in minutes
functionstackx May 21, 2026
be34e97
fix(inference): use global P90 of per-turn prefill TPS/user
functionstackx May 21, 2026
c774c00
fix(inference): no-data flash on session-time / prefill-tps modes
functionstackx May 21, 2026
d5dbda7
feat(agentic-detail): aggregates-across-configs view
cquil11 May 21, 2026
41ef33b
fix(agentic-aggregates): metric name + stream-parse oversized blobs
cquil11 May 21, 2026
1cedd24
feat(agentic-aggregates): pre-compute stats at ingest time
cquil11 May 21, 2026
9d9c7c1
fix(agentic-aggregates): drop .js extension on app-route-traced import
cquil11 May 21, 2026
6063d01
feat(agentic-detail): pre-compute chart_series at ingest time
cquil11 May 21, 2026
24fe8fe
feat(agentic-detail): per-request Gantt timeline view
cquil11 May 22, 2026
f2618f4
fix(agentic-detail): aggregate vllm metrics across all engine series
cquil11 May 22, 2026
b3e315c
fix(scenario-selector): wrap "Deprecated" in SelectLabel + lead with …
cquil11 May 26, 2026
19b9958
fix(scenario-selector): wrap Deprecated header in SelectLabel only in…
cquil11 May 26, 2026
7114833
feat(agentic-detail): add cumulative input tokens chart
cquil11 May 27, 2026
c6697de
feat(agentic-detail): plot cumulative unique input tokens
cquil11 May 27, 2026
b5679bb
feat(request-timeline): expandable subagent -> stream rows
cquil11 May 27, 2026
2e1f1ce
fix(agentic-detail): make unique-input-tokens chart monotonic
cquil11 May 27, 2026
08bbe66
feat(agentic-detail): add unique input tokens in flight chart
cquil11 May 27, 2026
7561deb
feat(chart-series): extract SGLang metrics alongside vllm
cquil11 May 28, 2026
625d6e8
fix(ingest): derive GPU cache hit rate for SGLang at ingest time
cquil11 May 28, 2026
aa76e9e
feat(chart-series): map sglang:realtime_tokens to promptTokensBySource
cquil11 May 28, 2026
5872a3d
feat(chart-series): break out SGLang cache hits by cache_source
cquil11 May 28, 2026
94a3e8b
feat(chart-series): host cache util line + fix SGLang stacked-area co…
cquil11 May 28, 2026
93e197b
fix(stacked-area): align sources by timestamp before computing shares
cquil11 May 28, 2026
c14e19e
fix(ingest): split GPU vs CPU cache hit rate for SGLang hicache rows
cquil11 May 28, 2026
268617c
fix(ingest): recognize vLLM LMCache external_kv_transfer as CPU hit
cquil11 Jun 3, 2026
7fc6b4f
fix(scatter): use lightweight presence endpoint for View charts button
cquil11 Jun 4, 2026
80468eb
feat(chart-series): per-DP-rank KV cache utilization overlay
cquil11 Jun 4, 2026
3a5ef15
feat(scatter): restrict non-e2e xmodes to e2e-pareto points
cquil11 Jun 4, 2026
5035e17
fix(scatter): keep non-pareto points visible on non-e2e xmodes
cquil11 Jun 4, 2026
2bfea38
fix(scatter): scope e2e-pareto restriction to agentic only
cquil11 Jun 4, 2026
cbeeb69
feat(legend): info tooltip on Optimal Only for agentic non-e2e modes
cquil11 Jun 4, 2026
de5e51a
fix(inference): don't scope chart to one run when runs cover differen…
cquil11 Jun 4, 2026
72e1cbb
Merge remote-tracking branch 'origin/master' into feat/agentx
cquil11 Jun 9, 2026
af8766d
fix(inference): carry forward un-contested configs when a run is sele…
cquil11 Jun 11, 2026
ab5f4f9
fix(agentic): derive unique input tokens from prompt-source breakdown
cquil11 Jun 17, 2026
d6d3143
fix: reconcile agentic data after master merge
cquil11 Jun 17, 2026
f60ef9c
fix(gpu-compare): show concurrency (C=) over points
cquil11 Jun 17, 2026
22028cc
fix(agentic-timeline): hide no-op phase toggle; fixed-height scroll w…
cquil11 Jun 17, 2026
28d25a5
feat(agentic-timeline): sticky bottom h-scroll + double-click to rese…
cquil11 Jun 17, 2026
6e56bbf
fix(gpu-compare): show CPU-offload halo on points
cquil11 Jun 18, 2026
2c06009
fix(high-contrast): use full hue wheel for single-vendor comparisons
cquil11 Jun 18, 2026
68b35b7
Merge remote-tracking branch 'origin/master' into feat/agentx
cquil11 Jun 22, 2026
6275aa7
feat(inference): default line labels off, parallelism labels + high c…
cquil11 Jun 22, 2026
5c290a4
feat(agentic): use the chart's TP/EP/DEP/TEP parallelism labels on si…
cquil11 Jun 22, 2026
32adf6b
feat(agentic): sort dropdown for the sibling point navigator
cquil11 Jun 22, 2026
60c5c2d
feat(datasets): add 011 schema for datasets + dataset_conversations
cquil11 Jun 22, 2026
71e388f
feat(datasets): weka trace structure + cached-prefix builder
cquil11 Jun 22, 2026
9fbc716
feat(datasets): HF cc-traces-weka ingest script
cquil11 Jun 22, 2026
b6be5a8
fix(datasets): handle HF 429 rate-limiting in ingest
cquil11 Jun 22, 2026
a376b5b
feat(datasets): DB queries, API routes, and React Query hooks
cquil11 Jun 22, 2026
574dfcc
feat(datasets): /datasets pages, distribution cards, flamegraph, nav
cquil11 Jun 22, 2026
0c50139
docs(ingest): note the separate agentic-dataset ingest script
cquil11 Jun 22, 2026
2ae6eba
fix(datasets): flamegraph scroll box + dual-scale group bars
cquil11 Jun 22, 2026
c749f8f
feat(datasets): link request timeline to source-dataset conversation
cquil11 Jun 22, 2026
6b700a3
feat(datasets): deep-link request-timeline bar to the exact turn
cquil11 Jun 22, 2026
83fcd04
fix(datasets): visible turn highlight + pointer-tracking flamegraph t…
cquil11 Jun 22, 2026
3c40d31
fix(datasets): deep-link highlight fires on first navigation
cquil11 Jun 22, 2026
e460ea2
fix(high-contrast): stable line colors when deselecting legend items
cquil11 Jun 23, 2026
605bff7
merge origin/master into feat/agentx; resolve quick-filter/category-s…
adibarra Jun 23, 2026
a912eab
chore(security): bump dompurify override to >=3.4.11 (GHSA-cmwh-pvxp-…
adibarra Jun 23, 2026
ba6bc1c
test(e2e): align selector testid with scenario-selector rename; rewri…
adibarra Jun 23, 2026
ada19b5
test(datasets): component tests for distribution card, trace flamegra…
adibarra Jun 23, 2026
1c61ee3
refactor(datasets): extract shared compact() formatter, dedupe 5 loca…
adibarra Jun 23, 2026
e2e5424
refactor(db): squash agentic migrations into 007_agentic.sql so numbe…
adibarra Jun 23, 2026
772dfef
add agentic time-series and dataset timing
cquil11 Jun 23, 2026
13471d7
add dataset percentile distributions
cquil11 Jun 23, 2026
8bfe664
use cumulative percentiles for agentic charts
cquil11 Jun 23, 2026
e3e0bf4
fix(db): build each chart line from a single run, no cross-run/date s…
adibarra Jun 23, 2026
2c3bb6d
Default agentic charts to interactivity
cquil11 Jun 24, 2026
28d007f
feat(datasets): bracket grouping for parallel requests in flamegraph
cquil11 Jun 25, 2026
f7f82d4
fix(datasets): bound flamegraph bracket gutter for high-parallelism t…
cquil11 Jun 25, 2026
95d7f01
fix(db): add endS to TurnNode so flamegraph timing typechecks
github-actions[bot] Jun 26, 2026
5a40444
Merge remote-tracking branch 'origin/master' into feat/agentx
github-actions[bot] Jun 26, 2026
e3a6d41
fix(agentic): enforce slow-tail interactivity (intvty = 1/itl) end-to…
cquil11 Jun 26, 2026
3ab43e6
feat(agentic): agentic-point detail, datasets, and trace-replay metrics
cquil11 Jun 26, 2026
8b243e4
feat(agentic): KV-cache pool ceiling + warmup/profiling phase split
cquil11 Jun 30, 2026
af6bc11
fix(agentic): stable conversation row order + color across timeline p…
cquil11 Jul 1, 2026
191 changes: 191 additions & 0 deletions .claude/agents/ingest.md
@@ -0,0 +1,191 @@
---
name: ingest
description: Ingest a benchmark run from GitHub Actions into the Neon DB used by the feat/agentx deployment. The target DB write URL must be provided in the invocation. Handles standard ingest, delete+reingest, and changelog entries. Invoke when the user asks to ingest a workflow run URL.
tools: Bash, Read, Edit, Write
---

You ingest benchmark runs from `SemiAnalysisAI/InferenceX` GitHub Actions into the Neon branch used by the `feat/agentx` deployment of this dashboard. Operate on `/Users/quilicic/InferenceX-app`.

## Environment

- **Repo root**: `/Users/quilicic/InferenceX-app`
- **DB write URL — MUST be provided by the invoker.** There is no default: the target Neon branch changes over time, and ingesting into the wrong one silently corrupts a live deployment. If the prompt does not include a `postgresql://` write URL, STOP and ask for it before touching anything. Requirements:
  - Use the **direct (non-pooled)** host for ingest/migrations, with no `-pooler` in the hostname.
  - For psql diagnostics you may use the same URL directly: `psql "$DATABASE_WRITE_URL" -c "..."`.
- **Local dev server**: usually `http://localhost:3002` (port 3000 is a different project on this machine — never purge port 3000)
- **Preview URL**: `https://inferencemax-app-git-feat-agentx-semianalysisai.vercel.app`
- **INVALIDATE_SECRET** lives in repo root `.env` under that key.
- **GitHub auth**: `gh auth token` for `gh` calls and the GITHUB_TOKEN env var.

## Standard ingest

```bash
cd /Users/quilicic/InferenceX-app/packages/db
DATABASE_WRITE_URL='<provided direct non-pooled write URL>' \
GITHUB_TOKEN=$(gh auth token) \
pnpm exec tsx src/ingest-ci-run.ts --download <RUN_ID> SemiAnalysisAI/InferenceX
```

Then refresh the materialized view (the script's auto-refresh sometimes races):
`REFRESH MATERIALIZED VIEW latest_benchmarks;`

## Cache purge (always do after any DB mutation)

```bash
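# -f2- keeps everything after the first '=', so a secret that itself contains '=' is not truncated
SECRET=$(grep "^INVALIDATE_SECRET" /Users/quilicic/InferenceX-app/.env | cut -d= -f2- | tr -d '"')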
# Localhost (port 3002, NOT 3000)
curl -s -X POST -H "Authorization: Bearer $SECRET" http://localhost:3002/api/v1/invalidate
# Preview
mkdir -p /tmp/vp && cd /tmp/vp \
&& vercel link --project inferencemax-app --scope semianalysisai --yes >/dev/null 2>&1 \
&& vercel curl /api/v1/invalidate \
--deployment https://inferencemax-app-git-feat-agentx-semianalysisai.vercel.app \
--yes -- -sS -X POST -H "Authorization: Bearer $SECRET"
rm -rf /tmp/vp
```

## Delete + reingest (use only when user explicitly says "delete and reingest" OR when the run supersedes prior data with the same (model, hw, framework, precision))

```sql
BEGIN;
DELETE FROM benchmark_results br USING configs c
WHERE c.id = br.config_id
AND c.model = '<model>' AND c.hardware = '<hw>' AND c.framework = '<framework>'
AND c.precision = '<prec>' AND br.benchmark_type = '<bt>';
DELETE FROM availability
WHERE model = '<model>' AND hardware = '<hw>' AND framework = '<framework>'
AND precision = '<prec>' AND benchmark_type = '<bt>';
COMMIT;
```

If the user says "replace ONLY the points this run produces", scope the DELETE to `AND br.conc IN (...)` so untouched conc levels survive. Don't do this unless asked.

## AIPerf tagging — DO NOT use by default

AIPerf is no longer a separate harness from the user's perspective. **Always** ingest with `spec_method='none'` (the standard path above), regardless of run name. Run names that include the word "aiperf" do NOT mean you should set `spec_decoding='aiperf'` — the user wants those runs to merge into the standard legend entry alongside other runs of the same (model, hw, framework, precision).

Only override this if the user **explicitly** asks for the run to appear as a separate legend line. If they do, the patching procedure is preserved below. Otherwise, use the standard ingest section above and do not touch `spec_decoding`.

<details>
<summary>Explicit-request-only: how to tag a run as `spec_decoding='aiperf'`</summary>

```bash
RID=<run_id>
TMPDIR=$(mktemp -d -t aiperf-$RID-XXXX)
cd $TMPDIR

# 1. Logical-name dedup + download
gh api "repos/SemiAnalysisAI/InferenceX/actions/runs/$RID/artifacts" --paginate \
--jq '.artifacts[] | "\(.name)\t\(.archive_download_url)\t\(.created_at)"' \
| python3 -c "
import sys, re, collections
seen = collections.OrderedDict()
for line in sys.stdin:
    name, url, created = line.rstrip('\n').split('\t')
    key = re.sub(r'_[a-zA-Z][a-zA-Z0-9.-]*_\d+$', '', name)
    if key not in seen or seen[key][2] < created:
        seen[key] = (name, url, created)
for _, (name, url, _) in seen.items():
    print(f'{name}\t{url}')
" > artifacts.tsv
while IFS=$'\t' read -r name url; do
  mkdir -p "$name"
  gh api "$url" > "$name/a.zip" 2>/dev/null
  unzip -oq "$name/a.zip" -d "$name" 2>/dev/null
  rm "$name/a.zip"
done < artifacts.tsv

# 2. Patch every benchmark JSON to set spec_decoding=aiperf
find $TMPDIR -name "*.json" | python3 -c "
import sys, json
for fn in (l.strip() for l in sys.stdin):
    try:
        with open(fn) as f: d = json.load(f)
    except Exception: continue
    rows = d if isinstance(d, list) else [d]
    if not rows or not isinstance(rows[0], dict): continue
    changed = False
    for row in rows:
        if isinstance(row, dict) and ('scenario_type' in row or 'infmax_model_prefix' in row or 'tput_per_gpu' in row):
            row['spec_decoding'] = 'aiperf'
            changed = True
    if changed:
        with open(fn, 'w') as f: json.dump(d if isinstance(d, list) else rows[0], f)
"

# 3. Ingest in CI mode (reads INGEST_* env vars)
cd /Users/quilicic/InferenceX-app/packages/db
INGEST_RUN_ID=$RID INGEST_RUN_ATTEMPT=1 INGEST_ARTIFACTS_PATH=$TMPDIR INGEST_REPO=SemiAnalysisAI/InferenceX \
DATABASE_WRITE_URL='<provided direct non-pooled write URL>' \
GITHUB_TOKEN=$(gh auth token) \
pnpm exec tsx src/ingest-ci-run.ts
rm -rf $TMPDIR
```

The `spec_method` column has a lowercase check constraint — always lowercase.

</details>

## Don't auto-mention "AIPerf" in changelog entries

Changelog descriptions used to include "AIPerf harness" wording. Don't add this anymore — the user considers AIPerf the standard harness now. A run named "e2e Test - kimi aiperf w/ live assistant" should become a changelog entry like `B200 Kimi Ingest #N (live assistant)`, not `... (AIPerf harness, live assistant)`.

## Adding a perf changelog entry

Run AFTER ingest. The popover filters by `config_keys[].split('-')[1] === selected_precision` and drops entries with empty `config_keys`, so you MUST provide at least one config_key in the format `<model>-<precision>-<hw>-<framework>` (matches what the user actually sees in the filter chain).

```sql
INSERT INTO changelog_entries (workflow_run_id, date, base_ref, head_ref, config_keys, description, pr_link)
SELECT id, date, '', '', ARRAY['<model>-<precision>-<hw>-<framework>'], '<description>', NULL
FROM latest_workflow_runs WHERE github_run_id = <RUN_ID>
RETURNING id, workflow_run_id, date::text, description;
```
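
The popover rule described above can be sketched in a few lines. This is only an illustration of the stated behaviour, not the frontend code: the function name `entry_visible`, the sample keys, and the choice that a single matching key is enough are assumptions.

```python
from typing import Iterable


def entry_visible(config_keys: Iterable[str], selected_precision: str) -> bool:
    """Return True if some config key has the selected precision as its second '-' segment.

    Entries with no config keys never match, so they are dropped.
    """
    for key in config_keys:
        parts = key.split("-")
        if len(parts) > 1 and parts[1] == selected_precision:
            return True
    return False


# Hypothetical keys in <model>-<precision>-<hw>-<framework> form.
print(entry_visible(["kimi-fp4-b200-vllm"], "fp4"))  # True
print(entry_visible(["kimi-fp4-b200-vllm"], "fp8"))  # False
print(entry_visible([], "fp4"))                      # False
```

Under this rule the precision has to sit in position 1 of the key, so a key whose model segment itself contained a hyphen would not match; check the key against what the filter chain shows before inserting.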

Description convention from prior entries: `<HW upper> <Model> Ingest #<N> (<note>)` — e.g.

- `B200 Kimi Ingest #1`
- `MI355X Kimi Ingest #2`
- `H200 Kimi Ingest #1 (mmap cache)`
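
As a small sketch of that convention, a description can be assembled from its parts. The helper name `changelog_description` and its parameters are hypothetical and exist only for illustration.

```python
from typing import Optional


def changelog_description(hw: str, model: str, n: int, note: Optional[str] = None) -> str:
    """Build '<HW upper> <Model> Ingest #<N> (<note>)', omitting the note when absent."""
    base = f"{hw.upper()} {model} Ingest #{n}"
    return f"{base} ({note})" if note else base


print(changelog_description("b200", "Kimi", 1))                # B200 Kimi Ingest #1
print(changelog_description("h200", "Kimi", 1, "mmap cache"))  # H200 Kimi Ingest #1 (mmap cache)
```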

If the user doesn't specify a description, ask for one OR derive it from the run name.

## Common gotchas

- **`conclusion IS NULL` filter**: availability hides runs whose `latest_workflow_runs.conclusion` is null (still in_progress). If a user wants in-progress data shown, you can `UPDATE workflow_runs SET conclusion='success', status='completed' WHERE id = <wr_id>` then `REFRESH MATERIALIZED VIEW latest_benchmarks`.
- **failed_run filter**: rows where `num_requests_successful === 0 AND num_requests_total > 0` get skipped on purpose — they have null metrics and would overwrite good rows via ON CONFLICT.
- **Aggregated `results_bmk` artifact** contains rows from all runner attempts merged together — pair the artifact-level logical-name dedup with the row-level failed-run skip to avoid empty-row overwrites.
- **Multi-attempt artifacts**: a single GitHub run can spill across runners (`h200-cw_00` + `h200-dgxc-slurm_1`); the logical-name dedup strips the `_<runner>_<attempt>` suffix.
- **Materialized view dedup tiebreaker**: `latest_benchmarks` picks rows by `date DESC, wr.run_started_at DESC`. Backfilling old data may not surface unless dates align with the user's date picker selection.
- **Date alignment for partial runs**: when a re-run only covers a subset of concs (`replace ONLY the points this run produces`), align its dates with the prior full sweep via `UPDATE benchmark_results SET date = '<full-sweep-date>' WHERE <rows from the partial run>` so the frontend's max-date-per-group dedup doesn't drop the older sweep.
- **Agentic interactivity normalization (`*_intvty`)**: for `agentic_traces` runs, interactivity MUST be the slow-tail reciprocal of the ITL percentile — `*_intvty = 1/*_itl` (so `p90_intvty = 1/p90_itl`). Some harness versions emit `*_intvty` as `p(1/ITL)` instead (fast-tail — inverts percentile order, e.g. p90 shows ~`1/p10(ITL)`), which silently contaminates cross-run Pareto comparisons. The ingest mapper (`benchmark-mapper.ts`) now **derives `*_intvty` from `*_itl` and discards the artifact's value** for agentic rows, so a normal ingest is self-correcting — no manual step needed. The frontend `agenticAliases` does the same for overlay / `?unofficialrun=` rows. If you ever load agentic data through a path that bypasses the mapper, run `pnpm --filter @semianalysisai/inferencex-db db:backfill-agentic-intvty --yes` (idempotent; rewrites `mean/p75/p90/p95 _intvty = 1/_itl`) then refresh the MV + purge cache. `std_intvty` is intentionally left alone (the reciprocal of a std is meaningless; the API strips it anyway).
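
The interactivity gotcha above is easier to see with numbers. The following is a minimal sketch, not the code in `benchmark-mapper.ts`: the function names, the sample latencies, the nearest-rank percentile, and the choice to emit `None` when an ITL is missing are all assumptions made for illustration. Only the field naming (`<stat>_itl`, `<stat>_intvty`) and the set of rewritten stats come from the text.

```python
import math
from typing import Dict, List, Optional

# Stats whose interactivity is rederived; std is deliberately left alone.
INTVTY_STATS = ("mean", "p75", "p90", "p95")


def derive_intvty(row: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Return a copy of `row` with <stat>_intvty replaced by 1 / <stat>_itl."""
    out = dict(row)
    for stat in INTVTY_STATS:
        itl = row.get(f"{stat}_itl")
        # The artifact's own value is discarded; a missing or non-positive ITL
        # yields None here (an assumption of this sketch).
        out[f"{stat}_intvty"] = 1.0 / itl if itl is not None and itl > 0 else None
    return out


def nearest_rank(values: List[float], q: int) -> float:
    """Nearest-rank percentile, used only to contrast the two conventions."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical inter-token latencies in seconds.
itl_samples = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]

slow_tail = 1.0 / nearest_rank(itl_samples, 90)               # 1 / p90(ITL)
fast_tail = nearest_rank([1.0 / x for x in itl_samples], 90)  # p90(1 / ITL)

print(f"slow-tail p90_intvty = {slow_tail:.2f}")
print(f"fast-tail p90_intvty = {fast_tail:.2f}")

row = {"p90_itl": 0.05, "p90_intvty": 123.0, "std_itl": 0.01, "std_intvty": 7.0}
fixed = derive_intvty(row)
print(fixed["p90_intvty"], fixed["std_intvty"])
```

The fast-tail figure comes out several times larger than the slow-tail one for the same samples, which is why mixing the two conventions distorts cross-run Pareto comparisons.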

## Process

1. **Always start by checking the run** with `gh api repos/SemiAnalysisAI/InferenceX/actions/runs/<RID> --jq '{name, status, conclusion}'`. Note the model/hw/precision from the name. If `status != "completed"`, ask the user if they want to ingest in-progress data (will likely have failed_run skips).
2. **Check the DB** for any pre-existing rows for this run or the same (model, hw, framework, precision) combo if the user mentioned superseding.
3. **Ingest** via the standard path. Do NOT use AIPerf tagging unless the user explicitly asks for a separate legend line.
4. **Refresh materialized view**.
5. **Add changelog entry** if the user asked or if the run is a "marker" worth surfacing.
6. **Purge both caches** (localhost 3002 + preview).
7. **Report** the row count, date, hardware, run id, and changelog id (if added).

## Related: ingesting agentic _datasets_ (not benchmark runs)

This agent ingests **benchmark runs**. The HF agentic trace **datasets** (`semianalysisai/cc-traces-weka-*`) that the agentic benchmark replays are ingested by a separate script, not this flow:

```bash
cd packages/db && DATABASE_WRITE_URL='<direct write url>' \
pnpm exec tsx src/ingest-weka-dataset.ts <hf-dataset-id> \
[--label "…"] [--variant full|256k] [--description "…"] [--limit N]
```

It populates the `datasets` + `dataset_conversations` tables (migration `007_agentic.sql`) that back the `/datasets` pages — upsert/replace per dataset, then purge the API cache like any other ingest. Same write-URL rule applies (direct, non-pooled, provided by the invoker).

New agentic benchmark artifacts preserve AIPerf's `metadata.dataset` provenance as a top-level `dataset` object. Standard benchmark ingest automatically derives the dataset slug from `dataset.hf_dataset_name` and upserts `run_datasets`; do not manually backfill that mapping for new-format runs. Manual mapping is only needed for legacy artifacts that do not contain dataset provenance.

## Don't

- Don't push to git unless the user asked.
- Don't ingest without permission if it's a delete+reingest of existing data.
- Don't hit port 3000 for cache purge — it's a different project.
- Don't capitalize `spec_method` values (DB has a lowercase check constraint).
3 changes: 3 additions & 0 deletions .eslintignore
@@ -0,0 +1,3 @@
# Stale agent worktrees produced by parallel Claude Code sessions — they
# hold their own branches and are linted as part of their own runs.
.claude/worktrees/
1 change: 1 addition & 0 deletions .oxlintrc.json
@@ -28,6 +28,7 @@
"no-undef": "off",
"no-underscore-dangle": "off",
"no-useless-undefined": "off",
"require-unicode-regexp": "off",
"no-warning-comments": "off",
"prefer-destructuring": "off",
"sort-imports": "off",
12 changes: 12 additions & 0 deletions docs/data-pipeline.md
@@ -62,6 +62,18 @@ Configs are preloaded into an in-memory Map at ingest start. `getOrCreateConfig(

Unmapped models/hardware are tracked (not silently dropped) so operators can see what new GPU or model names appeared in CI artifacts. This is how new GPUs get added to the system — the skip tracker acts as a change detection mechanism.

### Server-Metric Orchestrator Adapters

AIPerf defines the `server_metrics_export.json` envelope, but labels such as worker role and rank belong to the serving orchestrator. The chart-series ETL therefore normalizes raw series through an orchestrator-specific adapter before exposing per-worker metrics. For example, the Dynamo adapter maps `dynamo_component=prefill|backend` to canonical `prefill|decode` roles and uses the endpoint, worker ID, DP rank, and engine together as the source identity.

Adapters are selected from the benchmark's canonical framework, and per-worker series are only emitted for disaggregated configs with a recognized adapter. Unknown orchestrators and non-disaggregated configs retain their aggregate-only series; roles are never guessed from ports or metric names. The frontend only consumes the canonical source identity and never interprets orchestrator-native labels.
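
A rough sketch of the adapter selection described above, under stated assumptions: the `dynamo_component` label and its `prefill|backend` to `prefill|decode` mapping come from the text, while the registry key `"dynamo"`, the other label names (`endpoint`, `worker_id`, `dp_rank`, `engine`), and every function and type name are hypothetical.

```python
from typing import Callable, Dict, Mapping, NamedTuple, Optional


class SourceIdentity(NamedTuple):
    """Canonical identity the frontend consumes; field names are illustrative."""
    role: str
    endpoint: str
    worker_id: str
    dp_rank: str
    engine: str


DYNAMO_ROLES = {"prefill": "prefill", "backend": "decode"}


def dynamo_adapter(labels: Mapping[str, str]) -> Optional[SourceIdentity]:
    """Map Dynamo-native labels to a canonical identity, or None if the role is unknown."""
    role = DYNAMO_ROLES.get(labels.get("dynamo_component", ""))
    if role is None:
        return None  # never guess a role from ports or metric names
    return SourceIdentity(
        role,
        labels.get("endpoint", ""),
        labels.get("worker_id", ""),
        labels.get("dp_rank", ""),
        labels.get("engine", ""),
    )


Adapter = Callable[[Mapping[str, str]], Optional[SourceIdentity]]
ADAPTERS: Dict[str, Adapter] = {"dynamo": dynamo_adapter}  # keyed by canonical framework


def per_worker_source(
    framework: str, disaggregated: bool, labels: Mapping[str, str]
) -> Optional[SourceIdentity]:
    """Return a per-worker identity, or None when the series stays aggregate-only."""
    adapter = ADAPTERS.get(framework)
    if adapter is None or not disaggregated:
        return None
    return adapter(labels)


labels = {"dynamo_component": "backend", "endpoint": "generate", "worker_id": "w3",
          "dp_rank": "0", "engine": "0"}
print(per_worker_source("dynamo", True, labels))   # role is 'decode'
print(per_worker_source("unknown", True, labels))  # None: aggregate-only
```

Returning `None` rather than a best-effort role keeps unknown orchestrators and non-disaggregated configs on the aggregate-only path.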

### Agentic Dataset Provenance

AIPerf exports public-dataset provenance in `metadata.dataset`, including the Hugging Face dataset ID. InferenceX preserves that object as `dataset` on each agentic aggregate benchmark row. During benchmark ingest, `ingest-ci-run.ts` derives the dashboard slug from `hf_dataset_name` (for example, `semianalysisai/cc-traces-weka-062126` becomes `cc-traces-weka-062126`) and upserts `run_datasets` for the workflow run.

Legacy artifacts without provenance leave any existing mapping untouched. A workflow run can map to only one dataset; conflicting dataset IDs fail ingest rather than silently linking the run to an arbitrary dataset.
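
The slug derivation and the one-dataset-per-run rule can be sketched as follows. This is an assumption-laden illustration rather than the logic in `ingest-ci-run.ts`: the function names are hypothetical, and taking the text after the last `/` is inferred from the single example above.

```python
from typing import Any, Iterable, Mapping, Optional


def dataset_slug(hf_dataset_name: str) -> str:
    """Drop the Hugging Face owner prefix: 'owner/name' becomes 'name'."""
    return hf_dataset_name.rsplit("/", 1)[-1]


def resolve_run_dataset(rows: Iterable[Mapping[str, Any]]) -> Optional[str]:
    """Return the single dataset slug for a run, or None for legacy rows.

    Raises ValueError when rows of one run name different datasets.
    """
    names = set()
    for row in rows:
        dataset = row.get("dataset")
        if isinstance(dataset, dict) and dataset.get("hf_dataset_name"):
            names.add(dataset["hf_dataset_name"])
    if len(names) > 1:
        raise ValueError(f"conflicting dataset IDs for one run: {sorted(names)}")
    if not names:
        return None  # legacy artifact: leave any existing mapping untouched
    return dataset_slug(next(iter(names)))


rows = [{"dataset": {"hf_dataset_name": "semianalysisai/cc-traces-weka-062126"}}]
print(resolve_run_dataset(rows))        # cc-traces-weka-062126
print(resolve_run_dataset([{"x": 1}]))  # None
```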

## Frontend Transform Pipeline

### Why transformBenchmarkRows Exists