[FE / Feat] Add evaluators to existing evals by ardaerzin · Pull Request #4577 · Agenta-AI/agenta

ardaerzin · 2026-06-08T10:20:03Z

Summary

Testing

Verified locally

Added or updated tests

QA follow-up

Demo

Checklist

I have included a video or screen recording for UI changes, or marked Demo as N/A
Relevant tests pass locally
Relevant linting and formatting pass locally
I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

- Shared 'Edit evaluation' drawer (name/description + evaluators), opened from a run-header actions dropdown (all tabs), the config General 'Edit' button, and the evaluations-table row action; the config General section is now display-only.
- Jotai mutation flow (editSimpleEvaluation + process slice) with a terminal-gated background refresh, so the evaluations list and the run scenarios table converge reliably (columns, metric cells, status) after an edit.
- Resolve evaluator output metrics for staged (pending) evaluators in the drawer.
- Dark mode fixes: drawer edge shadow, entity-picker hover/selected highlight, and the cascader child-panel loading/loaded width jump.

dispatch_run_slice re-activates the run (status=RUNNING, is_active=True) before dispatching the worker, so the status indicator reflects the reprocess; _finalize_run_after_slice returns it to a terminal status when scoring completes. Adds an acceptance probe for the edit+process path.
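The run-status contract above can be sketched as a tiny state machine. This is an illustration under assumptions, not Agenta's code: `Run`, `dispatch_slice`, and `finalize_after_slice` are hypothetical stand-ins for the run record, `dispatch_run_slice`, and `_finalize_run_after_slice`, the status values are assumed, and the worker is called synchronously here, whereas the real one runs in the background.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Status(str, Enum):
    # Hypothetical status values, for illustration only.
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Run:
    id: str
    status: Status
    is_active: bool


def dispatch_slice(run: Run, worker: Callable[[Run], None]) -> None:
    # Re-activate before dispatch so anything reading the run while the
    # worker is busy sees RUNNING instead of the previous terminal status.
    run.status = Status.RUNNING
    run.is_active = True
    worker(run)


def finalize_after_slice(run: Run, failed: bool) -> None:
    # Once scoring completes, put the run back into a terminal status.
    run.status = Status.FAILURE if failed else Status.SUCCESS


# Usage: record what the worker observes, then finalize.
observed: List[Tuple[Status, bool]] = []
run = Run(id="run-1", status=Status.SUCCESS, is_active=False)
dispatch_slice(run, lambda r: observed.append((r.status, r.is_active)))
finalize_after_slice(run, failed=False)
```

The ordering is the point: the flip to RUNNING happens before the worker starts, so a UI poll that lands mid-reprocess never reports the stale terminal status.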

Link ids recovered from stored result cells on the re-run/process path arrive as dashed UUIDs (live spans send bare hex); both encode the same integer. Strip dashes before base-16 parsing so add_link no longer raises ValueError on the hyphens.
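A minimal sketch of why stripping dashes is enough: a dashed UUID is the same 32 hex digits with separators, so both forms parse to one integer, while a plain base-16 parse rejects the hyphens. The sample id is arbitrary, and `as_otel_id` mirrors the helper described here rather than quoting it.

```python
import uuid


def as_otel_id(value) -> int:
    # Dashes in UUID text are only separators, so removing them leaves the
    # same hex digits that the bare form carries.
    return int(str(value).replace("-", ""), 16)


bare = "4bf92f3577b34da6a3ce929d0e0e4736"  # arbitrary 128-bit sample
dashed = str(uuid.UUID(bare))  # "4bf92f35-77b3-4da6-a3ce-929d0e0e4736"

try:
    int(dashed, 16)  # a plain base-16 parse of the dashed form
    plain_parse_failed = False
except ValueError:
    plain_parse_failed = True
```

`uuid.UUID(bare).int` gives an independent cross-check that both spellings encode the same integer.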

vercel · 2026-06-08T10:20:10Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 12, 2026 6:14pm

coderabbitai · 2026-06-08T10:20:12Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b393bac2-48ba-4f65-a12e-756f3658ea24


Drops the exploratory acceptance probe added alongside the run-status change; it was a proof-of-contract probe, not a maintained test.

…-existing-eval

… slice

Mirrors the run-level re-activation at the scenario level so per-scenario status indicators also reflect the reprocess: dispatch_run_slice now bulk-sets the addressed scenarios to RUNNING/is_active before dispatch (the full-PUT edit preserves flags/interval/timestamp/meta), and the engine writes each scenario's terminal status back on completion.
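Because the scenario edit is a full PUT, any field left out of the payload would presumably be overwritten, which is why the stored flags, interval, timestamp, and meta are copied forward. A minimal sketch of that pattern, assuming a hypothetical frozen `Scenario` dataclass (not Agenta's model) and using `dataclasses.replace` the way the service code in this thread uses Pydantic's `model_copy(update=...)`:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Flags:
    is_active: bool = False
    is_closed: bool = False  # hypothetical flag that must survive the edit


@dataclass(frozen=True)
class Scenario:
    id: str
    status: str
    flags: Optional[Flags]
    interval: Optional[int]
    timestamp: Optional[str]
    meta: Optional[dict]


def reactivate(scenario: Scenario) -> Scenario:
    # Full-PUT payload: start from the stored record so every field we do not
    # mean to change is sent back unchanged, then override status and is_active.
    flags = (
        replace(scenario.flags, is_active=True)
        if scenario.flags
        else Flags(is_active=True)
    )
    return replace(scenario, status="running", flags=flags)


stored = Scenario("s1", "success", Flags(is_closed=True), 60, "2026-06-08T10:20:03Z", {"k": "v"})
edited = reactivate(stored)
```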

…ding an evaluator

The post-edit background refresh now (1) matches any query scoped to the run id (reload-equivalent; it covers the scenario rows+status query the old allowlist missed), (2) detects run completion authoritatively via the run batcher instead of getQueryData, and (3) invalidates twice (immediately and after a short settle) so cell results that persist just after the run status flips terminal aren't left frozen by the per-scenario poller.
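The move from an allowlist to "any query scoped to the run id" can be illustrated with a language-neutral predicate. This Python sketch assumes query keys are nested arrays/objects, as TanStack Query keys are (the mention of getQueryData suggests that library), and that a key counts as scoped to a run when the run id appears anywhere inside it; the key shapes and the allowlist are hypothetical.

```python
from typing import Any


def key_mentions(key: Any, run_id: str) -> bool:
    # Walk nested lists/tuples/dicts; True if run_id appears as any string leaf.
    if isinstance(key, str):
        return key == run_id
    if isinstance(key, dict):
        return any(key_mentions(value, run_id) for value in key.values())
    if isinstance(key, (list, tuple)):
        return any(key_mentions(part, run_id) for part in key)
    return False


# Hypothetical allowlist in the style of the old approach.
ALLOWLIST = {"evaluation-run", "run-metrics"}


def allowlist_match(key: Any, run_id: str) -> bool:
    return key[0] in ALLOWLIST and key_mentions(key, run_id)


# Hypothetical query keys held by the cache.
keys = [
    ("evaluation-run", "run-1"),
    ("scenario-rows", {"runId": "run-1", "page": 0}),
    ("scenario-rows", {"runId": "run-2", "page": 0}),
]

scoped = [k for k in keys if key_mentions(k, "run-1")]
old = [k for k in keys if allowlist_match(k, "run-1")]
```

The scenario-rows key for `run-1` is caught by the scoped predicate but not by the allowlist, which is the gap the commit describes; in TanStack Query such a test would typically sit in an `invalidateQueries` predicate.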

…-existing-eval

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (5)

web/packages/agenta-ui/src/components/selection/ListItem.tsx (1)

134-137: ⚡ Quick win

Condense this explanatory block to one terse “why” line.

This comment is longer than the repo standard and partially restates behavior; keep only a short invariant-focused rationale.

As per coding guidelines, “Keep AI-generated in-code comments minimal; comment only the non-obvious why … Prefer one terse line over multi-line blocks.”

Source: Coding guidelines

web/oss/src/components/EvalRunDetails/components/Page.tsx (1)

54-57: ⚡ Quick win

Shorten comment to focus on the non-obvious constraint.

The comment lists trigger locations (header dropdown, config buttons), which describes what uses the drawer rather than why this specific wiring is needed. Per coding guidelines, keep comments minimal and focus on non-obvious invariants.
♻️ Suggested revision
-    // Shared "Edit evaluation" drawer — opened by the header dropdown (all tabs), the
-    // config General Edit button, and the Add-evaluator button.
+    // Shared drawer; multiple triggers set this atom to open.
     const editDrawerRunId = useAtomValue(editEvaluationDrawerRunIdAtom)
     const setEditDrawerRunId = useSetAtom(editEvaluationDrawerRunIdAtom)

Source: Coding guidelines

web/oss/src/components/EvalRunDetails/components/RunActionsDropdown.tsx (1)

11-17: ⚡ Quick win

Condense comment to the essential invariant.

The block comment narrates what the component renders and where it lives, and it includes a speculative note about future extensibility. Per coding guidelines, keep comments minimal and avoid TODO-style prose about future features. The key constraint here is that the dropdown lives in the page header so it stays accessible across tabs.
♻️ Suggested revision
-/**
- * Actions dropdown rendered next to the run name in the run-details header. Lives in the
- * page header so "Edit evaluation" is reachable from every tab (Overview / Scenarios /
- * Configuration), mirroring the testset page's name-adjacent actions menu. Opens the
- * shared Edit drawer via `editEvaluationDrawerRunIdAtom`. Built as a menu so more run
- * actions (duplicate, delete, …) can slot in later.
- */
+// In header to keep Edit action reachable across all tabs.
 const RunActionsDropdown = ({runId}: {runId: string}) => {

Source: Coding guidelines

web/oss/src/components/EvalRunDetails/components/views/ConfigurationView/components/GeneralSection.tsx (1)

46-50: ⚡ Quick win

Remove or condense the refactor-explanation comment.

The comment documents the historical transition from inline editing to drawer-based editing, which is more suited for a PR description than inline code. Per coding guidelines, keep comments minimal and focus on non-obvious constraints in the current code.
♻️ Suggested revision
-/**
- * Read-only view of the run's metadata. Editing is no longer inline — it goes through the
- * shared "Edit evaluation" drawer (consistent with the header actions dropdown and the
- * Add-evaluator button), opened via the Edit trigger here.
- */
 const GeneralSection = ({runId, showActions = true, showHeader = true}: GeneralSectionProps) => {
If context is needed, a single line suffices:
+// Read-only; editing opens the shared drawer.
 const GeneralSection = ({runId, showActions = true, showHeader = true}: GeneralSectionProps) => {

Source: Coding guidelines

web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/constants.tsx (1)

69-73: ⚡ Quick win

Condense the in-code explanation to a single terse invariant.

The multi-line block is clear but too verbose for this codebase standard; keep just one short “why” line and avoid narrating behavior details.

As per coding guidelines, "Keep AI-generated in-code comments minimal; comment only the non-obvious why ... Prefer one terse line over multi-line blocks."

Source: Coding guidelines

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: fc59aeb5-7ce7-489d-a4f6-66e27b639cb4

📥 Commits

Reviewing files that changed from the base of the PR and between 0fa46dd and 2961145.

📒 Files selected for processing (19)

api/oss/src/core/evaluations/service.py
sdks/python/agenta/sdk/engines/tracing/processors.py
web/oss/src/components/EditEvaluationDrawer/index.tsx
web/oss/src/components/EvalRunDetails/atoms/mutations/editEvaluation.ts
web/oss/src/components/EvalRunDetails/atoms/table/evaluators.ts
web/oss/src/components/EvalRunDetails/components/Page.tsx
web/oss/src/components/EvalRunDetails/components/RunActionsDropdown.tsx
web/oss/src/components/EvalRunDetails/components/views/ConfigurationView/components/EvaluatorSection.tsx
web/oss/src/components/EvalRunDetails/components/views/ConfigurationView/components/GeneralSection.tsx
web/oss/src/components/EvalRunDetails/state/editDrawer.ts
web/oss/src/components/EvaluationRunsTablePOC/components/EvaluationRunsTable/index.tsx
web/oss/src/components/EvaluationRunsTablePOC/components/cells/ActionsCell/index.tsx
web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/constants.tsx
web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/index.tsx
web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/types.ts
web/oss/src/components/Layout/ThemeContextProvider.tsx
web/oss/src/services/evaluations/runShape/api.ts
web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
web/packages/agenta-ui/src/components/selection/ListItem.tsx

coderabbitai · 2026-06-12T10:49:59Z

+        scenarios = await self.evaluations_service.query_scenarios(
+            project_id=project_id,
+            scenario=EvaluationScenarioQuery(run_id=run_id, ids=scenario_ids),
+            windowing=Windowing(limit=10_000),
+        )
+        if scenarios:
+            await self.evaluations_service.edit_scenarios(
+                project_id=project_id,
+                user_id=user_id,
+                scenarios=[
+                    EvaluationScenarioEdit(
+                        id=scenario.id,
+                        flags=(
+                            scenario.flags.model_copy(update={"is_active": True})
+                            if scenario.flags
+                            else EvaluationRunFlags(is_active=True)
+                        ),
+                        status=EvaluationStatus.RUNNING,
+                        interval=scenario.interval,
+                        timestamp=scenario.timestamp,
+                        meta=scenario.meta,
+                    )
+                    for scenario in scenarios
+                ],
+            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Paginate the scenario re-activation query.

query_scenarios(..., windowing=Windowing(limit=10_000)) returns at most one page of 10,000 scenarios, so only that page is re-activated. On large runs, or when scenario_ids itself is longer than 10k, the rerun still targets all addressed scenarios via process_rerun(...), but only a subset gets flipped back to RUNNING/is_active=True. That leaves scenario-level status inconsistent during the reprocess.

Use fetch_scenarios(...) when scenario_ids is already known, and paginate when reactivating the full run slice.
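One way to act on this suggestion, sketched in Python: split a known id list into batches no larger than the page limit, and otherwise follow pages until the backend reports no next page, issuing one edit per batch. `fetch_page` and its cursor protocol are hypothetical stand-ins for the service query (the real `Windowing` fields are not shown in this thread), and `PAGE_LIMIT` mirrors the 10_000 in the snippet.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

PAGE_LIMIT = 10_000  # mirrors Windowing(limit=10_000)

# Hypothetical page fetcher: (cursor, limit) -> (items, next_cursor or None).
FetchPage = Callable[[Optional[str], int], Tuple[List[str], Optional[str]]]


def chunked(ids: Sequence[str], size: int = PAGE_LIMIT) -> Iterator[Sequence[str]]:
    # Known ids: split them into batches that each fit inside one page.
    for start in range(0, len(ids), size):
        yield ids[start : start + size]


def iter_all(fetch_page: FetchPage, limit: int = PAGE_LIMIT) -> Iterator[str]:
    # Unknown ids: keep following the cursor until no next page is reported.
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor, limit)
        yield from items
        if cursor is None:
            return


# Usage with an in-memory fake standing in for the scenarios query.
data = [f"s{i}" for i in range(25)]


def fake_fetch(cursor: Optional[str], limit: int) -> Tuple[List[str], Optional[str]]:
    start = int(cursor or 0)
    end = start + limit
    return data[start:end], (str(end) if end < len(data) else None)


all_ids = list(iter_all(fake_fetch, limit=10))
batch_sizes = [len(batch) for batch in chunked(data, size=10)]
```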

coderabbitai · 2026-06-12T10:49:59Z

+def _as_otel_id(value) -> int:
+    # Link ids arrive as bare hex (live spans) or as dashed UUIDs (recovered from
+    # stored result cells on the re-run/process path). Both encode the same
+    # integer, so strip dashes before parsing base-16.
+    return int(str(value).replace("-", ""), 16)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In OpenTelemetry Python, what are the valid numeric ranges for SpanContext.trace_id and SpanContext.span_id, and does SpanContext raise if span_id exceeds 64 bits?

💡 Result:

In OpenTelemetry Python, the valid numeric ranges for SpanContext follow the W3C Trace Context specification, which defines a TraceId as 16 bytes and a SpanId as 8 bytes [1][2]. For SpanContext.span_id, the valid range is 1 to 2^64 - 1 (0x1 to 0xffffffffffffffff) [3][4][5]. For SpanContext.trace_id, the valid range is 1 to 2^128 - 1 (0x1 to 0xffffffffffffffffffffffffffffffff) [3][4][5]. SpanContext does not raise an exception if the provided span_id (or trace_id) exceeds these limits [3][4][6]. Instead, the SpanContext object is created successfully, but its is_valid property is False [4][6][5]. The implementation checks these bounds during construction to determine validity [4][6], and its test cases confirm that values above the maximum, or less than or equal to zero, yield is_valid=False [5].

Citations:

1: https://opentelemetry.io/docs/specs/otel/trace/api/

2: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.43.0/specification/trace/api.md

3: Fix validity calculation for trace/span ID open-telemetry/opentelemetry-python#2145

4: https://opentelemetry-python.readthedocs.io/en/stable/_modules/opentelemetry/trace/span.html

5: https://github.com/open-telemetry/opentelemetry-python/blob/eed100c3/opentelemetry-api/tests/trace/test_span_context.py

6: https://opentelemetry-python.readthedocs.io/en/latest/_modules/opentelemetry/trace/span.html

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Show exact current code around the helper definition and its use sites
python - <<'PY'
import pathlib, itertools
path = pathlib.Path("sdks/python/agenta/sdk/engines/tracing/processors.py")
lines = path.read_text().splitlines()
for start, end in [(1,80),(100,180),(180,260)]:
    print(f"\n--- {path} lines {start}-{end} ---")
    for i in range(start-1, min(end, len(lines))):
        print(f"{i+1:4d}: {lines[i]}")
PY
# Find all usages of _as_otel_id in the repo (quick scope)
rg -n "_as_otel_id\(" sdks/python/agenta/sdk/engines/tracing/processors.py sdks/python/agenta/sdk -S || true
# Run ruff check on the touched file (no auto-fix)
ruff check sdks/python/agenta/sdk/engines/tracing/processors.py || true

Repository: Agenta-AI/agenta

Length of output: 10251

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect Link model/type hints
for f in sdks/python/agenta/sdk/models/shared.py sdks/python/agenta/sdk/models/tracing.py; do
  if [ -f "$f" ]; then
    echo "=== $f (relevant sections) ==="
    rg -n "class Link|Link\\b|trace_id|span_id" "$f" | head -n 80
    echo
  fi
done
# Find where context.links is built/populated
rg -n "context\\.links|links\\s*=\\s*|\"trace_id\"\\s*[:]" sdks/python/agenta/sdk | head -n 120

Repository: Agenta-AI/agenta

Length of output: 2461

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Print the relevant model definitions for TraceID/SpanID/Link and OTelLink
sed -n '1,220p' sdks/python/agenta/sdk/models/shared.py | cat -n | head -n 220
echo "----"
sed -n '130,260p' sdks/python/agenta/sdk/models/tracing.py | cat -n | head -n 160
# Inspect how tracing_ctx.links is assembled (formatting assumptions)
sed -n '250,470p' sdks/python/agenta/sdk/decorators/running.py | cat -n | sed -n '250,470p' || true
echo "----"
sed -n '300,520p' sdks/python/agenta/sdk/decorators/tracing.py | cat -n | sed -n '300,520p' || true
# Inspect the link creation in evaluation runtime processor (if used for re-run/process links)
sed -n '620,760p' sdks/python/agenta/sdk/evaluations/runtime/processor.py | cat -n | sed -n '620,760p' || true

Repository: Agenta-AI/agenta

Length of output: 10756

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Where tracing_ctx.links is assigned
rg -n "tracing_ctx\\.links\\s*=|self\\.links\\s*=|links=request\\.links|upstream\\.get\\(\"links\"\\)" sdks/python/agenta/sdk/decorators/running.py sdks/python/agenta/sdk/evaluations/runtime/processor.py sdks/python/agenta/sdk/decorators/tracing.py
# Show the relevant blocks around those assignments
sed -n '300,460p' sdks/python/agenta/sdk/decorators/running.py | cat -n
echo "----"
sed -n '380,520p' sdks/python/agenta/sdk/decorators/running.py | cat -n
echo "----"
sed -n '620,760p' sdks/python/agenta/sdk/evaluations/runtime/processor.py | cat -n
echo "----"
sed -n '680,740p' sdks/python/agenta/sdk/evaluations/runtime/processor.py | cat -n
echo "----"
sed -n '300,420p' sdks/python/agenta/sdk/decorators/tracing.py | cat -n

Repository: Agenta-AI/agenta

Length of output: 25262

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
import inspect
from opentelemetry.trace import SpanContext
from opentelemetry.sdk.trace import Span
print("SpanContext module:", SpanContext.__module__)
print("SpanContext qualname:", SpanContext.__qualname__)
print("\nSpanContext.__init__ source (truncated):")
src = inspect.getsource(SpanContext.__init__)
print("\n".join(src.splitlines()[:120]))
print("\n---")
print("Span.add_link module:", Span.add_link.__module__)
print("Span.add_link qualname:", Span.add_link.__qualname__)
print("\nSpan.add_link source (truncated):")
src2 = inspect.getsource(Span.add_link)
print("\n".join(src2.splitlines()[:160]))
PY

Repository: Agenta-AI/agenta

Length of output: 1078

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
import opentelemetry.trace.span as span_mod
import opentelemetry.sdk.trace as sdk_trace
import inspect
from opentelemetry.trace import SpanContext
from opentelemetry.sdk.trace import Span
print("span_mod file:", span_mod.__file__)
print("SpanContext:", SpanContext, "module:", SpanContext.__module__)
print("Span.add_link module:", Span.add_link.__module__, "qualname:", Span.add_link.__qualname__)
PY
# Show relevant portions of SpanContext implementation from installed opentelemetry
SPAN_MOD_FILE=$(python - <<'PY'
import opentelemetry.trace.span as span_mod
print(span_mod.__file__)
PY
)
echo "---- SpanContext implementation snippets from: $SPAN_MOD_FILE ----"
python - <<'PY'
import re, pathlib
path = pathlib.Path(open(__import__("opentelemetry.trace.span").trace.span.__file__).name)
PY

Repository: Agenta-AI/agenta

Length of output: 426

🏁 Script executed:

#!/bin/bash
set -euo pipefail
SPAN_PY="/usr/local/lib/python3.11/dist-packages/opentelemetry/trace/span.py"
echo "Using: $SPAN_PY"
ls -l "$SPAN_PY"
# Print SpanContext class definition area and ID validation helpers
python - <<PY
import pathlib, re
path = pathlib.Path("$SPAN_PY")
txt = path.read_text()
# Find SpanContext class start
m = re.search(r"class SpanContext\\b", txt)
print("SpanContext class start idx:", m.start() if m else None)
# Print first ~300 lines from SpanContext onwards (bounded by file size)
lines = txt.splitlines()
start = m and txt[:m.start()].count("\\n") or 0
for i in range(start, min(start+320, len(lines))):
    print(f"{i+1:5d}: {lines[i]}")
PY
echo "----"
# Extract Span.add_link implementation
python - <<'PY'
import pathlib, re
import opentelemetry.sdk.trace as sdk_trace
import opentelemetry.sdk.trace
from opentelemetry.sdk.trace import Span
import inspect
# We'll locate span.py inside sdk trace package
import opentelemetry.sdk.trace as st
import opentelemetry
import pkgutil
import os, sys
import opentelemetry.sdk.trace
import opentelemetry.sdk.trace as sdk_trace
# Locate file containing Span.add_link
import opentelemetry.sdk.trace as sdk_trace_mod
import inspect
span_file = None
for name in dir(sdk_trace_mod):
    pass
# Directly import the module where Span class is defined
# Span is in opentelemetry/sdk/trace/__init__.py which re-exports; find actual class
import opentelemetry.sdk.trace as sdk
span_cls = Span
print("Span class module:", span_cls.__module__)
import importlib
mod = importlib.import_module(span_cls.__module__)
print("Span class file:", mod.__file__)
PY
SDK_SPAN_FILE="$(python - <<'PY'
import importlib
from opentelemetry.sdk.trace import Span
mod = importlib.import_module(Span.__module__)
print(mod.__file__)
PY
)"
echo "Using: $SDK_SPAN_FILE"
# Grep around add_link in sdk span file
rg -n "def add_link\\b" "$SDK_SPAN_FILE" || true
if rg -n "def add_link\\b" "$SDK_SPAN_FILE" >/dev/null; then
  LINE=$(rg -n "def add_link\\b" "$SDK_SPAN_FILE" | head -n 1 | cut -d: -f1)
  START=$((LINE-30))
  END=$((LINE+180))
  sed -n "${START},${END}p" "$SDK_SPAN_FILE" | cat -n
fi

Repository: Agenta-AI/agenta

Length of output: 15538

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Find OpenTelemetry's link validity helper
SDK_INIT="/usr/local/lib/python3.11/dist-packages/opentelemetry/sdk/trace/__init__.py"
ls -l "$SDK_INIT"
rg -n "def _is_valid_link|_is_valid_link\\(" "$SDK_INIT" || true
if rg -n "def _is_valid_link" "$SDK_INIT" >/dev/null; then
  LINE=$(rg -n "def _is_valid_link" "$SDK_INIT" | head -n1 | cut -d: -f1)
  START=$((LINE-30)); END=$((LINE+120))
  sed -n "${START},${END}p" "$SDK_INIT" | cat -n
fi
# 2) Inspect TracingContext model/usage of `link` vs `links`
rg -n "class TracingContext|context\\.links|context\\.link\\b" sdks/python/agenta/sdk/contexts sdks/python/agenta/sdk -S | head -n 80
# 3) Read TracingContext definition
CTX_FILE=$(fd -a "tracing.py" sdks/python/agenta/sdk/contexts | head -n 5 | head -n 1 || true)
echo "CTX_FILE=$CTX_FILE"
if [ -n "$CTX_FILE" ] && [ -f "$CTX_FILE" ]; then
  sed -n '1,220p' "$CTX_FILE" | cat -n
fi
# 4) Check how `context.link` (singular) is converted into `context.links` (plural), if at all
rg -n "context\\.link\\b|\\.link\\b\\s*=" sdks/python/agenta/sdk | head -n 120

Repository: Agenta-AI/agenta

Length of output: 9323

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Find where runtime processor remembers context and how trace_id/span_id are sourced
rg -n "_remember_context\\(" sdks/python/agenta/sdk/evaluations/runtime/processor.py
# Inspect around those call sites
# (grab a generous range around each match)
python - <<'PY'
import re, pathlib
path = pathlib.Path("sdks/python/agenta/sdk/evaluations/runtime/processor.py")
lines = path.read_text().splitlines()
for i,l in enumerate(lines, start=1):
    if "_remember_context(" in l:
        start=max(1,i-30); end=min(len(lines), i+60)
        print(f"\n--- {path}:{start}-{end} around line {i} ---")
        for j in range(start,end+1):
            print(f"{j:5d}: {lines[j-1]}")
PY
echo "----"
# Locate WorkflowInvokeRequest definition to see links typing/shape
rg -n "class WorkflowInvokeRequest|WorkflowInvokeRequest\\b" sdks/python/agenta/sdk/decorators/running.py
# Show definition block
python - <<'PY'
import pathlib, re
path = pathlib.Path("sdks/python/agenta/sdk/decorators/running.py")
txt = path.read_text().splitlines()
for i,l in enumerate(txt, start=1):
    if "class WorkflowInvokeRequest" in l:
        start=i; end=min(len(txt), i+180)
        for j in range(start,end+1):
            print(f"{j:5d}: {txt[j-1]}")
        break
PY
echo "----"
# Show links field types in WorkflowInvokeRequest (if present)
rg -n "links\\s*:" sdks/python/agenta/sdk/decorators/running.py | head -n 80

Repository: Agenta-AI/agenta

Length of output: 8978

🏁 Script executed:

#!/bin/bash
set -euo pipefail
RUNNING="sdks/python/agenta/sdk/decorators/running.py"
# Print around WorkflowInvokeRequest definition including links typing
python - <<'PY'
import pathlib
path = pathlib.Path("sdks/python/agenta/sdk/decorators/running.py")
lines = path.read_text().splitlines()
for i,l in enumerate(lines, start=1):
    if "class WorkflowInvokeRequest" in l:
        start=i-10
        end=i+160
        for j in range(start,end+1):
            if 1 <= j <= len(lines):
                print(f"{j:5d}: {lines[j-1]}")
        break
PY
# Print Link import section / Link type alias in running.py
sed -n '1,120p' "$RUNNING" | cat -n | head -n 120
# Ensure we know what Link refers to: search for "from ... import Link" and "Link =" occurrences
rg -n "from .* import .*Link\\b|\\bLink\\b\\s*=\\s*|WorkflowInvokeRequest\\b" "$RUNNING" | head -n 80

Repository: Agenta-AI/agenta

Length of output: 5023

🏁 Script executed:

#!/bin/bash
set -euo pipefail
WORKF="sdks/python/agenta/sdk/models/workflows.py"
ls -l "$WORKF" 2>/dev/null || true
rg -n "class WorkflowInvokeRequest\\b|WorkflowInvokeRequest\\b" "$WORKF" | head -n 40
rg -n "\\bclass Link\\b|\\bLink\\b\\s*=" "$WORKF" | head -n 80
# Print around WorkflowInvokeRequest definition
python - <<'PY'
import pathlib
path = pathlib.Path("sdks/python/agenta/sdk/models/workflows.py")
lines = path.read_text().splitlines()
idx = None
for i,l in enumerate(lines):
    if "class WorkflowInvokeRequest" in l:
        idx = i+1
        break
if idx:
    start=max(1,idx-10); end=min(len(lines), idx+220)
    for j in range(start,end+1):
        print(f"{j:5d}: {lines[j-1]}")
else:
    print("WorkflowInvokeRequest not found")
PY
echo "----"
# Print around Link-related type imports/aliases in workflows.py
python - <<'PY'
import pathlib, re
path = pathlib.Path("sdks/python/agenta/sdk/models/workflows.py")
txt = path.read_text().splitlines()
for i,l in enumerate(txt, start=1):
    if "Link" in l and ("from " in l or "import" in l or "=" in l):
        if i<200:
            print(f"{i:5d}: {l}")
PY

Repository: Agenta-AI/agenta

Length of output: 258

🏁 Script executed:

#!/bin/bash
set -euo pipefail
WORKF="sdks/python/agenta/sdk/models/workflows.py"
# Show around WorkflowInvokeRequest
sed -n '220,330p' "$WORKF" | cat -n
echo "----"
# Search for Link usage/imports/aliases in this file
rg -n "\\bLink\\b|\\bOTelLink\\b|trace_id|span_id|links:" "$WORKF" | head -n 120
echo "----"
# Print full file if small? It's 22k lines? Actually 22,078 bytes; safe to show entire with cat -n maybe ~500 lines.
wc -l "$WORKF"
sed -n '1,220p' "$WORKF" | cat -n

Repository: Agenta-AI/agenta

Length of output: 11125

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Find all uses of TracingContext.link and how it becomes links
rg -n "context\\.link\\b|\\.link\\b\\s*=" sdks/python/agenta/sdk | head -n 120
echo "----"
# 2) Inspect where evaluations `execution.trace_id` / `execution.span_id` are created
rg -n "execution\\.trace_id|execution\\.span_id" sdks/python/agenta/sdk/evaluations/runtime/processor.py | head -n 80
echo "----"
# 3) Search for the type/model that defines `trace_id`/`span_id` on an `execution`
rg -n "class .*Execution|Execution\\b" sdks/python/agenta/sdk/evaluations/runtime/processor.py sdks/python/agenta/sdk/evaluations/runtime -S | head -n 80
echo "----"
# 4) Search for assignments to `trace_id=` near execution creation
rg -n "trace_id\\s*=" sdks/python/agenta/sdk/evaluations/runtime/processor.py | head -n 120
rg -n "span_id\\s*=" sdks/python/agenta/sdk/evaluations/runtime/processor.py | head -n 120

Repository: Agenta-AI/agenta

Length of output: 2004

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Inspect conversion between TracingContext.link -> TracingContext.links
sed -n '250,380p' sdks/python/agenta/sdk/decorators/tracing.py | cat -n
echo "----"
# 2) Inspect evaluation runtime models for types of trace_id/span_id on execution/results
sed -n '1,220p' sdks/python/agenta/sdk/evaluations/runtime/models.py | cat -n
echo "----"
# 3) Inspect runtime executor/execution creation around trace_id/span_id
rg -n "trace_id:|span_id:|trace_id=|span_id=" sdks/python/agenta/sdk/evaluations/runtime/executor.py | head -n 120
# Print a portion around first matches for trace_id/span_id assignments
python - <<'PY'
import pathlib, re
path = pathlib.Path("sdks/python/agenta/sdk/evaluations/runtime/executor.py")
lines = path.read_text().splitlines()
hits=[]
for i,l in enumerate(lines, start=1):
    if re.search(r"\btrace_id\b|\bspan_id\b", l) and ("=" in l or ":" in l):
        hits.append(i)
print("first hits:", hits[:20])
for i in hits[:3]:
    start=max(1,i-40); end=min(len(lines), i+60)
    print(f"\n--- executor.py:{start}-{end} around line {i} ---")
    for j in range(start,end+1):
        print(f"{j:5d}: {lines[j-1]}")
PY

Repository: Agenta-AI/agenta

Length of output: 19457

Fix OTel link ID parsing to respect trace_id(128) vs span_id(64) widths

_as_otel_id (lines 22-26) parses both trace_id and span_id with the same base-16 logic; if span_id arrives in a UUID/dashed-UUID form, the resulting integer can exceed the OpenTelemetry 64-bit span_id limit. OpenTelemetry SpanContext(...) won’t throw on out-of-range IDs (it marks is_valid=False), but Span.add_link(...) can still record the link because _is_valid_link treats the presence of attributes as sufficient. Since this code passes attributes=dict(key=str(key)) (lines 133-142), invalid link contexts may still be attached/exported with incorrect linkage.
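To make the width concern concrete, the sketch below reimplements the bound check that the OTel Python API applies when computing `SpanContext.is_valid` (0 < trace_id <= 2^128 - 1 and 0 < span_id <= 2^64 - 1). It does not import OpenTelemetry, so treat it as an approximation of the library rule; the sample ids are arbitrary.

```python
TRACE_ID_MAX = 2**128 - 1  # 16-byte trace id
SPAN_ID_MAX = 2**64 - 1    # 8-byte span id


def looks_valid(trace_id: int, span_id: int) -> bool:
    # Zero is the reserved invalid id; anything wider than the field is invalid too.
    return 0 < trace_id <= TRACE_ID_MAX and 0 < span_id <= SPAN_ID_MAX


def parse_hex(value) -> int:
    # Same dash-stripping parse as the helper under review.
    return int(str(value).replace("-", ""), 16)


trace = parse_hex("4bf92f35-77b3-4da6-a3ce-929d0e0e4736")
short_span = parse_hex("00f067aa0ba902b7")  # 16 hex digits = 64 bits
wide_span = parse_hex("a3ce929d-0e0e-4736-00f0-67aa0ba902b7")  # 32 hex digits

print(looks_valid(trace, short_span))  # True
print(looks_valid(trace, wide_span))   # False: span id exceeds 64 bits
```

A span id stored as a UUID whose upper 64 bits are zero would still pass; whether Agenta's stored span ids look like that is not stated in this thread.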

Suggested fix

-def _as_otel_id(value) -> int:
-    # Link ids arrive as bare hex (live spans) or as dashed UUIDs (recovered from
-    # stored result cells on the re-run/process path). Both encode the same
-    # integer, so strip dashes before parsing base-16.
-    return int(str(value).replace("-", ""), 16)
+def _as_otel_id(value: object, *, bits: int) -> Optional[int]:
+    cleaned = str(value).replace("-", "")
+    try:
+        parsed = int(cleaned, 16)
+    except (TypeError, ValueError):
+        return None
+    return parsed & ((1 << bits) - 1)

-        span.add_link(
-            context=SpanContext(
-                trace_id=_as_otel_id(link.get("trace_id")),
-                span_id=_as_otel_id(link.get("span_id")),
-                is_remote=True,
-            ),
-            attributes=dict(
-                key=str(key),
-            ),
-        )
+        parsed_trace_id = _as_otel_id(link.get("trace_id"), bits=128)
+        parsed_span_id = _as_otel_id(link.get("span_id"), bits=64)
+        if parsed_trace_id is None or parsed_span_id is None:
+            continue
+        span.add_link(
+            context=SpanContext(
+                trace_id=parsed_trace_id,
+                span_id=parsed_span_id,
+                is_remote=True,
+            ),
+            attributes=dict(key=str(key)),
+        )
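The suggested fix masks to the target width, which silently keeps only the low bits of an oversized value, so two different stored ids can collapse to the same span id. If rejection is preferred, a strict variant could return None instead. This is an alternative sketch, not what the review proposes; which behaviour is correct depends on how Agenta encodes span ids as UUIDs, which the thread does not state.

```python
from typing import Optional


def _parse(value: object) -> Optional[int]:
    # Strip UUID dashes and parse base-16; str(None) is "None", which fails here too.
    try:
        return int(str(value).replace("-", ""), 16)
    except ValueError:
        return None


def as_otel_id_strict(value: object, *, bits: int) -> Optional[int]:
    # Refuse zero, negative, or over-width values instead of masking them.
    parsed = _parse(value)
    if parsed is None or parsed <= 0 or parsed >= 1 << bits:
        return None
    return parsed


def as_otel_id_masked(value: object, *, bits: int) -> Optional[int]:
    # Behaviour of the suggestion above: keep only the low `bits` bits.
    parsed = _parse(value)
    return None if parsed is None else parsed & ((1 << bits) - 1)


padded = "00000000-0000-0000-00f0-67aa0ba902b7"  # 64-bit value in UUID form
wide = "a3ce929d-0e0e-4736-00f0-67aa0ba902b7"    # genuinely 128-bit value
```

Here `as_otel_id_masked` maps both `padded` and `wide` to the same span id, while `as_otel_id_strict` accepts `padded` and drops `wide`.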

coderabbitai · 2026-06-12T10:49:59Z

+    const disabledRevisionIds = useMemo(
+        () =>
+            new Set<string>([
+                ...(connected.map((evaluator) => evaluator.id).filter(Boolean) as string[]),
+                ...pending.map((selection) => selection.id),
+            ]),
+        [connected, pending],
+    )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how disabledChildIds is consumed in PopoverCascaderVariant and what IDs the cascader items use
ast-grep --pattern 'disabledChildIds'
rg -n -C5 'disabledChildIds' --type=tsx --type=ts

Repository: Agenta-AI/agenta

Length of output: 1308

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Inspect how disabledChildIds is checked (what "childId" is)
sed -n '520,700p' web/packages/agenta-entity-ui/src/selection/hooks/modes/useTreeSelectMode.ts
echo '---'
sed -n '330,420p' web/packages/agenta-entity-ui/src/selection/hooks/modes/useListPopoverMode.ts
echo '---'
# 2) Inspect where disabledChildIds is threaded into picker UI
sed -n '600,720p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
echo '---'
sed -n '180,280p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/ListPopoverVariant.tsx
echo '---'
sed -n '70,160p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/shared/ChildPopoverContent.tsx
echo '---'
# 3) Inspect EditEvaluationDrawer to confirm exact types of connected/pending ids
sed -n '1,140p' web/oss/src/components/EditEvaluationDrawer/index.tsx
echo '---'
sed -n '140,340p' web/oss/src/components/EditEvaluationDrawer/index.tsx
echo '---'
# 4) Inspect evaluators.ts where definition.id is derived (as referenced in the review)
# Use fd to find the relevant file.
FILE=$(fd -a 'evaluators.ts' web/oss web/packages | head -n 1 || true)
echo "Found evaluators.ts at: $FILE"
if [[ -n "${FILE}" ]]; then
  sed -n '130,220p' "$FILE"
fi

Repository: Agenta-AI/agenta

Length of output: 35346

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Find adapters
rg -n "useEnrichedEvaluatorOnlyAdapter" web/packages web/oss -S
rg -n "useEnrichedHumanEvaluatorAdapter" web/packages web/oss -S
# Inspect adapters implementations
# (Use fd to locate files containing these hooks)
FILES=$(rg -l "useEnrichedEvaluatorOnlyAdapter" web/packages web/oss -S || true)
for f in $FILES; do echo "---- $f"; sed -n '1,220p' "$f"; done
FILES=$(rg -l "useEnrichedHumanEvaluatorAdapter" web/packages web/oss -S || true)
for f in $FILES; do echo "---- $f"; sed -n '1,220p' "$f"; sed -n '1,220p' "$f"; done
# Inspect PopoverCascaderVariant hierarchy levels / getId usage for child nodes
rg -n "popover-cascader" web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
sed -n '1,260p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
sed -n '260,520p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
# Also inspect shared hierarchy config building (if present)
rg -n "getId\\(.*childLevelConfig" web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx -S || true
rg -n "hierarchyLevels" web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx -S || true
# Locate where TreeSelectNode gets created from childLevelConfig.getId and how getId is wired
rg -n "childLevelConfig\\.getId" web/packages/agenta-entity-ui/src/selection -S

Repository: Agenta-AI/agenta

Length of output: 50375

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Locate the workflow revision adapter implementation
f=$(fd -a "workflowRevisionRelationAdapter.ts" web/packages/agenta-entity-ui/src/selection/adapters web/packages web/oss | head -n 1 || true)
echo "workflowRevisionRelationAdapter.ts: ${f:-NOT_FOUND}"
# If found, inspect parts where hierarchy levels + getId are defined
if [[ -n "${f:-}" ]]; then
  rg -n "createWorkflowRevisionAdapter|createEvaluatorRevisionLevel|getId\\s*\\(" "$f" --context 2
  # Show candidate blocks around createWorkflowRevisionAdapter definition
  rg -n "function createWorkflowRevisionAdapter|export function createWorkflowRevisionAdapter" "$f"
  # Show nearby section around revision level config
  rg -n "revision.*getId|getId:.*revision|getId\\s*\\(.*revision" "$f" --context 3
  # Print a window around the most relevant match
  m=$(rg -n "getId\\s*[:=].*revision|getId\\s*\\(.*revision|createEvaluatorRevisionLevel|getEvaluator" "$f" | head -n 1 | cut -d: -f1 || true)
  if [[ -n "${m:-}" ]]; then
    start=$((m-60)); end=$((m+120))
    sed -n "${start},${end}p" "$f"
  fi
fi
# Also inspect evaluator revision level factory which likely defines leaf IDs
ef=$(fd -a "createEvaluatorRevisionLevel" web/packages/agenta-entity-ui/src/selection/adapters web/packages/agenta-entity-ui/src/selection -t ts -t tsx | head -n 20 || true)
echo "createEvaluatorRevisionLevel matches: ${ef}"
for g in $(rg -l "createEvaluatorRevisionLevel" web/packages/agenta-entity-ui/src/selection/adapters web/packages/agenta-entity-ui/src/selection | head -n 20); do
  echo "---- $g"
  rg -n "createEvaluatorRevisionLevel" "$g" --context 3
done
# Finally, inspect how WorkflowRevisionSelectionResult.id is constructed
rr=$(rg -l "WorkflowRevisionSelectionResult" web/packages/agenta-entity-ui/src/selection/adapters | head -n 20 || true)
for h in $rr; do
  echo "---- $h"
  rg -n "id\\s*:" "$h" --context 2 | head -n 20
done

Repository: Agenta-AI/agenta

Length of output: 15162

🏁 Script executed:

#!/bin/bash
set -euo pipefail
f="web/packages/agenta-entity-ui/src/selection/adapters/revisionLevelFactory.ts"
echo "---- $f (relevant sections) ----"
# Find createRevisionLevel and show its level config, especially getId/getLabel
rg -n "function createRevisionLevel|export function createRevisionLevel|createRevisionLevel\\(" "$f"
# show around createRevisionLevel definition
line=$(rg -n "function createRevisionLevel|export function createRevisionLevel" "$f" | head -n 1 | cut -d: -f1)
start=$((line-60)); end=$((line+220))
sed -n "${start},${end}p" "$f"
# show around createEvaluatorRevisionLevel
rg -n "createEvaluatorRevisionLevel" "$f"
line2=$(rg -n "export function createEvaluatorRevisionLevel" "$f" | head -n 1 | cut -d: -f1)
start2=$((line2-40)); end2=$((line2+140))
sed -n "${start2},${end2}p" "$f"
# Also inspect where workflowRevisionLevel is constructed if it exists in this file
rg -n "workflowRevisionLevel|revisionLevel" "$f"

Repository: Agenta-AI/agenta

Length of output: 8474

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Find where workflowRevisionsListQueryStateAtomFamily is defined (entities/workflow)
rg -n "workflowRevisionsListQueryStateAtomFamily" web/packages web/oss -S --hidden
# 2) Inspect the definition and, crucially, what the returned "revision" entity shape is
f=$(rg -l "workflowRevisionsListQueryStateAtomFamily" web/packages web/oss -S --hidden | head -n 1 || true)
echo "First match: $f"
if [[ -n "${f:-}" ]]; then
  sed -n '1,260p' "$f" | head -n 200
  # narrow to the atom/query logic if possible
  rg -n "atomFamily|family|return|workflowRevisionsListQueryStateAtomFamily" "$f" --context 2 | head -n 80
fi
# 3) Confirm what workflowToRevisionRelation uses for listing (skipVariantLevel path)
a=$(fd -a "workflowRevisionRelationAdapter.ts" web/packages/agenta-entity-ui/src/selection/adapters web/packages web/oss | head -n 5 || true)
echo "workflowRevisionRelationAdapter candidates:"
echo "$a"
for g in $a; do
  echo "---- $g: workflowToRevisionRelation definition ----"
  rg -n "workflowToRevisionRelation" "$g" --context 2 | head -n 40
  # show nearby block around workflowToRevisionRelation
  line=$(rg -n "workflowToRevisionRelation" "$g" | head -n 1 | cut -d: -f1)
  start=$((line-80)); end=$((line+120))
  sed -n "${start},${end}p" "$g"
done
# 4) Inspect evaluator refs mapping in evaluators.ts for artifactId vs revisionId meaning
ev="web/oss/src/components/EvalRunDetails/atoms/table/evaluators.ts"
if [[ -f "$ev" ]]; then
  rg -n "artifactId|revisionId" "$ev" --context 3
fi

Repository: Agenta-AI/agenta

Length of output: 11188

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Inspect workflowToRevisionRelation mapping
rel="web/packages/agenta-entities/src/workflow/relations.ts"
echo "---- $rel (workflowToRevisionRelation) ----"
rg -n "workflowToRevisionRelation" "$rel"
line=$(rg -n "workflowToRevisionRelation" "$rel" | head -n 1 | cut -d: -f1)
start=$((line-120)); end=$((line+220))
sed -n "${start},${end}p" "$rel" | head -n 220
# 2) Inspect workflowRevisionsListQueryStateAtomFamily implementation (store.ts)
store="web/packages/agenta-entities/src/workflow/state/store.ts"
echo "---- $store (workflowRevisionsListQueryStateAtomFamily) ----"
rg -n "workflowRevisionsListQueryStateAtomFamily" "$store"
line2=$(rg -n "export const workflowRevisionsListQueryStateAtomFamily" "$store" | head -n 1 | cut -d: -f1)
start2=$((line2-60)); end2=$((line2+240))
sed -n "${start2},${end2}p" "$store" | head -n 260
# 3) Inspect evaluator refs meaning (artifactId vs revisionId) in evaluator extraction core (if present)
core="web/packages/agenta-entities/src/workflow/core/index.ts"
if [[ -f "$core" ]]; then
  echo "---- $core (extractEvaluatorRef / type defs) ----"
  rg -n "artifactId|revisionId" "$core" | head -n 50
else
  echo "Core index not found at $core; skipping."
fi
# Find extractEvaluatorRef definition if separate file
ex=$(fd -a "extractEvaluatorRef" web/packages/agenta-entities/src/workflow | head -n 5 || true)
for f in $ex; do echo "---- $f ----"; rg -n "artifactId|revisionId" "$f" --context 2; done

Repository: Agenta-AI/agenta

Length of output: 2224

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Helpers
safe_start_end () {
  # $1=line $2=back $3=forward
  local line="$1" back="$2" fwd="$3"
  local start=$((line-back))
  local end=$((line+fwd))
  if (( start < 1 )); then start=1; fi
  echo "$start $end"
}
# 1) workflowToRevisionRelation actual mapping
rel="web/packages/agenta-entities/src/workflow/relations.ts"
line=$(rg -n "export const workflowToRevisionRelation" "$rel" | head -n 1 | cut -d: -f1)
read start end < <(safe_start_end "$line" 80 220)
echo "---- $rel (lines $start-$end) ----"
sed -n "${start},${end}p" "$rel"
# 2) workflowRevisionsListQueryStateAtomFamily: what does "revision entity.id" represent?
store="web/packages/agenta-entities/src/workflow/state/store.ts"
line2=$(rg -n "export const workflowRevisionsListQueryStateAtomFamily" "$store" | head -n 1 | cut -d: -f1)
read start2 end2 < <(safe_start_end "$line2" 40 260)
echo "---- $store (lines $start2-$end2) ----"
sed -n "${start2},${end2}p" "$store"
# 3) Find where revision list items are shaped (mapping to entity fields)
# Search for extractRevisionFields usage in workflow state store
rg -n "extractRevisionFields|workflowRevision|Revision" "$store" | head -n 80
# 4) In case the store only calls query, inspect the query function that returns revisions
# Find "queryWorkflowRevisions" implementation
qf=$(fd -a "queryWorkflowRevisions" web/packages/agenta-entities/src/workflow | head -n 5 || true)
echo "queryWorkflowRevisions candidates:"
echo "$qf"
if [[ -n "${qf:-}" ]]; then
  for f in $qf; do
    echo "---- $f (context) ----"
    rg -n "queryWorkflowRevisions|queryWorkflowRevisionsByWorkflow|queryWorkflowRevisionsByWorkflows" "$f"
    # print around first match
    l=$(rg -n "queryWorkflowRevisionsByWorkflow|queryWorkflowRevisions" "$f" | head -n 1 | cut -d: -f1)
    read s e < <(safe_start_end "$l" 60 220)
    sed -n "${s},${e}p" "$f"
  done
fi

Repository: Agenta-AI/agenta

Length of output: 118

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rel="web/packages/agenta-entities/src/workflow/relations.ts"
line=$(rg -n "export const workflowToRevisionRelation" "$rel" | head -n 1 | cut -d: -f1)
start=$((line-80)); if (( start<1 )); then start=1; fi
end=$((line+220))
echo "---- $rel (lines $start-$end) ----"
sed -n "${start},${end}p" "$rel"
store="web/packages/agenta-entities/src/workflow/state/store.ts"
line2=$(rg -n "export const workflowRevisionsListQueryStateAtomFamily" "$store" | head -n 1 | cut -d: -f1)
start2=$((line2-40)); if (( start2<1 )); then start2=1; fi
end2=$((line2+240))
echo "---- $store (lines $start2-$end2) ----"
sed -n "${start2},${end2}p" "$store"
# Inspect how revision entities are shaped from API results (id/version/etc)
# Look for extractRevisionFields usage in workflow store + helpers
rg -n "extractRevisionFields\\(|getField<.*>\\(entity" web/packages/agenta-entities/src/workflow -S

Repository: Agenta-AI/agenta

Length of output: 19206

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Find where workflowRevisionsByWorkflowQueryAtomFamily is defined and inspect how refs.id is built
store="web/packages/agenta-entities/src/workflow/state/store.ts"
qatom="$(rg -n "workflowRevisionsByWorkflowQueryAtomFamily" "$store" -S || true)"
echo "Matches for workflowRevisionsByWorkflowQueryAtomFamily:"
echo "$qatom" | head -n 20
f="$(rg -l "workflowRevisionsByWorkflowQueryAtomFamily" web/packages/agenta-entities/src/workflow/state | head -n 1 || true)"
echo "First defining file: $f"
if [[ -n "${f:-}" ]]; then
  # Print around the atomFamily definition
  line=$(rg -n "export const workflowRevisionsByWorkflowQueryAtomFamily" "$f" | head -n 1 | cut -d: -f1 || true)
  if [[ -n "${line:-}" ]]; then
    start=$((line-60)); end=$((line+220))
    sed -n "${start},${end}p" "$f"
  fi
  # Also inspect the underlying API function mapping if called directly
  api="$(rg -n "queryWorkflowRevisionsByWorkflow|queryWorkflowRevisions\\(" "$f" -S || true)"
  echo "Underlying query references in $f:"
  echo "$api" | head -n 50
fi
# 2) Inspect WorkflowReference / QueryResult types (to see if refs.id is revision-id)
types_dir="web/packages/agenta-entities/src/workflow"
typefile="$(rg -l "export type WorkflowReference" "$types_dir" web/packages/agenta-entities/src/workflow -S | head -n 20 || true)"
echo "WorkflowReference files:"
echo "$typefile"
for t in $typefile; do
  echo "---- $t ----"
  rg -n "export type WorkflowReference|type WorkflowReference" "$t" --context 3
done
# 3) Inspect extractEvaluatorRef types/meaning of artifactId vs revisionId (likely in core)
core_dir="web/packages/agenta-entities/src/workflow/core"
core_file="$(fd -a "core" "$core_dir" 2>/dev/null || true)"
# just search for artifactId/revisionId fields
rg -n "artifactId|revisionId" "$core_dir" -S | head -n 80

Repository: Agenta-AI/agenta

Length of output: 13181

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Inspect EvaluatorDefinition type (does it expose revisionId separately?)
evdef_file="$(rg -l "export type EvaluatorDefinition" web/packages/agenta-entities/src/workflow web/oss/src -S | head -n 1 || true)"
echo "EvaluatorDefinition file: ${evdef_file:-NOT_FOUND}"
if [[ -n "${evdef_file:-}" ]]; then
  rg -n "export type EvaluatorDefinition|type EvaluatorDefinition" "$evdef_file" --context 6
  # show a larger window around it
  line=$(rg -n "export type EvaluatorDefinition|type EvaluatorDefinition" "$evdef_file" | head -n 1 | cut -d: -f1)
  start=$((line-40)); end=$((line+120))
  sed -n "${start},${end}p" "$evdef_file"
fi
# 2) Inspect evaluatorDefinitionByRevisionQueryAtomFamily usage/definition
atom_file="$(rg -l "evaluatorDefinitionByRevisionQueryAtomFamily" web/oss/src -S | head -n 1 || true)"
echo "evaluatorDefinitionByRevisionQueryAtomFamily file: ${atom_file:-NOT_FOUND}"
if [[ -n "${atom_file:-}" ]]; then
  line=$(rg -n "evaluatorDefinitionByRevisionQueryAtomFamily" "$atom_file" | head -n 1 | cut -d: -f1)
  start=$((line-80)); end=$((line+200))
  sed -n "${start},${end}p" "$atom_file"
fi
# 3) Inspect EvaluatorDefinition construction: toEvaluatorDefinitionFromRaw / evaluatorResolution
core_dir="web/packages/agenta-entities/src/workflow/core"
for f in $(rg -l "toEvaluatorDefinitionFromRaw|extractEvaluatorRef" "$core_dir" -S | head -n 20); do
  echo "---- $f ----"
  rg -n "toEvaluatorDefinitionFromRaw|extractEvaluatorRef|artifactId|revisionId" "$f" --context 3
done

Repository: Agenta-AI/agenta

Length of output: 2052

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Find EvaluatorDefinition type
rg -n "EvaluatorDefinition" web/packages/agenta-entities/src/workflow -S --hidden | head -n 50
rg -n "export type EvaluatorDefinition|type EvaluatorDefinition|interface EvaluatorDefinition" web/packages/agenta-entities/src/workflow -S --hidden | head -n 50
# 2) Inspect evaluatorDefinitionByRevisionQueryAtomFamily implementation (fix snippet around definition)
file="web/oss/src/components/EvalRunDetails/atoms/table/evaluators.ts"
echo "---- $file (definition + key id handling) ----"
rg -n "evaluatorDefinitionByRevisionQueryAtomFamily" "$file" -S
line=$(rg -n "evaluatorDefinitionByRevisionQueryAtomFamily" "$file" | head -n 1 | cut -d: -f1)
start=$((line>120?line-120:1))
end=$((line+220))
sed -n "${start},${end}p" "$file"
# 3) Inspect queryWorkflowRevisionsByWorkflow to see what r.id represents
core_dir="web/packages/agenta-entities/src/workflow"
rg -n "queryWorkflowRevisionsByWorkflow" "$core_dir" -S --hidden
qfile=$(rg -l "queryWorkflowRevisionsByWorkflow" "$core_dir" --hidden | head -n 1 || true)
echo "First query file: ${qfile:-NOT_FOUND}"
if [[ -n "${qfile:-}" ]]; then
  qline=$(rg -n "queryWorkflowRevisionsByWorkflow" "$qfile" | head -n 1 | cut -d: -f1)
  start=$((qline>140?qline-140:1))
  end=$((qline+260))
  sed -n "${start},${end}p" "$qfile"
fi
# 4) Inspect the API response shape for workflow revisions listing (look for id/artifact_id fields)
# This helps determine whether response.workflow_revisions[].id is revision id or artifact id.
rg -n "workflow_revisions" "$core_dir" -S --hidden | head -n 50
rg -n "artifact.*id|revision.*id|artifact_id|revision_id" "$core_dir" -S --hidden | head -n 50

Repository: Agenta-AI/agenta

Length of output: 29036

🏁 Script executed:

#!/bin/bash
set -euo pipefail
f="web/packages/agenta-entities/src/workflow/core/evaluatorResolution.ts"
echo "---- $f (EvaluatorDefinition + extractEvaluatorRef) ----"
# show around EvaluatorDefinition interface and extractEvaluatorRef
rg -n "export interface EvaluatorDefinition|export const extractEvaluatorRef|artifactId\\?|revisionId\\?" "$f"
line=$(rg -n "export interface EvaluatorDefinition" "$f" | head -n 1 | cut -d: -f1)
start=$((line-40)); end=$((line+120))
sed -n "${start},${end}p" "$f"
line2=$(rg -n "export const extractEvaluatorRef" "$f" | head -n 1 | cut -d: -f1)
start2=$((line2-40)); end2=$((line2+140))
sed -n "${start2},${end2}p" "$f"
line3=$(rg -n "toEvaluatorDefinitionFromWorkflow|toEvaluatorDefinitionFromRaw" "$f" | head -n 1 | cut -d: -f1)
start3=$((line3-40)); end3=$((line3+200))
sed -n "${start3},${end3}p" "$f"

Repository: Agenta-AI/agenta

Length of output: 15707

Fix disabledChildIds ID mismatch in EditEvaluationDrawer picker

disabledRevisionIds (web/oss/src/components/EditEvaluationDrawer/index.tsx, lines 80-87) mixes two kinds of ID. For connected evaluators it uses EvaluatorDefinition.id, which is set to ref.artifactId ?? ref.revisionId ?? refId, so the artifact id takes precedence. The popover-cascader picker, however, disables a child only when disabledChildIds.has(childLevelConfig.getId(child)) is true, and at the workflow→revision level getId returns the revision entity's id, i.e. a revision id. As a result, evaluators that are already connected may not be disabled in the picker.

Derive the connected entries of disabledRevisionIds from revision IDs, or adjust the evaluationEvaluatorsByRunQueryAtomFamily/EvaluatorDefinition shaping so that revisionId is preserved alongside artifactId.
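Below is a minimal sketch of the first option. It assumes the connected evaluator refs keep both artifactId and revisionId instead of collapsing them into a single id. ConnectedEvaluatorRef and disabledRevisionIdsFor are hypothetical names and do not come from the drawer. Pending ids are already revision ids, because the drawer stages evaluator revision ids for addition. Only the connected side therefore needs care:

```typescript
// Hypothetical shape: the evaluator ref keeps both ids instead of collapsing
// them into one `id` where artifactId wins.
interface ConnectedEvaluatorRef {
    artifactId?: string
    revisionId?: string
}

// Hypothetical helper building the set passed as disabledChildIds. The picker
// compares against revision ids, so artifact ids are never used as keys.
function disabledRevisionIdsFor(
    connected: ConnectedEvaluatorRef[],
    pendingRevisionIds: string[],
): Set<string> {
    const ids = new Set<string>()
    for (const ref of connected) {
        // A ref without a revisionId is skipped: its artifact id would never match a child.
        if (ref.revisionId) ids.add(ref.revisionId)
    }
    // Staged (pending) ids are already revision ids.
    for (const id of pendingRevisionIds) ids.add(id)
    return ids
}
```

Skipping refs that lack a revisionId is deliberate. Falling back to the artifact id would add a key that the picker never checks, which is the same silent miss described in this finding.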

coderabbitai · 2026-06-12T10:49:59Z

+const BRIDGE_ATTEMPTS = 15
+const BRIDGE_INTERVAL_MS = 2000
+
+/**
+ * Bridge the async reprocess after adding an evaluator.
+ *
+ * `dispatch_run_slice` flips the run to RUNNING *synchronously* before the 202 returns
+ * (api/.../service.py), then the worker fills the new evaluator's cells and finalizes back
+ * to a terminal status (~seconds). So by the time this runs the run is already RUNNING;
+ * we poll every surface until the run reads TERMINAL again — which means the worker is done
+ * and every new cell is persisted — then run a FINAL invalidation.
+ *
+ * The final terminal-gated invalidation is the key to a reliable scenario table. The
+ * per-scenario `scenario-steps` queries only poll while non-terminal (scenarioSteps.ts
+ * `refetchInterval`), so when finalize lands between their 5s polls, only the rows that
+ * happened to refetch afterwards show new data — a half-updated table. Invalidating once
+ * the run is terminal converges every mounted row and marks off-screen/virtualized rows
+ * stale for their next mount. Fire-and-forget: the drawer closes immediately.
+ */
+const SETTLE_MS = 1500
+
+const bridgeRunReprocessing = async (queryClient: any, projectId: string, runId: string) => {
+    for (let attempt = 0; attempt < BRIDGE_ATTEMPTS; attempt++) {
+        await delay(BRIDGE_INTERVAL_MS)
+        const status = await readRunStatusAuthoritative(projectId, runId)
+        if (status && isTerminalStatus(status)) {
+            // Worker finished. Cell results can persist a beat after the run status flips
+            // terminal, so invalidate now AND once more after a short settle. Each call
+            // refetches active scenario rows/steps/metrics and marks off-screen ones stale —
+            // reload-equivalent, so nothing is left frozen by the per-scenario poller that
+            // stops the instant the run goes terminal.
+            await invalidateRunSurfaces(queryClient, projectId, runId)
+            await delay(SETTLE_MS)
+            await invalidateRunSurfaces(queryClient, projectId, runId)
+            return
+        }
+        // Still running — refresh active surfaces so progress (RUNNING status, cells) streams in.
+        await refetchRunSurfaces(queryClient, projectId, runId)
+    }
+    // Budget exhausted (very long reprocess, or status unreadable). Final invalidation so
+    // nothing is left half-updated.
+    await invalidateRunSurfaces(queryClient, projectId, runId)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The reprocess bridge gives up too early for large reruns.

The bridge stops after 15 attempts spaced 2000ms apart, i.e. roughly 30s plus request time. For longer evaluator backfills, it exits while the run is still non-terminal. The fallback invalidation after the loop then fires before the worker has persisted the remaining cells, so the terminal-gated invalidation this code depends on never happens. Mounted rows keep updating via polling, but virtualized or off-screen rows can stay stale until some unrelated refresh.

Please keep polling until terminal with backoff, or reschedule another check when the budget expires and the run is still non-terminal.
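One way to implement the first suggestion is sketched below under some assumptions. The status read, terminal check, refetch and invalidation are passed in as callbacks. The BridgeDeps shape and the bridgeUntilTerminal name are hypothetical. The loop mirrors bridgeRunReprocessing, but it replaces the fixed 15-attempt budget with exponential backoff and a ceiling on total waiting time:

```typescript
// Hypothetical dependency bundle; in the drawer these would wrap
// readRunStatusAuthoritative, isTerminalStatus, refetchRunSurfaces and
// invalidateRunSurfaces for a given projectId/runId.
interface BridgeDeps {
    readStatus: () => Promise<string | null>
    isTerminal: (status: string) => boolean
    refetch: () => Promise<void>
    invalidate: () => Promise<void>
    sleep: (ms: number) => Promise<void>
}

interface BackoffOptions {
    initialMs: number // first wait, e.g. the current BRIDGE_INTERVAL_MS
    maxMs: number // cap for a single wait
    factor: number // growth per attempt (2 = doubling)
    maxTotalMs: number // ceiling on total waiting instead of a fixed attempt count
}

async function bridgeUntilTerminal(
    deps: BridgeDeps,
    opts: BackoffOptions,
): Promise<"terminal" | "timeout"> {
    let waitMs = opts.initialMs
    let elapsedMs = 0
    while (elapsedMs < opts.maxTotalMs) {
        await deps.sleep(waitMs)
        elapsedMs += waitMs
        const status = await deps.readStatus()
        if (status && deps.isTerminal(status)) {
            // Worker finished: run the terminal-gated invalidation once.
            await deps.invalidate()
            return "terminal"
        }
        // Still running: stream progress into mounted surfaces, then back off.
        await deps.refetch()
        waitMs = Math.min(opts.maxMs, Math.ceil(waitMs * opts.factor))
    }
    // Ceiling reached while still non-terminal: invalidate so nothing is frozen.
    await deps.invalidate()
    return "timeout"
}
```

With initialMs = 2000, factor = 2 and maxMs = 30000, the waits are 2s, 4s, 8s, 16s and then 30s each. A maxTotalMs of several minutes keeps the loop bounded while still covering long backfills. The existing settle pass (a second invalidation after SETTLE_MS) could be folded into the invalidate callback.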

coderabbitai · 2026-06-12T10:49:59Z

+export interface SaveEvaluationEditArgs {
+    runId: string
+    /** Current name (seeded from the run; sent even if unchanged — edit replaces it). */
+    name: string
+    /** Current description (sent even if unchanged). */
+    description: string
+    /** Evaluator REVISION ids staged for addition in the drawer. */
+    addedEvaluatorRevisionIds: string[]
+}
+
+export const saveEvaluationEditAtom = atom(
+    null,
+    async (
+        get,
+        _set,
+        {runId, name, description, addedEvaluatorRevisionIds}: SaveEvaluationEditArgs,
+    ) => {
+        if (!runId) return
+
+        const projectId = get(projectIdAtom)
+        if (!projectId) throw new Error("[edit-evaluation] missing projectId")
+
+        const run = get(evaluationRunQueryAtomFamily(runId))?.data?.rawRun
+        if (!run) throw new Error("[edit-evaluation] run not loaded")
+
+        const steps = stepsOf(run)
+        const annotationSteps = steps.filter((step) => step?.type === "annotation")
+        const existingEvaluatorIds = revisionIds(steps, "annotation", "evaluator_revision")
+        const fresh = (addedEvaluatorRevisionIds ?? []).filter(
+            (id) => !existingEvaluatorIds.includes(id),
+        )
+
+        // New evaluators inherit the run's evaluator origin (kind anchor).
+        const newOrigin = (annotationSteps[0]?.origin as EvaluatorOrigin | undefined) ?? "auto"
+
+        // edit REPLACES run data, so send the complete target set (existing + new).
+        const evaluatorSteps: Record<string, EvaluatorOrigin> = {}
+        for (const step of annotationSteps) {
+            const id = step.references?.evaluator_revision?.id
+            if (id) evaluatorSteps[id] = (step.origin as EvaluatorOrigin) ?? newOrigin
+        }
+        for (const id of fresh) evaluatorSteps[id] = newOrigin


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

New evaluator origins are lost in the mutation contract.

SaveEvaluationEditArgs only carries revision ids, so lines 226-235 have to invent an origin for every added evaluator from the first existing annotation step (or "auto"). That corrupts mixed-origin edits: e.g. adding a human evaluator to an auto run persists it as auto, and adding the first evaluator to an empty run always persists it as auto.

Pass {id, origin} from the drawer and build evaluatorSteps from those explicit origins instead of inferring one fallback for all new evaluators.
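Below is a sketch of the suggested contract. It assumes the drawer can pass an explicit origin for each staged evaluator. StagedEvaluator, ExistingAnnotationStep and buildEvaluatorSteps are hypothetical. The EvaluatorOrigin union is also an assumption: only "auto" appears in the diff, and "human" is taken from the example in this finding.

```typescript
// Assumed union; the real EvaluatorOrigin type may have more members.
type EvaluatorOrigin = "auto" | "human"

// Hypothetical drawer payload: each staged evaluator carries its own origin.
interface StagedEvaluator {
    revisionId: string
    origin: EvaluatorOrigin
}

// Hypothetical flattened view of an existing annotation step.
interface ExistingAnnotationStep {
    revisionId: string
    origin?: EvaluatorOrigin
}

// Build the complete target set (existing + new), since edit replaces run data.
function buildEvaluatorSteps(
    existing: ExistingAnnotationStep[],
    added: StagedEvaluator[],
    defaultOrigin: EvaluatorOrigin = "auto",
): Record<string, EvaluatorOrigin> {
    const steps: Record<string, EvaluatorOrigin> = {}
    const seen = new Set<string>()
    for (const step of existing) {
        // Existing steps keep their stored origin; only a missing one is defaulted.
        steps[step.revisionId] = step.origin ?? defaultOrigin
        seen.add(step.revisionId)
    }
    for (const {revisionId, origin} of added) {
        // Already-connected evaluators are not re-added or relabelled.
        if (seen.has(revisionId)) continue
        steps[revisionId] = origin
        seen.add(revisionId)
    }
    return steps
}
```

Because each new evaluator's origin comes from the payload, adding a human evaluator to an auto run keeps it human. The same holds when it is the first evaluator added to an empty run.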

coderabbitai · 2026-06-12T10:49:59Z

+export const queryRunScenarioIds = async ({
+    projectId,
+    runId,
+}: {
+    projectId: string
+    runId: string
+}): Promise<string[]> => {
+    const res = await client().queryScenarios(
+        {scenario: {run_id: runId}, windowing: {limit: 1000}},
+        {queryParams: {project_id: projectId}},
+    )
+    return (res.scenarios ?? [])
+        .map((scenario) => scenario.id)
+        .filter((id): id is string => Boolean(id))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

queryRunScenarioIds drops every scenario after the first page.

This helper issues a single queryScenarios request with limit: 1000 and never follows pagination. When a run has more than 1000 scenarios, processSlice will only rerun the first page, so newly added evaluators never backfill the remaining rows.

Please page until exhaustion (or switch to an endpoint that returns all ids).
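Below is a minimal sketch of paging until exhaustion. The page shape (ids plus an opaque next cursor) and the FetchScenarioPage callback are assumptions for illustration. The windowing fields that queryScenarios actually accepts and returns are not shown in this PR, so a real adapter would need to map them:

```typescript
// Hypothetical page shape: ids plus an opaque cursor for the next page.
interface ScenarioPage {
    ids: string[]
    next?: string | null
}

// Hypothetical adapter around queryScenarios that maps its windowing fields to ScenarioPage.
type FetchScenarioPage = (cursor: string | undefined, limit: number) => Promise<ScenarioPage>

async function collectAllScenarioIds(
    fetchPage: FetchScenarioPage,
    limit = 1000,
    maxPages = 10000,
): Promise<string[]> {
    const all: string[] = []
    const seen = new Set<string>()
    let cursor: string | undefined = undefined
    for (let page = 0; page < maxPages; page++) {
        const {ids, next} = await fetchPage(cursor, limit)
        for (const id of ids) {
            // Dedupe in case pages overlap at their boundaries.
            if (!seen.has(id)) {
                seen.add(id)
                all.push(id)
            }
        }
        // Stop when there is no further page, or an empty page comes back.
        if (!next || ids.length === 0) return all
        cursor = next
    }
    throw new Error("[process-slice] scenario pagination did not terminate")
}
```

The empty-page check and the maxPages guard are defensive. They keep a misbehaving cursor from looping forever, while a normal multi-page run returns every id.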

Resolved conflict: accept deletion of GeneralSection.tsx, which was removed in the release branch as part of the ConfigurationView refactor (replaced by RunSummaryCard, SectionNavCard, V2SectionShell).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


github-actions · 2026-06-12T18:26:24Z

Railway Preview Environment


Status	Destroyed (PR closed)

Updated at 2026-06-12T18:53:16.547Z

ardaerzin added 3 commits June 7, 2026 18:41

chore(api): remove add-evaluators edit-path acceptance probe

a8b2422

Drops the exploratory acceptance probe added alongside the run-status change; it was a proof-of-contract probe, not a maintained test.

vercel Bot deployed to Preview June 8, 2026 12:30

junaway approved these changes Jun 8, 2026

Merge branch 'feat/unified-eval-loops' into fe-feat/add-evaluators-to-existing-eval

ecafe67

vercel Bot deployed to Preview June 8, 2026 12:34

ardaerzin added 2 commits June 8, 2026 16:21

vercel Bot deployed to Preview June 8, 2026 14:25

Merge branch 'feat/unified-eval-loops' into fe-feat/add-evaluators-to-existing-eval

2961145

ardaerzin marked this pull request as ready for review June 10, 2026 15:54

vercel Bot deployed to Preview June 10, 2026 15:55

ashrafchowdury approved these changes Jun 11, 2026

mmabrouk changed the base branch from feat/unified-eval-loops to release/v0.103.3 June 12, 2026 10:39

dosubot Bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Jun 12, 2026

coderabbitai Bot reviewed Jun 12, 2026

vercel Bot deployed to Preview June 12, 2026 11:32

Merge branch 'release/v0.103.3' into fe-feat/add-evaluators-to-existing-eval

106c826

vercel Bot deployed to Preview June 12, 2026 18:14

bekossy approved these changes Jun 12, 2026

dosubot Bot added the lgtm label (This PR has been approved by a maintainer) Jun 12, 2026

bekossy merged commit 6e8b589 into release/v0.103.3 Jun 12, 2026
41 of 46 checks passed

bekossy merged 10 commits into release/v0.103.3 from fe-feat/add-evaluators-to-existing-eval


4 participants
