[FE / Feat] Add evaluators to existing evals #4577
- Shared 'Edit evaluation' drawer (name/description + evaluators), opened from a run-header actions dropdown (all tabs), the config General 'Edit' button, and the evaluations-table row action; the config General section is now display-only.
- Jotai mutation flow (editSimpleEvaluation + process slice) with a terminal-gated background refresh, so the evaluations list and the run scenarios table converge reliably (columns, metric cells, status) after an edit.
- Resolve evaluator output metrics for staged (pending) evaluators in the drawer.
- Dark mode fixes: drawer edge shadow, entity-picker hover/selected highlight, and the cascader child-panel loading/loaded width jump.
dispatch_run_slice re-activates the run (status=RUNNING, is_active=True) before dispatching the worker, so the status indicator reflects the reprocess; _finalize_run_after_slice floors it back to terminal when scoring completes. Adds an acceptance probe for the edit+process path.
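The run-level round trip can be sketched as two pure transitions over an in-memory record. This is a minimal sketch. `Run`, `Status`, `reactivate` and `finalize` are hypothetical stand-ins for the service's dispatch_run_slice and _finalize_run_after_slice, and the success/failure rule in `finalize` is an assumption.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    # Hypothetical status set; the real EvaluationStatus enum may differ.
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


TERMINAL = {Status.SUCCESS, Status.FAILURE}


@dataclass(frozen=True)
class Run:
    id: str
    status: Status
    is_active: bool


def reactivate(run: Run) -> Run:
    """Before dispatch: mark the run RUNNING and active so the UI shows the reprocess."""
    return replace(run, status=Status.RUNNING, is_active=True)


def finalize(run: Run, any_failed: bool) -> Run:
    """After scoring: floor the status back to a terminal value."""
    if run.status in TERMINAL:
        return run  # already terminal; nothing to floor
    return replace(run, status=Status.FAILURE if any_failed else Status.SUCCESS)


done = Run(id="run-1", status=Status.SUCCESS, is_active=False)
reprocessing = reactivate(done)
finished = finalize(reprocessing, any_failed=False)
```

Returning new records instead of mutating them keeps the before and after states easy to compare.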
Link ids recovered from stored result cells on the re-run/process path arrive as dashed UUIDs (live spans send bare hex); both encode the same integer. Strip dashes before base-16 parsing so add_link no longer raises ValueError on the hyphens.
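The fix mirrors the `_as_otel_id` helper in processors.py. The standalone sketch below shows both encodings parsing to the same integer, and a plain `int(..., 16)` call rejecting the dashed form. The name `as_otel_id` and the sample id are illustrative.

```python
import uuid


def as_otel_id(value: object) -> int:
    """Parse a trace/span id given either as bare hex or as a dashed UUID."""
    # Hyphens are only UUID formatting; dropping them leaves the same hex digits.
    return int(str(value).replace("-", ""), 16)


bare = "4bf92f3577b34da6a3ce929d0e0e4736"  # as sent by a live span
dashed = str(uuid.UUID(bare))  # same value, as recovered from a stored result cell

assert as_otel_id(bare) == as_otel_id(dashed)

try:
    int(dashed, 16)  # parsing without stripping the dashes
except ValueError:
    print("int(..., 16) rejects the hyphens")
```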
Drops the exploratory acceptance probe added alongside the run-status change; it was a proof-of-contract probe, not a maintained test.
… slice
Mirrors the run-level re-activation at the scenario level, so per-scenario status indicators also reflect the reprocess. dispatch_run_slice now bulk-sets the addressed scenarios to RUNNING/is_active before dispatch (the full-PUT edit preserves flags/interval/timestamp/meta), and the engine writes each scenario's terminal status back on completion.
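Because the scenario edit is a full PUT, any field left out of the payload would presumably be reset. The request therefore echoes the current values back. This is a minimal sketch, assuming scenarios are plain records with dict flags. `Scenario`, `reactivation_edit`, `bulk_reactivation` and the lowercase status strings are hypothetical; the real code uses EvaluationScenarioEdit and EvaluationStatus.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Scenario:
    # Hypothetical record; field names follow the ones the edit preserves.
    id: str
    status: str
    flags: Optional[dict[str, Any]]
    interval: Optional[int]
    timestamp: Optional[str]
    meta: Optional[dict[str, Any]]


def reactivation_edit(scenario: Scenario) -> Scenario:
    """Full-PUT payload: change status and flags.is_active, carry everything else over."""
    flags = {**(scenario.flags or {}), "is_active": True}  # copy; never mutate the input
    return replace(scenario, status="running", flags=flags)


def bulk_reactivation(scenarios: list[Scenario]) -> list[Scenario]:
    """One edit per addressed scenario, sent together in a single bulk request."""
    return [reactivation_edit(s) for s in scenarios]
```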
…ding an evaluator
The post-edit background refresh now (1) matches any query scoped to the run id (reload-equivalent; this covers the scenario rows+status query the old allowlist missed), (2) detects run completion authoritatively via the run batcher instead of getQueryData, and (3) invalidates twice (immediately and again after a short settle), so cell results that persist just after the run status flips terminal aren't left frozen by the per-scenario poller.
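The two refresh mechanisms can be sketched language-neutrally (the web code itself is TypeScript). The first is a predicate that matches any query key containing the run id. The second is a refresh that invalidates once the run is terminal and again after a short settle. `key_mentions_run`, `refresh_after_terminal` and the delay defaults are hypothetical. The terminal check is modeled as an async callable standing in for the run batcher.

```python
import asyncio
from typing import Any, Awaitable, Callable


def key_mentions_run(key: Any, run_id: str) -> bool:
    """True if run_id appears anywhere in a (possibly nested) query key."""
    if isinstance(key, dict):
        return any(key_mentions_run(v, run_id) for v in key.values())
    if isinstance(key, (list, tuple)):
        return any(key_mentions_run(k, run_id) for k in key)
    return key == run_id


async def refresh_after_terminal(
    is_terminal: Callable[[], Awaitable[bool]],
    invalidate: Callable[[], None],
    poll_interval: float = 1.0,
    settle_delay: float = 1.5,
) -> None:
    # Wait for the authoritative source to report a terminal run status.
    while not await is_terminal():
        await asyncio.sleep(poll_interval)
    invalidate()  # first pass: picks up the terminal status
    await asyncio.sleep(settle_delay)
    invalidate()  # second pass: picks up results persisted just after the flip
```

Matching on any key that mentions the run id trades a few extra refetches for never missing a run-scoped query.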
Actionable comments posted: 6
🧹 Nitpick comments (5)
web/packages/agenta-ui/src/components/selection/ListItem.tsx (1)
134-137: ⚡ Quick win: Condense this explanatory block to one terse “why” line.
This comment is longer than the repo standard and partially restates behavior; keep only a short invariant-focused rationale.
As per coding guidelines, “Keep AI-generated in-code comments minimal; comment only the non-obvious why … Prefer one terse line over multi-line blocks.”
Source: Coding guidelines
web/oss/src/components/EvalRunDetails/components/Page.tsx (1)
54-57: ⚡ Quick win: Shorten comment to focus on the non-obvious constraint.
The comment lists trigger locations (header dropdown, config buttons), which describes what uses the drawer rather than why this specific wiring is needed. Per coding guidelines, keep comments minimal and focus on non-obvious invariants.
♻️ Suggested revision
- // Shared "Edit evaluation" drawer — opened by the header dropdown (all tabs), the
- // config General Edit button, and the Add-evaluator button.
+ // Shared drawer; multiple triggers set this atom to open.
const editDrawerRunId = useAtomValue(editEvaluationDrawerRunIdAtom)
const setEditDrawerRunId = useSetAtom(editEvaluationDrawerRunIdAtom)
Source: Coding guidelines
web/oss/src/components/EvalRunDetails/components/RunActionsDropdown.tsx (1)
11-17: ⚡ Quick win: Condense comment to the essential invariant.
The block comment narrates what the component renders and where it lives, and adds a speculative note about future extensibility. Per coding guidelines, keep comments minimal and avoid TODO-style prose about future features. The key constraint here is that the dropdown lives in the page header to stay accessible across tabs.
♻️ Suggested revision
-/**
- * Actions dropdown rendered next to the run name in the run-details header. Lives in the
- * page header so "Edit evaluation" is reachable from every tab (Overview / Scenarios /
- * Configuration), mirroring the testset page's name-adjacent actions menu. Opens the
- * shared Edit drawer via `editEvaluationDrawerRunIdAtom`. Built as a menu so more run
- * actions (duplicate, delete, …) can slot in later.
- */
+// In header to keep Edit action reachable across all tabs.
const RunActionsDropdown = ({runId}: {runId: string}) => {
Source: Coding guidelines
web/oss/src/components/EvalRunDetails/components/views/ConfigurationView/components/GeneralSection.tsx (1)
46-50: ⚡ Quick win: Remove or condense the refactor-explanation comment.
The comment documents the historical transition from inline editing to drawer-based editing, which is better suited to a PR description than to inline code. Per coding guidelines, keep comments minimal and focus on non-obvious constraints in the current code.
♻️ Suggested revision
-/**
- * Read-only view of the run's metadata. Editing is no longer inline — it goes through the
- * shared "Edit evaluation" drawer (consistent with the header actions dropdown and the
- * Add-evaluator button), opened via the Edit trigger here.
- */
const GeneralSection = ({runId, showActions = true, showHeader = true}: GeneralSectionProps) => {
If context is needed, a single line suffices:
+// Read-only; editing opens the shared drawer.
const GeneralSection = ({runId, showActions = true, showHeader = true}: GeneralSectionProps) => {
Source: Coding guidelines
web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/constants.tsx (1)
69-73: ⚡ Quick win: Condense the in-code explanation to a single terse invariant.
The multi-line block is clear but too verbose for this codebase standard; keep just one short “why” line and avoid narrating behavior details.
As per coding guidelines, "Keep AI-generated in-code comments minimal; comment only the non-obvious why ... Prefer one terse line over multi-line blocks."
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: fc59aeb5-7ce7-489d-a4f6-66e27b639cb4
📒 Files selected for processing (19)
- api/oss/src/core/evaluations/service.py
- sdks/python/agenta/sdk/engines/tracing/processors.py
- web/oss/src/components/EditEvaluationDrawer/index.tsx
- web/oss/src/components/EvalRunDetails/atoms/mutations/editEvaluation.ts
- web/oss/src/components/EvalRunDetails/atoms/table/evaluators.ts
- web/oss/src/components/EvalRunDetails/components/Page.tsx
- web/oss/src/components/EvalRunDetails/components/RunActionsDropdown.tsx
- web/oss/src/components/EvalRunDetails/components/views/ConfigurationView/components/EvaluatorSection.tsx
- web/oss/src/components/EvalRunDetails/components/views/ConfigurationView/components/GeneralSection.tsx
- web/oss/src/components/EvalRunDetails/state/editDrawer.ts
- web/oss/src/components/EvaluationRunsTablePOC/components/EvaluationRunsTable/index.tsx
- web/oss/src/components/EvaluationRunsTablePOC/components/cells/ActionsCell/index.tsx
- web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/constants.tsx
- web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/index.tsx
- web/oss/src/components/EvaluationRunsTablePOC/hooks/useEvaluationRunsColumns/types.ts
- web/oss/src/components/Layout/ThemeContextProvider.tsx
- web/oss/src/services/evaluations/runShape/api.ts
- web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
- web/packages/agenta-ui/src/components/selection/ListItem.tsx
scenarios = await self.evaluations_service.query_scenarios(
    project_id=project_id,
    scenario=EvaluationScenarioQuery(run_id=run_id, ids=scenario_ids),
    windowing=Windowing(limit=10_000),
)
if scenarios:
    await self.evaluations_service.edit_scenarios(
        project_id=project_id,
        user_id=user_id,
        scenarios=[
            EvaluationScenarioEdit(
                id=scenario.id,
                flags=(
                    scenario.flags.model_copy(update={"is_active": True})
                    if scenario.flags
                    else EvaluationRunFlags(is_active=True)
                ),
                status=EvaluationStatus.RUNNING,
                interval=scenario.interval,
                timestamp=scenario.timestamp,
                meta=scenario.meta,
            )
            for scenario in scenarios
        ],
    )
Paginate the scenario re-activation query.
query_scenarios(..., windowing=Windowing(limit=10_000)) returns at most one page of 10,000 scenarios, so only that page is updated. On large runs, or when scenario_ids itself is longer than 10k, the rerun still targets all addressed scenarios via process_rerun(...), but only a subset gets flipped back to RUNNING/is_active=True. That leaves scenario-level status inconsistent during the reprocess.
Use fetch_scenarios(...) when scenario_ids is already known, and paginate when reactivating the full run slice.
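The review distinguishes two cases. The sketch below handles both, assuming a cursor-style page fetcher. `iter_all` follows the cursor when re-activating a whole run slice. `chunked` splits an explicit `scenario_ids` list so no request exceeds the 10,000 limit. Both helpers and the fetcher shape are hypothetical; the service's actual `Windowing` and fetch_scenarios signatures may differ.

```python
from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_all(
    # Hypothetical fetcher: (cursor, limit) -> (items, next_cursor or None).
    fetch_page: Callable[[Optional[str], int], tuple[list[T], Optional[str]]],
    limit: int = 10_000,
) -> Iterator[T]:
    """Yield every item, following the cursor past the first page."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor, limit)
        yield from items
        if cursor is None or not items:
            return


def chunked(ids: Sequence[str], size: int = 10_000) -> Iterator[Sequence[str]]:
    """Split a known id list so no single request exceeds the page limit."""
    for start in range(0, len(ids), size):
        yield ids[start : start + size]
```

When `scenario_ids` is known, each chunk can be fetched by id directly. For the full slice, `iter_all` keeps requesting pages until the cursor runs out.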
def _as_otel_id(value) -> int:
    # Link ids arrive as bare hex (live spans) or as dashed UUIDs (recovered from
    # stored result cells on the re-run/process path). Both encode the same
    # integer, so strip dashes before parsing base-16.
    return int(str(value).replace("-", ""), 16)
🧩 Analysis chain
🌐 Web query:
In OpenTelemetry Python, what are the valid numeric ranges for SpanContext.trace_id and SpanContext.span_id, and does SpanContext raise if span_id exceeds 64 bits?
💡 Result:
In OpenTelemetry Python, the valid numeric ranges for SpanContext are determined by the W3C TraceContext specification, which defines TraceId as 16 bytes and SpanId as 8 bytes [1][2]. For SpanContext.span_id, the valid range is 1 to 2^64 - 1 (0x1 to 0xffffffffffffffff; 0 is the invalid span id) [3][4][5]. For SpanContext.trace_id, the valid range is 1 to 2^128 - 1 (0x1 to 0xffffffffffffffffffffffffffffffff; 0 is the invalid trace id) [3][4][5]. SpanContext does not raise an exception if the provided span_id (or trace_id) falls outside these limits [3][4][6]. Instead, the SpanContext object is created successfully, but its is_valid property is False [4][6][5]. The OpenTelemetry Python implementation checks these bounds during construction to determine the context's validity [4][6], and test cases confirm that values above the maximum, or less than or equal to zero, yield is_valid == False [5].
Citations:
- 1: https://opentelemetry.io/docs/specs/otel/trace/api/
- 2: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.43.0/specification/trace/api.md
- 3: Fix validity calculation for trace/span ID open-telemetry/opentelemetry-python#2145
- 4: https://opentelemetry-python.readthedocs.io/en/stable/_modules/opentelemetry/trace/span.html
- 5: https://github.com/open-telemetry/opentelemetry-python/blob/eed100c3/opentelemetry-api/tests/trace/test_span_context.py
- 6: https://opentelemetry-python.readthedocs.io/en/latest/_modules/opentelemetry/trace/span.html
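The validity rule in the web-query result above can be restated in plain Python. This is an illustrative sketch of the W3C widths, not the OpenTelemetry source. `is_valid_context` is a hypothetical name; the real SpanContext performs this check itself and only reports the result through `is_valid`.

```python
TRACE_ID_MAX = (1 << 128) - 1  # 16-byte TraceId
SPAN_ID_MAX = (1 << 64) - 1  # 8-byte SpanId


def is_valid_context(trace_id: int, span_id: int) -> bool:
    """Both ids must be non-zero and fit their W3C widths."""
    return 0 < trace_id <= TRACE_ID_MAX and 0 < span_id <= SPAN_ID_MAX


# A span id parsed from a full 128-bit UUID overflows the 64-bit field.
span_from_uuid = int("4bf92f3577b34da6a3ce929d0e0e4736", 16)
print(is_valid_context(1, span_from_uuid))  # False
```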
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show exact current code around the helper definition and its use sites
python - <<'PY'
import pathlib, itertools
path = pathlib.Path("sdks/python/agenta/sdk/engines/tracing/processors.py")
lines = path.read_text().splitlines()
for start, end in [(1,80),(100,180),(180,260)]:
print(f"\n--- {path} lines {start}-{end} ---")
for i in range(start-1, min(end, len(lines))):
print(f"{i+1:4d}: {lines[i]}")
PY
# Find all usages of _as_otel_id in the repo (quick scope)
rg -n "_as_otel_id\(" sdks/python/agenta/sdk/engines/tracing/processors.py sdks/python/agenta/sdk -S || true
# Run ruff check on the touched file (no auto-fix)
ruff check sdks/python/agenta/sdk/engines/tracing/processors.py || true
Repository: Agenta-AI/agenta
Length of output: 10251
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect Link model/type hints
for f in sdks/python/agenta/sdk/models/shared.py sdks/python/agenta/sdk/models/tracing.py; do
if [ -f "$f" ]; then
echo "=== $f (relevant sections) ==="
rg -n "class Link|Link\\b|trace_id|span_id" "$f" | head -n 80
echo
fi
done
# Find where context.links is built/populated
rg -n "context\\.links|links\\s*=\\s*|\"trace_id\"\\s*[:]" sdks/python/agenta/sdk | head -n 120
Repository: Agenta-AI/agenta
Length of output: 2461
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Print the relevant model definitions for TraceID/SpanID/Link and OTelLink
sed -n '1,220p' sdks/python/agenta/sdk/models/shared.py | cat -n | head -n 220
echo "----"
sed -n '130,260p' sdks/python/agenta/sdk/models/tracing.py | cat -n | head -n 160
# Inspect how tracing_ctx.links is assembled (formatting assumptions)
sed -n '250,470p' sdks/python/agenta/sdk/decorators/running.py | cat -n | sed -n '250,470p' || true
echo "----"
sed -n '300,520p' sdks/python/agenta/sdk/decorators/tracing.py | cat -n | sed -n '300,520p' || true
# Inspect the link creation in evaluation runtime processor (if used for re-run/process links)
sed -n '620,760p' sdks/python/agenta/sdk/evaluations/runtime/processor.py | cat -n | sed -n '620,760p' || true
Repository: Agenta-AI/agenta
Length of output: 10756
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Where tracing_ctx.links is assigned
rg -n "tracing_ctx\\.links\\s*=|self\\.links\\s*=|links=request\\.links|upstream\\.get\\(\"links\"\\)" sdks/python/agenta/sdk/decorators/running.py sdks/python/agenta/sdk/evaluations/runtime/processor.py sdks/python/agenta/sdk/decorators/tracing.py
# Show the relevant blocks around those assignments
sed -n '300,460p' sdks/python/agenta/sdk/decorators/running.py | cat -n
echo "----"
sed -n '380,520p' sdks/python/agenta/sdk/decorators/running.py | cat -n
echo "----"
sed -n '620,760p' sdks/python/agenta/sdk/evaluations/runtime/processor.py | cat -n
echo "----"
sed -n '680,740p' sdks/python/agenta/sdk/evaluations/runtime/processor.py | cat -n
echo "----"
sed -n '300,420p' sdks/python/agenta/sdk/decorators/tracing.py | cat -n
Repository: Agenta-AI/agenta
Length of output: 25262
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import inspect
from opentelemetry.trace import SpanContext
from opentelemetry.sdk.trace import Span
print("SpanContext module:", SpanContext.__module__)
print("SpanContext qualname:", SpanContext.__qualname__)
print("\nSpanContext.__init__ source (truncated):")
src = inspect.getsource(SpanContext.__init__)
print("\n".join(src.splitlines()[:120]))
print("\n---")
print("Span.add_link module:", Span.add_link.__module__)
print("Span.add_link qualname:", Span.add_link.__qualname__)
print("\nSpan.add_link source (truncated):")
src2 = inspect.getsource(Span.add_link)
print("\n".join(src2.splitlines()[:160]))
PY
Repository: Agenta-AI/agenta
Length of output: 1078
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import opentelemetry.trace.span as span_mod
import opentelemetry.sdk.trace as sdk_trace
import inspect
from opentelemetry.trace import SpanContext
from opentelemetry.sdk.trace import Span
print("span_mod file:", span_mod.__file__)
print("SpanContext:", SpanContext, "module:", SpanContext.__module__)
print("Span.add_link module:", Span.add_link.__module__, "qualname:", Span.add_link.__qualname__)
PY
# Show relevant portions of SpanContext implementation from installed opentelemetry
SPAN_MOD_FILE=$(python - <<'PY'
import opentelemetry.trace.span as span_mod
print(span_mod.__file__)
PY
)
echo "---- SpanContext implementation snippets from: $SPAN_MOD_FILE ----"
python - <<'PY'
import re, pathlib
path = pathlib.Path(open(__import__("opentelemetry.trace.span").trace.span.__file__).name)
PY
Repository: Agenta-AI/agenta
Length of output: 426
🏁 Script executed:
#!/bin/bash
set -euo pipefail
SPAN_PY="/usr/local/lib/python3.11/dist-packages/opentelemetry/trace/span.py"
echo "Using: $SPAN_PY"
ls -l "$SPAN_PY"
# Print SpanContext class definition area and ID validation helpers
python - <<PY
import pathlib, re
path = pathlib.Path("$SPAN_PY")
txt = path.read_text()
# Find SpanContext class start
m = re.search(r"class SpanContext\\b", txt)
print("SpanContext class start idx:", m.start() if m else None)
# Print first ~300 lines from SpanContext onwards (bounded by file size)
lines = txt.splitlines()
start = m and txt[:m.start()].count("\\n") or 0
for i in range(start, min(start+320, len(lines))):
print(f"{i+1:5d}: {lines[i]}")
PY
echo "----"
# Extract Span.add_link implementation
python - <<'PY'
import pathlib, re
import opentelemetry.sdk.trace as sdk_trace
import opentelemetry.sdk.trace
from opentelemetry.sdk.trace import Span
import inspect
# We'll locate span.py inside sdk trace package
import opentelemetry.sdk.trace as st
import opentelemetry
import pkgutil
import os, sys
import opentelemetry.sdk.trace
import opentelemetry.sdk.trace as sdk_trace
# Locate file containing Span.add_link
import opentelemetry.sdk.trace as sdk_trace_mod
import inspect
span_file = None
for name in dir(sdk_trace_mod):
pass
# Directly import the module where Span class is defined
# Span is in opentelemetry/sdk/trace/__init__.py which re-exports; find actual class
import opentelemetry.sdk.trace as sdk
span_cls = Span
print("Span class module:", span_cls.__module__)
import importlib
mod = importlib.import_module(span_cls.__module__)
print("Span class file:", mod.__file__)
PY
SDK_SPAN_FILE="$(python - <<'PY'
import importlib
from opentelemetry.sdk.trace import Span
mod = importlib.import_module(Span.__module__)
print(mod.__file__)
PY
)"
echo "Using: $SDK_SPAN_FILE"
# Grep around add_link in sdk span file
rg -n "def add_link\\b" "$SDK_SPAN_FILE" || true
if rg -n "def add_link\\b" "$SDK_SPAN_FILE" >/dev/null; then
LINE=$(rg -n "def add_link\\b" "$SDK_SPAN_FILE" | head -n 1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+180))
sed -n "${START},${END}p" "$SDK_SPAN_FILE" | cat -n
fi
Repository: Agenta-AI/agenta
Length of output: 15538
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find OpenTelemetry's link validity helper
SDK_INIT="/usr/local/lib/python3.11/dist-packages/opentelemetry/sdk/trace/__init__.py"
ls -l "$SDK_INIT"
rg -n "def _is_valid_link|_is_valid_link\\(" "$SDK_INIT" || true
if rg -n "def _is_valid_link" "$SDK_INIT" >/dev/null; then
LINE=$(rg -n "def _is_valid_link" "$SDK_INIT" | head -n1 | cut -d: -f1)
START=$((LINE-30)); END=$((LINE+120))
sed -n "${START},${END}p" "$SDK_INIT" | cat -n
fi
# 2) Inspect TracingContext model/usage of `link` vs `links`
rg -n "class TracingContext|context\\.links|context\\.link\\b" sdks/python/agenta/sdk/contexts sdks/python/agenta/sdk -S | head -n 80
# 3) Read TracingContext definition
CTX_FILE=$(fd -a "tracing.py" sdks/python/agenta/sdk/contexts | head -n 5 | head -n 1 || true)
echo "CTX_FILE=$CTX_FILE"
if [ -n "$CTX_FILE" ] && [ -f "$CTX_FILE" ]; then
sed -n '1,220p' "$CTX_FILE" | cat -n
fi
# 4) Check how `context.link` (singular) is converted into `context.links` (plural), if at all
rg -n "context\\.link\\b|\\.link\\b\\s*=" sdks/python/agenta/sdk | head -n 120
Repository: Agenta-AI/agenta
Length of output: 9323
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find where runtime processor remembers context and how trace_id/span_id are sourced
rg -n "_remember_context\\(" sdks/python/agenta/sdk/evaluations/runtime/processor.py
# Inspect around those call sites
# (grab a generous range around each match)
python - <<'PY'
import re, pathlib
path = pathlib.Path("sdks/python/agenta/sdk/evaluations/runtime/processor.py")
lines = path.read_text().splitlines()
for i,l in enumerate(lines, start=1):
if "_remember_context(" in l:
start=max(1,i-30); end=min(len(lines), i+60)
print(f"\n--- {path}:{start}-{end} around line {i} ---")
for j in range(start,end+1):
print(f"{j:5d}: {lines[j-1]}")
PY
echo "----"
# Locate WorkflowInvokeRequest definition to see links typing/shape
rg -n "class WorkflowInvokeRequest|WorkflowInvokeRequest\\b" sdks/python/agenta/sdk/decorators/running.py
# Show definition block
python - <<'PY'
import pathlib, re
path = pathlib.Path("sdks/python/agenta/sdk/decorators/running.py")
txt = path.read_text().splitlines()
for i,l in enumerate(txt, start=1):
if "class WorkflowInvokeRequest" in l:
start=i; end=min(len(txt), i+180)
for j in range(start,end+1):
print(f"{j:5d}: {txt[j-1]}")
break
PY
echo "----"
# Show links field types in WorkflowInvokeRequest (if present)
rg -n "links\\s*:" sdks/python/agenta/sdk/decorators/running.py | head -n 80
Repository: Agenta-AI/agenta
Length of output: 8978
🏁 Script executed:
#!/bin/bash
set -euo pipefail
RUNNING="sdks/python/agenta/sdk/decorators/running.py"
# Print around WorkflowInvokeRequest definition including links typing
python - <<'PY'
import pathlib
path = pathlib.Path("sdks/python/agenta/sdk/decorators/running.py")
lines = path.read_text().splitlines()
for i,l in enumerate(lines, start=1):
if "class WorkflowInvokeRequest" in l:
start=i-10
end=i+160
for j in range(start,end+1):
if 1 <= j <= len(lines):
print(f"{j:5d}: {lines[j-1]}")
break
PY
# Print Link import section / Link type alias in running.py
sed -n '1,120p' "$RUNNING" | cat -n | head -n 120
# Ensure we know what Link refers to: search for "from ... import Link" and "Link =" occurrences
rg -n "from .* import .*Link\\b|\\bLink\\b\\s*=\\s*|WorkflowInvokeRequest\\b" "$RUNNING" | head -n 80
Repository: Agenta-AI/agenta
Length of output: 5023
🏁 Script executed:
#!/bin/bash
set -euo pipefail
WORKF="sdks/python/agenta/sdk/models/workflows.py"
ls -l "$WORKF" 2>/dev/null || true
rg -n "class WorkflowInvokeRequest\\b|WorkflowInvokeRequest\\b" "$WORKF" | head -n 40
rg -n "\\bclass Link\\b|\\bLink\\b\\s*=" "$WORKF" | head -n 80
# Print around WorkflowInvokeRequest definition
python - <<'PY'
import pathlib
path = pathlib.Path("sdks/python/agenta/sdk/models/workflows.py")
lines = path.read_text().splitlines()
idx = None
for i,l in enumerate(lines):
if "class WorkflowInvokeRequest" in l:
idx = i+1
break
if idx:
start=max(1,idx-10); end=min(len(lines), idx+220)
for j in range(start,end+1):
print(f"{j:5d}: {lines[j-1]}")
else:
print("WorkflowInvokeRequest not found")
PY
echo "----"
# Print around Link-related type imports/aliases in workflows.py
python - <<'PY'
import pathlib, re
path = pathlib.Path("sdks/python/agenta/sdk/models/workflows.py")
txt = path.read_text().splitlines()
for i,l in enumerate(txt, start=1):
if "Link" in l and ("from " in l or "import" in l or "=" in l):
if i<200:
print(f"{i:5d}: {l}")
PY
Repository: Agenta-AI/agenta
Length of output: 258
🏁 Script executed:
#!/bin/bash
set -euo pipefail
WORKF="sdks/python/agenta/sdk/models/workflows.py"
# Show around WorkflowInvokeRequest
sed -n '220,330p' "$WORKF" | cat -n
echo "----"
# Search for Link usage/imports/aliases in this file
rg -n "\\bLink\\b|\\bOTelLink\\b|trace_id|span_id|links:" "$WORKF" | head -n 120
echo "----"
# Print full file if small? It's 22k lines? Actually 22,078 bytes; safe to show entire with cat -n maybe ~500 lines.
wc -l "$WORKF"
sed -n '1,220p' "$WORKF" | cat -n
Repository: Agenta-AI/agenta
Length of output: 11125
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find all uses of TracingContext.link and how it becomes links
rg -n "context\\.link\\b|\\.link\\b\\s*=" sdks/python/agenta/sdk | head -n 120
echo "----"
# 2) Inspect where evaluations `execution.trace_id` / `execution.span_id` are created
rg -n "execution\\.trace_id|execution\\.span_id" sdks/python/agenta/sdk/evaluations/runtime/processor.py | head -n 80
echo "----"
# 3) Search for the type/model that defines `trace_id`/`span_id` on an `execution`
rg -n "class .*Execution|Execution\\b" sdks/python/agenta/sdk/evaluations/runtime/processor.py sdks/python/agenta/sdk/evaluations/runtime -S | head -n 80
echo "----"
# 4) Search for assignments to `trace_id=` near execution creation
rg -n "trace_id\\s*=" sdks/python/agenta/sdk/evaluations/runtime/processor.py | head -n 120
rg -n "span_id\\s*=" sdks/python/agenta/sdk/evaluations/runtime/processor.py | head -n 120
Repository: Agenta-AI/agenta
Length of output: 2004
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect conversion between TracingContext.link -> TracingContext.links
sed -n '250,380p' sdks/python/agenta/sdk/decorators/tracing.py | cat -n
echo "----"
# 2) Inspect evaluation runtime models for types of trace_id/span_id on execution/results
sed -n '1,220p' sdks/python/agenta/sdk/evaluations/runtime/models.py | cat -n
echo "----"
# 3) Inspect runtime executor/execution creation around trace_id/span_id
rg -n "trace_id:|span_id:|trace_id=|span_id=" sdks/python/agenta/sdk/evaluations/runtime/executor.py | head -n 120
# Print a portion around first matches for trace_id/span_id assignments
python - <<'PY'
import pathlib, re
path = pathlib.Path("sdks/python/agenta/sdk/evaluations/runtime/executor.py")
lines = path.read_text().splitlines()
hits=[]
for i,l in enumerate(lines, start=1):
if re.search(r"\btrace_id\b|\bspan_id\b", l) and ("=" in l or ":" in l):
hits.append(i)
print("first hits:", hits[:20])
for i in hits[:3]:
start=max(1,i-40); end=min(len(lines), i+60)
print(f"\n--- executor.py:{start}-{end} around line {i} ---")
for j in range(start,end+1):
print(f"{j:5d}: {lines[j-1]}")
PY
Repository: Agenta-AI/agenta
Length of output: 19457
Fix OTel link ID parsing to respect trace_id(128) vs span_id(64) widths
_as_otel_id (lines 22-26) parses both trace_id and span_id with the same base-16 logic; if span_id arrives in a UUID/dashed-UUID form, the resulting integer can exceed the OpenTelemetry 64-bit span_id limit. OpenTelemetry SpanContext(...) won’t throw on out-of-range IDs (it marks is_valid=False), but Span.add_link(...) can still record the link because _is_valid_link treats the presence of attributes as sufficient. Since this code passes attributes=dict(key=str(key)) (lines 133-142), invalid link contexts may still be attached/exported with incorrect linkage.
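Masking is not the only option. If malformed links should be dropped rather than truncated, the parser can reject any value that is zero or wider than the target field. `parse_otel_id` is a hypothetical name. For zero-padded UUIDs, masking and rejecting return the same integer. They differ only when the upper bits are non-zero: there, masking keeps a different, valid-looking id.

```python
from typing import Optional


def parse_otel_id(value: object, *, bits: int) -> Optional[int]:
    """Parse a bare-hex or dashed-UUID id; return None if it is not valid for `bits`."""
    try:
        parsed = int(str(value).replace("-", ""), 16)
    except ValueError:
        return None
    # Reject 0 (the invalid id) and anything wider than the target field,
    # rather than masking, which could silently yield a different id.
    if not 0 < parsed < (1 << bits):
        return None
    return parsed
```

A caller would skip the link when either id comes back as None, as the `continue` in the suggested fix does.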
Suggested fix
-def _as_otel_id(value) -> int:
- # Link ids arrive as bare hex (live spans) or as dashed UUIDs (recovered from
- # stored result cells on the re-run/process path). Both encode the same
- # integer, so strip dashes before parsing base-16.
- return int(str(value).replace("-", ""), 16)
+def _as_otel_id(value: object, *, bits: int) -> Optional[int]:
+ cleaned = str(value).replace("-", "")
+ try:
+ parsed = int(cleaned, 16)
+ except (TypeError, ValueError):
+ return None
+ return parsed & ((1 << bits) - 1)
- span.add_link(
- context=SpanContext(
- trace_id=_as_otel_id(link.get("trace_id")),
- span_id=_as_otel_id(link.get("span_id")),
- is_remote=True,
- ),
- attributes=dict(
- key=str(key),
- ),
- )
+ parsed_trace_id = _as_otel_id(link.get("trace_id"), bits=128)
+ parsed_span_id = _as_otel_id(link.get("span_id"), bits=64)
+ if parsed_trace_id is None or parsed_span_id is None:
+ continue
+ span.add_link(
+ context=SpanContext(
+ trace_id=parsed_trace_id,
+ span_id=parsed_span_id,
+ is_remote=True,
+ ),
+ attributes=dict(key=str(key)),
+ )| const disabledRevisionIds = useMemo( | ||
| () => | ||
| new Set<string>([ | ||
| ...(connected.map((evaluator) => evaluator.id).filter(Boolean) as string[]), | ||
| ...pending.map((selection) => selection.id), | ||
| ]), | ||
| [connected, pending], | ||
| ) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how disabledChildIds is consumed in PopoverCascaderVariant and what IDs the cascader items use
ast-grep --pattern 'disabledChildIds'
rg -n -C5 'disabledChildIds' --type=tsx --type=ts
Repository: Agenta-AI/agenta
Length of output: 1308
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect how disabledChildIds is checked (what "childId" is)
sed -n '520,700p' web/packages/agenta-entity-ui/src/selection/hooks/modes/useTreeSelectMode.ts
echo '---'
sed -n '330,420p' web/packages/agenta-entity-ui/src/selection/hooks/modes/useListPopoverMode.ts
echo '---'
# 2) Inspect where disabledChildIds is threaded into picker UI
sed -n '600,720p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
echo '---'
sed -n '180,280p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/ListPopoverVariant.tsx
echo '---'
sed -n '70,160p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/shared/ChildPopoverContent.tsx
echo '---'
# 3) Inspect EditEvaluationDrawer to confirm exact types of connected/pending ids
sed -n '1,140p' web/oss/src/components/EditEvaluationDrawer/index.tsx
echo '---'
sed -n '140,340p' web/oss/src/components/EditEvaluationDrawer/index.tsx
echo '---'
# 4) Inspect evaluators.ts where definition.id is derived (as referenced in the review)
# Use fd to find the relevant file.
FILE=$(fd -a 'evaluators.ts' web/oss web/packages | head -n 1 || true)
echo "Found evaluators.ts at: $FILE"
if [[ -n "${FILE}" ]]; then
sed -n '130,220p' "$FILE"
fi
Repository: Agenta-AI/agenta
Length of output: 35346
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find adapters
rg -n "useEnrichedEvaluatorOnlyAdapter" web/packages web/oss -S
rg -n "useEnrichedHumanEvaluatorAdapter" web/packages web/oss -S
# Inspect adapters implementations
# (Use fd to locate files containing these hooks)
FILES=$(rg -l "useEnrichedEvaluatorOnlyAdapter" web/packages web/oss -S || true)
for f in $FILES; do echo "---- $f"; sed -n '1,220p' "$f"; done
FILES=$(rg -l "useEnrichedHumanEvaluatorAdapter" web/packages web/oss -S || true)
for f in $FILES; do echo "---- $f"; sed -n '1,220p' "$f"; sed -n '1,220p' "$f"; done
# Inspect PopoverCascaderVariant hierarchy levels / getId usage for child nodes
rg -n "popover-cascader" web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
sed -n '1,260p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
sed -n '260,520p' web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx
# Also inspect shared hierarchy config building (if present)
rg -n "getId\\(.*childLevelConfig" web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx -S || true
rg -n "hierarchyLevels" web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/PopoverCascaderVariant.tsx -S || true
# Locate where TreeSelectNode gets created from childLevelConfig.getId and how getId is wired
rg -n "childLevelConfig\\.getId" web/packages/agenta-entity-ui/src/selection -S
Repository: Agenta-AI/agenta
Length of output: 50375
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the workflow revision adapter implementation
f=$(fd -a "workflowRevisionRelationAdapter.ts" web/packages/agenta-entity-ui/src/selection/adapters web/packages web/oss | head -n 1 || true)
echo "workflowRevisionRelationAdapter.ts: ${f:-NOT_FOUND}"
# If found, inspect parts where hierarchy levels + getId are defined
if [[ -n "${f:-}" ]]; then
rg -n "createWorkflowRevisionAdapter|createEvaluatorRevisionLevel|getId\\s*\\(" "$f" --context 2
# Show candidate blocks around createWorkflowRevisionAdapter definition
rg -n "function createWorkflowRevisionAdapter|export function createWorkflowRevisionAdapter" "$f"
# Show nearby section around revision level config
rg -n "revision.*getId|getId:.*revision|getId\\s*\\(.*revision" "$f" --context 3
# Print a window around the most relevant match
m=$(rg -n "getId\\s*[:=].*revision|getId\\s*\\(.*revision|createEvaluatorRevisionLevel|getEvaluator" "$f" | head -n 1 | cut -d: -f1 || true)
if [[ -n "${m:-}" ]]; then
start=$((m-60)); end=$((m+120))
sed -n "${start},${end}p" "$f"
fi
fi
# Also inspect evaluator revision level factory which likely defines leaf IDs
ef=$(rg -l "createEvaluatorRevisionLevel" web/packages/agenta-entity-ui/src/selection/adapters web/packages/agenta-entity-ui/src/selection | head -n 20 || true)
echo "createEvaluatorRevisionLevel matches: ${ef}"
for g in $(rg -l "createEvaluatorRevisionLevel" web/packages/agenta-entity-ui/src/selection/adapters web/packages/agenta-entity-ui/src/selection | head -n 20); do
echo "---- $g"
rg -n "createEvaluatorRevisionLevel" "$g" --context 3
done
# Finally, inspect how WorkflowRevisionSelectionResult.id is constructed
rr=$(rg -l "WorkflowRevisionSelectionResult" web/packages/agenta-entity-ui/src/selection/adapters | head -n 20 || true)
for h in $rr; do
echo "---- $h"
rg -n "id\\s*:" "$h" --context 2 | head -n 20
done
🏁 Script executed:
#!/bin/bash
set -euo pipefail
f="web/packages/agenta-entity-ui/src/selection/adapters/revisionLevelFactory.ts"
echo "---- $f (relevant sections) ----"
# Find createRevisionLevel and show its level config, especially getId/getLabel
rg -n "function createRevisionLevel|export function createRevisionLevel|createRevisionLevel\\(" "$f"
# show around createRevisionLevel definition
line=$(rg -n "function createRevisionLevel|export function createRevisionLevel" "$f" | head -n 1 | cut -d: -f1)
start=$((line-60)); end=$((line+220))
sed -n "${start},${end}p" "$f"
# show around createEvaluatorRevisionLevel
rg -n "createEvaluatorRevisionLevel" "$f"
line2=$(rg -n "export function createEvaluatorRevisionLevel" "$f" | head -n 1 | cut -d: -f1)
start2=$((line2-40)); end2=$((line2+140))
sed -n "${start2},${end2}p" "$f"
# Also inspect where workflowRevisionLevel is constructed if it exists in this file
rg -n "workflowRevisionLevel|revisionLevel" "$f"
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find where workflowRevisionsListQueryStateAtomFamily is defined (entities/workflow)
rg -n "workflowRevisionsListQueryStateAtomFamily" web/packages web/oss -S --hidden
# 2) Inspect the definition and, crucially, what the returned "revision" entity shape is
f=$(rg -l "workflowRevisionsListQueryStateAtomFamily" web/packages web/oss -S --hidden | head -n 1 || true)
echo "First match: $f"
if [[ -n "${f:-}" ]]; then
sed -n '1,260p' "$f" | head -n 200
# narrow to the atom/query logic if possible
rg -n "atomFamily|family|return|workflowRevisionsListQueryStateAtomFamily" "$f" --context 2 | head -n 80
fi
# 3) Confirm what workflowToRevisionRelation uses for listing (skipVariantLevel path)
a=$(fd -a "workflowRevisionRelationAdapter.ts" web/packages/agenta-entity-ui/src/selection/adapters web/packages web/oss | head -n 5 || true)
echo "workflowRevisionRelationAdapter candidates:"
echo "$a"
for g in $a; do
echo "---- $g: workflowToRevisionRelation definition ----"
rg -n "workflowToRevisionRelation" "$g" --context 2 | head -n 40
# show nearby block around workflowToRevisionRelation
line=$(rg -n "workflowToRevisionRelation" "$g" | head -n 1 | cut -d: -f1)
start=$((line-80)); end=$((line+120))
sed -n "${start},${end}p" "$g"
done
# 4) Inspect evaluator refs mapping in evaluators.ts for artifactId vs revisionId meaning
ev="web/oss/src/components/EvalRunDetails/atoms/table/evaluators.ts"
if [[ -f "$ev" ]]; then
rg -n "artifactId|revisionId" "$ev" --context 3
fi
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect workflowToRevisionRelation mapping
rel="web/packages/agenta-entities/src/workflow/relations.ts"
echo "---- $rel (workflowToRevisionRelation) ----"
rg -n "workflowToRevisionRelation" "$rel"
line=$(rg -n "workflowToRevisionRelation" "$rel" | head -n 1 | cut -d: -f1)
start=$((line-120)); end=$((line+220))
sed -n "${start},${end}p" "$rel" | head -n 220
# 2) Inspect workflowRevisionsListQueryStateAtomFamily implementation (store.ts)
store="web/packages/agenta-entities/src/workflow/state/store.ts"
echo "---- $store (workflowRevisionsListQueryStateAtomFamily) ----"
rg -n "workflowRevisionsListQueryStateAtomFamily" "$store"
line2=$(rg -n "export const workflowRevisionsListQueryStateAtomFamily" "$store" | head -n 1 | cut -d: -f1)
start2=$((line2-60)); end2=$((line2+240))
sed -n "${start2},${end2}p" "$store" | head -n 260
# 3) Inspect evaluator refs meaning (artifactId vs revisionId) in evaluator extraction core (if present)
core="web/packages/agenta-entities/src/workflow/core/index.ts"
if [[ -f "$core" ]]; then
echo "---- $core (extractEvaluatorRef / type defs) ----"
rg -n "artifactId|revisionId" "$core" | head -n 50
else
echo "Core index not found at $core; skipping."
fi
# Find extractEvaluatorRef definition if separate file
ex=$(fd -a "extractEvaluatorRef" web/packages/agenta-entities/src/workflow | head -n 5 || true)
for f in $ex; do echo "---- $f ----"; rg -n "artifactId|revisionId" "$f" --context 2; done
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Helpers
safe_start_end () { # $1=line $2=back $3=forward
local line="$1" back="$2" fwd="$3"
local start=$((line-back))
local end=$((line+fwd))
if (( start < 1 )); then start=1; fi
echo "$start $end"
}
# 1) workflowToRevisionRelation actual mapping
rel="web/packages/agenta-entities/src/workflow/relations.ts"
line=$(rg -n "export const workflowToRevisionRelation" "$rel" | head -n 1 | cut -d: -f1)
read start end < <(safe_start_end "$line" 80 220)
echo "---- $rel (lines $start-$end) ----"
sed -n "${start},${end}p" "$rel"
# 2) workflowRevisionsListQueryStateAtomFamily: what does "revision entity.id" represent?
store="web/packages/agenta-entities/src/workflow/state/store.ts"
line2=$(rg -n "export const workflowRevisionsListQueryStateAtomFamily" "$store" | head -n 1 | cut -d: -f1)
read start2 end2 < <(safe_start_end "$line2" 40 260)
echo "---- $store (lines $start2-$end2) ----"
sed -n "${start2},${end2}p" "$store"
# 3) Find where revision list items are shaped (mapping to entity fields)
# Search for extractRevisionFields usage in workflow state store
rg -n "extractRevisionFields|workflowRevision|Revision" "$store" | head -n 80
# 4) In case the store only calls query, inspect the query function that returns revisions
# Find "queryWorkflowRevisions" implementation
qf=$(fd -a "queryWorkflowRevisions" web/packages/agenta-entities/src/workflow | head -n 5 || true)
echo "queryWorkflowRevisions candidates:"
echo "$qf"
if [[ -n "${qf:-}" ]]; then
for f in $qf; do
echo "---- $f (context) ----"
rg -n "queryWorkflowRevisions|queryWorkflowRevisionsByWorkflow|queryWorkflowRevisionsByWorkflows" "$f"
# print around first match
l=$(rg -n "queryWorkflowRevisionsByWorkflow|queryWorkflowRevisions" "$f" | head -n 1 | cut -d: -f1)
read s e < <(safe_start_end "$l" 60 220)
sed -n "${s},${e}p" "$f"
done
fi
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rel="web/packages/agenta-entities/src/workflow/relations.ts"
line=$(rg -n "export const workflowToRevisionRelation" "$rel" | head -n 1 | cut -d: -f1)
start=$((line-80)); if (( start<1 )); then start=1; fi
end=$((line+220))
echo "---- $rel (lines $start-$end) ----"
sed -n "${start},${end}p" "$rel"
store="web/packages/agenta-entities/src/workflow/state/store.ts"
line2=$(rg -n "export const workflowRevisionsListQueryStateAtomFamily" "$store" | head -n 1 | cut -d: -f1)
start2=$((line2-40)); if (( start2<1 )); then start2=1; fi
end2=$((line2+240))
echo "---- $store (lines $start2-$end2) ----"
sed -n "${start2},${end2}p" "$store"
# Inspect how revision entities are shaped from API results (id/version/etc)
# Look for extractRevisionFields usage in workflow store + helpers
rg -n "extractRevisionFields\\(|getField<.*>\\(entity" web/packages/agenta-entities/src/workflow -S
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find where workflowRevisionsByWorkflowQueryAtomFamily is defined and inspect how refs.id is built
store="web/packages/agenta-entities/src/workflow/state/store.ts"
qatom="$(rg -n "workflowRevisionsByWorkflowQueryAtomFamily" "$store" -S || true)"
echo "Matches for workflowRevisionsByWorkflowQueryAtomFamily:"
echo "$qatom" | head -n 20
f="$(rg -l "workflowRevisionsByWorkflowQueryAtomFamily" web/packages/agenta-entities/src/workflow/state | head -n 1 || true)"
echo "First defining file: $f"
if [[ -n "${f:-}" ]]; then
# Print around the atomFamily definition
line=$(rg -n "export const workflowRevisionsByWorkflowQueryAtomFamily" "$f" | head -n 1 | cut -d: -f1 || true)
if [[ -n "${line:-}" ]]; then
start=$((line-60)); end=$((line+220))
sed -n "${start},${end}p" "$f"
fi
# Also inspect the underlying API function mapping if called directly
api="$(rg -n "queryWorkflowRevisionsByWorkflow|queryWorkflowRevisions\\(" "$f" -S || true)"
echo "Underlying query references in $f:"
echo "$api" | head -n 50
fi
# 2) Inspect WorkflowReference / QueryResult types (to see if refs.id is revision-id)
types_dir="web/packages/agenta-entities/src/workflow"
typefile="$(rg -l "export type WorkflowReference" "$types_dir" web/packages/agenta-entities/src/workflow -S | head -n 20 || true)"
echo "WorkflowReference files:"
echo "$typefile"
for t in $typefile; do
echo "---- $t ----"
rg -n "export type WorkflowReference|type WorkflowReference" "$t" --context 3
done
# 3) Inspect extractEvaluatorRef types/meaning of artifactId vs revisionId (likely in core)
core_dir="web/packages/agenta-entities/src/workflow/core"
core_file="$(fd -a "core" "$core_dir" 2>/dev/null || true)"
# just search for artifactId/revisionId fields
rg -n "artifactId|revisionId" "$core_dir" -S | head -n 80
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect EvaluatorDefinition type (does it expose revisionId separately?)
evdef_file="$(rg -l "export type EvaluatorDefinition" web/packages/agenta-entities/src/workflow web/oss/src -S | head -n 1 || true)"
echo "EvaluatorDefinition file: ${evdef_file:-NOT_FOUND}"
if [[ -n "${evdef_file:-}" ]]; then
rg -n "export type EvaluatorDefinition|type EvaluatorDefinition" "$evdef_file" --context 6
# show a larger window around it
line=$(rg -n "export type EvaluatorDefinition|type EvaluatorDefinition" "$evdef_file" | head -n 1 | cut -d: -f1)
start=$((line-40)); end=$((line+120))
sed -n "${start},${end}p" "$evdef_file"
fi
# 2) Inspect evaluatorDefinitionByRevisionQueryAtomFamily usage/definition
atom_file="$(rg -l "evaluatorDefinitionByRevisionQueryAtomFamily" web/oss/src -S | head -n 1 || true)"
echo "evaluatorDefinitionByRevisionQueryAtomFamily file: ${atom_file:-NOT_FOUND}"
if [[ -n "${atom_file:-}" ]]; then
line=$(rg -n "evaluatorDefinitionByRevisionQueryAtomFamily" "$atom_file" | head -n 1 | cut -d: -f1)
start=$((line-80)); end=$((line+200))
sed -n "${start},${end}p" "$atom_file"
fi
# 3) Inspect EvaluatorDefinition construction: toEvaluatorDefinitionFromRaw / evaluatorResolution
core_dir="web/packages/agenta-entities/src/workflow/core"
for f in $(rg -l "toEvaluatorDefinitionFromRaw|extractEvaluatorRef" "$core_dir" -S | head -n 20); do
echo "---- $f ----"
rg -n "toEvaluatorDefinitionFromRaw|extractEvaluatorRef|artifactId|revisionId" "$f" --context 3
done
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find EvaluatorDefinition type
rg -n "EvaluatorDefinition" web/packages/agenta-entities/src/workflow -S --hidden | head -n 50
rg -n "export type EvaluatorDefinition|type EvaluatorDefinition|interface EvaluatorDefinition" web/packages/agenta-entities/src/workflow -S --hidden | head -n 50
# 2) Inspect evaluatorDefinitionByRevisionQueryAtomFamily implementation (fix snippet around definition)
file="web/oss/src/components/EvalRunDetails/atoms/table/evaluators.ts"
echo "---- $file (definition + key id handling) ----"
rg -n "evaluatorDefinitionByRevisionQueryAtomFamily" "$file" -S
line=$(rg -n "evaluatorDefinitionByRevisionQueryAtomFamily" "$file" | head -n 1 | cut -d: -f1)
start=$((line>120?line-120:1))
end=$((line+220))
sed -n "${start},${end}p" "$file"
# 3) Inspect queryWorkflowRevisionsByWorkflow to see what r.id represents
core_dir="web/packages/agenta-entities/src/workflow"
rg -n "queryWorkflowRevisionsByWorkflow" "$core_dir" -S --hidden
qfile=$(rg -l "queryWorkflowRevisionsByWorkflow" "$core_dir" --hidden | head -n 1 || true)
echo "First query file: ${qfile:-NOT_FOUND}"
if [[ -n "${qfile:-}" ]]; then
qline=$(rg -n "queryWorkflowRevisionsByWorkflow" "$qfile" | head -n 1 | cut -d: -f1)
start=$((qline>140?qline-140:1))
end=$((qline+260))
sed -n "${start},${end}p" "$qfile"
fi
# 4) Inspect the API response shape for workflow revisions listing (look for id/artifact_id fields)
# This helps determine whether response.workflow_revisions[].id is revision id or artifact id.
rg -n "workflow_revisions" "$core_dir" -S --hidden | head -n 50
rg -n "artifact.*id|revision.*id|artifact_id|revision_id" "$core_dir" -S --hidden | head -n 50
🏁 Script executed:
#!/bin/bash
set -euo pipefail
f="web/packages/agenta-entities/src/workflow/core/evaluatorResolution.ts"
echo "---- $f (EvaluatorDefinition + extractEvaluatorRef) ----"
# show around EvaluatorDefinition interface and extractEvaluatorRef
rg -n "export interface EvaluatorDefinition|export const extractEvaluatorRef|artifactId\\?|revisionId\\?" "$f"
line=$(rg -n "export interface EvaluatorDefinition" "$f" | head -n 1 | cut -d: -f1)
start=$((line-40)); end=$((line+120))
sed -n "${start},${end}p" "$f"
line2=$(rg -n "export const extractEvaluatorRef" "$f" | head -n 1 | cut -d: -f1)
start2=$((line2-40)); end2=$((line2+140))
sed -n "${start2},${end2}p" "$f"
line3=$(rg -n "toEvaluatorDefinitionFromWorkflow|toEvaluatorDefinitionFromRaw" "$f" | head -n 1 | cut -d: -f1)
start3=$((line3-40)); end3=$((line3+200))
sed -n "${start3},${end3}p" "$f"
Fix disabledChildIds ID mismatch in EditEvaluationDrawer picker
disabledRevisionIds (web/oss/src/components/EditEvaluationDrawer/index.tsx, lines 80-87) mixes ID types. For connected evaluators it uses EvaluatorDefinition.id, which is populated as ref.artifactId ?? ref.revisionId ?? refId, so the artifact id wins whenever one exists. The popover-cascader picker, however, disables a child only when disabledChildIds.has(childLevelConfig.getId(child)) is true, and at the workflow→revision level getId returns the revision entity's id. When the two ids differ, already-connected evaluators are not disabled in the picker.
Derive disabledRevisionIds for connected evaluators from their revision IDs, or adjust the evaluationEvaluatorsByRunQueryAtomFamily/EvaluatorDefinition shaping to preserve revisionId alongside artifactId.
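A sketch of the first option, assuming each connected evaluator can expose its revision id next to the artifact id. ConnectedEvaluator and buildDisabledRevisionIds are hypothetical names, not the drawer's actual code; the only requirement taken from the finding is that the set passed as disabledChildIds be keyed by the same id the revision level's getId returns.

```typescript
// Hypothetical shape: EvaluatorDefinition.id may hold the artifact id
// (artifactId ?? revisionId ?? refId), so the revision id is carried separately.
interface ConnectedEvaluator {
    id: string
    artifactId?: string
    revisionId?: string
}

// Build the set the popover-cascader checks with
// disabledChildIds.has(childLevelConfig.getId(child)). At the revision level
// getId returns the revision entity's id, so only revision ids belong here.
function buildDisabledRevisionIds(
    connected: ConnectedEvaluator[],
    staged: string[] = [],
): Set<string> {
    const ids = new Set<string>()
    for (const evaluator of connected) {
        // Do not fall back to the artifact id: it would never match a
        // revision-level child, which is exactly the reported bug.
        if (evaluator.revisionId) ids.add(evaluator.revisionId)
    }
    // Revisions already staged in the drawer are revision ids as well.
    for (const id of staged) ids.add(id)
    return ids
}
```

Skipping evaluators with no known revision id is a design choice of this sketch; the alternative fix in the finding (preserving revisionId during shaping) would make that branch unnecessary.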
const BRIDGE_ATTEMPTS = 15
const BRIDGE_INTERVAL_MS = 2000

/**
 * Bridge the async reprocess after adding an evaluator.
 *
 * `dispatch_run_slice` flips the run to RUNNING *synchronously* before the 202 returns
 * (api/.../service.py), then the worker fills the new evaluator's cells and finalizes back
 * to a terminal status (~seconds). So by the time this runs the run is already RUNNING;
 * we poll every surface until the run reads TERMINAL again — which means the worker is done
 * and every new cell is persisted — then run a FINAL invalidation.
 *
 * The final terminal-gated invalidation is the key to a reliable scenario table. The
 * per-scenario `scenario-steps` queries only poll while non-terminal (scenarioSteps.ts
 * `refetchInterval`), so when finalize lands between their 5s polls, only the rows that
 * happened to refetch afterwards show new data — a half-updated table. Invalidating once
 * the run is terminal converges every mounted row and marks off-screen/virtualized rows
 * stale for their next mount. Fire-and-forget: the drawer closes immediately.
 */
const SETTLE_MS = 1500

const bridgeRunReprocessing = async (queryClient: any, projectId: string, runId: string) => {
    for (let attempt = 0; attempt < BRIDGE_ATTEMPTS; attempt++) {
        await delay(BRIDGE_INTERVAL_MS)
        const status = await readRunStatusAuthoritative(projectId, runId)
        if (status && isTerminalStatus(status)) {
            // Worker finished. Cell results can persist a beat after the run status flips
            // terminal, so invalidate now AND once more after a short settle. Each call
            // refetches active scenario rows/steps/metrics and marks off-screen ones stale —
            // reload-equivalent, so nothing is left frozen by the per-scenario poller that
            // stops the instant the run goes terminal.
            await invalidateRunSurfaces(queryClient, projectId, runId)
            await delay(SETTLE_MS)
            await invalidateRunSurfaces(queryClient, projectId, runId)
            return
        }
        // Still running — refresh active surfaces so progress (RUNNING status, cells) streams in.
        await refetchRunSurfaces(queryClient, projectId, runId)
    }
    // Budget exhausted (very long reprocess, or status unreadable). Final invalidation so
    // nothing is left half-updated.
    await invalidateRunSurfaces(queryClient, projectId, runId)
The reprocess bridge gives up too early for large reruns.
The bridge stops after about 30 s (15 attempts × 2000 ms, plus request time). For longer evaluator backfills it exits while the run is still non-terminal, so the terminal-gated invalidation this code depends on never happens: the fallback invalidation at budget exhaustion fires while the run is still RUNNING, before all new cells are persisted. Mounted rows will keep updating via polling, but virtualized/off-screen rows can stay stale until some unrelated refresh.
Please keep polling until the run is terminal, with backoff, or reschedule another check when the budget expires and the run is still non-terminal.
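A minimal sketch of the backoff option, assuming the bridge's calls can be injected as callbacks. BridgeDeps, BackoffOptions and pollUntilTerminal are hypothetical; in the PR the callbacks would wrap readRunStatusAuthoritative, isTerminalStatus, refetchRunSurfaces and invalidateRunSurfaces.

```typescript
// Hypothetical dependency bundle so the sketch stays self-contained.
interface BridgeDeps {
    readStatus: () => Promise<string | null>
    isTerminal: (status: string) => boolean
    onProgress: () => Promise<void> // refresh mounted surfaces while RUNNING
    onTerminal: () => Promise<void> // terminal-gated invalidation (plus settle)
    sleep?: (ms: number) => Promise<void>
}

interface BackoffOptions {
    initialMs: number // first wait, e.g. the current 2000 ms interval
    maxMs: number // cap for a single wait
    maxTotalMs: number // overall ceiling (sleep time only) so a stuck run cannot poll forever
}

/**
 * Poll with capped exponential backoff until the run reads terminal.
 * Resolves true after onTerminal ran, false if maxTotalMs was used up first.
 */
async function pollUntilTerminal(
    deps: BridgeDeps,
    {initialMs, maxMs, maxTotalMs}: BackoffOptions,
): Promise<boolean> {
    const sleep =
        deps.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms)))
    let waitMs = initialMs
    let elapsedMs = 0
    while (elapsedMs < maxTotalMs) {
        await sleep(waitMs)
        elapsedMs += waitMs
        const status = await deps.readStatus()
        if (status && deps.isTerminal(status)) {
            await deps.onTerminal()
            return true
        }
        await deps.onProgress()
        // Double the wait each round, capped, so long backfills stay cheap to track.
        waitMs = Math.min(waitMs * 2, maxMs)
    }
    return false
}
```

When it resolves false, the caller can schedule another pollUntilTerminal instead of running an invalidation while the run is still RUNNING; the existing double invalidation with SETTLE_MS would move into onTerminal.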
export interface SaveEvaluationEditArgs {
    runId: string
    /** Current name (seeded from the run; sent even if unchanged — edit replaces it). */
    name: string
    /** Current description (sent even if unchanged). */
    description: string
    /** Evaluator REVISION ids staged for addition in the drawer. */
    addedEvaluatorRevisionIds: string[]
}

export const saveEvaluationEditAtom = atom(
    null,
    async (
        get,
        _set,
        {runId, name, description, addedEvaluatorRevisionIds}: SaveEvaluationEditArgs,
    ) => {
        if (!runId) return

        const projectId = get(projectIdAtom)
        if (!projectId) throw new Error("[edit-evaluation] missing projectId")

        const run = get(evaluationRunQueryAtomFamily(runId))?.data?.rawRun
        if (!run) throw new Error("[edit-evaluation] run not loaded")

        const steps = stepsOf(run)
        const annotationSteps = steps.filter((step) => step?.type === "annotation")
        const existingEvaluatorIds = revisionIds(steps, "annotation", "evaluator_revision")
        const fresh = (addedEvaluatorRevisionIds ?? []).filter(
            (id) => !existingEvaluatorIds.includes(id),
        )

        // New evaluators inherit the run's evaluator origin (kind anchor).
        const newOrigin = (annotationSteps[0]?.origin as EvaluatorOrigin | undefined) ?? "auto"

        // edit REPLACES run data, so send the complete target set (existing + new).
        const evaluatorSteps: Record<string, EvaluatorOrigin> = {}
        for (const step of annotationSteps) {
            const id = step.references?.evaluator_revision?.id
            if (id) evaluatorSteps[id] = (step.origin as EvaluatorOrigin) ?? newOrigin
        }
        for (const id of fresh) evaluatorSteps[id] = newOrigin
New evaluator origins are lost in the mutation contract.
SaveEvaluationEditArgs carries only revision ids, so lines 226-235 have to infer an origin for every added evaluator from the first existing annotation step, falling back to "auto". That corrupts mixed-origin edits: adding a human evaluator to an auto run persists it as auto, and adding the first evaluator to an empty run always persists it as auto.
Pass {id, origin} from the drawer and build evaluatorSteps from those explicit origins instead of applying one inferred fallback to every new evaluator.
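A sketch of the suggested contract, assuming the drawer knows each staged evaluator's origin. The EvaluatorOrigin union, AddedEvaluator, AnnotationStep and buildEvaluatorSteps below are hypothetical stand-ins; the real union may have more members, and the "auto" default for legacy steps that never recorded an origin is an assumption.

```typescript
// Assumption: the real EvaluatorOrigin union may contain more members.
type EvaluatorOrigin = "auto" | "human"

// Hypothetical payload the drawer would send instead of bare revision ids.
interface AddedEvaluator {
    id: string // evaluator revision id
    origin: EvaluatorOrigin // chosen in the drawer, never inferred
}

// Minimal slice of an annotation step used here.
interface AnnotationStep {
    origin?: EvaluatorOrigin
    references?: {evaluator_revision?: {id?: string}}
}

// Edit replaces run data, so return the full target set: existing steps keep
// their persisted origin, new evaluators use the origin they were added with.
function buildEvaluatorSteps(
    annotationSteps: AnnotationStep[],
    added: AddedEvaluator[],
): Record<string, EvaluatorOrigin> {
    const evaluatorSteps: Record<string, EvaluatorOrigin> = {}
    for (const step of annotationSteps) {
        const id = step.references?.evaluator_revision?.id
        // Assumed default for legacy steps without a recorded origin.
        if (id) evaluatorSteps[id] = step.origin ?? "auto"
    }
    for (const {id, origin} of added) {
        // Already-connected evaluators are not re-tagged by a duplicate add.
        if (!Object.prototype.hasOwnProperty.call(evaluatorSteps, id)) {
            evaluatorSteps[id] = origin
        }
    }
    return evaluatorSteps
}
```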
export const queryRunScenarioIds = async ({
    projectId,
    runId,
}: {
    projectId: string
    runId: string
}): Promise<string[]> => {
    const res = await client().queryScenarios(
        {scenario: {run_id: runId}, windowing: {limit: 1000}},
        {queryParams: {project_id: projectId}},
    )
    return (res.scenarios ?? [])
        .map((scenario) => scenario.id)
        .filter((id): id is string => Boolean(id))
queryRunScenarioIds drops every scenario after the first page.
This helper issues a single queryScenarios request with limit: 1000 and never follows pagination. When a run has more than 1000 scenarios, processSlice reruns only the first page, so newly added evaluators never backfill the remaining rows.
Please page until exhaustion, or switch to an endpoint that returns all ids.
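A minimal paging loop, assuming the scenarios query can return a cursor for the next window. The real queryScenarios windowing response is not shown in the diff, so ScenarioPage, its next field and collectAllScenarioIds are hypothetical; queryRunScenarioIds would wrap client().queryScenarios in fetchPage and map the actual cursor field onto next.

```typescript
// Hypothetical page shape standing in for the real windowing response.
interface ScenarioPage {
    ids: Array<string | null | undefined>
    next?: string | null // cursor for the following window, absent on the last page
}

/**
 * Follow pages until the server reports no further window.
 * Ids are de-duplicated in first-seen order; falsy ids are dropped, matching
 * the filter in queryRunScenarioIds.
 */
async function collectAllScenarioIds(
    fetchPage: (cursor: string | undefined) => Promise<ScenarioPage>,
    maxPages = 10000, // guard against a server that never stops returning a cursor
): Promise<string[]> {
    const ids: string[] = []
    const seen = new Set<string>()
    let cursor: string | undefined
    for (let page = 0; page < maxPages; page++) {
        const {ids: pageIds, next} = await fetchPage(cursor)
        for (const id of pageIds) {
            if (id && !seen.has(id)) {
                seen.add(id)
                ids.push(id)
            }
        }
        // Stop on a missing cursor or an empty window.
        if (!next || pageIds.length === 0) return ids
        cursor = next
    }
    throw new Error("[collectAllScenarioIds] maxPages reached before the last page")
}
```

Throwing at the page guard, rather than returning a partial list, keeps processSlice from silently backfilling only part of the run.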
Resolved conflict: accept deletion of GeneralSection.tsx, which was removed in the release branch as part of the ConfigurationView refactor (replaced by RunSummaryCard, SectionNavCard, V2SectionShell).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Railway Preview Environment
Updated at 2026-06-12T18:53:16.547Z