test: cover dataset run item observation links#1714
Conversation
|
@claude review |
There was a problem hiding this comment.
Test-only addition; uses existing fixtures and pins the observation_id linkage behavior.
Extended reasoning...
Overview
This PR adds a single new file tests/unit/test_dataset_run_item_observation_id.py containing one regression test that verifies DatasetClient.run_experiment creates a dataset run item linked to the root span's trace and observation IDs. No production code is modified.
Security risks
None. The change is confined to a unit test file under tests/unit/ and uses an in-memory client with a monkeypatched API method.
Level of scrutiny
Low. Test-only additions for regression coverage carry minimal risk — the worst case is a flaky/failing test, which would surface in CI rather than affect users. The test depends on existing fixtures (langfuse_memory_client, get_span) defined in tests/conftest.py and standard public symbols (DatasetClient, Dataset, DatasetItem, DatasetRunItem, DatasetStatus).
Other factors
The bug hunting system reported no issues. The test pins a specific contract (observation_id is the root span's hex span id) that aligns with the linked Linear ticket LFE-10352, so it serves as useful regression coverage. PR description notes the test was not executed locally, but CI will validate.
Summary
observation_id=span.id.Linear: LFE-10352
Verification
Greptile Summary
This PR adds a targeted unit test verifying that
DatasetClient.run_experimentcorrectly links each dataset run item to the root span's observation ID, guarding against a specific regression class where the wrong span ID (or no span ID) could be passed todataset_run_items.create.dataset_run_items.createto capture kwargs, runs a single-item experiment, flushes the in-memory exporter, and then asserts bothtrace_idandobservation_idmatch the root\"experiment-item-run\"span's context — confirming the existingobservation_id=span.idpath already passes.Confidence Score: 5/5
Test-only addition with no production code changes; the existing SDK path it exercises already behaves correctly.
The change is a single new test file with no modifications to production code. The test correctly exercises the observation-ID linking path through a real DatasetClient, a monkeypatched API layer, and the in-memory OTel exporter already used by the unit test suite.
No files require special attention.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%% sequenceDiagram participant Test participant DatasetClient participant Langfuse as Langfuse Client participant Span as OTel Span (experiment-item-run) participant Mock as Mock dataset_run_items.create Test->>DatasetClient: "run_experiment(name=experiment, task=...)" DatasetClient->>Langfuse: "run_experiment(data=[item], ...)" Langfuse->>Span: start_as_current_observation(experiment-item-run) Span-->>Langfuse: "span (span.id = root observation id)" Langfuse->>Mock: "create(trace_id=trace_id, observation_id=span.id, ...)" Mock-->>Langfuse: DatasetRunItem Langfuse-->>Span: end span Test->>Langfuse: flush() Test->>Test: get_span(experiment-item-run) Test->>Test: "assert created_run_item trace_id == root_span.context.trace_id" Test->>Test: "assert created_run_item observation_id == root_span.context.span_id"%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% sequenceDiagram participant Test participant DatasetClient participant Langfuse as Langfuse Client participant Span as OTel Span (experiment-item-run) participant Mock as Mock dataset_run_items.create Test->>DatasetClient: "run_experiment(name=experiment, task=...)" DatasetClient->>Langfuse: "run_experiment(data=[item], ...)" Langfuse->>Span: start_as_current_observation(experiment-item-run) Span-->>Langfuse: "span (span.id = root observation id)" Langfuse->>Mock: "create(trace_id=trace_id, observation_id=span.id, ...)" Mock-->>Langfuse: DatasetRunItem Langfuse-->>Span: end span Test->>Langfuse: flush() Test->>Test: get_span(experiment-item-run) Test->>Test: "assert created_run_item trace_id == root_span.context.trace_id" Test->>Test: "assert created_run_item observation_id == root_span.context.span_id"Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "test: cover dataset run item observation..." | Re-trigger Greptile