test: cover dataset run item observation links by marliessophie · Pull Request #1714 · langfuse/langfuse-python

marliessophie · 2026-06-16T15:12:13Z

Summary

Add a targeted regression test verifying that dataset experiments create dataset run items with the root span's observation ID.
Confirms the current Python SDK path already passes observation_id=span.id.

Linear: LFE-10352

Verification

Not run per request.

Greptile Summary

This PR adds a targeted unit test verifying that DatasetClient.run_experiment correctly links each dataset run item to the root span's observation ID, guarding against a specific regression class where the wrong span ID (or no span ID) could be passed to dataset_run_items.create.

The test monkeypatches dataset_run_items.create to capture kwargs, runs a single-item experiment, flushes the in-memory exporter, and then asserts that both trace_id and observation_id match the root "experiment-item-run" span's context, confirming that the existing observation_id=span.id path already passes.
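The capture-and-assert pattern described here can be sketched without the SDK. The following is a minimal, standard-library-only illustration, not the test from this PR: `FakeSpan`, `FakeClient`, `run_item`, and `check_run_item_links_root_span` are hypothetical stand-ins, and the real test uses `DatasetClient`, the `langfuse_memory_client` fixture, and `get_span` instead.

```python
import secrets
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a finished span with integer IDs."""
    name: str
    trace_id: int = field(default_factory=lambda: secrets.randbits(128))
    span_id: int = field(default_factory=lambda: secrets.randbits(64))


class FakeClient:
    """Hypothetical client: opens a root span per item, then links a run item to it."""

    def __init__(self):
        self.finished_spans = []
        self.api = SimpleNamespace(
            dataset_run_items=SimpleNamespace(create=self._real_create)
        )

    def _real_create(self, **kwargs):
        # Stands in for the network call that a unit test should never reach.
        raise RuntimeError("create() must be patched in unit tests")

    def run_item(self, item):
        span = FakeSpan(name="experiment-item-run")
        self.api.dataset_run_items.create(
            dataset_item_id=item["id"],
            trace_id=format(span.trace_id, "032x"),
            observation_id=format(span.span_id, "016x"),
        )
        self.finished_spans.append(span)


def check_run_item_links_root_span():
    client = FakeClient()
    created_run_item = {}

    def capture_create(**kwargs):
        # Record the keyword arguments instead of calling the API.
        created_run_item.update(kwargs)

    # Equivalent in spirit to monkeypatch.setattr(..., "create", capture_create).
    client.api.dataset_run_items.create = capture_create
    client.run_item({"id": "item-1"})

    root_span = next(
        s for s in client.finished_spans if s.name == "experiment-item-run"
    )
    assert created_run_item, "create was never called"
    assert created_run_item["trace_id"] == format(root_span.trace_id, "032x")
    assert created_run_item["observation_id"] == format(root_span.span_id, "016x")
    return created_run_item
```

Capturing kwargs in a plain dict keeps the assertion focused on what was sent to the API layer, independent of how the response object is built.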

Confidence Score: 5/5

Test-only addition with no production code changes; the existing SDK path it exercises already behaves correctly.

The change is a single new test file with no modifications to production code. The test correctly exercises the observation-ID linking path through a real DatasetClient, a monkeypatched API layer, and the in-memory OTel exporter already used by the unit test suite.

No files require special attention.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Test
    participant DatasetClient
    participant Langfuse as Langfuse Client
    participant Span as OTel Span (experiment-item-run)
    participant Mock as Mock dataset_run_items.create

    Test->>DatasetClient: "run_experiment(name=experiment, task=...)"
    DatasetClient->>Langfuse: "run_experiment(data=[item], ...)"
    Langfuse->>Span: start_as_current_observation(experiment-item-run)
    Span-->>Langfuse: "span (span.id = root observation id)"
    Langfuse->>Mock: "create(trace_id=trace_id, observation_id=span.id, ...)"
    Mock-->>Langfuse: DatasetRunItem
    Langfuse-->>Span: end span
    Test->>Langfuse: flush()
    Test->>Test: get_span(experiment-item-run)
    Test->>Test: "assert created_run_item trace_id == root_span.context.trace_id"
    Test->>Test: "assert created_run_item observation_id == root_span.context.span_id"

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
tests/unit/test_dataset_run_item_observation_id.py:63-64
If `run_experiment` ever fails to call `dataset_run_items.create` (e.g. the item is skipped for any reason), `created_run_item` stays as an empty `{}` and the assertion on `created_run_item["trace_id"]` raises a `KeyError` instead of a meaningful `AssertionError`. Adding an explicit guard makes the failure message immediately actionable.

```suggestion
    root_span = get_span("experiment-item-run")
    assert created_run_item, "dataset_run_items.create was never called"
    assert created_run_item["trace_id"] == format(root_span.context.trace_id, "032x")
```
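To illustrate why the guard matters, here is a small standalone sketch that compares the failure raised with and without it when the captured dict is still empty. The helper names (`check_without_guard`, `check_with_guard`, `failure_kind`) are hypothetical and exist only for this demonstration.

```python
def check_without_guard(created_run_item, expected_trace_id):
    # Indexing an empty dict raises KeyError before the comparison runs.
    assert created_run_item["trace_id"] == expected_trace_id


def check_with_guard(created_run_item, expected_trace_id):
    # The explicit guard fails first, with a message naming the real problem.
    assert created_run_item, "dataset_run_items.create was never called"
    assert created_run_item["trace_id"] == expected_trace_id


def failure_kind(check):
    """Run a check against an empty capture dict and report how it failed."""
    try:
        check({}, "0" * 32)
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return None
```

Calling `failure_kind(check_without_guard)` reports a `KeyError`, while `failure_kind(check_with_guard)` reports an `AssertionError` carrying the guard's message.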

Reviews (1): Last reviewed commit: "test: cover dataset run item observation..."

github-actions · 2026-06-16T15:12:44Z

@claude review

claude

Test-only addition; uses existing fixtures and pins the observation_id linkage behavior.

Extended reasoning...

Overview

This PR adds a single new file tests/unit/test_dataset_run_item_observation_id.py containing one regression test that verifies DatasetClient.run_experiment creates a dataset run item linked to the root span's trace and observation IDs. No production code is modified.

Security risks

None. The change is confined to a unit test file under tests/unit/ and uses an in-memory client with a monkeypatched API method.

Level of scrutiny

Low. Test-only additions for regression coverage carry minimal risk — the worst case is a flaky/failing test, which would surface in CI rather than affect users. The test depends on existing fixtures (langfuse_memory_client, get_span) defined in tests/conftest.py and standard public symbols (DatasetClient, Dataset, DatasetItem, DatasetRunItem, DatasetStatus).

Other factors

The bug hunting system reported no issues. The test pins a specific contract (observation_id is the root span's hex span id) that aligns with the linked Linear ticket LFE-10352, so it serves as useful regression coverage. The PR description notes that the test was not run locally, but CI will validate it.
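As a rough illustration of the "hex span id" contract: OpenTelemetry span contexts hold the trace ID as a 128-bit integer and the span ID as a 64-bit integer, so comparing them with string IDs requires zero-padded hex formatting. The `"032x"` width matches the assertion suggested earlier in this thread. The `"016x"` width for span IDs follows the OpenTelemetry convention and is an assumption about what the test compares against, since the thread does not show that line. The function names are hypothetical.

```python
def to_hex_trace_id(trace_id: int) -> str:
    """Format a 128-bit trace ID as 32 lowercase hex characters."""
    return format(trace_id, "032x")


def to_hex_span_id(span_id: int) -> str:
    """Format a 64-bit span ID as 16 lowercase hex characters (assumed width)."""
    return format(span_id, "016x")


# Small IDs are left-padded with zeros rather than shortened.
print(to_hex_span_id(0xABC))  # 0000000000000abc
print(to_hex_trace_id(1))     # 00000000000000000000000000000001
```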

test: cover dataset run item observation links

3d68289

claude Bot reviewed Jun 16, 2026

marliessophie wants to merge 1 commit into main from codex/lfe-10352-python-dataset-run-item-observation