feat(tracing): add export-stage OTEL span masking and media detection #1646

Merged
hassiebp merged 18 commits into main from codex/export-stage-span-mask
Jun 16, 2026

Conversation

@hassiebp hassiebp commented May 7, 2026

Collaborator

Summary

Adds an export-stage transformation layer for spans before they are sent to the downstream OTLP exporter.

  • adds the public mask_otel_spans callback on Langfuse(...)
  • adds the batch-shaped public contract: MaskOtelSpansParams, MaskOtelSpansResult, OtelSpanIdentifier, OtelSpanData, and OtelSpanPatch
  • wires mask_otel_spans through the resource manager into the Langfuse span processor
  • runs export-stage media detection over exported span attributes before masking
  • prefilters serialized string attributes before JSON parsing so JSON-looking strings without supported media hints are left unchanged cheaply
  • applies sparse keyed span patches to post-media OTEL span attribute snapshots before export
  • keeps existing synchronous mask behavior unchanged
  • hardens media detection for Gemini inline_data / inlineData payloads
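A minimal sketch of what a `mask_otel_spans` callback might look like. The dataclasses below are local stand-ins that reuse the type names from this PR; their field names (`attributes`, `delete_keys`, `set_values`) and the `SENSITIVE_KEYS` tuple are assumptions for illustration and may differ from the actual definitions in `langfuse.types`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Local stand-ins for the public contract. Field names are assumed for this
# sketch and are not taken from langfuse.types.
@dataclass(frozen=True)
class OtelSpanIdentifier:
    trace_id: str
    span_id: str


@dataclass
class OtelSpanData:
    attributes: Dict[str, Any]


@dataclass
class OtelSpanPatch:
    delete_keys: List[str] = field(default_factory=list)
    set_values: Dict[str, Any] = field(default_factory=dict)


@dataclass
class MaskOtelSpansParams:
    spans: Dict[OtelSpanIdentifier, OtelSpanData]


@dataclass
class MaskOtelSpansResult:
    span_patches: Dict[OtelSpanIdentifier, OtelSpanPatch]


# Hypothetical attribute keys that the application wants removed before export.
SENSITIVE_KEYS = ("user.email", "http.request.header.authorization")


def mask_otel_spans(params: MaskOtelSpansParams) -> MaskOtelSpansResult:
    """Return sparse patches: only spans that need changes get an entry."""
    patches: Dict[OtelSpanIdentifier, OtelSpanPatch] = {}
    for identifier, span in params.spans.items():
        hits = [key for key in SENSITIVE_KEYS if key in span.attributes]
        if hits:
            patches[identifier] = OtelSpanPatch(
                delete_keys=hits,
                set_values={"masking.applied": True},
            )
    return MaskOtelSpansResult(span_patches=patches)
```

With the real SDK, a callback of this kind would be supplied through the `mask_otel_spans` parameter on `Langfuse(...)`; spans that the callback does not mention are left as they are.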

Why

Attribute masking and media detection currently happen when Langfuse SDK spans set attributes. That leaves third-party OTEL spans flowing through the Langfuse span processor without equivalent export-stage handling. This PR adds a batch-native export-path hook while preserving existing Langfuse SDK masking behavior.

Behavior

  • mask_otel_spans runs only after existing export filters accept a span.
  • Media detection runs before mask_otel_spans.
  • Direct base64 data URI string attributes are processed without JSON parsing.
  • JSON-looking string attributes are parsed only if they contain supported media-shape hints such as data:, inline_data, inlineData, media_type, mime_type, or mimeType plus data.
  • mask_otel_spans receives the whole OTEL export batch as params.spans, keyed by OtelSpanIdentifier(trace_id, span_id).
  • MaskOtelSpansResult.span_patches is sparse: omitted spans are exported unchanged.
  • OtelSpanPatch can delete exact attribute keys and set attribute values; delete runs before set, so set wins.
  • Callback exceptions or invalid outer batch results drop the whole export batch.
  • Patches for unknown span identifiers drop the whole export batch.
  • Invalid patch objects for known spans drop only those spans.
  • Invalid returned set values delete only that affected attribute and log the value type, not the value.
  • Span events and links are not transformed.
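The patch semantics listed above can be sketched roughly as follows. This is not the exporter's code: identifiers are modeled as plain `(trace_id, span_id)` tuples, patches as dicts with hypothetical `"delete"` and `"set"` keys, and validation of individual set values is left out.

```python
from typing import Any, Dict, Optional, Tuple

SpanId = Tuple[str, str]  # (trace_id, span_id), standing in for OtelSpanIdentifier
Attrs = Dict[str, Any]


def apply_span_patches(
    spans: Dict[SpanId, Attrs], patches: Dict[SpanId, Any]
) -> Optional[Dict[SpanId, Attrs]]:
    """Return patched attribute snapshots, or None if the batch must be dropped."""
    # A patch that targets an identifier outside the batch drops the whole batch.
    if any(identifier not in spans for identifier in patches):
        return None

    result: Dict[SpanId, Attrs] = {}
    for identifier, attributes in spans.items():
        if identifier not in patches:
            # Sparse result: spans without a patch are exported unchanged.
            result[identifier] = dict(attributes)
            continue
        patch = patches[identifier]
        if not isinstance(patch, dict):
            # An invalid patch object for a known span drops only that span.
            continue
        patched = dict(attributes)
        for key in patch.get("delete", []):
            patched.pop(key, None)
        # Set runs after delete, so a key named in both ends up set.
        patched.update(patch.get("set", {}))
        result[identifier] = patched
    return result
```

Returning `None` for the drop-the-batch case keeps the sketch short; the real exporter additionally distinguishes callback exceptions and invalid outer results, which also drop the batch.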

Validation

  • uv run --frozen ruff check .
  • uv run --frozen mypy langfuse --no-error-summary
  • uv run --frozen pytest tests/unit/test_mask_otel_spans.py tests/unit/test_media_manager.py -q
  • uv run --frozen pytest tests/unit/test_otel.py tests/unit/test_additional_headers_simple.py -q
  • uv run --frozen pytest -n auto --dist worksteal tests/unit

Disclaimer: Experimental PR review

Greptile Summary

This PR adds an export-stage transformation layer (LangfuseTransformingSpanExporter) that runs between the OTel batch span processor and the downstream OTLP exporter, enabling media detection and attribute masking on third-party spans that bypass the existing SDK-level mask hook.

  • Introduces mask_otel_spans as a new public Langfuse(...) parameter, wired through the resource manager into LangfuseSpanProcessor, and exposes a batch-native public contract (MaskOtelSpansParams, MaskOtelSpansResult, OtelSpanPatch, etc.) in langfuse.types.
  • Extends MediaManager._find_and_process_media with a fail_open flag and adds Gemini inline_data/inlineData detection; applies media replacement at export time with a substring prefilter to avoid JSON parsing on non-media strings.
  • The OTEL tracer initialization in resource_manager.py was moved after the media manager setup so media_manager is available when the span processor is constructed.
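To illustrate the kind of shape the Gemini detection targets, here is a small sketch that walks a parsed payload and reports inline media parts in either the snake_case (`inline_data` / `mime_type`) or camelCase (`inlineData` / `mimeType`) form. The function name `find_inline_media` and the path notation are made up for this example; it is not how `MediaManager._find_and_process_media` is implemented.

```python
from typing import Any, Iterator, Tuple


def find_inline_media(value: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, mime_type) for each Gemini-style inline media part found."""
    if isinstance(value, dict):
        for key in ("inline_data", "inlineData"):
            part = value.get(key)
            # A part only counts when it carries both a string payload
            # and a string MIME type.
            if isinstance(part, dict) and isinstance(part.get("data"), str):
                mime = part.get("mime_type") or part.get("mimeType")
                if isinstance(mime, str):
                    yield f"{path}.{key}", mime
        for key, child in value.items():
            yield from find_inline_media(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_inline_media(child, f"{path}[{index}]")
```

A real implementation would go on to replace the matched `data` payload with a media reference; the sketch stops at detection.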

Confidence Score: 4/5

Safe to merge; all changes are additive and opt-in, existing mask and media paths are untouched.

The change is well-structured and defensively written, but cloned spans lose their dropped_attributes_count metadata — downstream OTLP consumers would see 0 even for spans that originally exceeded the attribute limit.

langfuse/_client/span_exporter.py — specifically the _clone_span method and attribute wrapping strategy.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant BSP as BatchSpanProcessor
    participant LTS as LangfuseTransformingSpanExporter
    participant MM as MediaManager
    participant MF as mask_otel_spans callback
    participant OE as OTLPSpanExporter

    App->>BSP: span ends (on_end)
    BSP->>LTS: export(batch of ReadableSpans)

    loop for each span
        LTS->>MM: "_find_and_process_media(attributes, fail_open=True)"
        MM-->>LTS: post-media attribute dict
    end

    alt mask_otel_spans configured
        LTS->>MF: MaskOtelSpansParams(spans)
        MF-->>LTS: MaskOtelSpansResult or None

        alt exception or invalid result
            LTS-->>BSP: SUCCESS (batch dropped)
        else patch for unknown identifier
            LTS-->>BSP: SUCCESS (batch dropped)
        else valid patches
            loop for each span
                LTS->>LTS: apply OtelSpanPatch (delete then set)
            end
        end
    end

    LTS->>LTS: _clone_span(span, patched_attributes)
    LTS->>OE: export(cloned ReadableSpans)
    OE-->>LTS: SpanExportResult
    LTS-->>BSP: SpanExportResult
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/_client/span_exporter.py:579-596
`_clone_span` converts `span.attributes` (a `BoundedAttributes`) to a plain `dict`. The OTel SDK's `ReadableSpan.dropped_attributes` property guards with `isinstance(self._attributes, BoundedAttributes)` and returns `0` for any other type, so the cloned span will always export `dropped_attributes_count=0` even when the original span had attributes silently dropped by the bounds limit. This loses span fidelity in the OTLP payload for any span that hit the attribute limit.

```suggestion
    @staticmethod
    def _clone_span(
        *, span: ReadableSpan, attributes: Dict[str, AttributeValue]
    ) -> ReadableSpan:
        return ReadableSpan(
            name=span.name,
            context=span.context,
            parent=span.parent,
            resource=span.resource,
            attributes=BoundedAttributes(
                maxlen=None,
                attributes=attributes,
                immutable=True,
                max_value_len=None,
            ),
            events=span.events,
            links=span.links,
            kind=span.kind,
            status=span.status,
            start_time=span.start_time,
            end_time=span.end_time,
            instrumentation_scope=span.instrumentation_scope,
        )
```

### Issue 2 of 2
langfuse/_client/span_exporter.py:661-669
**Prefilter false-positive on `data:` substrings in URLs**

`_may_contain_serialized_media` triggers on any string containing the literal `data:` (e.g., `"https://api.example.com/data:types"`), causing unnecessary `json.loads` parsing before the string is returned unchanged. No data is lost, but large attribute strings with `data:` in a URL incur a redundant parse on every export.
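One possible way to tighten the prefilter, sketched under assumptions: require a full `data:<type>/<subtype>;base64,` prefix rather than the bare `data:` substring, and require a quoted `"data"` key alongside the other key hints. The function below is a standalone illustration, not the body of `_may_contain_serialized_media`, and the exact hint set is inferred from the PR description.

```python
import re

# Matches the start of a base64 data URI such as "data:image/png;base64,".
_DATA_URI_HINT = re.compile(r"data:[\w.+-]+/[\w.+-]+(?:;[\w.+=-]+)*;base64,")

# Quoted JSON keys that hint at an inline media object (assumed hint set).
_KEY_HINTS = (
    '"inline_data"',
    '"inlineData"',
    '"media_type"',
    '"mime_type"',
    '"mimeType"',
)


def may_contain_serialized_media(value: str) -> bool:
    """Cheap check run before json.loads on JSON-looking string attributes."""
    # Only JSON objects or arrays are candidates; direct data URI strings
    # are assumed to be handled on a separate path without JSON parsing.
    if not value.lstrip().startswith(("{", "[")):
        return False
    if _DATA_URI_HINT.search(value):
        return True
    # Key-based shapes need a "data" field next to the media-type hint.
    return '"data"' in value and any(hint in value for hint in _KEY_HINTS)
```

The regular expression costs more than a plain substring test, so whether this trade is worthwhile depends on how often large attributes contain `data:` inside URLs.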

Reviews (1): Last reviewed commit: "test(tracing): cover export-stage span m..." | Re-trigger Greptile

@hassiebp hassiebp force-pushed the codex/export-stage-span-mask branch from 9092ee3 to 98b2a5e Compare May 7, 2026 12:45
@hassiebp hassiebp changed the title [codex] feat(tracing): add export-stage span mask feat(tracing): add export-stage span mask May 7, 2026
Comment thread langfuse/_client/client.py Outdated
Comment thread langfuse/_client/client.py Outdated
@hassiebp hassiebp force-pushed the codex/export-stage-span-mask branch from 98b2a5e to 1415295 Compare May 7, 2026 13:29
@hassiebp hassiebp changed the title feat(tracing): add export-stage span mask [codex] feat(tracing): add export-stage OTEL span masking May 7, 2026
Comment thread langfuse/_client/client.py Outdated
Comment thread langfuse/_client/client.py Outdated
Comment thread langfuse/_client/span_exporter.py
Comment thread langfuse/_client/span_exporter.py
@hassiebp hassiebp force-pushed the codex/export-stage-span-mask branch from 1415295 to 2ed8bcb Compare May 7, 2026 14:42
@hassiebp hassiebp force-pushed the codex/export-stage-span-mask branch from 2ed8bcb to 7b67d48 Compare May 7, 2026 15:09
@hassiebp hassiebp changed the title [codex] feat(tracing): add export-stage OTEL span masking feat(tracing): add export-stage OTEL span masking and media detection May 7, 2026
@hassiebp hassiebp marked this pull request as ready for review May 7, 2026 15:47
@github-actions

github-actions Bot commented May 7, 2026

@claude review

Comment thread langfuse/_client/span_exporter.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c9aca2e48

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread langfuse/_client/span_exporter.py Outdated
Comment thread langfuse/_client/span_exporter.py Outdated
Comment thread langfuse/_client/span_exporter.py
Comment thread langfuse/_client/span_exporter.py Outdated
Comment thread langfuse/_client/span_exporter.py Outdated
Comment thread langfuse/_client/span_exporter.py
Comment thread langfuse/_client/span_exporter.py
Comment thread langfuse/_client/span_exporter.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8131633c01

Comment thread langfuse/_client/span_exporter.py
@hassiebp hassiebp had a problem deploying to protected branches May 8, 2026 11:51 — with GitHub Actions Failure
@hassiebp hassiebp temporarily deployed to protected branches May 8, 2026 11:54 — with GitHub Actions Inactive

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

requires-python = ">=3.10,<4.0"

P1 Badge Restore monotonic package versioning

Bumping the project version backwards from 4.6.0b1 to 4.6.0a1 makes this release look older than the previous one under PEP 440 ordering, so environments already on 4.6.0b1 will not upgrade to this build and release automation can mis-handle publication/version checks. This should be a forward version increment to keep upgrade and release behavior correct.
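The ordering concern can be checked with a simplified comparison key. This sketch covers only the subset of PEP 440 relevant here (a numeric release followed by an optional `a`, `b`, or `rc` pre-release segment); `version_key` is a hypothetical helper, and a full implementation such as `packaging.version.Version` handles many more cases.

```python
import re
from typing import Tuple

# Pre-release phases sort alpha < beta < release candidate < final.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after all of its pre-releases


def version_key(version: str) -> Tuple[Tuple[int, ...], int, int]:
    """Build a sortable key for versions like '4.6.0', '4.6.0a1', '4.6.0rc2'."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    if match.group(2) is None:
        return release, _FINAL_RANK, 0
    return release, _PRE_RANK[match.group(2)], int(match.group(3))


# The alpha sorts before the beta, so 4.6.0a1 is "older" than 4.6.0b1.
is_downgrade = version_key("4.6.0a1") < version_key("4.6.0b1")
```

Here `is_downgrade` evaluates to `True`, which matches the review's point that an installer already on `4.6.0b1` would not treat `4.6.0a1` as an upgrade.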

Comment thread langfuse/_client/span_exporter.py
Comment thread langfuse/_client/span_processor.py
…an-mask

# Conflicts:
#	langfuse/_client/span_processor.py
#	pyproject.toml
#	uv.lock

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee4347c6b9

Comment thread langfuse/_client/span_exporter.py
Comment thread langfuse/_task_manager/media_manager.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ec56e5f6e

Comment thread langfuse/types.py
@hassiebp hassiebp requested a review from a team as a code owner June 16, 2026 08:23
Comment thread langfuse/_task_manager/media_manager.py
@hassiebp hassiebp merged commit 59fa9aa into main Jun 16, 2026
27 of 29 checks passed
@hassiebp hassiebp deleted the codex/export-stage-span-mask branch June 16, 2026 08:43

Development

Successfully merging this pull request may close these issues.

Python SDK masking and media handling should run off the app thread before export

1 participant