Skip to content

fix(openai): preserve Responses tools and streamed tool calls#1715

Open
hassiebp wants to merge 1 commit into
mainfrom
hassiebbot/openai-responses-tool-normalization
Open

fix(openai): preserve Responses tools and streamed tool calls#1715
hassiebp wants to merge 1 commit into
mainfrom
hassiebbot/openai-responses-tool-normalization

Conversation

@hassiebp

@hassiebp hassiebp commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

What

Fixes the Python OpenAI integration for the clustered Responses API and streamed tool-call issues:

  • include Responses API tool fields (tools, tool_choice, parallel_tool_calls) in the traced generation input
  • include text_format / response_format schema metadata for structured OpenAI calls
  • serialize OpenAI SDK Responses output wrapper objects into plain data before writing generation output
  • reconstruct streamed Chat Completions tool calls with id, type, function name/arguments, and retain tool calls even when content chunks arrive first

Why

The Responses API path only traced input and instructions, so available tools and structured output format were invisible in Langfuse. Separately, streamed Chat Completions could drop tool calls when content arrived first, and streamed tool-call output missed id / type, which prevented UI matching against available tools.

Relevant Linear/GitHub cluster: LFE-9895, LFE-7512, LFE-7245, LFE-10331, LFE-8780, LFE-8781.

Verification

  • uv run --frozen pytest tests/unit/test_openai_prompt_extraction.py tests/unit/test_openai.py
  • uv run --frozen ruff check .
  • uv run --frozen ruff format --check langfuse/openai.py tests/unit/test_openai.py tests/unit/test_openai_prompt_extraction.py
  • uv run --frozen mypy langfuse --no-error-summary

Note: full uv run --frozen ruff format --check . currently reports unrelated pre-existing formatting in tests/unit/test_experiment.py, so I kept formatting verification scoped to the touched files.

Greptile Summary

This PR fixes two classes of bugs in the Python OpenAI integration: (1) the Responses API path was omitting tools, tool_choice, parallel_tool_calls, and structured-output format fields from the traced generation, and (2) streamed Chat Completions were silently dropping tool calls when a content chunk arrived first, and the accumulated tool-call objects were missing the id and type fields required by the Langfuse UI.

  • A new _serialize_openai_value helper recursively unwraps OpenAI SDK Pydantic wrapper objects into plain JSON-safe dicts, replacing ad-hoc model_json_schema() calls scattered across the codebase.
  • _extract_streamed_openai_response replaces the elif chain with independent if blocks so that a delta containing both content and tool_calls is fully processed, and uses index-based slot allocation to reconstruct multi-tool streams with complete id/type/function structure.
  • _extract_responses_prompt is extended to embed tools/tool_choice/parallel_tool_calls in the Langfuse input, and _get_structured_output_metadata generalises the existing response_format metadata capture to also include text_format.

Confidence Score: 4/5

Safe to merge; the core streaming and serialisation logic is correct and well-covered by the new unit tests.

The index-based tool-call accumulation and the elif→if conversion are logically sound and directly verified by the new test fixture. The two minor concerns are in non-critical fallback/helper paths: _serialize_openai_value's bare except Exception can silently return non-JSON-serialisable Python objects on the second model_dump attempt, and _get_structured_output_metadata uses model_dump() without mode='json' for caller-supplied metadata BaseModels, leaving the same exposure for datetime/UUID fields. Both are low-probability edge cases but worth hardening before they surface in production traces.

langfuse/openai.py around _serialize_openai_value (lines 215–220) and _get_structured_output_metadata (lines 247–248).

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant OpenAI as OpenAI SDK
    participant Wrapper as langfuse wrapper
    participant Extract as _extract_streamed_openai_response
    participant Helper as _serialize_openai_value

    OpenAI->>Wrapper: stream chunks (content + tool_calls possible in same delta)
    Wrapper->>Extract: chunks

    loop per chunk
        Extract->>Extract: process content (if present)
        Extract->>Extract: process function_call (if present)
        Extract->>Extract: process tool_calls by index (if present)
        Note over Extract: All three now run as independent if-blocks, not elif-chain
    end

    Extract->>Extract: "get_response_for_chat() priority: tool_calls > function_call > content"
    Extract-->>Wrapper: (model, completion, usage, metadata)

    Wrapper->>Helper: _extract_response_api_completion(output)
    Helper->>Helper: _serialize_openai_value(output) unwraps Pydantic BaseModel objects
    Helper-->>Wrapper: plain dict/list

    Wrapper->>Wrapper: _get_structured_output_metadata() captures response_format / text_format in metadata
    Wrapper->>Wrapper: _extract_responses_prompt() adds tools/tool_choice/parallel_tool_calls to input
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant OpenAI as OpenAI SDK
    participant Wrapper as langfuse wrapper
    participant Extract as _extract_streamed_openai_response
    participant Helper as _serialize_openai_value

    OpenAI->>Wrapper: stream chunks (content + tool_calls possible in same delta)
    Wrapper->>Extract: chunks

    loop per chunk
        Extract->>Extract: process content (if present)
        Extract->>Extract: process function_call (if present)
        Extract->>Extract: process tool_calls by index (if present)
        Note over Extract: All three now run as independent if-blocks, not elif-chain
    end

    Extract->>Extract: "get_response_for_chat() priority: tool_calls > function_call > content"
    Extract-->>Wrapper: (model, completion, usage, metadata)

    Wrapper->>Helper: _extract_response_api_completion(output)
    Helper->>Helper: _serialize_openai_value(output) unwraps Pydantic BaseModel objects
    Helper-->>Wrapper: plain dict/list

    Wrapper->>Wrapper: _get_structured_output_metadata() captures response_format / text_format in metadata
    Wrapper->>Wrapper: _extract_responses_prompt() adds tools/tool_choice/parallel_tool_calls to input
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/openai.py:219-220
**Fallback `model_dump` may return non-serialisable values**

When `model_dump(mode="json", warnings=False)` raises, the bare `except Exception` silently retries with `model_dump(warnings=False)` (no `mode="json"`). That second call can return raw Python objects — `datetime`, `UUID`, `Decimal`, etc. — which then pass through `_serialize_openai_value` unchanged (they are not `NotGiven`, `BaseModel`, `dict`, or `list`) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose.

### Issue 2 of 2
langfuse/openai.py:247-248
**`model_dump()` without `mode="json"` may embed non-serialisable objects in metadata**

When the caller passes a Pydantic `BaseModel` as `metadata`, the conversion `metadata.model_dump()` is done without `mode="json"`. This differs from `_serialize_openai_value`'s own Pydantic handling (which uses `mode="json"` as the primary path), so `datetime`, `Enum`, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises.

Reviews (1): Last reviewed commit: "fix(openai): preserve Responses tools an..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

@hassiebp hassiebp marked this pull request as ready for review June 16, 2026 15:43
@github-actions

Copy link
Copy Markdown

@claude review

Comment thread langfuse/openai.py
Comment on lines +219 to +220
except Exception:
return _serialize_openai_value(value.model_dump(warnings=False))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Fallback model_dump may return non-serialisable values

When model_dump(mode="json", warnings=False) raises, the bare except Exception silently retries with model_dump(warnings=False) (no mode="json"). That second call can return raw Python objects — datetime, UUID, Decimal, etc. — which then pass through _serialize_openai_value unchanged (they are not NotGiven, BaseModel, dict, or list) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose.

Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/openai.py
Line: 219-220

Comment:
**Fallback `model_dump` may return non-serialisable values**

When `model_dump(mode="json", warnings=False)` raises, the bare `except Exception` silently retries with `model_dump(warnings=False)` (no `mode="json"`). That second call can return raw Python objects — `datetime`, `UUID`, `Decimal`, etc. — which then pass through `_serialize_openai_value` unchanged (they are not `NotGiven`, `BaseModel`, `dict`, or `list`) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread langfuse/openai.py
Comment on lines +247 to +248
metadata_dict = (
metadata.model_dump() if isinstance(metadata, BaseModel) else metadata

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 model_dump() without mode="json" may embed non-serialisable objects in metadata

When the caller passes a Pydantic BaseModel as metadata, the conversion metadata.model_dump() is done without mode="json". This differs from _serialize_openai_value's own Pydantic handling (which uses mode="json" as the primary path), so datetime, Enum, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises.

Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/openai.py
Line: 247-248

Comment:
**`model_dump()` without `mode="json"` may embed non-serialisable objects in metadata**

When the caller passes a Pydantic `BaseModel` as `metadata`, the conversion `metadata.model_dump()` is done without `mode="json"`. This differs from `_serialize_openai_value`'s own Pydantic handling (which uses `mode="json"` as the primary path), so `datetime`, `Enum`, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises.

How can I resolve this? If you propose a fix, please make it concise.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8681e83b11

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread langfuse/openai.py
Comment on lines +311 to +312
for key in _STRUCTURED_OUTPUT_METADATA_FIELDS:
self.kwargs["metadata"].pop(key, None)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve user text_format metadata when storing

When store=True is used without a structured text_format kwarg, callers can still pass valid OpenAI metadata such as metadata={"text_format": "plain"}. This loop now blindly removes text_format from the metadata dict before forwarding the request, and when no structured output metadata was injected it also mutates the caller's original dict, so valid distillation metadata is silently dropped; only the SDK-injected structured field should be removed when text_format was actually supplied as an OpenAI argument.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant