fix(openai): preserve Responses tools and streamed tool calls by hassiebp · Pull Request #1715 · langfuse/langfuse-python

hassiebp · 2026-06-16T15:42:16Z

What

Fixes the Python OpenAI integration for the clustered Responses API and streamed tool-call issues:

include Responses API tool fields (tools, tool_choice, parallel_tool_calls) in the traced generation input
include text_format / response_format schema metadata for structured OpenAI calls
serialize OpenAI SDK Responses output wrapper objects into plain data before writing generation output
reconstruct streamed Chat Completions tool calls with id, type, function name/arguments, and retain tool calls even when content chunks arrive first

Why

The Responses API path only traced input and instructions, so available tools and structured output format were invisible in Langfuse. Separately, streamed Chat Completions could drop tool calls when content arrived first, and streamed tool-call output missed id / type, which prevented UI matching against available tools.

Relevant Linear/GitHub cluster: LFE-9895, LFE-7512, LFE-7245, LFE-10331, LFE-8780, LFE-8781.

Verification

uv run --frozen pytest tests/unit/test_openai_prompt_extraction.py tests/unit/test_openai.py
uv run --frozen ruff check .
uv run --frozen ruff format --check langfuse/openai.py tests/unit/test_openai.py tests/unit/test_openai_prompt_extraction.py
uv run --frozen mypy langfuse --no-error-summary

Note: full uv run --frozen ruff format --check . currently reports unrelated pre-existing formatting in tests/unit/test_experiment.py, so I kept formatting verification scoped to the touched files.

Greptile Summary

This PR fixes two classes of bugs in the Python OpenAI integration: (1) the Responses API path was omitting tools, tool_choice, parallel_tool_calls, and structured-output format fields from the traced generation, and (2) streamed Chat Completions were silently dropping tool calls when a content chunk arrived first, and the accumulated tool-call objects were missing the id and type fields required by the Langfuse UI.

A new _serialize_openai_value helper recursively unwraps OpenAI SDK Pydantic wrapper objects into plain JSON-safe dicts, replacing ad-hoc model_json_schema() calls scattered across the codebase.
_extract_streamed_openai_response replaces the elif chain with independent if blocks so that a delta containing both content and tool_calls is fully processed, and uses index-based slot allocation to reconstruct multi-tool streams with complete id/type/function structure.
_extract_responses_prompt is extended to embed tools/tool_choice/parallel_tool_calls in the Langfuse input, and _get_structured_output_metadata generalises the existing response_format metadata capture to also include text_format.

Confidence Score: 4/5

Safe to merge; the core streaming and serialisation logic is correct and well-covered by the new unit tests.

The index-based tool-call accumulation and the elif→if conversion are logically sound and directly verified by the new test fixture. The two minor concerns are in non-critical fallback/helper paths: _serialize_openai_value's bare except Exception can silently return non-JSON-serialisable Python objects on the second model_dump attempt, and _get_structured_output_metadata uses model_dump() without mode='json' for caller-supplied metadata BaseModels, leaving the same exposure for datetime/UUID fields. Both are low-probability edge cases but worth hardening before they surface in production traces.

langfuse/openai.py around _serialize_openai_value (lines 215–220) and _get_structured_output_metadata (lines 247–248).

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant OpenAI as OpenAI SDK
    participant Wrapper as langfuse wrapper
    participant Extract as _extract_streamed_openai_response
    participant Helper as _serialize_openai_value

    OpenAI->>Wrapper: stream chunks (content + tool_calls possible in same delta)
    Wrapper->>Extract: chunks

    loop per chunk
        Extract->>Extract: process content (if present)
        Extract->>Extract: process function_call (if present)
        Extract->>Extract: process tool_calls by index (if present)
        Note over Extract: All three now run as independent if-blocks, not elif-chain
    end

    Extract->>Extract: "get_response_for_chat() priority: tool_calls > function_call > content"
    Extract-->>Wrapper: (model, completion, usage, metadata)

    Wrapper->>Helper: _extract_response_api_completion(output)
    Helper->>Helper: _serialize_openai_value(output) unwraps Pydantic BaseModel objects
    Helper-->>Wrapper: plain dict/list

    Wrapper->>Wrapper: _get_structured_output_metadata() captures response_format / text_format in metadata
    Wrapper->>Wrapper: _extract_responses_prompt() adds tools/tool_choice/parallel_tool_calls to input

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant OpenAI as OpenAI SDK
    participant Wrapper as langfuse wrapper
    participant Extract as _extract_streamed_openai_response
    participant Helper as _serialize_openai_value

    OpenAI->>Wrapper: stream chunks (content + tool_calls possible in same delta)
    Wrapper->>Extract: chunks

    loop per chunk
        Extract->>Extract: process content (if present)
        Extract->>Extract: process function_call (if present)
        Extract->>Extract: process tool_calls by index (if present)
        Note over Extract: All three now run as independent if-blocks, not elif-chain
    end

    Extract->>Extract: "get_response_for_chat() priority: tool_calls > function_call > content"
    Extract-->>Wrapper: (model, completion, usage, metadata)

    Wrapper->>Helper: _extract_response_api_completion(output)
    Helper->>Helper: _serialize_openai_value(output) unwraps Pydantic BaseModel objects
    Helper-->>Wrapper: plain dict/list

    Wrapper->>Wrapper: _get_structured_output_metadata() captures response_format / text_format in metadata
    Wrapper->>Wrapper: _extract_responses_prompt() adds tools/tool_choice/parallel_tool_calls to input

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/openai.py:219-220
**Fallback `model_dump` may return non-serialisable values**

When `model_dump(mode="json", warnings=False)` raises, the bare `except Exception` silently retries with `model_dump(warnings=False)` (no `mode="json"`). That second call can return raw Python objects — `datetime`, `UUID`, `Decimal`, etc. — which then pass through `_serialize_openai_value` unchanged (they are not `NotGiven`, `BaseModel`, `dict`, or `list`) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose.

### Issue 2 of 2
langfuse/openai.py:247-248
**`model_dump()` without `mode="json"` may embed non-serialisable objects in metadata**

When the caller passes a Pydantic `BaseModel` as `metadata`, the conversion `metadata.model_dump()` is done without `mode="json"`. This differs from `_serialize_openai_value`'s own Pydantic handling (which uses `mode="json"` as the primary path), so `datetime`, `Enum`, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises.

_{Reviews (1): Last reviewed commit: "fix(openai): preserve Responses tools an..." | Re-trigger Greptile}

Greptile also left 2 inline comments on this PR.

github-actions · 2026-06-16T15:44:00Z

@claude review

greptile-apps · 2026-06-16T15:48:44Z

+        except Exception:
+            return _serialize_openai_value(value.model_dump(warnings=False))


Fallback model_dump may return non-serialisable values

When model_dump(mode="json", warnings=False) raises, the bare except Exception silently retries with model_dump(warnings=False) (no mode="json"). That second call can return raw Python objects — datetime, UUID, Decimal, etc. — which then pass through _serialize_openai_value unchanged (they are not NotGiven, BaseModel, dict, or list) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose.

Prompt To Fix With AI

This is a comment left during a code review. Path: langfuse/openai.py Line: 219-220 Comment: **Fallback `model_dump` may return non-serialisable values** When `model_dump(mode="json", warnings=False)` raises, the bare `except Exception` silently retries with `model_dump(warnings=False)` (no `mode="json"`). That second call can return raw Python objects — `datetime`, `UUID`, `Decimal`, etc. — which then pass through `_serialize_openai_value` unchanged (they are not `NotGiven`, `BaseModel`, `dict`, or `list`) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-16T15:48:45Z

+    metadata_dict = (
+        metadata.model_dump() if isinstance(metadata, BaseModel) else metadata


model_dump() without mode="json" may embed non-serialisable objects in metadata

When the caller passes a Pydantic BaseModel as metadata, the conversion metadata.model_dump() is done without mode="json". This differs from _serialize_openai_value's own Pydantic handling (which uses mode="json" as the primary path), so datetime, Enum, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises.

Prompt To Fix With AI

This is a comment left during a code review. Path: langfuse/openai.py Line: 247-248 Comment: **`model_dump()` without `mode="json"` may embed non-serialisable objects in metadata** When the caller passes a Pydantic `BaseModel` as `metadata`, the conversion `metadata.model_dump()` is done without `mode="json"`. This differs from `_serialize_openai_value`'s own Pydantic handling (which uses `mode="json"` as the primary path), so `datetime`, `Enum`, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises. How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8681e83b11

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-16T15:49:20Z

+            for key in _STRUCTURED_OUTPUT_METADATA_FIELDS:
+                self.kwargs["metadata"].pop(key, None)


Preserve user text_format metadata when storing

When store=True is used without a structured text_format kwarg, callers can still pass valid OpenAI metadata such as metadata={"text_format": "plain"}. This loop now blindly removes text_format from the metadata dict before forwarding the request, and when no structured output metadata was injected it also mutates the caller's original dict, so valid distillation metadata is silently dropped; only the SDK-injected structured field should be removed when text_format was actually supplied as an OpenAI argument.

Useful? React with 👍 / 👎.

fix(openai): preserve Responses tools and streamed tool calls

8681e83

hassiebp marked this pull request as ready for review June 16, 2026 15:43

greptile-apps Bot reviewed Jun 16, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(openai): preserve Responses tools and streamed tool calls#1715

fix(openai): preserve Responses tools and streamed tool calls#1715
hassiebp wants to merge 1 commit into
mainfrom
hassiebbot/openai-responses-tool-normalization

hassiebp commented Jun 16, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

greptile-apps Bot Jun 16, 2026

Uh oh!

greptile-apps Bot Jun 16, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		except Exception:
		return _serialize_openai_value(value.model_dump(warnings=False))

		metadata_dict = (
		metadata.model_dump() if isinstance(metadata, BaseModel) else metadata

		for key in _STRUCTURED_OUTPUT_METADATA_FIELDS:
		self.kwargs["metadata"].pop(key, None)

Conversation

hassiebp commented Jun 16, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Verification

Greptile Summary

Confidence Score: 4/5

Sequence Diagram

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hassiebp commented Jun 16, 2026 •

edited by greptile-apps Bot

Loading