fix(openai): preserve Responses tools and streamed tool calls#1715
fix(openai): preserve Responses tools and streamed tool calls#1715hassiebp wants to merge 1 commit into
Conversation
|
@claude review |
| except Exception: | ||
| return _serialize_openai_value(value.model_dump(warnings=False)) |
There was a problem hiding this comment.
Fallback
model_dump may return non-serialisable values
When model_dump(mode="json", warnings=False) raises, the bare except Exception silently retries with model_dump(warnings=False) (no mode="json"). That second call can return raw Python objects — datetime, UUID, Decimal, etc. — which then pass through _serialize_openai_value unchanged (they are not NotGiven, BaseModel, dict, or list) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose.
Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/openai.py
Line: 219-220
Comment:
**Fallback `model_dump` may return non-serialisable values**
When `model_dump(mode="json", warnings=False)` raises, the bare `except Exception` silently retries with `model_dump(warnings=False)` (no `mode="json"`). That second call can return raw Python objects — `datetime`, `UUID`, `Decimal`, etc. — which then pass through `_serialize_openai_value` unchanged (they are not `NotGiven`, `BaseModel`, `dict`, or `list`) and land in the trace payload. Any downstream JSON serialiser will fail on those values, and the silent catch makes the failure hard to diagnose.
How can I resolve this? If you propose a fix, please make it concise.| metadata_dict = ( | ||
| metadata.model_dump() if isinstance(metadata, BaseModel) else metadata |
There was a problem hiding this comment.
model_dump() without mode="json" may embed non-serialisable objects in metadata
When the caller passes a Pydantic BaseModel as metadata, the conversion metadata.model_dump() is done without mode="json". This differs from _serialize_openai_value's own Pydantic handling (which uses mode="json" as the primary path), so datetime, Enum, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises.
Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/openai.py
Line: 247-248
Comment:
**`model_dump()` without `mode="json"` may embed non-serialisable objects in metadata**
When the caller passes a Pydantic `BaseModel` as `metadata`, the conversion `metadata.model_dump()` is done without `mode="json"`. This differs from `_serialize_openai_value`'s own Pydantic handling (which uses `mode="json"` as the primary path), so `datetime`, `Enum`, and similar field types could appear as raw Python objects in the merged metadata dict that Langfuse eventually serialises.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8681e83b11
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for key in _STRUCTURED_OUTPUT_METADATA_FIELDS: | ||
| self.kwargs["metadata"].pop(key, None) |
There was a problem hiding this comment.
Preserve user text_format metadata when storing
When store=True is used without a structured text_format kwarg, callers can still pass valid OpenAI metadata such as metadata={"text_format": "plain"}. This loop now blindly removes text_format from the metadata dict before forwarding the request, and when no structured output metadata was injected it also mutates the caller's original dict, so valid distillation metadata is silently dropped; only the SDK-injected structured field should be removed when text_format was actually supplied as an OpenAI argument.
Useful? React with 👍 / 👎.
What
Fixes the Python OpenAI integration for the clustered Responses API and streamed tool-call issues:
tools,tool_choice,parallel_tool_calls) in the traced generation inputtext_format/response_formatschema metadata for structured OpenAI callsid,type, function name/arguments, and retain tool calls even when content chunks arrive firstWhy
The Responses API path only traced
inputandinstructions, so available tools and structured output format were invisible in Langfuse. Separately, streamed Chat Completions could drop tool calls when content arrived first, and streamed tool-call output missedid/type, which prevented UI matching against available tools.Relevant Linear/GitHub cluster: LFE-9895, LFE-7512, LFE-7245, LFE-10331, LFE-8780, LFE-8781.
Verification
uv run --frozen pytest tests/unit/test_openai_prompt_extraction.py tests/unit/test_openai.pyuv run --frozen ruff check .uv run --frozen ruff format --check langfuse/openai.py tests/unit/test_openai.py tests/unit/test_openai_prompt_extraction.pyuv run --frozen mypy langfuse --no-error-summaryNote: full
uv run --frozen ruff format --check .currently reports unrelated pre-existing formatting intests/unit/test_experiment.py, so I kept formatting verification scoped to the touched files.Greptile Summary
This PR fixes two classes of bugs in the Python OpenAI integration: (1) the Responses API path was omitting
tools,tool_choice,parallel_tool_calls, and structured-output format fields from the traced generation, and (2) streamed Chat Completions were silently dropping tool calls when a content chunk arrived first, and the accumulated tool-call objects were missing theidandtypefields required by the Langfuse UI._serialize_openai_valuehelper recursively unwraps OpenAI SDK Pydantic wrapper objects into plain JSON-safe dicts, replacing ad-hocmodel_json_schema()calls scattered across the codebase._extract_streamed_openai_responsereplaces theelifchain with independentifblocks so that a delta containing bothcontentandtool_callsis fully processed, and uses index-based slot allocation to reconstruct multi-tool streams with completeid/type/functionstructure._extract_responses_promptis extended to embedtools/tool_choice/parallel_tool_callsin the Langfuse input, and_get_structured_output_metadatageneralises the existingresponse_formatmetadata capture to also includetext_format.Confidence Score: 4/5
Safe to merge; the core streaming and serialisation logic is correct and well-covered by the new unit tests.
The index-based tool-call accumulation and the elif→if conversion are logically sound and directly verified by the new test fixture. The two minor concerns are in non-critical fallback/helper paths: _serialize_openai_value's bare except Exception can silently return non-JSON-serialisable Python objects on the second model_dump attempt, and _get_structured_output_metadata uses model_dump() without mode='json' for caller-supplied metadata BaseModels, leaving the same exposure for datetime/UUID fields. Both are low-probability edge cases but worth hardening before they surface in production traces.
langfuse/openai.py around _serialize_openai_value (lines 215–220) and _get_structured_output_metadata (lines 247–248).
Sequence Diagram
%%{init: {'theme': 'neutral'}}%% sequenceDiagram participant OpenAI as OpenAI SDK participant Wrapper as langfuse wrapper participant Extract as _extract_streamed_openai_response participant Helper as _serialize_openai_value OpenAI->>Wrapper: stream chunks (content + tool_calls possible in same delta) Wrapper->>Extract: chunks loop per chunk Extract->>Extract: process content (if present) Extract->>Extract: process function_call (if present) Extract->>Extract: process tool_calls by index (if present) Note over Extract: All three now run as independent if-blocks, not elif-chain end Extract->>Extract: "get_response_for_chat() priority: tool_calls > function_call > content" Extract-->>Wrapper: (model, completion, usage, metadata) Wrapper->>Helper: _extract_response_api_completion(output) Helper->>Helper: _serialize_openai_value(output) unwraps Pydantic BaseModel objects Helper-->>Wrapper: plain dict/list Wrapper->>Wrapper: _get_structured_output_metadata() captures response_format / text_format in metadata Wrapper->>Wrapper: _extract_responses_prompt() adds tools/tool_choice/parallel_tool_calls to input%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% sequenceDiagram participant OpenAI as OpenAI SDK participant Wrapper as langfuse wrapper participant Extract as _extract_streamed_openai_response participant Helper as _serialize_openai_value OpenAI->>Wrapper: stream chunks (content + tool_calls possible in same delta) Wrapper->>Extract: chunks loop per chunk Extract->>Extract: process content (if present) Extract->>Extract: process function_call (if present) Extract->>Extract: process tool_calls by index (if present) Note over Extract: All three now run as independent if-blocks, not elif-chain end Extract->>Extract: "get_response_for_chat() priority: tool_calls > function_call > content" Extract-->>Wrapper: (model, completion, usage, metadata) Wrapper->>Helper: _extract_response_api_completion(output) Helper->>Helper: _serialize_openai_value(output) unwraps Pydantic BaseModel objects Helper-->>Wrapper: plain dict/list Wrapper->>Wrapper: _get_structured_output_metadata() captures response_format / text_format in metadata Wrapper->>Wrapper: _extract_responses_prompt() adds tools/tool_choice/parallel_tool_calls to inputPrompt To Fix All With AI
Reviews (1): Last reviewed commit: "fix(openai): preserve Responses tools an..." | Re-trigger Greptile