Make trajectory serialization JSON-safe #728

Open
SailorJoe6 wants to merge 3 commits into SWE-agent:main from SailorJoe6:bugfix/727-trajectory-serialization

Conversation

@SailorJoe6
Contributor

Fixes #727

Summary

When a tool‑calling model emits an invalid tool call, the resulting FormatError can include non‑JSON‑serializable data, which causes agent.save() to fail and prevents the trajectory file from being written. This PR makes the error payload JSON‑safe and hardens serialization so trajectories are always persisted.

Changes

  • Wrap FormatError in LitellmModel with a JSON‑safe debug payload that preserves raw assistant content/tool_calls and response metadata.
  • Add to_jsonable utility and use it in DefaultAgent.save() to guarantee JSON serialization of trajectories.
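
The utility itself isn't shown in this thread. As a rough sketch (the name `to_jsonable` comes from the PR description, but the fallback order here is an assumption, not the actual implementation), it could look like:

```python
# Sketch of a to_jsonable utility; the fallback order
# (native types -> collections -> model_dump -> dataclass -> repr)
# is an assumption, not the PR's exact implementation.
import dataclasses
import json
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """Recursively coerce obj into something json.dumps can always handle."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_jsonable(v) for v in obj]
    if hasattr(obj, "model_dump"):  # pydantic-style models, litellm responses
        try:
            return to_jsonable(obj.model_dump())
        except Exception:
            pass  # fall through to the lossy fallback
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return to_jsonable(dataclasses.asdict(obj))
    return repr(obj)  # last resort: lossy, but always serializable


# A set is not JSON-serializable, but its coerced form is.
payload = json.dumps(to_jsonable({"ids": {1, 2}}))
```

The last-resort `repr()` is what guarantees `json.dumps` never raises, at the cost of losing structure for unknown types.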

Why

Losing the trajectory on invalid tool calls breaks expected SWE‑bench behavior and makes debugging difficult. This ensures trajectories are written even on malformed tool calls.

@codecov

codecov Bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
src/minisweagent/utils/serialize.py    90.00%    2 Missing ⚠️

Files with missing lines                    Coverage Δ
src/minisweagent/agents/default.py          100.00% <100.00%> (ø)
src/minisweagent/models/litellm_model.py    93.18% <100.00%> (+0.49%) ⬆️
src/minisweagent/utils/serialize.py         94.44% <90.00%> (-5.56%) ⬇️

... and 1 file with indirect coverage changes


Contributor

Copilot AI left a comment


Pull request overview

This PR addresses issue #727 by making trajectory serialization JSON-safe when FormatError occurs due to invalid tool calls. The changes ensure that trajectories are always persisted, even when model responses contain malformed tool calls or non-JSON-serializable data.

Changes:

  • Added to_jsonable utility function to convert arbitrary Python objects to JSON-serializable forms with fallback strategies
  • Modified DefaultAgent.save() to apply to_jsonable before JSON serialization
  • Enhanced LitellmModel.query() to wrap FormatError with JSON-safe debug payloads containing assistant content, tool calls, and response metadata

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

  • src/minisweagent/utils/serialize.py: Adds to_jsonable utility with comprehensive type handling and fallback mechanisms for non-JSON-serializable objects
  • src/minisweagent/agents/default.py: Updates save() method to apply the to_jsonable transformation before JSON serialization
  • src/minisweagent/models/litellm_model.py: Wraps FormatError exceptions with JSON-safe debug messages containing raw response data
  • tests/utils/test_serialize.py: Comprehensive unit tests for to_jsonable covering basic types, collections, model_dump, and fallback scenarios

Comment thread on tests/utils/test_serialize.py
@klieret
Member

klieret commented Feb 5, 2026

Thanks for raising this. I'm surprised that FormatError would include any non-serializable content. It should be a very simple text message that doesn't take direct model input. Could it be that adding the full response to the data is what caused this?

@klieret
Member

klieret commented Feb 5, 2026

I feel like for this, adding a new kwarg response=response.model_dump() to parse_toolcall_actions and then adding that to the extra field when we raise the FormatError would be more precise.

@klieret
Member

klieret commented Feb 5, 2026

If there are any other serialization issues, I'd be super interested in squashing them. But I don't think a blanket "try to coerce to JSON" approach is the best way; such failures would point more towards other bugs.

@SailorJoe6
Contributor Author

SailorJoe6 commented Feb 5, 2026

I've been using mini-swe-agent to test OSS models hosted locally on a DGX Spark via vLLM, running them against SWE-bench Multilingual as a first set of tests. I've also been trying a few different configs, adjusting the system prompt, etc. Sometimes the LLM hallucinates badly enough to emit malformed JSON as the tool call response. That causes a hard exception, and the FormatError ends up as a textual representation of the exception name in the trajectory, rather than a safely JSON-serialized depiction of what the model tried to do. This code was meant to ensure that no matter what the model tries to do, it's serialized to JSON safely.

TL;DR: it was the most robust fix I could think of that actually lets you see what the model spit out.
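
The failure mode described here can be illustrated with a small, hypothetical helper (parse_tool_args is not mini-swe-agent code, just a sketch of capturing malformed arguments instead of letting the parse error escape):

```python
import json


def parse_tool_args(raw_arguments: str) -> dict:
    """Parse tool-call arguments, preserving the raw text on failure."""
    try:
        return {"ok": True, "args": json.loads(raw_arguments)}
    except json.JSONDecodeError as exc:
        # Keep the raw string so the trajectory shows what the model emitted.
        return {"ok": False, "raw": raw_arguments, "error": str(exc)}
```

The point is that the model's raw output survives into the log either way, instead of being replaced by an exception name.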

@SailorJoe6
Contributor Author

SailorJoe6 commented Feb 5, 2026

I do agree that a targeted fix in the tool call parsing path would be better. I'm happy to update this to attach response.model_dump() on FormatError, rather than the raw response object. Is that what you had in mind?

That said, do you think it’s unreasonable to make save() robust against serialization failures? Trajectories are logging artifacts, and we want them to survive even when the model misbehaves. How would you feel about keeping a minimal, safe serialization guard in the error wrapper to ensure we get logs?
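
One minimal guard of the kind proposed here could lean on json.dumps's own fallback hook (save_trajectory is a hypothetical stand-in for DefaultAgent.save(), not the actual method):

```python
import json
from pathlib import Path


def save_trajectory(path: Path, trajectory: dict) -> None:
    """Write the trajectory, stringifying anything json can't handle.

    default=str means json.dumps never raises TypeError on an
    unserializable object, so the log file is always written.
    """
    path.write_text(json.dumps(trajectory, indent=2, default=str))
```

This is less informative than a structured to_jsonable pass, but it is a one-line change that guarantees the file exists.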

@SailorJoe6
Contributor Author

One more comment for you to consider when you decide how we should handle this. The model sometimes returns more than one tool call. I've often seen it return 3 tool calls in one response. If any one of these has malformed JSON, it still throws the error that causes the framework to ignore the good tool calls and only return the ToolCallError response to the LLM.

My patch does not address that. Probably a separate issue.

@klieret
Member

klieret commented Feb 6, 2026

Thanks for all the input!

Let me get back to you on the other comments soon, but regarding this:

One more comment for you to consider when you decide how we should handle this. The model sometimes returns more than one tool call. I've often seen it return 3 tool calls in one response. If any one of these has malformed JSON, it still throws the error that causes the framework to ignore the good tool calls and only return the ToolCallError response to the LLM.

I think this is the best we can do. If the model issues toolcalls A B C and A is already broken format, I think it's better to just ask it to reconsider the whole thing. Else, if B and C were depending on A to run and we skip A but then execute B and C, I think this would be even more problematic (theoretically it should only issue separate toolcalls for independent things, but that's also not guaranteed).

@SailorJoe6
Contributor Author

I think this is the best we can do. If the model issues toolcalls A B C and A is already broken format, I think it's better to just ask it to reconsider the whole thing. Else, if B and C were depending on A to run and we skip A but then execute B and C, I think this would be even more problematic (theoretically it should only issue separate toolcalls for independent things, but that's also not guaranteed).

Makes sense!


Development

Successfully merging this pull request may close these issues.

Invalid tool call FormatError sometimes breaks trajectory serialization
