Make trajectory serialization JSON-safe #728

Open
SailorJoe6 wants to merge 3 commits into SWE-agent:main from SailorJoe6:bugfix/727-trajectory-serialization

Conversation

@SailorJoe6
Contributor

Fixes #727

Summary

When a tool‑calling model emits an invalid tool call, the resulting FormatError can include non‑JSON‑serializable data, which causes agent.save() to fail and prevents the trajectory file from being written. This PR makes the error payload JSON‑safe and hardens serialization so trajectories are always persisted.

Changes

  • Wrap FormatError in LitellmModel with a JSON‑safe debug payload that preserves raw assistant content/tool_calls and response metadata.
  • Add to_jsonable utility and use it in DefaultAgent.save() to guarantee JSON serialization of trajectories.
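
The utility itself isn't shown in this thread. As a rough sketch (the name `to_jsonable` comes from the PR description, but the fallback order here is an assumption, not the actual implementation), it could look like:

```python
# Sketch of a to_jsonable utility; the fallback order
# (native types -> collections -> model_dump -> dataclass -> repr)
# is an assumption, not the PR's exact implementation.
import dataclasses
import json
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """Recursively coerce obj into something json.dumps can always handle."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_jsonable(v) for v in obj]
    if hasattr(obj, "model_dump"):  # pydantic-style models, litellm responses
        try:
            return to_jsonable(obj.model_dump())
        except Exception:
            pass  # fall through to the lossy fallback
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return to_jsonable(dataclasses.asdict(obj))
    return repr(obj)  # last resort: lossy, but always serializable


# A set is not JSON-serializable, but its coerced form is.
payload = json.dumps(to_jsonable({"ids": {1, 2}}))
```

The last-resort `repr()` is what guarantees `json.dumps` never raises, at the cost of losing structure for unknown types.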

Why

Losing the trajectory on invalid tool calls breaks expected SWE‑bench behavior and makes debugging difficult. This ensures trajectories are written even on malformed tool calls.

@codecov

codecov Bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
src/minisweagent/utils/serialize.py    90.00%    2 Missing ⚠️

Files with missing lines                    Coverage Δ
src/minisweagent/agents/default.py          100.00% <100.00%> (ø)
src/minisweagent/models/litellm_model.py    93.18% <100.00%> (+0.49%) ⬆️
src/minisweagent/utils/serialize.py         94.44% <90.00%> (-5.56%) ⬇️

... and 1 file with indirect coverage changes


Contributor

Copilot AI left a comment


Pull request overview

This PR addresses issue #727 by making trajectory serialization JSON-safe when FormatError occurs due to invalid tool calls. The changes ensure that trajectories are always persisted, even when model responses contain malformed tool calls or non-JSON-serializable data.

Changes:

  • Added to_jsonable utility function to convert arbitrary Python objects to JSON-serializable forms with fallback strategies
  • Modified DefaultAgent.save() to apply to_jsonable before JSON serialization
  • Enhanced LitellmModel.query() to wrap FormatError with JSON-safe debug payloads containing assistant content, tool calls, and response metadata

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

  • src/minisweagent/utils/serialize.py: Adds to_jsonable utility with comprehensive type handling and fallback mechanisms for non-JSON-serializable objects
  • src/minisweagent/agents/default.py: Updates save() method to apply the to_jsonable transformation before JSON serialization
  • src/minisweagent/models/litellm_model.py: Wraps FormatError exceptions with JSON-safe debug messages containing raw response data
  • tests/utils/test_serialize.py: Comprehensive unit tests for to_jsonable covering basic types, collections, model_dump, and fallback scenarios

Comment thread on tests/utils/test_serialize.py
@klieret
Member

klieret commented Feb 5, 2026

Thanks for raising this. I'm surprised that FormatError would include any non-serializable content. It should be a very simple text message that doesn't take direct model input. Could it be that adding the full response to the data is what caused this?

@klieret
Member

klieret commented Feb 5, 2026

I feel like for this, adding a new kwarg response=response.model_dump() to parse_toolcall_actions and then adding that to the extra field when we raise the FormatError would be more precise.

@klieret
Member

klieret commented Feb 5, 2026

If there are any other serialization issues, I'd be super interested in squashing them. But I don't think a blanket "try to coerce to JSON" approach is the best way; such failures would point more towards other bugs.

@SailorJoe6
Contributor Author

SailorJoe6 commented Feb 5, 2026

I've been using mini-swe-agent to test OSS models hosted locally on a DGX Spark via vLLM, running them against SWE-bench Multilingual as a first set of tests. I've also been trying a few different configs, adjusting the system prompt, etc. Sometimes the LLM hallucinates badly enough to emit malformed JSON as the tool call response. That causes a hard exception, and the FormatError ends up as a textual representation of the exception name in the trajectory, rather than a safely JSON-serialized depiction of what the model tried to do. This code was meant to ensure that no matter what the model tries to do, it's serialized to JSON safely.

TL;DR: it was the most robust fix I could think of that actually lets you see what the model spit out.
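
The failure mode described here can be illustrated with a small, hypothetical helper (parse_tool_args is not mini-swe-agent code, just a sketch of capturing malformed arguments instead of letting the parse error escape):

```python
import json


def parse_tool_args(raw_arguments: str) -> dict:
    """Parse tool-call arguments, preserving the raw text on failure."""
    try:
        return {"ok": True, "args": json.loads(raw_arguments)}
    except json.JSONDecodeError as exc:
        # Keep the raw string so the trajectory shows what the model emitted.
        return {"ok": False, "raw": raw_arguments, "error": str(exc)}
```

The point is that the model's raw output survives into the log either way, instead of being replaced by an exception name.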

@SailorJoe6
Contributor Author

SailorJoe6 commented Feb 5, 2026

I do agree that a targeted fix in the tool call parsing path would be better. I'm happy to update this to attach response.model_dump() on FormatError, rather than the raw response object. Is that what you had in mind?

That said, do you think it’s unreasonable to make save() robust against serialization failures? Trajectories are logging artifacts, and we want them to survive even when the model misbehaves. How would you feel about keeping a minimal, safe serialization guard in the error wrapper to ensure we get logs?
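
One minimal guard of the kind proposed here could lean on json.dumps's own fallback hook (save_trajectory is a hypothetical stand-in for DefaultAgent.save(), not the actual method):

```python
import json
from pathlib import Path


def save_trajectory(path: Path, trajectory: dict) -> None:
    """Write the trajectory, stringifying anything json can't handle.

    default=str means json.dumps never raises TypeError on an
    unserializable object, so the log file is always written.
    """
    path.write_text(json.dumps(trajectory, indent=2, default=str))
```

This is less informative than a structured to_jsonable pass, but it is a one-line change that guarantees the file exists.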

@SailorJoe6
Contributor Author

One more comment for you to consider when you decide how we should handle this. The model sometimes returns more than one tool call. I've often seen it return 3 tool calls in one response. If any one of these has malformed JSON, it still throws the error that causes the framework to ignore the good tool calls and only return the ToolCallError response to the LLM.

My patch does not address that. Probably a separate issue.

@klieret
Member

klieret commented Feb 6, 2026

Thanks for all the input!

Let me get back to you on the other comments soon, but regarding this:

One more comment for you to consider when you decide how we should handle this. The model sometimes returns more than one tool call. I've often seen it return 3 tool calls in one response. If any one of these has malformed JSON, it still throws the error that causes the framework to ignore the good tool calls and only return the ToolCallError response to the LLM.

I think this is the best we can do. If the model issues toolcalls A B C and A is already broken format, I think it's better to just ask it to reconsider the whole thing. Else, if B and C were depending on A to run and we skip A but then execute B and C, I think this would be even more problematic (theoretically it should only issue separate toolcalls for independent things, but that's also not guaranteed).

@SailorJoe6
Contributor Author

I think this is the best we can do. If the model issues toolcalls A B C and A is already broken format, I think it's better to just ask it to reconsider the whole thing. Else, if B and C were depending on A to run and we skip A but then execute B and C, I think this would be even more problematic (theoretically it should only issue separate toolcalls for independent things, but that's also not guaranteed).

Makes sense!


Development

Successfully merging this pull request may close these issues.

Invalid tool call FormatError sometimes breaks trajectory serialization
