Make trajectory serialization JSON-safe#728
Make trajectory serialization JSON-safe#728SailorJoe6 wants to merge 3 commits intoSWE-agent:mainfrom
Conversation
Codecov Report❌ Patch coverage is
... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #727 by making trajectory serialization JSON-safe when FormatError occurs due to invalid tool calls. The changes ensure that trajectories are always persisted, even when model responses contain malformed tool calls or non-JSON-serializable data.
Changes:
- Added
to_jsonableutility function to convert arbitrary Python objects to JSON-serializable forms with fallback strategies - Modified
DefaultAgent.save()to applyto_jsonablebefore JSON serialization - Enhanced
LitellmModel.query()to wrap FormatError with JSON-safe debug payloads containing assistant content, tool calls, and response metadata
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/minisweagent/utils/serialize.py | Adds to_jsonable utility with comprehensive type handling and fallback mechanisms for non-JSON-serializable objects |
| src/minisweagent/agents/default.py | Updates save() method to apply to_jsonable transformation before JSON serialization |
| src/minisweagent/models/litellm_model.py | Wraps FormatError exceptions with JSON-safe debug messages containing raw response data |
| tests/utils/test_serialize.py | Comprehensive unit tests for to_jsonable covering basic types, collections, model_dump, and fallback scenarios |
|
Thanks for raising this. I'm surprised that FormatError would include any non-serializable content. It should be a very simple text message that doesn't take direct model input. Could it be that adding the full response to the data is what caused this? |
|
I feel like for this, adding a new kwarg |
|
If there's any other serialization issues, I'd be super interested in squashing them. But I don't think the blanket "try to coerce to json" is maybe the best way. It would point more towards other bugs |
|
I've been using mini-swe-agent to test OSS models hosted locally on a DGX Spark via vLLM, running them against SWE-Bench Multilingual as a first set of tests. I've also been trying a few different configs, adjusting the system prompt, etc. Sometimes the LLM hallucinates pretty badly leading to malformed json as the tool call response, this leads to a hard exception and the FormatError actually becomes a textual representation of the exception name in the trajectory, rather than a safely JSON serialized depiction of what the model tried to do. This code was meant to ensure that no matter what the model tries to do, it's serialized to JSON safely. TL/DR - it was the most robust fix I could think of that actually allows you to see what the model spit out. |
|
I do agree that a targeted fix in the tool call parsing path would be better. I'm happy to update this to attach response.model_dump() on FormatError, rather than the raw response object. Is that what you had in mind? That said, do you think it’s unreasonable to make save() robust against serialization failures? Trajectories are logging artifacts, and we want them to survive even when the model misbehaves. How would you feel about keeping a minimal, safe serialization guard in the error wrapper to ensure we get logs? |
|
One more comment for you to consider when you decide how we should handle this. The model sometimes returns more than one tool call. I've often seen it return 3 tool calls in one response. If any one of these has malformed JSON, it still throws the error that causes the framework to ignore the good tool calls and only return the ToolCallError response to the LLM. My patch does not address that. Probably a separate issue. |
|
Thanks for all the input! Let me get back to you for the other comments, soon, but regarding this:
I think this is the best we can do. If the model issues toolcalls A B C and A is already broken format, I think it's better to just ask it to reconsider the whole thing. Else, if B and C were depending on A to run and we skip A but then execute B and C, I think this would be even more problematic (theoretically it should only issue separate toolcalls for independent things, but that's also not guaranteed). |
Makes sense! |
Fixes #727
Summary
When a tool‑calling model emits an invalid tool call, the resulting FormatError can include non‑JSON‑serializable data, which causes agent.save() to fail and prevents the trajectory file from being written. This PR makes the error payload JSON‑safe and hardens serialization so trajectories are always persisted.
Changes
Why
Losing the trajectory on invalid tool calls breaks expected SWE‑bench behavior and makes debugging difficult. This ensures trajectories are written even on malformed tool calls.