Skip to content

Support messages with images in prepare_multimodal_messages#5474

Merged
albertvillanova merged 10 commits intohuggingface:mainfrom
albertvillanova:support-messages-with-images-prepare_multimodal_messages
Apr 16, 2026
Merged

Support messages with images in prepare_multimodal_messages#5474
albertvillanova merged 10 commits intohuggingface:mainfrom
albertvillanova:support-messages-with-images-prepare_multimodal_messages

Conversation

@albertvillanova
Copy link
Copy Markdown
Member

@albertvillanova albertvillanova commented Apr 8, 2026

Support messages with images in prepare_multimodal_messages.

This PR enhances the handling of multimodal messages by ensuring that existing image payloads are preserved and only unfilled placeholders are populated, preventing accidental overwrites. Additionally, the test suite is expanded to cover this behavior, and prompt processing is streamlined in the trainer.

See related comment in:

CC: @sergiopaniego

Changes

Multimodal message handling improvements:

  • Updated prepare_multimodal_messages to preserve existing image payloads in image blocks and only fill placeholders without an "image" key.

Testing enhancements:

  • Added a new test, test_prepared_image_blocks_without_new_images, to verify that existing image payloads are not overwritten when no new images are provided.

Trainer integration:

  • Simplified prompt normalization for VLMs by directly using prepare_multimodal_messages, ensuring consistent handling of multimodal content during tokenization.

Note

Low Risk
Small, localized change to multimodal message preparation and a trainer call-site refactor; main risk is subtle behavioral differences in placeholder counting for edge-case message formats.

Overview
prepare_multimodal_messages now preserves existing image blocks that already carry an "image" payload, and only counts/fills unfilled {"type": "image"} placeholders from the images argument (avoiding accidental overwrites and placeholder-count mismatches).

Adds a regression test ensuring prepared image payloads remain intact when images is omitted, and simplifies GRPO trainer VLM prompt normalization by delegating to prepare_multimodal_messages instead of ad-hoc string wrapping.

Reviewed by Cursor Bugbot for commit 24bcdd1. Bugbot is set up for automated code reviews on this repo. Configure here.

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 414deb9. Configure here.

Comment thread tests/test_data_utils.py
Comment thread trl/trainer/grpo_trainer.py
Copy link
Copy Markdown
Member

@sergiopaniego sergiopaniego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @albertvillanova!
I think that once the merge conflicts are resolved and the problem raised by Cursor in grpo_trainer.py is solved, we're good to go 😄

@albertvillanova
Copy link
Copy Markdown
Member Author

This PR requires (see #5474 (comment)):

Copy link
Copy Markdown
Member

@qgallouedec qgallouedec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good!

@qgallouedec
Copy link
Copy Markdown
Member

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@albertvillanova albertvillanova force-pushed the support-messages-with-images-prepare_multimodal_messages branch from f083467 to 24bcdd1 Compare April 16, 2026 06:07
@albertvillanova albertvillanova merged commit dc84e41 into huggingface:main Apr 16, 2026
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants