Support messages with images in prepare_multimodal_messages by albertvillanova · Pull Request #5474 · huggingface/trl

albertvillanova · 2026-04-08T05:53:14Z

Support messages with images in prepare_multimodal_messages.

This PR enhances the handling of multimodal messages by ensuring that existing image payloads are preserved and only unfilled placeholders are populated, preventing accidental overwrites. Additionally, the test suite is expanded to cover this behavior, and prompt processing is streamlined in the trainer.

See related comment in:

Support multimodal tool responses in environment_factory for VLM training #5323 (comment)

CC: @sergiopaniego

Changes

Multimodal message handling improvements:

Updated prepare_multimodal_messages to preserve existing image payloads in image blocks and only fill placeholders without an "image" key.

Testing enhancements:

Added a new test, test_prepared_image_blocks_without_new_images, to verify that existing image payloads are not overwritten when no new images are provided.

Trainer integration:

Simplified prompt normalization for VLMs by directly using prepare_multimodal_messages, ensuring consistent handling of multimodal content during tokenization.

Note

Low Risk
Small, localized change to multimodal message preparation and a trainer call-site refactor; main risk is subtle behavioral differences in placeholder counting for edge-case message formats.

Overview
prepare_multimodal_messages now preserves existing image blocks that already carry an "image" payload, and only counts/fills unfilled {"type": "image"} placeholders from the images argument (avoiding accidental overwrites and placeholder-count mismatches).

Adds a regression test ensuring prepared image payloads remain intact when images is omitted, and simplifies GRPO trainer VLM prompt normalization by delegating to prepare_multimodal_messages instead of ad-hoc string wrapping.

^{Reviewed by Cursor Bugbot for commit 24bcdd1. Bugbot is set up for automated code reviews on this repo. Configure here.}

HuggingFaceDocBuilderDev · 2026-04-08T05:55:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 414deb9. Configure here.}

sergiopaniego

Thanks for the update @albertvillanova!
I think that once the merge conflicts are resolved and the problem raised by Cursor in grpo_trainer.py is solved, we're good to go 😄

…th-images-prepare_multimodal_messages

albertvillanova · 2026-04-10T06:22:26Z

This PR requires (see #5474 (comment)):

Fix prepare_multimodal_messages not normalizing empty string content for assistant/tool roles #5496

…th-images-prepare_multimodal_messages

qgallouedec

I think it looks good!

qgallouedec · 2026-04-15T15:09:33Z

@codex review

chatgpt-codex-connector · 2026-04-15T15:14:34Z

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…th-images-prepare_multimodal_messages

albertvillanova added 6 commits April 8, 2026 07:46

Use prepare_multimodal_messages in GRPO _tokenize_prompts

07113e3

Count only unfilled image placeholders in prepare_multimodal_messages

cc7ce4c

Fix fill in of placeholders

3c7a573

Fill in only if images

3d8593d

Add unit test

6dd5e35

Update docstring

414deb9

cursor bot reviewed Apr 8, 2026

View reviewed changes

Comment thread tests/test_data_utils.py

Comment thread trl/trainer/grpo_trainer.py

sergiopaniego reviewed Apr 9, 2026

View reviewed changes

albertvillanova added 2 commits April 9, 2026 15:01

Merge remote-tracking branch 'upstream/main' into support-messages-wi…

521fc03

…th-images-prepare_multimodal_messages

Merge remote-tracking branch 'upstream/main' into support-messages-wi…

b283961

…th-images-prepare_multimodal_messages

albertvillanova mentioned this pull request Apr 10, 2026

Fix prepare_multimodal_messages not normalizing empty string content for assistant/tool roles #5496

Merged

Merge remote-tracking branch 'upstream/main' into support-messages-wi…

8850fb3

…th-images-prepare_multimodal_messages

albertvillanova mentioned this pull request Apr 13, 2026

Simplify role handling in prepare_multimodal_messages #5508

Merged

qgallouedec approved these changes Apr 15, 2026

View reviewed changes

Merge remote-tracking branch 'upstream/main' into support-messages-wi…

24bcdd1

…th-images-prepare_multimodal_messages

albertvillanova force-pushed the support-messages-with-images-prepare_multimodal_messages branch from f083467 to 24bcdd1 Compare April 16, 2026 06:07

albertvillanova merged commit dc84e41 into huggingface:main Apr 16, 2026
9 of 12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support messages with images in prepare_multimodal_messages#5474

Support messages with images in prepare_multimodal_messages#5474
albertvillanova merged 10 commits intohuggingface:mainfrom
albertvillanova:support-messages-with-images-prepare_multimodal_messages

albertvillanova commented Apr 8, 2026 •

edited by cursor bot

Loading

Uh oh!

HuggingFaceDocBuilderDev commented Apr 8, 2026

Uh oh!

cursor bot left a comment

Uh oh!

Uh oh!

Uh oh!

sergiopaniego left a comment

Uh oh!

albertvillanova commented Apr 10, 2026

Uh oh!

qgallouedec left a comment

Uh oh!

qgallouedec commented Apr 15, 2026

Uh oh!

chatgpt-codex-connector bot commented Apr 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

albertvillanova commented Apr 8, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Uh oh!

HuggingFaceDocBuilderDev commented Apr 8, 2026

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sergiopaniego left a comment

Choose a reason for hiding this comment

Uh oh!

albertvillanova commented Apr 10, 2026

Uh oh!

qgallouedec left a comment

Choose a reason for hiding this comment

Uh oh!

qgallouedec commented Apr 15, 2026

Uh oh!

chatgpt-codex-connector bot commented Apr 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

albertvillanova commented Apr 8, 2026 •

edited by cursor bot

Loading