FIX: Use sequence=0 for both pieces in multimodal dataset loaders by romanlutz · Pull Request #1756 · microsoft/PyRIT

romanlutz · 2026-05-19T02:13:40Z

Description

Four multimodal remote dataset loaders assigned sequence=0 to one piece (image or text) and sequence=1 to the other while sharing the same prompt_group_id. Per SeedPrompt.sequence (pyrit/models/seeds/seed_prompt.py:43-44), prompts are grouped into a single multimodal user message only when they share both prompt_group_id and sequence. With mismatched sequences, the image and text were delivered as two separate turns rather than as one multimodal message. That defeats the purpose of these datasets, where the model is supposed to reason over image + text together.

This PR brings the four affected loaders in line with the correct pattern already used by harmbench_multimodal_dataset.py and the recently added msts_dataset.py: both pieces share prompt_group_id and sequence=0.
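The grouping rule described above can be illustrated with a minimal sketch. It does not use PyRIT itself: the `Piece` dataclass, the `group_into_messages` helper, and the sample values are hypothetical stand-ins that only model the "same prompt_group_id and same sequence" rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Piece:
    """Hypothetical stand-in for a seed prompt; not the PyRIT SeedPrompt class."""

    value: str
    data_type: str
    prompt_group_id: UUID
    sequence: int


def group_into_messages(pieces):
    """Group pieces that share both prompt_group_id and sequence into one message."""
    messages = defaultdict(list)
    for piece in pieces:
        messages[(piece.prompt_group_id, piece.sequence)].append(piece)
    # Sort by group id, then sequence, so turns come out in order within a group.
    ordered_keys = sorted(messages, key=lambda key: (str(key[0]), key[1]))
    return [messages[key] for key in ordered_keys]


group_id = uuid4()

# Old behavior: mismatched sequences split the pair into two turns.
before = [
    Piece("Describe this image.", "text", group_id, 0),
    Piece("image.png", "image_path", group_id, 1),
]

# New behavior: a shared sequence keeps the pair in one multimodal message.
after = [
    Piece("Describe this image.", "text", group_id, 0),
    Piece("image.png", "image_path", group_id, 0),
]

print(len(group_into_messages(before)))  # 2
print(len(group_into_messages(after)))  # 1
```

In this model the only difference between the two inputs is the sequence value on the image piece, which is the same one-field change this PR makes in each loader.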

Loader changes (pyrit/datasets/seed_datasets/remote/):

vlguard_dataset.py - image sequence=1 -> 0.
vlsu_multimodal_dataset.py - image sequence=1 -> 0.
visual_leak_bench_dataset.py - text sequence=1 -> 0. Reworded class and fetch_dataset_async docstrings that described the old behavior.
comic_jailbreak_dataset.py - text sequence=1 -> 0. Reworded fetch_dataset_async and _build_seed_group docstrings. The SeedObjective in the group is unchanged - only the image+text pair needs to share sequence=0.

Tests and Documentation

Updated the four corresponding unit tests under tests/unit/datasets/ to assert the new shared sequence == 0 for both pieces (one assertion change per file).
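As a rough illustration of the shape of that assertion (not the actual test code in tests/unit/datasets/), the check can be sketched as follows. The helper name, the `SimpleNamespace` objects, and the `data_type` values are assumptions made for the example.

```python
from types import SimpleNamespace


def assert_pair_shares_sequence_zero(pieces):
    """Check that the image and text pieces of one group both use sequence 0."""
    pair = [p for p in pieces if p.data_type in ("text", "image_path")]
    assert len(pair) == 2, "expected one text piece and one image piece"
    assert len({p.prompt_group_id for p in pair}) == 1
    assert all(p.sequence == 0 for p in pair)


# Hypothetical loader output: plain namespaces, not PyRIT objects.
pieces = [
    SimpleNamespace(data_type="image_path", sequence=0, prompt_group_id="g1"),
    SimpleNamespace(data_type="text", sequence=0, prompt_group_id="g1"),
]
assert_pair_shares_sequence_zero(pieces)
```

Any other piece in the group, such as an objective, is ignored by this check, matching the note that only the image+text pair needs to share sequence=0.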

uv run ruff format pyrit tests - clean
uv run ruff check pyrit tests - clean
uv run -m ty check pyrit/datasets/seed_datasets/remote - clean
uv run pytest tests/unit/datasets -q - 422 passed
pre-commit hooks passed on commit

No JupyText/doc changes needed (no docs reference these sequence numbers).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix sequence numbers for multimodal dataset loaders

76c7c43


romanlutz wants to merge 1 commit into microsoft:main from romanlutz:romanlutz/fix-multimodal-sequence
