FEAT: Add SGXSTest dataset loader by romanlutz · Pull Request #1754 · microsoft/PyRIT

romanlutz · 2026-05-18T23:13:40Z

Description

Adds a remote seed dataset loader for SGXSTest (Singapore eXaggerated Safety Test), a 200-prompt benchmark of safe/unsafe prompt pairs that probes over-refusal behavior of LLMs in a Singaporean cultural context. It adapts the 10 categories of Roettger et al.'s XSTest (homonyms, figurative language, safe targets/contexts, definitions, discrimination, historical events, and privacy variants) to that context.

The dataset is HuggingFace-gated, so the loader mirrors the existing _SorryBenchDataset token pattern: the constructor accepts token: str | None = None and falls back to the HUGGINGFACE_TOKEN environment variable. The class docstring documents the gating requirement.
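The token fallback described above can be sketched roughly as follows. This is a minimal standalone illustration, not the PR's code: `TokenResolvingLoader` is a hypothetical stand-in class, and the choice to treat only `None` (rather than any empty string) as "no token given" is an assumption.

```python
import os
from typing import Optional


class TokenResolvingLoader:
    """Hypothetical stand-in for a gated-dataset loader (not PyRIT's class)."""

    def __init__(self, *, token: Optional[str] = None) -> None:
        # An explicitly passed token wins; otherwise fall back to the
        # HUGGINGFACE_TOKEN environment variable, which may be unset (None).
        if token is not None:
            self.token = token
        else:
            self.token = os.environ.get("HUGGINGFACE_TOKEN")
```

With this shape, callers in CI can rely on the environment variable while interactive users pass a token directly.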

Because consumers typically only want one side of the pair, the loader exposes an SGXSTestLabel enum (UNSAFE, SAFE, ALL) on the constructor and defaults to UNSAFE so red-teaming flows don't have to post-filter. The enum is validated via the base class's _validate_enum helper, matching the VLGuardSubset pattern. An empty result after filtering raises ValueError, matching _SorryBenchDataset. Per-prompt metadata["label"] and metadata["category"] are preserved so users can still slice the data after loading.
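A rough sketch of the label filtering, under stated assumptions: the enum member names come from this PR, but the string values (`"unsafe"`, `"safe"`, `"all"`), the row field names, and the `filter_by_label` helper are hypothetical and stand in for the loader's seed-construction logic.

```python
from enum import Enum
from typing import Any


class SGXSTestLabel(Enum):
    # Member names are from the PR; the string values are assumed.
    UNSAFE = "unsafe"
    SAFE = "safe"
    ALL = "all"


def filter_by_label(
    rows: list[dict[str, Any]],
    label: SGXSTestLabel = SGXSTestLabel.UNSAFE,
) -> list[dict[str, Any]]:
    """Hypothetical helper: keep rows whose "label" field matches the filter."""
    if not isinstance(label, SGXSTestLabel):
        raise ValueError(f"label must be an SGXSTestLabel, got {label!r}")

    if label is SGXSTestLabel.ALL:
        kept = list(rows)
    else:
        kept = [row for row in rows if row["label"] == label.value]

    # Mirror the described behavior: an empty result is an error, not a no-op.
    if not kept:
        raise ValueError("No prompts remain after filtering")
    return kept
```

Because each kept row retains its `"label"` and `"category"` fields, callers that request `ALL` can still slice the result afterwards.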

Verified live against the HuggingFace dataset: default returns 100 unsafe prompts, SAFE returns 100 safe, ALL returns the full 200. The class-level harm_categories list mirrors the 10 actual category values from the live data (lower-cased to match PyRIT's tag normalization).

Tests and Documentation

New unit tests in tests/unit/datasets/test_sgxstest_dataset.py cover the three filter modes, empty-after-filter raise, invalid-label raise, token + split forwarding, env-var fallback, explicit-token override, and dataset_name. All 9 tests pass; the broader tests/unit/datasets suite (431 tests) is green.
doc/references.bib and doc/bibliography.md get an entry for the WalledEval paper (@gupta2024walledeval) used in the loader's attribution.
No notebook changes, so no JupyText run needed.
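For illustration, the env-var fallback and explicit-token override cases could be tested along these lines. This is a sketch only: `resolve_token` is a hypothetical helper standing in for the constructor logic, and the real tests in tests/unit/datasets/test_sgxstest_dataset.py may be structured differently.

```python
import os
import unittest
from typing import Optional
from unittest.mock import patch


def resolve_token(token: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: explicit token first, then HUGGINGFACE_TOKEN."""
    return token if token is not None else os.environ.get("HUGGINGFACE_TOKEN")


class TestTokenResolution(unittest.TestCase):
    def test_env_var_fallback(self):
        # patch.dict restores os.environ when the block exits.
        with patch.dict(os.environ, {"HUGGINGFACE_TOKEN": "env-token"}):
            self.assertEqual(resolve_token(), "env-token")

    def test_explicit_token_override(self):
        with patch.dict(os.environ, {"HUGGINGFACE_TOKEN": "env-token"}):
            self.assertEqual(resolve_token("explicit"), "explicit")

    def test_no_token_available(self):
        # clear=True empties the environment for the duration of the block.
        with patch.dict(os.environ, {}, clear=True):
            self.assertIsNone(resolve_token())
```

Patching `os.environ` per test keeps the cases independent of whatever token the developer's shell happens to export.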

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds a loader for the walledai/SGXSTest dataset (200 prompts, Singaporean exaggerated-safety pairs from WalledEval), wires it into the remote dataset package, registers the WalledEval citation, and covers it with unit tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Live fetch revealed 10 distinct categories (e.g. 'Homonym', 'Privacy (fiction)', 'Real discrimination, nonsense group') rather than the 9 approximate names in the original spec. Updates the class-level harm_categories list, docstring, and test fixture to mirror the real data. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds an SGXSTestLabel enum (UNSAFE, SAFE, ALL) and a label constructor parameter to _SGXSTestDataset, defaulting to UNSAFE so red-teaming consumers get just the 100 truly harmful prompts. Filtering happens during seed construction; an empty result raises ValueError. The upstream dataset only publishes a 'train' split, so the split parameter is retained, with documentation noting that 'train' is the only available split. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Roman Lutz and others added 3 commits May 18, 2026 07:08

romanlutz wants to merge 3 commits into microsoft:main from romanlutz:romanlutz/add-sgxstest-dataset
