FEAT: Add HiXSTest (Hindi exaggerated-safety) dataset loader by romanlutz · Pull Request #1755 · microsoft/PyRIT

romanlutz · 2026-05-18T23:30:24Z

Description

Adds a remote seed dataset loader for HiXSTest (walledai/HiXSTest), the Hindi companion to SGXSTest from the WalledEval paper. HiXSTest is a manually curated set of 50 exaggerated-safety prompts in Hindi, each paired with an English translation, a safe/unsafe label, and a polysemous Hindi trigger word used as the category (e.g. मारना can mean both "to kill" and "to overcome"). It lets us measure whether models over-refuse benign Hindi prompts that happen to share surface tokens with harmful ones, in a way English-only XSTest variants can't capture.

Approach

New _HiXSTestDataset(_RemoteDatasetLoader) in pyrit/datasets/seed_datasets/remote/hixstest_dataset.py (dataset_name="hixstest", single train split, 50 rows).
The dataset is gated (gated="auto") on HuggingFace, so users must accept the terms on the dataset page and provide a token. Token handling mirrors sorry_bench_dataset.py: explicit token= kwarg, falling back to the HUGGINGFACE_TOKEN environment variable. Gating is documented in the class docstring.
A HiXSTestLanguage enum (HINDI, ENGLISH) controls which text becomes the SeedPrompt.value. The other-language text is always preserved in metadata as hindi_prompt / english_prompt, so consumers can switch without re-fetching. Validation goes through the existing _validate_enum helper, matching the VLGuard pattern.
category is mirrored into both harm_categories and metadata["category"]. label is stored as metadata so callers can filter safe vs unsafe.
Class-level metadata: modalities=["text"], size="small" (50 prompts), tags={"default", "safety", "multilingual"}.
Exported _HiXSTestDataset and HiXSTestLanguage from pyrit/datasets/seed_datasets/remote/__init__.py.
Added the WalledEval paper (@gupta2024walledeval, arXiv:2408.03837) to doc/references.bib and doc/bibliography.md.
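
The token fallback, language switch, and metadata layout described in the list above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: the `Language` enum, `resolve_token`, `row_to_seed`, the plain-dict return shape, and the row keys `hindi` and `english` are all hypothetical stand-ins, and the actual HuggingFace column names may differ.

```python
import os
from enum import Enum
from typing import Optional


class Language(Enum):
    """Hypothetical stand-in for HiXSTestLanguage."""

    HINDI = "hindi"
    ENGLISH = "english"


def resolve_token(token: Optional[str] = None) -> Optional[str]:
    """Return the explicit token if given, else the HUGGINGFACE_TOKEN env var."""
    if token is not None:
        return token
    return os.environ.get("HUGGINGFACE_TOKEN")


def row_to_seed(row: dict, language: Language = Language.HINDI) -> dict:
    """Map one dataset row to a seed-like dict (row keys are assumed names)."""
    if not isinstance(language, Language):
        raise ValueError(f"language must be a Language member, got {language!r}")

    hindi = row["hindi"]      # assumed column name
    english = row["english"]  # assumed column name

    return {
        # The selected language becomes the prompt value.
        "value": hindi if language is Language.HINDI else english,
        # The category is mirrored into harm_categories and metadata.
        "harm_categories": [row["category"]],
        "metadata": {
            # Both texts are kept so callers can switch without re-fetching.
            "hindi_prompt": hindi,
            "english_prompt": english,
            "label": row["label"],
            "category": row["category"],
            "language": language.value,
        },
    }
```

With this shape, a caller could filter on `seed["metadata"]["label"]` to separate safe from unsafe prompts, or read `english_prompt` from a Hindi-mode seed without loading the dataset a second time.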

Tests and Documentation

New tests/unit/datasets/test_hixstest_dataset.py (8 tests): dataset_name, default-Hindi language, non-enum rejection, token-from-env, explicit-token override, Hindi-mode fetch (asserts Hindi value + both translations in metadata), English-mode fetch (asserts English value + Hindi preserved in metadata), and that token/split/cache are forwarded to _fetch_from_huggingface.
Live-verified end-to-end against the real HuggingFace dataset in both language modes: all 50 prompts load, value switches correctly, metadata stays consistent.
uv run ruff format, uv run ruff check, uv run ty check, and the full tests/unit/datasets suite (430 tests) all pass.
The class docstring documents gating, token handling, the Hindi/English language modes, and the metadata layout. The bibliography update covers the citation. No notebook changes, so no JupyText run was needed.
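
For readers unfamiliar with this testing style, the token and forwarding checks listed above might look roughly like the sketch below. It is an assumption-laden illustration rather than the contents of test_hixstest_dataset.py: `SketchLoader`, its `_fetch` method, and the `token`/`split`/`cache` keyword names are hypothetical and do not reflect the real `_fetch_from_huggingface` signature.

```python
import os
from unittest.mock import MagicMock, patch


class SketchLoader:
    """Hypothetical stand-in for the real loader; not PyRIT's API."""

    def __init__(self, token=None, split="train"):
        # Explicit token wins; otherwise fall back to the environment.
        self.token = token if token is not None else os.environ.get("HUGGINGFACE_TOKEN")
        self.split = split

    def _fetch(self, *, token, split, cache):
        # The real loader would hit the network here; unit tests replace this.
        raise RuntimeError("network access is not expected in unit tests")

    def fetch(self, cache=True):
        return self._fetch(token=self.token, split=self.split, cache=cache)


def test_token_from_env():
    with patch.dict(os.environ, {"HUGGINGFACE_TOKEN": "env-token"}):
        assert SketchLoader().token == "env-token"


def test_explicit_token_overrides_env():
    with patch.dict(os.environ, {"HUGGINGFACE_TOKEN": "env-token"}):
        assert SketchLoader(token="explicit").token == "explicit"


def test_arguments_are_forwarded():
    loader = SketchLoader(token="t")
    # Replace the network call with a mock and inspect how it was invoked.
    with patch.object(loader, "_fetch", MagicMock(return_value=[])) as fake:
        assert loader.fetch(cache=False) == []
    fake.assert_called_once_with(token="t", split="train", cache=False)
```

Patching the environment with `patch.dict` keeps each test isolated from whatever token the developer's shell happens to export, and mocking the fetch call keeps the suite offline.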

Adds the _HiXSTestDataset remote loader for the walledai/HiXSTest HuggingFace dataset (50 Hindi exaggerated-safety prompts with English translations). The dataset is gated; the loader mirrors the SorryBench token-handling pattern (constructor argument with HUGGINGFACE_TOKEN env fallback). Hindi prompt is the SeedPrompt value; english_prompt, label, category, and language are stored in metadata. Adds the gupta2024walledeval citation to references.bib and bibliography.md, plus unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz and others added 2 commits May 18, 2026 07:07

Add language parameter to HiXSTest loader

368ec26

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz wants to merge 2 commits into microsoft:main from romanlutz:romanlutz/add-hixstest-dataset
