FEAT: Add HiXSTest (Hindi exaggerated-safety) dataset loader #1755

Open: romanlutz wants to merge 2 commits into microsoft:main from romanlutz:romanlutz/add-hixstest-dataset

Conversation

@romanlutz
Contributor

Description

Adds a remote seed dataset loader for HiXSTest (walledai/HiXSTest), the Hindi companion to SGXSTest from the WalledEval paper. HiXSTest is a manually curated set of 50 exaggerated-safety prompts in Hindi. Each prompt is paired with an English translation, a safe/unsafe label, and a polysemous Hindi trigger word that serves as its category (e.g. मारना is both "to kill" and "to overcome"). The dataset lets us measure whether models over-refuse benign Hindi prompts that happen to share surface tokens with harmful ones, which English-only XSTest variants can't capture.

Approach

  • New _HiXSTestDataset(_RemoteDatasetLoader) in pyrit/datasets/seed_datasets/remote/hixstest_dataset.py (dataset_name="hixstest", single train split, 50 rows).
  • The dataset is gated (gated="auto") on HuggingFace, so users must accept the terms on the dataset page and provide a token. Token handling mirrors sorry_bench_dataset.py: explicit token= kwarg, falling back to the HUGGINGFACE_TOKEN environment variable. Gating is documented in the class docstring.
  • A HiXSTestLanguage enum (HINDI, ENGLISH) controls which text becomes the SeedPrompt.value. The other-language text is always preserved in metadata as hindi_prompt / english_prompt, so consumers can switch without re-fetching. Validation goes through the existing _validate_enum helper, matching the VLGuard pattern.
  • category is mirrored into both harm_categories and metadata["category"]. label is stored as metadata so callers can filter safe vs unsafe.
  • Class-level metadata: modalities=["text"], size="small" (50 prompts), tags={"default", "safety", "multilingual"}.
  • Exported _HiXSTestDataset and HiXSTestLanguage from pyrit/datasets/seed_datasets/remote/__init__.py.
  • Added the WalledEval paper (@gupta2024walledeval, arXiv:2408.03837) to doc/references.bib and doc/bibliography.md.
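The language switch and metadata layout described in the bullets above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: `row_to_seed` is a hypothetical helper, the enum's string values and the row keys `hindi` / `english` are assumptions, and a plain dict stands in for `SeedPrompt`. Token handling and the HuggingFace fetch are left out.

```python
from enum import Enum
from typing import Any


class HiXSTestLanguage(Enum):
    # Member names come from the PR description; the string values are assumed.
    HINDI = "hindi"
    ENGLISH = "english"


def row_to_seed(
    row: dict[str, Any],
    language: HiXSTestLanguage = HiXSTestLanguage.HINDI,
) -> dict[str, Any]:
    """Map one dataset row to a plain dict standing in for a SeedPrompt (hypothetical helper)."""
    # Reject anything that is not an enum member, e.g. a bare string.
    if not isinstance(language, HiXSTestLanguage):
        raise ValueError(f"language must be a HiXSTestLanguage, got {language!r}")

    hindi, english = row["hindi"], row["english"]  # assumed column names
    value = hindi if language is HiXSTestLanguage.HINDI else english
    return {
        "value": value,
        # category is mirrored into harm_categories and metadata["category"].
        "harm_categories": [row["category"]],
        "metadata": {
            # Both translations are kept regardless of the selected language.
            "hindi_prompt": hindi,
            "english_prompt": english,
            "label": row["label"],
            "category": row["category"],
        },
    }


# Illustrative row with placeholder text, not taken from the dataset.
row = {"hindi": "<Hindi text>", "english": "<English text>", "label": "safe", "category": "मारना"}
seed_hi = row_to_seed(row)
seed_en = row_to_seed(row, HiXSTestLanguage.ENGLISH)
```

Because both translations travel in the metadata, a consumer holding `seed_hi` can read `seed_hi["metadata"]["english_prompt"]` without fetching again, and can filter on `metadata["label"]` to separate safe from unsafe prompts.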

Tests and Documentation

  • New tests/unit/datasets/test_hixstest_dataset.py (8 tests): dataset_name, default-Hindi language, non-enum rejection, token-from-env, explicit-token override, Hindi-mode fetch (asserts Hindi value + both translations in metadata), English-mode fetch (asserts English value + Hindi preserved in metadata), and that token/split/cache are forwarded to _fetch_from_huggingface.
  • Live-verified end-to-end against the real HuggingFace dataset in both language modes: all 50 prompts load, value switches correctly, metadata stays consistent.
  • uv run ruff format, uv run ruff check, uv run ty check, and the full tests/unit/datasets suite (430 tests) all pass.
  • The class docstring documents gating, token handling, the Hindi/English language modes, and the metadata layout. The bibliography update covers the citation. No notebook changes, so no JupyText run was needed.
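As a rough picture of the token-from-env and explicit-token-override tests listed above, the sketch below uses a hypothetical `StandInLoader` that reproduces only the described fallback (explicit `token=` first, then `HUGGINGFACE_TOKEN`). The real tests target `_HiXSTestDataset`, and their exact shape may differ.

```python
import os
from typing import Optional
from unittest.mock import patch


class StandInLoader:
    """Hypothetical stand-in that models only the token fallback described in the PR."""

    def __init__(self, token: Optional[str] = None) -> None:
        # Explicit argument wins; otherwise read the environment variable.
        self.token = token if token is not None else os.environ.get("HUGGINGFACE_TOKEN")


def test_token_from_env() -> None:
    # patch.dict restores os.environ when the block exits.
    with patch.dict(os.environ, {"HUGGINGFACE_TOKEN": "env-token"}):
        assert StandInLoader().token == "env-token"


def test_explicit_token_overrides_env() -> None:
    with patch.dict(os.environ, {"HUGGINGFACE_TOKEN": "env-token"}):
        assert StandInLoader(token="explicit-token").token == "explicit-token"
```

Patching `os.environ` inside each test keeps the tests independent of whatever token is set on the machine running them.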

romanlutz and others added 2 commits May 18, 2026 07:07
Adds the _HiXSTestDataset remote loader for the walledai/HiXSTest
HuggingFace dataset (50 Hindi exaggerated-safety prompts with English
translations). The dataset is gated; the loader mirrors the SorryBench
token-handling pattern (constructor argument with HUGGINGFACE_TOKEN env
fallback). Hindi prompt is the SeedPrompt value; english_prompt, label,
category, and language are stored in metadata. Adds the gupta2024walledeval
citation to references.bib and bibliography.md, plus unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>