FEAT: Add DangerousQA dataset loader #1751

Open
romanlutz wants to merge 3 commits into microsoft:main from romanlutz:romanlutz/add-dangerous-qa-dataset
@romanlutz
Contributor

Description

Adds a remote seed dataset loader for DangerousQA (Shaikh et al., 2022 — On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning, arXiv:2212.08061). The dataset is ~200 harmful questions generated from a single seed prompt and is widely reused as a baseline in subsequent red-teaming work, e.g., Bhardwaj & Poria's Red-Eval (2023).

Approach

  • New class _DangerousQADataset(_RemoteDatasetLoader) in pyrit/datasets/seed_datasets/remote/dangerous_qa_dataset.py, registered in the package __init__.
  • The source JSON is a flat list[str] rather than the list[dict] shape _fetch_from_url expects. To avoid changing the shared base class for a one-off shape, the loader fetches and caches questions itself via two small private helpers (_fetch_questions and _load_raw_questions), while still reusing _get_cache_file_name and the JSON read/write helpers. Each string is wrapped as {"question": s} on disk so cache I/O stays compatible.
  • Pinned to commit 445568d3b73f81a9054f51c739172186d5648157 of SALT-NLP/chain-of-thought-bias for reproducibility, matching how HarmBench pins its source.
  • harm_categories is intentionally left empty on every SeedPrompt and is not set at the class level. The paper describes the dataset as covering racist/stereotypical/sexist/illegal/toxic/harmful content, but those labels apply in aggregate — the source JSON has no per-item categorisation, so any class-level list would mis-label individual prompts. The docstring and the description field document this explicitly.
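
The wrapping step described above can be illustrated with a minimal, standalone sketch. It is not the code in this PR: the function names `validate_questions`, `wrap_questions`, `unwrap_records`, and `cache_round_trip` are hypothetical, and it uses only the standard library rather than the PyRIT base-class helpers. It assumes only what the description states, namely that the payload is a flat list of strings and that each string is stored on disk as `{"question": s}`.

```python
import json
import tempfile
from pathlib import Path


def validate_questions(payload: object) -> list[str]:
    """Reject anything that is not a flat list of strings (hypothetical helper)."""
    if not isinstance(payload, list):
        raise ValueError(f"Expected a JSON list, got {type(payload).__name__}")
    for index, item in enumerate(payload):
        if not isinstance(item, str):
            raise ValueError(f"Item {index} is {type(item).__name__}, expected str")
    return list(payload)


def wrap_questions(questions: list[str]) -> list[dict[str, str]]:
    """Wrap each string as {"question": s} so the cache holds a list of dicts."""
    return [{"question": question} for question in questions]


def unwrap_records(records: list[dict[str, str]]) -> list[str]:
    """Recover the flat list of questions from cached records."""
    return [record["question"] for record in records]


def cache_round_trip(questions: list[str], cache_file: Path) -> list[str]:
    """Write wrapped records to disk, then read them back as plain strings."""
    cache_file.write_text(json.dumps(wrap_questions(questions)), encoding="utf-8")
    return unwrap_records(json.loads(cache_file.read_text(encoding="utf-8")))


if __name__ == "__main__":
    sample = validate_questions(["placeholder question 1", "placeholder question 2"])
    with tempfile.TemporaryDirectory() as tmp_dir:
        restored = cache_round_trip(sample, Path(tmp_dir) / "cache.json")
    print(restored == sample)
```

Keeping the on-disk shape as a list of dicts is what lets a list[dict]-oriented reader and writer handle the file unchanged; the unwrapping happens only at the loader boundary.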

Tests and Documentation

  • New unit tests at tests/unit/datasets/test_dangerous_qa_dataset.py (9 tests) cover fetch behaviour, the cache flag, dataset name, the pinned-commit default source, class-level metadata (tags/size/modalities), and error paths for HTTP failure, non-list payloads, and non-string items.
  • Documentation: added @shaikh2022second to doc/references.bib (next to other dataset citations) and to the hidden-citations list in doc/bibliography.md.
  • Verified locally: uv run ruff format, uv run ruff check, uv run ty check, uv run pytest tests/unit/datasets/test_dangerous_qa_dataset.py -v (9 passed), full uv run pytest tests/unit/datasets -v (no regressions). Also smoke-tested a live fetch against the pinned URL — returns 200 seeds and cache round-trip is stable. No notebooks added, so no JupyText run needed.
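
For readers who want a feel for the error paths listed above, here is a rough, self-contained sketch of how such cases could be exercised. It is not the test file from this PR: `load_questions` is a hypothetical stand-in for the loader, the fetch step is injected as a plain callable instead of patching the real HTTP call, and it uses the standard-library `unittest` module so it runs without extra dependencies (the actual tests run under pytest).

```python
import unittest


def load_questions(fetch_json):
    """Hypothetical loader: fetch_json is any callable returning parsed JSON."""
    payload = fetch_json()
    if not isinstance(payload, list):
        raise ValueError("expected a JSON list of questions")
    if not all(isinstance(item, str) for item in payload):
        raise ValueError("expected every question to be a string")
    return payload


class LoadQuestionsErrorPaths(unittest.TestCase):
    def test_returns_questions_for_flat_string_list(self):
        self.assertEqual(load_questions(lambda: ["q1", "q2"]), ["q1", "q2"])

    def test_rejects_non_list_payload(self):
        with self.assertRaises(ValueError):
            load_questions(lambda: {"question": "q1"})

    def test_rejects_non_string_item(self):
        with self.assertRaises(ValueError):
            load_questions(lambda: ["q1", 2])

    def test_propagates_fetch_failure(self):
        def failing_fetch():
            # Simulates the HTTP failure case without touching the network.
            raise ConnectionError("simulated HTTP failure")

        with self.assertRaises(ConnectionError):
            load_questions(failing_fetch)


if __name__ == "__main__":
    unittest.main()
```

Injecting the fetch callable keeps each case offline and deterministic, which is the same property the cache-flag and error-path tests need.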

romanlutz and others added 3 commits May 18, 2026 12:06
Adds a remote seed dataset loader for the DangerousQA dataset from Shaikh et al. (2022), 'On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning'. The dataset contains ~200 harmful questions spanning racist, stereotypical, sexist, illegal, toxic, and harmful categories and is widely used as a baseline in Bhardwaj & Poria's Red-Eval (2023) benchmark.

The source JSON at https://github.com/SALT-NLP/chain-of-thought-bias is a flat list of strings, so the loader handles fetch and on-disk caching directly while still reusing the base class's cache-key helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves conflict in doc/bibliography.md hidden-citations list against microsoft#1747 (DOC: Correct citations) by keeping main's citation-key renames and re-adding @shaikh2022second in alphabetical order.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>