FEAT: Add DangerousQA dataset loader#1751
Open
romanlutz wants to merge 3 commits into
Adds a remote seed dataset loader for the DangerousQA dataset from Shaikh et al. (2022), 'On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning'. The dataset contains ~200 harmful questions spanning racist, stereotypical, sexist, illegal, toxic, and harmful categories and is widely used as a baseline in Bhardwaj & Poria's Red-Eval (2023) benchmark. The source JSON at https://github.com/SALT-NLP/chain-of-thought-bias is a flat list of strings, so the loader handles fetch and on-disk caching directly while still reusing the base class's cache-key helper.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves conflict in doc/bibliography.md hidden-citations list against microsoft#1747 (DOC: Correct citations) by keeping main's citation-key renames and re-adding @shaikh2022second in alphabetical order.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Adds a remote seed dataset loader for DangerousQA (Shaikh et al., 2022, "On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning", arXiv:2212.08061). The dataset consists of ~200 harmful questions generated from a single seed prompt and is widely reused as a baseline in subsequent red-teaming work, e.g., Bhardwaj & Poria's Red-Eval (2023).
Approach
- `_DangerousQADataset(_RemoteDatasetLoader)` in `pyrit/datasets/seed_datasets/remote/dangerous_qa_dataset.py`, registered in the package `__init__`.
- The source JSON is a flat `list[str]` rather than the `list[dict]` shape `_fetch_from_url` expects. To avoid touching the shared base class for a one-off shape, the loader fetches and caches questions itself via small private helpers (`_fetch_questions` / `_load_raw_questions`) while still reusing `_get_cache_file_name` and the JSON read/write helpers. Each string is wrapped as `{"question": s}` on disk so cache I/O stays compatible.
- The source is pinned to commit `445568d3b73f81a9054f51c739172186d5648157` of `SALT-NLP/chain-of-thought-bias` for reproducibility, matching how HarmBench pins its source.
- `harm_categories` is intentionally left empty on every `SeedPrompt` and is not set at the class level. The paper describes the dataset as covering racist/stereotypical/sexist/illegal/toxic/harmful content, but those labels apply in aggregate: the source JSON has no per-item categorisation, so any class-level list would mis-label individual prompts. The docstring and the `description` field document this explicitly.

Tests and Documentation
- `tests/unit/datasets/test_dangerous_qa_dataset.py` (9 tests) covers fetch behaviour, the cache flag, dataset name, the pinned-commit default source, class-level metadata (tags/size/modalities), and error paths for HTTP failure, non-list payloads, and non-string items.
- Added `@shaikh2022second` to `doc/references.bib` (next to other dataset citations) and to the hidden-citations list in `doc/bibliography.md`.
- Ran `uv run ruff format`, `uv run ruff check`, `uv run ty check`, `uv run pytest tests/unit/datasets/test_dangerous_qa_dataset.py -v` (9 passed), and the full `uv run pytest tests/unit/datasets -v` (no regressions). Also smoke-tested a live fetch against the pinned URL: it returns 200 seeds and the cache round-trip is stable. No notebooks were added, so no JupyText run was needed.
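
For reviewers who want a feel for the wrap-and-cache idea described under Approach, here is a minimal standalone sketch. It is not the code in this PR: `wrap_questions` and `load_questions` are hypothetical names, the raw JSON is passed in as a string instead of being fetched over HTTP, and raising `ValueError` on bad payloads is an assumption about how the error paths might look.

```python
import json
import tempfile
from pathlib import Path


def wrap_questions(payload: object) -> list[dict[str, str]]:
    """Validate a decoded JSON payload and wrap each string as {"question": s}.

    Hypothetical helper: the exception type is an assumption, not PyRIT's API.
    """
    if not isinstance(payload, list):
        raise ValueError(f"Expected a JSON list, got {type(payload).__name__}")
    wrapped: list[dict[str, str]] = []
    for index, item in enumerate(payload):
        if not isinstance(item, str):
            raise ValueError(f"Item {index} is {type(item).__name__}, expected str")
        wrapped.append({"question": item})
    return wrapped


def load_questions(raw_json: str, cache_file: Path) -> list[dict[str, str]]:
    """Return wrapped records, reading from cache_file when it already exists.

    Hypothetical helper: the real loader fetches from the pinned URL and uses
    the base class's cache-key and JSON helpers instead.
    """
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    records = wrap_questions(json.loads(raw_json))
    cache_file.write_text(json.dumps(records), encoding="utf-8")
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        cache = Path(tmp_dir) / "dangerous_qa_sketch.json"
        # Placeholder strings stand in for the dataset's questions.
        print(load_questions('["placeholder one", "placeholder two"]', cache))
        # Second call ignores the new payload and reads the cached records.
        print(load_questions('["not used"]', cache))
```

Storing `list[dict]` records on disk, even though the upstream payload is `list[str]`, is what lets a cache file written this way be read back by helpers that expect dictionaries.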