
FEAT: Add CategoricalHarmfulQA (CatQA) dataset loader#1749

Open
romanlutz wants to merge 2 commits into microsoft:main from romanlutz:romanlutz/categorical-harmfulqa-review

Conversation

@romanlutz
Contributor

Description

Adds a loader for declare-lab/CategoricalHarmfulQA (CatQA), a 550-question safety evaluation dataset hand-authored against the combined prohibited-use lists from OpenAI's usage policies and Meta's Llama2 acceptable use policy.

CatQA complements the existing harmful_qa loader rather than duplicating it:

  • Real harm taxonomy. The 11 main categories (Illegal Activity, Child Abuse, Hate/Harass/Violence, Malware Viruses, Physical Harm, Economic Harm, Fraud/Deception, Adult Content, Political Campaigning, Privacy Violation Activity, Tailored Financial Advice) map to harm_categories, and each has 5 sub-categories surfaced via per-prompt metadata. By contrast, HarmfulQA's "topics" (Social Sciences, Computer Science, ...) are academic disciplines, not harm categories.
  • Multilingual. The same prompts are available in English, Chinese, and Vietnamese splits, selected via language ("en", "zh", or "vi"; default "en"). HarmfulQA is English-only.
  • Different construction methodology. CatQA is hand-authored against published policy lists, whereas HarmfulQA is auto-generated with a Chain of Utterances approach.
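For reviewers who want a concrete picture of the mapping described in the bullets above, here is a minimal, self-contained sketch. It is not the code in this PR: `PromptRecord`, `rows_to_prompts`, and the row keys `Question`, `Category`, and `Subcategory` are hypothetical names chosen for illustration, and mapping an empty category to an empty `harm_categories` list is an assumption about what "empty-category handling" means.

```python
from dataclasses import dataclass, field

# Language splits named in the PR description.
SUPPORTED_LANGUAGES = ("en", "zh", "vi")


@dataclass
class PromptRecord:
    """Hypothetical stand-in for a loaded prompt (not a PyRIT class)."""

    value: str
    harm_categories: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


def rows_to_prompts(rows, language="en"):
    """Map raw rows to prompt records.

    Assumed row keys (illustrative only): "Question", "Category", "Subcategory".
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; expected one of {SUPPORTED_LANGUAGES}"
        )

    prompts = []
    for row in rows:
        category = (row.get("Category") or "").strip()
        subcategory = (row.get("Subcategory") or "").strip()

        # The sub-category travels as per-prompt metadata, not as a harm category.
        metadata = {"language": language}
        if subcategory:
            metadata["subcategory"] = subcategory

        prompts.append(
            PromptRecord(
                value=row["Question"],
                # Assumption: an empty category yields an empty list.
                harm_categories=[category] if category else [],
                metadata=metadata,
            )
        )
    return prompts


# Usage with placeholder rows (no real dataset content).
example_rows = [
    {"Question": "placeholder question", "Category": "Economic Harm",
     "Subcategory": "placeholder sub-category"},
    {"Question": "another placeholder", "Category": "", "Subcategory": None},
]
example_prompts = rows_to_prompts(example_rows, language="vi")
```

Keeping the main category in `harm_categories` and the sub-category in metadata mirrors the split described above, so filtering by category stays coarse while the finer label remains available per prompt.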

The loader follows the same patterns as the existing remote dataset providers (_HarmfulQADataset, _AyaRedteamingDataset, etc.). It inherits from _RemoteDatasetLoader, fetches via the datasets library, exposes class-level metadata (tags={"safety","multilingual"}, size="large", modalities=["text"], full harm_categories list) for filterable discovery, and registers automatically through the __init_subclass__ hook.
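The automatic registration mentioned above relies on Python's `__init_subclass__` hook. The sketch below shows the general pattern only; `LoaderBaseSketch`, `CatQASketch`, `registry`, and `find_by_tag` are hypothetical names, and the real `_RemoteDatasetLoader` may register and filter differently.

```python
class LoaderBaseSketch:
    """Hypothetical base class illustrating registration via __init_subclass__."""

    registry: dict[str, type] = {}  # shared across all subclasses
    dataset_name: str = ""
    tags: frozenset = frozenset()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once when each subclass is defined; no explicit register() call needed.
        if cls.dataset_name:
            LoaderBaseSketch.registry[cls.dataset_name] = cls


class CatQASketch(LoaderBaseSketch):
    # Class-level metadata, using values from the PR description.
    dataset_name = "categorical_harmful_qa"
    tags = frozenset({"safety", "multilingual"})
    size = "large"
    modalities = ("text",)


def find_by_tag(tag):
    """Return the names of registered loaders carrying the given tag."""
    return sorted(
        name for name, cls in LoaderBaseSketch.registry.items() if tag in cls.tags
    )


multilingual_loaders = find_by_tag("multilingual")
```

Because the metadata lives on the class rather than on instances, discovery can filter loaders without instantiating them or fetching any data.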

Tests and Documentation

  • New unit tests in tests/unit/datasets/test_categorical_harmful_qa_dataset.py cover the default English split, all three language splits (parametrized), empty-category handling, and dataset_name. All 6 pass, and the full tests/unit/datasets tier (428 tests) is also green.
  • Added the bhardwaj2024homer bibliography entry (arXiv 2402.11746) and listed CatQA in doc/code/datasets/1_loading_datasets.{py,ipynb}.
  • Re-executed 1_loading_datasets.ipynb via jupytext --to ipynb --execute so the printed dataset roster includes categorical_harmful_qa.
  • pre-commit (ruff-format, ruff-check, ty, nbstripout, link-checker) is clean.

romanlutz and others added 2 commits May 18, 2026 06:35
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
