Pinned
-
bio-overrefusal-v0.1 (Public)
Domain-expert-authored benchmark for LLM over-refusal on legitimate biology research queries.
Python
-
narrow-model-safety-eval (Public)
Empirical dual-use risk assessment of protein language models (ESM-2) and structure-based design tools (ProteinMPNN).
Python
-
constitutional-bioguard (Public)
Biological dual-use content classifier using the Constitutional Classifiers methodology: biosafety constitution, synthetic data pipeline, and DeBERTa-v3-base classifier.
Python
-
GeneLab_benchmark (Public)
GeneLab Spaceflight Transcriptomics Benchmark: NASA OSDR mouse bulk RNA-seq LOMO benchmark for AI/ML and foundation models.
Python
-
SpaceOmicsBench (Public)
A multi-omics AI benchmark for spaceflight biomedical data: 21 ML tasks across 9 modalities, plus a 100-question LLM evaluation (Inspiration4, NASA Twins, JAXA).
Python


