The interaction control harness for customer-facing AI agents - optimized for building controlled, consistent, and predictable customer interactions with LLMs.
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
A curated list of trustworthy deep learning papers, updated daily.
📚 A curated list of papers & technical articles on AI Quality & Safety
Code accompanying the paper Pretraining Language Models with Human Preferences
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".
Reading list for adversarial perspective and robustness in deep reinforcement learning.
A curated list of awesome academic research, books, code of ethics, courses, databases, data sets, frameworks, institutes, maturity models, newsletters, principles, podcasts, regulations, reports, responsible scale policies, tools and standards related to Responsible, Trustworthy, and Human-Centered AI.
A curated list of awesome resources for Artificial Intelligence Alignment research
A rationalist ruleset for "debugging" LLMs, auditing their internal reasoning and uncovering biases; also a jailbreak.
Sparse probing paper full code.
Official Implementation of Nabla-GFlowNet (ICLR 2025)
[ICLR 2026] - Official repo for the paper: "RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models"
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Educational analysis of LLM alignment, safety behavior, and framing-sensitive response patterns.
[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Just like the elite potential of a high-drive Belgian Malinois, an AI system's raw capabilities are wasted when it is deployed without proper structure; the technological value no longer lies in creating the drive, but in mastering the leash. Synapptic gives your AI assistant persistent memory that updates in real time, saving you tokens and time.
An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.