    Repositories list

    • Official Codebase for Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders (EACL 2026)
      Python
      MIT License
      1300 Updated Feb 14, 2026
    • MedSafetyBench: Evaluating and Improving the Medical Safety of LLMs, NeurIPS 2024
      Python
      MIT License
      54500 Updated Dec 4, 2025
    • Codebase for Temporal SAEs paper
      Python
      Apache License 2.0
      21200 Updated Nov 14, 2025
    • SpLiCE
      Sparse Linear Concept Embeddings
      Python
      Apache License 2.0
      1413230 Updated Mar 27, 2025
    • Python
      0310 Updated Dec 21, 2024
    • Code for "Towards Unifying Interpretability and Control: Evaluation via Intervention"
      Python
      0200 Updated Nov 8, 2024
    • Source code for ROCERF
      Jupyter Notebook
      MIT License
      0000 Updated Sep 2, 2024
    • OpenXAI
      OpenXAI: Towards a Transparent Evaluation of Model Explanations
      JavaScript
      MIT License
      4825471 Updated Aug 17, 2024
    • Code for paper: Are Large Language Models Post Hoc Explainers?
      Jupyter Notebook
      MIT License
      53410 Updated Jul 22, 2024
    • Characterizing Data Point Vulnerability via Average-Case Robustness, UAI 2024
      Python
      MIT License
      0000 Updated May 7, 2024
    • The Disagreement Problem in Explainable ML, TMLR 2025
      Jupyter Notebook
      MIT License
      0200 Updated Apr 16, 2024
    • Fair Machine Unlearning: Data Removal while Mitigating Disparities
      Python
      Apache License 2.0
      2300 Updated Feb 15, 2024
    • DiET
      Code for "Discriminative Feature Attributions via Distractor Erasure Tuning"
      Python
      1200 Updated Dec 12, 2023
    • amplify
      Python
      0100 Updated Nov 27, 2023
    • Jupyter Notebook
      0000 Updated Nov 7, 2023
    • lcnn
      Low Curvature Neural Networks (NeurIPS 2022)
      Python
      0000 Updated Nov 6, 2023
    • "Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness". M. Pawelczyk, T. Datta, J. v.d Heuvel, G. Kasneci, H. Lakkaraju. In…
      Python
      MIT License
      0000 Updated Oct 19, 2023
    • "On the Privacy Risks of Algorithmic Recourse". Martin Pawelczyk, Himabindu Lakkaraju* and Seth Neel*. In International Conference on Artificial Intelligence an…
      Jupyter Notebook
      0000 Updated Oct 19, 2023
    • "In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; arXiv preprint: arXiv:2310.07579; 2023.
      Jupyter Notebook
      0000 Updated Oct 19, 2023
    • Code for https://arxiv.org/abs/2306.06716
      Python
      0100 Updated Jun 22, 2023
    • DOPE: Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
      Python
      0000 Updated May 9, 2023
    • GraphXAI
      GraphXAI: Resource to support the development and evaluation of GNN explainers
      Python
      MIT License
      37100 Updated Mar 18, 2023
    • lfa
      Local function approximation (LFA) framework, NeurIPS 2022
      Python
      4500 Updated Feb 6, 2023
    • arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
      Python
      Apache License 2.0
      395000 Updated Sep 21, 2022
    • ROAR
      Jupyter Notebook
      2500 Updated Jan 26, 2022
    • Python
      MIT License
      0100 Updated Dec 17, 2021
    • AIES 2021 Paper: Does Fair Ranking Improve Minority Outcomes?
      Jupyter Notebook
      0100 Updated Dec 5, 2021
    • Code base for robust learning for an intersection of causal and adversarial shifts
      Python
      0300 Updated Nov 25, 2021
    • nifty
      Code for paper https://arxiv.org/abs/2102.13186
      Python
      MIT License
      13100 Updated Apr 3, 2021