Beyond Additivity: Zero-shot Methods Cannot Predict Impact of Epistasis on Protein Properties and Function
Authors:
Anastasia Kolchina¹˒², Igors Dubanevics³, Fyodor A. Kondrashov³, Olga V. Kalinina¹˒²˒⁴
Affiliations:
¹ Research Group Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), 66123 Saarbrücken, Germany
² Center for Bioinformatics, Saarland University, Saarbrücken, Germany
³ Evolutionary and Synthetic Biology Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
⁴ Medical Faculty, Saarland University, Homburg, Germany
This repository contains the code accompanying our Nature Communications submission:
Beyond Additivity: Zero-shot Methods Cannot Predict Impact of Epistasis on Protein Properties and Function
We benchmark 95 zero-shot variant effect prediction (VEP) models from ProteinGym on their ability to predict epistatic effects, that is, cases where the combined impact of multiple mutations deviates from the sum of their individual effects.
While zero-shot models perform reasonably well on non-epistatic variant combinations, their predictive power drops substantially for strongly epistatic variants, highlighting current limitations of unsupervised protein language models in multi-mutation regimes.
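The definition of epistasis used here can be illustrated with a minimal sketch. It assumes that mutation effects are measured relative to wild type on an additive scale (for example ΔΔG or a log-transformed fitness score). The function names and the `threshold` cutoff are hypothetical and are not taken from this repository or from the manuscript.

```python
from typing import Sequence


def epistasis(combined_effect: float, single_effects: Sequence[float]) -> float:
    """Deviation of the measured multi-mutant effect from the additive expectation.

    Assumption: all effects are relative to wild type and on an additive scale.
    """
    additive_expectation = sum(single_effects)
    return combined_effect - additive_expectation


def is_epistatic(
    combined_effect: float,
    single_effects: Sequence[float],
    threshold: float = 1.0,  # hypothetical cutoff, chosen only for illustration
) -> bool:
    """Flag a variant whose deviation from additivity exceeds the cutoff."""
    return abs(epistasis(combined_effect, single_effects)) > threshold


# Toy values, not real measurements.
singles = [-1.0, -0.5]
print(epistasis(-1.5, singles))     # double mutant matches the sum of singles
print(epistasis(-3.0, singles))     # double mutant is worse than the sum of singles
print(is_epistatic(-3.0, singles))
```

A signed score keeps the direction of the deviation; taking the absolute value only at the classification step lets the same function serve both positive and negative epistasis.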
.
├── external/
│ └── tsuboyama/
│ ├── additive_model.py
│ ├── protease-pipeline.yml
│ └── README.md
│
├── notebooks/
│ ├── 01_datasets_exploration.ipynb
│ ├── 02_epistasis_detection.ipynb
│ └── 03_model_performance.ipynb
│
├── results/
│ ├── figures/
│ │ ├── main/
│ │ └── supplementary/
│ └── tables/
│ ├── final/
│ │ ├── main/
│ │ └── supplementary/
│ └── intermediate/
│
├── scripts/
│ ├── download_data.sh
│ └── prepare_data.py
│
├── src/
│ ├── models/
│ │ ├── linear_regression.py
│ │ └── mlp.py
│ ├── analysis.py
│ ├── constants.py
│ ├── data_processing.py
│ ├── paths.py
│ ├── plotting.py
│ └── utils.py
│
├── environment.yml
├── LICENSE
└── README.md
Clone the repository:
git clone https://github.com/kalininalab/epistasis_proteingym.git
cd epistasis_proteingym
Create the conda environments:
conda env create -f environment.yml
conda env create -f external/tsuboyama/protease-pipeline.yml
conda activate epi_env
python -m ipykernel install --user --name=epi_env --display-name "Python (epi_env)"
pip install -e .
Download datasets:
bash scripts/download_data.sh
Prepare data:
python scripts/prepare_data.py
Run the notebooks in notebooks/ in order, using the epi_env kernel:
1. 01_datasets_exploration.ipynb
2. 02_epistasis_detection.ipynb
3. 03_model_performance.ipynb
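As an alternative to opening each notebook by hand, the three notebooks above could be executed in sequence from a script. The sketch below is not part of the repository. It assumes that `jupyter nbconvert` is available in the active environment, that the notebooks live in `notebooks/` as shown in the directory tree, and that the `epi_env` kernel has been registered. The helper names `nbconvert_command` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Notebook order as listed in this README.
NOTEBOOKS = [
    "01_datasets_exploration.ipynb",
    "02_epistasis_detection.ipynb",
    "03_model_performance.ipynb",
]


def nbconvert_command(notebook: Path, kernel: str = "epi_env") -> List[str]:
    """Build a command that executes one notebook in place with the given kernel."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.kernel_name={kernel}",
        str(notebook),
    ]


def run_all(notebook_dir: Path = Path("notebooks"), dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run=True, only print) the commands in order."""
    commands = [nbconvert_command(notebook_dir / name) for name in NOTEBOOKS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if a notebook fails to execute.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check paths before committing to a full run; calling `run_all(dry_run=False)` executes the notebooks and overwrites them with their outputs.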
results/tables/final
Tables used directly in the manuscript and supplementary material.
results/tables/intermediate
Intermediate results generated during the pipeline (fully reproducible).
The external/tsuboyama module reproduces the ΔG reconstruction and epistasis thresholds introduced in Tsuboyama et al.
The code is adapted from the original publication's repository (see external/tsuboyama/README.md for attribution and license).
It is used automatically within the analysis notebooks.
MIT License — see LICENSE.
If you use this work, please cite:
Kolchina A., Dubanevics I., Kondrashov F.A., Kalinina O.V. Beyond Additivity: Zero-shot Methods Cannot Predict Impact of Epistasis on Protein Properties and Function, 2025.