A comprehensive repository of scripts, software, and tutorials for molecular evolution, computational protein design, and structural bioinformatics.
EvoSite3D integrates evolutionary sequence analysis, structural bioinformatics, and deep learning approaches to understand protein function, evolution, and design. This repository contains:
- Production-ready tools for evolutionary analysis and drug target validation
- In-depth tutorials covering sequence analysis, protein structure prediction and design, molecular dynamics, and computational biology
- Research applications analyzing viral evolution and genome datasets
Whether you're detecting positive selection, designing novel proteins, predicting mutation effects, or validating drug targets—this repository provides code, workflows, and educational materials to support your research.
For detailed installation and setup, see INSTALL.md.
All Python scripts and tools can be run using uv run (see CLAUDE.md for environment details).
A Claude Code-powered assistant that validates therapeutic targets against diseases using real-time data from the Open Targets Platform.
Features:
- Resolves gene symbols → Ensembl IDs and disease names → EFO IDs
- Fetches evidence from Open Targets (genetics, tractability, pathways, safety)
- Scores targets on Clinical Evidence, Druggability, and Pathway/Biology (0–5 scale each)
- Generates interactive HTML matrices and Markdown reports
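The scoring and reporting steps above can be sketched as follows. This is a minimal illustration and not the tool's actual implementation: the `TargetScore` class, the `to_markdown` helper, the unweighted total, and the placeholder gene names and scores are all assumptions made for the example.

```python
from dataclasses import dataclass

# The three criteria named in the feature list above.
CRITERIA = ("Clinical Evidence", "Druggability", "Pathway/Biology")


@dataclass(frozen=True)
class TargetScore:
    """Hypothetical container for one target's three 0-5 scores."""
    gene: str
    clinical: int
    druggability: int
    pathway: int

    def __post_init__(self):
        # Reject anything outside the 0-5 scale.
        for value in (self.clinical, self.druggability, self.pathway):
            if not 0 <= value <= 5:
                raise ValueError(f"score {value} is outside the 0-5 scale")

    @property
    def total(self) -> int:
        # Assumption: an unweighted sum; the real tool may weight criteria.
        return self.clinical + self.druggability + self.pathway


def to_markdown(scores):
    """Render scores as a Markdown table, highest total first."""
    header = "| Target | " + " | ".join(CRITERIA) + " | Total |"
    divider = "|" + " --- |" * (len(CRITERIA) + 2)
    rows = [
        f"| {s.gene} | {s.clinical} | {s.druggability} | {s.pathway} | {s.total} |"
        for s in sorted(scores, key=lambda s: s.total, reverse=True)
    ]
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    # Placeholder targets with made-up scores, for illustration only.
    demo = [TargetScore("GENE_A", 3, 4, 2), TargetScore("GENE_B", 5, 5, 4)]
    print(to_markdown(demo))
```

Validating the scale at construction time keeps out-of-range values from reaching the report.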
Location: ./software/target_ai/
Getting Started:
# Install Claude Code
npm install -g @anthropic-ai/claude-code
# Navigate to the tool
cd ./software/target_ai
claude
Claude Code reads the embedded CLAUDE.md file and becomes your target validation assistant. No additional API keys are needed, because the Open Targets Platform is free and open.
Learn to detect continuous positive selection using CodeML/PAML. Covers dN/dS ratios, site-specific models (M0, M1a, M2a, M7, M8), and includes a complete HLA_DQB1 example with sequence alignment, phylogenetic trees, and structural visualisation.
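Nested site models such as M1a vs M2a or M7 vs M8 are usually compared with a likelihood ratio test on the log-likelihoods that CodeML reports. The sketch below assumes 2 degrees of freedom, the value conventionally used for these two comparisons; the function name and the log-likelihood values are hypothetical and are not taken from the tutorial.

```python
import math


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for two nested models with 2 degrees of freedom.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value.
    For a chi-square distribution with df = 2 the survival function has the
    closed form exp(-x / 2), so no external library is needed.
    """
    # Clamp at zero: small negative values can arise from optimisation noise.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    # Made-up log-likelihoods for a null (e.g. M7) and alternative (e.g. M8) model.
    stat, p = lrt_pvalue_df2(lnl_null=-1000.0, lnl_alt=-995.0)
    print(f"2*deltaLnL = {stat:.2f}, p = {p:.4g}")
```

A small p-value indicates that the model allowing a class of sites with dN/dS > 1 fits significantly better than the null model.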
Understand branch-site models for detecting positive selection in specific lineages. Includes real datasets and tree interpretation.
Reconstruct ancestral sequences at internal nodes of phylogenetic trees using maximum-likelihood methods.
Learn to compute free energy changes (ΔG, ΔΔG) for protein mutations. Covers both FoldX 3 and FoldX 4 approaches. Suitable for understanding stability–activity trade-offs and mutation effects.
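As a quick illustration of the quantity involved, ΔΔG is the folding free energy of the mutant minus that of the wild type, so a positive value means the mutation is destabilising. The helper names and the 1.0 kcal/mol cut-off below are assumptions chosen for the example, not values prescribed by the tutorial or by FoldX.

```python
def ddg(dg_wild_type: float, dg_mutant: float) -> float:
    """Return ddG = dG(mutant) - dG(wild type), in kcal/mol."""
    return dg_mutant - dg_wild_type


def classify(ddg_value: float, threshold: float = 1.0) -> str:
    """Label a mutation by its ddG.

    The default threshold is an illustrative assumption; choose a cut-off
    appropriate to the error of the method you are using.
    """
    if ddg_value > threshold:
        return "destabilising"
    if ddg_value < -threshold:
        return "stabilising"
    return "neutral"


if __name__ == "__main__":
    # Made-up folding energies for a wild-type and a mutant structure.
    value = ddg(dg_wild_type=-10.0, dg_mutant=-8.5)
    print(value, classify(value))
```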
A comprehensive three-part tutorial on de novo protein design:
- Structure Generation with RFdiffusion — Generate novel protein backbones using diffusion models
- Sequence Design with ProteinMPNN — Design amino acid sequences for your generated structures
- Full Atomic Modelling with MODELLER — Build complete 3D models with all-atom resolution
Learn the full pipeline from backbone generation through atomic-level detail.
Design mutations to enhance protein-protein binding affinity using computational methods.
Use pre-trained protein language models (Facebook's ESM-2) to predict the functional impact of amino acid mutations without experimental training data. Covers the stability–activity trade-off in protein engineering.
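One common zero-shot scoring scheme for protein language models is the log-odds between the mutant and wild-type residue at the mutated position. The sketch below shows only that arithmetic, assuming the model's per-position probabilities are already available. The function name and the toy probabilities are hypothetical; no ESM-2 calls are made, and the tutorial may use a different scoring scheme.

```python
import math


def mutation_log_odds(position_probs: dict[str, float], wild_type: str, mutant: str) -> float:
    """Score a substitution as log p(mutant) - log p(wild type).

    `position_probs` maps amino-acid letters to the model's predicted
    probabilities at the (masked) position. Negative scores mean the model
    finds the mutant residue less plausible than the wild type.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


if __name__ == "__main__":
    # Toy distribution standing in for language-model output at one position.
    probs = {"A": 0.6, "G": 0.3, "V": 0.1}
    print(f"A->G score: {mutation_log_odds(probs, 'A', 'G'):.3f}")
    print(f"A->V score: {mutation_log_odds(probs, 'A', 'V'):.3f}")
```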
Build a multi-task graph neural network to predict small-molecule selectivity across six steroid hormone nuclear receptors (ERα, ERβ, AR, PR, GR, MR). Learn multi-task learning and latent space chemistry.
Run production-ready molecular dynamics simulations of Ubiquitin using OpenMM. Includes system setup, equilibration, and analysis workflows.
Leverage graph databases to query and analyse biological networks and relationships.
Detect and localise phosphorylation sites in proteins using mass spectrometry data and OpenMS workflows. Learn phosphoproteomics from data processing through site assignment.
Statistical foundations for multiple testing correction and false discovery rates.
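As one concrete example of false discovery rate control, the Benjamini-Hochberg procedure can be written in a few lines. This is a minimal pure-Python sketch with made-up p-values; the tutorial may cover other correction methods or rely on a library implementation instead.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the i-th smallest p-value (1-based rank i out of m tests), the
    adjusted value is min over ranks j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for four hypothetical tests.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Tests whose adjusted value falls below the chosen FDR level (for example 0.05) are reported as discoveries.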
Working with NCBI taxonomy for phylogenetic and evolutionary studies.
Location: ./research/2026_ebola_outbreak/
Comparative sequence analysis of Ebola virus glycoprotein (GP) between Zaire and Bundibugyo strains. Identifies 8 sequence differences at 21 antibody-contacting positions in the 3CSY crystal structure. Reveals regional clustering at the GP1-GP2 junction and conserved structural elements (disulfide bonds). The variable sites are candidates for further experimental investigation.
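The core comparison, counting residue differences at a chosen set of positions in two aligned sequences, can be sketched as below. The function name, the toy sequences, and the position list are hypothetical and are not the GP data or code used in the analysis.

```python
def differences_at(seq_a: str, seq_b: str, positions: list[int]) -> list[tuple[int, str, str]]:
    """List (position, residue_a, residue_b) where two aligned sequences differ.

    Positions are 1-based alignment columns. The sequences are assumed to be
    pre-aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, seq_a[pos - 1], seq_b[pos - 1])
        for pos in positions
        if seq_a[pos - 1] != seq_b[pos - 1]
    ]


if __name__ == "__main__":
    # Toy aligned sequences and toy "contact" positions, for illustration only.
    strain_a = "MKTAYIAK"
    strain_b = "MRTAYLAK"
    contacts = [2, 3, 6, 8]
    diffs = differences_at(strain_a, strain_b, contacts)
    print(f"{len(diffs)} differences at {len(contacts)} positions: {diffs}")
```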
- 3D Structure Integration — Visualise evolutionary sites in protein structures
- Evolutionary Analysis — dN/dS ratios, positive selection detection, ancestral reconstruction
- Mutation Prediction — Predict stability and functional effects of mutations
- Protein Design — De novo design, sequence optimisation, affinity improvement
- Molecular Simulation — Molecular dynamics with OpenMM
- Systems Biology — Multi-omics integration and network analysis
- AI-Powered Tools — Leverage Claude Code for interactive target validation and analysis
- Production Software — Specialised tools for common computational biology tasks
Several tools in this repository are designed to work with Claude Code, Anthropic's official CLI for Claude:
- Target AI (./software/target_ai/) — Start Claude Code in the directory; it reads CLAUDE.md and becomes your target validation assistant
- Interactive workflows — Use Claude Code to explore data, refine analyses, and iterate on research questions
To get started with Claude Code:
npm install -g @anthropic-ai/claude-code
claude --help
For detailed installation instructions, environment setup, and dependency management (using uv), see INSTALL.md.
All Python code in this repository follows the guidelines in CLAUDE.md:
- Use uv run to execute Python scripts and tools
- Dependencies are managed via uv add / uv remove
- Tests run with uv run pytest
- Code is linted with uv run ruff
If you use EvoSite3D in your research, please cite:
@software{studer2024evosite3d,
  author = {Romain Studer},
  title = {EvoSite3D: Analysing Evolutionary Sites in 3D Protein Structures},
  year = {2024},
  url = {https://github.com/romainstuder/evosite3d}
}
Romain A. Studer
- Senior Bioinformatics Data Scientist
- Previously affiliated with BenevolentAI, EMBL-EBI, UCL, and UNIL
- Focus: Protein and nucleotide analysis, computational biology, machine learning
- LinkedIn: romainstuder
- GitHub: romainstuder
- Computational biology community for feedback and testing
- Built with BioPython, PAML, PyMOL, OpenMM, ESM-2, RFdiffusion, and other excellent open-source tools
- Claude Code for interactive analysis and AI-assisted research workflows
If you encounter issues or have questions:
- Search existing issues for solutions
- Create a new issue with a minimal reproducible example and relevant details
- Contributions welcome — pull requests, tutorials, and feedback are appreciated
This project is licensed under the MIT License — see the LICENSE file for details.
This project is under active development. Please report bugs, suggest features, or contribute improvements via the GitHub issue tracker.