It contains the notebooks, input data, ranked compound predictions, FS-Mol training tasks, and generated figures used for zero-shot antibacterial compound prioritization.
.
├── experiments/
│   └── Smiles and Activity.xlsx            # Experimental compounds and activity annotations
├── fsmol/
│   └── train/
│       └── CHEMBL*.jsonl.gz                # FS-Mol training tasks
├── ranking/
│   └── predictions_lifechem_ecoli.csv      # Ranked LifeChem predictions for E. coli activity
├── scripts/
│   ├── predict.ipynb                       # Primary prediction workflow
│   ├── analysis.ipynb                      # Assay analysis and candidate extraction
│   ├── comp.ipynb                          # Comparative analysis
│   └── fs_mol_similarity.ipynb             # FS-Mol similarity analysis
├── figures/
│   ├── plots.ipynb                         # Figure generation notebook
│   ├── hit_sar_cluster_plot.ipynb          # Hit/SAR cluster figure workflow
│   ├── derivatives_umap.*                  # Derivative UMAP figure
│   └── prediction_distributions_umap.*     # Prediction UMAP figure
└── README.md
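
To get a quick sense of what the FS-Mol training tasks in the layout above contain, the gzipped JSON Lines files can be read with the standard library alone. This is a minimal sketch, not code taken from the notebooks: the helper names `read_task` and `summarize_tasks` are hypothetical, and no assumption is made about which fields each record carries.

```python
import gzip
import json
from pathlib import Path


def read_task(path):
    """Read one gzipped JSON Lines file into a list of dicts (one per non-empty line)."""
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def summarize_tasks(train_dir="fsmol/train"):
    """Map each CHEMBL*.jsonl.gz file name in train_dir to its record count."""
    task_files = sorted(Path(train_dir).glob("CHEMBL*.jsonl.gz"))
    return {task_file.name: len(read_task(task_file)) for task_file in task_files}
```

Calling `summarize_tasks()` from the repository root returns a dictionary of per-task record counts, which is a cheap way to confirm that the training data is in place before opening the notebooks.
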
Run the notebooks from the repository root in this order:
1. scripts/predict.ipynb
2. scripts/analysis.ipynb
3. scripts/comp.ipynb
4. scripts/fs_mol_similarity.ipynb
5. figures/plots.ipynb
6. figures/hit_sar_cluster_plot.ipynb
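
If you prefer to run the sequence non-interactively, the order above can be scripted. The sketch below is an assumption about how one might do this, not part of the repository: it shells out to `jupyter nbconvert --execute`, the helper names `build_command` and `run_all` are hypothetical, and whether relative paths inside the notebooks resolve the same way as in an interactive session depends on how each notebook builds its paths.

```python
import subprocess

# Notebook order as listed in this README.
NOTEBOOKS = [
    "scripts/predict.ipynb",
    "scripts/analysis.ipynb",
    "scripts/comp.ipynb",
    "scripts/fs_mol_similarity.ipynb",
    "figures/plots.ipynb",
    "figures/hit_sar_cluster_plot.ipynb",
]


def build_command(notebook):
    """Build the nbconvert command that executes one notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS):
    """Execute the notebooks one after another, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(build_command(notebook), check=True)
```

Call `run_all()` from the repository root. Because `check=True` raises on a non-zero exit code, a failing notebook stops the sequence before later notebooks run on incomplete outputs.
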
The notebooks expect a Python/Jupyter environment with common cheminformatics and machine-learning packages, including pandas, numpy, matplotlib, seaborn, scikit-learn, rdkit, umap-learn, lightgbm, xgboost, tqdm, fsmol, and twinbooster.
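
Before launching the notebooks, it can help to check which of these packages are importable. The following is a rough sketch rather than an official setup script: the helper name `missing_packages` is hypothetical, and the import names for `fsmol` (`fs_mol`) and `twinbooster` are assumptions that should be checked against each package's own documentation.

```python
from importlib.util import find_spec

# Package name as listed above -> top-level module name used for import.
IMPORT_NAMES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "rdkit": "rdkit",
    "umap-learn": "umap",
    "lightgbm": "lightgbm",
    "xgboost": "xgboost",
    "tqdm": "tqdm",
    "fsmol": "fs_mol",            # assumed import name
    "twinbooster": "twinbooster",  # assumed import name
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the sorted package names whose module cannot be found."""
    return sorted(
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    )
```

`missing_packages()` returns an empty list when every module can be located. It only checks that a module is discoverable, not that its version is compatible with the notebooks.
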
If you use this repository, please cite the associated publication:
Zero-shot modelling discovers structurally unprecedented antibiotics against Escherichia coli.