TLutchyn/LOD-library-191-molecules-_LC_MS
Graph machine learning can estimate drug concentrations in whole blood from forensic screening results

This repository implements a chemistry-informed Graph Neural Network (GNN) that predicts an LC-HRMS signal-to-concentration ratio library for drugs in whole blood, trained on a dataset of 191 different molecules. The dataset is included in the notebook and is also available via DOI.

The GNN model is directly inspired by TChemGNN. Molecules are converted from SMILES into graphs where each atom node carries rich structural information (e.g., aromaticity, charge, valence, hybridization, mass-based descriptors), and each node is additionally augmented with global geometry features (molecular volume, length, width, height) to give the model full-molecule context beyond connectivity. A multi-layer Graph Attention Network (GAT) learns both local substructure effects and broader molecular shape.
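As a minimal sketch of the feature assembly described above (this is not the repository's actual featurization code; the atom dictionaries, coordinates, and volume value are hypothetical), each atom's local descriptors can be concatenated with one shared global geometry vector so every node carries full-molecule context:

```python
# Sketch of per-node feature assembly: each atom's local descriptors are
# concatenated with the same global geometry vector (volume, length,
# width, height). All data below is hypothetical toy input.

def bounding_box_dims(coords):
    """Length/width/height of the axis-aligned bounding box of 3D coords."""
    dims = []
    for axis in range(3):
        values = [p[axis] for p in coords]
        dims.append(max(values) - min(values))
    return sorted(dims, reverse=True)  # length >= width >= height

def build_node_features(atoms, coords, volume):
    length, width, height = bounding_box_dims(coords)
    global_feats = [volume, length, width, height]
    features = []
    for atom in atoms:
        local = [
            atom["atomic_num"],
            atom["formal_charge"],
            atom["valence"],
            float(atom["aromatic"]),
        ]
        features.append(local + global_feats)  # augment with geometry
    return features

# Toy two-atom "molecule" (carbon bonded to oxygen)
atoms = [
    {"atomic_num": 6, "formal_charge": 0, "valence": 4, "aromatic": False},
    {"atomic_num": 8, "formal_charge": 0, "valence": 2, "aromatic": False},
]
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
X = build_node_features(atoms, coords, volume=25.0)
```

In practice the SMILES parsing, atom descriptors, and conformer geometry would come from a cheminformatics toolkit such as RDKit; the point of the sketch is only the augmentation pattern.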

The workflow includes graph construction, feature assembly, and a leave-one-out cross-validation (LOOCV) training strategy optimized for our small chemical dataset.
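The leave-one-out strategy can be sketched with plain index arithmetic (this illustrates the split logic only, not the notebook's actual training loop):

```python
# Leave-one-out cross-validation (LOOCV): with n molecules, train n models,
# each time holding out exactly one molecule as the test sample. Well suited
# to small libraries (here ~191 molecules), where a fixed held-out test set
# would waste scarce data.

def loocv_splits(n):
    for test_idx in range(n):
        train_idx = [i for i in range(n) if i != test_idx]
        yield train_idx, test_idx

n_molecules = 191
splits = list(loocv_splits(n_molecules))  # 191 (train, test) pairs
```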

Publication

The code reproduces the experiments in the paper "Graph machine learning can estimate drug concentrations in whole blood from forensic screening results" available on ChemRxiv and under review for publication.

Notebooks

The notebook in this repository runs the training of the GNN model to reproduce the results in the publication. It can be run in Google Colab or on a standalone computer, but a GPU is highly recommended for faster training of the GNN.

📚 Related Work & Inspiration: ChemGNN

The work “Efficient Learning of Molecular Properties Using Graph Neural Networks Enhanced with Chemistry Knowledge” by the same authors demonstrates that combining classical chemical insight with graph neural networks (GNNs) can substantially improve molecular property prediction.

✅ What We Adopt / Extend from TChemGNN in This Repository

In our project (LOD-library-191-molecules-_LC_MS + GNN), we draw strong inspiration from these ideas and integrate them into a specialized LC–MS context:

- Rich atom + molecular-level features

We encode atom-level descriptors (aromaticity, ring membership, degree, valence, formal charge, atomic number, hybridization, hydrogen count, mass-based scaling, etc.) — capturing electronic, steric, and topological aspects. This mirrors and extends the philosophy of combining local and global chemical features.

- Bond-based graph structure + molecular shape/size descriptors

As in Efficient-ChemGNN, we acknowledge that bonds alone may not capture all the chemical context relevant to LC–MS signal behavior; we therefore include molecular-level descriptors such as volume, width, length, and height (where available). This helps the GNN "see" beyond connectivity and gain structural context relevant to ionization and fragmentation in LC–MS.

- Graph Neural Network architecture using attention layers (GAT)

Our use of GAT layers aligns with the attention-based message-passing architecture favored in GNN chemical modeling. Attention allows the network to weigh different atoms/substructures differently, analogous to how certain functional groups or atom environments contribute more strongly to LC–MS response.

- Designed for small-library, small-data regimes (≈ 191 molecules)

Our model is built to work with limited data, leveraging chemistry-informed features and architecture choices to maximize predictive power despite small sample size.

- Focus on chemically meaningful predictions (LC–MS response, intensity, concentration)

Unlike many generic molecular-property predictors, our target is the signal/concentration behavior in LC–MS, on a new, original dataset.
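To make the attention idea above concrete, here is a minimal, dependency-free sketch of how a GAT-style layer weighs a node's neighbors (the scores and feature vectors are hypothetical; a real implementation would use e.g. PyTorch Geometric's GATConv):

```python
import math

# GAT-style attention over one node's neighborhood: unnormalized scores are
# turned into weights with a softmax, and the node's updated representation
# is the weighted sum of neighbor features. All numbers are hypothetical.

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(neighbor_feats, scores):
    weights = softmax(scores)
    dim = len(neighbor_feats[0])
    out = [0.0] * dim
    for w, feats in zip(weights, neighbor_feats):
        for d in range(dim):
            out[d] += w * feats[d]
    return weights, out

# Three neighbors; the second (score 2.0) dominates the update, analogous
# to a functional group that contributes strongly to the LC-MS response.
neighbor_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, new_repr = attend(neighbor_feats, scores=[0.5, 2.0, 0.1])
```

In the real model the attention scores are themselves learned from pairs of node features; the sketch only shows why attention lets some atoms or substructures count for more than others.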

🧭 Summary

This code builds on the foundational ideas demonstrated in TChemGNN, applying them to an LC–MS–oriented molecular library. By combining atom-level chemical descriptors, global molecular features, and a GAT-based graph architecture, we aim to deliver a data-efficient, chemically informed, and practically usable GNN-based prediction framework for LC–HRMS concentration.
