Graph machine learning can estimate drug concentrations in whole blood from forensic screening results
This repository implements a chemistry-informed Graph Neural Network (GNN) that predicts the LC-HRMS signal-to-concentration ratio for drugs in whole blood, trained on a library of 191 different molecules. The data is included in the notebook and can also be accessed on .
The GNN model is directly inspired by TChemGNN. Molecules are converted from SMILES into graphs where each atom node carries rich structural information (e.g., aromaticity, charge, valence, hybridization, mass-based descriptors), and each node is additionally augmented with global geometry features (molecular volume, length, width, and height) to give the model full-molecule context beyond connectivity. A multi-layer Graph Attention Network (GAT) then learns both local substructure effects and broader molecular shape.
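Below is a minimal sketch of this graph-construction step, assuming RDKit and PyTorch Geometric. The feature list and the `mol_to_graph` helper are illustrative; the exact descriptor set and ordering live in the notebook, and the global geometry values are assumed to be computed elsewhere and passed in.

```python
import torch
from rdkit import Chem
from torch_geometric.data import Data

def mol_to_graph(smiles, global_feats):
    """Build a PyG graph: per-atom descriptors plus broadcast global features.

    global_feats: molecule-level values (e.g. volume, length, width, height)
    appended to every node so each atom carries full-molecule context.
    """
    mol = Chem.MolFromSmiles(smiles)
    node_feats = [
        [
            atom.GetAtomicNum(),
            atom.GetFormalCharge(),
            atom.GetTotalValence(),
            atom.GetTotalNumHs(),
            atom.GetDegree(),
            float(atom.GetIsAromatic()),
            float(atom.IsInRing()),
            float(int(atom.GetHybridization())),  # hybridization enum as a number
            atom.GetMass() / 100.0,               # simple mass-based scaling
            *global_feats,                        # broadcast global geometry features
        ]
        for atom in mol.GetAtoms()
    ]
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [[i, j], [j, i]]  # undirected graph: add both directions
    return Data(
        x=torch.tensor(node_feats, dtype=torch.float),
        edge_index=torch.tensor(edges, dtype=torch.long).t().contiguous(),
    )
```

For example, `mol_to_graph("CCO", [53.4, 4.1, 2.5, 2.3])` would yield a three-node graph for ethanol with the four geometry values repeated on every node (the numbers here are placeholders, not measured values).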
The workflow includes graph construction, feature assembly, and a leave-one-out cross-validation (LOOCV) training strategy suited to our small chemical dataset.
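The LOOCV loop itself can be expressed in a few lines. The following is a hypothetical outline in which `graphs`, `targets`, and `build_model` stand in for objects defined in the notebook, and the optimizer, epoch count, and learning rate are illustrative defaults.

```python
import torch
from sklearn.model_selection import LeaveOneOut

def run_loocv(graphs, targets, build_model, epochs=200, lr=1e-3):
    """Train a fresh model per fold; each molecule is held out exactly once."""
    predictions = []
    for train_idx, test_idx in LeaveOneOut().split(graphs):
        model = build_model()  # re-initialize so no fold leaks into another
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for i in train_idx:
                optimizer.zero_grad()
                loss = torch.nn.functional.mse_loss(model(graphs[i]), targets[i])
                loss.backward()
                optimizer.step()
        model.eval()
        with torch.no_grad():
            predictions.append(model(graphs[test_idx[0]]).item())
    return predictions  # one held-out prediction per molecule
```

Retraining from scratch on each of the 191 folds is affordable at this dataset size and gives an honest estimate of how the model generalizes to an unseen molecule.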
The code reproduces the experiments in the paper "Graph machine learning can estimate drug concentrations in whole blood from forensic screening results" available on ChemRxiv and under review for publication.
The notebook in this repository runs the training of the GNN model to reproduce the results in the publication. It can be run in Google Colab or on a standalone computer, but a GPU is highly recommended for faster training of the GNN.
The work “Efficient Learning of Molecular Properties Using Graph Neural Networks Enhanced with Chemistry Knowledge” by the same authors demonstrates that combining classical chemical insight with graph neural networks can substantially improve molecular property prediction.
In our project (our own LC–MS LOD library of 191 molecules combined with a GNN), we draw strong inspiration from these ideas and integrate them into a specialized LC–MS context:
We encode atom-level descriptors (aromaticity, ring membership, degree, valence, formal charge, atomic number, hybridization, hydrogen count, mass-based scaling, etc.), capturing electronic, steric, and topological aspects. This mirrors and extends the philosophy of combining local and global chemical features.
As in Efficient-ChemGNN, we acknowledge that bonds alone may not capture all relevant chemical context for LC–MS signal behavior; we therefore allow the inclusion of molecular-level descriptors such as volume, width, length, and height (when available). This helps the GNN to “see” beyond connectivity and gain structural context relevant for ionization or fragmentation in LC–MS.
Our use of GAT layers aligns with the attention-based message-passing architecture favored in GNN chemical modeling (see the model sketch after this list). Attention allows the network to weigh different atoms and substructures differently, analogous to how certain functional groups or atom environments contribute more strongly to the LC–MS response.
Our model is built to work with limited data, leveraging chemistry-informed features and architecture choices to maximize predictive power despite the small sample size.
Unlike many generic molecular-property predictors, our target is the signal-to-concentration behavior in LC–MS, measured on a new, original dataset.
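As referenced above, the following is an illustrative PyTorch Geometric sketch of such a GAT-based regressor; the number of layers, hidden width, and attention-head count are assumptions for the sketch, not the published configuration.

```python
import torch
from torch_geometric.nn import GATConv, global_mean_pool

class DrugGAT(torch.nn.Module):
    """Two GAT layers followed by mean pooling and a linear readout."""

    def __init__(self, in_dim, hidden=64, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=heads)      # multi-head attention
        self.gat2 = GATConv(hidden * heads, hidden, heads=1)  # consolidate heads
        self.readout = torch.nn.Linear(hidden, 1)             # predicted ratio

    def forward(self, data):
        x = torch.relu(self.gat1(data.x, data.edge_index))
        x = torch.relu(self.gat2(x, data.edge_index))
        # Pool atom embeddings into one molecule-level vector, then regress
        batch = getattr(data, "batch", None)
        if batch is None:  # single (unbatched) graph
            batch = torch.zeros(x.size(0), dtype=torch.long, device=x.device)
        return self.readout(global_mean_pool(x, batch)).squeeze(-1)
```

Multi-head attention in the first layer lets different heads attend to different atom environments; the single-head second layer consolidates these views before pooling to a molecule-level embedding.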
This code builds on the foundational ideas demonstrated in TChemGNN, applying them to an LC–MS-oriented molecular library. By combining atom-level chemical descriptors, global molecular features, and a GAT-based graph architecture, we aim to deliver a data-efficient, chemically informed, and practically usable GNN-based prediction framework for LC-HRMS concentration estimation.