Expanding the small molecule repertoire towards precision drugs through AI generative models
Supervision
Patrick Aloy
IRB, 1st Supervisor
Katja Luck
IMB, 2nd Supervisor
Objectives
The main objective of the project is to incorporate detailed data on how cell proteomes respond to pharmacological perturbations into the Bioteque. We will then leverage chemical and biological descriptors in an AI-based strategy for the de novo design of chemical compounds with specific pharmacological properties (i.e. the ability to revert disease-related cell changes). If successful, such compounds could overcome the limitations of target-centric approaches and smooth the transition from blockbuster drugs to personalized medicine, in which drugs might be designed specifically for each patient.
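To make the idea of reverting disease-related cell changes more concrete, the following is a minimal sketch of one possible way to score reversion: the negative cosine similarity between a disease signature and a drug-induced signature measured over the same proteins. The function name, the choice of metric and the example values are assumptions made for illustration; they are not the scoring scheme of the project or of the Bioteque.

```python
import math
from typing import Sequence


def reversal_score(disease: Sequence[float], drug: Sequence[float]) -> float:
    """Return the negative cosine similarity of two signatures.

    Hypothetical helper: a value near +1 means the drug-induced changes
    oppose the disease-related changes, a value near -1 means they mimic them.
    """
    if len(disease) != len(drug):
        raise ValueError("signatures must cover the same proteins")
    dot = sum(a * b for a, b in zip(disease, drug))
    norm = math.sqrt(sum(a * a for a in disease)) * math.sqrt(
        sum(b * b for b in drug)
    )
    if norm == 0.0:
        # An all-zero signature carries no direction, so no reversion signal.
        return 0.0
    return -dot / norm


# Made-up log fold changes for three proteins, for illustration only.
disease_signature = [1.0, -2.0, 0.5]
opposing_drug = [-1.0, 2.0, -0.5]
mimicking_drug = [1.0, -2.0, 0.5]

score_opposing = reversal_score(disease_signature, opposing_drug)
score_mimicking = reversal_score(disease_signature, mimicking_drug)
```

In a generative setting, a score of this kind could in principle serve as one term of a reward or filtering criterion, although the actual criteria would depend on the data and models chosen in the project.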
Methodology
Data integration: Integrating proteomics data into the Bioteque will require acquiring basic knowledge of proteomics, drug-induced cell perturbations and multi-omics data. The work will also involve knowledge graphs (including BioCypher and Neo4j) and the generation and evaluation of embeddings.
AI-based small-molecule generation: Reinforcement learning, transformers, variational autoencoders and diffusion models.
General: Python programming, with emphasis on AI/ML packages. RDKit and other chemoinformatics tools.
The laboratory already has extensive experience with all the required methodologies, so the student will be (painlessly) guided through each of them.
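As a rough illustration of the data integration step, the sketch below turns drug-induced protein abundance changes into knowledge graph triples by thresholding log2 fold changes. The function name, edge labels, threshold and example values are all hypothetical; they are not taken from the Bioteque, BioCypher or Neo4j, and only show the general idea of discretizing a proteomics readout into compound-protein edges.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source node, edge type, target node)


def proteomics_to_triples(
    compound: str,
    log2_fold_changes: Dict[str, float],
    threshold: float = 1.0,
) -> List[Triple]:
    """Convert per-protein log2 fold changes into directed edges.

    Proteins whose absolute change is below `threshold` are skipped.
    The edge labels are illustrative, not an existing Bioteque schema.
    """
    triples: List[Triple] = []
    # Sort by protein name so the output order is deterministic.
    for protein, lfc in sorted(log2_fold_changes.items()):
        if lfc >= threshold:
            triples.append((compound, "upregulates", protein))
        elif lfc <= -threshold:
            triples.append((compound, "downregulates", protein))
    return triples


# Made-up values for a hypothetical compound, for illustration only.
example_changes = {"TP53": 1.8, "MYC": -2.1, "GAPDH": 0.1}
edges = proteomics_to_triples("compound_A", example_changes)
```

Triples of this form are a common input for knowledge graph embedding methods; the threshold is a free parameter that would need to be chosen from the noise characteristics of the actual measurements.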
Required skills
Education: Engineering degree in Computer Science or Data Science, or a bachelor's degree in Biosciences (e.g. biology, biotechnology, pharmacy, chemistry).
Required skills: Strong programming and scripting skills, with good knowledge of Python. Excellent interpersonal and communication skills. Highly motivated. Fluency in English.
Desired experience: Knowledge of machine learning packages (scikit-learn, Keras, PyTorch, etc.). Competence in the use of HPC systems, virtual machines (OpenNebula) and containers (Docker, Singularity).
Expected results
We will incorporate drug-induced proteomics changes into the knowledge graph of the Bioteque and generate small molecule bioactivity and biological embeddings. This format is readily amenable to modern AI and will allow the generated data to be contextualized within the corpus of current biomedical knowledge. It will also be an ideal framework to assess the added value of proteomics measurements versus, for example, transcriptional readouts. Finally, we will deliver a validated set of new chemical entities (NCEs) that selectively inhibit growth in pancreatic cancer cell lines, with a clear description of their mechanisms of action.
Planned secondments
Host: IMB (K. Luck); Duration: 2 months; When: Year 1; Goal: Include dynamics in the proteomics data of the Bioteque KG and assess its potential added value in the AI-guided generation of precision drugs.
Host: EMBL-EBI (J. Saez); Duration: 1 month; When: Year 2; Goal: Use BioCypher to complement the Bioteque Knowledge Graph by including proteomics data and contextualize it with the current biomedical knowledge.
Host: UCAM (K. Lilley); Duration: 1 month; When: Year 3; Goal: Include spatial proteomics data in the Bioteque KG to identify cell-specific vulnerabilities and use them in the AI-guided generation of precision drugs.
Enrolment in doctoral programs
PhD in Biomedicine by the Universitat Pompeu Fabra
References
Comajuncosa-Creus et al. Integration of diverse bioactivity data into the Chemical Checker compound universe. 2025. Nature Protocols. 10.1038/s41596-025-01167-3
Comajuncosa-Creus et al. Comprehensive detection and characterization of human druggable pockets through binding descriptors. 2024. Nature Communications. 15: 7917.
Comajuncosa-Creus et al. Stereochemically-aware bioactivity descriptors for uncharacterized chemical compounds. 2023. J Cheminformatics. 16: 70.
Fernández-Torras et al. Integrating and formatting biomedical data in the Bioteque, a comprehensive repository of pre-calculated knowledge graph embeddings. 2022. Nature Communications. 13: 5304.
Fernández-Torras et al. Connecting chemistry and biology through molecular descriptors. 2022. Curr Opin Chem Biol, 66: 102090.
Bertoni et al. Bioactivity descriptors for uncharacterized chemical compounds. 2021. Nature Communications. 12: 3932.
Pauls et al. Identification and drug-induced reversion of molecular signatures of Alzheimer’s disease onset and progression in AppNL-G-F, AppNL-F, and 3xTg-AD mouse models. 2021. Genome Medicine. 13: 168.
Duran-Frigola et al. Extending the small molecule similarity principle to all levels of biology with the Chemical Checker. 2020. Nature Biotechnology. 38: 1087-1096.
Chemical Checker: https://chemicalchecker.com/
Bioactivity Signaturizers: https://bioactivitysignatures.org/
Bioteque: https://bioteque.irbbarcelona.org/