Knowledge-assisted prediction of clinical outcomes from proteomics data
Supervision
Julio Saez Rodriguez
EMBL-EBI, 1st Supervisor
Juho Hamari
TAU, 2nd Supervisor
Objectives
Gather available prior knowledge that can enhance the interpretability of predictive models through feature grouping (e.g. pathways, sub-cellular location). Compare knowledge-driven dimensionality reduction to data-driven dimensionality reduction. Apply combined models to predict clinical outcomes from proteomics data.
Methodology
Integrate relevant database information into OmniPath. Define key clinical variables for prediction and assess the efficacy of basic AI models after compressing the proteomics data using both data-driven and knowledge-driven techniques.
Leverage the superior interpretability of successful knowledge-driven models to identify new biomarkers and drug targets, corroborating findings with data from external cohorts.
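As an illustration of what knowledge-driven compression could look like, the minimal sketch below averages protein abundances within predefined groups (for example pathways), so that each sample is described by a handful of interpretable group scores instead of thousands of proteins. The sample names, protein identifiers, pathway memberships, and the pathway_scores helper are all hypothetical and chosen only for illustration; they are not taken from OmniPath or from the project's actual pipeline, and mean aggregation is only one of several possible scoring choices.

```python
from statistics import mean

# Hypothetical toy data: protein abundances per sample (not project data).
samples = {
    "patient_1": {"P1": 2.0, "P2": 4.0, "P3": 1.0, "P4": 3.0},
    "patient_2": {"P1": 6.0, "P2": 8.0, "P3": 5.0, "P4": 7.0},
}

# Hypothetical prior knowledge: group (e.g. pathway) -> member proteins.
pathways = {
    "pathway_A": ["P1", "P2"],
    "pathway_B": ["P3", "P4", "P5"],  # P5 is not measured in the toy data
}


def pathway_scores(abundances, groups):
    """Compress one sample's protein abundances into one mean score per group."""
    scores = {}
    for name, members in groups.items():
        # Use only the group members that were actually measured.
        measured = [abundances[p] for p in members if p in abundances]
        if measured:  # skip groups with no measured member
            scores[name] = mean(measured)
    return scores


# One low-dimensional, interpretable feature vector per sample.
compressed = {sid: pathway_scores(ab, pathways) for sid, ab in samples.items()}

for sample_id, scores in compressed.items():
    print(sample_id, scores)
```

The resulting group scores could then be passed to any standard classifier or regressor and compared against the same model trained on a data-driven compression of equal dimensionality.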
Required Skills
Knowledge of machine learning algorithms, proteomics data analysis, and biological ontologies.
Expected results
A well-calibrated knowledge-driven model that combines the predictive power of data-driven approaches with the interpretability of prior biological knowledge.
Planned Secondments
Host: TAU (J. Hamari); Duration: 2 months; When: Year 1; Goal: Development of high-content data visualization systems.
Host: DTU (A. Santos); Duration: 1 month; When: Year 2; Goal: Integration of systems modelling with graph theory.
Host: CRG (E. Sabidó); Duration: 1 month; When: Year 3; Goal: Experimental skills development on proteomics data acquisition.
Enrolment in doctoral programs
University of Cambridge
Project specific requirements
If you are interested in this project, make sure you also apply to the current EMBL International PhD Programme (EIPP) call of the host institute, which closes on October 13th. This project is also advertised there. PhD students can be admitted at the host institute only if they have passed the EIPP selection process.