DC13-KTH – ProtAIomics

Supervision

Lukas Käll
KTH, 1st Supervisor
David Bouyssié
CNRS-IPBS, 2nd Supervisor

Project objectives

This project aims to develop robust methods for assessing identification and quantification errors in shotgun proteomics. We will construct probabilistic models that characterize errors in peptide and protein identification from mass spectrometry data, and extend these models to quantify how identification errors propagate to protein-level quantification. By evaluating existing methods and proposing new approaches, the project seeks to improve the reliability of proteomic data analysis and, ultimately, of downstream biological interpretation.

Methodology

The project will use improved probabilistic regression models to estimate identification and quantification error rates in proteomics data, modeling these error rates with hierarchical Bayesian networks that capture the dependencies between peptides and proteins. We will implement the models, test them on benchmark datasets, and compare their performance against existing approaches. We will also develop scalable computational tools that integrate the models into current proteomics workflows, so that they are usable in practice.
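To make the idea of error propagation concrete, the sketch below pushes peptide-level identification uncertainty through to a protein-level abundance estimate by Monte Carlo simulation. It is deliberately not a hierarchical Bayesian network: it assumes, hypothetically, that each peptide identification is correct independently with probability 1 − PEP (posterior error probability) and that protein abundance is the median of the correctly identified peptides' log-intensities. Independence of this kind is exactly what a hierarchical model of peptide-protein dependencies would relax. The function name, the median summary, and all numbers are illustrative assumptions.

```python
import numpy as np


def propagate_identification_error(log_intensities, peps, n_draws=10_000, seed=0):
    """Monte Carlo sketch of how peptide identification uncertainty
    spreads into a protein-level abundance estimate.

    log_intensities: log-scale intensities of the peptides assigned to one protein.
    peps: posterior error probability of each peptide identification.
    Hypothetical assumptions: peptides are correct independently of each
    other, and protein abundance is the median of the kept peptides.
    Returns (mean, standard deviation) of the protein estimate across draws.
    """
    values = np.asarray(log_intensities, dtype=float)
    peps = np.asarray(peps, dtype=float)
    rng = np.random.default_rng(seed)

    # Keep each peptide with probability 1 - PEP (U >= PEP for U ~ Uniform[0, 1)).
    keep = rng.random((n_draws, values.size)) >= peps

    # Draws in which no peptide survives give no estimate; drop them.
    keep = keep[keep.any(axis=1)]
    if keep.shape[0] == 0:
        raise ValueError("no draw retained any peptide")

    # Median over the kept peptides only (NaN marks excluded peptides).
    masked = np.where(keep, values, np.nan)
    estimates = np.nanmedian(masked, axis=1)
    return float(estimates.mean()), float(estimates.std())


# Toy protein with four peptides; the last is an outlier with a doubtful
# identification (values are illustrative, not real data).
mean, sd = propagate_identification_error([20.1, 20.4, 20.6, 25.0], [0.01, 0.02, 0.05, 0.6])
print(f"protein log-abundance {mean:.2f} ± {sd:.2f}")
```

The spread of the estimate (`sd`) is the quantity of interest here: it shows how much of the protein-level uncertainty comes purely from not knowing which peptide identifications are correct.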

Expected results

Enhanced software for the analysis of data from mass spectrometry-based proteomics experiments. The enhancements will provide more accurate peptide and protein identifications and more reliable protein-level quantification, enabling more robust downstream analyses.

Required skills

A suitable background for this position is a master’s degree in Computer Science, Physics, Statistics, or another discipline with a substantial quantitative component. Programming and language skills are required. Knowledge of biology and computational biology is an advantage.

Planned Secondments

Host: CNRS (D. Bouyssié); Duration: 2 months; When: Year 1; Goal: Combination of pretrained DIA models with intelligent acquisition methods.

Host: ETHZ (P. Beltrao); Duration: 1 month; When: Year 2; Goal: Integration of phosphoproteomics data into pretrained DIA models.

Host: DTU (A. Santos); Duration: 1 month; When: Year 3; Goal: Inclusion of DIA data in graphs.

Enrolment in doctoral programs

School of Engineering Sciences in Chemistry, Biotechnology and Health at KTH – the Royal Institute of Technology

Assessing identification and quantification errors in mass spectrometry-based proteomics