Bringing AI to the machine through real-time instrument control: towards intelligent acquisition methods tailored for single-cell proteomics

Supervision

David Bouyssié
CNRS-IPBS, 1st Supervisor
Petr Novák
BIOCEV, 2nd Supervisor

Objectives

Mass spectrometry (MS)-based proteomics has entered the single-cell era, offering unprecedented opportunities to study proteome heterogeneity at the cellular level. However, existing acquisition methods face significant limitations in sensitivity, reproducibility, and throughput, especially when dealing with ultra-low input samples.
In this context, real-time control of MS instruments opens the door to smart, adaptive data acquisition. By integrating prior knowledge of peptide behavior with on-the-fly data analysis, such intelligent methods promise to improve precursor selection, maximize information content, and reduce missing values in large-scale single-cell proteomics experiments.
This project aims to develop next-generation intelligent MS acquisition strategies by combining real-time instrument control with AI-based decision-making. Specifically, the objective is to create adaptive acquisition workflows that optimize MS/MS event scheduling and tune acquisition parameters dynamically, pushing the limits of sensitivity and reproducibility in single-cell proteomics. The anticipated outcomes will lay the groundwork for future integration of AI-assisted, real-time decision-making in proteomics instrumentation and workflows.

Methodology

The PhD candidate will build upon our in-house real-time instrument control software, MSReact, to develop adaptive mass spectrometry acquisition methods tailored for single-cell proteomics. The project will focus on enhancing the instrument’s ability to make informed decisions during acquisition by integrating real-time retention time alignment, on-the-fly MS/MS data processing in DIA and PRM modes, and dynamic adjustment of acquisition parameters. Particular emphasis will be placed on applying AI algorithms to optimize precursor selection and fine-tune key parameters such as collision energy, based on predicted peptide properties. The developed acquisition methods will be implemented and evaluated using both label-free and isobaric labeling approaches.
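
To make the idea of real-time retention time alignment concrete, here is a minimal Python sketch. It is purely illustrative and does not reflect the MSReact API: the class OnlineRtAligner, the function due_targets, the linear alignment model, and the numbers in the usage note are all assumptions chosen for this example. Each confidently identified peptide updates a running least-squares fit between library and observed retention times, and the fit is then used to decide which targets should be scheduled at the current time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OnlineRtAligner:
    """Running least-squares fit: observed_rt = slope * library_rt + intercept.

    Hypothetical helper for illustration; the linear model is an assumption.
    """

    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, library_rt: float, observed_rt: float) -> None:
        # Accumulate sufficient statistics so each update costs O(1).
        self.n += 1
        self.sum_x += library_rt
        self.sum_y += observed_rt
        self.sum_xx += library_rt * library_rt
        self.sum_xy += library_rt * observed_rt

    def coefficients(self) -> tuple[float, float]:
        # Fall back to the identity mapping until the fit is determined.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return 1.0, 0.0
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept

    def predict(self, library_rt: float) -> float:
        slope, intercept = self.coefficients()
        return slope * library_rt + intercept


def due_targets(
    targets: dict[str, float],
    aligner: OnlineRtAligner,
    current_rt: float,
    half_window: float,
) -> list[str]:
    """Return targets whose aligned retention time window contains current_rt."""
    return [
        name
        for name, library_rt in targets.items()
        if abs(aligner.predict(library_rt) - current_rt) <= half_window
    ]
```

As a usage example with made-up values, feeding the anchor pairs (10, 12), (20, 23) and (30, 34) minutes into update gives a fit close to slope 1.1 and intercept 1.0, so a target with a library retention time of 40 minutes is expected near 45 minutes and would be returned by due_targets at a current time of 44.5 minutes with a half window of 1 minute. A production method would likely need a non-linear or robust model and outlier handling.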

Required skills

Background in bioinformatics or computer science
Programming skills: proficiency in Python; knowledge of C# or C++ would be useful
Data analysis skills: machine learning methodologies

Expected Results

This project is expected to deliver novel intelligent acquisition methods specifically optimized for the analysis of single-cell proteomes. These methods will significantly enhance detection sensitivity and improve data completeness by reducing stochastic variation in precursor selection and acquisition scheduling. A reference dataset will be generated using both label-free and label-based strategies to demonstrate the practical benefits of the developed approaches. In addition, the project will result in the dissemination of open-source software tools and workflows, as well as scientific publications detailing the methodology and key findings.

Planned Secondments

Host: BIOCEV (P. Novák); Duration: 2 months; When: Year 1; Goal: Prioritized acquisition methods for structural proteomics.

Host: FHOOE (V. Dorfer); Duration: 1 month; When: Year 2; Goal: Integration of real-time search engines for real-time acquisition methods.

Host: UKHD (I. Bludau); Duration: 1 month; When: Year 3; Goal: Proteoform complexity and tailored mass spectrometric methods.

Enrolment in doctoral programs

Doctoral School in Biology-Health-Biotechnologies, University of Toulouse

References

1 Bouyssié D et al. WOMBAT-P: Benchmarking Label-Free Proteomics Data Analysis Workflows. J Proteome Res. 2024, 23, 418-429. doi:10.1021/acs.jproteome.3c00636.

2 Bouyssié D et al. Proline: an efficient and user-friendly software suite for large-scale proteomics. Bioinformatics 2020, 36, 3148-3155. doi:10.1093/bioinformatics/btaa118.

3 Bouyssié D et al. HDX-Viewer: interactive 3D visualization of hydrogen-deuterium exchange data. Bioinformatics 2019, 35, 5331-5333. doi:10.1093/bioinformatics/btz550.

4 Van Puyvelde B et al. A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics. Sci Data. 2022, 9, 126. doi:10.1038/s41597-022-01216-6.

5 Voisinne G et al. Kinetic proofreading through the multi-step activation of the ZAP70 kinase underlies early T cell ligand discrimination. Nat Immunol. 2022, 23, 1355-1364. doi:10.1038/s41590-022-01288-x.

Call for applicants