Novel Data Analysis Approaches for Cross-linking Mass Spectrometry Proteomics and Glycoproteomics
Author | : Lei Lu |
Release | : 2021 |
OCLC | : 1266282506 |
Bottom-up proteomics has emerged as a powerful technology for biological studies. The technique is used for a myriad of purposes, including, among others, protein identification, post-translational modification identification, protein-protein interaction analysis, protein quantification, and protein structure analysis. The data analysis approaches of bottom-up proteomics have evolved over the past two decades, and many different algorithms and software programs have been developed for these varied purposes. In this thesis, I have focused on improving the database search strategies for two important special applications of bottom-up proteomics: cross-linking mass spectrometry proteomics and O-glycoproteomics. In cross-linking mass spectrometry proteomics, a sample of proteins is treated with a chemical cross-linking reagent. This causes peptides within the proteins to be cross-linked to one another, forming peptide doublets that are released by treatment of the sample with a protease such as trypsin. The data analysis tools are designed to identify the cross-linked peptides. In O-glycoproteomics, the peptides that are released by protease digestion of the protein sample can be modified with any one of, or even several, distinct O-glycans, and the data analysis tools should be able to identify all of the glycans and the sites at which they are located. In both cases, traditional database search strategies, which try to match the experimental spectra to all potential theoretical spectra, are not practical because of the large increase in search space. Researchers have lacked efficient data analysis tools for these two applications.
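To give a rough sense of why the search space grows so quickly in the cross-linking case, the short sketch below counts candidates under a deliberately simplified assumption that is not taken from the thesis: any two peptides in the database may be linked to each other, with no filtering by linker-reactive residues or precursor mass. The function names are illustrative only.

```python
def linear_candidates(n_peptides: int) -> int:
    """Candidates in a conventional search: one per database peptide."""
    return n_peptides


def crosslink_pair_candidates(n_peptides: int) -> int:
    """Unordered peptide pairs, including a peptide paired with a copy of itself.

    Assumption for illustration: every pair is chemically allowed.
    """
    return n_peptides * (n_peptides + 1) // 2


if __name__ == "__main__":
    for n in (1_000, 100_000):
        pairs = crosslink_pair_candidates(n)
        print(f"{n} peptides -> {linear_candidates(n)} single candidates, {pairs} pair candidates")
```

Under this assumption the number of candidate pairs grows quadratically with database size, which is one way to read the statement that exhaustive matching against all theoretical spectra becomes impractical.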
Here we successfully devised new search algorithms to address these problems, and implemented them in two new software modules in our laboratories' bottom-up software engine MetaMorpheus (cross-linking data analysis via MetaMorpheusXL and O-glycoproteomics data analysis via O-Pair Search). The new search strategies used in the software program are both based on ion-indexed open search, which was first developed for large-scale proteomic studies in the programs MSFragger and Open-pFind. The ion-indexed open search was optimized for cross-linking mass spectrometry proteomics and O-glycoproteomics in this study, and combined with other algorithms. In O-glycoproteomics, a graph-based algorithm is used to speed up the identification and localization of O-glycans. Other useful features have been added to the software program, such as analysis of both cleavable and non-cleavable cross-links in the cross-link search module, and calculation of localization probabilities in the O-glyco search module. Further optimizations, including machine learning methods for false discovery rate (FDR) analysis, retention time prediction, and spectral prediction, could further improve the current best search approaches for cross-link proteomics and O-glycoproteomics data analysis. Chapter 1 provides an overview of bottom-up proteomics data analysis methods and outlines how ion-indexed open search could be useful for special bottom-up proteomics studies. Chapter 2 describes the development of a cross-linking mass spectrometry proteomics search module, resulting in efficiency improvements for both cleavable and non-cleavable cross-link proteomics data analysis. Chapter 3 describes the development of an O-glycoproteomics search module; by combining the ion-indexed open search algorithm with the graph-based localization algorithm, O-Pair Search is more than 2000 times faster than Byonic, a currently widely used software program.
In Chapter 4, a novel top-down data acquisition method is described. Chapter 5 provides conclusions and future directions.
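The general idea behind an ion-indexed open search can be illustrated with a toy example. The sketch below is not the MetaMorpheus, MSFragger, or Open-pFind implementation; it is a minimal illustration under stated assumptions: only singly charged b and y ions of unmodified peptides are indexed, fragment m/z values are binned at a fixed width (`BIN_WIDTH`, a hypothetical setting), and candidates are ranked by a simple count of matched peaks with no precursor-mass filter. All function names are invented for illustration.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for the amino acids used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "R": 156.10111, "Y": 163.06333,
}
WATER = 18.010565
PROTON = 1.007276
BIN_WIDTH = 0.02  # assumed fragment bin width in Da (hypothetical setting)


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_mzs(seq):
    """Singly charged b and y ion m/z values, interleaved b1, y(n-1), b2, ..."""
    total = sum(RESIDUE_MASS[aa] for aa in seq)
    prefix = 0.0
    ions = []
    for aa in seq[:-1]:
        prefix += RESIDUE_MASS[aa]
        ions.append(prefix + PROTON)                  # b ion
        ions.append(total - prefix + WATER + PROTON)  # complementary y ion
    return ions


def build_index(peptides):
    """Map each fragment m/z bin to the ids of peptides that produce it."""
    index = defaultdict(list)
    for pid, seq in enumerate(peptides):
        for mz in fragment_mzs(seq):
            index[round(mz / BIN_WIDTH)].append(pid)
    return index


def open_search(peaks, index, peptides, top_n=2):
    """Rank peptides by matched peak count, ignoring precursor mass."""
    counts = defaultdict(int)
    for mz in peaks:
        center = round(mz / BIN_WIDTH)
        hits = set()
        for b in (center - 1, center, center + 1):  # tolerate bin edges
            hits.update(index.get(b, ()))
        for pid in hits:  # each peak counts at most once per peptide
            counts[pid] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(peptides[pid], n) for pid, n in ranked[:top_n]]


# Toy database and a simulated spectrum: six fragments of SAMPLER plus one noise peak.
peptides = ["PEPTIDEK", "SAMPLER", "GLYCANK"]
index = build_index(peptides)
peaks = fragment_mzs("SAMPLER")[:6] + [500.0]
results = open_search(peaks, index, peptides)

# Simulated precursor carrying one HexNAc (203.07937 Da) on the peptide.
precursor_mass = peptide_mass("SAMPLER") + 203.07937
mass_delta = precursor_mass - peptide_mass(results[0][0])
print(results, round(mass_delta, 4))
```

Because candidates are retrieved from fragment peaks rather than from the precursor mass, the peptide can be found even when the precursor carries an unknown additional mass. In this sketch the leftover `mass_delta` is the quantity a later step would try to explain, for example as a glycan composition or as a cross-linked partner peptide; how the thesis performs that step is not shown here.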