Although proteomics has developed rapidly over the past decade, researchers are still at an early stage of exploring the world of complex proteoforms: protein products with primary structure alterations that result from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential for mapping proteoforms to their biological functions and for discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a “bird’s-eye view” of intact proteoforms. Because alterations on a protein can combine in many ways, a single protein may give rise to billions of possible proteoforms, which makes proteoform identification a challenging computational problem. We propose a new data structure, called the mass graph, for efficiently representing proteoforms, and we design mass graph alignment algorithms for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry data sets showed that the proposed methods outperformed existing methods in identifying complex proteoforms.
Dr. Xiaowen Liu joined the School of Informatics and Computing at Indiana University-Purdue University Indianapolis (IUPUI) in 2012. He worked as a Postdoctoral Fellow at the University of Western Ontario, the University of Waterloo, and the University of California, San Diego from 2008 to 2012.
Xiaowen's personal website is available at: