Proteins function in living organisms as enzymes, antibodies, sensors, and transporters, among myriad other roles. Understanding protein function therefore has great implications for the biological and medical sciences. It is widely accepted that protein function is largely determined by protein structure, and that proteins with similar sequences tend to fold into similar structures. Thus, protein sequences and structures are the primary evidence used to find homologous proteins, that is, proteins that share a common ancestry. Moreover, protein structures are known to be more conserved than protein sequences over the course of evolution, so homologous proteins may retain similar structures even after their sequences have diverged. Finding such remote homologs, which have limited sequence similarity, is therefore a fundamental yet challenging problem in computational biology, and an indispensable step towards understanding protein function.
Here, three novel methods are presented for finding remote homologous proteins, each with a different goal: (a) the PROtein STructure Alignment (PROSTA) family, which automatically determines and aligns the structures of protein pockets and interaction interfaces; (b) the ContactLib method, which scans tens of thousands of protein structures for homologous structures in seconds; and (c) the CMsearch method, which simultaneously explores the protein sequence space and the protein structure space and performs cross-modal search for homologous proteins. Multiple experiments on homologous protein detection and protein structure prediction show significant performance improvements over state-of-the-art methods. Moreover, case studies are presented in which our method discovers, for the first time, structural similarities between pairs of functionally related protein-DNA complexes.
Dr. Xuefeng Cui is currently a postdoctoral fellow at the Computational Bioscience Research Centre (CBRC) and the Computer, Electrical and Mathematical Science and Engineering (CEMSE) division, King Abdullah University of Science and Technology, Saudi Arabia. Dr. Cui obtained his bachelor's, master's, and Ph.D. degrees at the David R. Cheriton School of Computer Science, University of Waterloo, Canada. His research interests include finding homologous genes, finding homologous proteins, General Purpose GPU (GPGPU) algorithms, and applied machine learning. His research has been published in top journals and conferences in computational biology.