Nuclear magnetic resonance (NMR) is one of the main experimental techniques for protein structure determination. Its biggest advantage is that the protein structure is determined in solution, close to the protein's native environment, so the natural dynamics of the protein can be studied. However, NMR protein structure determination is an expertise-intensive and time-consuming process. Accelerating or even automating this process with computational methods would significantly advance the field of structural biology. Our goal is to propose highly efficient, error-tolerant methods that work well on real, noisy NMR data sets.
Our first contribution in this work is a novel peak-picking method, WaVPeak. First, WaVPeak denoises the NMR spectra using wavelet smoothing. A brute-force search then identifies all candidate peaks. Next, the volume of each candidate peak is estimated. Finally, the peaks are sorted by volume. WaVPeak was tested on the same benchmark data set used to evaluate the state-of-the-art method, PICKY, and it shows significantly better recall and precision than PICKY.
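The candidate-detection, volume-estimation and sorting steps can be sketched as follows. This is a minimal illustration under stated assumptions, not WaVPeak itself: it works on a 2D spectrum that is assumed to be already denoised (the wavelet step is omitted), treats every interior point strictly higher than its eight neighbours as a candidate, and uses a hypothetical fixed-window sum of positive intensities as the "volume".

```python
from typing import List, Tuple

Spectrum2D = List[List[float]]
Peak = Tuple[int, int]


def find_candidate_peaks(s: Spectrum2D) -> List[Peak]:
    """Brute force: every interior point strictly higher than all 8 neighbours."""
    rows, cols = len(s), len(s[0])
    peaks = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            v = s[i][j]
            neighbours = [s[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            if all(v > n for n in neighbours):
                peaks.append((i, j))
    return peaks


def window_volume(s: Spectrum2D, peak: Peak, radius: int = 1) -> float:
    """HYPOTHETICAL volume estimate: sum of positive intensities in a square window."""
    i, j = peak
    total = 0.0
    for a in range(max(0, i - radius), min(len(s), i + radius + 1)):
        for b in range(max(0, j - radius), min(len(s[0]), j + radius + 1)):
            total += max(s[a][b], 0.0)
    return total


def pick_peaks(s: Spectrum2D, radius: int = 1) -> List[Tuple[float, Peak]]:
    """Return (volume, position) pairs sorted by decreasing volume."""
    scored = [(window_volume(s, p, radius), p) for p in find_candidate_peaks(s)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored
```

Sorting by an integrated quantity rather than by peak height is the point of the volume step: a broad peak with a modest maximum can outrank a narrow noise spike.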
Our second contribution is an automatic method for deciding how many of the peaks produced by a peak-picking method to keep. It overcomes the limitations of methods that return a fixed number of peaks. Our method is based on the Benjamini-Hochberg (B-H) algorithm and is used with both WaVPeak and PICKY to automatically select the peaks to return out of hundreds of candidates. The volume (in WaVPeak) or the intensity (in PICKY) of each candidate is converted into a p-value, and peaks whose p-values fall below the threshold determined by the B-H procedure are selected. Experimental results show that the new method achieves better recall than the fixed-number method. To improve precision, we eliminated false peaks by taking the consensus of the B-H-selected peaks from PICKY and WaVPeak. On average, the consensus method identifies more than 88% of the expected true peaks, while fewer than 17% of the selected peaks are false.
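The selection step can be illustrated with the standard B-H step-up procedure. How volumes or intensities are turned into p-values is not specified here, so the sketch assumes a Gaussian null distribution whose centre and spread are estimated robustly from all candidates (median and 1.4826 × MAD); the tolerance-based `consensus` rule is likewise a hypothetical illustration of combining the two methods' selections.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def volumes_to_pvalues(volumes: Sequence[float]) -> List[float]:
    """Upper-tail p-values under an ASSUMED Gaussian null fitted with median/MAD."""
    mu = statistics.median(volumes)
    mad = statistics.median([abs(v - mu) for v in volumes])
    sigma = max(1.4826 * mad, 1e-12)  # 1.4826 * MAD estimates a normal std
    return [0.5 * math.erfc((v - mu) / (sigma * math.sqrt(2))) for v in volumes]


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices rejected by the B-H step-up procedure at false discovery rate q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank k with p_(k) <= k * q / m determines the cut-off.
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def consensus(peaks_a: Sequence[Tuple[float, ...]],
              peaks_b: Sequence[Tuple[float, ...]],
              tol: Sequence[float]) -> List[Tuple[float, ...]]:
    """HYPOTHETICAL consensus: keep peaks of A matched by a peak of B in every dimension."""
    return [p for p in peaks_a
            if any(all(abs(x - y) <= t for x, y, t in zip(p, r, tol))
                   for r in peaks_b)]
```

Unlike a fixed top-N cut, the number of selected peaks here adapts to the data: only candidates that stand out from the bulk of the (assumed noise-dominated) volume distribution survive.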
Our third contribution is to propose, for the first time, the 3D extension of the Median-Modified Wiener Filter (MMWF) and a novel variant named MMWF*. These spatial filters have only one parameter to tune: the window size. Unlike wavelet denoising, the newly proposed filters extend easily to higher dimensions, so they can be applied to multidimensional NMR spectra. We tested the proposed filters and the Wiener filter, an adaptive variant of the mean filter, on a benchmark set of 16 two- and three-dimensional NMR spectra extracted from eight proteins. Our results demonstrate that the adaptive spatial filters significantly outperform their non-adaptive versions. On 2D/3D spectra, the new MMWF* even outperforms wavelet denoising.
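To show why the extension to 3D is straightforward, the sketch below applies a median-modified Wiener update voxel by voxel: the local median replaces the local mean of the classical Wiener filter, and the gain is `max(σ² − ν², 0) / max(σ², ν²)`, where σ² is the local variance and ν² the noise variance, assumed here to be the mean of all local variances. The only parameter is the odd window size `w`. Boundary windows are simply clipped, which is an assumption; MMWF* is not reproduced because its definition is not given in this section.

```python
import statistics
from typing import List

Volume = List[List[List[float]]]


def mmwf3d(a: Volume, w: int = 3) -> Volume:
    """Sketch of a 3D median-modified Wiener filter with a cubic w*w*w window."""
    r = w // 2
    X, Y, Z = len(a), len(a[0]), len(a[0][0])
    med = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
    var = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                # Window clipped at the spectrum borders (assumption).
                win = [a[i][j][k]
                       for i in range(max(0, x - r), min(X, x + r + 1))
                       for j in range(max(0, y - r), min(Y, y + r + 1))
                       for k in range(max(0, z - r), min(Z, z + r + 1))]
                med[x][y][z] = statistics.median(win)
                var[x][y][z] = statistics.pvariance(win)
    # Noise variance: mean of all local variances (as in the classical Wiener filter).
    noise = statistics.fmean(v for plane in var for row in plane for v in row)
    out = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                s2 = var[x][y][z]
                denom = max(s2, noise)
                gain = max(s2 - noise, 0.0) / denom if denom > 0 else 0.0
                # Median-modified update: shrink the voxel towards the local median.
                out[x][y][z] = med[x][y][z] + gain * (a[x][y][z] - med[x][y][z])
    return out
```

Nothing in the loop depends on the number of dimensions beyond the window enumeration, which is why adding a third axis only adds one more index range.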
Finally, we propose a novel framework that simultaneously performs slice picking and spin system forming, an essential step in resonance assignment. The framework then employs a genetic algorithm, guided by both connectivity information and amino acid typing information from the spin systems, to assign the spin systems to residues. The inputs to our framework can be as few as two commonly used spectra, namely CBCA(CO)NH and HNCACB. Unlike existing peak picking and resonance assignment methods, which treat peaks as the basic units, our method is based on 'slices': one-dimensional vectors in three-dimensional spectra, each corresponding to a specific (N, H) value pair. Experimental results on both benchmark simulated data sets and four real protein data sets demonstrate that our method significantly outperforms state-of-the-art methods, especially on the more challenging real protein data sets, while using fewer spectra than those methods. Furthermore, we show that the chemical shift assignments predicted by our method for the four real proteins lead to accurate final three-dimensional structures when used as input to the CS-ROSETTA server.
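The kind of connectivity information a slice carries can be sketched as below. It relies on standard properties of the two experiments: a CBCA(CO)NH slice at the (N, H) of residue i shows the Cα/Cβ shifts of residue i−1, while an HNCACB slice shows those of both i and i−1 (Cα and Cβ with opposite signs, hence the absolute values). The threshold, the 0.3 ppm tolerance and the names `form_spin_system` and `link_score` are hypothetical; this is neither the thesis's slice-picking method nor its genetic algorithm, only an illustration of the evidence such a search could score.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def slice_peaks(vec: Sequence[float], ppm: Sequence[float], threshold: float) -> List[float]:
    """Carbon shifts (ppm) of local maxima of |intensity| above threshold in one slice."""
    out = []
    for k in range(1, len(vec) - 1):
        v = abs(vec[k])
        if v > threshold and v > abs(vec[k - 1]) and v > abs(vec[k + 1]):
            out.append(ppm[k])
    return out


@dataclass
class SpinSystem:
    own: List[float] = field(default_factory=list)   # Cα/Cβ of residue i
    prev: List[float] = field(default_factory=list)  # Cα/Cβ of residue i-1


def form_spin_system(hncacb: Sequence[float], cbcaconh: Sequence[float],
                     ppm: Sequence[float], threshold: float,
                     tol: float = 0.3) -> SpinSystem:
    """Split the HNCACB carbons into 'own' and 'previous' using the CBCA(CO)NH slice."""
    prev = slice_peaks(cbcaconh, ppm, threshold)
    all_c = slice_peaks(hncacb, ppm, threshold)
    own = [c for c in all_c if all(abs(c - p) > tol for p in prev)]
    return SpinSystem(own=own, prev=prev)


def link_score(a: SpinSystem, b: SpinSystem, tol: float = 0.3) -> int:
    """Evidence that a directly precedes b: b's 'prev' shifts matched by a's 'own' shifts."""
    return sum(1 for p in b.prev if any(abs(p - c) <= tol for c in a.own))
```

A search over residue orderings (for example, a genetic algorithm) could sum such link scores along a candidate chain, combined with amino-acid-type likelihoods derived from the `own` shifts.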
Ahmed Abbas is an assistant professor in the Computer and Systems Engineering Department, Faculty of Engineering, Ain Shams University, Egypt. He received his Master's degree in Computer Science from Ain Shams University in May 2010. He joined KAUST as a Ph.D. student in August 2010 and received his Ph.D. in December 2015. During his Ph.D. study, he worked on developing new methods to solve key open problems in automated NMR protein structure determination. His papers have been published in journals including Bioinformatics, PLoS ONE, Journal of Biomolecular NMR, and Scientific Reports.