Acta Phys.-Chim. Sin., 2009, Vol. 25, Issue 12: 2558-2564. doi: 10.3866/PKU.WHXB20091122

• ARTICLE •

Classification Modeling and Recognition of Protein Fold Type

LIU Yue, LI Xiao-Qin, XU Hai-Song, QIAO Hui   

  1. School of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, P. R. China
  • Received: 2009-05-07; Revised: 2009-08-28; Published: 2009-11-27
  • Contact: LI Xiao-Qin, E-mail: lxq0811@bjut.edu.cn

Abstract:

How a protein's amino acid sequence determines its structure is a core question in biology. The fold type of a protein reflects the topological pattern of its structural core, and fold recognition is an important method in sequence-structure research. This article focuses on the 36 fold types that have not been incorporated into a unified hidden Markov model (HMM) but that account for 41.8% of the α, β, and α/β proteins in the Astral 1.65 sequence database. The training set contains samples that share less than 25% sequence identity with one another. We applied hierarchical clustering based on root mean square deviation (RMSD) to divide each fold type into subgroups, and then built a profile-HMM for each subgroup from a structure alignment produced by the multiple structural alignment algorithm MUSTANG. In a test on 9505 proteins with less than 95% sequence identity from the Astral 1.65 database, the average sensitivity, specificity, and Matthews correlation coefficient (MCC) over the 36 fold types were 90%, 99%, and 0.95, respectively. These results show that classification modeling based on RMSD can achieve precise fold recognition for fold types where a unified HMM cannot be built because the training set contains too many members. This work provides a new method and new ideas for profile-HMM protein fold recognition and lays a foundation for further research.
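
For readers unfamiliar with the three evaluation measures reported above, the following is a minimal sketch of how sensitivity, specificity, and MCC are commonly computed for a single fold type from its confusion counts. It uses the standard textbook definitions; the function name `fold_metrics` and the example counts are hypothetical and are not taken from the article, whose exact evaluation procedure is not described in the abstract.

```python
import math


def fold_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) for one fold type.

    tp: proteins of this fold that were recognized as this fold
    fn: proteins of this fold that were missed
    fp: proteins of other folds wrongly assigned to this fold
    tn: proteins of other folds correctly rejected
    """
    # Sensitivity: fraction of true members of the fold that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-members that are correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only (not data from the article).
sens, spec, mcc = fold_metrics(tp=9, fp=2, tn=88, fn=1)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

Averaging these per-fold values over all fold types would give summary figures of the kind quoted in the abstract. Because MCC accounts for all four confusion counts, it stays informative when a fold type has far fewer members than non-members.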

Key words: Protein fold type, RMSD, Hierarchical clustering, Profile-HMM, Fold recognition