Acta Phys.-Chim. Sin., 2013, Vol. 29, Issue (08): 1639-1647. doi: 10.3866/PKU.WHXB201305171


Quantitative Structure-Activity Relationship Study of the Non-Nucleoside Inhibitors of HCV NS5B Polymerase by Machine Learning Methods

CONG Yong1, XUE Ying1,2   

  1 College of Chemistry, Key Laboratory of Green Chemistry and Technology, Ministry of Education, Sichuan University, Chengdu 610064, P. R. China;
  2 State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041, P. R. China
  • Received: 2013-02-01; Revised: 2013-05-17; Published: 2013-07-09
  • Contact: XUE Ying
  • Supported by:

    The project was supported by the National Natural Science Foundation of China (21173151) and the Open Research Fund of the Key Laboratory of Advanced Scientific Computation, Xihua University, China (szjj2011-029).


The quantitative structure-activity relationship (QSAR) approach was used to predict the activity of 89 non-nucleoside inhibitors of hepatitis C virus (HCV) NS5B polymerase built on two different scaffolds (benzoisothiazole and benzothiazine). Two selection methods, linear stepwise regression analysis (LSRA) and genetic algorithm-partial least squares (GA-PLS), were used to select appropriate descriptor subsets for QSAR modeling with linear models. The genetic algorithm-support vector machine (GA-SVM) approach was first used to build nonlinear models with the six LSRA-selected and the seven GA-PLS-selected descriptors. The three QSAR models built with the six LSRA-selected descriptors gave correlation coefficients of 0.958-0.962 for the training set, of which the GA-SVM model gave the highest value (0.962). The three QSAR models built with the seven GA-PLS-selected descriptors gave correlation coefficients of 0.918-0.960 for the training set, of which the partial least squares (PLS) model was the best (0.960). The investigated models gave satisfactory prediction results, and the approach can be extended to other QSAR studies.
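As an illustration of the descriptor-selection idea mentioned above, the following is a minimal sketch of a forward stepwise selection driven by ordinary least squares. It is not the authors' LSRA procedure: classical stepwise regression typically uses F-to-enter and F-to-remove tests, whereas this simplified variant only adds descriptors and stops when the gain in R² falls below a threshold. The function names (`r_squared`, `forward_select`), the `min_gain` threshold, and the data are all assumptions made for illustration; the data are synthetic random numbers shaped like an 89-compound set, not the descriptors or activities from the study.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def forward_select(X, y, max_descriptors=6, min_gain=0.01):
    """Greedy forward selection (hypothetical helper, forward-only).

    At each step, add the descriptor column that raises R^2 the most;
    stop when the gain is below min_gain or max_descriptors is reached.
    """
    selected = []
    best_r2 = 0.0
    while len(selected) < max_descriptors:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        scores = {j: r_squared(X[:, selected + [j]], y) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break
        selected.append(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2


# Synthetic stand-in data (assumption): 89 "compounds", 8 "descriptors",
# with the activity depending only on descriptor columns 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(89, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.01, size=89)

selected, r2 = forward_select(X, y)
print(sorted(selected), round(r2, 3))
```

In practice the selected subset would then be passed to the modeling step (for example a linear or SVM regressor), and the stopping rule would be replaced by whatever statistical criterion the study actually used.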

Key words: HCV NS5B polymerase, Non-nucleoside inhibitor, Linear stepwise regression analysis, Partial least squares, Genetic algorithm, Support vector machine