Acta Physico-Chimica Sinica (物理化学学报) >> 2013, Vol. 29 >> Issue (01): 217-223. doi: 10.3866/PKU.WHXB201211122

Biophysical Chemistry

Classification and Prediction of H1N1 Neuraminidase Inhibitors Based on Machine Learning Methods

吕巍1,2, 薛英3,4, 孟庆伟1,2   

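  1 College of Life Sciences, State Key Laboratory of Crop Biology, Shandong Agricultural University, Tai'an 271018;
  2 Postdoctoral Research Station of Biology, Shandong Agricultural University, Tai'an 271018;
  3 College of Chemistry, Key Laboratory of Green Chemistry and Technology, Ministry of Education, Sichuan University, Chengdu 610064;
  4 State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041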
  • Received: 2012-09-13 Revised: 2012-11-12 Published: 2012-12-14
  • Corresponding author: MENG Qing-Wei E-mail: qwmeng@sdau.edu.cn
  • Supported by:

    National Key Basic Research Program of China (973) (2009CB118500)

Classification Prediction of Inhibitors of H1N1 Neuraminidase by Machine Learning Methods

LÜ Wei1,2, XUE Ying3,4, MENG Qing-Wei1,2   

  1 College of Life Sciences, State Key Laboratory of Crop Biology, Shandong Agricultural University, Tai’an, Shandong 271018, P. R. China;
  2 Postdoctoral Research Station of Biology, Shandong Agricultural University, Tai’an, Shandong 271018, P. R. China;
  3 College of Chemistry, Key Laboratory of Green Chemistry and Technology, Ministry of Education, Sichuan University, Chengdu 610064, P. R. China;
  4 State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041, P. R. China
  • Received: 2012-09-13 Revised: 2012-11-12 Published: 2012-12-14
  • Supported by:

    The project was supported by the National Key Basic Research Program of China (973) (2009CB118500).

Abstract (translated from Chinese):

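Influenza is a major respiratory infectious disease with high morbidity in the general population and high mortality among elderly and high-risk patients. Studies have shown that inhibiting neuraminidase (NA) can block viral RNA replication, which makes NA an important drug target for the effective treatment of H1N1 influenza. Virtual screening and prediction of NA inhibitors by computational methods have become increasingly important. Structure-based rational drug design that targets the enzyme's active site to develop H1N1 neuraminidase inhibitors has become a focus of drug research. In this work, several machine learning methods (support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT)) were used to build classification models that distinguish known neuraminidase inhibitors (NAIs) from non-inhibitors (non-NAIs). A structurally diverse set of 227 compounds (72 NAIs and 155 non-NAIs) was used to test the classification systems, and recursive feature elimination was used to select the property descriptors relevant to NAI classification in order to improve prediction accuracy. On the independent validation set, the overall prediction accuracy was 75.9%-92.6%, with 64.3%-78.6% for NAIs and 77.5%-97.5% for non-NAIs. SVM gave the best overall accuracy (92.6%). These results suggest that machine learning methods such as SVM can effectively predict potential NA inhibitors in unknown datasets and help identify the molecular descriptors associated with them.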
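As a rough illustration of one of the classifiers named above, the following Python sketch implements a plain k-nearest-neighbor majority vote on descriptor vectors. The two-dimensional descriptor values, the Euclidean distance, and k = 3 are assumptions made for illustration only; the abstract does not specify the descriptors, the distance metric, or the value of k used in the study.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (descriptor_vector, label) pairs.
    """
    # Sort training compounds by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest compounds and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, made-up descriptor vectors (not data from the study).
training_set = [
    ((0.9, 0.8), "NAI"),
    ((0.8, 0.9), "NAI"),
    ((0.7, 0.7), "NAI"),
    ((0.1, 0.2), "non-NAI"),
    ((0.2, 0.1), "non-NAI"),
    ((0.3, 0.3), "non-NAI"),
]

print(knn_predict(training_set, (0.75, 0.85)))
print(knn_predict(training_set, (0.15, 0.25)))
```

In practice the descriptors would usually be scaled to comparable ranges first, since a distance-based vote is otherwise dominated by the descriptor with the largest numeric range.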

Keywords: machine learning methods, H1N1 influenza virus, neuraminidase inhibitor, support vector machine

Abstract:

Influenza is a major respiratory infection associated with significant morbidity in the general population and mortality in elderly and high-risk patients. Research has shown that inhibiting neuraminidase (NA) prevents viral RNA replication, so NA is an important drug target for treating H1N1 influenza. Screening for and predicting molecules with NA inhibitory activity by computational methods is becoming increasingly important. In this work, we explored several machine learning methods (support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT)) for predicting NA inhibitors (NAIs). These predictive systems were tested using 227 compounds (72 NAIs and 155 non-NAIs), which were significantly more diverse in chemical structure than those used in other studies. A feature selection method (recursive feature elimination) was used to improve prediction accuracy and to select the molecular descriptors that distinguish NAIs from non-NAIs. On the independent validation set, the prediction accuracies were 75.9%-92.6% for all compounds, 64.3%-78.6% for NAIs, and 77.5%-97.5% for non-NAIs. Among the methods tested, SVM gave the best overall accuracy (92.6%). This work suggests that machine learning methods can be useful for predicting potential NAIs in unknown sets of compounds and for identifying molecular descriptors associated with NAIs.
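The three accuracy figures reported above are related by simple counting, and the Python sketch below shows one way to compute them from a confusion matrix. The counts are hypothetical: the abstract does not state the size of the validation set, and 14 NAIs and 40 non-NAIs are assumed here only because they reproduce the best reported percentages (78.6%, 97.5%, and 92.6%).

```python
def class_accuracies(tp, fn, tn, fp):
    """Return (NAI accuracy, non-NAI accuracy, overall accuracy) in percent.

    tp/fn: NAIs predicted correctly / incorrectly.
    tn/fp: non-NAIs predicted correctly / incorrectly.
    """
    nai_acc = 100.0 * tp / (tp + fn)        # accuracy on NAIs (sensitivity)
    non_nai_acc = 100.0 * tn / (tn + fp)    # accuracy on non-NAIs (specificity)
    overall = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return nai_acc, non_nai_acc, overall

# Hypothetical counts (assumed, not taken from the paper).
nai_acc, non_nai_acc, overall = class_accuracies(tp=11, fn=3, tn=39, fp=1)
print(f"NAIs: {nai_acc:.1f}%  non-NAIs: {non_nai_acc:.1f}%  overall: {overall:.1f}%")
```

Because non-NAIs outnumber NAIs in such a set, the overall accuracy is weighted toward the non-NAI accuracy, so the per-class figures are worth reading alongside it.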

Key words: Machine learning method, H1N1 influenza virus, Neuraminidase inhibitor, Support vector machine