Acta Physico-Chimica Sinica, 2011, Vol. 27, Issue (02): 343-351. doi: 10.3866/PKU.WHXB20110219

Theoretical and Computational Chemistry

Support Vector Machine and KStar Models for Predicting the O-Dealkylation Reaction Catalyzed by Cytochrome P450 Enzymes

王丹, 张燕玲, 乔延江   

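  1. School of Chinese Pharmacy, Beijing University of Chinese Medicine, Beijing 100102
  • Received: 2010-09-13 Revised: 2010-11-18 Published: 2011-01-25
  • Corresponding author: 乔延江 E-mail: yjqiao@263.net
  • Funding:

    Supported by the Doctoral Program Fund of the Ministry of Education (20092213120006) and the Special Project for the Traditional Chinese Medicine Industry (200707010)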

Support Vector Machine and KStar Models Predict the o-Dealkylation Reaction Mediated by Cytochrome P450

WANG Dan, ZHANG Yan-Ling, QIAO Yan-Jiang   

  1. School of Chinese Pharmacy, Beijing University of Chinese Medicine, Beijing 100102, P. R. China
  • Received: 2010-09-13 Revised: 2010-11-18 Published: 2011-01-25
  • Contact: QIAO Yan-Jiang E-mail: yjqiao@263.net
  • Supported by:

    The project was supported by the Specialized Research Fund for the Doctoral Program of Higher Education, China (20092213120006) and Research Fund of State Administration of TCM of People’s Republic of China (200707010).

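Abstract:

Using support vector machines (SVM) and the KStar method, we built a nested prediction model comprising a molecular shape discrimination model for metabolites and a metabolic reaction site discrimination model. For the molecular shape discrimination model, 272 molecules were studied: 1280 molecular descriptors, including topological, 2D autocorrelation, and geometric descriptors, were calculated, and the accuracy of classification models built with four machine learning methods (SVM, decision tree, Bayesian network, and k-nearest neighbors) was examined. SVM outperformed the other methods, and the resulting model can be used to predict whether a molecule can undergo O-dealkylation catalyzed by cytochrome P450 enzymes. For the metabolic reaction site discrimination model, 538 O-dealkylation metabolic sites were studied: 26 quantum chemical features characterizing atomic energy, valence, charge, and related properties were calculated, and the accuracy of models built with decision tree, Bayesian network, KStar, and artificial neural network methods was compared. The KStar model achieved accuracy, sensitivity, and specificity all above 90%; for molecules selected by the molecular shape discrimination model, it can identify which C―O bond breaks. Fifteen traditional Chinese medicine molecules with well-established metabolic reactions were used as a validation set. The results show that the nested prediction model based on SVM and KStar has a certain degree of accuracy and can support the prediction of O-dealkylation metabolites of traditional Chinese medicine molecules.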
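The accuracy, sensitivity, and specificity reported for the KStar model are conventionally derived from a binary confusion matrix. The following is a minimal Python sketch under that assumption; the source does not state its formulas, the names `Confusion`, `confusion`, `accuracy`, `sensitivity`, and `specificity` are hypothetical, and any labels passed in would be the reader's own, not data from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """Counts for a binary classifier (1 = reactive site, 0 = non-reactive)."""
    tp: int  # reactive sites predicted reactive
    fn: int  # reactive sites predicted non-reactive
    tn: int  # non-reactive sites predicted non-reactive
    fp: int  # non-reactive sites predicted reactive


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally the four outcome counts from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t and p),
        fn=sum(1 for t, p in pairs if t and not p),
        tn=sum(1 for t, p in pairs if not t and not p),
        fp=sum(1 for t, p in pairs if not t and p),
    )


def accuracy(c: Confusion) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: Confusion) -> float:
    """Fraction of true positives recovered (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of true negatives recovered (true negative rate)."""
    return c.tn / (c.tn + c.fp)
```

Reporting all three values matters when the two classes are unbalanced, since a high accuracy alone can hide poor recovery of the minority class.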

Key words: Support vector machine, Cytochrome P450 enzyme, KStar, O-dealkylation reaction

Abstract:

We constructed a nested prediction model based on support vector machines (SVM) and the KStar method. It consists of a molecular shape discriminative model, which predicts whether a molecule can undergo the O-dealkylation reaction mediated by cytochrome P450, and a metabolic site discriminative model, which identifies the C―O bond that breaks. For the molecular shape discriminative model, we calculated 1280 molecular descriptors, including topological, 2D autocorrelation, and geometric descriptors, to characterize 272 molecules, and built classification models with four machine learning methods: SVM, decision tree, Bayesian network, and k-nearest neighbors. The SVM model was superior to the others. For the metabolic site discriminative model, we calculated 26 quantum chemical features, including charge-related, valency-related, and energy-related features, for 538 O-dealkylation metabolic sites, and built classification models with decision tree, Bayesian network, KStar, and artificial neural network methods. The KStar model, with prediction accuracy, sensitivity, and specificity all above 90%, outperformed the other classification models. Fifteen molecules from traditional Chinese medicines with known metabolic reactions were used to validate the nested model. The results showed that the nested model achieved a certain degree of accuracy and could contribute to the prediction of O-dealkylation metabolites from traditional Chinese medicines.
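The nested scheme described in the abstract can be read as two classifiers applied in sequence: a molecule-level model screens candidates first, and a site-level model is consulted only for molecules that pass. The sketch below illustrates that control flow in Python and nothing more. The callables `is_substrate` and `is_cleaved` are hypothetical stand-ins for the trained SVM and KStar models, and the dictionary layout (`name`, `descriptors`, `sites`, `bond`, `features`) is an assumption made for illustration, not the authors' data format.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def nested_predict(
    molecules: List[dict],
    is_substrate: Callable[[Vector], bool],  # stage 1: molecule-level model (SVM in the paper)
    is_cleaved: Callable[[Vector], bool],    # stage 2: site-level model (KStar in the paper)
) -> Dict[str, List[str]]:
    """Return, for each molecule name, the C-O bonds predicted to break.

    A molecule rejected by the stage 1 model gets an empty list, and its
    sites are never passed to the stage 2 model.
    """
    results: Dict[str, List[str]] = {}
    for mol in molecules:
        if not is_substrate(mol["descriptors"]):
            results[mol["name"]] = []
            continue
        results[mol["name"]] = [
            site["bond"]
            for site in mol["sites"]
            if is_cleaved(site["features"])
        ]
    return results
```

Keeping the two stages as separate callables means either model can be swapped or re-evaluated on its own, which matches how the abstract compares several learners independently for each stage.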

Key words: Support vector machine, Cytochrome P450 enzyme, KStar, O-dealkylation reaction

CLC number:

  • O641