Acta Physico-Chimica Sinica, 2010, Vol. 26, Issue (02): 471-477. doi: 10.3866/PKU.WHXB20100125

Biophysical Chemistry

Activity Prediction of Hormone-Sensitive Lipase Inhibitors Based on Machine Learning Methods

LV Wei, XUE Ying

  1. Key Laboratory of Green Chemistry and Technology, Ministry of Education, College of Chemistry, Sichuan University, Chengdu 610064, P. R. China; State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041, P. R. China
  • Received: 2009-09-17 Revised: 2009-10-25 Published: 2010-01-26
  • Corresponding author: XUE Ying, E-mail: yxue@scu.edu.cn

Abstract:

Hormone-sensitive lipase (HSL) is regarded as the key rate-limiting enzyme regulating free fatty acid (FFA) metabolism in adipose tissue. HSL plays an important role in the pathogenesis of diabetes, and inhibiting HSL activity is beneficial to the treatment of diabetes, so the discovery of new HSL inhibitors (HSLIs) is an active research topic. Because knowledge of the mechanism and three-dimensional (3D) structure of hormone-sensitive lipase is limited, methods for predicting HSLIs are needed to facilitate the design of novel therapeutic agents for diabetes. We used several machine learning methods, namely support vector machines (SVM), k-nearest neighbor (k-NN), and the C4.5 decision tree (C4.5 DT), to build classification models from a set of known HSLIs and non-HSLIs. The prediction system was tested on 252 compounds (123 HSLIs and 129 non-HSLIs), which are significantly more diverse in chemical structure than those used in other studies. Recursive feature elimination was used to select the molecular descriptors that distinguish HSLIs from non-HSLIs and thereby improve the prediction accuracy. On an independent validation set, the three methods gave prediction accuracies of 85.7%-90.5% for HSLIs, 63.2%-68.4% for non-HSLIs, and 75.0%-80.0% for all compounds. SVM gave the best overall accuracy (80.0%). This work suggests that machine learning methods such as SVM are useful for predicting potential HSLIs in sets of unknown compounds and for identifying the molecular descriptors associated with HSLIs.

Key words: Support vector machine, Hormone-sensitive lipase, Machine learning method, Molecular descriptor, Recursive feature elimination

CLC number: 

  • O641
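
The three accuracies quoted in the abstract (for HSLIs, for non-HSLIs, and for all compounds) are the usual per-class and overall accuracies of a binary classifier. The sketch below shows how they follow from confusion-matrix counts. The names `ConfusionCounts` and `accuracies` are illustrative, and the counts are hypothetical: the abstract does not state the size of the validation set, and the values here (21 HSLIs and 19 non-HSLIs) were merely chosen because they reproduce the reported percentages.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # HSLIs correctly predicted as HSLIs
    fn: int  # HSLIs wrongly predicted as non-HSLIs
    tn: int  # non-HSLIs correctly predicted as non-HSLIs
    fp: int  # non-HSLIs wrongly predicted as HSLIs


def accuracies(c: ConfusionCounts) -> tuple[float, float, float]:
    """Return (HSLI accuracy, non-HSLI accuracy, overall accuracy) in percent."""
    hsli_acc = 100.0 * c.tp / (c.tp + c.fn)        # sensitivity
    non_hsli_acc = 100.0 * c.tn / (c.tn + c.fp)    # specificity
    overall = 100.0 * (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    return hsli_acc, non_hsli_acc, overall


# Hypothetical counts (assumption, not taken from the paper), picked to be
# consistent with the upper and lower ends of the reported ranges.
upper = ConfusionCounts(tp=19, fn=2, tn=13, fp=6)
lower = ConfusionCounts(tp=18, fn=3, tn=12, fp=7)

for label, counts in (("upper", upper), ("lower", lower)):
    se, sp, q = accuracies(counts)
    print(f"{label}: HSLI {se:.1f}%, non-HSLI {sp:.1f}%, overall {q:.1f}%")
```

Under these assumed counts, the gap between the two per-class accuracies corresponds to non-inhibitors being misclassified as inhibitors more often than the reverse, which the overall accuracy alone would not reveal.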