Acta Physico-Chimica Sinica, 2020, Vol. 36, Issue (1): 1907006. doi: 10.3866/PKU.WHXB201907006

Special topic: Special Issue Celebrating the 100th Birthday of Academician Youqi Tang


Constructing a Predictive Model for the Dissociation Rate Constants of Drug Molecules Using Machine Learning Methods

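Minyi Su1,2, Huisi Liu3, Haixia Lin3,*, Renxiao Wang1,2,*

    1 State Key Laboratory of Bioorganic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032
    2 University of Chinese Academy of Sciences, Beijing 100049
    3 Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444
  • Received: 2019-07-01 Accepted: 2019-08-30 Published: 2019-09-03
  • Corresponding authors: Haixia Lin, Renxiao Wang E-mail: haixialin@staff.shu.edu.cn; wangrx@mail.sioc.ac.cn
  • Funding:
    National Key Research and Development Program of the Ministry of Science and Technology of China (2016YFA0502302); National Natural Science Foundation of China (81725022, 81430083, 21661162003, 21673276, 21472227, 21472226); Strategic Priority Research Program of the Chinese Academy of Sciences (XDB20000000)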

Machine-Learning Model for Predicting the Rate Constant of Protein-Ligand Dissociation

Minyi Su1,2, Huisi Liu3, Haixia Lin3,*, Renxiao Wang1,2,*

    1 State Key Laboratory of Bioorganic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, P. R. China
    2 University of Chinese Academy of Sciences, Beijing 100049, P. R. China
    3 Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, P. R. China
  • Received: 2019-07-01 Accepted: 2019-08-30 Published: 2019-09-03
  • Contact: Haixia Lin, Renxiao Wang E-mail: haixialin@staff.shu.edu.cn; wangrx@mail.sioc.ac.cn
  • Supported by:
    the National Key Research Program of the Ministry of Science and Technology of China (2016YFA0502302); National Natural Science Foundation of China (81725022, 81430083, 21661162003, 21673276, 21472227, 21472226); Strategic Priority Research Program of the Chinese Academy of Sciences (XDB20000000)

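Abstract (Chinese version):

A growing body of research indicates that the binding kinetics of a drug molecule to its target correlates strongly with its efficacy in vivo. Molecular design aimed at improving binding kinetics therefore offers a new approach to drug discovery. The goal of this work is to derive a general-purpose quantitative structure-kinetics relationship (QSKR) model for predicting the dissociation rate constant (koff) of drug molecules. We collected experimental dissociation rate constants for 406 ligand molecules from the literature and used molecular modeling to build three-dimensional structural models of all ligand-target protein complexes. Then, based on protein-ligand atom pair descriptors, we used the random forest algorithm to build a QSKR model for predicting ligand dissociation rate constants. By examining how descriptor sets generated under different conditions (such as distance range, bin width, and feature selection criteria) affect prediction accuracy, we determined that the QSKR model obtained with a distance cutoff of 15 Å, a bin width of 3 Å, and a feature selection variance level of 2 was optimal, achieving good prediction accuracy on two independent test sets (correlation coefficient of 0.62). This work is a useful exploration of the key scientific problem of predicting the dissociation rate constants of drug molecules and may inform subsequent studies.

Key words (Chinese version): Dissociation rate constant, Ligand binding kinetics, Random forest model, Protein-ligand interaction, Structure-based drug design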

Abstract:

An increasing number of recent studies have shown that the binding kinetics of a drug molecule to its target correlates strongly with its efficacy in vivo. Ligand optimization aimed at improving binding kinetics therefore offers a new direction for rational drug design. Currently, ligand binding kinetics is modeled mainly through extensive molecular dynamics simulations, whose computational cost limits their application to real-world problems. The present study aimed to obtain a general-purpose quantitative structure-kinetics relationship (QSKR) model for predicting the dissociation rate constant (koff) of a ligand from the structure of its complex with the target protein. This type of model is expected to be suitable for high-throughput tasks in structure-based drug design. We collected experimentally measured koff values for 406 ligand molecules from the literature and constructed a three-dimensional structural model for each protein-ligand complex through molecular modeling. A training set was compiled from 60% of those complexes, and the remaining 40% were assigned to two test sets. A random forest algorithm was used to derive QSKR models from distance-dependent protein-ligand atom pair descriptors. Various random forest models were generated from descriptor sets obtained under different conditions, such as distance cutoff, bin width, and feature selection criteria, and their cross-validation results were compared. The optimal model was obtained when the distance cutoff was 15 Å (1 Å = 0.1 nm), the bin width was 3 Å, and the feature selection variance level was 2. The final QSKR model produced correlation coefficients of around 0.62 on the two independent test sets. This level of accuracy is at least comparable to that of the predictive models described in the literature, which are typically much more computationally expensive. Our study is an attempt to address the problem of predicting koff values in drug design. We hope that it can provide inspiration for further studies by other researchers.
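To make the idea of distance-dependent protein-ligand atom pair descriptors more concrete, the following is a minimal sketch of how such counts could be computed with the cutoff (15 Å) and bin width (3 Å) reported above. It is an illustration under stated assumptions, not the authors' implementation: the function name `atom_pair_descriptors`, the element-symbol atom typing, the half-open binning rule, and the toy coordinates are all hypothetical, since the abstract does not specify these details.

```python
import math
from collections import Counter


def atom_pair_descriptors(protein_atoms, ligand_atoms, cutoff=15.0, bin_width=3.0):
    """Count protein-ligand atom pairs per (protein type, ligand type, distance bin).

    Each atom is a tuple (atom_type, (x, y, z)) with coordinates in angstroms.
    Assumption: bins are half-open intervals [k * bin_width, (k + 1) * bin_width),
    and pairs at or beyond `cutoff` are ignored.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)  # Euclidean distance (Python 3.8+)
            if d < cutoff:
                counts[(p_type, l_type, int(d // bin_width))] += 1
    return counts


# Toy coordinates (hypothetical, not taken from the study).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (4.0, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (0.0, 7.0, 0.0))]

features = atom_pair_descriptors(protein, ligand)
for key in sorted(features):
    print(key, features[key])
```

With a 15 Å cutoff and 3 Å bins, each pair of atom types contributes five distance bins. In a workflow like the one described, the counts for each complex would be flattened into a fixed-length feature vector, low-variance features would be filtered out, and the remaining features would be passed to a random forest regressor.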

Key words: Dissociation rate constant, Ligand binding kinetics, Random forest model, Protein-ligand interaction, Structure-based drug design