Acta Physico-Chimica Sinica ›› 2020, Vol. 36 ›› Issue (1): 1907006. doi: 10.3866/PKU.WHXB201907006

Special Issue: Special Issue in Honor of Academician Youqi Tang on the Occasion of His 100th Birthday

• Article

Machine-Learning Model for Predicting the Rate Constant of Protein-Ligand Dissociation

Minyi Su 1,2, Huisi Liu 3, Haixia Lin 3,*, Renxiao Wang 1,2,*

    1 State Key Laboratory of Bioorganic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, P. R. China
    2 University of Chinese Academy of Sciences, Beijing 100049, P. R. China
    3 Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, P. R. China
  • Received: 2019-07-01 Accepted: 2019-08-30 Published: 2019-09-03
  • Contact: Haixia Lin, Renxiao Wang
  • Supported by:
    National Key Research Program of Ministry of Science and Technology of China (2016YFA0502302); National Natural Science Foundation of China (81725022, 81430083, 21661162003, 21673276, 21472227, 21472226); Strategic Priority Research Program of Chinese Academy of Sciences (XDB20000000)


An increasing number of recent studies have shown that the binding kinetics of a drug molecule to its target correlates strongly with its efficacy in vivo. Ligand optimization aimed at improved binding kinetics therefore offers a new direction for rational drug design. At present, ligand binding kinetics is modeled mainly through extensive molecular dynamics simulations, whose computational cost limits their application to real-world problems. The present study aimed to obtain a general-purpose quantitative structure-kinetics relationship (QSKR) model for predicting the dissociation rate constant (koff) of a ligand from the structure of its complex with the target protein. This type of model is expected to be suitable for high-throughput tasks in structure-based drug design. We collected experimentally measured koff values for 406 ligand molecules from the literature and constructed a three-dimensional structural model of each protein-ligand complex through molecular modeling. A training set was compiled from 60% of those complexes, and the remaining 40% were assigned to two test sets. A random forest algorithm was then used to derive QSKR models from distance-dependent protein-ligand atom pair descriptors. Multiple random forest models were generated from descriptor sets computed under different conditions, namely the distance cutoff, the bin width, and the feature selection criteria, and their cross-validation results were compared. The optimal model was obtained with a distance cutoff of 15 Å (1 Å = 0.1 nm), a bin width of 3 Å, and a feature selection variance level of 2. The final QSKR model produced correlation coefficients of around 0.62 on the two independent test sets. This level of accuracy is at least comparable to that of the predictive models described in the literature, which are typically much more computationally expensive. Our study is an attempt to address the problem of predicting koff values in drug design, and we hope that it will inspire further studies by other researchers.

Key words: Dissociation rate constant, Ligand binding kinetics, Random forest model, Protein-ligand interaction, Structure-based drug design
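
To make the descriptor idea in the abstract concrete, the following is a minimal sketch of how distance-binned protein-ligand atom pair counts might be computed with the reported optimal settings (15 Å cutoff, 3 Å bin width). The abstract does not specify the atom typing scheme, the bin boundary convention, or the software used, so the element-symbol atom types, the half-open bins, and the function name `pair_count_descriptor` are all assumptions made for illustration only.

```python
from itertools import product
from math import ceil, dist


def pair_count_descriptor(protein_atoms, ligand_atoms,
                          protein_types, ligand_types,
                          cutoff=15.0, bin_width=3.0):
    """Count protein-ligand atom pairs per (protein type, ligand type, distance bin).

    Hypothetical helper, not the authors' implementation.
    Atoms are given as (type, (x, y, z)) tuples with coordinates in angstroms.
    Bins are half-open: [0, w), [w, 2w), ... up to the cutoff (an assumption).
    """
    n_bins = ceil(cutoff / bin_width)
    # Fixed feature order so that every complex yields a vector of equal length.
    keys = [(p, l, b)
            for p, l in product(protein_types, ligand_types)
            for b in range(n_bins)]
    counts = dict.fromkeys(keys, 0)

    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = dist(p_xyz, l_xyz)
            if d >= cutoff:
                continue  # pairs beyond the cutoff are ignored
            key = (p_type, l_type, int(d // bin_width))
            if key in counts:  # skip atom types outside the chosen type lists
                counts[key] += 1

    return [counts[k] for k in keys], keys


# Toy example with made-up coordinates (not data from the study).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (4.0, 0.0, 0.0))]
ligand = [("O", (1.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]
vector, keys = pair_count_descriptor(protein, ligand, ("C", "N"), ("C", "O"))
```

Each complex then maps to a fixed-length count vector. Such vectors could serve as input to a random forest regressor (for example, scikit-learn's `RandomForestRegressor`, which is an assumption here and is not named in the abstract) trained against the measured koff values.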