Acta Physico-Chimica Sinica 2017, Vol. 33, Issue 5: 918-926. DOI: 10.3866/PKU.WHXB201701163
ARTICLE     
Developing a Support Vector Machine Based QSPR Model to Predict Gas-to-Benzene Solvation Enthalpy of Organic Compounds
Hassan GOLMOHAMMADI 1,*, Zahra DASHTBOZORGI 2, Sajad KHOOSHECHIN 2
1 Young Researchers and Elite Club, Yadegar-e-Imam Khomeini (RAH) Shahr-e-Rey Branch, Islamic Azad University, Tehran, Iran
2 Young Researchers and Elite Club, Central Tehran Branch, Islamic Azad University, Tehran, Iran

Abstract  

This paper presents a new approach to building quantitative structure-property relationship (QSPR) models for predicting the gas-to-benzene solvation enthalpy (ΔHSolv) of 158 organic compounds from molecular descriptors calculated from the structure alone. Several classes of descriptors were calculated for each compound using the Dragon package. The enhanced replacement method (ERM) was used as a variable-selection technique to choose an optimal subset of descriptors. Our investigation shows that the solvation enthalpy depends nonlinearly on the physicochemical properties encoded by these descriptors, and that the ERM model cannot predict it accurately. The standard error for the prediction set is 1.681 kJ·mol⁻¹ for the support vector machine (SVM) model, compared with 4.624 kJ·mol⁻¹ for ERM. The ΔHSolv values calculated by SVM agree well with the experimental ones, and the SVM model outperforms the ERM model. This indicates that SVM can be used as an alternative modeling tool for QSPR studies.



Key words: Quantitative structure-property relationship; Gas-to-benzene solvation enthalpy; Descriptor; Enhanced replacement method; Support vector machine
Received: 13 December 2016      Published: 16 January 2017
Corresponding Authors: Hassan GOLMOHAMMADI     E-mail: hassan.gol@gmail.com
Cite this article:

Hassan GOLMOHAMMADI, Zahra DASHTBOZORGI, Sajad KHOOSHECHIN. Developing a Support Vector Machine Based QSPR Model to Predict Gas-to-Benzene Solvation Enthalpy of Organic Compounds. Acta Physico-Chimica Sinica, 2017, 33(5): 918-926.

URL:

http://www.whxb.pku.edu.cn/10.3866/PKU.WHXB201701163     OR     http://www.whxb.pku.edu.cn/Y2017/V33/I5/918

 
| Descriptor | Notation | Coefficient | Mean effect | VIF |
| Complementary information content (neighborhood symmetry of 1-order) | CIC1 | 3.358 | 8.403 | 2.225 |
| Solvation connectivity index chi-1 | X1Sol | -3.473 | -12.326 | 2.197 |
| R maximal autocorrelation of lag 1 / weighted by atomic Sanderson electronegativities | R1e+ | -29.550 | -4.529 | 1.560 |
| Geary autocorrelation of lag 1 / weighted by atomic Sanderson electronegativities | GATS1e | 5.925 | 6.129 | 1.657 |
| Constant | | -13.044 | | |
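Read as a linear model, the coefficient column above gives a predicted ΔHSolv as the constant plus a weighted sum of the four descriptors. The short Python sketch below evaluates that sum. It assumes the coefficients belong to the linear ERM model and that the result is in kJ·mol⁻¹; the table itself states neither. The descriptor values in the usage lines are hypothetical and are not taken from the data set.

```python
# Coefficients copied from the table above.
# Assumption: they define the linear (ERM) model for the solvation enthalpy.
COEFFICIENTS = {
    "CIC1": 3.358,
    "X1Sol": -3.473,
    "R1e+": -29.550,
    "GATS1e": 5.925,
}
CONSTANT = -13.044


def predict_dh_solv(descriptors):
    """Return constant + sum(coefficient * descriptor value)."""
    return CONSTANT + sum(
        COEFFICIENTS[name] * descriptors[name] for name in COEFFICIENTS
    )


# Hypothetical descriptor values, for illustration only.
example = {"CIC1": 1.0, "X1Sol": 1.0, "R1e+": 1.0, "GATS1e": 1.0}
print(round(predict_dh_solv(example), 3))
```

Keeping the coefficients in a dictionary keyed by the notation column makes it hard to pair a value with the wrong descriptor.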
 
 
| | CIC1 | X1Sol | R1e+ | GATS1e |
| CIC1 | 1 | 0.626 | 0.308 | -0.462 |
| X1Sol | | 1 | -0.130 | -0.462 |
| R1e+ | | | 1 | -0.430 |
| GATS1e | | | | 1 |
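A correlation matrix like the one above can be turned into variance inflation factors through a standard identity: the VIF of descriptor j equals the j-th diagonal element of the inverse correlation matrix. The sketch below applies that identity in plain Python with a small Gauss-Jordan inversion. It is an illustration only: the helper names are hypothetical, and because the matrix entries are rounded (and may have been computed on a different subset of compounds), the values it prints are not expected to reproduce a published VIF column exactly.

```python
NAMES = ["CIC1", "X1Sol", "R1e+", "GATS1e"]

# Upper triangle copied from the matrix above.
UPPER = {
    ("CIC1", "X1Sol"): 0.626,
    ("CIC1", "R1e+"): 0.308,
    ("CIC1", "GATS1e"): -0.462,
    ("X1Sol", "R1e+"): -0.130,
    ("X1Sol", "GATS1e"): -0.462,
    ("R1e+", "GATS1e"): -0.430,
}


def build_matrix(names, upper):
    """Mirror the upper triangle into a full symmetric matrix with unit diagonal."""
    n = len(names)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), r in upper.items():
        i, j = names.index(a), names.index(b)
        m[i][j] = m[j][i] = r
    return m


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


corr = build_matrix(NAMES, UPPER)
inverse = invert(corr)
# VIF_j is the j-th diagonal element of the inverse correlation matrix.
vif = {name: inverse[i][i] for i, name in enumerate(NAMES)}
for name, value in vif.items():
    print(f"{name}: {value:.3f}")
```

A VIF of 1 means a descriptor is uncorrelated with the others; larger values indicate that it is increasingly well explained by the remaining descriptors.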
 
 
 
 
| Model | SEt | SEp | Rt | Rp | Ft | Fp |
| ERM | 3.503 | 4.624 | 0.967 | 0.965 | 1708 | 493 |
| SVM | 1.067 | 1.681 | 0.997 | 0.995 | 19571 | 3964 |
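The SE and R columns above are presumably the standard error and the correlation coefficient between experimental and calculated values, with the subscripts t and p denoting the training and prediction sets; the table does not expand these abbreviations, so that reading is an assumption. The sketch below shows one common way to compute both quantities. The degrees-of-freedom correction (`lost_dof`) is an assumed parameter, since the exact convention used for the table is not given, and the four enthalpy values are hypothetical.

```python
import math


def standard_error(observed, predicted, lost_dof=1):
    """sqrt(sum of squared residuals / (n - lost_dof)).

    lost_dof is an assumption; conventions differ between studies.
    """
    ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss / (len(observed) - lost_dof))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical values in kJ/mol, for illustration only.
observed = [-30.0, -35.0, -40.0, -45.0]
predicted = [-31.0, -34.0, -41.0, -44.0]
se = standard_error(observed, predicted)
r = pearson_r(observed, predicted)
print(f"SE = {se:.3f}, R = {r:.3f}")
```

Computing the same two statistics separately on the training and prediction sets gives the paired columns of the table: a prediction-set error much larger than the training-set error would point to overfitting.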
 
 
 
 
| Model | R² | R²cv | k | k' | (R² − R0²)/R² | (R² − R'0²)/R² |
| SVM | 0.994 | 0.990 | 0.988 | 1.011 | -0.005 | -0.005 |
| ERM | 0.936 | 0.935 | 0.980 | 1.010 | -0.065 | -0.068 |
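The columns of this table match the quantities used in the commonly cited Golbraikh-Tropsha criteria for external validation: R²cv > 0.5, R² > 0.6, (R² − R0²)/R² or (R² − R'0²)/R² below 0.1, and k or k' between 0.85 and 1.15. The sketch below checks the tabulated values against those thresholds. That these are the thresholds the table was intended for is an assumption, and the dictionary keys are illustrative names.

```python
# Values copied from the table above.
MODELS = {
    "SVM": {"r2": 0.994, "r2_cv": 0.990, "k": 0.988, "k_prime": 1.011,
            "rel_r0": -0.005, "rel_r0_prime": -0.005},
    "ERM": {"r2": 0.936, "r2_cv": 0.935, "k": 0.980, "k_prime": 1.010,
            "rel_r0": -0.065, "rel_r0_prime": -0.068},
}


def passes_external_validation(m):
    """Apply the assumed Golbraikh-Tropsha thresholds to one model's statistics."""
    return (
        m["r2_cv"] > 0.5
        and m["r2"] > 0.6
        and (m["rel_r0"] < 0.1 or m["rel_r0_prime"] < 0.1)
        and (0.85 <= m["k"] <= 1.15 or 0.85 <= m["k_prime"] <= 1.15)
    )


results = {name: passes_external_validation(m) for name, m in MODELS.items()}
for name, ok in results.items():
    print(f"{name}: {'passes' if ok else 'fails'}")
```

Under these thresholds both models pass, so the criteria alone do not separate them; the difference shows up in the standard errors reported for the prediction set.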
 