Acta Physico-Chimica Sinica  2017, Vol. 33 Issue (5): 918-926    DOI: 10.3866/PKU.WHXB201701163
Article
Developing a Support Vector Machine Based QSPR Model to Predict Gas-to-Benzene Solvation Enthalpy of Organic Compounds
GOLMOHAMMADI Hassan1,*, DASHTBOZORGI Zahra2, KHOOSHECHIN Sajad2
1 Young Researchers and Elite Club, Yadegar-e-Imam Khomeini (RAH) Shahr-e-Rey Branch, Islamic Azad University, Tehran, Iran
2 Young Researchers and Elite Club, Central Tehran Branch, Islamic Azad University, Tehran, Iran
Abstract:

This paper presents a novel way of building quantitative structure-property relationship (QSPR) models for predicting the gas-to-benzene solvation enthalpy (ΔHSolv) of 158 organic compounds from molecular descriptors calculated from the structure alone. Different kinds of descriptors were calculated for each compound using the Dragon package. The enhanced replacement method (ERM), a variable selection technique, was employed to select the optimal subset of descriptors. Our investigation reveals that the dependence of solvation enthalpy on physico-chemical properties is nonlinear and that the ERM model is unable to describe the solvation enthalpy accurately. The standard error of the prediction set is 1.681 kJ·mol⁻¹ for the support vector machine (SVM) model, compared with 4.624 kJ·mol⁻¹ for ERM. The ΔHSolv values calculated by SVM were in good agreement with the experimental ones, and the SVM model outperformed the ERM model. This indicates that SVM can be used as an alternative modeling tool for QSPR studies.

Key words: Quantitative structure-property relationship; Gas-to-benzene solvation enthalpy; Descriptor; Enhanced replacement method; Support vector machine
Received: 2016-12-13    Published: 2017-01-16
Corresponding author: GOLMOHAMMADI Hassan    E-mail: hassan.gol@gmail.com

Cite this article:

Hassan GOLMOHAMMADI, Zahra DASHTBOZORGI, Sajad KHOOSHECHIN. Developing a Support Vector Machine Based QSPR Model to Predict Gas-to-Benzene Solvation Enthalpy of Organic Compounds. Acta Physico-Chimica Sinica, 2017, 33(5): 918-926.

Link to this article:

http://www.whxb.pku.edu.cn/CN/10.3866/PKU.WHXB201701163
http://www.whxb.pku.edu.cn/CN/Y2017/V33/I5/918

Fig 1  Results of diversity analysis
| Descriptor | Notation | Coefficient | Mean effect | VIF |
|---|---|---|---|---|
| Complementary information content (neighborhood symmetry of 1-order) | CIC1 | 3.358 | 8.403 | 2.225 |
| Solvation connectivity index chi-1 | X1Sol | -3.473 | -12.326 | 2.197 |
| R maximal autocorrelation of lag 1/weighted by atomic Sanderson electronegativities | R1e+ | -29.550 | -4.529 | 1.560 |
| Geary autocorrelation-lag 1/weighted by atomic Sanderson electronegativities | GATS1e | 5.925 | 6.129 | 1.657 |
| Constant | | -13.044 | | |
Table 1  Specification of descriptors selected by the enhanced replacement method (ERM)
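Read as a linear regression, the coefficients and the constant in Table 1 define a four-descriptor equation for ΔHSolv. The sketch below assumes exactly that reading (constant plus coefficient times descriptor value, in kJ·mol⁻¹). The function name and the descriptor values in the usage lines are hypothetical and are not taken from the paper's data set.

```python
# Coefficients and constant copied from Table 1.
ERM_COEFFICIENTS = {
    "CIC1": 3.358,
    "X1Sol": -3.473,
    "R1e+": -29.550,
    "GATS1e": 5.925,
}
ERM_CONSTANT = -13.044


def predict_dh_solv_erm(descriptors):
    """Evaluate the linear model: constant + sum(coefficient * descriptor value)."""
    missing = set(ERM_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptor values: {sorted(missing)}")
    return ERM_CONSTANT + sum(
        coeff * descriptors[name] for name, coeff in ERM_COEFFICIENTS.items()
    )


# Hypothetical descriptor values, for illustration only.
example = {"CIC1": 2.5, "X1Sol": 3.5, "R1e+": 0.15, "GATS1e": 1.0}
print(round(predict_dh_solv_erm(example), 3))
```

Because the model is a plain weighted sum, each descriptor's contribution can be inspected term by term, which is what makes the linear ERM model easy to interpret even where it fits less well than the SVM.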
Fig 2  Plot of descriptors' mean effects
| | CIC1 | X1Sol | R1e+ | GATS1e |
|---|---|---|---|---|
| CIC1 | 1 | 0.626 | 0.308 | -0.462 |
| X1Sol | | 1 | -0.130 | -0.462 |
| R1e+ | | | 1 | -0.430 |
| GATS1e | | | | 1 |
Table 2  Correlation matrix for descriptors employed in this work
Fig 3  Gamma versus RMS error on LOO cross-validation [C = 100, ε = 0.1]
Fig 4  Epsilon versus RMS error on LOO cross-validation [C = 100, γ = 0.5]
Fig 5  Capacity factor versus RMS error on LOO cross-validation [γ = 0.5, ε = 0.06]
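Figs 3 to 5 each scan one SVM parameter while the other two are held fixed, recording the leave-one-out (LOO) cross-validation RMS error. The sketch below shows such a one-parameter scan in plain Python. It rests on two assumptions that the captions do not confirm: the kernel is a Gaussian RBF, exp(-γ‖x - x′‖²), which the γ parameter suggests, and, to stay dependency-free, a kernel-weighted average stands in for the SVM itself. The data and all function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_average_predict(train_X, train_y, x, gamma):
    """Stand-in model (NOT an SVM): RBF-weighted mean of the training targets."""
    weights = [rbf_kernel(x, xi, gamma) for xi in train_X]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed; fall back to the plain mean
        return sum(train_y) / len(train_y)
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


def loo_rms_error(X, y, gamma):
    """Leave each sample out once, predict it from the rest, return the RMS error."""
    squared = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = kernel_average_predict(train_X, train_y, X[i], gamma)
        squared.append((pred - y[i]) ** 2)
    return math.sqrt(sum(squared) / len(squared))


def scan_gamma(X, y, gammas):
    """Return (best_gamma, {gamma: LOO RMS error}) for a one-parameter scan."""
    errors = {g: loo_rms_error(X, y, g) for g in gammas}
    return min(errors, key=errors.get), errors


# Hypothetical one-descriptor data with a nonlinear trend.
X = [[0.1 * i] for i in range(30)]
y = [math.sin(x[0]) for x in X]
best_gamma, errors = scan_gamma(X, y, [0.01, 0.1, 0.5, 1.0, 5.0, 50.0])
print(best_gamma)
```

If scikit-learn is available, the stand-in predictor could be swapped for an `SVR` estimator, whose `C`, `gamma` and `epsilon` parameters correspond to the capacity factor, γ and ε in the captions. The same loop would then scan any one of the three.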
| Model | SEt | SEp | Rt | Rp | Ft | Fp |
|---|---|---|---|---|---|---|
| ERM | 3.503 | 4.624 | 0.967 | 0.965 | 1708 | 493 |
| SVM | 1.067 | 1.681 | 0.997 | 0.995 | 19571 | 3964 |
Table 3  Statistical parameters obtained using the ERM and SVM models
Fig 6  Plot of SVM calculated versus experimental gas-to-benzene solvation enthalpy
Fig 7  Plot of SVM residual versus experimental values of gas-to-benzene solvation enthalpy
Fig 8  Williams plot of standardized residuals versus leverages of the descriptor matrix for the SVM model. The threshold leverage value is h* = 0.125.
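A Williams plot needs a leverage for each compound and a warning threshold h*. The sketch below assumes the convention commonly used in QSPR work, h_i = x_iᵀ(XᵀX)⁻¹x_i and h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The caption does not state which convention was used. Under that assumption, p = 4 and h* = 0.125 would correspond to n = 120, a training-set size that this page does not report. The small descriptor matrix and all function names are hypothetical.

```python
def leverage_threshold(n_descriptors, n_training):
    """Warning leverage h* = 3 * (p + 1) / n (assumed convention)."""
    return 3.0 * (n_descriptors + 1) / n_training


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def leverages(X):
    """Diagonal of the hat matrix: h_i = x_i^T (X^T X)^-1 x_i."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    result = []
    for x in X:
        z = solve(xtx, x)  # z = (X^T X)^-1 x_i
        result.append(sum(a * b for a, b in zip(x, z)))
    return result


# Hypothetical matrix: a column of ones plus one descriptor column.
X_demo = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9], [1.0, 1.4], [1.0, 3.0]]
h_demo = leverages(X_demo)
print([round(h, 3) for h in h_demo])
print(leverage_threshold(4, 120))
```

Compounds whose leverage exceeds h* lie far from the bulk of the training descriptors, so their predictions are extrapolations and deserve extra scrutiny even when the residual is small.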
| Model | R² | R²cv | k | k' | (R² - R0²)/R² | (R² - R'0²)/R² |
|---|---|---|---|---|---|---|
| SVM | 0.994 | 0.990 | 0.988 | 1.011 | -0.005 | -0.005 |
| ERM | 0.936 | 0.935 | 0.980 | 1.010 | -0.065 | -0.068 |
Table 4  Statistical criteria of external validation (prediction set) of the proposed QSPR models
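The quantities in Table 4 are usually judged against a set of acceptance thresholds. The sketch below checks the tabulated values against the thresholds commonly quoted for the Golbraikh-Tropsha external validation criteria: R² > 0.6, R²cv > 0.5, a slope k or k' between 0.85 and 1.15, and a relative difference (R² - R0²)/R² or (R² - R'0²)/R² below 0.1. These thresholds are an assumption here, since this page does not list the cut-offs the authors applied. The dictionary keys and the function name are hypothetical.

```python
# Values copied from Table 4 (prediction set).
TABLE_4 = {
    "SVM": {"r2": 0.994, "r2_cv": 0.990, "k": 0.988, "k_prime": 1.011,
            "rel_r0": -0.005, "rel_r0_prime": -0.005},
    "ERM": {"r2": 0.936, "r2_cv": 0.935, "k": 0.980, "k_prime": 1.010,
            "rel_r0": -0.065, "rel_r0_prime": -0.068},
}


def check_external_validation(stats):
    """Return (overall_pass, per-criterion results) for one model's statistics."""
    checks = {
        "R2 > 0.6": stats["r2"] > 0.6,
        "R2cv > 0.5": stats["r2_cv"] > 0.5,
        "0.85 <= k or k' <= 1.15": (
            0.85 <= stats["k"] <= 1.15 or 0.85 <= stats["k_prime"] <= 1.15
        ),
        "relative R0^2 difference < 0.1": (
            stats["rel_r0"] < 0.1 or stats["rel_r0_prime"] < 0.1
        ),
    }
    return all(checks.values()), checks


for model, stats in TABLE_4.items():
    passed, details = check_external_validation(stats)
    print(model, "passes" if passed else "fails", details)
```

Both models satisfy these thresholds with the Table 4 values, so the criteria alone do not separate them. The gap between SVM and ERM shows up in the size of R² and of the standard errors instead.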