Acta Physico-Chimica Sinica, 2012, Vol. 28, Issue 12: 2790-2796. doi: 10.3866/PKU.WHXB201209273

Theoretical and Computational Chemistry

Prediction of the Thermal Conductivity of Organic Compounds Using Heuristic and Support Vector Machine Methods

SHI Jing-Jie1,2, CHEN Li-Ping1, CHEN Wang-Hua1, SHI Ning2, YANG Hui1, XU Wei2

  1 Department of Safety Engineering, School of Chemical Engineering, Nanjing University of Science & Technology, Nanjing 210094, P. R. China;
  2 State Key Laboratory of Chemical Safety and Control, Qingdao 266071, Shandong Province, P. R. China
  • Received: 2012-07-16 Revised: 2012-09-10 Published: 2012-11-14
  • Corresponding author: CHEN Wang-Hua E-mail: chenwh_nust@163.com
  • Supported by:

    The project was supported by the National Key Basic Research Program of China (973) (2010CB735510).

Abstract:

Quantitative structure-property relationship (QSPR) models were built between the molecular structures and the thermal conductivities of 147 organic compounds to investigate which structural factors influence thermal conductivity. The dataset was randomly divided into a training set of 118 compounds and a test set of 29 compounds. Constitutional, topological, geometrical, electrostatic, quantum-chemical, and thermodynamic descriptors were calculated using the CODESSA software package. The heuristic method (HM) was used to select five descriptors, from which a linear regression model was constructed. The same five descriptors were then used as inputs to a support vector machine (SVM) to build a non-linear regression model. Although the fitting performance of the SVM model (squared correlation coefficient R² = 0.9240) was slightly lower than that of the HM model (R² = 0.9267), its predictive performance (R² = 0.9682) was higher than that of the HM model (R² = 0.9574). Because predictive performance matters more than fitting performance for a QSPR model, the SVM model is judged superior overall. Both methods offer an engineering route to predicting the thermal conductivity of organic compounds from theoretical descriptors that can be calculated solely from the molecular structure.

Key words: Heuristic method, Support vector machine, Thermal conductivity, Prediction, QSPR
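The data partition and the performance measure described in the abstract can be illustrated with a minimal sketch. It assumes that R² is the squared Pearson correlation between observed and predicted values, which is one common reading of "squared correlation coefficient"; the abstract does not give the exact formula. The function names `split_dataset` and `squared_correlation`, the random seed, and the numeric values in the usage lines are hypothetical and are not taken from the paper.

```python
import random


def split_dataset(n_total=147, n_train=118, seed=0):
    """Randomly partition sample indices into a training set and a test set."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    indices = list(range(n_total))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# 147 compounds split into 118 training and 29 test samples, as in the abstract.
train_idx, test_idx = split_dataset()
print(len(train_idx), len(test_idx))

# Illustrative (synthetic) observed and predicted values, not data from the paper.
observed = [0.10, 0.12, 0.15, 0.11, 0.14]
predicted = [0.11, 0.12, 0.14, 0.10, 0.15]
print(round(squared_correlation(observed, predicted), 4))
```

In a workflow of this kind, the same R² function would be evaluated once on the training set (fitting performance) and once on the held-out test set (predictive performance) for each model, which is the comparison the abstract reports for the HM and SVM models.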