Acta Physico-Chimica Sinica, 2014, Vol. 30, Issue (5): 803-810. doi: 10.3866/PKU.WHXB201403181

Theoretical and Computational Chemistry

QSPR Models of Compound Viscosity Based on Iterative Self-Organizing Data Analysis Technique and Ant Colony Algorithm

SHI Jing-Jie, CHEN Li-Ping, CHEN Wang-Hua   

  1. Department of Safety Engineering, School of Chemical Engineering, Nanjing University of Science & Technology, Nanjing 210094, P. R. China
  • Received: 2014-01-13 Revised: 2014-03-18 Published: 2014-04-25
  • Corresponding author: CHEN Li-Ping, E-mail: clp2005@hotmail.com

Abstract:

The aim of this study was to construct a quantitative structure-property relationship (QSPR) model relating the molecular structures of 310 organic compounds to their viscosities, and to identify the structural factors that affect the viscosity of organic liquids. The sample set was first preliminarily classified using the iterative self-organizing data analysis technique (ISODATA) and then divided into a training set and a test set. Molecular structure descriptors for the 310 compounds were calculated with version 2.1 of the Dragon software and screened with an ant colony algorithm (ACO), which selected five descriptors. Multiple linear regression (MLR) and support vector machine (SVM) techniques were then used to build the ACO-MLR and ACO-SVM models, respectively. The non-linear ACO-SVM model (squared correlation coefficient R² = 0.9013 for the training set and 0.9026 for the test set) outperformed the linear ACO-MLR model (R² = 0.7680 for the training set and 0.8725 for the test set). For the test set, the correlation coefficients between the experimental and predicted values were 0.934 for the ACO-MLR model and 0.950 for the ACO-SVM model, so the predictive performance of both models was considered satisfactory. The applicability domain of the models was also examined using a Williams plot. The models provide an effective engineering method for predicting the viscosity of organic compounds from their molecular structure.

Key words: Viscosity, ISODATA, Ant colony algorithm, Multiple linear regression, Support vector machine
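
The abstract mentions a Williams plot for examining the applicability domain of the models. As a minimal sketch of that idea (not the authors' code), the pure-Python example below fits a multiple linear regression on synthetic data with five made-up descriptors, then computes the leverage and standardized residual of each sample, which are the two axes of a Williams plot. The warning leverage h* = 3(p+1)/n and the residual limit of 3 are common conventions assumed here; the thresholds, descriptors, and data actually used in the paper are not given in this abstract.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def williams(X, y):
    """Fit MLR and return (beta, R^2, leverages, standardized residuals, h*)."""
    n, p = len(X), len(X[0])
    A = [[1.0] + list(row) for row in X]  # design matrix with intercept column
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    beta = solve(xtx, xty)  # least-squares coefficients via normal equations
    pred = [sum(b * v for b, v in zip(beta, r)) for r in A]
    resid = [yi - pi for yi, pi in zip(y, pred)]
    mean_y = sum(y) / n
    ss_res = sum(e * e for e in resid)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Leverage h_i = a_i (A^T A)^-1 a_i^T, obtained by solving (A^T A) z = a_i.
    lev = [sum(ai * zi for ai, zi in zip(r, solve(xtx, r))) for r in A]
    s = (ss_res / (n - k)) ** 0.5  # residual standard error
    std_resid = [e / s for e in resid]
    h_star = 3.0 * k / n  # assumed warning leverage, 3(p+1)/n
    return beta, r2, lev, std_resid, h_star


# Synthetic data only: 40 ordinary samples plus one deliberately extreme sample.
random.seed(0)
true_coef = [0.8, -0.5, 0.3, 1.2, -0.7]  # hypothetical descriptor weights
X = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
X.append([10.0] * 5)  # far outside the descriptor range of the other samples
y = [sum(c * v for c, v in zip(true_coef, row)) + random.gauss(0.0, 0.1)
     for row in X]

beta, r2, lev, std_resid, h_star = williams(X, y)
outside = [i for i, (h, e) in enumerate(zip(lev, std_resid))
           if h > h_star or abs(e) > 3.0]
print("R^2:", round(r2, 4), "warning leverage h*:", round(h_star, 4))
print("samples outside the assumed applicability domain:", outside)
```

Samples with leverage above h* are structurally far from the training data, so predictions for them are extrapolations; samples with large standardized residuals are response outliers. Plotting `std_resid` against `lev` with those two limits drawn as lines gives the Williams plot itself.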