Acta Phys. -Chim. Sin., 2014, Vol. 30, Issue (5): 803-810. doi: 10.3866/PKU.WHXB201403181


QSPR Models of Compound Viscosity Based on Iterative Self-Organizing Data Analysis Technique and Ant Colony Algorithm

SHI Jing-Jie, CHEN Li-Ping, CHEN Wang-Hua   

  1. Department of Safety Engineering, School of Chemical Engineering, Nanjing University of Science & Technology, Nanjing 210094, P. R. China
  • Received: 2014-01-13 Revised: 2014-03-18 Published: 2014-04-25
  • Contact: CHEN Li-Ping


The aim of this study was to construct a quantitative structure-property relationship (QSPR) model relating the molecular structures of 310 compounds to their viscosities, and to identify the specific structural factors that affect viscosity. Using the iterative self-organizing data analysis technique (ISODATA), the sample set was divided into a training set and a test set. The molecular structure descriptors of the 310 compounds were calculated using version 2.1 of the Dragon software and then screened using an ant colony algorithm (ACO), which selected five descriptors. Multiple linear regression (MLR) and the support vector machine (SVM) technique were then used to establish ACO-MLR and ACO-SVM models, respectively. The results showed that the non-linear ACO-SVM model (squared correlation coefficient R² = 0.9013 for the training set and 0.9026 for the test set) performed better than the linear ACO-MLR model (R² = 0.7680 for the training set and 0.8725 for the test set). The correlation coefficients between the experimental and predicted values for the test set were 0.934 for the ACO-MLR model and 0.950 for the ACO-SVM model. The predictive performance of both models was therefore judged to be satisfactory. The applicability domain of the models was also examined using a Williams plot, which demonstrated that the models established in this study provide effective methods for predicting the viscosities of specific compounds from their molecular structures.

Key words: Viscosity, ISODATA, Ant colony algorithm, Multiple linear regression, Support vector machine
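
The abstract reports both R² values and correlation coefficients for the test set. The short Python sketch below checks that the two sets of figures agree, assuming the reported R² is the square of the Pearson correlation coefficient between experimental and predicted values (the abstract does not state this explicitly). The helper `pearson_r` and the small `experimental`/`predicted` lists are hypothetical illustrations, not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Test-set R^2 values as reported in the abstract.
reported_r2_test = {"ACO-MLR": 0.8725, "ACO-SVM": 0.9026}

# Under the stated assumption, the square root of R^2 gives the
# correlation coefficient, rounded here to three decimals.
recovered_r = {
    name: round(math.sqrt(r2), 3) for name, r2 in reported_r2_test.items()
}

# Hypothetical experimental vs. predicted values (NOT from the paper),
# only to show how such a coefficient would be computed from raw data.
experimental = [0.2, 0.5, 0.9, 1.4, 2.0]
predicted = [0.3, 0.4, 1.0, 1.3, 2.1]
r_example = pearson_r(experimental, predicted)
r2_example = r_example ** 2
```

Under this assumption, `recovered_r` matches the correlation coefficients quoted in the abstract (0.934 and 0.950), so the two sets of test-set figures describe the same fit quality in different units.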