Acta Phys.-Chim. Sin. ›› 2009, Vol. 25 ›› Issue (08): 1587-1592.

• ARTICLE •

A Novel QSAR Model Based on Geostatistics and Support Vector Regression

CHEN Yuan, YUAN Zhe-Ming, ZHOU Wei, XIONG Xing-Yao

1. College of Bio-safety Science and Technology, Hunan Agricultural University, Changsha 410128, P. R. China; Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128, P. R. China
• Received: 2009-03-16 Revised: 2009-04-15 Published: 2009-07-16
• Contact: YUAN Zhe-Ming E-mail: zhmyuan@sina.com

Abstract:

A novel individual forecasting method for quantitative structure-activity relationship (QSAR) modeling, Weight-PCA-GS-SVR, was proposed based on principal component analysis (PCA), geostatistics (GS), and support vector regression (SVR). The method proceeds in five steps. First, PCA reduces the dimensionality and eliminates redundant information among the independent descriptors. Second, principal components that have no relationship to activity are removed by nonlinear screening with SVR. Third, weighted distances between samples are calculated from the retained principal components. Fourth, a common range is determined using high-dimensional geostatistics. Finally, for each test sample, the k nearest neighbors are selected from the training samples whose weighted distance to it is shorter than the common range; an SVR model is then built on these neighbors and used for the individual prediction. Weight-PCA-GS-SVR optimizes the model along both the column direction (descriptors) and the row direction (samples), and retains all the advantages of SVR. It therefore provides a new way to choose the k nearest neighbors as well as a novel weighted method for determining the retained principal components or the retained descriptors. Results on three data sets show that the new method has the highest prediction precision among all reference models and a remarkable advantage over previously reported results. Weight-PCA-GS-SVR can therefore be widely used in QSAR and other regression prediction fields.
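
The neighbor-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a weighted Euclidean distance, and it takes the component weights and the common range as given inputs, whereas in the paper they come from the SVR screening and the geostatistical analysis. The function names `weighted_distance` and `select_neighbors` and all data values are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two principal-component score vectors.

    Assumption: each weight multiplies the squared difference of one component.
    The abstract does not state the exact weighting formula.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def select_neighbors(test_scores, train_scores, weights, common_range, k):
    """Return indices of at most k training samples closest to the test sample.

    Only training samples whose weighted distance is shorter than
    common_range are eligible, so fewer than k indices may be returned.
    """
    candidates = []
    for idx, row in enumerate(train_scores):
        d = weighted_distance(test_scores, row, weights)
        if d < common_range:
            candidates.append((d, idx))
    candidates.sort()  # nearest first; ties broken by training index
    return [idx for _, idx in candidates[:k]]


# Hypothetical retained principal-component scores for four training samples.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 4.0], [5.0, 5.0]]
weights = [1.0, 0.25]  # hypothetical component weights

neighbors = select_neighbors([0.0, 0.0], train, weights, common_range=1.5, k=3)
print(neighbors)
```

In this toy case three neighbors are requested but only two training samples lie within the common range, so the local model for this test sample would be trained on those two samples only.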

MSC2000:

• O641