Acta Phys. -Chim. Sin. 2012, Vol. 28, Issue (03): 541-546. doi: 10.3866/PKU.WHXB201112281


MOLMAP Descriptor and Its Application to Mutagenicity Prediction

ZHANG Qing-You1, LONG Hai-Lin1, FENG Xiu-Lin1, SUO Jing-Jie1, ZHANG Dan-Dan1, LI Jing-Ya1, XU Li-Zhuang2, XU Lu3   

    1. Institute of Environmental and Analytical Sciences, College of Chemistry and Chemical Engineering, Henan University, Kaifeng 475004, Henan Province, P. R. China;
    2. Renmin Hospital of Shenzhen, Shenzhen 518020, Guangdong Province, P. R. China;
    3. Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, P. R. China
  • Received:2011-10-27 Revised:2011-12-19 Published:2012-02-23
  • Contact: XU Lu
  • Supported by:

    The project was supported by the National Natural Science Foundation of China (20875022), Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China (2009(1001)), and International Science and Technology Cooperation of Henan Province, China (114300510009).

Abstract: The molecular mapping of atom-level properties (MOLMAP) descriptor is generated from the chemical bond descriptors of a molecule by a Kohonen self-organizing map with a specific algorithm. The bond descriptors comprise physicochemical properties of the chemical bond, such as the charge difference between the two bonded atoms, and topological properties, such as the number of heteroatoms connected to the two atoms. In this paper, MOLMAP descriptors were used to predict the mutagenicity of 4075 organic substances (2305 mutagens and 1770 nonmutagens in the Ames test). Random forests were used to build models with three kinds of descriptors: (1) MOLMAP descriptors of different sizes; (2) global molecular descriptors; (3) the combination of MOLMAP descriptors and global molecular descriptors. The percentage of correct predictions in out-of-bag (OOB) cross-validation of the whole data set reached 85.4%. To test the stability of the prediction model, it was used to predict a test set of 472 compounds collected from another database; 86.7% of these compounds were predicted correctly. These results improve on those of previous work.
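
The mapping step described in the abstract can be illustrated with a minimal sketch. It assumes a Kohonen map that has already been trained, represented here as a plain grid of weight vectors, and it assumes the simplest possible encoding: each bond is assigned to its closest neuron and the descriptor is the flattened count of bonds per neuron. The abstract does not spell out the specific algorithm used in the paper, so the function names, the toy weights, and the counting scheme below are all illustrative assumptions and not the authors' implementation.

```python
from math import dist


def best_matching_unit(weights, vector):
    """Return (row, col) of the neuron whose weight vector is closest to `vector`.

    `weights` is a hypothetical, already-trained map: a grid (list of rows)
    of weight vectors with the same length as the bond descriptor.
    """
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = dist(w, vector)  # Euclidean distance
            if d < best_d:
                best, best_d = (r, c), d
    return best


def molmap(weights, bond_descriptors):
    """Flatten the map into a fixed-length vector of bond counts per neuron.

    Assumption: plain counting. The paper's specific algorithm may weight
    the winning neuron and its neighbours differently.
    """
    rows, cols = len(weights), len(weights[0])
    fingerprint = [0] * (rows * cols)
    for bond in bond_descriptors:
        r, c = best_matching_unit(weights, bond)
        fingerprint[r * cols + c] += 1
    return fingerprint


# Toy 2x2 map over two made-up bond properties:
# (charge difference between the two atoms, number of attached heteroatoms).
toy_weights = [
    [[0.0, 0.0], [0.0, 2.0]],
    [[0.5, 0.0], [0.5, 2.0]],
]
# Three bonds of one hypothetical molecule.
toy_bonds = [[0.05, 0.0], [0.45, 2.0], [0.40, 2.0]]

print(molmap(toy_weights, toy_bonds))  # prints [1, 0, 0, 2]
```

Because the output length depends only on the map size and not on the number of bonds, molecules of any size yield vectors of equal length, which is what allows them to be fed to a classifier such as a random forest.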

Key words: MOLMAP descriptor, Kohonen self-organizing map, Random forest, Mutagenicity, Structure-activity relationship