Acta Physico-Chimica Sinica, 2012, Vol. 28, Issue 3: 541-546. doi: 10.3866/PKU.WHXB201112281

Theoretical and Computational Chemistry

MOLMAP Descriptor and Its Application to Mutagenicity Prediction

ZHANG Qing-You¹, LONG Hai-Lin¹, FENG Xiu-Lin¹, SUO Jing-Jie¹, ZHANG Dan-Dan¹, LI Jing-Ya¹, XU Li-Zhuang², XU Lu³

  1. Institute of Environmental and Analytical Sciences, College of Chemistry and Chemical Engineering, Henan University, Kaifeng 475004, Henan Province, P. R. China;
    2. Renmin Hospital of Shenzhen, Shenzhen 518020, Guangdong Province, P. R. China;
    3. Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, P. R. China
  • Received: 2011-10-27   Revised: 2011-12-19   Published: 2012-02-23
  • Contact: XU Lu, E-mail: luxu@ciac.jl.cn
  • Supported by:

    The project was supported by the National Natural Science Foundation of China (20875022), Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China (2009(1001)), and International Science and Technology Cooperation of Henan Province, China (114300510009).

Abstract: The molecular mapping of atom-level properties (MOLMAP) descriptor of a molecule was generated from its chemical bond descriptors using a Kohonen self-organizing map and a specific algorithm. The bond descriptors comprise physicochemical properties of each bond, such as the charge difference between its two atoms, and topological properties, such as the number of heteroatoms bonded to those two atoms. In this paper, MOLMAP descriptors were used to predict the mutagenicity of 4075 organic compounds (2305 mutagens and 1770 nonmutagens according to the Ames test). Random forest models were built with three kinds of descriptors: (1) MOLMAP descriptors of different dimensions; (2) global molecular descriptors; and (3) MOLMAP descriptors combined with global molecular descriptors. Out-of-bag (OOB) cross-validation on the whole data set gave 85.4% correct predictions. To test the stability of the model, it was used to predict an external test set of 472 compounds collected from another database, giving 86.7% correct predictions. Both results improve on those of previous work.
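To make the idea of turning a variable number of bond descriptors into a fixed-length MOLMAP vector more concrete, here is a minimal pure-Python sketch. It trains a generic Kohonen self-organizing map on bond descriptor vectors, then maps each bond of a molecule to its winning neuron and counts the hits per neuron. The helper names (`train_som`, `winner`, `molmap`), the two-feature toy descriptors, the Gaussian neighbourhood, the linear decay schedule and the plain hit-count scoring are all assumptions for illustration. The specific algorithm, bond descriptor set and map sizes used in this work are not given in the abstract. The map size fixes the descriptor dimension (a 3 × 3 map gives 9 values), which is one way MOLMAP descriptors "of different dimensions" can arise.

```python
import math
import random


def winner(weights, x):
    """Index of the neuron whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda k: sum((wk - xi) ** 2 for wk, xi in zip(weights[k], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen SOM on equal-length bond descriptor vectors.

    Assumed schedule: linearly decaying learning rate and Gaussian
    neighbourhood radius; the paper's actual settings are not known here.
    """
    rng = random.Random(seed)
    # Initialise each neuron with a copy of a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # learning rate decays toward 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighbourhood shrinks
            wr, wc = divmod(winner(weights, x), cols)  # grid position of the winner
            for k in range(rows * cols):
                r, c = divmod(k, cols)
                d2 = (r - wr) ** 2 + (c - wc) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                weights[k] = [wk + lr * h * (xi - wk) for wk, xi in zip(weights[k], x)]
            step += 1
    return weights


def molmap(weights, bonds):
    """Fixed-length descriptor: how many of a molecule's bonds hit each neuron."""
    counts = [0] * len(weights)
    for b in bonds:
        counts[winner(weights, b)] += 1
    return counts


# Hypothetical 2-feature bond descriptors (e.g. charge difference, heteroatom count).
mol_a = [[0.1, 0.0], [0.1, 0.0], [0.9, 2.0]]
mol_b = [[0.8, 2.0], [0.9, 2.0]]
som = train_som(mol_a + mol_b, rows=3, cols=3, epochs=30)
print(molmap(som, mol_a))  # 9 counts, one per neuron, summing to 3 (bonds in mol_a)
```

In a setup like the one described, such vectors (optionally concatenated with global molecular descriptors) would then be the input to a random forest classifier. For example, scikit-learn's `RandomForestClassifier(oob_score=True)` exposes an out-of-bag accuracy through its `oob_score_` attribute, analogous to the OOB figure quoted above. Using that library is itself an assumption, not the authors' stated tool.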

Key words: MOLMAP descriptor, Kohonen self-organizing map, Random forest, Mutagenicity, Structure-activity relationship

CLC number: O641