Acta Physico-Chimica Sinica (物理化学学报), 2013, Vol. 29, Issue 09: 1945-1953. doi: 10.3866/PKU.WHXB201306182

Theoretical and Computational Chemistry

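Prediction of HLA-A*0201 Restricted CTL Epitopes Based on Nonlinear Screening of High-Dimensional Features

HAN Na (韩娜), YUAN Zhe-Ming (袁哲明), CHEN Yuan (陈渊), DAI Zhi-Jun (代志军), WANG Zhi-Ming (王志明)

  1. Hunan Agricultural University, Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Changsha 410128
  • Received: 2013-04-24 Revised: 2013-06-14 Published: 2013-08-28
  • Corresponding author: YUAN Zhe-Ming (袁哲明) E-mail: zhmyuan@sina.com
  • Funding:

    Supported by the Science Foundation for Distinguished Young Scholars of Hunan Province (10JJ1005) and the Doctoral Program Fund of the Ministry of Education (20124320110002)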

Prediction of HLA-A*0201 Restricted Cytotoxic T Lymphocyte Epitopes Based on High-Dimensional Descriptor Nonlinear Screening

HAN Na, YUAN Zhe-Ming, CHEN Yuan, DAI Zhi-Jun, WANG Zhi-Ming   

  1. Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, P. R. China
  • Received:2013-04-24 Revised:2013-06-14 Published:2013-08-28
  • Contact: YUAN Zhe-Ming E-mail:zhmyuan@sina.com
  • Supported by:

    The project was supported by the Science Foundation for Distinguished Young Scholars of Hunan Province, China (10JJ1005) and Specialized Research Fund for the Doctoral Program of Higher Education, China (20124320110002).

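Abstract (translated from the Chinese text):

Identifying highly active cytotoxic T lymphocyte (CTL) epitopes is a key step in designing tumor vaccines. Each HLA-A*0201-restricted epitope nonapeptide was characterized with 531 physicochemical property parameters of the natural amino acids. Starting from 531×9 initial descriptors, coarse screening with the binary matrix shuffling filter followed by fine screening through multiple rounds of worst-descriptor elimination yielded 18 retained descriptors with clear physicochemical meaning. The 18 retained descriptors mainly concern the hydrophobicity and steric features of the residues at every position except positions 1 and 5. The hydrophobicity of the residue at position 3 has the greatest influence on activity, and the residues at positions 2, 4, and 9 together account for 10 of the retained descriptors. This supports the current understanding that positions 2 and 9 are anchor residues, position 3 is a key site, and the position 4 residue is a marker chain. A quantitative sequence-activity model was built from the 18 retained descriptors by support vector regression. Its coefficients of determination for fitting (R²) and leave-one-out cross-validation (Q²cv) were 0.957 and 0.708, respectively; its coefficient of determination and root mean square error for independent prediction (Q²ext and RMSEext) were 0.818 and 0.366, respectively, clearly better than values reported in the literature. Prediction over the full combinatorial set of virtual nonapeptides gave several nonapeptides whose predicted activity exceeds that of known epitope peptides; these are available for experimental verification. The work clarifies fairly comprehensively how the residues at specific positions affect peptide affinity and offers practical guidance for the molecular design of highly active peptide vaccines.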

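Key words (translated from the Chinese text): Antigenic peptide, Quantitative sequence-activity model, High-dimensional features, Support vector regression, Peptide vaccine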

Abstract:

Determining highly active cytotoxic T lymphocyte (CTL) epitopes is essential for the computational design of peptide vaccines against tumors. In this study, we characterized each residue of HLA-A*0201-restricted CTL epitopes using 531 physicochemical properties. Starting from the 531×9 descriptors that describe each nonapeptide, we selected 18 descriptors with clear physicochemical meanings using the binary matrix shuffling filter followed by multiple rounds of worst-descriptor elimination. Most of the 18 selected descriptors are hydrophobic or steric properties of the residues. Ten of the 18 descriptors relate to the second, fourth, and ninth residues, which is consistent with the established view that the second and ninth residues act as anchors. We then constructed a support vector regression-based quantitative sequence-activity model (QSAM) using the 18 selected descriptors. The coefficients of determination for fitting (R²) and leave-one-out cross-validation (Q²cv) were 0.957 and 0.708, respectively; for extra-sample prediction, the coefficient of determination (Q²ext) and root mean square error (RMSEext) were 0.818 and 0.366, respectively. These results, obtained on HLA-A*0201 data, show that our QSAM is superior to models reported in the literature. Finally, we predicted the activities of all possible combinations of residues at the nine positions. Several peptides were predicted to have higher affinity than previously reported epitopes. Our study improves the understanding of the relationship between residue composition and peptide affinity, which provides a valuable guideline for the design of highly active peptide vaccines. The predicted high-affinity peptides are potential candidates for further experimental verification.
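The position-wise encoding described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two property tables (Kyte-Doolittle hydropathy and approximate residue volumes) are stand-ins for the 531 parameters, which are not listed here, and the names `encode_peptide` and `PROPERTIES` are hypothetical. The example peptide GILGFVFTL is a well-known HLA-A*0201 epitope used only as sample input.

```python
# Illustrative per-residue property tables. These two are assumptions chosen
# for the sketch; they are NOT the 531 parameters used in the paper.
HYDROPATHY = {  # Kyte-Doolittle hydropathy index
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
VOLUME = {  # approximate residue volumes in cubic angstroms
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5,
    "Q": 143.8, "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7,
    "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9, "P": 112.7,
    "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}
PROPERTIES = {"hydropathy": HYDROPATHY, "volume": VOLUME}


def encode_peptide(peptide, properties=PROPERTIES):
    """Return (names, values): one descriptor per (position, property) pair."""
    if len(peptide) != 9:
        raise ValueError("expected a nonapeptide (9 residues)")
    names, values = [], []
    for pos, residue in enumerate(peptide, start=1):
        for prop_name, table in properties.items():
            names.append(f"P{pos}_{prop_name}")
            values.append(table[residue])
    return names, values


names, values = encode_peptide("GILGFVFTL")
print(len(values))   # 9 positions x 2 illustrative properties
print(531 * 9)       # size of the initial descriptor pool in the abstract
print(20 ** 9)       # number of nonapeptides over the 20 natural amino acids
```

With the full set of 531 properties the same layout gives 4779 descriptors per peptide, and the full combinatorial space of nonapeptides contains 20^9 (about 5.1 × 10^11) sequences.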

Key words: Antigenic peptide, Quantitative sequence activity model, High-dimensional descriptor, Support vector regression, Peptide vaccine
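The validation statistics quoted in the abstract (R², Q²cv, Q²ext, RMSEext) can be illustrated with a short sketch. It assumes the common definition 1 − SS_res/SS_tot for the coefficients of determination; the paper's exact formula for Q²ext is not given here and may differ. A one-variable least-squares line stands in for support vector regression purely to keep the example self-contained, and the function names and toy data are hypothetical.

```python
import math


def q2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def fit_line(xs, ys):
    """Least-squares line; a simple stand-in for the SVR model in the paper."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def leave_one_out(xs, ys, fit):
    """Predict each sample from a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


# Toy data (invented for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

fitted = [fit_line(xs, ys)(x) for x in xs]
print("R2   =", q2(ys, fitted))                         # fit on all samples
print("Q2cv =", q2(ys, leave_one_out(xs, ys, fit_line)))  # leave-one-out
print("RMSE =", rmse(ys, fitted))
```

Because each leave-one-out prediction comes from a model that never saw the held-out sample, Q²cv is typically lower than R², which matches the ordering reported in the abstract (0.708 versus 0.957).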