Applying the Semi-Supervised Learning Algorithm Laplace Support Vector Machine to Protein Structural Class Prediction

2020-09-02 07:14, WU Jiang, DONG Ting, JIANG Ping
Microcomputer Applications, 2020, Issue 8
Keywords: classifier; vector; sample

WU Jiang, DONG Ting, JIANG Ping

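Abstract:

A semi-supervised learning method, the Laplace support vector machine (LapSVM), is applied to predict protein structural classes. First, seven physicochemical property parameters of amino acids serve as a substitution model that converts each protein sequence into numeric sequences. The auto covariance (AC) transform, which describes the interactions between amino acid residues separated by a given interval, then maps these numeric sequences to vectors of uniform length, forming the feature space of the samples. Next, 20, 50, 80, 110, 140, and 170 samples are randomly selected from the data set as unlabeled samples to build the training sets, and a one-against-all decomposition strategy together with the leave-one-out method is used to evaluate the predictive ability of the LapSVM model. The classifier predicts the structural classes of the protein samples with an accuracy of 94.12%, which compares favorably with the 90.69% achieved by the standard support vector machine (SVM). The experimental results confirm that the distribution information of unlabeled samples, used as a weak rule, can effectively improve the predictive performance of the classifier. The work also offers a novel idea: applying a semi-supervised method to a fully supervised learning problem, which gives a smaller optimization problem and better predictive ability.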
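The auto covariance step can be illustrated with a minimal sketch. It assumes the commonly used definition, in which each property column is centered on its mean over the sequence and the products of residues `lag` positions apart are averaged. The abstract does not state the exact formula, and the two-property table `TOY_PROPERTIES` holds made-up placeholder values, not the seven properties used in the paper. The function name `auto_covariance` and the parameter `max_lag` are likewise hypothetical.

```python
from statistics import mean

# Hypothetical placeholder values for two properties of four residues.
# These are NOT the seven physicochemical properties used in the paper.
TOY_PROPERTIES = {
    "A": (0.5, -1.0),
    "C": (1.5, 0.0),
    "D": (-2.0, 1.0),
    "G": (0.0, 2.0),
}


def auto_covariance(sequence, properties, max_lag):
    """Map a variable-length sequence to n_properties * max_lag features.

    Assumed definition: for property j and lag g,
    AC(g, j) = 1/(L-g) * sum_i (P[i][j] - mean_j) * (P[i+g][j] - mean_j).
    """
    if len(sequence) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_props = len(next(iter(properties.values())))
    features = []
    for j in range(n_props):
        # Numeric sequence for property j, centered on its mean.
        column = [properties[residue][j] for residue in sequence]
        mu = mean(column)
        centered = [value - mu for value in column]
        for lag in range(1, max_lag + 1):
            n_pairs = len(sequence) - lag
            total = sum(centered[i] * centered[i + lag] for i in range(n_pairs))
            features.append(total / n_pairs)
    return features


if __name__ == "__main__":
    # Sequences of different lengths yield vectors of the same length.
    print(len(auto_covariance("ACDGAC", TOY_PROPERTIES, max_lag=3)))
    print(len(auto_covariance("ACDGACDGGACA", TOY_PROPERTIES, max_lag=3)))
```

The output length depends only on the number of properties and on `max_lag`, which is what lets proteins of different lengths share one feature space.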

Keywords:

semi-supervised learning; protein structural class; Laplace support vector machine; auto covariance transform

CLC number: TP 391

Document code: A

Protein Structural Classes Prediction by Using Laplace Support Vector Machine and Based on Semi-supervised Method

WU Jiang1, DONG Ting1, JIANG Ping1,2

(1. Department of Information Engineering, Yulin University, Yulin, Shaanxi 719000, China;

2. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China)

Abstract:

This study predicts protein structural classes with the Laplace support vector machine (LapSVM), a semi-supervised learning method. First, seven amino acid physicochemical properties taken from the literature were used to transform the protein sequences into numeric sequences, and auto covariance (AC) was used to transform these sequences into a feature space of fixed size suitable for training models. AC captures neighboring effects and the interactions between residues a certain distance apart in the protein sequence. Second, 20, 50, 80, 110, 140, and 170 samples were randomly selected as unlabeled samples to construct the training data sets, and the one-against-all strategy and the leave-one-out method were employed to estimate performance. A prediction accuracy of 94.12% was obtained, which compares favorably with the 90.69% obtained by the standard support vector machine (SVM). The experimental results show that the distribution information of unlabeled samples, used as a weak rule, can effectively improve prediction performance. The study also suggests a novel idea: using a semi-supervised method to solve a supervised learning problem, which gives a smaller optimization problem and higher prediction accuracy.
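The evaluation protocol named in the abstract, one-against-all decomposition scored by leave-one-out, can be sketched as follows. This is only an illustration of the protocol: the binary model here is a simple nearest-centroid scorer standing in for LapSVM, and all function names and the toy data are hypothetical. The handling of unlabeled samples is not shown.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_binary_scorer(X, is_positive):
    """Stand-in binary model (NOT LapSVM).

    The score is higher the closer a point lies to the positive-class
    centroid relative to the centroid of all other classes.
    """
    pos = centroid([x for x, flag in zip(X, is_positive) if flag])
    neg = centroid([x for x, flag in zip(X, is_positive) if not flag])
    return lambda x: sq_dist(x, neg) - sq_dist(x, pos)


def one_vs_rest_predict(X_train, y_train, x):
    """Train one binary scorer per class and return the best-scoring class."""
    scores = {}
    for label in sorted(set(y_train)):
        scorer = fit_binary_scorer(X_train, [t == label for t in y_train])
        scores[label] = scorer(x)
    return max(scores, key=scores.get)


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        hits += int(one_vs_rest_predict(X_train, y_train, X[i]) == y[i])
    return hits / len(X)


if __name__ == "__main__":
    # Toy data: three well-separated clusters (hypothetical, not protein features).
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0, 10], [0, 11], [1, 10]]
    y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
    print(leave_one_out_accuracy(X, y))
```

Leave-one-out trains as many models as there are samples, so it is practical mainly for small data sets.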

Key words:

semi-supervised learning; protein structural class; Laplace support vector machine; auto covariance
