Category: Computer and Information Science. Thesis length: 15,792 characters.
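Abstract: Software module defect prediction plays an important role in analyzing software quality and balancing software cost. Since 2005, support vector machines (SVMs) have been applied to software module defect prediction. Because software module defect metric datasets suffer from class imbalance and noise, prediction models built with the standard SVM do not give satisfactory results. This thesis therefore studies SVM learning algorithms for software module defect prediction in some depth, with the aim of improving prediction performance. The main contributions are as follows:
1. An existing class imbalance learning method based on fuzzy support vector machines (FSVM_CIL) is studied and applied to software module defect prediction. Compared with the standard SVM, FSVM_CIL improves classifier performance.
2. A class imbalance learning algorithm based on fuzzy support vector machines and undersampling (FSVM_CIL_RUS) is proposed. It combines FSVM_CIL with random undersampling: before FSVM_CIL is used to build the defect prediction model, random undersampling balances the positive and negative class distribution of the training set. Experiments on software module defect metric datasets show that FSVM_CIL_RUS effectively improves prediction performance.
3. A class imbalance ensemble learning algorithm based on fuzzy support vector machines (FSVM_CIL_RBBag) is proposed. It combines FSVM_CIL with ensemble learning: base classifiers are built with FSVM_CIL and then combined into an ensemble to improve prediction performance. Experiments on software module defect metric datasets show that FSVM_CIL_RBBag is effective and feasible.
Keywords: support vector machine, defect prediction, data sampling, ensemble learning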
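The preprocessing behind the first two contributions can be sketched in a few lines of Python. The abstract does not give the membership function or the sampling ratio, so everything below is an assumption: the weights follow one commonly described FSVM_CIL variant (linear decay with distance from the sample's own class centre, scaled by 1 for the defective class and by the minority-to-majority ratio for the other class), and the undersampler keeps as many non-defective modules as there are defective ones. The function names and toy data are hypothetical. The resulting weights would then be passed as per-sample weights to an SVM trainer, which is omitted here.

```python
import math
import random


def fuzzy_memberships(X, y, delta=1e-6):
    """Assumed FSVM_CIL-style weights; label 1 = defective (minority), 0 = clean."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    # Class factors: r+ = 1, r- = minority-to-majority ratio (assumption).
    ratio = {1: 1.0, 0: n_pos / n_neg}
    centres = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        centres[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    dist = [math.dist(x, centres[label]) for x, label in zip(X, y)]
    d_max = {cls: max(d for d, label in zip(dist, y) if label == cls)
             for cls in (0, 1)}
    # Samples far from their class centre (possible noise) get smaller weights.
    return [ratio[label] * (1.0 - d / (d_max[label] + delta))
            for d, label in zip(dist, y)]


def random_undersample(X, y, rng):
    """Keep every defective module and an equally sized random subset of the rest."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy metric vectors: six clean modules, two defective ones (made-up values).
X = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0],
     [1.0, 1.0], [10.0, 10.0], [8.0, 8.0], [9.0, 9.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_undersample(X, y, random.Random(0))
weights = fuzzy_memberships(X_bal, y_bal)
print(y_bal, [round(w, 3) for w in weights])
```

In this sketch the undersampling step runs first, so the class factor for the non-defective class is close to 1 on the balanced set and the weights mainly reflect distance from the class centre.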
Abstract: Defect prediction for software modules plays an important role in analyzing software quality and balancing software cost. Since 2005, support vector machines (SVMs) have been applied to defect prediction for software modules. Because software module defect metric datasets exhibit class imbalance and noise, prediction models based on the standard SVM do not give satisfactory results. In this paper, we therefore study support vector machines for predicting software module defects in some depth, with the aim of improving prediction performance. The main contributions of this paper are as follows.
1. We study the existing Fuzzy Support Vector Machines for Class Imbalance Learning (FSVM_CIL) algorithm and use it to build software module defect prediction models. Compared with the standard SVM, FSVM_CIL achieves better prediction performance.
2. We propose an algorithm called FSVM_CIL_RUS, which combines FSVM_CIL with random undersampling. Before building software module defect prediction models with FSVM_CIL, we balance the positive and negative classes of the training set by random undersampling. Experimental results on two software module defect metric datasets show that the proposed algorithm effectively improves prediction performance.
3. We propose an ensemble algorithm called FSVM_CIL_RBBag, which combines FSVM_CIL with the roughly balanced bagging algorithm. FSVM_CIL is used to build the base classifiers, which are then combined into an ensemble to improve prediction performance. Experimental results on two software module defect metric datasets show that the proposed algorithm is effective and feasible.
Keywords: support vector machine (SVM), defect prediction, data sampling, ensemble learning
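The bag construction behind the third contribution can be illustrated as follows. The abstract names roughly balanced bagging but gives no parameters, so this is only a sketch under assumptions: each bag bootstraps the defective class at its original size and draws a number of non-defective samples from a negative binomial distribution with success probability 0.5, both with replacement. The `train_base` callable is a hypothetical stand-in for the FSVM_CIL trainer, and breaking ties in favour of the defective class is also an assumption.

```python
import random


def negative_binomial(n_successes, rng, p=0.5):
    """Count failures observed before n_successes successes of Bernoulli(p) trials."""
    failures = successes = 0
    while successes < n_successes:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def roughly_balanced_bags(y, n_bags, rng):
    """Return index lists; label 1 = defective (minority), 0 = clean."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    bags = []
    for _ in range(n_bags):
        # The clean-class size varies per bag but equals len(pos) on average.
        n_neg = negative_binomial(len(pos), rng)
        bags.append(rng.choices(pos, k=len(pos)) + rng.choices(neg, k=n_neg))
    return bags


def fit_ensemble(X, y, n_bags, train_base, rng):
    """Train one base classifier per bag; train_base(X_bag, y_bag) is hypothetical."""
    bags = roughly_balanced_bags(y, n_bags, rng)
    return [train_base([X[i] for i in bag], [y[i] for i in bag]) for bag in bags]


def majority_vote(votes):
    """Combine 0/1 votes from the base classifiers; ties go to defective (1)."""
    return 1 if 2 * sum(votes) >= len(votes) else 0
```

Because the clean-class count is random, an occasional bag may contain few or no clean samples, so a real base learner would need to handle or skip such bags.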