Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/15803
Full metadata record
DC Field | Value | Language
dc.contributor.author | KAMAL, SHINE | -
dc.date.accessioned | 2017-07-14T12:01:34Z | -
dc.date.available | 2017-07-14T12:01:34Z | -
dc.date.issued | 2017-07 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/15803 | -
dc.description.abstract | Data imbalance is an increasingly common problem in fields such as defect prediction, change prediction, oil spill detection, and medical diagnosis. Various methods have been developed to handle imbalanced datasets in order to improve the accuracy of prediction models. Software defect prediction is important for identifying defects in the early phases of the software development life cycle. Early identification, and thereby removal, of software defects is crucial to yielding a cost-effective, good-quality software product. Although previous studies have successfully used machine learning techniques for software defect prediction, these techniques yield biased results when applied to imbalanced datasets. An imbalanced dataset has a non-uniform class distribution, with very few instances of one class compared to the other. Using imbalanced datasets leads to inaccurate predictions for the minority class, which is generally considered more important than the majority class. Thus, handling imbalanced data effectively is crucial for developing a competent defect prediction model. Many studies have addressed defect prediction for imbalanced datasets, but most of them use the SMOTE oversampling method to handle the imbalance problem. Many other oversampling methods that help deal with imbalance remain unexplored, particularly in the field of software defect prediction. This study evaluates the effectiveness of machine learning classifiers for software defect prediction on twelve imbalanced NASA datasets by applying nine sampling methods. We also propose a modified version (SPIDER3) of the existing oversampling method SPIDER2 and compare it with the original. Furthermore, the work evaluates the performance of MetaCost learners on imbalanced datasets. The results show that sampling methods improve the prediction capability of machine learning classifiers. MetaCost learners improve sensitivity and help predict defects effectively. Moreover, the results support the applicability of the modified version of the SPIDER2 oversampling method, as it outperforms the original SPIDER2 method in the majority of cases. | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | TD-2774; | -
dc.subject | DATA IMBALANCING | en_US
dc.subject | SAMPLING METHODS | en_US
dc.subject | METACOST LEARNERS | en_US
dc.subject | DEFECT PREDICTION | en_US
dc.title | A COMPARATIVE ANALYSIS OF VARIOUS SAMPLING METHODS AND METACOST LEARNERS TO IMPROVE SOFTWARE DEFECT PREDICTION FOR IMBALANCED DATA | en_US
dc.type | Thesis | en_US
Appears in Collections:M.E./M.Tech. Computer Engineering

Files in This Item:
File | Description | Size | Format
THESIS.pdf | | 1.53 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.