DEVELOPMENT AND VALIDATION OF FEATURE SELECTION TECHNIQUES FOR SOFTWARE DEFECT PREDICTION

KHAN, KISHWAR

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21442

Title:	DEVELOPMENT AND VALIDATION OF FEATURE SELECTION TECHNIQUES FOR SOFTWARE DEFECT PREDICTION
Authors:	KHAN, KISHWAR
Keywords:	FEATURE SELECTION TECHNIQUES; SOFTWARE DEFECT PREDICTION; MACHINE LEARNING (ML); VALIDATION; SMOTE
Issue Date:	Dec-2024
Series/Report no.:	TD-7755
Abstract:	Software defect prediction is a vital research area focused on improving the reliability and maintainability of software systems. As these systems become increasingly complex, the demand for accurate predictive models to identify defect-prone components grows more critical. Despite significant advancements in the field, challenges such as imbalanced datasets, feature selection, and the fine-tuning of machine learning algorithms for optimal performance persist. This research tackles these challenges by developing and validating enhanced Machine Learning (ML) techniques specifically designed for software quality prediction. The primary goal is to improve the performance of prediction models by addressing feature selection, hyperparameter tuning, and data imbalance, thereby enhancing the accuracy and robustness of these models. The research is validated through systematic reviews, empirical studies, and the creation of frameworks and tools applicable in real-world software development settings. The thesis is organized into several phases, each concentrating on a different aspect of software defect prediction. The initial phase is a comprehensive systematic literature review that identifies the most effective feature selection techniques and machine learning algorithms currently employed in software defect prediction. This review establishes a foundation for understanding the current state of the field and highlights gaps that this thesis seeks to address. Key research questions include determining the most valuable feature selection and hyperparameter tuning techniques for predicting defect-prone modules and assessing the effectiveness of various machine learning algorithms.
In the subsequent phases, the research focuses on developing and validating software defect prediction models using a range of feature selection and extraction techniques, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel-based Principal Component Analysis (K-PCA) and Autoencoders, with a Support Vector Machine (SVM) as the base classifier. One of the significant contributions of this thesis is the development of a model for cross-project defect prediction using ten feature reduction techniques and machine learning classifiers. The research explores cross-project validation techniques, which are essential for ensuring that predictive models generalize across different software projects. This is particularly important where project-specific characteristics vary, such as differences in coding practices or project domains. We analysed the impact of five filter-based Feature Subset Selection techniques (best first, exhaustive search, genetic search, greedy stepwise search and random search), five Feature Reduction techniques (gain ratio, symmetrical uncertainty, OneR, information gain and ReliefF) and a no-feature-selection configuration, using the predictive ability of five frequently used classification approaches. Each method is tested and compared across diverse datasets to confirm its validity and real-world applicability. The need for evolutionary feature selection techniques arises from the complexity and high dimensionality of data in many machine learning tasks, including software defect prediction. Traditional feature selection methods may struggle to explore the vast search space of possible feature combinations efficiently, often leading to suboptimal performance. Evolutionary techniques, inspired by natural selection, offer a robust alternative by iteratively optimizing feature subsets to enhance model accuracy and reduce overfitting.
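As an illustration of the filter-based ranking mentioned above, the following is a minimal sketch of information gain, one of the five ranking measures the abstract lists. It is not the thesis's implementation: the function names, the binary toy features (`high_complexity`, `many_authors`, `has_tests`) and the six-module dataset are hypothetical and chosen only to show how a score per feature is computed and sorted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Score every feature column and return (name, score) pairs, best first."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one row per module, label 1 = defective, 0 = clean.
names = ["high_complexity", "many_authors", "has_tests"]
rows = [
    (1, 0, 0),
    (1, 1, 1),
    (1, 0, 1),
    (0, 1, 0),
    (0, 0, 1),
    (0, 1, 0),
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data `high_complexity` separates the two classes perfectly, so it ranks first; a filter method would keep the top-scoring features and pass only those to the classifier.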
To address the limitations of traditional feature selection, the thesis explores and implements a novel software defect prediction model based on a variant of Grey Wolf Optimisation paired with the Synthetic Minority Over-sampling Technique (SMOTE). This model is particularly valuable when dealing with large datasets, complex feature interactions, and the need to balance multiple objectives, such as maximizing predictive accuracy while minimizing computational cost. This thesis also addresses the issue of imbalanced data, a prevalent challenge in software defect prediction. Imbalanced datasets often cause traditional machine learning models to produce biased predictions. To mitigate this, the study explores and implements oversampling techniques such as SMOTE. These methods are assessed on their effectiveness in enhancing the prediction of defect-prone modules while reducing false positives. Additionally, hyperparameter tuning is a key focus of this research. Achieving optimal model performance often requires careful adjustment of parameters, and this study employs various tuning methods, including evolutionary and Bayesian optimization, to determine the best parameters for each predictive model. The research systematically evaluates the impact of hyperparameter tuning, demonstrating that well-tuned models significantly outperform those using default settings. In conclusion, this thesis contributes to the field of software defect prediction by tackling critical challenges in predictive modelling. By developing and validating advanced machine learning techniques, this work improves the accuracy, robustness, and practical relevance of predictive models in software development. The models and insights generated through this research have the potential to make a significant impact in both academic and industrial contexts, offering researchers and practitioners new approaches for enhancing software quality.
Moreover, by reducing software defects, this study contributes to the development of more reliable and secure software systems, ultimately benefiting society by fostering safer and more efficient technological environments.
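The oversampling idea behind SMOTE, referred to in the abstract, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the thesis's model and not a library API: the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`) and the four sample points are hypothetical. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (needs at least two samples).
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Sort the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment from base towards the neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class (defective) modules described by two metrics.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority, n_synthetic=4)
for point in new_points:
    print(point)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies. In practice, oversampling is applied to the training split only, so that no synthetic data leaks into evaluation.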
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/21442
Appears in Collections:	Ph.D. Computer Engineering

Files in This Item:

File	Description	Size	Format
KISHWAR KHAN pH.d..pdf		4.69 MB	Adobe PDF
