DESIGN AND VALIDATION OF SOFTWARE MAINTAINABILITY PREDICTION MODELS FOR IMBALANCED DATA USING OBJECT ORIENTED METRICS

LATA, KUSUM

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18767

Title:	DESIGN AND VALIDATION OF SOFTWARE MAINTAINABILITY PREDICTION MODELS FOR IMBALANCED DATA USING OBJECT ORIENTED METRICS
Authors:	LATA, KUSUM
Keywords:	DESIGN AND VALIDATION; SOFTWARE MAINTAINABILITY; IMBALANCED DATA; ORIENTED METRICS
Issue Date:	2020
Publisher:	DELHI TECHNOLOGICAL UNIVERSITY
Series/Report no.:	TD - 5262
Abstract:	Software systems are becoming increasingly large and complex. The principal challenge for software practitioners and engineers is that such projects must be developed within a short, fixed period while still satisfying the client's requirements. Developing maintainable software saves both effort and cost during maintenance. The problem of predicting software maintainability is widely recognized in industry, and much work has been done on predicting maintainability with the help of software metrics. Software maintainability prediction uses software metrics as predictor variables that represent software characteristics such as size, coupling, cohesion and inheritance. Learning techniques are also needed to develop efficient prediction models that can identify the software parts with low or high maintainability. The elements involved in developing these models must be analysed and improved to yield efficient software maintainability prediction models. This thesis analyses and validates the relationship between various Object-Oriented metrics and software maintainability. The empirical analyses were conducted using machine learning, search-based and hybridized techniques, with the goal of developing effective models to predict software maintainability during the initial phases of software development. When developing models for predictive modelling tasks, it is essential to examine the data distribution of the underlying datasets, because an imbalanced distribution poses an enormous hurdle to training the models. This thesis focuses on improving software maintainability by developing models that handle imbalanced data.
In the imbalanced data problem addressed in this thesis, the classes with low maintainability are regarded as minority classes, while the classes with high maintainability are regarded as majority classes. The prime contribution of this thesis is the investigation of techniques for developing effective software maintainability prediction models from imbalanced data. In practice, researchers might not be able to obtain balanced training data with a proportionate number of low maintainability and high maintainability classes. Data resampling techniques have been applied in this thesis to avoid the impractical models that imbalanced data yields. After balanced data is obtained through resampling, software maintainability prediction models are developed with various machine learning techniques. Moreover, the use of search-based techniques, a sub-class of machine learning techniques, is limited in the domain of maintainability prediction. Search-based techniques are metaheuristic techniques that find an optimal or near-optimal solution amongst a large population of candidate solutions. The research community consistently explores new methods and techniques for developing better and more effective prediction models. A promising approach for improving existing classifiers is the ensemble methodology, which aggregates various individual classifiers to provide stable results. In this thesis, we have also investigated ensemble methodology by aggregating ensemble techniques with data resampling techniques. This aggregation of ensemble techniques with data resampling is called ensemble learners for the imbalanced data problem. Another contribution of this thesis is an investigation of hybridized techniques that combine search-based and machine learning techniques into a single approach.
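The idea of resampling the minority (low maintainability) classes can be illustrated with a minimal sketch of SMOTE-style oversampling, which creates synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours. This is a generic illustration only and is not the technique proposed in the thesis; the function name `oversample_minority`, the parameter `k`, and the two-metric feature vectors are all hypothetical.

```python
import random

def oversample_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Assumes the minority list holds at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical metric vectors (size, coupling) for illustration only.
low_maintainability = [[120.0, 9.0], [150.0, 11.0], [135.0, 8.0], [160.0, 12.0]]
majority_count = 10  # assumed number of high maintainability samples

new_points = oversample_minority(
    low_maintainability, majority_count - len(low_maintainability)
)
balanced_minority = low_maintainability + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority samples already occupy. In practice the metrics would usually be scaled first so that no single metric dominates the distance computation.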
The hybridized techniques have been explored after data resampling to develop effective software maintainability prediction models from imbalanced data. To contribute to the handling of imbalanced data in software maintainability prediction, we also propose a novel oversampling technique, termed the Modified Safe Level Synthetic Minority Oversampling Technique, in this thesis. Collecting data for training a prediction model is a difficult task because in most cases such data is either unavailable or hard to collect. To overcome this limitation of historical data collection, the development of generalized maintainability prediction models with inter-project validation is essential. Thus, when resources are inadequate and there is too little time to capture training data for developing a maintainability prediction model, inter-project validation can be employed. The applicability of inter-project validation for software maintainability prediction has also been investigated in this thesis.
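Inter-project validation, as described above, trains a model on data from one project and evaluates it on a different project. The sketch below shows that workflow with a deliberately simple nearest-centroid classifier. The classifier, the project data, and the names `train` and `predict` are assumptions made for illustration; the thesis itself uses machine learning, search-based and hybridized techniques that are not reproduced here.

```python
def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(samples):
    """samples: list of (features, label). Returns one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], features)),
    )

# Hypothetical (size, coupling) metrics with maintainability labels.
project_a = [  # training project
    ([120.0, 9.0], "low"), ([150.0, 11.0], "low"),
    ([30.0, 2.0], "high"), ([45.0, 3.0], "high"), ([25.0, 1.0], "high"),
]
project_b = [  # separate project used only for validation
    ([140.0, 10.0], "low"), ([35.0, 2.0], "high"), ([50.0, 4.0], "high"),
]

model = train(project_a)
predictions = [predict(model, features) for features, _ in project_b]
accuracy = sum(p == y for p, (_, y) in zip(predictions, project_b)) / len(project_b)
print(predictions, accuracy)
```

The key design point is that no sample from the validation project is seen during training, which is what distinguishes this setup from cross-validation within a single project.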
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/18767
Appears in Collections:	Ph.D. Computer Engineering

Files in This Item:

File	Description	Size	Format
Thesis .pdf		3.33 MB	Adobe PDF