<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>DSpace Collection:</title>
  <link rel="alternate" href="http://dspace.dtu.ac.in:8080/jspui/handle/repository/18630" />
  <subtitle />
  <id>http://dspace.dtu.ac.in:8080/jspui/handle/repository/18630</id>
  <updated>2026-04-28T06:43:48Z</updated>
  <dc:date>2026-04-28T06:43:48Z</dc:date>
  <entry>
    <title>EMPIRICAL VALIDATION OF OBJECT-ORIENTED METRICS FOR IMBALANCED CLASSIFICATION USING OPEN SOURCE SOFTWARE</title>
    <link rel="alternate" href="http://dspace.dtu.ac.in:8080/jspui/handle/repository/18646" />
    <author>
      <name>JAIN, JUHI</name>
    </author>
    <id>http://dspace.dtu.ac.in:8080/jspui/handle/repository/18646</id>
    <updated>2021-12-08T06:22:29Z</updated>
    <published>2021-01-01T00:00:00Z</published>
    <summary type="text">Title: EMPIRICAL VALIDATION OF OBJECT-ORIENTED METRICS FOR IMBALANCED CLASSIFICATION USING OPEN SOURCE SOFTWARE
Authors: JAIN, JUHI
Abstract: Software is an inextricable part of our lives. With the ever-growing complexity of software,
designing and integrating changes is a tedious task for developers
and software practitioners. One of the prime concerns while implementing changes
is maintaining the quality of software products despite limited resources and rigid deadlines.
If defects are uncovered in later stages of software development, the cost of detecting
and removing them increases exponentially. This may result in poor software development
processes and software quality degradation. Under the constraints of strict time schedules
and limited resources, it becomes essential for software developers and practitioners
to discover these defects early. Finding defects or faults in the early phases of
the software development life cycle leads to better planning and reduced cost, effort, and
resources [1].
Software metrics are widely used for building defect prediction models. Different
object-oriented (OO) metrics capture different internal attributes of software, such as cohesion,
coupling, size, inheritance, and encapsulation. These metrics are therefore used
to predict whether a software class is likely to be defective [2, 3]. Selecting relevant
metrics aids effective predictive modelling for finding defects. We evaluated
correlation-based feature selection for identifying the important metrics that are related to
defect-prone areas in the software.
Various machine learning (ML) and statistical techniques have been used in the literature for developing
prediction models to ascertain defect-proneness. We identified a
category of classification techniques, search-based techniques (SBTs), that is rarely used
in the Software Defect Prediction (SDP) domain. We assessed the effectiveness of ML
techniques and SBTs for developing models that predict defective classes in OO software.
We further extended the use of genetic algorithm variants to feature selection and
performed a comparative analysis with Correlation Feature Selection (CFS).
One of the major issues observed in software data is the imbalanced
data problem. If one class has far fewer instances than
another, the data is said to be imbalanced. In our application,
a software dataset is imbalanced when defective classes are fewer than non-defective
classes. We conducted a structured review to analyze the ways of tackling the imbalanced
data problem when developing defect prediction models. The review results will help in
identifying best practices and research gaps, if any.
The imbalanced data problem can be treated at either the data level or the algorithm level. At
the data level, we developed ML models using resampling methods to assess their impact
on defect-proneness prediction. At the algorithm level, cost-sensitive learning is employed to tackle
the imbalanced data issue. The impact of different MetaCost learners was investigated for
optimum defect prediction in the software. Studies in the literature have advocated the use of
ensemble methodology for various software prediction tasks. We evaluated the ensemble
methods after treating the data with resampling methods. Incorporating resampling
methods helps alleviate the imbalanced data problem, resulting in better model prediction.
We assessed the effectiveness of OO metrics, ML techniques, SBTs, resampling methods,
and MetaCost learners for developing SDP models.</summary>
    <dc:date>2021-01-01T00:00:00Z</dc:date>
  </entry>
</feed>

