Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/16188
Title: IMPROVING CLASSIFICATION ACCURACY OF TEXT CLASSIFIERS
Authors: RASTOGI, SHIVAM
Keywords: CLASSIFICATION ACCURACY
TEXT CLASSIFIERS
INFORMATION GAIN
DISCRIMINATING POWER
TEXT MINING
Issue Date: Jun-2018
Series/Report no.: TD-4086
Abstract: The number of textual documents is increasing at an incredible rate, and there is often a need to classify those documents into a fixed set of predefined categories. Text mining and machine learning help considerably in this task of automated document classification. Because the classification is done automatically, the classifier must produce as few misclassifications as possible, so classification accuracy is very important. Various factors can affect the classification accuracy of a classifier. One of them is the feature selection method used to reduce the number of features in the documents. Information Gain (IG) is one of the most popular methods employed for this task, but it has a few shortcomings in how it evaluates words. In this thesis, we devise a new formula for evaluating the words in the documents and thereby finding the words that are more useful for the classification task. Our method aims to find the words that have more discriminating power than others, and we have therefore named our formula Discriminating Power (DP). We compute the DP of every word in the documents and then select the words with the highest DP values, since the higher the DP of a word, the better it is for classification. We have also compared the results of using the Information Gain method and our Discriminating Power method for text classification. The results show that our method improves the average classification accuracy of a text classifier and is much more consistent in its classification accuracy across different numbers of selected features.
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/16188
Appears in Collections: M.E./M.Tech. Computer Engineering
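
The Information Gain baseline named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook formulation, in which a word is scored by how much knowing whether it is present in a document reduces the entropy of the class label. The function names (`entropy`, `information_gain`, `top_k_terms`) and the representation of a document as a set of tokens are illustrative assumptions, not taken from the thesis. The Discriminating Power formula is not given in this record, so it is not shown.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information Gain of `term` with respect to the class labels.

    Assumption: `docs` is a list of token sets, so only the presence or
    absence of a word in a document matters, not its frequency.
    """
    n = len(docs)
    all_counts = Counter(labels)
    with_term = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    without_term = all_counts - with_term  # keeps only positive counts
    n_with = sum(with_term.values())
    n_without = n - n_with
    prior = entropy(all_counts.values())
    conditional = (n_with / n) * entropy(with_term.values()) + (
        n_without / n
    ) * entropy(without_term.values())
    return prior - conditional


def top_k_terms(docs, labels, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]
```

With a toy corpus of four documents, two per class, a word that appears in every document of one class and in none of the other scores 1 bit, while a word spread evenly across both classes scores 0. Selecting a different number of top-ranked words corresponds to the "feature counts" varied in the comparison the abstract describes.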

Files in This Item:
File: Thesis.pdf
Size: 357.59 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.