Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/17210
Title: ANALYSIS AND IMPROVEMENT OF TEXT CLASSIFIERS
Authors: KUMAWAT, SNEHA
Keywords: TEXT CLASSIFIERS; TF-IDF; BOW
Issue Date: Jun-2019
Series/Report no.: TD-4848;
Abstract: The number of textual documents is increasing at an incredible rate, and there is often a need to classify those documents into fixed, predefined categories. Text mining and machine learning help considerably with this task of automated document classification. Because the classification is done automatically, the classifier must produce as few misclassifications as possible, so classification accuracy is very important. Various factors can affect the classification accuracy of a classifier. One of them is the feature selection method used to reduce the number of features in the documents. Information Gain (IG) is one of the most popular methods employed for this task, but it has a few shortcomings in how it evaluates which words are better. In this thesis, we use Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BOW) to find the words that are more useful for the classification task. Alongside these techniques, we use an ensemble technique, bagging, to improve the classification process. We also compare the results of both feature selection techniques with and without ensemble learning for text classification. The results show that our method improves the average classification accuracy of a text classifier and makes its accuracy much more consistent.
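The pipeline described in the abstract (bag-of-words or TF-IDF features, with and without bagging) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy corpus, the function names, the smoothed IDF formula, and the nearest-centroid base classifier are all assumptions chosen to keep the example short and dependent only on the Python standard library.

```python
import math
import random
from collections import Counter

def build_vocab(docs):
    """Sorted list of every distinct token in the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})

def bow(doc, vocab):
    """Bag-of-words vector: raw count of each vocabulary term in doc."""
    counts = Counter(doc)
    return [float(counts[t]) for t in vocab]

def idf_weights(docs, vocab):
    """Smoothed inverse document frequency (an assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab]

def tfidf(doc, vocab, idf):
    """TF-IDF vector: term count scaled by the term's IDF weight."""
    return [c * w for c, w in zip(bow(doc, vocab), idf)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def fit_centroids(X, y):
    """Base classifier (assumed): one mean vector per class label."""
    sums, counts = {}, Counter(y)
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_centroid(centroids, vec):
    """Label whose centroid has the highest cosine similarity to vec."""
    return max(sorted(centroids), key=lambda lab: cosine(centroids[lab], vec))

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Bagging: fit each base classifier on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, vec):
    """Majority vote over the base classifiers' predictions."""
    votes = Counter(predict_centroid(m, vec) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus, already tokenized; not data from the thesis.
train_docs = [
    "the team won the match".split(),
    "the striker scored a goal".split(),
    "the coach praised the team".split(),
    "the new phone has a fast chip".split(),
    "the laptop ships with new software".split(),
    "the chip runs the software".split(),
]
train_labels = ["sports"] * 3 + ["tech"] * 3
test_doc = "the team scored a goal".split()

vocab = build_vocab(train_docs)
idf = idf_weights(train_docs, vocab)
X_bow = [bow(d, vocab) for d in train_docs]
X_tfidf = [tfidf(d, vocab, idf) for d in train_docs]

# Compare the four configurations named in the abstract.
print("BOW, single:    ", predict_centroid(fit_centroids(X_bow, train_labels), bow(test_doc, vocab)))
print("BOW, bagging:   ", predict_bagging(fit_bagging(X_bow, train_labels), bow(test_doc, vocab)))
print("TF-IDF, single: ", predict_centroid(fit_centroids(X_tfidf, train_labels), tfidf(test_doc, vocab, idf)))
print("TF-IDF, bagging:", predict_bagging(fit_bagging(X_tfidf, train_labels), tfidf(test_doc, vocab, idf)))
```

On a corpus this small the four configurations are unlikely to disagree; the sketch only shows how the pieces fit together. Measuring the accuracy and consistency gains the abstract reports would require a real dataset and repeated train/test splits.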
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/17210
Appears in Collections:M.E./M.Tech. Computer Engineering

Files in This Item:
File                  Description  Size     Format
ilovepdf_merged.pdf                1.54 MB  Adobe PDF

