Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/17210
Title: | ANALYSIS AND IMPROVEMENT OF TEXT CLASSIFIERS |
Authors: | KUMAWAT, SNEHA |
Keywords: | TEXT CLASSIFIERS; TFID; BOW |
Issue Date: | Jun-2019 |
Series/Report no.: | TD-4848 |
Abstract: | The number of textual documents is increasing at a rapid rate, and there is often a need to classify those documents into a fixed set of predefined categories. Text mining and machine learning help considerably in this task of automated document classification. Because the classification is done automatically, the classifier must make as few misclassifications as possible, so classification accuracy is very important. Various factors can affect the classification accuracy of a classifier. One of them is the feature selection method used to reduce the number of features in the documents. Information Gain (IG) is one of the most popular methods employed for this task, but it has a few shortcomings in how it evaluates which words are better. In this thesis, we use term frequency-inverse document frequency (TF-IDF) and bag of words (BOW) to find the words that are more useful in the classification task. Alongside these techniques, we use an ensemble technique, bagging, to improve the classification process. We also compare the results of both feature selection techniques with and without ensemble learning for text classification. The results show that our method improves the average classification accuracy of a text classifier and makes its classification accuracy much more consistent. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/17210 |
Appears in Collections: | M.E./M.Tech. Computer Engineering |
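Illustrative sketch (not taken from the thesis): the abstract names bag-of-words and TF-IDF features combined with bagging, but it does not state the base classifier, the dataset, or any parameters. The Python below is a minimal sketch under assumptions: the nearest-centroid base learner with cosine similarity, the smoothed IDF formula, the toy documents, and every function name are hypothetical choices made for illustration only.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def bow_vector(text):
    """Bag of words: raw term counts for one document."""
    return Counter(tokenize(text))


def idf_table(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(text, idf):
    """TF-IDF: term count times IDF; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in bow_vector(text).items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(vectors, labels):
    """Hypothetical base learner: one summed feature vector per class."""
    centroids = {}
    for vec, label in zip(vectors, labels):
        centroid = centroids.setdefault(label, Counter())
        for term, weight in vec.items():
            centroid[term] += weight
    return centroids


def predict_one(centroids, vec):
    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(centroids), key=lambda lab: cosine(vec, centroids[lab]))


def bagging_fit(vectors, labels, n_models=15, seed=0):
    """Bagging: train each model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(vectors)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_centroids([vectors[i] for i in idx],
                                      [labels[i] for i in idx]))
    return models


def bagging_predict(models, vec):
    """Majority vote over the bagged models."""
    votes = Counter(predict_one(model, vec) for model in models)
    return votes.most_common(1)[0][0]


# Toy data, invented for this sketch.
TRAIN_DOCS = [
    "late goal wins match",
    "striker scores goal",
    "goal decides league match",
    "phone ships fast processor",
    "laptop runs new software",
    "software update improves phone battery",
]
TRAIN_LABELS = ["sport", "sport", "sport", "tech", "tech", "tech"]

idf = idf_table(TRAIN_DOCS)
tfidf_models = bagging_fit([tfidf_vector(d, idf) for d in TRAIN_DOCS], TRAIN_LABELS)
bow_models = bagging_fit([bow_vector(d) for d in TRAIN_DOCS], TRAIN_LABELS)

query = "striker scores late goal"
print("TF-IDF + bagging:", bagging_predict(tfidf_models, tfidf_vector(query, idf)))
print("BOW + bagging:", bagging_predict(bow_models, bow_vector(query)))
```

Comparing a single model trained on the full data against the bagged vote, for each feature type, would mirror the "with and without ensemble learning" comparison the abstract describes. The toy data here is far too small to say anything about accuracy.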
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ilovepdf_merged.pdf | | 1.54 MB | Adobe PDF | View/Open |