Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/15820
Title: DOCUMENT CLASSIFICATION USING UNIQUE AND ELITE KEYWORDS BASED ON ENTROPY BASED PARTITIONING
Authors: KESHARI, JULI
Keywords: DOCUMENT CLASSIFICATION
ELITE KEYWORDS
UNIQUE KEYWORDS
PARTITIONING
Issue Date: Jun-2017
Series/Report no.: TD-2793
Abstract: In this project, we investigate the selection of significant keywords for document classification. We propose two schemes for selecting significant keywords: elite and unique elite. Elite keywords are terms with a high term frequency within a given class, irrespective of their frequencies in other classes. To obtain the most frequent terms in each class, we employ entropy-based partitioning, a technique commonly used in information theory and coding to partition a set of symbol probabilities. Compared with other feature selection schemes, our method therefore has the advantage of yielding the exact subset of significant keywords for each class without relying on trial-and-error. Unique elite keywords are those that are elite for a particular class and, at the same time, occur with high frequency only in that class. To identify them, we compute the entropy of each elite keyword's distribution across all classes, sort the entropies in ascending order, and apply entropy-based partitioning again to shortlist the elite keywords that occur uniquely in that class. Comparison with state-of-the-art methods on benchmark data sets demonstrates the effectiveness of our method, as reflected in the high classification accuracy obtained.
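The abstract does not spell out the partitioning rule, so the Python sketch below is only one plausible reading of the two selection steps, not the thesis's implementation. It assumes (1) naive lowercase whitespace tokenization, (2) a Shannon-Fano-style split, as used in coding to divide sorted symbol probabilities, that cuts a sorted list where the two parts have the most nearly equal total weight, and (3) that ranking elite keywords by ascending cross-class entropy H can be done by sorting the score log2(C) - H in descending order, where C is the number of classes. All function names (`balanced_split`, `class_term_counts`, `elite_keywords`, `term_entropy`, `unique_elite_keywords`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def balanced_split(weights):
    """Shannon-Fano-style split (assumption): return k so that weights[:k]
    and weights[k:] have the most nearly equal sums. `weights` must already
    be sorted from most to least significant."""
    if not weights:
        return 0
    total = sum(weights)
    best_k, best_gap, prefix = 1, float("inf"), 0.0
    for k in range(1, len(weights)):
        prefix += weights[k - 1]
        gap = abs(prefix - (total - prefix))
        if gap < best_gap:  # strict '<' keeps the smallest k on ties
            best_k, best_gap = k, gap
    return best_k


def class_term_counts(docs_by_class):
    """Term frequencies per class, using naive lowercase whitespace tokenization."""
    return {c: Counter(w for doc in docs for w in doc.lower().split())
            for c, docs in docs_by_class.items()}


def elite_keywords(counts):
    """Elite keywords: the high-frequency head of each class's ranked term list."""
    elite = {}
    for c, cnt in counts.items():
        # Rank by descending frequency; ties broken alphabetically for determinism.
        ranked = sorted(cnt.items(), key=lambda kv: (-kv[1], kv[0]))
        k = balanced_split([f for _, f in ranked])
        elite[c] = [t for t, _ in ranked[:k]]
    return elite


def term_entropy(term, counts):
    """Entropy (in bits) of a term's frequency distribution across classes."""
    freqs = [cnt[term] for cnt in counts.values()]  # Counter returns 0 if absent
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


def unique_elite_keywords(counts, elite):
    """Keep the elite keywords whose cross-class entropy is lowest.

    Ascending entropy order is obtained by sorting log2(C) - H in descending
    order, so the same balanced split can be reused (assumption)."""
    h_max = math.log2(len(counts))
    unique = {}
    for c, terms in elite.items():
        scored = sorted(((h_max - term_entropy(t, counts), t) for t in terms),
                        key=lambda st: (-st[0], st[1]))
        k = balanced_split([s for s, _ in scored])
        unique[c] = [t for _, t in scored[:k]]
    return unique


docs = {
    "sports": ["goal win ball", "goal win match", "goal win score team"],
    "politics": ["vote win party", "vote win election", "vote win debate poll"],
}
counts = class_term_counts(docs)
elite = elite_keywords(counts)
unique = unique_elite_keywords(counts, elite)
print(elite)   # {'sports': ['goal', 'win'], 'politics': ['vote', 'win']}
print(unique)  # {'sports': ['goal'], 'politics': ['vote']}
```

In the toy data, "win" is frequent in both classes, so it is elite for both but has the maximal cross-class entropy (1 bit for two classes) and is dropped from the unique elite sets. The strict `<` comparison in `balanced_split` resolves ties toward the smaller keyword set; other tie-breaking rules would be equally defensible.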
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/15820
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
Complete_Thesis[1].pdf (1.4 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.