Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906
Title: ANALYSIS OF CLUSTERING ALGORITHMS FOR TEXT CLASSIFICATION
Authors: RAJ, ANSHULA
Keywords: CLUSTERING ALGORITHMS; TEXT CLASSIFICATION; DBSCAN; F1 SCORE
Issue Date: Jul-2021
Publisher: DELHI TECHNOLOGICAL UNIVERSITY
Series/Report no.: TD - 5469
Abstract: Large datasets are now commonplace: data is generated at a rapid rate every day, and it is of little use until meaningful information can be extracted from it. After preprocessing, methods such as clustering, classification, and regression are applied to the data to analyse it and draw conclusions. In this project, we worked on clustering a large text dataset, 20NewsGroups. We implemented five unsupervised clustering algorithms in Python: K-means, fuzzy c-means, fuzzy co-clustering of documents and keywords, agglomerative clustering, and density-based spatial clustering of applications with noise (DBSCAN). We ran the algorithms on a test dataset consisting of three newsgroups (rec.sport.baseball, sci.space, alt.atheism) and recorded the results, measuring accuracy and F1 score. Fuzzy co-clustering of documents and keywords performed best, followed by fuzzy c-means; the least effective algorithm for this dataset was DBSCAN. We conclude that, for a large collection of text documents such as this, the most effective algorithms are those based on fuzzy concepts, because in text documents the associations of both the keywords and the individual documents need to be taken into account.
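The abstract reports accuracy and F1 score for unsupervised algorithms, which requires matching arbitrary cluster IDs to the true newsgroup labels first. The record does not say how the report does this, so the following is only a minimal sketch under stated assumptions: each cluster is mapped to a distinct class by brute-force search for the mapping with the most correct documents, and F1 is macro-averaged over the classes. The function names, the short class names, and the toy labels are hypothetical and are not taken from the report.

```python
from itertools import permutations


def best_label_mapping(true_labels, cluster_ids):
    """Return the one-to-one cluster-to-class mapping with the most matches.

    Assumption: brute force over permutations, which is only practical for a
    handful of classes (three newsgroups in the abstract).
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best_map, best_hits = {}, -1
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map


def accuracy_and_macro_f1(true_labels, cluster_ids):
    """Compute accuracy and macro-averaged F1 after mapping clusters to classes."""
    mapping = best_label_mapping(true_labels, cluster_ids)
    predicted = [mapping[c] for c in cluster_ids]
    pairs = list(zip(predicted, true_labels))
    accuracy = sum(p == t for p, t in pairs) / len(pairs)
    f1_scores = []
    for cls in sorted(set(true_labels)):
        tp = sum(p == cls and t == cls for p, t in pairs)
        fp = sum(p == cls and t != cls for p, t in pairs)
        fn = sum(p != cls and t == cls for p, t in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: nine documents, three per newsgroup, and the
# cluster IDs an algorithm might have assigned to them.
true_labels = ["baseball"] * 3 + ["space"] * 3 + ["atheism"] * 3
cluster_ids = [0, 0, 1, 1, 1, 1, 2, 2, 0]

accuracy, macro_f1 = accuracy_and_macro_f1(true_labels, cluster_ids)
print(f"accuracy={accuracy:.3f}, macro F1={macro_f1:.3f}")
```

In this sketch the cluster IDs are plain integers, so the output of any of the listed algorithms could in principle be passed in. A clustering that yields more clusters than classes, or that marks some documents as noise, would need a different matching rule than the one assumed here.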
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906
Appears in Collections: M.E./M.Tech. Information Technology

Files in This Item:
File: Major II Report 2K19-ISY-18.pdf
Size: 1.64 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.