Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906
Title: | ANALYSIS OF CLUSTERING ALGORITHMS FOR TEXT CLASSIFICATION |
Authors: | RAJ, ANSHULA |
Keywords: | CLUSTERING ALGORITHMS; TEXT CLASSIFICATION; DBSCAN; F1 SCORE |
Issue Date: | Jul-2021 |
Publisher: | DELHI TECHNOLOGICAL UNIVERSITY |
Series/Report no.: | TD - 5469 |
Abstract: | Large datasets are now commonplace. Data is generated at a rapid rate every day, yet it is of little use until meaningful information can be extracted from it. A great deal of analysis is performed on such data, and conclusions are drawn from it. After preprocessing, different methods are applied to the data, such as clustering, classification, and regression. In this project, we worked on clustering a large text dataset, 20NewsGroups. We implemented several unsupervised clustering algorithms in Python: K-means, fuzzy c-means, fuzzy co-clustering of documents and keywords, agglomerative clustering, and density-based spatial clustering of applications with noise (DBSCAN). We ran the algorithms on a test dataset consisting of three newsgroups (rec.sport.baseball, sci.space, alt.atheism) and recorded the results, measuring accuracy and F1 score. We found that fuzzy co-clustering of documents and keywords performed best, followed by fuzzy c-means. The least effective clustering algorithm for this dataset was DBSCAN. We conclude that, for such a large text dataset, the most effective algorithms are those that use fuzzy concepts, because in text documents the associations of both the keywords and the individual documents need to be taken into account. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906 |
Appears in Collections: | M.E./M.Tech. Information Technology |
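The abstract reports accuracy and F1 score for unsupervised clusterings, which requires some way of matching arbitrary cluster ids to the three newsgroup labels. The record does not say how the report did this. The following is a minimal Python sketch of one common approach, assuming a one-to-one mapping chosen to maximize agreement and a macro-averaged F1. The function names `best_mapping` and `accuracy_and_macro_f1` and the toy labels are hypothetical, not taken from the report.

```python
from itertools import permutations


def best_mapping(true_labels, cluster_ids):
    """Return the one-to-one cluster -> class mapping with the most agreements.

    Assumption (not from the report): there are no more clusters than classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; one-to-one mapping impossible")
    best, best_hits = {}, -1
    # Exhaustive search is cheap for a handful of clusters (3! = 6 candidates).
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        if hits > best_hits:
            best, best_hits = mapping, hits
    return best


def accuracy_and_macro_f1(true_labels, cluster_ids):
    """Score a clustering against known labels after mapping clusters to classes."""
    mapping = best_mapping(true_labels, cluster_ids)
    predicted = [mapping[c] for c in cluster_ids]
    pairs = list(zip(predicted, true_labels))
    accuracy = sum(p == t for p, t in pairs) / len(pairs)
    f1_scores = []
    for cls in sorted(set(true_labels)):
        tp = sum(p == cls and t == cls for p, t in pairs)
        fp = sum(p == cls and t != cls for p, t in pairs)
        fn = sum(p != cls and t == cls for p, t in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels for illustration only; these are not results from the report.
toy_true = ["baseball", "baseball", "space", "space", "atheism", "atheism"]
toy_clusters = [2, 2, 0, 0, 1, 0]
acc, f1 = accuracy_and_macro_f1(toy_true, toy_clusters)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

The exhaustive search is only practical for a few clusters. With many clusters, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual substitute. DBSCAN's noise points would also need separate handling, since they belong to no cluster.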
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Major II Report 2K19-ISY-18.pdf | | 1.64 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.