ANALYSIS OF CLUSTERING ALGORITHMS FOR TEXT CLASSIFICATION

RAJ, ANSHULA

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906

Title:	ANALYSIS OF CLUSTERING ALGORITHMS FOR TEXT CLASSIFICATION
Authors:	RAJ, ANSHULA
Keywords:	CLUSTERING ALGORITHMS; TEXT CLASSIFICATION; DBSCAN; F1 SCORE
Issue Date:	Jul-2021
Publisher:	DELHI TECHNOLOGICAL UNIVERSITY
Series/Report no.:	TD - 5469
Abstract:	Large datasets are now commonplace: data is generated rapidly every day, but it is of little use until meaningful information can be extracted from it. After preprocessing, methods such as clustering, classification and regression are applied to the data and conclusions are drawn. In this project, we worked on clustering a large text dataset, 20NewsGroups. We implemented five unsupervised clustering algorithms in Python: K-means, fuzzy c-means, fuzzy co-clustering of documents and keywords, agglomerative clustering, and density-based spatial clustering of applications with noise (DBSCAN). We ran the algorithms on a test dataset consisting of three newsgroups (rec.sport.baseball, sci.space, alt.atheism) and recorded the results, measuring accuracy and F1 score. Fuzzy co-clustering of documents and keywords performed best, followed by fuzzy c-means; DBSCAN was the least effective algorithm on this dataset. We conclude that, for a large text collection of this kind, the most effective algorithms are those based on fuzzy concepts, because in text documents both the keywords and the associations of individual documents need to be taken into account.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906
Appears in Collections:	M.E./M.Tech. Information Technology
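
Illustrative note: the abstract reports accuracy and F1 score for unsupervised clusterings, which requires relating cluster ids to newsgroup labels in some way. The record does not say how this was done. The sketch below assumes a simple majority-vote mapping (each cluster takes the most common true label among its members) and a macro-averaged F1; the function names and the toy data are hypothetical and are not taken from the report.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, true_labels):
    """Assumed mapping: give each cluster its most common true label."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    return {cid: Counter(labels).most_common(1)[0][0]
            for cid, labels in members.items()}


def accuracy_and_macro_f1(cluster_ids, true_labels):
    """Return (accuracy, macro-averaged F1) under the majority-vote mapping."""
    mapping = map_clusters_to_labels(cluster_ids, true_labels)
    predicted = [mapping[cid] for cid in cluster_ids]
    pairs = list(zip(predicted, true_labels))

    accuracy = sum(p == t for p, t in pairs) / len(pairs)

    f1_scores = []
    for label in sorted(set(true_labels)):
        tp = sum(p == label and t == label for p, t in pairs)
        fp = sum(p == label and t != label for p, t in pairs)
        fn = sum(p != label and t == label for p, t in pairs)
        denom = 2 * tp + fp + fn
        # A label that is never predicted and never present scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return accuracy, sum(f1_scores) / len(f1_scores)


# Toy data (made up): eight documents, three clusters, three newsgroups.
toy_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
toy_labels = ["rec.sport.baseball", "rec.sport.baseball", "sci.space",
              "sci.space", "sci.space",
              "alt.atheism", "alt.atheism", "rec.sport.baseball"]

accuracy, macro_f1 = accuracy_and_macro_f1(toy_clusters, toy_labels)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

With this mapping, points that DBSCAN marks as noise (commonly given the id -1) would be treated as one more cluster; whether the report handled noise points this way is not stated.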

Files in This Item:

File	Description	Size	Format
Major II Report 2K19-ISY-18.pdf		1.64 MB	Adobe PDF
