Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906
Full metadata record
DC Field | Value | Language
dc.contributor.author | RAJ, ANSHULA | -
dc.date.accessioned | 2022-02-21T08:43:14Z | -
dc.date.available | 2022-02-21T08:43:14Z | -
dc.date.issued | 2021-07 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/18906 | -
dc.description.abstract | Large datasets are increasingly common: data is generated at a rapid rate every day, but it is of little use until meaningful information is extracted from it. Such data is analysed extensively and conclusions are drawn from it. After preprocessing, methods such as clustering, classification and regression are applied to the data. In this project, we worked on clustering a large text dataset, 20NewsGroups. We implemented several unsupervised clustering algorithms in Python: K-means, fuzzy c-means, fuzzy co-clustering of documents and keywords, agglomerative clustering, and density-based spatial clustering of applications with noise (DBSCAN). We ran the algorithms on a test dataset consisting of three newsgroups (rec.sport.baseball, sci.space, alt.atheism) and recorded the results, measuring accuracy and F1 score. Fuzzy co-clustering of documents and keywords performed best, followed by fuzzy c-means. The least effective clustering algorithm for this dataset was DBSCAN. We conclude that for such a large text collection the most effective algorithms are those that use fuzzy concepts, because in text documents the associations of both the keywords and the individual documents need to be taken into account. | en_US
dc.language.iso | en | en_US
dc.publisher | DELHI TECHNOLOGICAL UNIVERSITY | en_US
dc.relation.ispartofseries | TD - 5469; | -
dc.subject | CLUSTERING ALGORITHMS | en_US
dc.subject | TEXT CLASSIFICATION | en_US
dc.subject | DBSCAN | en_US
dc.subject | F1 SCORE | en_US
dc.title | ANALYSIS OF CLUSTERING ALGORITHMS FOR TEXT CLASSIFICATION | en_US
dc.type | Thesis | en_US
Appears in Collections:M.E./M.Tech. Information Technology

Files in This Item:
File | Description | Size | Format
Major II Report 2K19-ISY-18.pdf | | 1.64 MB | Adobe PDF

