Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/14749
Full metadata record
DC Field | Value | Language
dc.contributor.author | NEGI, ROHIT | -
dc.date.accessioned | 2016-05-12T12:48:40Z | -
dc.date.available | 2016-05-12T12:48:40Z | -
dc.date.issued | 2016-05 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/14749 | -
dc.description.abstract | The rapid development of the Internet and its impact on every aspect of life have caused data volumes to grow from the gigabyte scale to the terabyte and even petabyte scale. This has brought about new technologies such as Hadoop for efficient storage and analysis of data. Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. A cluster is a collection of data members with similar characteristics. The process of establishing relations in, or deriving information from, raw data by performing operations such as clustering on the data set is known as data mining. Data collected in practical scenarios is more often than not completely random and unstructured. Hence, there is always a need to analyse unstructured data sets to derive meaningful information. This is where unsupervised algorithms come into the picture, to process unstructured or even semi-structured data sets. K-Means Clustering is one such technique, used to give structure to unstructured data so that valuable information can be extracted. This paper discusses the implementation of the K-Means Clustering Algorithm in a distributed environment using Apache™ Hadoop. The key to the implementation is the design of the Mapper and Reducer routines, which are discussed in the later part of the paper. The steps involved in executing the K-Means Algorithm are also described, based on a small-scale implementation on an experimental setup, to serve as a guide for practical implementations. | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartofseries | TD NO.2037; | -
dc.subject | K-MEANS CLUSTERING | en_US
dc.subject | MAP REDUCE ARCHITECTURE | en_US
dc.subject | DATA COLLECTION | en_US
dc.subject | ALGORITHM | en_US
dc.title | K-MEANS CLUSTERING ALGORITHM ON MAP REDUCE ARCHITECTURE | en_US
dc.type | Thesis | en_US
Appears in Collections:M.E./M.Tech. Computer Engineering
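
The abstract names the Mapper and Reducer routines as the key design step. As a rough illustration only, one K-Means iteration could be split into map and reduce steps as sketched below. This is a single-machine sketch in plain Python, not the thesis's Hadoop implementation: the function names (`mapper`, `reducer`, `kmeans_iteration`) are hypothetical, the dictionary grouping merely stands in for Hadoop's shuffle phase, and Euclidean distance is an assumption.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def mapper(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reducer(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map -> group -> reduce pass and return the updated centroids."""
    groups = defaultdict(list)  # stands in for the shuffle/sort phase
    for p in points:
        key, value = mapper(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points keeps its old position.
    return [
        reducer(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]
```

Calling `kmeans_iteration` repeatedly, feeding each result back in as the new centroids until they stop changing, would give the usual iterative algorithm; in a real cluster each pass would typically be a separate MapReduce job.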

Files in This Item:
File | Description | Size | Format
1PG.pdf | | 33.4 kB | Adobe PDF
pre.pdf | | 358.15 kB | Adobe PDF
negi_final.pdf | | 773.13 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.