Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/14749
Title: K-MEANS CLUSTERING ALGORITHM ON MAP REDUCE ARCHITECTURE
Authors: NEGI, ROHIT
Keywords: K-MEANS CLUSTERING; MAP REDUCE ARCHITECTURE; DATA COLLECTION; ALGORITHM
Issue Date: May-2016
Series/Report no.: TD NO.2037
Abstract: The rapid development of the Internet and its impact on every aspect of life have caused data volumes to grow from the gigabyte level to the terabyte and even petabyte level. This growth has brought about new technologies such as Hadoop for efficient storage and analysis of the data. Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. In data mining, a cluster is a collection of data members having similar characteristics. The process of establishing relations in, or deriving information from, raw data by performing operations such as clustering on the data set is known as data mining. Data collected in practical scenarios is more often than not random and unstructured. Hence, there is always a need to analyse unstructured data sets to derive meaningful information. This is where unsupervised algorithms come into the picture, to process unstructured or even semi-structured data sets. K-Means Clustering is one such technique, used to give structure to unstructured data so that valuable information can be extracted. This paper discusses the implementation of the K-Means Clustering Algorithm in a distributed environment using Apache™ Hadoop. The key to the implementation is the design of the Mapper and Reducer routines, which is discussed in the later part of the paper. The steps involved in executing the K-Means Algorithm are also described, based on a small-scale implementation on an experimental setup, to serve as a guide for practical implementations.
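The Mapper and Reducer design mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use the Hadoop API: it simulates one MapReduce pass in memory, and every name in it (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) is hypothetical. It assumes the common formulation in which the map step assigns each point to its nearest centroid and the reduce step averages the points assigned to each centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step (assumed design): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce step (assumed design): average all points sharing one key."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Simulate one MapReduce pass in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid with no assigned points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iters=20):
    """Repeat passes until the centroids stop changing or max_iters is hit."""
    for _ in range(max_iters):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

On a real cluster each pass would typically run as a separate job, with the current centroids distributed to every mapper; the loop in `kmeans` only stands in for that driver logic.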
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/14749
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File              Size         Format
1PG.pdf           33.4 kB      Adobe PDF
pre.pdf           358.15 kB    Adobe PDF
negi_final.pdf    773.13 kB    Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.