Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/15814
Title: PERFORMANCE ANALYSIS OF APRIORI AND FP GROWTH ON DIFFERENT MAPREDUCE FRAMEWORKS
Authors: RANJAN, RAVI
Keywords: APRIORI; FP GROWTH; MAPREDUCE FRAMEWORKS; HADOOP CLUSTER
Issue Date: Jun-2017
Series/Report no.: TD-2788
Abstract: Association rule mining remains a popular and effective method for extracting meaningful information from large datasets. It looks for associations between items in large transaction-based datasets. To create these associations, frequent patterns must first be generated. Apriori and FP Growth are the two most popular algorithms for frequent itemset mining. To improve the efficiency and scalability of Apriori and FP Growth, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is an emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets, it is essential to redesign data mining algorithms for this new paradigm. However, the existing parallel versions of the Apriori and FP Growth algorithms implemented with the disk-based MapReduce model are not efficient enough for iterative computation. Hence, a number of MapReduce-based platforms have been developed for parallel computing in recent years. Among them, two platforms, namely Spark and Flink, have attracted a lot of attention because of their built-in support for distributed computation. However, not much work has been done to test the capabilities of these two platforms in the field of parallel and distributed mining. This work therefore helps us better understand how the two algorithms perform on three different platforms. We conducted an in-depth experiment to gain insight into the effectiveness, efficiency and scalability of the Apriori and Parallel FP Growth algorithms on Hadoop, Spark and Flink.
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/15814
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: thesis-cd.pdf
Size: 2.01 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.