Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/16597
Title: BIG DATA ANALYSIS USING METAHEURISTIC ALGORITHMS
Authors: TRIPATHI, ASHISH KUMAR
Keywords: BIG DATA ANALYSIS
METAHEURISTIC ALGORITHMS
K-MEANS ALGORITHM
Issue Date: Jun-2018
Series/Report no.: TD-4466
Abstract: Big data has attracted considerable attention from researchers in academia and industry for decision and strategy making. Thus, efficient data analysis methods are required for managing big datasets. Data clustering, a prominent analysis method in data mining, is efficiently employed in big data analysis since it does not require labeled datasets, which are not easily available for big data problems. K-means, one of the simplest and most popular algorithms, has been employed for solving various clustering problems. However, the results of the K-means algorithm are highly dependent on the initial cluster centroids, and the algorithm is easily trapped in local optima. To mitigate this issue, a novel metaheuristic algorithm named Military Dog Based Optimizer (MDBO) has been introduced and validated against 17 benchmark functions. The proposed algorithm has also been tested on 8 benchmark clustering datasets and compared with 5 other recent state-of-the-art algorithms. The proposed algorithm achieved better clustering accuracy than the conventional methods. However, the algorithm fails to perform efficiently on big datasets in terms of memory space and time complexity, due to its sequential execution. To overcome this issue, four novel methods have been developed for the efficient clustering of big datasets. The first method is a hybrid of K-means and the bat algorithm, which runs in parallel over a cluster of computers. The proposed method outperformed K-means, PSO, and the bat algorithm on 5 benchmark datasets. The second method is a novel variant of the grey wolf optimizer for clustering big datasets, in which the exploration and exploitation ability of the grey wolf optimizer is enhanced using Levy flight and binomial crossover. The proposed method performed efficiently on the 8 benchmark clustering datasets as compared to the conventional methods. Moreover, the parallel performance of the presented methods has also been analyzed using the speedup measure. Third, a hybrid method named K-BBO has been developed, which utilizes the search ability of the biogeography-based optimizer and K-means for a better initial population. Fourth, a novel parallel method using MDBO is introduced and tested on four large-scale datasets. Furthermore, to test the applicability of the proposed methods in real-world scenarios, two real-world problems, namely Twitter sentiment analysis and fake review detection, have been solved in the big data environment using the proposed methods.
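Illustrative note: the abstract's claim that K-means depends on its initial centroids and can settle in a local optimum can be seen on a toy example. The sketch below is not taken from the thesis; the function name `kmeans_1d`, the nine-point dataset, and both sets of starting centroids are assumptions chosen only for illustration. It runs plain Lloyd-style K-means in one dimension from two different initializations and reports the sum of squared errors (SSE) each run ends with.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd-style K-means on 1-D data (hypothetical helper, not from the thesis)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable, so the run has converged
        centroids = updated
    # Sum of squared distances from each point to its nearest centroid.
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse


# Toy data with three well-separated groups (assumed for illustration).
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

good_centroids, good_sse = kmeans_1d(points, [1, 11, 21])  # one start per group
bad_centroids, bad_sse = kmeans_1d(points, [0, 1, 2])      # all starts in one group

print(good_centroids, good_sse)  # [1.0, 11.0, 21.0] 6.0
print(bad_centroids, bad_sse)    # [0.0, 1.5, 16.0] 154.5
```

Both runs converge, but the second stops at a much worse partition because two of its centroids never leave the first group. This is the kind of behavior that motivates using a metaheuristic search over centroid positions rather than a single K-means run.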
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/16597
Appears in Collections: Ph.D. Computer Engineering

Files in This Item:
File: Final Thesis.pdf
Size: 9.39 MB
Format: Adobe PDF

