BIG DATA ANALYSIS USING METAHEURISTIC ALGORITHMS

TRIPATHI, ASHISH KUMAR

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/16597

Title:	BIG DATA ANALYSIS USING METAHEURISTIC ALGORITHMS
Authors:	TRIPATHI, ASHISH KUMAR
Keywords:	BIG DATA ANALYSIS; METAHEURISTIC ALGORITHMS; K-MEANS ALGORITHM
Issue Date:	Jun-2018
Series/Report no.:	TD-4466;
Abstract:	Big data has attracted considerable attention from researchers in academia and industry for decision and strategy making. Efficient data analysis methods are therefore required for managing big datasets. Data clustering, a prominent analysis method in data mining, is widely employed in big data analysis because it does not require labeled datasets, which are not easily available for big data problems. K-means, one of the simplest and most popular algorithms, has been employed for solving various clustering problems. However, the results of the K-means algorithm are highly dependent on the initial cluster centroids, and the algorithm is easily trapped in local optima. To mitigate this issue, a novel metaheuristic algorithm named Military Dog Based Optimizer (MDBO) has been introduced and validated against 17 benchmark functions. The proposed algorithm has also been tested on 8 benchmark clustering datasets and compared with 5 other recent state-of-the-art algorithms. The proposed algorithm achieved better clustering accuracy than the conventional methods. However, the algorithm fails to perform efficiently on big datasets in terms of memory space and time complexity, due to its sequential execution. To overcome this issue, four novel methods have been developed for the efficient clustering of big datasets. The first method is a hybrid of K-means and the bat algorithm which runs in parallel over a cluster of computers. The proposed method outperformed K-means, PSO, and the bat algorithm on 5 benchmark datasets. The second method is a novel variant of the grey wolf optimizer for clustering big datasets, in which the exploration and exploitation ability of the grey wolf optimizer is enhanced using Lévy flight and binomial crossover. The proposed method performed efficiently on the 8 benchmark clustering datasets as compared to the conventional methods. Moreover, the parallel performance of the presented methods has also been analyzed using the speedup measure. Third, a hybrid method named K-BBO has been developed which utilizes the search ability of the biogeography based optimizer and K-means for a better initial population. Fourth, a novel parallel method using MDBO is introduced and tested on four large scale datasets. Furthermore, to test the applicability of the proposed methods in real-world scenarios, two real-world problems, namely Twitter sentiment analysis and fake review detection, have been solved in the big data environment using the proposed methods.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/16597
Appears in Collections:	Ph.D. Computer Engineering
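
Illustrative note: the abstract states that K-means results depend heavily on the initial cluster centroids and that the algorithm is easily trapped in local optima. The sketch below is not taken from the thesis; it is a minimal, plain K-means (Lloyd's algorithm) in Python on a hypothetical four-point dataset, and the function names (kmeans, sq_dist, mean_point) are assumptions made for illustration. It shows two initializations converging to different solutions with very different sums of squared errors (SSE).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's K-means; returns (final centroids, SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical toy data: corners of a wide, flat rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centroids on opposite short sides: finds the left/right split.
good_centroids, good_sse = kmeans(points, [(0, 0), (4, 0)])

# Initial centroids on the same short side: stuck in a top/bottom split.
bad_centroids, bad_sse = kmeans(points, [(0, 0), (0, 1)])

print(good_sse, bad_sse)
```

Both runs converge, but the second stops at a poorer local optimum. This is the kind of behaviour that, per the abstract, motivates using a metaheuristic search over the centroid positions.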

Files in This Item:

File	Description	Size	Format
Final Thesis.pdf		9.39 MB	Adobe PDF
