EFFECT ON COMMUNICATION USING N-LIST STRUCTURE FOR DATA MINING IN DISTRIBUTED DATABASE

BATRA, PALLAVI

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/15399

Full metadata record

DC Field	Value	Language
dc.contributor.author	BATRA, PALLAVI	-
dc.date.accessioned	2016-12-15T05:30:48Z	-
dc.date.available	2016-12-15T05:30:48Z	-
dc.date.issued	2016-12	-
dc.identifier.uri	http://dspace.dtu.ac.in:8080/jspui/handle/repository/15399	-
dc.description.abstract	Finding association rules among items in a large database distributed over many nodes is one of the challenges in knowledge discovery. Extracting frequent patterns from a transaction-oriented database is crucial to several data mining tasks, such as association rule generation, time series analysis, and classification. Most of these tasks require multiple passes over the database, and when the database is large, which is usually the case, scalable high-performance solutions involving multiple processors are required. When the database is distributed among several systems with a shared-nothing memory architecture, frequent patterns can be mined using distributed data mining algorithms. Two such algorithms are FDM (Fast Distributed Mining) and CD (Count Distribution), which are Apriori-based algorithms that generate a candidate set in each iteration. Candidate generation is the same as in the Apriori algorithm. Once the candidate sets have been generated, two pruning techniques, local pruning and global pruning, are used to prune away some infrequent candidate sets at each individual site. All sites share a common set of globally frequent itemsets with identical support counts, so rules generated at different participating sites have identical confidence. This approach focuses on the exactness and correctness of the rules. The main problem with these algorithms is the number of iterations they go through before producing the final frequent itemsets. Every time they find candidate itemsets, they communicate them to the corresponding polling sites, resulting in high communication cost and heavy use of network bandwidth. We propose a new algorithm that uses the N-List structure to find all the candidate itemsets in a single scan, resulting in less communication.
We further study the effect on communication of sending both frequent and infrequent itemsets in a single pass, rather than exchanging request and reply messages for every infrequent itemset.	en_US
dc.language.iso	en	en_US
dc.relation.ispartofseries	TD NO.2608;	-
dc.subject	N-LIST STRUCTURE	en_US
dc.subject	DATA MINING	en_US
dc.subject	DISTRIBUTED DATABASE	en_US
dc.subject	COMMUNICATION	en_US
dc.title	EFFECT ON COMMUNICATION USING N-LIST STRUCTURE FOR DATA MINING IN DISTRIBUTED DATABASE	en_US
dc.type	Thesis	en_US
Appears in Collections:	M.E./M.Tech. Computer Engineering
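
The abstract's central idea, counting candidate itemsets from N-lists and having each site report all its local counts (frequent and infrequent) in one message, can be illustrated with a minimal Python sketch. It assumes the PPC-tree/N-list formulation of Deng et al. (each tree node carries pre-order and post-order codes, and node x is an ancestor of node y when pre(x) < pre(y) and post(x) > post(y)); it is not the thesis's implementation, and every function name here (item_rank, build_nlists, nlist_of, site_report, global_supports) is hypothetical. For brevity the global item order is computed from pooled data, whereas a real deployment would presumably obtain it by exchanging 1-item counts between sites.

```python
from collections import Counter
from itertools import combinations


class Node:
    """PPC-tree node: item, count, children, and pre/post-order codes."""

    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}
        self.pre = self.post = -1


def item_rank(transactions, min_count):
    """Rank frequent items by descending support (ties by name); rank 0 = most frequent."""
    freq = Counter(i for t in transactions for i in set(t))
    kept = sorted((i for i, c in freq.items() if c >= min_count), key=lambda i: (-freq[i], i))
    return {item: r for r, item in enumerate(kept)}


def build_nlists(transactions, rank):
    """Build a PPC-tree, assign pre/post codes, and return each item's N-list."""
    root = Node()
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    nlists, counters = {}, [0, 0]  # counters = [next pre code, next post code]

    def visit(node):
        node.pre = counters[0]
        counters[0] += 1
        for child in node.children.values():
            visit(child)
        node.post = counters[1]
        counters[1] += 1
        if node.item is not None:
            nlists.setdefault(node.item, []).append((node.pre, node.post, node.count))

    visit(root)
    for entries in nlists.values():
        entries.sort()  # order each N-list by pre-order code
    return nlists


def intersect(ancestors, descendants):
    """Keep each ancestor code that has descendants, summing the descendants' counts."""
    result = []
    for a_pre, a_post, _ in ancestors:
        total = sum(c for d_pre, d_post, c in descendants if a_pre < d_pre and a_post > d_post)
        if total:
            result.append((a_pre, a_post, total))
    return result


def nlist_of(itemset, nlists, rank):
    """N-list of an itemset of ranked items, built recursively from two (k-1)-itemsets."""
    items = sorted(itemset, key=rank.get, reverse=True)  # deepest (least frequent) first
    if len(items) == 1:
        return nlists.get(items[0], [])
    p1 = nlist_of(items[:-1], nlists, rank)               # codes of nodes of items[-2]
    p2 = nlist_of(items[:-2] + items[-1:], nlists, rank)  # codes of nodes of items[-1]
    return intersect(p2, p1)


def site_report(transactions, candidates, rank):
    """One message per site: local support of every candidate, frequent or not."""
    nlists = build_nlists(transactions, rank)
    return {c: sum(n for _, _, n in nlist_of(c, nlists, rank)) for c in candidates}


def global_supports(sites, candidates, rank):
    """Coordinator adds up the single report received from each site."""
    totals = Counter()
    for db in sites:
        totals.update(site_report(db, candidates, rank))
    return totals


site_a = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c", "d"]]
site_b = [["a", "b", "c"], ["b", "d"], ["a", "b", "d"]]
rank = item_rank(site_a + site_b, min_count=1)
candidates = [frozenset(c) for k in (1, 2, 3) for c in combinations("abcd", k)]
totals = global_supports([site_a, site_b], candidates, rank)
print(totals[frozenset("ac")])  # → 3
```

In this sketch each site sends exactly one report no matter how many candidates turn out to be infrequent, which mirrors the single-pass exchange the abstract contrasts with per-itemset request and reply messages. The nested-loop intersection is quadratic and is chosen only for readability; N-list-based miners normally intersect the pre-order-sorted lists with a linear merge.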

Files in This Item:

File	Description	Size	Format
pdf_draft_6.pdf		768.1 kB	Adobe PDF