Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/15399
Full metadata record
DC Field | Value | Language
dc.contributor.author | BATRA, PALLAVI | -
dc.date.accessioned | 2016-12-15T05:30:48Z | -
dc.date.available | 2016-12-15T05:30:48Z | -
dc.date.issued | 2016-12 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/15399 | -
dc.description.abstract | Finding association rules among items in a large database distributed over many nodes is one of the challenges in the field of knowledge discovery. Extracting frequent patterns from a transaction-oriented database is crucial to several data mining tasks, such as association rule generation, time series analysis, and classification. Most of these tasks require multiple passes over the database, and when the database is large, which is usually the case, scalable high-performance solutions involving multiple processors are required. When the database is distributed among several systems with a shared-nothing memory architecture, frequent patterns can be mined using distributed data mining algorithms. Two such algorithms are FDM (Fast Distributed Mining) and CD (Count Distribution), Apriori-based algorithms that generate candidate sets on each iteration. Candidate sets are generated in the same way as in the Apriori algorithm. Once the candidate sets have been generated, two pruning techniques, local pruning and global pruning, are used to prune away some infrequent candidate sets at each individual site. All sites share a common set of globally frequent itemsets with identical support counts, so rules generated at different participating sites have identical confidence. This approach focuses on a rule's exactness and correctness. The main problem with these algorithms is the number of iterations they go through before generating the final frequent itemsets. Each time a site finds candidate itemsets, it communicates them to the assigned polling site, resulting in high communication cost and network bandwidth usage. We propose a new algorithm that uses the N-List structure to find all the candidate itemsets in a single scan, resulting in less communication. We also propose a solution for further studying the effect on communication: both frequent and infrequent itemsets are communicated in a single pass, rather than sending request and reply messages for every infrequent itemset. | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | TD NO.2608; | -
dc.subject | N-LIST STRUCTURE | en_US
dc.subject | DATA MINING | en_US
dc.subject | DISTRIBUTED DATABASE | en_US
dc.subject | COMMUNICATION | en_US
dc.title | EFFECT ON COMMUNICATION USING N-LIST STRUCTURE FOR DATA MINING IN DISTRIBUTED DATABASE | en_US
dc.type | Thesis | en_US
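
The abstract describes Apriori-based distributed mining in which every iteration generates candidates, counts them at each site, and exchanges the counts. The following is a minimal single-process sketch of that Count Distribution style loop, included only to illustrate why the number of iterations drives communication. It is not the thesis's implementation: the function names, the toy data, and the brute-force candidate generation (which yields the same candidates as Apriori's join-and-prune step) are all assumptions made for illustration, and the summation of per-site counts merely stands in for the network exchange between sites.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Return all k-itemsets whose (k-1)-subsets are all globally frequent.

    Brute-force stand-in for Apriori's join-and-prune step (assumption:
    chosen for brevity, not efficiency).
    """
    items = sorted({item for itemset in frequent_prev for item in itemset})
    candidates = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent_prev
               for sub in combinations(combo, k - 1)):
            candidates.add(frozenset(combo))
    return candidates


def local_counts(transactions, candidates):
    """Count, at one site, how many local transactions contain each candidate."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def count_distribution(sites, min_support):
    """Simulate Count Distribution over `sites` (a list of transaction lists).

    `min_support` is an absolute global count threshold (hypothetical
    parameter). Returns the globally frequent itemsets with their global
    counts, plus the number of count-exchange rounds that were needed.
    """
    # Level 1 candidates: every item seen at any site.
    candidates = {frozenset([item]) for site in sites for t in site for item in t}
    frequent = {}
    rounds = 0
    k = 1
    while candidates:
        rounds += 1  # one exchange of local counts per iteration
        per_site = [local_counts(site, candidates) for site in sites]
        # Summing local counts stands in for the exchange between sites.
        global_counts = {c: sum(counts[c] for counts in per_site)
                         for c in candidates}
        level = {c: n for c, n in global_counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent, rounds


# Toy data (hypothetical): two sites with three transactions each.
sites = [
    [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}],
    [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}],
]
frequent, rounds = count_distribution(sites, min_support=3)
```

In this sketch every level of the itemset lattice costs one round of count exchange, which is the per-iteration communication overhead the abstract attributes to FDM and CD. A single-scan approach such as the N-List-based one proposed in the thesis aims to avoid that repeated exchange.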
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File | Description | Size | Format
pdf_draft_6.pdf | | 768.1 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.