Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/13938
Title: PLAGIARISM DETECTION
Authors: GUPTA, AKANKSHA
Keywords: PLAGIARISM DETECTION
GRAPH BASED TECHNIQUE
TOPIC SIGNATURE
Issue Date: 27-Jan-2012
Series/Report no.: TD 894;65
Abstract: In this project we have improved the graph-based techniques for representing a document and accessing its constituent sentences. We have also proposed a new approach to reconstructing the evolution process in the database, i.e., the global corpus, in order to optimize the number of suspected texts. Further work has been done to extract tokens from the document and to optimize the tokens used to detect plagiarism based on the graph representation. Tokens provide semantic metadata that summarizes and characterizes documents. We also describe an algorithm that automatically extracts keyphrases from text and provides the basis for an optimized set of tokens for use with optimization techniques such as PSO, SA and BA. Our algorithm identifies candidate tokens using lexical methods, calculates the local best and global best for each candidate token, and uses a machine-learning algorithm to predict which candidates are good tokens. The machine-learning scheme first builds a prediction model from training documents and then uses the model to find tokens in new documents. Each document requires preprocessing: breaking the document into its constituent sentences, segmenting each sentence into separate terms, and removing stop words. To compare two documents, a graph is built by grouping the terms of each sentence into one node. The resulting nodes are connected to each other according to the order of the sentences within the document, and all nodes in the graph are also connected to a top-level node, the “Topic Signature”. The topic signature node is formed by extracting the concepts from the terms of each sentence and grouping them in that node. The topic signature is the main entry point to the graph and serves as a quick guide to the relevant nodes that should be considered when comparing source documents with a suspected one. The proposed method has enabled us to achieve better performance in terms of effectiveness and of efficiency in memory and CPU cycles used.
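The preprocessing and graph construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the tiny stop-word list, and the use of plain terms in place of extracted concepts for the topic signature are all hypothetical choices made here for clarity.

```python
import re

# Illustrative stop-word list (an assumption; the thesis does not list one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}

TOPIC_NODE = "TOPIC_SIGNATURE"  # hypothetical label for the top-level node


def split_sentences(text):
    """Break a document into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_terms(sentence):
    """Lowercase the sentence, split it into terms, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_document_graph(text):
    """Return (nodes, edges, signature) for one document.

    nodes:     sentence index -> list of terms (one node per sentence)
    edges:     set of (from, to) pairs; consecutive sentences are linked,
               and the topic-signature node is linked to every sentence node
    signature: term -> set of sentence indices containing it. Plain terms
               stand in here for the extracted concepts (an assumption).
    """
    sentences = split_sentences(text)
    nodes = {i: sentence_terms(s) for i, s in enumerate(sentences)}
    edges = {(i, i + 1) for i in range(len(sentences) - 1)}
    signature = {}
    for i, terms in nodes.items():
        edges.add((TOPIC_NODE, i))
        for term in terms:
            signature.setdefault(term, set()).add(i)
    return nodes, edges, signature


def candidate_nodes(signature, query_terms):
    """Use the topic signature to pick only the sentence nodes worth comparing."""
    hits = set()
    for term in query_terms:
        hits.update(signature.get(term, set()))
    return sorted(hits)
```

In this sketch, the terms of a suspected sentence would be passed to `candidate_nodes` so that only the source nodes sharing at least one term are compared in detail, which is one plausible reading of how the topic signature acts as a "quick guide" to relevant nodes.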
Description: M.TECH
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/13938
Appears in Collections: M.E./M.Tech. Information Technology

Files in This Item:
File: thesis-AkankshaGupta.docx
Size: 3.02 MB
Format: Microsoft Word XML


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.