PLAGIARISM DETECTION

GUPTA, AKANKSHA

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/13938

Full metadata record

DC Field	Value	Language
dc.contributor.author	GUPTA, AKANKSHA	-
dc.date.accessioned	2012-01-27T10:41:43Z	-
dc.date.available	2012-01-27T10:41:43Z	-
dc.date.issued	2012-01-27	-
dc.identifier.uri	http://dspace.dtu.ac.in:8080/jspui/handle/repository/13938	-
dc.description	M.TECH	en_US
dc.description.abstract	In this project we improve graph-based techniques for representing a document and accessing its constituent sentences. We also propose a new approach to reconstructing the evolution process in the database, i.e., the global corpus, in order to optimize the number of suspected texts. Further work extracts tokens from the document and optimizes the set of tokens used to detect plagiarism on the graph representation. Tokens provide semantic metadata that summarize and characterize documents. We also describe an algorithm that automatically extracts keyphrases from text and provides an optimized set of tokens as the basis for optimization techniques such as PSO, SA, and BA. The algorithm identifies candidate tokens using lexical methods, calculates the local best and global best for each candidate token, and uses a machine-learning algorithm to predict which candidates are good tokens. The machine-learning scheme first builds a prediction model from training documents and then uses the model to find tokens in new documents. Each document is preprocessed: it is broken down into its constituent sentences, each sentence is segmented into separate terms, and stop words are removed. To compare two documents, a graph is built by grouping the terms of each sentence into one node. The resulting nodes are connected to each other according to the order of the sentences within the document, and all nodes are also connected to a top-level node, the “Topic Signature”. The topic signature node is formed by extracting the concepts of each sentence's terms and grouping them in that node. As the main entry point to the graph, the topic signature serves as a quick guide to the relevant nodes that should be considered when comparing source documents with a suspected one. The proposed method achieves better performance in terms of effectiveness and of efficiency in memory and CPU cycles used.	en_US
dc.language.iso	en	en_US
dc.relation.ispartofseries	TD 894;65	-
dc.subject	PLAGIARISM DETECTION	en_US
dc.subject	GRAPH BASED TECHNIQUE	en_US
dc.subject	TOPIC SIGNATURE	en_US
dc.title	PLAGIARISM DETECTION	en_US
dc.type	Thesis	en_US
Appears in Collections:	M.E./M.Tech. Information Technology
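
The preprocessing and graph construction summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The stop-word list, the function names, the use of the most frequent terms as a stand-in for "concepts" in the topic signature, and the Jaccard threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the abstract does not specify one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "by"}


def sentence_terms(text):
    """Split text into sentences, then into lower-cased terms without stop words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        [w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]


def build_graph(text, signature_size=5):
    """Build one node per sentence, order edges, and a topic signature node."""
    terms = sentence_terms(text)
    nodes = [frozenset(t) for t in terms]
    # Consecutive sentences are linked to preserve document order.
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    # Assumption: the most frequent terms stand in for the extracted concepts.
    counts = Counter(w for t in terms for w in t)
    signature = {w for w, _ in counts.most_common(signature_size)}
    # The topic signature points to the sentence nodes containing each concept.
    topic_links = {w: [i for i, n in enumerate(nodes) if w in n] for w in signature}
    return {"nodes": nodes, "edges": edges,
            "signature": signature, "topic_links": topic_links}


def jaccard(a, b):
    """Set overlap between two sentence nodes (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_pairs(source, suspect, threshold=0.5, signature_size=5):
    """Compare only the nodes reachable through shared topic signature terms."""
    g_src = build_graph(source, signature_size)
    g_sus = build_graph(suspect, signature_size)
    shared = g_src["signature"] & g_sus["signature"]
    src_ids = sorted({i for w in shared for i in g_src["topic_links"][w]})
    sus_ids = sorted({j for w in shared for j in g_sus["topic_links"][w]})
    return [
        (i, j)
        for i in src_ids
        for j in sus_ids
        if jaccard(g_src["nodes"][i], g_sus["nodes"][j]) >= threshold
    ]
```

Calling `suspicious_pairs(source_text, suspect_text)` returns index pairs of sentences whose term sets overlap strongly. Because only nodes linked from shared signature terms are compared, unrelated sentences are skipped, which is the role the abstract assigns to the topic signature.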

Files in This Item:

File	Description	Size	Format
thesis-AkankshaGupta.docx		3.02 MB	Microsoft Word XML