DESIGN AND DEVELOPMENT OF FRAMEWORK TO DETECT MALICIOUS MANIPULATIONS IN MULTIMEDIA DATA

YADAV, ANKIT

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21215

Title:	DESIGN AND DEVELOPMENT OF FRAMEWORK TO DETECT MALICIOUS MANIPULATIONS IN MULTIMEDIA DATA
Authors:	YADAV, ANKIT
Keywords:	MALICIOUS MANIPULATIONS MULTIMEDIA DATA SOCIAL MEDIA PLATFORM FRAMEWORK DETECTION Face-NeSt MRT-Net
Issue Date:	Jan-2024
Series/Report no.:	TD-7543;
Abstract:	The widespread availability of image and video editing tools has created an alarming problem in an era characterised by the rapid spread of multimedia content on social media platforms. These tools are both easy to use and increasingly sophisticated, a combination that poses a substantial risk to the authenticity and reliability of the information carried in multimedia files. Modern techniques make it possible to produce highly realistic counterfeits, so audiovisual content is now more susceptible to harmful alteration than ever before. This thesis emphasises the need for robust systems that detect malicious alterations in multimedia data, and its objective is to leverage deep learning to identify and mitigate such manipulations effectively. The research also examines multimodal approaches, given the many forms of multimedia material in the present digital environment. Because social media platforms are the main channels for sharing information, the proposed deep-learning detection systems aim to safeguard the reliability and accuracy of multimedia material and thereby contribute to a more trustworthy digital ecosystem. The thesis tackles the manipulation detection challenge by proposing four novel deep-learning architectures and a novel image manipulation dataset that aids in training such forgery detection models. The first two models, MRT-Net and Face-NeSt, are dedicated to face manipulation detection. Facial manipulation is an extremely serious form of identity manipulation that can easily be used to mislead others and commit fraud.
MRT-Net is a dual-branch architecture that extracts manipulation residuals and textural features to detect forgery in facial images. An auto-adaptive mechanism lets it dynamically choose the proportion in which the two feature types are combined. Face-NeSt extracts discriminative information from multiple scales of features produced by a baseline model. Specifically, it extracts multi-scale attentional features and fuses them adaptively. MRT-Net and Face-NeSt are evaluated on three public benchmark datasets: FaceForensics++ (FF++), DeepFake Detection Challenge (DFDC) and CelebDF. Experimental results indicate that the proposed models outperform existing state-of-the-art methods. The next two models are dedicated to detecting splice manipulation in images. The first framework has a dual-branch structure with a spatial branch and a compression branch. The spatial branch leverages transfer learning to extract discriminative spatial clues without adding significant computational cost. The compression branch highlights inconsistencies in the DCT coefficient histograms caused by the splice forgery. The second model is a splice localization framework. It contains a "visually attentive multi-domain feature extractor" (VA-MDFE) that extracts attentional features from the RGB, edge and depth domains. Next, a "visually attentive downsampler" (VA-DS) fuses and downsamples the multi-domain features. Finally, a novel "visually attentive multi-receptive field upsampler" (VA-MRFU) module employs convolutions with multiple receptive fields to upsample attentional features by focussing on different information scales. Experiments conducted on the public benchmark dataset CASIA v2.0 demonstrate the effectiveness of the proposed model.
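The compression-branch idea described above, inspecting histograms of DCT coefficients, can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the abstract does not say how the histograms are built, so the 8x8 block size (the JPEG convention), the level shift by 128, the rounding to integers, and the function names `dct_2d` and `coefficient_histogram` are all assumptions made for illustration.

```python
import math
from collections import Counter

N = 8  # block size; assumed here because JPEG uses 8x8 blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def coefficient_histogram(image, u, v):
    """Histogram of the rounded (u, v) DCT coefficient over all full blocks.

    `image` is a 2-D list of greyscale values in 0..255. Pixels are shifted
    by 128 before the transform (an assumption borrowed from JPEG).
    """
    height, width = len(image), len(image[0])
    hist = Counter()
    for r in range(0, height - N + 1, N):
        for c in range(0, width - N + 1, N):
            block = [[image[r + i][c + j] - 128 for j in range(N)]
                     for i in range(N)]
            hist[round(dct_2d(block)[u][v])] += 1
    return hist


if __name__ == "__main__":
    flat = [[138] * 16 for _ in range(16)]  # hypothetical flat grey image
    print(coefficient_histogram(flat, 0, 0))
```

On a flat image every block yields the same DC value and the AC coefficients round to zero. A detector in this spirit would compare such histograms across image regions and look for irregularities, although the specific cues used in the thesis are not stated in the abstract.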
A novel splice manipulation dataset has also been created using Python code and Adobe Photoshop, since existing splice detection datasets contain very few samples and are not well suited to training deep-learning models. Lastly, the role of visual attention models is studied in the context of forgery detection. Specifically, five recently proposed visual attention mechanisms are integrated with a baseline convolutional neural network. The performance gain for each attention mechanism is measured, along with the corresponding increase in computational cost, and the resulting tradeoff between performance and complexity is presented.
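The cost side of the performance-versus-complexity tradeoff can be sketched by counting the parameters an attention module adds to a convolutional layer. The abstract does not name the five attention mechanisms studied, so the example below uses a squeeze-and-excitation style block purely as a well-known stand-in. The function names, the reduction ratio of 16, and the use of biases in both fully connected layers are assumptions, and no accuracy figures are implied.

```python
def conv2d_params(c_in, c_out, kernel=3, bias=True):
    """Learnable parameters of a standard 2-D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def se_block_params(channels, reduction=16):
    """Parameters of a squeeze-and-excitation style block.

    Two fully connected layers: channels -> channels // reduction -> channels,
    each with a bias term (an assumption; some implementations omit biases).
    """
    hidden = channels // reduction
    squeeze = channels * hidden + hidden
    excite = hidden * channels + channels
    return squeeze + excite


def relative_overhead(channels, reduction=16, kernel=3):
    """Attention parameters as a fraction of one same-width conv layer."""
    return se_block_params(channels, reduction) / conv2d_params(
        channels, channels, kernel)


if __name__ == "__main__":
    c = 64  # hypothetical channel width
    print(conv2d_params(c, c), se_block_params(c),
          f"{relative_overhead(c):.2%}")
```

Parameter count is only one proxy for computational cost. A fuller comparison would also measure FLOPs or inference time, and would pair each cost figure with the measured change in detection performance.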
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/21215
Appears in Collections:	Ph.D. Information Technology

Files in This Item:

File	Description	Size	Format
Ankit Yadav Ph.D..pdf		7.24 MB	Adobe PDF
