ATTENTION DRIVEN NETWORKS FOR IDENTIFYING DEEPFAKES

BARWA, ARPANA

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21794

Title:	ATTENTION DRIVEN NETWORKS FOR IDENTIFYING DEEPFAKES
Authors:	BARWA, ARPANA
Keywords:	ATTENTION DRIVEN NETWORKS; IDENTIFYING DEEPFAKES; MOBILE ViT
Issue Date:	May-2025
Series/Report no.:	TD-8005;
Abstract:	Detecting deepfakes is a crucial challenge in maintaining the integrity of digital media. The ability to precisely differentiate between genuine and fake content is important for preserving trust in information shared across multiple platforms. This thesis primarily aims to assess the potential of vision-transformer-based models to correctly classify real and modified images. The study explores three variants of the vision transformer, namely DeiT-224, Mobile ViT and Tiny ViT, and their effectiveness in distinguishing real from fake images. Each model was trained and tested on the same dataset containing both real and altered images. The dataset was first preprocessed, the models were then trained on it, and the same evaluation metrics were applied to ensure a fair comparison. The models were examined via standard metrics such as accuracy, ROC AUC, and F1-score, along with qualitative observations of their predictions. Of the three transformers, Mobile ViT gave the most promising results, indicating that it is the preferable choice in scenarios where precision is of utmost concern. DeiT-224, despite its larger capacity, yields a slightly lower accuracy that is still very strong, but with diminishing returns given its higher computational cost. Tiny ViT, while the most lightweight and efficient in terms of speed and memory use, showed a slight decline in accuracy, reflecting a common trade-off between model size and performance. The results highlight the suitability of transformer-based architectures for identifying image modifications, with a range of models available to match varying application needs. However, this study is limited to a single dataset, and further investigation is needed to evaluate how well these models perform on different types of manipulations and across varied data sources.
Considerations such as reliability across demographic groups and resistance to adversarial alterations were outside the scope of this work. Future research could explore the use of combined model strategies, incorporate additional image features, or focus on optimizing models for real-time deployment. The outcomes of this thesis provide a strong foundation for advancing reliable image classification systems in practical settings.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/21794
Appears in Collections:	M.E./M.Tech. Information Technology

Files in This Item:

File	Description	Size	Format
ARPANA BARWA M.Tech..pdf		969.81 kB	Adobe PDF
