Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21763
Title: ADVANCING IMAGE CAPTIONING WITH TRANSFORMER BASED TECHNIQUES
Authors: RAWAT, AMAN
Keywords: ADVANCING IMAGE CAPTIONING
TRANSFORMER
VISION TRANSFORMER (ViTs)
BERT
ViBERT
Issue Date: May-2025
Series/Report no.: TD-8037;
Abstract: Image captioning, the automatic generation of natural-language descriptions of visual content, has advanced substantially with the adoption of deep learning methods. This thesis critically examines the evolution of image captioning methods, with a particular focus on the impact of Vision Transformers (ViTs). Although conventional approaches built on CNNs and RNNs provided the initial advances, they generally struggle to capture global context and the relationships across the entire image. Vision Transformers address this limitation through self-attention, which lets the model capture fine-grained detail as well as the overall context of the image. This study compares ViT-based models with traditional techniques across a variety of architectures and benchmark datasets, particularly MS COCO. The findings indicate that ViT-based approaches significantly outperform conventional models in generating semantically rich and contextually accurate captions. In addition, this thesis introduces ViBERT, a novel image captioning framework that combines the strengths of the Vision Transformer and Bidirectional Encoder Representations from Transformers (BERT) in an encoder-decoder architecture. Whereas traditional models often fail to capture long-range semantic dependencies and the global visual context, ViBERT leverages ViT's visual attention and BERT's deep contextual understanding to generate more robust and semantically correct descriptions. The performance of the proposed model is evaluated using standard performance measures.
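To make the self-attention argument concrete, here is a minimal, illustrative sketch (not the thesis's implementation) of how a ViT-style encoder lets every image patch attend to every other patch. It assumes a tiny grayscale image, non-overlapping 4 x 4 patches, a single attention head, and random projection weights. The helper names (split_into_patches, self_attention, random_matrix) are hypothetical, and the learned patch embedding, position embeddings, multi-head attention, and the BERT-based decoder of ViBERT are all omitted.

```python
import math
import random

# Toy illustration only: hypothetical helpers and random weights, not the ViBERT model.

def split_into_patches(image, patch):
    """Split an H x W grayscale image (list of lists) into non-overlapping
    patch x patch blocks, each flattened into one vector (ViT-style patching)."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    x: n x d token matrix; wq, wk, wv: d x d_k projection matrices.
    Returns (outputs, weights); row i of weights covers all n tokens."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned projections (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # 8 x 8 toy image
tokens = split_into_patches(image, patch=4)                   # 4 patches of 16 values
d, d_k = len(tokens[0]), 8
out, attn = self_attention(tokens, *(random_matrix(d, d_k, rng) for _ in range(3)))
print(len(tokens), len(attn[0]))  # prints: 4 4
```

Each row of attn sums to 1 and spans all patches, which is the sense in which self-attention gives every patch a view of the whole image; a single convolutional layer, by contrast, only mixes values within its local receptive field.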
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21763
Appears in Collections: MTech Data Science

Files in This Item:
AMAN RAWAT M.Tech..pdf (3.31 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.