Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/21763
Title: | ADVANCING IMAGE CAPTIONING WITH TRANSFORMER BASED TECHNIQUES |
Authors: | RAWAT, AMAN |
Keywords: | ADVANCING IMAGE CAPTIONING; TRANSFORMER; VISION TRANSFORMER (ViTs); BERT; ViBERT |
Issue Date: | May-2025 |
Series/Report no.: | TD-8037; |
Abstract: | Image captioning is the automatic generation of natural language descriptions for visual content, and it has seen major progress through the adoption of deep learning methods. This thesis critically examines the evolution of image captioning methods, with a particular focus on the transformative impact of Vision Transformers (ViTs). While conventional methods based on CNNs and RNNs provided the initial advances, they generally struggle to capture global context and the relationships across the entire image. Vision Transformers overcome this limitation through self-attention, which lets the model capture fine detail as well as the overall context of the image. This study compares ViT-based models with traditional techniques across a variety of architectures and benchmark datasets, particularly MS COCO. The findings indicate that ViT-based approaches significantly outperform conventional models in generating semantically rich and contextually accurate captions. Additionally, this thesis introduces a novel image captioning framework, ViBERT, which combines the strengths of the Vision Transformer and Bidirectional Encoder Representations from Transformers (BERT) in an encoder-decoder architecture. Whereas traditional models often fail to capture long-range semantic dependencies and the global visual context, ViBERT leverages ViT’s visual attention and BERT’s deep contextual understanding to generate more robust and semantically correct descriptions. The performance of the proposed model is evaluated using standard performance measures. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/21763 |
Appears in Collections: | MTech Data Science |
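The abstract credits the ViT's stronger grasp of global context to self-attention over image patches. As a rough illustration only (not code from the thesis or from ViBERT), the sketch below cuts a toy grayscale image into non-overlapping patches and runs single-head scaled dot-product self-attention over them, so every patch token is weighted against every other patch. It assumes random, untrained projection matrices and omits the learned patch embedding, position embeddings, class token, multi-head splitting, layer normalization and MLP blocks of a real ViT; the helper names `patchify`, `matmul`, `softmax` and `self_attention` are hypothetical.

```python
import math
import random

# Hypothetical helpers for illustration; not taken from the thesis or any library.

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened,
    non-overlapping patch x patch tiles, in row-major order."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]

def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    Every token's query is compared with every token's key, so each output
    mixes information from the whole sequence (here: the whole image)."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

# Toy usage: a random 4x4 "image" cut into four 2x2 patches (4-dim tokens),
# projected with random, untrained 4x3 matrices (an assumption for the demo).
rng = random.Random(0)
image = [[rng.random() for _ in range(4)] for _ in range(4)]
tokens = patchify(image, 2)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

wq, wk, wv = rand_matrix(4, 3), rand_matrix(4, 3), rand_matrix(4, 3)
out, attn = self_attention(tokens, wq, wk, wv)
print(len(out), len(out[0]))  # prints: 4 3
```

Each row of `attn` is a probability distribution over all patches, which is what lets a patch in one corner of the image draw on a patch in the opposite corner; a single CNN layer, by contrast, only mixes pixels within its local kernel window.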
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
AMAN RAWAT M.Tech..pdf | | 3.31 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.