Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/20742
Title: ADVANCING VISUAL NARRATION THROUGH TRANSFORMERS
Authors: THAKUR, SHUBHAM
Keywords: ADVANCING VISUAL NARRATION; TRANSFORMERS; MOBILENET; CNN-RNN
Issue Date: May-2024
Series/Report no.: TD-7255
Abstract: Image captioning has experienced significant advancements over the past decade, transitioning from traditional semantic approaches to sophisticated neural network models. This thesis explores the enhancement of image captions using a novel architecture combining MobileNet and Transformers. Traditional models, such as the Encoder-Decoder framework with CNN-RNN architectures, have laid the groundwork for image description generation. However, these models often face limitations in capturing complex image contexts and generating coherent, contextually rich descriptions. The proposed model leverages MobileNet for efficient feature extraction and Transformers for superior sequence generation, addressing the limitations of earlier methods. By incorporating attention mechanisms, the model enhances the understanding of intricate image details, resulting in more accurate and descriptive captions. Experimental results demonstrate a 12.5% improvement in captioning performance compared to standard methods, showcasing the potential of this approach. This work contributes to the ongoing innovation in image captioning, paving the way for more advanced techniques and applications in the field.
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/20742
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: SHUBHAM THAKUR M.Tech.pdf
Size: 7.12 MB
Format: Adobe PDF

