ADVANCING VISUAL NARRATION THROUGH TRANSFORMERS

THAKUR, SHUBHAM

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/20742

Title:	ADVANCING VISUAL NARRATION THROUGH TRANSFORMERS
Authors:	THAKUR, SHUBHAM
Keywords:	ADVANCING VISUAL NARRATION; TRANSFORMERS; MOBILENET; CNN-RNN
Issue Date:	May-2024
Series/Report no.:	TD-7255;
Abstract:	Image captioning has advanced significantly over the past decade, moving from traditional semantic approaches to neural network models. This thesis explores improving image captions with a novel architecture that combines MobileNet and Transformers. Traditional models, such as Encoder-Decoder frameworks built on CNN-RNN architectures, laid the groundwork for image description generation, but they often struggle to capture complex image context and to generate coherent, contextually rich descriptions. The proposed model uses MobileNet for efficient feature extraction and Transformers for sequence generation, addressing these limitations. By incorporating attention mechanisms, the model attends to intricate image details, producing more accurate and descriptive captions. Experimental results show a 12.5% improvement in captioning performance over standard methods. This work contributes to ongoing research in image captioning and points toward more advanced techniques and applications in the field.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/20742
Appears in Collections:	M.E./M.Tech. Computer Engineering
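
The abstract credits attention mechanisms with letting the caption generator focus on specific image details. As a rough illustration only, and not the thesis's implementation, the sketch below shows scaled dot-product attention for a single decoder query over a handful of image-region feature vectors. The function names, the 4-dimensional toy features, and the use of the same vectors as both keys and values (no learned projections) are all assumptions made for the example; a real model would take region features from a CNN such as MobileNet and use learned, multi-head projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, region_features):
    """Scaled dot-product attention of one query over image regions.

    query: list of d floats (hypothetical decoder state for one caption token)
    region_features: list of d-float lists (hypothetical per-region vectors);
        used here as both keys and values, a simplification.
    Returns (weights, context).
    """
    d = len(query)
    # Similarity of the query to each region, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, region)) / math.sqrt(d)
        for region in region_features
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(d)
    ]
    return weights, context


# Toy input: three image regions with made-up 4-dimensional features.
regions = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
query = [2.0, 0.0, 0.0, 0.0]  # points in the same direction as the first region
weights, context = cross_attention(query, regions)
```

In this toy case the first region receives the largest weight because it is the only one with a nonzero dot product with the query, so the context vector leans toward that region's features.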

Files in This Item:

File	Description	Size	Format
SHUBHAM THAKUR M.Tech.pdf		7.12 MB	Adobe PDF