Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/20742
Title: | ADVANCING VISUAL NARRATION THROUGH TRANSFORMERS |
Authors: | THAKUR, SHUBHAM |
Keywords: | ADVANCING VISUAL NARRATION; TRANSFORMERS; MOBILENET; CNN-RNN |
Issue Date: | May-2024 |
Series/Report no.: | TD-7255 |
Abstract: | Image captioning has experienced significant advancements over the past decade, transitioning from traditional semantic approaches to sophisticated neural network models. This thesis explores the enhancement of image captions using a novel architecture combining MobileNet and Transformers. Traditional models, such as the Encoder-Decoder framework with CNN-RNN architectures, have laid the groundwork for image description generation. However, these models often face limitations in capturing complex image contexts and generating coherent, contextually rich descriptions. The proposed model leverages MobileNet for efficient feature extraction and Transformers for superior sequence generation, addressing the limitations of earlier methods. By incorporating attention mechanisms, the model enhances the understanding of intricate image details, resulting in more accurate and descriptive captions. Experimental results demonstrate a 12.5% improvement in captioning performance compared to standard methods, showcasing the potential of this approach. This work contributes to the ongoing innovation in image captioning, paving the way for more advanced techniques and applications in the field. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/20742 |
Appears in Collections: | M.E./M.Tech. Computer Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
SHUBHAM THAKUR M.Tech.pdf | | 7.12 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.