Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/19830
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | GUPTA, AYUSH KUMAR | - |
dc.date.accessioned | 2023-06-12T09:31:21Z | - |
dc.date.available | 2023-06-12T09:31:21Z | - |
dc.date.issued | 2023-05 | - |
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/19830 | - |
dc.description.abstract | The task of generating comprehensive and elaborate descriptions for images, commonly referred to as image captioning, presents a formidable challenge. It combines computer vision and natural language processing techniques to establish a connection between visual data and textual comprehension. The fundamental goal of image captioning is to develop models and algorithms capable of comprehending the information conveyed by an image and generating captions that effectively and coherently describe its visual content in a manner akin to human interpretation. Deep learning is introduced as a solution for image captioning, with specific emphasis on convolutional neural networks (CNNs) to extract salient visual features and recurrent neural networks (RNNs) to generate descriptive captions; integrating CNNs and RNNs within a single deep learning framework fuses visual and textual understanding to facilitate the captioning process. Image captioning is far more challenging than tasks such as object identification and image categorization. The process usually involves two pipelines: the first performs the computer vision task, while the second covers the natural language processing task. Deep learning approaches can manage both pipelines and produce more robust captions for images. Image captioning is immensely helpful for visually impaired people, makes content more accessible and entertaining for users, and can be used to improve intelligent systems in a variety of ways. This research proposes an attention-based image captioning method built upon the encoder-decoder architecture. The proposed methodology first extracts the image features. These features are passed to the attention layer, which attends to different regions of the image. The decoder then receives the attention vector and the context vector to produce the caption (an illustrative sketch of such a pipeline is given below the metadata record). | en_US |
dc.language.iso | en | en_US |
dc.relation.ispartofseries | TD-6385; | - |
dc.subject | IMAGE CAPTION GENERATION | en_US |
dc.subject | ATTENTION | en_US |
dc.subject | CNN | en_US |
dc.subject | RNN | en_US |
dc.title | ATTENTION BASED IMAGE CAPTION GENERATION | en_US |
dc.type | Thesis | en_US |
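The abstract above outlines an attention-based encoder-decoder captioning pipeline: a CNN encoder extracts region features, an attention layer weights those regions at each decoding step, and an RNN decoder consumes the resulting context vector to emit the next word. Below is a minimal illustrative sketch of that kind of pipeline, assuming a PyTorch implementation; the toy CNN encoder, the additive (Bahdanau-style) attention, the GRU decoder, and all layer sizes are assumptions made for illustration and are not taken from the thesis itself.

```python
# Minimal sketch of an attention-based encoder-decoder captioner (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Small CNN that maps an image to a grid of region feature vectors."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),             # 7x7 = 49 image regions
        )

    def forward(self, images):                   # images: (B, 3, H, W)
        feats = self.cnn(images)                 # (B, feat_dim, 7, 7)
        return feats.flatten(2).transpose(1, 2)  # (B, 49, feat_dim)


class AdditiveAttention(nn.Module):
    """Bahdanau-style attention over image regions, conditioned on the decoder state."""
    def __init__(self, feat_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):            # feats: (B, R, F), hidden: (B, H)
        scores = self.v(torch.tanh(self.w_feat(feats) + self.w_hidden(hidden).unsqueeze(1)))
        alpha = F.softmax(scores, dim=1)         # (B, R, 1) weights over regions
        context = (alpha * feats).sum(dim=1)     # (B, F) attention-weighted context vector
        return context, alpha.squeeze(-1)


class Decoder(nn.Module):
    """GRU decoder that attends to the image regions at every time step."""
    def __init__(self, vocab_size, feat_dim=256, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attend = AdditiveAttention(feat_dim, hidden_dim)
        self.gru = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):          # captions: (B, T) token ids
        B, T = captions.shape
        h = feats.new_zeros(B, self.gru.hidden_size)
        logits = []
        for t in range(T):
            context, _ = self.attend(feats, h)   # focus on different image regions per step
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h = self.gru(x, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)        # (B, T, vocab_size)


if __name__ == "__main__":
    vocab_size = 1000                            # hypothetical vocabulary size
    enc, dec = Encoder(), Decoder(vocab_size)
    images = torch.randn(2, 3, 224, 224)
    captions = torch.randint(0, vocab_size, (2, 12))
    scores = dec(enc(images), captions)
    print(scores.shape)                          # torch.Size([2, 12, 1000])
```

In this sketch the attention weights are recomputed from the decoder's hidden state at every time step, which is what allows the caption generator to focus on different image regions for different words. A full system would typically replace the toy encoder with a pretrained CNN and train the decoder with teacher forcing on paired image-caption data.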
Appears in Collections: | M.E./M.Tech. Computer Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
AYUSH KUMAR GUPTA M.Tach..pdf | | 1.38 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.