Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19830
Full metadata record
DC Field | Value | Language
dc.contributor.author | GUPTA, AYUSH KUMAR | -
dc.date.accessioned | 2023-06-12T09:31:21Z | -
dc.date.available | 2023-06-12T09:31:21Z | -
dc.date.issued | 2023-05 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/19830 | -
dc.description.abstract | The task of generating comprehensive and elaborate descriptions for images, commonly referred to as image captioning, presents a formidable challenge. It requires combining computer vision and natural language processing techniques to establish a connection between visual data and textual comprehension. The fundamental goal of image captioning is to develop models and algorithms capable of comprehending the information conveyed by an image and generating captions that effectively and coherently portray its visual content in a manner akin to human interpretation. Deep learning is introduced as a potential solution for image captioning, with a specific emphasis on the use of convolutional neural networks (CNNs) to extract salient visual features and recurrent neural networks (RNNs) to generate descriptive captions. This approach integrates CNNs and RNNs within a single deep learning framework, fusing visual and textual understanding to facilitate the image captioning process. Image captioning is far more challenging than tasks such as object identification and image categorization. The process typically uses two pipelines: the first performs the computer vision task, while the second covers the natural language processing task. Deep learning approaches can manage both pipelines and can generate more robust captions for images. Image captioning is immensely helpful for visually impaired people; it makes content more accessible and engaging for users, and it may be utilized to improve intelligent systems in a variety of ways. This research proposes an attention-based image captioning method built upon the encoder-decoder architecture. The proposed methodology first extracts the image features. The image features are passed to the attention layer, which attends to different regions of the image. The decoder layer then receives the attention vector and context vector to produce the caption (a minimal illustrative sketch of this pipeline follows the metadata record). | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | TD-6385; | -
dc.subject | IMAGE CAPTION GENERATION | en_US
dc.subject | ATTENTION | en_US
dc.subject | CNN | en_US
dc.subject | RNN | en_US
dc.title | ATTENTION BASED IMAGE CAPTION GENERATION | en_US
dc.type | Thesis | en_US
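
The encoder-decoder pipeline described in the abstract (CNN region features, an attention layer over image regions, and a recurrent decoder that consumes the attended context) can be illustrated with a short PyTorch sketch. This is a minimal sketch, not the thesis implementation: the additive (Bahdanau-style) attention, the GRU decoder cell, and all dimensions, module names, and the random stand-in for the CNN encoder output are assumptions made for demonstration only.

# Illustrative sketch (not the thesis code): an additive-attention decoder
# over pre-extracted CNN region features. All names and sizes are assumptions.
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Scores each image region against the decoder state and returns a
    context vector (weighted sum of region features) plus the weights."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(features)
                                  + self.hidden_proj(hidden).unsqueeze(1)))  # (B, R, 1)
        alpha = torch.softmax(e, dim=1)            # attention weights over regions
        context = (alpha * features).sum(dim=1)    # (B, feat_dim)
        return context, alpha.squeeze(-1)


class AttentionDecoder(nn.Module):
    """One-step-at-a-time GRU decoder that conditions each word on the
    attended image context and the previously generated word."""

    def __init__(self, vocab_size, embed_dim, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.gru = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, prev_word, hidden):
        context, alpha = self.attention(features, hidden)       # attend to regions
        x = torch.cat([self.embed(prev_word), context], dim=1)  # fuse word + context
        hidden = self.gru(x, hidden)
        logits = self.out(hidden)                                # next-word scores
        return logits, hidden, alpha


if __name__ == "__main__":
    # Assumed toy sizes: 49 regions of 512-d CNN features, 1000-word vocabulary.
    B, R, F, H, V = 2, 49, 512, 256, 1000
    decoder = AttentionDecoder(vocab_size=V, embed_dim=128, feat_dim=F,
                               hidden_dim=H, attn_dim=256)
    features = torch.randn(B, R, F)           # stand-in for the CNN encoder output
    hidden = torch.zeros(B, H)
    word = torch.zeros(B, dtype=torch.long)   # <start> token id assumed to be 0
    caption = []
    for _ in range(5):                        # greedy decoding for a few steps
        logits, hidden, _ = decoder(features, word, hidden)
        word = logits.argmax(dim=1)
        caption.append(word)
    print(torch.stack(caption, dim=1))        # (B, 5) predicted token ids

In practice the random features would be replaced by region features from a pretrained CNN, and the greedy loop by teacher-forced training over reference captions; those choices are outside what the abstract specifies.
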
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File | Description | Size | Format
AYUSH KUMAR GUPTA M.Tach..pdf | | 1.38 MB | Adobe PDF

