Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19830
Full metadata record
DC Field | Value | Language
dc.contributor.author | GUPTA, AYUSH KUMAR | -
dc.date.accessioned | 2023-06-12T09:31:21Z | -
dc.date.available | 2023-06-12T09:31:21Z | -
dc.date.issued | 2023-05 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/19830 | -
dc.description.abstract | The task of generating comprehensive and elaborate descriptions for images, commonly referred to as image captioning, presents a formidable challenge. It requires combining computer vision and natural language processing techniques to establish a connection between visual data and textual comprehension. The fundamental goal of image captioning is to develop models and algorithms capable of comprehending the information conveyed by an image and generating captions that effectively and coherently portray its visual content in a manner akin to human interpretation. Deep learning is introduced as a potential solution for image captioning, with a specific emphasis on the use of convolutional neural networks (CNNs) to extract salient visual features and recurrent neural networks (RNNs) to generate descriptive captions. This approach integrates CNNs and RNNs within a single deep learning framework, fusing visual and textual understanding to facilitate the image captioning process. Image captioning is far more challenging than tasks such as object identification and image categorization. The process typically uses two pipelines: the first performs the computer vision task, while the second covers the natural language processing task. Deep learning approaches can manage both pipelines and can generate more robust captions for images. Image captioning is immensely helpful for visually impaired people; it makes content more accessible and engaging for users, and it may be utilized to improve intelligent systems in a variety of ways. This research proposes an attention-based image captioning method built upon the encoder-decoder architecture. The proposed methodology first extracts the image features. The image features are passed to the attention layer, which attends to different regions of the image. The decoder layer then receives the attention vector and context vector to produce the caption (a minimal illustrative sketch of this pipeline follows the metadata record). | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | TD-6385; | -
dc.subject | IMAGE CAPTION GENERATION | en_US
dc.subject | ATTENTION | en_US
dc.subject | CNN | en_US
dc.subject | RNN | en_US
dc.title | ATTENTION BASED IMAGE CAPTION GENERATION | en_US
dc.type | Thesis | en_US
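
The encoder-decoder pipeline described in the abstract (CNN region features, an attention layer over image regions, and a recurrent decoder that consumes the attended context) can be illustrated with a short PyTorch sketch. This is a minimal sketch, not the thesis implementation: the additive (Bahdanau-style) attention, the GRU decoder cell, and all dimensions, module names, and the random stand-in for the CNN encoder output are assumptions made for demonstration only.

# Illustrative sketch (not the thesis code): an additive-attention decoder
# over pre-extracted CNN region features. All names and sizes are assumptions.
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Scores each image region against the decoder state and returns a
    context vector (weighted sum of region features) plus the weights."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(features)
                                  + self.hidden_proj(hidden).unsqueeze(1)))  # (B, R, 1)
        alpha = torch.softmax(e, dim=1)            # attention weights over regions
        context = (alpha * features).sum(dim=1)    # (B, feat_dim)
        return context, alpha.squeeze(-1)


class AttentionDecoder(nn.Module):
    """One-step-at-a-time GRU decoder that conditions each word on the
    attended image context and the previously generated word."""

    def __init__(self, vocab_size, embed_dim, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.gru = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, prev_word, hidden):
        context, alpha = self.attention(features, hidden)       # attend to regions
        x = torch.cat([self.embed(prev_word), context], dim=1)  # fuse word + context
        hidden = self.gru(x, hidden)
        logits = self.out(hidden)                                # next-word scores
        return logits, hidden, alpha


if __name__ == "__main__":
    # Assumed toy sizes: 49 regions of 512-d CNN features, 1000-word vocabulary.
    B, R, F, H, V = 2, 49, 512, 256, 1000
    decoder = AttentionDecoder(vocab_size=V, embed_dim=128, feat_dim=F,
                               hidden_dim=H, attn_dim=256)
    features = torch.randn(B, R, F)           # stand-in for the CNN encoder output
    hidden = torch.zeros(B, H)
    word = torch.zeros(B, dtype=torch.long)   # <start> token id assumed to be 0
    caption = []
    for _ in range(5):                        # greedy decoding for a few steps
        logits, hidden, _ = decoder(features, word, hidden)
        word = logits.argmax(dim=1)
        caption.append(word)
    print(torch.stack(caption, dim=1))        # (B, 5) predicted token ids

In practice the random features would be replaced by region features from a pretrained CNN, and the greedy loop by teacher-forced training over reference captions; those choices are outside what the abstract specifies.
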
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File | Description | Size | Format
AYUSH KUMAR GUPTA M.Tach..pdf | | 1.38 MB | Adobe PDF

