IMAGE PARAGRAPH GENERATION USING DEEP LEARNING

TIWARI, AVANISH

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19204

Full metadata record

DC Field	Value	Language
dc.contributor.author	TIWARI, AVANISH	-
dc.date.accessioned	2022-06-30T07:31:06Z	-
dc.date.available	2022-06-30T07:31:06Z	-
dc.date.issued	2022-05	-
dc.identifier.uri	http://dspace.dtu.ac.in:8080/jspui/handle/repository/19204	-
dc.description.abstract	Recently, a neural-network-based approach to automatic generation of image descriptions has become popular. Originally introduced as neural image captioning, it refers to a family of models where several neural network components are connected end-to-end to infer the most likely caption given an input image. Neural image captioning models usually comprise a Convolutional Neural Network (CNN) based image encoder and a Recurrent Neural Network (RNN) language model that generates image captions from the output of the CNN. Generating long image captions, commonly referred to as paragraph captions, is more challenging than producing shorter, sentence-length captions. When generating paragraph captions, the model has more degrees of freedom, because far more combinations of possible sentences can be produced. In this thesis, we describe a combination of two approaches to improve paragraph captioning: using a hierarchical RNN model that adds a top-level RNN to keep track of the sentence context, and using richer visual features obtained from dense captioning networks. In addition to the standard MS-COCO Captions dataset used for image captioning, we also utilize the Stanford-Paragraph dataset specifically designed for paragraph captioning. This thesis describes experiments performed on three RNN variants for generating paragraph captions. The flat model uses a non-hierarchical RNN, the hierarchical model implements a two-level hierarchical RNN, and the hierarchical-coherent model improves the hierarchical model by optimizing the coherence between sentences.	en_US
dc.language.iso	en	en_US
dc.relation.ispartofseries	TD-5770;	-
dc.subject	IMAGE PARAGRAPH	en_US
dc.subject	DEEP LEARNING	en_US
dc.subject	HIERARCHICAL MODEL	en_US
dc.subject	RNN	en_US
dc.subject	CNN	en_US
dc.title	IMAGE PARAGRAPH GENERATION USING DEEP LEARNING	en_US
dc.type	Thesis	en_US
Appears in Collections:	M.E./M.Tech. Electronics & Communication Engineering
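The abstract describes a two-level decoder: a top-level sentence RNN that keeps track of sentence context, and a word-level RNN that writes each sentence, both driven by visual features (here, from a dense captioning network). The pure-Python code below is a minimal sketch of that general idea, assuming a simple design that is not taken from the thesis. The vocabulary, layer sizes, mean pooling of region features, the 0.5 stop threshold, greedy decoding, and all names (`HierarchicalDecoder`, `decode_sentence`, `generate`) are hypothetical. The weights are random and untrained, so the generated words are meaningless. The sketch only shows how the sentence RNN emits one topic vector per sentence and how the word RNN decodes each topic.

```python
import math
import random

# Hypothetical toy sizes and vocabulary; none of these come from the thesis.
VOCAB = ["<start>", "<end>", "a", "man", "dog", "is", "on", "the", "grass"]
FEAT, HID, EMB = 8, 6, 4  # region-feature, hidden-state and embedding sizes


def rand_matrix(rng, rows, cols):
    """Untrained random weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vtanh(*vecs):
    """Element-wise tanh of the sum of one or more equal-length vectors."""
    return [math.tanh(sum(vals)) for vals in zip(*vecs)]


class HierarchicalDecoder:
    """Sentence-level RNN emitting one topic per sentence, plus a word-level RNN."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_sv = rand_matrix(rng, HID, FEAT)         # visual vector -> sentence RNN
        self.w_ss = rand_matrix(rng, HID, HID)          # sentence RNN recurrence
        self.w_topic = rand_matrix(rng, HID, HID)       # sentence state -> topic vector
        self.w_stop = rand_matrix(rng, 1, HID)[0]       # sentence state -> stop logit
        self.embed = rand_matrix(rng, len(VOCAB), EMB)  # word embeddings
        self.w_wx = rand_matrix(rng, HID, EMB)          # embedding -> word RNN
        self.w_ww = rand_matrix(rng, HID, HID)          # word RNN recurrence
        self.w_out = rand_matrix(rng, len(VOCAB), HID)  # word state -> vocab logits

    def decode_sentence(self, topic, max_words):
        h = topic  # the word RNN starts from the sentence's topic vector
        word = VOCAB.index("<start>")
        words = []
        for _ in range(max_words):
            h = vtanh(matvec(self.w_wx, self.embed[word]), matvec(self.w_ww, h))
            logits = matvec(self.w_out, h)
            # Greedy choice; index 0 (<start>) is never emitted as a word.
            word = max(range(1, len(VOCAB)), key=logits.__getitem__)
            if VOCAB[word] == "<end>":
                break
            words.append(VOCAB[word])
        return words

    def generate(self, region_features, max_sentences=4, max_words=6):
        # Assumption: mean-pool the region features into one visual vector.
        n = len(region_features)
        v = [sum(col) / n for col in zip(*region_features)]
        h_s = [0.0] * HID
        paragraph = []
        for _ in range(max_sentences):
            # The sentence RNN sees the visual vector at every step.
            h_s = vtanh(matvec(self.w_sv, v), matvec(self.w_ss, h_s))
            stop_logit = sum(w * x for w, x in zip(self.w_stop, h_s))
            p_stop = 1 / (1 + math.exp(-stop_logit))
            if paragraph and p_stop > 0.5:  # always emit at least one sentence
                break
            topic = vtanh(matvec(self.w_topic, h_s))
            paragraph.append(self.decode_sentence(topic, max_words))
        return paragraph


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in for dense-captioning region features: 5 regions x FEAT dims.
    regions = [[rng.gauss(0, 1) for _ in range(FEAT)] for _ in range(5)]
    for sentence in HierarchicalDecoder(seed=0).generate(regions):
        print(" ".join(sentence) or "(empty sentence)")
```

In this sketch, a flat model would drop the sentence RNN and decode the whole paragraph with a single word-level RNN. The stop probability lets the sentence RNN decide how many sentences to write, up to `max_sentences`.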

Files in This Item:

File	Description	Size	Format
Avanish Tiwari_M.Tech.pdf		1.87 MB	Adobe PDF