Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19204
Full metadata record
DC Field | Value | Language
dc.contributor.author | TIWARI, AVANISH | -
dc.date.accessioned | 2022-06-30T07:31:06Z | -
dc.date.available | 2022-06-30T07:31:06Z | -
dc.date.issued | 2022-05 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/19204 | -
dc.description.abstract | Recently, a neural-network-based approach to the automatic generation of image descriptions has become popular. Originally introduced as neural image captioning, it refers to a family of models in which several neural network components are connected end-to-end to infer the most likely caption for an input image. Neural image captioning models usually comprise a Convolutional Neural Network (CNN) based image encoder and a Recurrent Neural Network (RNN) language model that generates image captions from the output of the CNN. Generating long image captions, commonly referred to as paragraph captions, is more challenging than producing shorter, sentence-length captions. When generating paragraph captions, the model has more degrees of freedom because the number of possible sentence combinations is larger. In this thesis, we describe a combination of two approaches to improve paragraph captioning: using a hierarchical RNN model that adds a top-level RNN to keep track of the sentence context, and using richer visual features obtained from dense captioning networks. In addition to the standard MS-COCO Captions dataset used for image captioning, we also utilize the Stanford-Paragraph dataset, which is specifically designed for paragraph captioning. This thesis describes experiments performed on three variants of RNNs for generating paragraph captions. The flat model uses a non-hierarchical RNN, the hierarchical model implements a two-level hierarchical RNN, and the hierarchical-coherent model improves on the hierarchical model by optimizing the coherence between sentences. | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | TD-5770; | -
dc.subject | IMAGE PARAGRAPH | en_US
dc.subject | DEEP LEARNING | en_US
dc.subject | HIERARCHICAL MODEL | en_US
dc.subject | RNN | en_US
dc.subject | CNN | en_US
dc.title | IMAGE PARAGRAPH GENERATION USING DEEP LEARNING | en_US
dc.type | Thesis | en_US
Appears in Collections: M.E./M.Tech. Electronics & Communication Engineering

Files in This Item:
File | Description | Size | Format
Avanish Tiwari_M.Tech.pdf | - | 1.87 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.