Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/19204
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | TIWARI, AVANISH | - |
dc.date.accessioned | 2022-06-30T07:31:06Z | - |
dc.date.available | 2022-06-30T07:31:06Z | - |
dc.date.issued | 2022-05 | - |
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/19204 | - |
dc.description.abstract | Recently, a neural-network-based approach to automatic generation of image descriptions has become popular. Originally introduced as neural image captioning, it refers to a family of models where several neural network components are connected end-to-end to infer the most likely caption given an input image. Neural image captioning models usually comprise an image encoder based on a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) language model that generates image captions from the output of the CNN. Generating long image captions, commonly referred to as paragraph captions, is more challenging than producing shorter, sentence-length captions. When generating paragraph captions, the model has more degrees of freedom because the number of possible sentence combinations is much larger. In this thesis, we describe a combination of two approaches to improve paragraph captioning: using a hierarchical RNN model that adds a top-level RNN to keep track of the sentence context, and using richer visual features obtained from dense captioning networks. In addition to the standard MS-COCO Captions dataset used for image captioning, we also utilize the Stanford-Paragraph dataset, which is specifically designed for paragraph captioning. This thesis describes experiments performed on three variants of RNNs for generating paragraph captions. The flat model uses a non-hierarchical RNN, the hierarchical model implements a two-level hierarchical RNN, and the hierarchical-coherent model improves the hierarchical model by optimizing the coherence between sentences. | en_US |
dc.language.iso | en | en_US |
dc.relation.ispartofseries | TD-5770; | - |
dc.subject | IMAGE PARAGRAPH | en_US |
dc.subject | DEEP LEARNING | en_US |
dc.subject | HIERARCHICAL MODEL | en_US |
dc.subject | RNN | en_US |
dc.subject | CNN | en_US |
dc.title | IMAGE PARAGRAPH GENERATION USING DEEP LEARNING | en_US |
dc.type | Thesis | en_US |
Appears in Collections: | M.E./M.Tech. Electronics & Communication Engineering |
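
The two-level hierarchical RNN mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the weights are random and untrained, the vocabulary is a toy list, the image feature vector stands in for the CNN encoder output, and the number of sentences is fixed rather than predicted. All names (`generate_paragraph`, `rnn_step`, `VOCAB`, `HIDDEN`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Toy vocabulary and hidden size; both are assumptions for illustration only.
VOCAB = ["<end>", "a", "dog", "runs", "on", "grass", "the", "sky", "is", "blue"]
HIDDEN = 8


def make_matrix(rows, cols, rng):
    """Random weight matrix; a trained model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, x, h):
    """One step of a plain RNN cell: h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


def generate_paragraph(image_features, max_sentences=3, max_words=6, seed=0):
    rng = random.Random(seed)
    feat_dim = len(image_features)
    # Top-level (sentence) RNN: reads the image features at every step and
    # carries context from one sentence to the next.
    s_in = make_matrix(HIDDEN, feat_dim, rng)
    s_rec = make_matrix(HIDDEN, HIDDEN, rng)
    # Word-level RNN: reads the embedding of the previously emitted word.
    w_in = make_matrix(HIDDEN, HIDDEN, rng)
    w_rec = make_matrix(HIDDEN, HIDDEN, rng)
    embed = make_matrix(len(VOCAB), HIDDEN, rng)  # word embeddings
    w_out = make_matrix(len(VOCAB), HIDDEN, rng)  # hidden state -> word scores

    paragraph = []
    s_state = [0.0] * HIDDEN
    for _ in range(max_sentences):
        s_state = rnn_step(s_in, s_rec, image_features, s_state)
        # The sentence state initialises the word RNN, so each sentence
        # starts from the paragraph-level context.
        w_state = list(s_state)
        prev = [0.0] * HIDDEN  # all-zero start-of-sentence input
        words = []
        for _ in range(max_words):
            w_state = rnn_step(w_in, w_rec, prev, w_state)
            scores = matvec(w_out, w_state)
            idx = max(range(len(VOCAB)), key=scores.__getitem__)  # greedy pick
            if VOCAB[idx] == "<end>":
                break
            words.append(VOCAB[idx])
            prev = embed[idx]
        paragraph.append(words)
    return paragraph
```

Calling `generate_paragraph([0.1, 0.2, 0.3, 0.4])` returns a list of sentences, each a list of words. Because the weights are untrained, the output is meaningless text; the point is the control flow, in which the outer loop advances the sentence-level state and the inner loop decodes words conditioned on it. A flat model, by contrast, would use only the inner loop over one long word sequence.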
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Avanish Tiwari_M.Tech.pdf | | 1.87 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.