Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/16596
Title: | IMAGE CAPTIONING USING YOLO'S OUTPUT AS INPUT TO ENCODER-DECODER LSTM |
Authors: | KUMAR, NAVNEET |
Keywords: | IMAGE CAPTIONING ENCODER-DECODER LSTM RECURRENT NEURAL NETWORK |
Issue Date: | May-2018 |
Series/Report no.: | TD-4462 |
Abstract: | The task of generating captions for images has recently received considerable attention. In this thesis we examine how a computer can look at an image and output a description of it. This process has many potential real-world applications. A noteworthy one is storing an image's caption so that the image can later be retrieved from that description alone. With a few modifications, the system could also assist visually impaired people with their daily chores. The task itself is simple to state: given an input image, the algorithm is expected to describe what is in the image. By description we mean that the system reports the objects present in the image and the actions those objects are performing. Such tasks are trivial for humans but non-trivial for computers. Thanks to advances in deep learning, computers are now reaching human-level performance. This thesis introduces a generic, end-to-end trainable Convolutional Neural Network (CNN)-Recurrent Neural Network (RNN) fusion-based technique for image captioning. In particular, we feed an image into a CNN, and the output of the CNN is then fed to an Encoder-Decoder network. The task of the CNN is to output a set of objects and their locations. The Encoder-Decoder network takes the output of the CNN and feeds it into its Encoder. The Encoder uses a two-stream RNN to encode the information coming from the CNN; the encoded information is then passed to the Decoder. The Decoder uses a standard LSTM network to generate text. The standard MS-COCO captioning dataset is used for this task. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/16596 |
Appears in Collections: | M.E./M.Tech. Electronics & Communication Engineering |
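
The abstract describes a detector whose output (objects and their locations) is fed to a two-stream encoder. The record does not say how the two streams are formed, so the following is only a minimal sketch under one assumption: that one stream carries object class IDs and the other carries bounding-box coordinates. The names `Detection`, `to_encoder_streams`, `label_to_id`, and `PAD_ID`, and the example detections, are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One detector output: class label, confidence, and a normalized box."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x_center, y_center, width, height) in [0, 1]


PAD_ID = 0  # hypothetical padding index for the label stream


def to_encoder_streams(
    detections: List[Detection],
    label_to_id: Dict[str, int],
    max_objects: int = 5,
) -> Tuple[List[int], List[List[float]]]:
    """Split detections into two fixed-length sequences (assumed design).

    Stream 1: class IDs, ordered by descending confidence.
    Stream 2: the matching box coordinates, in the same order.
    Both streams are truncated or padded to max_objects entries.
    """
    ordered = sorted(detections, key=lambda d: d.confidence, reverse=True)
    ordered = ordered[:max_objects]

    label_stream = [label_to_id[d.label] for d in ordered]
    box_stream = [list(d.box) for d in ordered]

    padding = max_objects - len(ordered)
    label_stream += [PAD_ID] * padding
    box_stream += [[0.0, 0.0, 0.0, 0.0] for _ in range(padding)]
    return label_stream, box_stream


# Made-up detections for a single image, for illustration only.
detections = [
    Detection("dog", 0.91, (0.30, 0.60, 0.25, 0.40)),
    Detection("person", 0.97, (0.55, 0.45, 0.30, 0.80)),
    Detection("frisbee", 0.62, (0.42, 0.30, 0.08, 0.06)),
]
label_to_id = {"person": 1, "dog": 2, "frisbee": 3}

labels, boxes = to_encoder_streams(detections, label_to_id, max_objects=4)
```

In this sketch each stream would feed its own recurrent encoder, and the decoder LSTM would generate the caption from their combined state. Ordering by confidence and zero-padding are illustrative choices, not details confirmed by the abstract.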
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
navneet_thesis_final.pdf | | 4.85 MB | Adobe PDF |