Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/20459
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | PULIYANI, MANAN | - |
dc.date.accessioned | 2024-01-18T05:49:45Z | - |
dc.date.available | 2024-01-18T05:49:45Z | - |
dc.date.issued | 2023-06 | - |
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/20459 | - |
dc.description.abstract | This project develops a deep learning model for automatic image captioning using the Flickr 8k dataset, which contains 8,000 images with five captions each. The captions were preprocessed by lowercasing, eliminating connections and special characters, and building a vocabulary of unique words. Each image was converted into a fixed-size vector to feed the deep neural network, and each word in the model's output was likewise encoded as a fixed-size vector. Three methods were considered for the model: the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Long Short-Term Memory (LSTM) network. The LSTM was chosen for its ability to capture dependencies in sequence prediction problems and to overcome the long-term dependency problem. The dataset was obtained from the University of Illinois at Urbana-Champaign, and a lexicon named descriptions was created to better understand the data. The dataset was split into a training set of 6,000 images, a development set of 1,000 images, and a test set of 1,000 images. The model was trained on the training set and evaluated on the test set using the BLEU score, a metric that compares a generated sentence with a reference sentence. The results show that the model generated correct image descriptions with good performance. | en_US |
dc.language.iso | en | en_US |
dc.relation.ispartofseries | TD-6988; | - |
dc.subject | IMAGE CAPTIONING | en_US |
dc.subject | DEEP LEARNING | en_US |
dc.subject | LSTM | en_US |
dc.title | COMPARATIVE ANALYSIS OF IMAGE CAPTIONING USING DEEP LEARNING | en_US |
dc.type | Thesis | en_US |
Appears in Collections: | MTech Data Science |
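
The caption preprocessing outlined in the abstract (lowercasing, removing special characters, collecting captions in a lexicon named descriptions, and building a vocabulary) could be sketched roughly as follows. This is a minimal illustration and not the thesis code: the function names, the tab-separated `<image>.jpg#<n>` line layout, and the choice to drop non-alphabetic tokens are all assumptions made here.

```python
import string
from collections import defaultdict


def clean_caption(caption: str) -> str:
    """Lowercase a caption, strip punctuation, and drop non-alphabetic tokens."""
    table = str.maketrans("", "", string.punctuation)
    words = caption.lower().translate(table).split()
    return " ".join(w for w in words if w.isalpha())


def build_descriptions(lines):
    """Map each image id to its list of cleaned captions.

    Assumes (hypothetically) that each line looks like
    '<image>.jpg#<n>\t<caption>'; lines without a tab are skipped.
    """
    descriptions = defaultdict(list)
    for line in lines:
        if "\t" not in line:
            continue
        key, caption = line.strip().split("\t", 1)
        image_id = key.split(".")[0]  # 'img_001.jpg#0' -> 'img_001'
        descriptions[image_id].append(clean_caption(caption))
    return dict(descriptions)


def build_vocabulary(descriptions):
    """Collect the set of distinct words across all cleaned captions."""
    return {
        word
        for captions in descriptions.values()
        for caption in captions
        for word in caption.split()
    }
```

Called on a list of caption lines, `build_descriptions` returns a plain dictionary keyed by image id, and `build_vocabulary` returns the distinct words that a later encoding step would map to fixed-size vectors.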
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Manan Puliyani M.Tech..pdf | | 1.29 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.