Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/22123
Title: | IMAGE-TO-TEXT GENERATOR |
Authors: | RAJ, NISHANT |
Keywords: | IMAGE CAPTIONING; DEEP LEARNING; VISION TRANSFORMERS; ATTENTION MECHANISMS; ZERO-SHOT LEARNING; WORD2VEC; DIFFUSION MODELS; GANS; CLIP |
Issue Date: | May-2025 |
Series/Report no.: | TD-8115 |
Abstract: | Image-to-text generation is an emerging field at the intersection of computer vision and natural language processing. It enables machines to understand visual content and generate coherent, contextually relevant textual descriptions. This thesis provides a comprehensive comparative analysis of image captioning techniques, ranging from traditional CNN-LSTM architectures to state-of-the-art transformer-based and zero-shot learning models such as CLIP and diffusion frameworks. The study explores multiple methodologies, including attention mechanisms, generative adversarial networks (GANs), contrastive learning, Word2Vec embeddings, and diffusion-based models. We examine the strengths and limitations of each approach by assessing model performance on standard datasets such as MS-COCO and Flickr30k using the BLEU, METEOR, CIDEr, and ROUGE evaluation metrics. Through experimental evaluation, we highlight the trade-offs between model accuracy, generalization, semantic alignment, and computational cost. Our findings suggest that while CNN-LSTM-based models are effective for dataset-specific tasks, transformer-based and contrastive learning models demonstrate superior scalability and performance in zero-shot settings. The thesis concludes with a discussion of current challenges, including dataset biases, semantic misalignment, and the high computational requirements of advanced models. Recommendations for future work include the development of lightweight, domain-adaptive architectures that incorporate ethical considerations and human feedback. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/22123 |
Appears in Collections: | M.E./M.Tech. Computer Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Nishant Raj M.tech.pdf | | 4.69 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.