Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22123
Title: IMAGE-TO-TEXT GENERATOR
Authors: RAJ, NISHANT
Keywords: IMAGE CAPTIONING
DEEP LEARNING
VISION TRANSFORMERS
MECHANISMS
ZERO-SHOT LEARNING
WORD2VEC
DIFFUSION MODELS
ATTENTION
GANS
CLIP
Issue Date: May-2025
Series/Report no.: TD-8115
Abstract: Image-to-text generation is an emerging field at the intersection of computer vision and natural language processing. It enables machines to understand visual content and generate coherent, contextually relevant textual descriptions. This thesis provides a comprehensive comparative analysis of image captioning techniques, spanning from traditional CNN-LSTM architectures to state-of-the-art transformer-based and zero-shot learning models such as CLIP and diffusion frameworks. The study explores multiple methodologies, including attention mechanisms, generative adversarial networks (GANs), contrastive learning, Word2Vec embeddings, and diffusion-based models. We examine the strengths and limitations of each approach by assessing model performance on standard datasets like MS-COCO and Flickr30k using BLEU, METEOR, CIDEr, and ROUGE evaluation metrics. Through experimental evaluation, we highlight the trade-offs between model accuracy, generalization, semantic alignment, and computational cost. Our findings suggest that while CNN-LSTM-based models are effective for dataset-specific tasks, transformer-based and contrastive learning models demonstrate superior scalability and performance in zero-shot settings. The thesis concludes with a discussion of current challenges, including dataset biases, semantic misalignment, and the high computational requirements of advanced models. Recommendations for future work include the development of lightweight, domain-adaptive architectures with ethical considerations and human feedback integration.
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22123
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: Nishant Raj M.tech.pdf
Size: 4.69 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.