Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22040
Title: NEXT WORD LIKELIHOOD USING LLMs
Authors: YADAV, MOHINI
Keywords: WORD EMBEDDING
ATTENTION
TOKENIZATION
N-GRAM
STEMMING
KEYSTROKE MINIMIZATION
USER EXPERIENCE ENHANCEMENT
Issue Date: May-2025
Series/Report no.: TD-8114;
Abstract: This research work presents a comprehensive study of next-word likelihood systems leveraging state-of-the-art natural language processing and machine learning techniques, including Chain Modelling, Recurrent Neural Networks, Long Short-Term Memory (LSTM), Bidirectional LSTM, and Transformer-based models such as BERT, ALBERT, GPT, and GPT-Neo. The study incorporates a variety of preprocessing methods, including tokenization, text stemming, n-gram generation, word embeddings, and vectorization, to enhance model performance. Such predictive systems are vital for improving communication efficiency, minimizing user input, and enhancing the user experience across multiple languages, including English, Hindi, Bangla, Dzongkha, Urdu, and Japanese, especially those with complex linguistic structures or low resource availability. The research also emphasizes the integration of hybrid language models and self-attention mechanisms to address challenges such as morphological complexity, resource constraints, and cross-domain adaptability. Further, the work explores strategies to improve model generalization and computational efficiency, and examines ethical considerations in real-world applications. The findings highlight the transformative potential of next-word prediction models in real-time applications, ranging from assistive technologies to multilingual text processing, and underline the growing importance of LLMs in bridging linguistic and accessibility gaps.
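Illustration: the core task named in the abstract, estimating next-word likelihood with a Transformer language model, can be sketched in a few lines of Python using a pretrained GPT-2 model from the Hugging Face transformers library. This sketch is not code from the thesis; the model choice ("gpt2"), the helper name next_word_likelihood, and the example prefix are assumptions for demonstration only.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Pretrained model chosen for illustration; the thesis evaluates several
# architectures (LSTM, BiLSTM, BERT, ALBERT, GPT, GPT-Neo).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_word_likelihood(prefix: str, k: int = 5):
    """Return the k most probable next tokens for `prefix`, with probabilities."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits           # shape: (1, seq_len, vocab_size)
    # Softmax over the final position gives the next-token distribution.
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode(idx), p.item())
            for idx, p in zip(top.indices, top.values)]

print(next_word_likelihood("The weather today is"))

The returned list pairs each candidate token with its softmax probability, which is exactly the next-word likelihood that such predictive-text systems rank to minimize user keystrokes.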
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22040
Appears in Collections:MTech Data Science

Files in This Item:
File: Mohini Yadav m.tECH.pdf (1.3 MB, Adobe PDF)

