Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/20717
Title: OPTIMIZING TRANSFORMER MODELS FOR ENGLISH2HINDI TRANSLATION: A SUPERVISED FINE-TUNING ANALYSIS
Authors: CHHETRI, ANMOL
Keywords: NLP
BLEU
FINE-TUNING
TRANSFORMER
NEURAL MACHINE TRANSLATION
Issue Date: May-2024
Series/Report no.: TD-7218;
Abstract: Machine translation is an essential task in natural language processing: it breaks down language barriers, enabling effective communication and collaboration across diverse linguistic and cultural backgrounds. Sequence-to-sequence models are used to solve various downstream tasks such as machine translation, text summarization, question answering, and speech recognition. Machine translation, however, has remained a challenging task, which encouraged researchers to shift from statistical machine translation (SMT) to neural machine translation (NMT). NMT solves the translation task using neural networks such as the Transformer. Owing to its capacity for parallel computation, the Transformer has also been adopted in other domains such as computer vision and audio processing. However, researchers have identified several challenges with the model, such as structural constraints between input and output text, computational complexity, and the path length between long-range dependencies. Several Transformer variants have been introduced to address these issues, typically by modifying either the positional encoding or the attention mechanism; a systematic review with a mathematical treatment of these variants is still lacking, however. Transformer-based architectures, trained rigorously on large corpora, can produce human-like translations, and Transformers are therefore now considered the benchmark for translation tasks. Various pre-trained models have shown their potential on select languages, but few address English-to-Hindi translation, owing both to the unavailability of a large parallel corpus and to Hindi exhibiting more complex sentence structures than English. This paper fills this gap by fine-tuning four pre-trained models, namely OPUS-MT, M2M100, mBART-50, and MADLAD-400, on the IITB English-Hindi dataset, and compares the quality of the translated text among these models using the BLEU metric. OPUS-MT and M2M100 produced high-quality Hindi translations, with BLEU scores of 89.11 and 86.83 respectively, improving on the 44.34 BLEU of the state-of-the-art model on the IITB dataset. Finally, the paper also reviews and analyses two types of X-formers, covering pre-training and training.
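As a rough illustration of the fine-tuning pipeline the abstract describes, the sketch below fine-tunes one of the four models (OPUS-MT) on the IITB parallel corpus using the Hugging Face transformers and datasets libraries. The Hub IDs (Helsinki-NLP/opus-mt-en-hi, cfilt/iitb-english-hindi), the dataset schema, and the hyperparameters are assumptions for illustration, not details taken from the thesis.

    # Sketch: fine-tune OPUS-MT (English->Hindi) on the IITB corpus.
    # All IDs and hyperparameters below are illustrative assumptions.
    from datasets import load_dataset
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    model_id = "Helsinki-NLP/opus-mt-en-hi"      # assumed Hub ID for OPUS-MT en-hi
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumed Hub ID for the IITB English-Hindi parallel corpus.
    raw = load_dataset("cfilt/iitb-english-hindi")

    def preprocess(batch):
        # Each example is assumed to hold a {"en": ..., "hi": ...} pair.
        src = [pair["en"] for pair in batch["translation"]]
        tgt = [pair["hi"] for pair in batch["translation"]]
        return tokenizer(src, text_target=tgt, max_length=128, truncation=True)

    tokenized = raw.map(preprocess, batched=True,
                        remove_columns=raw["train"].column_names)

    args = Seq2SeqTrainingArguments(
        output_dir="opus-mt-en-hi-iitb",
        learning_rate=2e-5,                      # illustrative hyperparameters
        per_device_train_batch_size=16,
        num_train_epochs=1,
        predict_with_generate=True,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()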
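Translation quality in the study is compared via BLEU; continuing from the sketch above, a minimal way to score generated translations at the corpus level is with sacrebleu, under the same assumed dataset schema (a small test sample is taken purely for illustration).

    # Sketch: corpus-level BLEU with sacrebleu, matching the metric named
    # in the abstract; dataset schema is the same assumption as above.
    import torch
    import sacrebleu

    model.eval()
    hypotheses, references = [], []
    for pair in raw["test"]["translation"][:100]:   # small sample for illustration
        inputs = tokenizer(pair["en"], return_tensors="pt", truncation=True)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=128)
        hypotheses.append(tokenizer.decode(output[0], skip_special_tokens=True))
        references.append(pair["hi"])

    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU = {bleu.score:.2f}")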
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/20717
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: Anmol Chhetri M.Tech..pdf
Size: 1.87 MB
Format: Adobe PDF

