ENHANCING MEDICAL QUESTION-ANSWERING THROUGH LOCAL FINE-TUNING OF SMALL LANGUAGE MODELS

PRAKASH, ARUNAV

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21795

Title:	ENHANCING MEDICAL QUESTION-ANSWERING THROUGH LOCAL FINE-TUNING OF SMALL LANGUAGE MODELS
Authors:	PRAKASH, ARUNAV
Keywords:	MEDICAL QUESTION-ANSWERING; LOCAL FINE-TUNING; SMALL LANGUAGE MODELS; LLaMA
Issue Date:	May-2025
Series/Report no.:	TD-8006
Abstract:	This thesis investigates how parameter-efficient fine-tuning can bring powerful language models into everyday clinical environments without sacrificing performance or patient privacy. We select a representative subset (16,412 question-answer pairs) of the MedQuAD medical question-answering dataset and adapt two LLaMA variants: a 3-billion-parameter model using LoRA adapters and an 8-billion-parameter model using 4-bit QLoRA quantization. Both are adapted entirely on a single RTX 4060 GPU with 8 GB of VRAM. Training and inference both complete in under five hours, demonstrating that consumer-grade hardware can support domain-specific LLMs when only lightweight adapters are updated. To evaluate model outputs, we develop a multi-axis scoring framework covering relevance, accuracy, conciseness, and completeness, with scores produced automatically by a locally hosted LLaMA 3.1 8B judge served via Ollama. This structured, human-aligned approach reveals clear differences: the 8B QLoRA model consistently outperforms its 3B counterpart across all four dimensions (mean score 7.16 vs. 6.96), while traditional overlap metrics such as ROUGE fail to capture these gains. We show that ROUGE's reliance on n-gram matching penalizes valid paraphrases and richer contextual detail, making it an unreliable proxy for clinical language quality. Our contributions are a reproducible, on-device pipeline for fine-tuning and evaluation; evidence that aggressive quantization need not compromise model expressivity; and a practical blueprint for deploying privacy-preserving medical chatbots in resource-constrained settings. We conclude by outlining future directions (ensemble judging, expanded empathy and readability metrics, dynamic adapter libraries, and hybrid human-in-the-loop workflows) to further bridge the gap between scalable automation and clinical safety.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/21795
Appears in Collections:	M.E./M.Tech. Computer Engineering
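
The abstract's point about ROUGE penalizing valid paraphrases can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function `rouge1_f1` is a hypothetical, simplified unigram-overlap F1 (no stemming or stopword handling), and the example sentences are invented for illustration only.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 style score: F1 over overlapping lowercase unigrams.

    Hypothetical helper for illustration; real ROUGE implementations add
    tokenization rules and optional stemming.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented example sentences (assumptions, not taken from MedQuAD).
reference = "high blood pressure can damage the kidneys"
verbatim = "high blood pressure can damage the kidneys"
paraphrase = "hypertension may harm renal function"

print(rouge1_f1(reference, verbatim))    # identical wording shares every token
print(rouge1_f1(reference, paraphrase))  # same meaning, no shared tokens
```

Under these assumptions the paraphrase shares no tokens with the reference and so scores at the bottom of the scale despite conveying the same clinical meaning, which is the failure mode the abstract attributes to n-gram matching.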

Files in This Item:

File	Description	Size	Format
ARUNAV PRAKASH M.Tech..pdf		1.54 MB	Adobe PDF
