Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/21795
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | PRAKASH, ARUNAV | - |
dc.date.accessioned | 2025-07-08T08:41:26Z | - |
dc.date.available | 2025-07-08T08:41:26Z | - |
dc.date.issued | 2025-05 | - |
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/21795 | - |
dc.description.abstract | This thesis investigates how parameter-efficient fine-tuning techniques can bring powerful language models into everyday clinical environments without sacrificing performance or patient privacy. We begin by selecting a representative subset (16,412 pairs) of the MedQuAD medical question-answer dataset and adapt two LLaMA variants entirely on a single RTX 4060 GPU with 8 GB of VRAM: a 3-billion-parameter model using LoRA adapters and an 8-billion-parameter model with 4-bit QLoRA quantization. Training and inference both complete in under five hours, demonstrating that consumer-grade hardware can support domain-specific LLMs when only lightweight adapters are updated. To evaluate model outputs, we develop a multi-axis scoring framework covering relevance, accuracy, conciseness, and completeness, with scores produced automatically by a locally hosted LLaMA 3.1 8B judge via Ollama. This structured, human-aligned approach reveals clear differences: the 8B QLoRA model consistently outperforms its 3B counterpart across all four dimensions (mean score 7.16 vs. 6.96), while traditional overlap metrics such as ROUGE fail to capture these gains. We show that ROUGE's reliance on n-gram matching penalizes valid paraphrases and richer contextual detail, making it an unreliable proxy for clinical language quality. Our contributions include a reproducible, on-device pipeline for fine-tuning and evaluation (sketched below this metadata record), compelling evidence that aggressive quantization need not compromise model expressivity, and a practical blueprint for deploying privacy-preserving medical chatbots in resource-constrained settings. We conclude by outlining future directions: ensemble judging, expanded empathy and readability metrics, dynamic adapter libraries, and hybrid human-in-the-loop workflows, all aimed at further bridging the gap between scalable automation and clinical safety. | en_US |
dc.language.iso | en | en_US |
dc.relation.ispartofseries | TD-8006; | - |
dc.subject | MEDICAL QUESTION-ANSWERING | en_US |
dc.subject | LOCAL FINE-TUNING | en_US |
dc.subject | SMALL LANGUAGE MODELS | en_US |
dc.subject | LLaMA | en_US |
dc.title | ENHANCING MEDICAL QUESTION-ANSWERING THROUGH LOCAL FINE-TUNING OF SMALL LANGUAGE MODELS | en_US |
dc.type | Thesis | en_US |
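The fine-tuning setup the abstract describes (4-bit QLoRA adapters on a LLaMA base model, trained on a single 8 GB GPU) can be sketched as follows. This is a minimal illustration using the Hugging Face transformers, peft, and bitsandbytes libraries; the model ID, adapter rank, and target modules are assumptions for illustration, not the thesis's actual configuration.

```python
# Minimal QLoRA setup sketch. Model ID, rank, and target modules are
# illustrative assumptions, not the thesis's exact configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights (QLoRA)
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,         # nested quantization saves VRAM
)

model_id = "meta-llama/Meta-Llama-3.1-8B"   # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the low-rank adapter matrices are trainable while the 4-bit base weights stay frozen, the optimizer state stays small enough to fit alongside the model in 8 GB of VRAM, which is what makes the single-RTX 4060 setup described in the abstract feasible.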
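Similarly, the multi-axis judging step, in which a locally hosted LLaMA 3.1 8B scores each answer via Ollama, might look like the sketch below. The prompt wording and the 0-10 scale are assumptions; only the four axes and the judge model come from the abstract.

```python
# Sketch of multi-axis LLM-judge scoring through Ollama's local REST API.
# Prompt wording and 0-10 scale are assumptions; the four axes are from
# the abstract.
import json
import requests

JUDGE_PROMPT = """You are grading a medical QA system.
Question: {question}
Reference answer: {reference}
Model answer: {answer}
Score the model answer from 0-10 on each axis and reply with JSON only:
{{"relevance": _, "accuracy": _, "conciseness": _, "completeness": _}}"""

def judge(question: str, reference: str, answer: str) -> dict:
    """Return per-axis scores from the locally hosted judge model."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1:8b",  # judge model pulled via `ollama pull`
            "prompt": JUDGE_PROMPT.format(
                question=question, reference=reference, answer=answer
            ),
            "format": "json",        # ask Ollama to constrain output to JSON
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

# Usage: scores = judge(q, ref, generated)
#        mean_score = sum(scores.values()) / len(scores)
```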
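Finally, the abstract's claim that ROUGE's n-gram matching penalizes valid paraphrases is easy to demonstrate with the rouge-score package. The example sentences below are invented for illustration and are not from the thesis.

```python
# Demonstration that ROUGE-L rewards verbatim overlap over clinically
# equivalent paraphrase. Example sentences are invented for illustration.
from rouge_score import rouge_scorer  # pip install rouge-score

reference = "Take the antibiotic twice daily with food for ten days."
paraphrase = ("The antibiotic should be taken with meals, "
              "two times a day, over a ten-day course.")
verbatim = "Take the antibiotic twice daily with food."  # drops the duration

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
for name, candidate in [("paraphrase", paraphrase),
                        ("verbatim-but-incomplete", verbatim)]:
    f1 = scorer.score(reference, candidate)["rougeL"].fmeasure
    print(f"{name}: ROUGE-L F1 = {f1:.2f}")
# The clinically equivalent paraphrase scores lower than the incomplete
# verbatim copy, illustrating why n-gram overlap is a poor quality proxy.
```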
Appears in Collections: M.E./M.Tech. Computer Engineering
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ARUNAV PRAKASH M.Tech..pdf |  | 1.54 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.