Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21783
Full metadata record
DC Field: Value (Language)
dc.contributor.author: KUMAR, GAPESH
dc.date.accessioned: 2025-07-08T06:11:26Z
dc.date.available: 2025-07-08T06:11:26Z
dc.date.issued: 2025-05
dc.identifier.uri: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21783
dc.description.abstract: Optimizing a large language model (LLM) for a specific task typically requires fine-tuning it, which is costly in compute and memory. Instruction tuning, i.e., training on pairs of instructions and completions, teaches models to follow human directions correctly, but full fine-tuning remains computationally expensive. A number of recent parameter-efficient fine-tuning (PEFT) methods help address this problem, yet aligning model outputs with human preferences remains difficult. In this work, we applied instruction fine-tuning and PEFT techniques such as Low-Rank Adaptation (LoRA) to adapt pre-trained LLMs to a given task using structured training data while efficiently tuning only a small portion of the model parameters. To ensure contextual appropriateness and improve the alignment of responses with human expectations, we incorporated Reinforcement Learning from Human Feedback (RLHF) during fine-tuning. Our results indicate that PEFT approaches substantially reduce computational and memory cost without loss of performance, while instruction tuning improves the model's adherence to the task. RLHF further discourages out-of-context responses, keeping outputs consistent and human-aligned. The observations in this work show that highly specialized and resource-efficient LLMs can be built by combining PEFT, instruction tuning, and RLHF. These methods offer a principled and scalable way to fine-tune LLMs, enhancing their usefulness and flexibility across a wide range of applications. (en_US)
dc.language.iso: en (en_US)
dc.relation.ispartofseries: TD-7993;
dc.subject: FINE TUNING LLMs (en_US)
dc.subject: DIALOGUE SUMMARIZATION (en_US)
dc.subject: HUMAN ALIGNED RESPONSES (en_US)
dc.subject: RLHF (en_US)
dc.subject: LLM (en_US)
dc.title: FINE TUNING LLMs FOR CONTEXT-AWARE DIALOGUE SUMMARIZATION AND HUMAN ALIGNED RESPONSES VIA RLHF (en_US)
dc.type: Thesis (en_US)
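
The abstract describes combining instruction tuning with LoRA-based PEFT for dialogue summarization. A minimal sketch of such a setup is given below, assuming a FLAN-T5 base model and a SAMSum-style dialogue summarization dataset with Hugging Face transformers, datasets, and peft; these are illustrative choices, not the thesis's confirmed models, data, or hyperparameters.

# Illustrative sketch: LoRA-based PEFT for dialogue summarization.
# Model, dataset, and hyperparameters are assumptions, not the thesis setup.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)
from peft import LoraConfig, TaskType, get_peft_model

model_name = "google/flan-t5-base"          # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA: train small low-rank adapter matrices instead of all model weights.
lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM,
                         r=16, lora_alpha=32, lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # typically well under 1% of parameters

# Assumed dialogue-summarization data ("dialogue" -> "summary" pairs).
dataset = load_dataset("knkarthick/samsum")

def preprocess(batch):
    # Instruction-style prompt prepended to each dialogue.
    inputs = ["Summarize the following conversation.\n\n" + d
              for d in batch["dialogue"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"],
                       max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="lora-dialogue-summarizer",
                                  learning_rate=1e-4,
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

The RLHF stage described in the abstract would typically follow as a separate step on top of the adapter-tuned model, for example by training a reward model on human preference data and then optimizing the policy with PPO.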
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: GAPESH KUMAR M.Tech..pdf
Size: 515.93 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.