Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21783
Full metadata record
DC Field: Value (Language)
dc.contributor.author: KUMAR, GAPESH
dc.date.accessioned: 2025-07-08T06:11:26Z
dc.date.available: 2025-07-08T06:11:26Z
dc.date.issued: 2025-05
dc.identifier.uri: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21783
dc.description.abstract: Optimizing a large language model (LLM) for a specific task typically requires fine-tuning it, which is costly in compute and memory. Instruction tuning, i.e., training on pairs of instructions and completions, teaches models to follow human directions correctly, but full fine-tuning remains computationally expensive. A number of recent parameter-efficient fine-tuning (PEFT) methods help address this problem, yet aligning model outputs with human preferences remains difficult. In this work, we applied instruction fine-tuning and PEFT techniques such as Low-Rank Adaptation (LoRA) to adapt pre-trained LLMs to a given task using structured training data while efficiently tuning only a small portion of the model parameters. To ensure contextual appropriateness and improve the alignment of responses with human expectations, we incorporated Reinforcement Learning from Human Feedback (RLHF) during fine-tuning. Our results indicate that PEFT approaches substantially reduce computational and memory cost without loss of performance, while instruction tuning improves the model's adherence to the task. RLHF further discourages out-of-context responses, keeping outputs consistent and human-aligned. The observations in this work show that highly specialized and resource-efficient LLMs can be built by combining PEFT, instruction tuning, and RLHF. These methods offer a principled and scalable way to fine-tune LLMs, enhancing their usefulness and flexibility across a wide range of applications. (en_US)
dc.language.iso: en (en_US)
dc.relation.ispartofseries: TD-7993;
dc.subject: FINE TUNING LLMs (en_US)
dc.subject: DIALOGUE SUMMARIZATION (en_US)
dc.subject: HUMAN ALIGNED RESPONSES (en_US)
dc.subject: RLHF (en_US)
dc.subject: LLM (en_US)
dc.title: FINE TUNING LLMs FOR CONTEXT-AWARE DIALOGUE SUMMARIZATION AND HUMAN ALIGNED RESPONSES VIA RLHF (en_US)
dc.type: Thesis (en_US)
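
The abstract describes combining instruction tuning with LoRA-based PEFT for dialogue summarization. A minimal sketch of such a setup is given below, assuming a FLAN-T5 base model and a SAMSum-style dialogue summarization dataset with Hugging Face transformers, datasets, and peft; these are illustrative choices, not the thesis's confirmed models, data, or hyperparameters.

# Illustrative sketch: LoRA-based PEFT for dialogue summarization.
# Model, dataset, and hyperparameters are assumptions, not the thesis setup.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)
from peft import LoraConfig, TaskType, get_peft_model

model_name = "google/flan-t5-base"          # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA: train small low-rank adapter matrices instead of all model weights.
lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM,
                         r=16, lora_alpha=32, lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # typically well under 1% of parameters

# Assumed dialogue-summarization data ("dialogue" -> "summary" pairs).
dataset = load_dataset("knkarthick/samsum")

def preprocess(batch):
    # Instruction-style prompt prepended to each dialogue.
    inputs = ["Summarize the following conversation.\n\n" + d
              for d in batch["dialogue"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"],
                       max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="lora-dialogue-summarizer",
                                  learning_rate=1e-4,
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

The RLHF stage described in the abstract would typically follow as a separate step on top of the adapter-tuned model, for example by training a reward model on human preference data and then optimizing the policy with PPO.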
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: GAPESH KUMAR M.Tech..pdf
Size: 515.93 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.