Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/21672
Title: | COMPARATIVE ANALYSIS OF MODEL-FREE REINFORCEMENT LEARNING ALGORITHMS IN DYNAMIC AND PARTIALLY OBSERVABLE GRID-WORLD ENVIRONMENTS |
Authors: | SHAILLY SHARMA, BHAKTI |
Keywords: | REINFORCEMENT LEARNING ALGORITHMS IN DYNAMIC GRID-WORLD ENVIRONMENTS Q LEARNING POLICY GRADIENT ACTOR CRITIC PARTIAL OBSERVABILITY |
Issue Date: | May-2025 |
Series/Report no.: | TD-7877; |
Abstract: | This dissertation presents a comparative evaluation of three primary categories of model-free reinforcement learning (RL) approaches (Q-Learning, Policy Gradient, and Actor-Critic) within a specially designed Gridworld environment. This environment replicates complex, real-life decision-making challenges, incorporating two major elements: partial observability, where the agent has limited visibility of the environment, and dynamic goal positioning, where the goal location changes during training. Each RL method is developed using a consistent framework and tested over multiple independent runs to maintain statistical rigor. The assessment focuses on various performance indicators such as accumulated rewards, convergence trends, training stability, and adaptability in response to environmental fluctuations. Q-Learning, though robust and easy to implement, exhibits delayed adaptation due to its static value update structure. Policy Gradient methods show better responsiveness to dynamic goals but suffer from policy update variance. Actor-Critic algorithms combine advantages from both approaches, yielding balanced performance with comparatively stable training behavior. The research also explores how tuning hyperparameters, incorporating exploration strategies, and applying reward shaping can influence learning outcomes. The findings are supported through visual data and performance graphs, offering insight into selecting the most suitable RL strategy for real-world applications like autonomous systems, robotic control, and adaptive resource management in uncertain settings. Ultimately, this study outlines the trade-offs among model-free methods and suggests possible directions for future research involving hybrid or meta-learning frameworks. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/21672 |
Appears in Collections: | M Sc Applied Maths |
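
The abstract's remark that Q-Learning adapts slowly when the goal moves can be illustrated with a minimal tabular sketch. This is not the dissertation's code: the function name `train_q_learning`, the 4x4 grid, the two goal positions, the reward values (+1.0 at the goal, -0.01 per step), the 50-step episode cap, and all hyperparameters are assumptions chosen only for illustration. The agent's state is its own grid position, so the goal location is hidden from it, which is one simple form of partial observability.

```python
import random


def train_q_learning(size=4, episodes=400, switch_at=200,
                     alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a size x size grid whose goal moves once.

    All defaults are illustrative assumptions, not values from the thesis.
    Returns the Q-table and the list of per-episode returns.
    """
    rng = random.Random(seed)
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    # One row of four action values per grid cell, initialised to zero.
    q = {(r, c): [0.0] * 4 for r in range(size) for c in range(size)}
    # Hypothetical goal schedule: bottom-right first, then top-right.
    goals = [(size - 1, size - 1), (0, size - 1)]
    returns = []

    for episode in range(episodes):
        goal = goals[0] if episode < switch_at else goals[1]
        state = (0, 0)
        total = 0.0
        for _ in range(50):  # cap on episode length
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = max(range(4), key=lambda i: q[state][i])
            dr, dc = actions[action]
            # Moves that would leave the grid keep the agent in place.
            nxt = (min(max(state[0] + dr, 0), size - 1),
                   min(max(state[1] + dc, 0), size - 1))
            done = nxt == goal
            reward = 1.0 if done else -0.01
            # Standard Q-learning target: r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            total += reward
            state = nxt
            if done:
                break
        returns.append(total)
    return q, returns


if __name__ == "__main__":
    _, returns = train_q_learning()
    before = sum(returns[150:200]) / 50
    after = sum(returns[200:250]) / 50
    print(f"mean return before goal switch: {before:.3f}")
    print(f"mean return just after switch:  {after:.3f}")
```

Because the values learned for the first goal stay in the table until later updates overwrite them, comparing the mean return just before and just after `switch_at` gives a rough view of the adaptation lag the abstract describes.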
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Shailly & Bhakti M.Sc..pdf | | 1.52 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.