COMPARATIVE ANALYSIS OF MODEL-FREE REINFORCEMENT LEARNING ALGORITHMS IN DYNAMIC AND PARTIALLY OBSERVABLE GRID-WORLD ENVIRONMENTS

SHAILLY; SHARMA, BHAKTI

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21672

Title:	COMPARATIVE ANALYSIS OF MODEL-FREE REINFORCEMENT LEARNING ALGORITHMS IN DYNAMIC AND PARTIALLY OBSERVABLE GRID-WORLD ENVIRONMENTS
Authors:	SHAILLY; SHARMA, BHAKTI
Keywords:	REINFORCEMENT LEARNING ALGORITHMS IN DYNAMIC GRID-WORLD ENVIRONMENTS; Q LEARNING; POLICY GRADIENT; ACTOR CRITIC; PARTIAL OBSERVABILITY
Issue Date:	May-2025
Series/Report no.:	TD-7877;
Abstract:	This dissertation presents a comparative evaluation of three primary categories of model-free reinforcement learning (RL) approaches, namely Q-Learning, Policy Gradient, and Actor-Critic, within a specially designed Gridworld environment. This environment replicates complex, real-life decision-making challenges by incorporating two major elements: partial observability, where the agent has limited visibility of the environment, and dynamic goal positioning, where the goal location changes during training. Each RL method is implemented within a consistent framework and tested over multiple independent runs to maintain statistical rigor. The assessment focuses on performance indicators such as accumulated reward, convergence trends, training stability, and adaptability to environmental changes. Q-Learning, though robust and easy to implement, exhibits delayed adaptation due to its static value-update structure. Policy Gradient methods respond better to dynamic goals but suffer from high variance in their policy updates. Actor-Critic algorithms combine the advantages of both approaches, yielding balanced performance with comparatively stable training behavior. The research also examines how hyperparameter tuning, exploration strategies, and reward shaping influence learning outcomes. The findings are supported by visualizations and performance graphs, offering insight into selecting the most suitable RL strategy for real-world applications such as autonomous systems, robotic control, and adaptive resource management in uncertain settings. Ultimately, this study outlines the trade-offs among model-free methods and suggests directions for future research involving hybrid or meta-learning frameworks.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/21672
Appears in Collections:	M Sc Applied Maths
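
As an illustration only, and not the authors' implementation, the two environment features named in the abstract (partial observability and a goal that relocates during training) and the tabular Q-Learning update can be sketched in Python as below. The class and function names, the 5x5 grid, the 3x3 view window, the 50-episode relocation interval, the reward values, and the hyperparameters are all hypothetical assumptions chosen for the sketch.

```python
import random
from collections import defaultdict

# Hypothetical sketch, not the dissertation's code: sizes, rewards and
# hyperparameters are illustrative assumptions.


class PartiallyObservableGridworld:
    """N x N grid; the agent starts at (0, 0) and sees only a local window.

    Every `move_every` episodes the goal is moved to a new random cell
    (dynamic goal positioning).
    """

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5, view_radius=1, move_every=50, seed=0):
        self.size = size
        self.view_radius = view_radius
        self.move_every = move_every
        self.rng = random.Random(seed)
        self.episode = 0
        self.goal = (size - 1, size - 1)
        self.agent = (0, 0)

    def reset(self):
        # Relocate the goal at the start of every `move_every`-th episode.
        if self.episode > 0 and self.episode % self.move_every == 0:
            self.goal = self._random_cell(exclude=(0, 0))
        self.episode += 1
        self.agent = (0, 0)
        return self._observe()

    def _random_cell(self, exclude):
        while True:
            cell = (self.rng.randrange(self.size), self.rng.randrange(self.size))
            if cell != exclude:
                return cell

    def _observe(self):
        # Local (2r+1) x (2r+1) window, row by row:
        # -1 = outside the grid, 1 = goal, 0 = empty cell.
        r, c = self.agent
        window = []
        for dr in range(-self.view_radius, self.view_radius + 1):
            for dc in range(-self.view_radius, self.view_radius + 1):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < self.size and 0 <= cc < self.size):
                    window.append(-1)
                elif (rr, cc) == self.goal:
                    window.append(1)
                else:
                    window.append(0)
        return tuple(window)

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        # Moves into the boundary leave the agent where it is.
        r = min(max(self.agent[0] + dr, 0), self.size - 1)
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        done = self.agent == self.goal
        reward = 1.0 if done else -0.01  # small step cost, +1 at the goal
        return self._observe(), reward, done


def q_update(q, obs, action, reward, next_obs, done, alpha, gamma):
    """One tabular Q-Learning update toward the bootstrapped TD target."""
    target = reward if done else reward + gamma * max(q[next_obs])
    q[obs][action] += alpha * (target - q[obs][action])


def q_learning(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1,
               max_steps=100, seed=0):
    """Epsilon-greedy Q-Learning with a table keyed on observations."""
    rng = random.Random(seed)
    n_actions = len(env.ACTIONS)
    q = defaultdict(lambda: [0.0] * n_actions)
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                # Greedy action, breaking ties at random.
                best = max(q[obs])
                action = rng.choice(
                    [a for a, v in enumerate(q[obs]) if v == best])
            next_obs, reward, done = env.step(action)
            q_update(q, obs, action, reward, next_obs, done, alpha, gamma)
            obs = next_obs
            total += reward
            if done:
                break
        returns.append(total)
    return q, returns
```

For example, `q_table, returns = q_learning(PartiallyObservableGridworld(seed=0))` trains for 200 episodes and returns the per-episode return curve. Keying the Q-table on the observation rather than the true position is what makes the task partially observable in this sketch: cells with identical surroundings share a single table entry.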

Files in This Item:

File	Description	Size	Format
Shailly & Bhakti M.Sc..pdf		1.52 MB	Adobe PDF