In Reinforcement Learning problems, Markov Decision Processes (MDPs) describe the change in environment which changes its state in response to agent’s actions. The state of the environment affects the immediate reward obtained by the agent, as well as the probabilities of future state transitions. The agent’s objective (also called MDP Planning) is to select actions to maximize a long-term measure (generally expectation) of total reward.
MDP planning
Each MDP is provided as a text file in the following format.
Generate MDP file by using the following command
python generateMDP.py --S 2 --A 2 --gamma 0.90 --mdptype episodic --rseed 0
Generate Optimal Value Function of the MDP (located at some path p1) by the algorithm a1 (vi/hpi/lp) using the following command
python planner.py --mdp p1 --algorithm a1