Implement (in C++) the Value Iteration algorithm (detailed in chapter 3 of [Sutton and Barto, 1998] and chapter 13 of [Mitchell, 1997]) in order to find the optimal value (V*) for each state in a small grid-world (figure 1). Use the following information:
- The agent has 4 actions {left, right, up, down}, and the grid-world has 6 states {s1, s2, s3, s4, s5, s6}. Figure 1 shows the possible transitions between states (the actions available in each state).
- The state transition distribution is deterministic, i.e., every transition probability is either 1 or 0 for all states and actions.
- Rewards for all state transitions are zero (r = 0), except the following:
- State s3 is the terminal state.
- The discount factor is 0.8, i.e., γ = 0.8.
Figure 1: A small grid-world, where arrows show possible transitions between states. Note that state s3 is a terminal state.
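For reference, a minimal C++ sketch of the algorithm is given below. The 2x3 layout, the transition table, and the GOAL_REWARD placeholder are assumptions made purely for illustration; they must be replaced with the actual transitions and rewards shown in figure 1 before the program is run.

// Sketch of Value Iteration for the 6-state grid-world.
// NOTE: the transition and reward tables below are illustrative placeholders;
// they assume a 2x3 layout (s1 s2 s3 / s4 s5 s6) and must be replaced with
// the transitions and rewards shown in figure 1.
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>

int main() {
    const int NUM_STATES = 6;          // s1..s6 mapped to indices 0..5
    const int NUM_ACTIONS = 4;         // 0 = left, 1 = right, 2 = up, 3 = down
    const int TERMINAL = 2;            // s3 is the terminal (absorbing) state
    const double GAMMA = 0.8;          // discount factor from the handout
    const double EPSILON = 1e-9;       // convergence threshold
    const double GOAL_REWARD = 100.0;  // placeholder: reward for entering s3 (use the value from figure 1)

    // next_state[s][a] = successor of s under action a, or -1 if the action is unavailable.
    // Assumed layout (placeholder):  s1 s2 s3
    //                                s4 s5 s6
    int next_state[NUM_STATES][NUM_ACTIONS] = {
        /* s1 */ {-1,  1, -1,  3},
        /* s2 */ { 0,  2, -1,  4},
        /* s3 */ {-1, -1, -1, -1},   // terminal: no outgoing transitions
        /* s4 */ {-1,  4,  0, -1},
        /* s5 */ { 3,  5,  1, -1},
        /* s6 */ { 4, -1,  2, -1},
    };

    // reward[s][a]: zero everywhere except transitions that enter the terminal state.
    double reward[NUM_STATES][NUM_ACTIONS] = {};
    for (int s = 0; s < NUM_STATES; ++s)
        for (int a = 0; a < NUM_ACTIONS; ++a)
            if (next_state[s][a] == TERMINAL) reward[s][a] = GOAL_REWARD;

    std::array<double, NUM_STATES> V{};  // value estimates, initialised to 0
    int iterations = 0;

    while (true) {
        double max_change = 0.0;
        std::array<double, NUM_STATES> V_new = V;
        for (int s = 0; s < NUM_STATES; ++s) {
            if (s == TERMINAL) continue;             // terminal value stays 0
            double best = -1e300;
            for (int a = 0; a < NUM_ACTIONS; ++a) {
                int sp = next_state[s][a];
                if (sp < 0) continue;                // action not available in this state
                double q = reward[s][a] + GAMMA * V[sp];  // deterministic one-step backup
                best = std::max(best, q);
            }
            V_new[s] = best;
            max_change = std::max(max_change, std::fabs(V_new[s] - V[s]));
        }
        V = V_new;
        ++iterations;
        if (max_change < EPSILON) break;             // converged
    }

    std::printf("Converged after %d iterations\n", iterations);
    for (int s = 0; s < NUM_STATES; ++s)
        std::printf("V(s%d) = %.6f\n", s + 1, V[s]);
    return 0;
}

The iteration counter and the printed values correspond directly to what Question 1 asks for in the output text file.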
Question 1: How many iterations does the Value Iteration algorithm take to converge? In an output text file, list the optimal value (V*) for each state.
Question 2: Assuming we start in state s1, give the sequence of states that forms the optimal policy (π*) to reach the terminal state (s3).
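One way to answer this is to follow the greedy policy implied by the converged values. The helper sketched below assumes the next_state and reward tables and the converged V array from the code above (s1 and s3 correspond to indices 0 and 2).

// Sketch: read the greedy (optimal) path off the converged values.
// Assumes the same next_state/reward tables and the converged V array
// produced by the value-iteration sketch above.
#include <vector>

std::vector<int> greedyPath(int start, int terminal,
                            int next_state[6][4], double reward[6][4],
                            const double V[6], double gamma) {
    std::vector<int> path{start};
    int s = start;
    while (s != terminal) {
        int best_a = -1;
        double best_q = -1e300;
        for (int a = 0; a < 4; ++a) {
            int sp = next_state[s][a];
            if (sp < 0) continue;                     // unavailable action
            double q = reward[s][a] + gamma * V[sp];  // one-step greedy lookahead
            if (q > best_q) { best_q = q; best_a = a; }
        }
        if (best_a < 0) break;                        // dead end (should not occur in this grid)
        s = next_state[s][best_a];                    // follow the greedy action
        path.push_back(s);
    }
    return path;
}

After convergence, a call such as greedyPath(0, 2, next_state, reward, V.data(), GAMMA) returns the state indices visited from s1 to s3, which can be written to the output file as the answer to Question 2.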
Question 3: Is it possible to change the reward function so that V* changes but the optimal policy (π*) remains unchanged?
If yes, describe how the reward function must be changed and the resulting change to V*. Otherwise, briefly explain why this is impossible.
In a ZIP file, place the source code, makefile, and output text file (answers to questions 1, 2, 3). Upload the ZIP file to Vula before 10.00 AM, Monday, 23 September.
References
[Mitchell, 1997] Mitchell, T. (1997). Machine Learning, Chapter 13: Reinforcement Learning. McGraw-Hill, New York, USA.
[Sutton and Barto, 1998] Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA.