[Solved] CSC3022H ML5- Reinforcement Learning

$25

File Name: CSC3022H_ML5-_Reinforcement_Learning.zip
File Size: 339.12 KB

SKU: [Solved] CSC3022H ML5- Reinforcement Learning Category: Tag:
5/5 - (1 vote)

Implement (in C++) the Value Iteration algorithm (detailed in chapter 3 [Sutton and Barto, 1998] and chapter 13 [Mitchell, 1997]) in order to find the optimal value (V ) for each state in a small grid-world (figure 1). Use the following information:

  1. The agent has 4 actionsstates { s1 2 3 4 5{sleft, right, up, down6 }. Figure 1 shows the possible transitions}, and the grid-world 6

, s , s , s , s ,

between states (actions for given states).

  1. The state transition distributionis deterministic, so 0 for all states and actions.
  2. Rewards for all state transitions are zero ( = 0), except the following:
  3. State s3 is the terminal state.
  4. The discount factor is 0.8, e. = 0.8.

Figure 1: A small grid-world, where arrows show possible transitions between states. Note that state s3 is a terminal state.

Question 1: How many iterations does it take for the Value Iteration algorithm to converge? In an output text file list the optimal values (V for each state).

Question 2: Assume we start in state s1, give the states that form the optimal policy () to reach the terminal state (s3).

Question 3: Is it possible to change the reward function function so that V changes, but the optimal policy () remains unchanged?

If yes, describe how the reward function must be changed and the resulting change to V . Otherwise, briefly explain why this is impossible.

In a ZIP file, place the source code, makefile, and output text file (answers to questions 1, 2, 3). Upload the ZIP file to Vula before 10.00 AM, Monday, 23 September.

References

[Mitchell, 1997] Mitchell, T. (1997). Machine Learning: Chapter 13: Reinforcement Learning. McGraw Hill, New York, USA.

[Sutton and Barto, 1998] Sutton, R. and Barto, A. (1998). An Introduction to Reinforcement Learning. John Wiley and Sons, Cambridge, USA.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] CSC3022H ML5- Reinforcement Learning
$25