[SOLVED] CS489 Assignment 2

Reinforcement Learning

1 Introduction

The goal of this assignment is to experiment with Monte-Carlo (MC) learning and Temporal-Difference (TD) learning. MC and TD methods learn directly from episodes of experience, without knowledge of the MDP model. The TD method can learn after every step, while the MC method requires a complete episode before it can update its value estimates. Your goal is to implement the MC and TD methods and test them on the small gridworld.
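For reference, the two update rules can be written as follows (the notation and the step size α are an assumption here, not part of the handout). TD(0) updates after every step,

    V(s_t) ← V(s_t) + α [ r_{t+1} + γ V(s_{t+1}) - V(s_t) ],

while MC waits until the episode terminates at time T and updates toward the complete return G_t = r_{t+1} + γ r_{t+2} + ... + γ^{T-t-1} r_T (or simply averages the observed returns),

    V(s_t) ← V(s_t) + α [ G_t - V(s_t) ].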

2 Small Gridworld

Figure 1: Gridworld

As shown in Fig. 1, each grid cell in the gridworld represents a state. Let s_t denote the state at grid t, so the state space can be written as S = {s_t | t ∈ {0, ..., 35}}. s_1 and s_35 are terminal states; from any of the other, non-terminal states the agent can move one grid to the north, east, south, or west, so the action space is A = {n, e, s, w}. Note that actions leading out of the grid leave the state unchanged. Each movement gets a reward of -1 until a terminal state is reached.
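To make the setup concrete, here is a minimal environment sketch in Python 3. It assumes the 6x6, row-major layout suggested by Fig. 1 (36 states, terminals at s_1 and s_35, reward -1 per move); the names N, TERMINALS, ACTIONS, and step are illustrative, not prescribed by the handout.

# Minimal sketch of the gridworld of Fig. 1 (assumption: 6x6 grid, states 0..35
# laid out row-major, terminal states s_1 and s_35, reward -1 per movement).
N = 6
TERMINALS = {1, 35}
ACTIONS = {'n': (-1, 0), 'e': (0, 1), 's': (1, 0), 'w': (0, -1)}

def step(state, action):
    """Apply one action; moves that would leave the grid keep the state unchanged."""
    row, col = divmod(state, N)
    dr, dc = ACTIONS[action]
    new_row, new_col = row + dr, col + dc
    if 0 <= new_row < N and 0 <= new_col < N:
        row, col = new_row, new_col
    return row * N + col, -1  # every movement gets a reward of -1

For example, step(0, 'n') returns (0, -1): the move off the grid leaves the state unchanged but still costs -1.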

3 Experiment Requirements

  • Programming language: python3
  • You should implement both the first-visit and every-visit MC methods, as well as TD(0), to evaluate a uniform random policy π(n|·) = π(e|·) = π(s|·) = π(w|·) = 0.25 (a sketch of such an evaluation is given after this list).
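The sketch below shows one possible way to evaluate the uniform random policy with first-visit MC, every-visit MC, and TD(0). It reuses N, TERMINALS, ACTIONS, and step from the environment sketch above; GAMMA = 1.0, num_episodes, and alpha are assumed values, not requirements of the handout.

import random
from collections import defaultdict

GAMMA = 1.0  # assumption: undiscounted returns
NON_TERMINAL = [s for s in range(N * N) if s not in TERMINALS]

def generate_episode():
    """Roll out one episode under the uniform random policy, pi(a|s) = 0.25."""
    state, episode = random.choice(NON_TERMINAL), []
    while state not in TERMINALS:
        next_state, reward = step(state, random.choice(list(ACTIONS)))
        episode.append((state, reward))
        state = next_state
    return episode

def mc_evaluate(num_episodes=10000, first_visit=True):
    """Monte-Carlo evaluation; first_visit toggles first-visit vs every-visit."""
    V, returns = defaultdict(float), defaultdict(list)
    for _ in range(num_episodes):
        episode = generate_episode()
        G = 0.0
        for t in reversed(range(len(episode))):  # accumulate returns backwards
            state, reward = episode[t]
            G = GAMMA * G + reward
            if first_visit and any(s == state for s, _ in episode[:t]):
                continue  # first-visit: skip all but the first occurrence of a state
            returns[state].append(G)
            V[state] = sum(returns[state]) / len(returns[state])
    return V

def td0_evaluate(num_episodes=10000, alpha=0.01):
    """TD(0) evaluation: update V(s) after every single step."""
    V = defaultdict(float)
    for _ in range(num_episodes):
        state = random.choice(NON_TERMINAL)
        while state not in TERMINALS:
            next_state, reward = step(state, random.choice(list(ACTIONS)))
            # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
            V[state] += alpha * (reward + GAMMA * V[next_state] - V[state])
            state = next_state
    return V

Each of mc_evaluate(first_visit=True), mc_evaluate(first_visit=False), and td0_evaluate() returns a dictionary of state values that can be reshaped into a 6x6 table for comparison.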