[Solved] Reinforcement Learning - Homework 1


Dynamic Programming

1 Question 1

1.1

Finding the shortest path to state 14 corresponds to a deterministic policy. The step reward $r_s$ has to be negative to enforce the shortest-path objective, while not being so low that the agent prefers to end the episode in the red terminal state 1. Precisely, $r_s$ must satisfy $-10 < r_s < 0$. Let us take $r_s = -1$. Since the transitions are deterministic, the optimal policy is indeed the shortest path to state 14. For this reward, the value function of the optimal policy, laid out state by state on the grid, is:

 5  -10   7   6
 6    7   8   5
 7    8   9   4
 8    9  10   3

1.2

In a general MDP, a policy $\pi$ induces a value function $V_1^\pi$ for a reward signal $r_1(s, a)$. Let us apply an affine transformation to the reward $r_1$ of the form $r_2(s, a) = \alpha\, r_1(s, a) + \beta$, with $(\alpha, \beta) \in \mathbb{R}^2$. For the same policy $\pi$, the new value function $V_2^\pi$ is:

\begin{align*}
V_2^\pi(s) &= \mathbb{E}\Big[\sum_{t=0}^{+\infty} \gamma^t\, r_2(s_t, d_t(h_t)) \,\Big|\, s_0 = s; \pi\Big] \\
           &= \mathbb{E}\Big[\sum_{t=0}^{+\infty} \gamma^t \big(\alpha\, r_1(s_t, d_t(h_t)) + \beta\big) \,\Big|\, s_0 = s; \pi\Big] \\
           &= \alpha\, \mathbb{E}\Big[\sum_{t=0}^{+\infty} \gamma^t\, r_1(s_t, d_t(h_t)) \,\Big|\, s_0 = s; \pi\Big] + \beta \sum_{t=0}^{+\infty} \gamma^t.
\end{align*}

Therefore, after an affine transformation of the reward signal, the new value function becomes

$$V_2^\pi(s) = \alpha\, V_1^\pi(s) + \frac{\beta}{1 - \gamma}.$$

Let $\pi_1^*$ be the optimal policy corresponding to reward $r_1$ and $\pi_2^*$ the optimal policy after the affine transformation.
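To complement the derivation in 1.2, here is a short Python sketch (not part of the original submission) that checks the identity $V_2^\pi(s) = \alpha V_1^\pi(s) + \beta/(1-\gamma)$ numerically, using exact policy evaluation on a small randomly generated MDP. The MDP size (5 states, 3 actions), the policy, and the constants $\alpha = 2$, $\beta = -1.5$, $\gamma = 0.9$ are illustrative assumptions, not values taken from the homework.

import numpy as np

# Sketch (illustrative, not from the original homework): verify that
# V2(s) = alpha * V1(s) + beta / (1 - gamma) after the affine reward
# transformation r2(s, a) = alpha * r1(s, a) + beta, using exact policy
# evaluation on a small randomly generated MDP.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9        # assumed sizes and discount
alpha, beta = 2.0, -1.5                       # assumed affine coefficients

# Random transition kernel P[s, a, s'] (rows sum to 1) and reward r1(s, a).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r1 = rng.normal(size=(n_states, n_actions))
r2 = alpha * r1 + beta                        # affinely transformed reward

# An arbitrary deterministic policy pi: state -> action.
pi = rng.integers(n_actions, size=n_states)

def policy_evaluation(P, r, pi, gamma):
    """Solve (I - gamma * P_pi) V = r_pi exactly for the value of pi."""
    idx = np.arange(len(pi))
    P_pi = P[idx, pi]        # (n_states, n_states) transition matrix under pi
    r_pi = r[idx, pi]        # (n_states,) one-step reward collected under pi
    return np.linalg.solve(np.eye(len(pi)) - gamma * P_pi, r_pi)

V1 = policy_evaluation(P, r1, pi, gamma)
V2 = policy_evaluation(P, r2, pi, gamma)

# The identity derived above should hold exactly (up to floating point).
assert np.allclose(V2, alpha * V1 + beta / (1 - gamma))
print("Verified: V2 = alpha * V1 + beta / (1 - gamma)")

The check works because the constant part of the reward contributes $\beta \sum_{t \ge 0} \gamma^t = \beta/(1-\gamma)$ to every state's value, which is exactly the geometric-series term appearing in the last line of the derivation.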
