Useful Formulas
MDPs and RL
• Q-learning update: Q_{k+1}(s,a) = Q_k(s,a) + α (R_{t+1} + γ max_{a'} Q_k(s',a') − Q_k(s,a)).
• Sarsa update: Q_{k+1}(s,a) = Q_k(s,a) + α (R_{t+1} + γ Q_k(s',a') − Q_k(s,a)).
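The two updates above differ only in the bootstrap target: Q-learning maxes over the next state's actions, while Sarsa uses the action actually taken. A minimal sketch, assuming a dictionary-based Q-table keyed by (state, action) pairs (the encodings and default step size are illustrative, not from the source):

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.5):
    """One Q-learning step: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.5):
    """One Sarsa step: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Example: two states, two actions, Q initialized to zero.
Q = {(s, a): 0.0 for s in (0, 1) for a in ("left", "right")}
q_learning_update(Q, 0, "left", 1.0, 1, ("left", "right"))
# target = 1.0 + 0.5 * 0 = 1.0, so Q[(0, "left")] = 0 + 0.5 * 1.0 = 0.5
```

Note that on this first step the two rules coincide, since all bootstrap values are still zero; they diverge once the behavior policy (e.g. ε-greedy) takes non-greedy actions.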


1. What is an MDP? What are the elements that define an MDP?
2. What makes a transition system Markovian?
3. What does it mean that an RL method bootstraps? Provide an example of an RL algorithm that bootstraps and one that does not.
4. An agent has to find the coin in the MDP below and pick it up. The actions available to the agent are move up, down, left, right, toggle switch, and pick up. The action toggle switch turns the light in the room on or off; it succeeds only when executed in the square with the switch and does nothing anywhere else. The action pick up picks up the coin only when executed in the square with the coin and with the light on; otherwise it does nothing. How would you model this domain so that the representation is Markovian?
Note on notation: in the following MDPs, each state is labeled with an id. Each transition is labeled with the name of the corresponding action, the probability of landing in the next state, and the reward for that transition. If a state has no outgoing edges, it is an absorbing state.
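The transition-labeling convention above (action name, landing probability, reward; absorbing states have no outgoing edges) maps directly onto a small data structure. A hypothetical sketch — the state ids, action names, probabilities, and rewards here are invented for illustration, not taken from the exercise diagrams:

```python
# transitions[state][action] = list of (next_state, probability, reward)
mdp = {
    "s0": {"a": [("s1", 0.8, 0.0), ("s2", 0.2, 1.0)]},
    "s1": {"b": [("s2", 1.0, 10.0)]},
    "s2": {},  # no outgoing edges: an absorbing state
}

def successors(mdp, s, a):
    """All (next_state, probability, reward) triples for taking a in s."""
    return mdp[s].get(a, [])

def is_absorbing(mdp, s):
    return not mdp[s]
```

With this representation, a sanity check is that the probabilities on each (state, action) pair sum to 1, which is what the labels on each outgoing edge must satisfy.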
5. Calculate the action-value function that Sarsa and Q-learning would compute on the following MDP, while acting with an ε-greedy policy with ε = 0.1 and γ = 0.5.

6. Calculate the action-value function that Q-learning and Sarsa would compute on the following MDP, with γ = 0.5 and ε=0.1.

