In this project, you will implement two families of model-free algorithms. The first is Monte Carlo (MC): on-policy first-visit MC prediction and on-policy MC control, applied to blackjack. The second is Temporal-Difference (TD) learning: Sarsa (on-policy) and Q-learning (off-policy), applied to cliff walking.
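For the Monte Carlo part, here is a minimal Python sketch of first-visit MC prediction. It assumes a simplified, hand-written blackjack (infinite deck, no special payout for naturals, dealer stands on 17) rather than whatever environment the project supplies, and it evaluates the fixed "stick on 20 or 21" policy used in Sutton and Barto's blackjack example. All names (`generate_episode`, `stick_on_20`, and so on) are hypothetical. V(s) is estimated as the average return G observed after the first visit to s in each episode.

```python
import random
from collections import defaultdict

def draw_card(rng):
    # Infinite-deck assumption: 1 is an ace, and J/Q/K count as 10.
    return min(rng.randint(1, 13), 10)

def hand_value(cards):
    # Returns (total, usable_ace): one ace counts as 11 when that does not bust.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def generate_episode(policy, rng):
    """Play one hand under `policy`; return [(state, action, reward), ...].

    state = (player_sum, dealer_showing, usable_ace); action 1 = hit, 0 = stick.
    Simplification: naturals get no special payout.
    """
    player = [draw_card(rng), draw_card(rng)]
    dealer = [draw_card(rng), draw_card(rng)]
    while hand_value(player)[0] < 12:  # below 12, hitting can never bust
        player.append(draw_card(rng))
    episode = []
    while True:
        total, usable = hand_value(player)
        state = (total, dealer[0], usable)
        if policy(state) == 0:
            break
        player.append(draw_card(rng))
        if hand_value(player)[0] > 21:
            episode.append((state, 1, -1.0))  # player busts and loses
            return episode
        episode.append((state, 1, 0.0))
    while hand_value(dealer)[0] < 17:  # dealer hits until 17 or more
        dealer.append(draw_card(rng))
    p, d = hand_value(player)[0], hand_value(dealer)[0]
    reward = 1.0 if d > 21 or p > d else (0.0 if p == d else -1.0)
    episode.append((state, 0, reward))
    return episode

def first_visit_mc_prediction(policy, num_episodes, gamma=1.0, seed=0):
    """Estimate V(s) as the mean return following the first visit to s."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode(policy, rng)
        states = [s for s, _, _ in episode]
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):  # walk backwards accumulating G_t
            state, _, reward = episode[t]
            G = gamma * G + reward
            if state not in states[:t]:  # only the first visit of `state` counts
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

def stick_on_20(state):
    # Fixed policy to evaluate: stick on 20 or 21, otherwise hit.
    return 0 if state[0] >= 20 else 1

if __name__ == "__main__":
    V = first_visit_mc_prediction(stick_on_20, num_episodes=50_000)
    print(len(V), V[(21, 10, False)])
```

On-policy MC control reuses the same episode loop: estimate Q(s, a) instead of V(s), and after each episode make the policy ε-greedy with respect to the updated Q.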


The TA will run your code twice. You will receive full credit if either run passes the tests.
Hints
- On-policy first-visit Monte Carlo prediction
- On-policy first-visit Monte Carlo control
- Sarsa (on-policy TD control)
- Q-learning (off-policy TD control)
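The two TD hints differ only in the bootstrap target. The sketch below assumes a hand-written 4-by-12 cliff-walking grid modeled on Sutton and Barto's example (start at the bottom-left, goal at the bottom-right, -1 per move, -100 and a reset to the start for stepping into the cliff); the project's own environment interface is not described here, so `step`, `START`, `GOAL`, and the hyperparameters are hypothetical. Sarsa updates toward R + γQ(S', A') using the next ε-greedy action actually taken; Q-learning updates toward R + γ max_a Q(S', a) regardless of which action is taken next.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left

def step(state, action):
    """Cliff-walking dynamics: -1 per move, -100 and reset to START on the cliff."""
    r = min(max(state[0] + MOVES[action][0], 0), ROWS - 1)
    c = min(max(state[1] + MOVES[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:  # bottom-row cells between START and GOAL
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL

def new_q():
    return {(r, c): [0.0] * len(MOVES) for r in range(ROWS) for c in range(COLS)}

def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(MOVES))
    best = max(Q[state])
    return rng.choice([a for a, q in enumerate(Q[state]) if q == best])

def sarsa(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng, Q = random.Random(seed), new_q()
    for _ in range(episodes):
        s, done = START, False
        a = epsilon_greedy(Q, s, epsilon, rng)
        while not done:
            s2, reward, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, epsilon, rng)
            # On-policy target: bootstrap from the action the policy will actually take.
            target = reward if done else reward + gamma * Q[s2][a2]
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q

def q_learning(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng, Q = random.Random(seed), new_q()
    for _ in range(episodes):
        s, done = START, False
        while not done:
            a = epsilon_greedy(Q, s, epsilon, rng)  # behavior policy: epsilon-greedy
            s2, reward, done = step(s, a)
            # Off-policy target: bootstrap from the greedy action in s2.
            target = reward if done else reward + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

def greedy_path(Q, max_steps=100):
    """Follow the greedy policy from START; stop at GOAL or after max_steps moves."""
    s, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(MOVES)), key=lambda i: Q[s][i])
        s, _, done = step(s, a)
        path.append(s)
        if done:
            break
    return path

if __name__ == "__main__":
    print("Q-learning greedy path length:", len(greedy_path(q_learning())) - 1)
    print("Sarsa greedy path length:", len(greedy_path(sarsa())) - 1)
```

With ε-greedy exploration, Sarsa typically learns a route that keeps some distance from the cliff, while Q-learning's greedy route runs along the cliff edge; exact paths depend on the seed and hyperparameters.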

