CS285 Assignment 1: Imitation Learning


The goal of this assignment is to experiment with imitation learning, including direct behavior cloning and the DAgger algorithm. In lieu of a human demonstrator, demonstrations will be provided via an expert policy that we have trained for you. Your goals will be to set up behavior cloning and DAgger, and compare their performance on a few different continuous control tasks from the OpenAI Gym benchmark suite. Turn in your report and code as described in Section 4.

The starter code for this assignment can be found at https://github.com/berkeleydeeprlcourse/homework_fall2019. Follow the instructions in the Readme file to set up the codebase.

Section 1. Behavioral Cloning

  1. The starter code provides an expert policy for each of the MuJoCo tasks in OpenAI Gym. Fill in the blanks in the code marked with Todo to implement behavioral cloning. A command for running behavioral cloning is given in the Readme file.

The following files have blanks in them and can be read in this order:

  • scripts/run_hw1_behavior_cloning.py
  • infrastructure/rl_trainer.py
  • agents/bc_agent.py
  • policies/MLP_policy.py
  • infrastructure/replay_buffer.py
  • infrastructure/utils.py
  • infrastructure/tf_utils.py
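
Behavioral cloning reduces to supervised regression from observations to expert actions. The sketch below is a minimal, pure-Python illustration under stated assumptions: it uses a hypothetical linear policy (`LinearPolicy`) instead of the MLP the starter code uses, and a hypothetical `train_bc` helper that runs minibatch SGD on the mean squared error between the policy's actions and the expert's actions. None of these names come from the starter code.

```python
import random
from typing import List, Sequence, Tuple

Obs = List[float]
Act = List[float]


class LinearPolicy:
    """Hypothetical linear policy a = W o + b (the assignment itself uses an MLP)."""

    def __init__(self, obs_dim: int, act_dim: int) -> None:
        self.W = [[0.0] * obs_dim for _ in range(act_dim)]
        self.b = [0.0] * act_dim

    def act(self, obs: Sequence[float]) -> Act:
        # One output per action dimension: dot(W[i], obs) + b[i]
        return [sum(w * o for w, o in zip(row, obs)) + bi
                for row, bi in zip(self.W, self.b)]


def mse(policy: LinearPolicy, data: List[Tuple[Obs, Act]]) -> float:
    """Mean over samples of the summed squared error to the expert action."""
    return sum(sum((p - a) ** 2 for p, a in zip(policy.act(o), a_star))
               for o, a_star in data) / len(data)


def train_bc(policy: LinearPolicy, data: List[Tuple[Obs, Act]],
             epochs: int = 200, lr: float = 0.05, batch_size: int = 32,
             seed: int = 0) -> float:
    """Behavioral cloning: fit the policy to expert (obs, action) pairs with SGD."""
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_W = [[0.0] * len(row) for row in policy.W]
            grad_b = [0.0] * len(policy.b)
            for obs, expert_act in batch:
                for i, (p, a) in enumerate(zip(policy.act(obs), expert_act)):
                    g = 2.0 * (p - a) / len(batch)  # d(batch MSE) / d(prediction_i)
                    grad_b[i] += g
                    for j, o in enumerate(obs):
                        grad_W[i][j] += g * o
            # Plain gradient-descent step on every parameter
            for i, row in enumerate(policy.W):
                policy.b[i] -= lr * grad_b[i]
                for j in range(len(row)):
                    row[j] -= lr * grad_W[i][j]
    return mse(policy, data)  # final training loss
```

For example, cloning a toy linear "expert" `a = 2*o0 - o1 + 0.5` from 200 random observations should drive the training loss close to zero. Real MuJoCo experts are nonlinear, which is why the assignment uses an MLP; the training loop has the same shape either way.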
  2. Run behavioral cloning (BC) and report results on two tasks: one task where a behavioral cloning agent achieves at least 30% of the performance of the expert, and one task where it does not. When providing results, report the mean and standard deviation of the return over multiple rollouts in a table, and state which task was used. Be sure to set up a fair comparison, in terms of network size, amount of data, and number of training iterations, and provide these details (and any others you feel are appropriate) in the table caption.

Tip: to speed up run times, video logging can be disabled by setting video_log_freq to -1.
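
The table asks for the mean and standard deviation of the return over several rollouts. A minimal sketch of that computation follows, assuming a hypothetical simplified environment interface: `reset()` returns an observation and `step(action)` returns `(obs, reward, done)`. This is not the exact Gym API, and `evaluate` is an illustrative helper, not part of the starter code. It reports the population standard deviation (the same convention as NumPy's default `np.std`); whichever convention you use, it is worth stating in the caption.

```python
import statistics
from typing import Callable, Tuple


def evaluate(policy: Callable[[float], float],
             reset: Callable[[], float],
             step: Callable[[float], Tuple[float, float, bool]],
             n_rollouts: int = 5,
             max_steps: int = 1000) -> Tuple[float, float]:
    """Run n_rollouts episodes and return (mean, population std) of the returns."""
    returns = []
    for _ in range(n_rollouts):
        obs, total = reset(), 0.0
        for _ in range(max_steps):
            obs, reward, done = step(policy(obs))
            total += reward  # undiscounted return
            if done:
                break
        returns.append(total)
    return statistics.mean(returns), statistics.pstdev(returns)
```

Because each rollout's return is a single number, the statistics are computed across rollouts, not across time steps within a rollout.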

  3. Experiment with one hyperparameter that affects the performance of the behavioral cloning agent, such as the number of demonstrations, the number of training epochs, the variance of the expert policy, or something that you come up with yourself. For one of the tasks used in the previous question, show a graph of how the BC agent's performance varies with the value of this hyperparameter, and state the hyperparameter and a brief rationale for choosing it in the caption of the graph.

Section 2. DAgger

  1. Implement DAgger by filling out all the remaining blanks in the code marked with Todo. A command for running DAgger is provided in the Readme file.
  2. Run DAgger and report results on one task in which DAgger can learn a better policy than behavioral cloning. Report your results in the form of a learning curve, plotting the number of DAgger iterations vs. the policy's mean return, with error bars to show the standard deviation. Include the performance of the expert policy and the behavioral cloning agent on the same plot. In the caption, state which task you used, and any details regarding network architecture, amount of data, etc. (as in the previous section).
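
The essential difference between DAgger and behavioral cloning is where the training states come from: after the first round, the learner's own policy chooses which states are visited, the expert only labels them, and the labeled states are aggregated into the dataset before retraining. The sketch below illustrates that loop under heavy simplifications, all of them assumptions: a toy 1-D system `o' = o + a + noise` stands in for a MuJoCo task, and a closed-form least-squares fit of `a = w*o + b` (the hypothetical `fit_linear`) stands in for training the MLP policy. `rollout` and `dagger` are likewise illustrative names, not starter-code functions.

```python
import random
from typing import Callable, List, Tuple


def fit_linear(data: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit a = w*o + b (a stand-in for the supervised BC step)."""
    n = len(data)
    mo = sum(o for o, _ in data) / n
    ma = sum(a for _, a in data) / n
    var = sum((o - mo) ** 2 for o, _ in data)
    cov = sum((o - mo) * (a - ma) for o, a in data)
    w = cov / var if var > 0 else 0.0
    return w, ma - w * mo


def rollout(policy: Callable[[float], float], steps: int,
            rng: random.Random) -> List[float]:
    """Record the states visited by `policy` on the toy system o' = o + a + noise."""
    obs, visited = rng.uniform(-2, 2), []
    for _ in range(steps):
        visited.append(obs)
        obs = obs + policy(obs) + rng.gauss(0, 0.1)
    return visited


def dagger(expert: Callable[[float], float], n_iters: int = 5,
           steps: int = 20, seed: int = 0) -> Tuple[float, float, int]:
    rng = random.Random(seed)
    # 1) Initial dataset: states visited by the expert, labeled by the expert
    data = [(o, expert(o)) for o in rollout(expert, steps, rng)]
    w, b = fit_linear(data)  # this first fit is plain behavioral cloning
    for _ in range(n_iters):
        # 2) Run the current learner to collect the states *it* visits
        visited = rollout(lambda o: w * o + b, steps, rng)
        # 3) Relabel those states with expert actions and aggregate
        data += [(o, expert(o)) for o in visited]
        # 4) Retrain on the aggregated dataset
        w, b = fit_linear(data)
    return w, b, len(data)
```

For a learning curve like the one requested, you would evaluate the policy (mean and standard deviation of the return) after each retraining step. In this toy case the expert is itself linear, so behavioral cloning already recovers it exactly; DAgger's advantage appears when the learner's mistakes push it into states the expert demonstrations never covered.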
