CMU 10403 Homework 2: Imitation Learning

Problem 1: Behavior Cloning and DAGGER

In this problem, you will implement behavior cloning using supervised imitation learning from an expert policy. Both this problem and the GAIL problem will be implemented in the following Colab notebook:

https://colab.research.google.com/drive/1dlur0hqm9hrGBflReCzNCwLBt5MnGtb5

This Colab notebook runs in the cloud, so you should not need to install anything on your laptop.

Background: Imitation Module

We have provided you with some function templates in the Colab notebook that you should implement. If you need to modify the function signatures, you may do so, but specify in your report what you changed and why.

Preliminaries

The following is the same as HW1's CMA-ES question; you may copy your solutions from there.

First, you will implement a TensorFlow/Keras model. You will use this model a number of times in the subsequent problems. To test that your model is implemented correctly, you will use it to solve a toy classification task.
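As a concrete reference, the sketch below shows one way such a Keras classifier could be set up. The function name make_model, the layer sizes, and the compile settings are illustrative assumptions, not the notebook's actual template.

    import tensorflow as tf

    def make_model(input_dim, num_classes, hidden_units=64):
        # Hypothetical sketch: a small fully connected classifier.
        # The notebook's template may use different names and signatures.
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(hidden_units, activation='relu'),
            tf.keras.layers.Dense(hidden_units, activation='relu'),
            tf.keras.layers.Dense(num_classes, activation='softmax'),
        ])
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        return model

Integer class labels pair with sparse_categorical_crossentropy; if the toy task provides one-hot labels, categorical_crossentropy would be the corresponding choice.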

Next, you will implement a function to collect data using a policy in an environment.

This function will be used in subsequent problems.
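One way to structure such a rollout helper is sketched below. It assumes the classic Gym API (reset() returning a state and step() returning a 4-tuple) and a Keras policy that maps a batch of states to action probabilities; the name generate_episode is a placeholder.

    import numpy as np

    def generate_episode(env, policy):
        # Hypothetical rollout: run one episode with `policy` in a Gym-style
        # env and return the visited states, chosen actions, and rewards.
        states, actions, rewards = [], [], []
        state = env.reset()
        done = False
        while not done:
            probs = policy.predict(np.array([state]), verbose=0)[0]
            action = int(np.argmax(probs))
            next_state, reward, done, _ = env.step(action)
            states.append(state)
            actions.append(action)
            rewards.append(reward)
            state = next_state
        return np.array(states), np.array(actions), np.array(rewards)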

Behavior Cloning

Start by implementing the train() method of the Imitation class. To test your behavior cloning implementation, you will run the following cell titled Experiment: Student vs Expert.
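In outline, behavior cloning reduces to supervised learning on expert-labeled data: collect (state, action) pairs from the expert and fit the student model to predict the expert's action in each state. The sketch below illustrates that idea using the generate_episode helper sketched earlier; the function name and return values are assumptions, not the Imitation class's actual interface.

    import numpy as np

    def behavior_cloning_step(student_model, expert_policy, env,
                              num_expert_episodes=10):
        # Hypothetical sketch of one behavior-cloning training iteration:
        # gather expert demonstrations, then fit the student on them.
        all_states, all_actions = [], []
        for _ in range(num_expert_episodes):
            states, actions, _ = generate_episode(env, expert_policy)
            all_states.append(states)
            all_actions.append(actions)
        X = np.concatenate(all_states)   # expert-visited states
        y = np.concatenate(all_actions)  # expert actions as labels
        history = student_model.fit(X, y, epochs=1, verbose=0)
        return history.history['loss'][-1], history.history['accuracy'][-1]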

  1. [15 pts] Run your behavior cloning implementation for 100 iterations. Plot the reward, training loss, and training accuracy throughout training.
  2. [10 pts] This question studies how the amount of expert data affects performance. You will run the same experiment as above, each time varying the number of expert episodes collected at each iteration. Use values of 1, 10, 50, and 100. As before, plot the reward, loss, and accuracy for each run, remembering to label each line.

DAGGER

In the previous problem, you saw that when the cloned agent reaches states far from the expert's usual demonstration states, it does a worse job of controlling the cart-pole than the expert. In this problem you will implement the DAGGER algorithm [2]. Implementing DAGGER is quite straightforward. First, implement the generate_dagger_data() method of the Imitation class. Second, in the cell titled Experiment: Student vs Expert, set the mode to dagger.
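The key difference from behavior cloning is that DAGGER rolls out the student to decide which states are visited, but labels every visited state with the expert's action. A hedged sketch of that data-collection step, under the same Gym-style assumptions as the rollout helper above, might look like this; the name generate_dagger_data mirrors the template, but the body is illustrative.

    import numpy as np

    def generate_dagger_data(env, student_model, expert_policy):
        # Hypothetical sketch: the student's actions drive the environment,
        # so states come from the student's own state distribution, while
        # the expert relabels each visited state as the training target.
        states, expert_actions = [], []
        state = env.reset()
        done = False
        while not done:
            student_probs = student_model.predict(np.array([state]), verbose=0)[0]
            expert_probs = expert_policy.predict(np.array([state]), verbose=0)[0]
            states.append(state)
            expert_actions.append(int(np.argmax(expert_probs)))
            state, _, done, _ = env.step(int(np.argmax(student_probs)))
        return np.array(states), np.array(expert_actions)

The aggregated (state, expert action) pairs are added to the dataset, and the student is retrained on everything collected so far.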

  1. Run your DAGGER implementation for 100 iterations. Plot the reward, training loss, and training accuracy throughout training. We have included code for plotting in the following cell.
  2. This question studies how the amount of expert data affects performance. You will run the same experiment as above, each time varying the number of expert episodes collected at each iteration. Use values of 1, 10, 50, and 100. As before, plot the reward, loss, and accuracy for each run, remembering to label each line.
  3. Does your DAGGER implementation outperform your behavior cloning implementation? Generate a hypothesis to explain this observation. What experiment could you run to test this hypothesis?
