CSE527 Homework 6: Estimating the 3D Pose of a Person Given Their 2D Pose


In this homework we are going to work on estimating the 3D pose of a person given their 2D pose. It turns out that simply regressing the 3D pose coordinates from the 2D pose works quite well [1] (the paper is available at https://arxiv.org/pdf/1705.03098.pdf). In Part One, we are going to reproduce the results of the paper; in Part Two, we are going to find a way to handle noisy measurements.

Some Tutorials (PyTorch)

You will be using PyTorch as your deep learning toolbox (see http://pytorch.org for installation instructions).

For PyTorch beginners, please read this tutorial (http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) before doing your homework.

Feel free to study more tutorials at http://pytorch.org/tutorials/.

Find cool visualizations at http://playground.tensorflow.org.

Starter Code

In the starter code, you are provided with a function that loads data into minibatches for training and testing in PyTorch.
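
The loader's exact interface ships with the starter code, so it is not reproduced here. Purely as an illustrative sketch, a PyTorch minibatch loader over paired 2D/3D pose arrays could look like the following (make_loader and the array shapes are assumptions, not the starter code's API):

    import torch
    from torch.utils.data import TensorDataset, DataLoader

    def make_loader(pose_2d, pose_3d, batch_size=64, shuffle=True):
        """Wrap paired (N, 2n) 2D and (N, 3n) 3D pose arrays as minibatches."""
        dataset = TensorDataset(
            torch.as_tensor(pose_2d, dtype=torch.float32),
            torch.as_tensor(pose_3d, dtype=torch.float32),
        )
        return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)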

Benchmark

Train for at least 30 epochs to get at most 44 mm average error. The test results (mm error) should be reported in the following sequence: direct. discuss. eat. greet. phone photo pose purch. sit sitd. smoke wait walkd. walk walkT avg

Problem 1:

{60 points} Let us first start by trying to reproduce the testing accuracy obtained in the paper above (https://arxiv.org/pdf/1705.03098.pdf) using PyTorch. The 2D pose of a person is represented as a set of 2D coordinates for each of their $n = 32$ joints, i.e. $P^{2D}_i = \{(x^1_i, y^1_i), \ldots, (x^{32}_i, y^{32}_i)\}$, where $(x^j_i, y^j_i)$ are the 2D coordinates of the $j$-th joint of the $i$-th sample. Similarly, the 3D pose of a person is $P^{3D}_i = \{(x^1_i, y^1_i, z^1_i), \ldots, (x^{32}_i, y^{32}_i, z^{32}_i)\}$, where $(x^j_i, y^j_i, z^j_i)$ are the 3D coordinates of the $j$-th joint of the $i$-th sample.

The only data given to you are the ground-truth 3D pose and the 2D pose calculated from it using the camera parameters. You are going to train a network $f : \mathbb{R}^{2n} \to \mathbb{R}^{3n}$ that takes $P^{2D}_i$ as input and tries to regress the ground-truth 3D pose $P^{3D}_i$. The loss function used to train this network is the L2 loss between the ground-truth and the predicted pose:

$$\mathcal{L} = \frac{1}{M} \sum_{i=1}^{M} \left\lVert f\!\left(P^{2D}_i\right) - P^{3D}_i \right\rVert_2^2 \quad \text{for a minibatch of size } M$$
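
As a sketch of what such a network can look like in PyTorch, here is a minimal version of the residual linear-block design from [1] (hidden size 1024, batch norm, ReLU, dropout 0.5). The class names are illustrative, and the exact depth and width are hyperparameters you may tune:

    import torch
    import torch.nn as nn

    class LinearResBlock(nn.Module):
        """Two Linear->BatchNorm->ReLU->Dropout stages with a skip connection."""
        def __init__(self, size=1024, p_dropout=0.5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(size, size), nn.BatchNorm1d(size),
                nn.ReLU(), nn.Dropout(p_dropout),
                nn.Linear(size, size), nn.BatchNorm1d(size),
                nn.ReLU(), nn.Dropout(p_dropout),
            )

        def forward(self, x):
            return x + self.net(x)

    class PoseRegressor(nn.Module):
        """f: R^{2n} -> R^{3n} with n = 32 joints."""
        def __init__(self, n_joints=32, size=1024):
            super().__init__()
            self.inp = nn.Linear(2 * n_joints, size)
            self.blocks = nn.Sequential(LinearResBlock(size), LinearResBlock(size))
            self.out = nn.Linear(size, 3 * n_joints)

        def forward(self, x):
            return self.out(self.blocks(self.inp(x)))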

Download the Human3.6M dataset here: https://www.dropbox.com/s/e35qv3n6zlkouki/h36m.zip

Bonus: Every 1 mm drop in test error from 44 mm down to 40 mm gets you 2 extra points, and every 1 mm drop below 40 mm gets you 4 extra points.

Report the test results (mm error) in the following sequence: direct. discuss. eat. greet. phone photo pose purch. sit sitd. smoke wait walkd. walk walkT avg
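
A minimal training-loop sketch for the setup above; PoseRegressor comes from the illustrative model sketch earlier and train_loader stands in for the starter code's loader, so both names are assumptions:

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = PoseRegressor().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = torch.nn.MSELoss()  # L2 loss between predicted and GT 3D pose

    for epoch in range(30):  # the benchmark asks for at least 30 epochs
        model.train()
        for pose_2d, pose_3d in train_loader:
            pose_2d, pose_3d = pose_2d.to(device), pose_3d.to(device)
            optimizer.zero_grad()
            loss = criterion(model(pose_2d), pose_3d)
            loss.backward()
            optimizer.step()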

Problem 2:

{40 points} In this task, we're going to tackle the situation of having a faulty 3D sensor. Since the sensor is quite old, its joint detections are quite noisy:

$$\hat{x} = x_{GT} + \epsilon_x \qquad \hat{y} = y_{GT} + \epsilon_y \qquad \hat{z} = z_{GT} + \epsilon_z$$

where $(x_{GT}, y_{GT}, z_{GT})$ are the ground-truth joint locations, $(\hat{x}, \hat{y}, \hat{z})$ are the noisy measurements detected by our sensor, and $(\epsilon_x, \epsilon_y, \epsilon_z)$ are the noise values. Being grad students, we'd much rather the department spend money on free coffee and doughnuts than on a new 3D sensor. Therefore, you're going to denoise the noisy data using a linear Kalman filter.

Modelling the state using velocity and acceleration: We assume a simple, if unrealistic, model of our system: we are only going to use the position, velocity and acceleration of the joints to denoise the data. The underlying equations representing our assumptions are:

$$x_{t+1} = x_t + \dot{x}_t\,\Delta t + 0.5\,\ddot{x}_t\,\Delta t^2 \quad (1)$$
$$y_{t+1} = y_t + \dot{y}_t\,\Delta t + 0.5\,\ddot{y}_t\,\Delta t^2 \quad (2)$$
$$z_{t+1} = z_t + \dot{z}_t\,\Delta t + 0.5\,\ddot{z}_t\,\Delta t^2 \quad (3)$$

The only measurements/observations we have (i.e. our observation space) are the noisy joint locations as recorded by the 3D sensor, $o_t = (\hat{x}_t, \hat{y}_t, \hat{z}_t)$. The corresponding state space is $z_t = (x_t, y_t, z_t, \dot{x}_t, \dot{y}_t, \dot{z}_t, \ddot{x}_t, \ddot{y}_t, \ddot{z}_t)$.

Formally, a linear Kalman filter assumes the underlying dynamics of the system to be a linear Gaussian model, i.e.

$$z_0 \sim \mathcal{N}(\mu_0, \Sigma_0)$$
$$z_{t+1} = A z_t + b + \epsilon^{(1)}_t, \qquad \epsilon^{(1)}_t \sim \mathcal{N}(0, Q)$$
$$o_t = C z_t + d + \epsilon^{(2)}_t, \qquad \epsilon^{(2)}_t \sim \mathcal{N}(0, R)$$

where $A$ and $C$ are the transition_matrix and observation_matrix, respectively, that you are going to define based on equations (1), (2) and (3). The initial estimates of the other parameters can be assumed to be:

initial_state_mean := $\mu_0$ = mean(given data)
initial_state_covariance := $\Sigma_0$ = Cov(given data)
transition_offset := $b$ = 0
observation_offset := $d$ = 0
transition_covariance := $Q$ = $I$
observation_covariance := $R$ = $I$

The covariance matrices $Q$ and $R$ are hyperparameters that we initialize as identity matrices. In the code below, you must define $A$ and $C$ and use pykalman (https://pykalman.github.io/), a dedicated library for Kalman filtering in Python, to filter out the noise in the data.
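
As an illustration, here is one way to build $A$ and $C$ from equations (1), (2) and (3) for the 9-dimensional state $(x, y, z, \dot{x}, \dot{y}, \dot{z}, \ddot{x}, \ddot{y}, \ddot{z})$; the block layout below is tied to this particular state ordering, which is an assumption:

    import numpy as np

    dt = 1.0 / 50.0  # 50 Hz frame rate (see the hint below)
    I3, Z3 = np.eye(3), np.zeros((3, 3))

    # State ordering: [x, y, z, x', y', z', x'', y'', z''].
    # The first block row implements eqs. (1)-(3), the second integrates
    # acceleration into velocity, and the third keeps acceleration constant
    # between frames.
    A = np.block([
        [I3, dt * I3, 0.5 * dt**2 * I3],
        [Z3, I3,      dt * I3],
        [Z3, Z3,      I3],
    ])

    # We observe only the noisy (x, y, z) positions.
    C = np.block([[I3, Z3, Z3]])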

(Hint: gradients can be calculated using np.gradient or manually using finite differences. You can assume the frame rate to be 50 Hz.)
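
A minimal denoising sketch with pykalman, reusing A, C and dt from the block above. The helper name denoise_joint is hypothetical, and filtering each joint's trajectory independently is an assumption:

    from pykalman import KalmanFilter

    def denoise_joint(noisy_xyz):
        """Filter one joint's (T, 3) noisy trajectory; returns a (T, 3) array."""
        # Estimate velocity and acceleration via np.gradient so that mu_0 and
        # Sigma_0 can be computed from the given data, as specified above.
        vel = np.gradient(noisy_xyz, dt, axis=0)
        acc = np.gradient(vel, dt, axis=0)
        states = np.hstack([noisy_xyz, vel, acc])        # (T, 9)

        kf = KalmanFilter(
            transition_matrices=A,
            observation_matrices=C,
            transition_offsets=np.zeros(9),              # b = 0
            observation_offsets=np.zeros(3),             # d = 0
            transition_covariance=np.eye(9),             # Q = I
            observation_covariance=np.eye(3),            # R = I
            initial_state_mean=states.mean(axis=0),      # mu_0 = mean(given data)
            initial_state_covariance=np.cov(states.T),   # Sigma_0 = Cov(given data)
        )
        filtered, _ = kf.filter(noisy_xyz)               # forward Kalman pass
        return filtered[:, :3]                           # keep only the positions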

For more detailed resources on Kalman filtering, please refer to:

http://web.mit.edu/kirtley/kirtley/binlustuff/literature/control/Kalman%20filter.pdf
https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/
https://stanford.edu/class/ee363/lectures/kf.pdf
