# 1. Linear Algebra Review

Let $\theta := (\theta_1, \ldots, \theta_d) \in \mathbb{R}^d$ be a vector, and $\theta_0 \in \mathbb{R}$ be a scalar. Let the hyperplane $H$ be the set of all points $x := (x_1, \ldots, x_d) \in \mathbb{R}^d$ such that $0 = \theta^\top x + \theta_0$, where $\theta^\top x = \theta_1 x_1 + \cdots + \theta_d x_d$ is the dot product. The goal is to find the shortest distance between $H$ and a point $y \in \mathbb{R}^d$. There are many ways to solve this problem, but we will use Lagrange multipliers to familiarize ourselves with this powerful method.

Let $\tilde{x}$ be the point on $H$ that is closest to $y$. Then $\tilde{x}$ solves the optimization problem

$$
\begin{aligned}
\underset{x \in \mathbb{R}^d}{\text{minimize}} \quad & (x - y)^\top (x - y) \\
\text{subject to} \quad & \theta^\top x + \theta_0 = 0.
\end{aligned}
$$

The Lagrangian for this optimization problem is

$$
L(x, \lambda) = (x - y)^\top (x - y) + \lambda (\theta^\top x + \theta_0),
$$

where $\lambda$ is the Lagrange multiplier.
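As a warm-up, here is the same recipe applied to a smaller problem. This is an illustrative example, not part of the assignment; the symbols $u$, $v$ and $\mu$ are introduced only for this sketch. Consider minimizing $u^2 + v^2$ subject to $u + v - 1 = 0$. Form the Lagrangian, set every partial derivative to zero, and solve:

```latex
\begin{aligned}
L(u, v, \mu) &= u^2 + v^2 + \mu (u + v - 1), \\
\frac{\partial L}{\partial u} &= 2u + \mu = 0, \qquad
\frac{\partial L}{\partial v} = 2v + \mu = 0, \qquad
\frac{\partial L}{\partial \mu} = u + v - 1 = 0, \\
u = v &= -\tfrac{\mu}{2}
\;\Longrightarrow\; -\mu - 1 = 0
\;\Longrightarrow\; \mu = -1, \quad u = v = \tfrac{1}{2}.
\end{aligned}
```

The derivative with respect to the multiplier simply returns the constraint, so the stationary point is automatically feasible.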

1.1. Write down the derivatives of $L(x, \lambda)$ with respect to $x_1, \ldots, x_d$ and $\lambda$.

1.2. Equate the derivatives to zero, and solve the equations to find $\tilde{x}$.

1.3. Use $\tilde{x}$ to find the distance of $y$ to the hyperplane $H$.
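One possible way to sanity-check a closed-form answer is to compare it against a brute-force search on a small example. The sketch below assumes a hypothetical hyperplane in $\mathbb{R}^2$ (the line $3x_1 + 4x_2 - 5 = 0$) and the point $y = (0, 0)$; these values are chosen for illustration and do not come from the assignment. It samples points on the line and picks the one nearest to $y$.

```python
import numpy as np

# Hypothetical example values (not from the assignment):
# the line 3*x1 + 4*x2 - 5 = 0 in R^2 and the point y at the origin.
theta = np.array([3.0, 4.0])
theta0 = -5.0
y = np.array([0.0, 0.0])

# Parameterize points on the line by x1 = t, then solve the constraint for x2.
t = np.linspace(-10.0, 10.0, 200001)
x1 = t
x2 = -(theta0 + theta[0] * t) / theta[1]
points = np.stack([x1, x2], axis=1)

# Every sampled point should satisfy the hyperplane equation up to rounding.
residual = points @ theta + theta0

# Brute-force search for the sampled point nearest to y.
distances = np.linalg.norm(points - y, axis=1)
best = np.argmin(distances)
x_closest = points[best]
d_min = distances[best]

print("closest sampled point:", x_closest)
print("distance:", d_min)
```

For this example the search should land near the point $(0.6, 0.8)$ at distance close to $1$, which a formula derived in 1.2 and 1.3 can be checked against. The grid search only works in low dimensions; it is a check, not a substitute for the derivation.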

# 2. Probability Review

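Let $X$ and $Y$ be independent Poisson random variables, i.e.

$$
\mathbb{P}(X = x) = \frac{\alpha^x e^{-\alpha}}{x!}, \qquad
\mathbb{P}(Y = y) = \frac{\beta^y e^{-\beta}}{y!}, \qquad
\text{for all integers } x, y \ge 0,
$$

for some rates $\alpha, \beta > 0$. Let the random variable $Z = X + Y$ be their sum.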

2.1. Write $\mathbb{P}(Z = z)$ as a sum of products of $\mathbb{P}(X = x)$ and $\mathbb{P}(Y = y)$.

2.2. Show that $Z$ is also Poisson, and find its rate $\gamma$.
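A quick simulation can serve as a plausibility check on an answer to 2.2. The sketch below assumes hypothetical rates $\alpha = 2$ and $\beta = 3$ and a fixed random seed, none of which come from the assignment. It relies on the fact that the mean and variance of a Poisson variable both equal its rate, so the sample mean and sample variance of $Z$ should roughly agree if $Z$ is Poisson.

```python
import numpy as np

# Hypothetical rates and sample size, chosen only for illustration.
alpha, beta = 2.0, 3.0
n = 200_000

# Fixed seed so the experiment is repeatable.
rng = np.random.default_rng(0)
x = rng.poisson(alpha, size=n)
y = rng.poisson(beta, size=n)
z = x + y

# A Poisson variable has mean equal to variance (both equal its rate),
# so these two sample statistics should be close to each other.
z_mean = z.mean()
z_var = z.var()

print("sample mean of Z:    ", z_mean)
print("sample variance of Z:", z_var)
```

Agreement between the two numbers is consistent with $Z$ being Poisson but does not prove it; the proof still has to come from the sum in 2.1.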

40.319 STATISTICAL AND MACHINE LEARNING SPRING 2021 HOMEWORK 1

# 3. Linear Regression [20 Points]

We will use PyTorch to perform linear regression using gradient descent. Import the Boston housing data from the following link.

https://www.dropbox.com/s/kkeu8nvto35n0dt/boston.csv?dl=1

We will train a linear model that predicts the prices of houses MEDV using three inputs:

(i) average number of rooms per dwelling RM; (ii) index of accessibility to radial highways RAD; (iii) per capita crime rate by town CRIM.

You can access the selected inputs and target variables using the following code:

    import matplotlib.pyplot as plt
    import numpy

    # Path to the downloaded data file
    csv = 'boston.csv'
    # Load the comma-separated values into a 2-D NumPy array
    data = numpy.genfromtxt(csv, delimiter=',')

The data contains 506 observations on housing prices in suburban Boston. The first three columns are the inputs RM, RAD and CRIM. The last column is the target MEDV.

Convert the data to PyTorch tensors using the following code.

    import torch

    # Columns 0-2 hold the inputs RM, RAD and CRIM
    inputs = data[:, [0, 1, 2]]
    inputs = inputs.astype(numpy.float32)
    inputs = torch.from_numpy(inputs)

    # Column 3 holds the target MEDV
    target = data[:, 3]
    target = target.astype(numpy.float32)
    target = torch.from_numpy(target)

3.1. Write the code to generate (random) weights $w_{\mathrm{RM}}, w_{\mathrm{RAD}}, w_{\mathrm{CRIM}}$ and bias $b$. After that, write a function to compute the linear model.

3.2. Write a function that computes the mean squared error (MSE).

3.3. Complete the loop below to update the weights and bias using a fixed learning rate (try different values from 0.01 to 0.0001) over 200 iterations/epochs.

DUE 14 FEB. TOTAL 40 POINTS.

    for i in range(200):
        print("Epoch", i, ":")
        # compute the model predictions
        # compute the loss and its gradient
        print("Loss=", loss)
        with torch.no_grad():
            # update the weights
            # update the bias
            w.grad.zero_()
            b.grad.zero_()

(We use w.grad.zero_() and b.grad.zero_() to reset the gradients to zero because PyTorch accumulates gradients.)
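The accumulation behaviour can be seen on a toy example, separate from the regression loop. The sketch below uses a single hypothetical scalar parameter `w` and the function $w^2$, both chosen only for illustration. It calls `backward()` twice without resetting, then once more after `w.grad.zero_()`.

```python
import torch

# A single scalar parameter; requires_grad=True asks autograd to track it.
w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4.
(w ** 2).backward()
first = w.grad.item()

# Second backward pass without resetting:
# the new gradient is added to the one already stored in w.grad.
(w ** 2).backward()
accumulated = w.grad.item()

# Reset the stored gradient in place, then run one more backward pass.
w.grad.zero_()
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # prints 4.0 8.0 4.0
```

Without the reset, each epoch's update would use the sum of all gradients computed so far rather than the gradient of the current loss.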

3.4. Use the matplotlib library to plot the MSE against the number of iterations. Print the output to the PDF file that you are submitting on Gradescope.

For this problem, DO NOT use the in-built functions for the loss or the linear model in the torch library. Upload the final script as a file named [student-id].py using the Dropbox link at the start of this assignment.
