Short Answer and True/False Conceptual Questions
- These questions should be answerable without referring to external materials. Please include a brief explanation for each T/F question as well.
- [2 points] In your own words, describe what bias and variance are. What is the bias-variance tradeoff?
- [2 points] What happens to bias and variance when model complexity increases/decreases?
- [1 point] True or False: The bias of a model increases as the amount of available training data increases.
- [1 point] True or False: The variance of a model decreases as the amount of available training data increases.
- [1 point] True or False: A learning algorithm will always generalize better if we use fewer features to represent our data.
- [2 points] To obtain superior performance on new unseen data, should we use the training set or the test set to tune our hyperparameters?
- [1 point] True or False: The training error of a function on the training set provides an overestimate of the true error of that function.
- [1 point] True or False: Using L2 regularization when training a linear regression model encourages it to use fewer input features when making a prediction.
Maximum Likelihood Estimation (MLE)
- Consider a model consisting of random variables X, Y, and Z: $Y = Xw + Z$, where $Z \sim \mathrm{Unif}[-0.5, 0.5]$. Assume that Z is noise here (i.e., the independence holds) and $w \in \mathbb{R}$ is a fixed parameter (i.e., it is not random).
- [5 points] Derive the probability density function (pdf) of Y conditioned on $\{X = x\}$.
- [5 points] Assume that you have n points of training data $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ generated i.i.d. in the above setting. Derive a maximum likelihood estimator of w for the conditional distribution of Y given $\{X = x\}$. Assume that $X_i > 0$ for all $i = 1, \ldots, n$, and note that the MLE in this case may not be unique; you are required to report only one particular estimate.
- You're a Reign FC fan, and the team is five games into its 2018 season. The number of goals scored by the team in each game so far is given below:
[2,0,1,1,2].
Let's call these scores $x_1, \ldots, x_5$. Based on your (assumed i.i.d.) data, you'd like to build a model to understand how many goals the Reign are likely to score in their next game. You decide to model the number of goals scored per game using a Poisson distribution. The Poisson distribution with parameter $\lambda$ assigns every non-negative integer $x = 0, 1, 2, \ldots$ a probability given by
$$\mathrm{Poi}(x \mid \lambda) = e^{-\lambda} \frac{\lambda^x}{x!}.$$
So, for example, if $\lambda = 1.5$, then the probability that the Reign score 2 goals in their next game is approximately 0.25. To check your understanding of the Poisson, make sure you have a sense of whether raising $\lambda$ will mean more goals in general, or fewer.
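If you want to verify that 0.25 figure numerically, scipy's built-in Poisson distribution makes it a two-liner. This is just a quick sanity check, not part of the required derivation:

import numpy as np
from scipy.stats import poisson

print(poisson.pmf(2, mu=1.5))  # P(X = 2) when the rate is 1.5; approximately 0.251

# Raising the rate parameter shifts probability mass toward higher goal counts:
for lam in [0.5, 1.5, 3.0]:
    print(lam, poisson.pmf(np.arange(5), mu=lam).round(3))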
- [5 points] Derive an expression for the maximum-likelihood estimate of the parameter $\lambda$ governing the Poisson distribution, in terms of your goal counts $x_1, \ldots, x_5$. (Hint: remember that the log of the likelihood has the same maximum as the likelihood function itself.)
- [5 points] Suppose the team scores 4 goals in its sixth game. Derive the same expression for the estimate of the parameter $\lambda$ as in the prior example, now using the 6 games $x_1, \ldots, x_5, x_6 = 4$.
- [5 points] Given the goal counts, please give numerical estimates of $\lambda$ after 5 and 6 games.
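A useful way to sanity-check your hand-derived estimates is to maximize the log-likelihood numerically over a grid of candidate rates. A minimal sketch (the variable names are ours):

import numpy as np
from scipy.stats import poisson

goals = np.array([2, 0, 1, 1, 2])    # the five observed games
lams = np.linspace(0.01, 5.0, 2000)  # grid of candidate rate parameters

# Joint log-likelihood of the i.i.d. sample at each candidate rate.
loglik = np.array([poisson.logpmf(goals, mu=lam).sum() for lam in lams])
print(lams[np.argmax(loglik)])       # should agree with your closed-form MLE

The same pattern (evaluate the log-likelihood on a grid and take the argmax) also works as a check for the uniform $[0, \theta]$ problem below.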
- [10 points] In World War 2, the Allies attempted to estimate the total number of tanks the Germans had manufactured by looking at the serial numbers of the German tanks they had destroyed. The idea was that if there were n total tanks with serial numbers $\{1, \ldots, n\}$ then it is reasonable to expect the observed serial numbers of the destroyed tanks constituted a uniform random sample (without replacement) from this set. The exact maximum likelihood estimator for this so-called German tank problem is non-trivial and quite challenging to work out (try it!). For our homework, we will consider a much easier problem with a similar flavor. Let $x_1, \ldots, x_n$ be independent, uniformly distributed on the continuous domain $[0, \theta]$ for some $\theta$. What is the maximum likelihood estimate for $\theta$?
Overfitting
- Suppose we have N labeled samples drawn i.i.d. from an underlying distribution $\mathcal{D}$.
Suppose we decide to break this set into a set $S_{\text{train}}$ of size $N_{\text{train}}$ and a set $S_{\text{test}}$ of size $N_{\text{test}}$ samples for our training and test sets, so $N = N_{\text{train}} + N_{\text{test}}$ and $S = S_{\text{train}} \cup S_{\text{test}}$. Recall the definition of the true least squares error of f:
$$\epsilon_{\text{true}}(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[(f(x) - y)^2\right],$$
where the subscript $(x,y)\sim\mathcal{D}$ makes clear that our input-output pairs are sampled according to $\mathcal{D}$. Our training and test losses are defined as:
$$\hat{\epsilon}_{\text{train}}(f) = \frac{1}{N_{\text{train}}} \sum_{(x,y)\in S_{\text{train}}} (f(x) - y)^2,$$
$$\hat{\epsilon}_{\text{test}}(f) = \frac{1}{N_{\text{test}}} \sum_{(x,y)\in S_{\text{test}}} (f(x) - y)^2.$$
We then train our algorithm (for example, using linear least squares regression) on the training set to obtain $\hat{f}$.
- [6 points] (bias: the test error) Define $\mathbb{E}_{\text{train}}$ as the expectation over all training sets $S_{\text{train}}$ and $\mathbb{E}_{\text{test}}$ as the expectation over all test sets $S_{\text{test}}$. For any fixed f (chosen before we've seen any data), show that
$$\mathbb{E}_{\text{train}}\left[\hat{\epsilon}_{\text{train}}(f)\right] = \epsilon_{\text{true}}(f).$$
Use a similar line of reasoning to show that the test error is an unbiased estimate of the true error of $\hat{f}$. Specifically, show that:
$$\mathbb{E}_{\text{test}}\left[\hat{\epsilon}_{\text{test}}(\hat{f})\right] = \epsilon_{\text{true}}(\hat{f}).$$
- [5 points] (bias: the train/dev error) Is the above equation true (in general) with regards to the training loss? Specifically, does $\mathbb{E}_{\text{train}}[\hat{\epsilon}_{\text{train}}(\hat{f})]$ equal $\mathbb{E}_{\text{train}}[\epsilon_{\text{true}}(\hat{f})]$? If so, why? If not, give a clear argument as to where your previous argument breaks down.
- [8 points] Let $\mathcal{F} = (f_1, f_2, \ldots)$ be a collection of functions and let $\hat{f}_{\text{train}}$ minimize the training error, so that $\hat{\epsilon}_{\text{train}}(\hat{f}_{\text{train}}) \leq \hat{\epsilon}_{\text{train}}(f)$ for all $f \in \mathcal{F}$. Show that
$$\mathbb{E}_{\text{train}}\left[\hat{\epsilon}_{\text{train}}(\hat{f}_{\text{train}})\right] \leq \mathbb{E}_{\text{train,test}}\left[\hat{\epsilon}_{\text{test}}(\hat{f}_{\text{train}})\right].$$
(Hint: note that
$$\mathbb{E}_{\text{train,test}}\left[\hat{\epsilon}_{\text{test}}(\hat{f}_{\text{train}})\right] = \sum_{f\in\mathcal{F}} \mathbb{E}_{\text{train,test}}\left[\hat{\epsilon}_{\text{test}}(f)\,\mathbf{1}\{\hat{f}_{\text{train}} = f\}\right] = \sum_{f\in\mathcal{F}} \mathbb{E}_{\text{test}}\left[\hat{\epsilon}_{\text{test}}(f)\right]\mathbb{E}_{\text{train}}\left[\mathbf{1}\{\hat{f}_{\text{train}} = f\}\right],$$
where the second equality follows from the independence between the train and test set.)
Polynomial Regression
Relevant Files[1]
- polyreg.py
- linreg_closedform.py
- test_polyreg_univariate.py
- test_polyreg_learningCurve.py
- data/polydata.dat
- [15 points] Recall that polynomial regression learns a function $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_d x^d$, where d represents the polynomial's degree. We can equivalently write this in the form of a linear model
$$h_\theta(x) = \theta_0 \phi_0(x) + \theta_1 \phi_1(x) + \theta_2 \phi_2(x) + \cdots + \theta_d \phi_d(x), \qquad (1)$$
using the basis expansion $\phi_j(x) = x^j$. Notice that, with this basis expansion, we obtain a linear model where the features are various powers of the single univariate x. We're still solving a linear regression problem, but are fitting a polynomial function of the input.
Implement regularized polynomial regression in polyreg.py. You may implement it however you like, using gradient descent or a closed-form solution. However, I would recommend the closed-form solution since the data sets are small; for this reason, we've included an example closed-form implementation of linear regression in linreg_closedform.py (you are welcome to build upon this implementation, but make CERTAIN you understand it, since you'll need to change several lines of it). You are also welcome to build upon your implementation from the previous assignment, but you must follow the API below. Note that all matrices are actually 2D numpy arrays in the implementation.
- __init__(degree=1, regLambda=1E-8): constructor with arguments for the degree d and the regularization parameter $\lambda$
- fit(X,Y): method to train the polynomial regression model
- predict(X): method to use the trained polynomial regression model for prediction
- polyfeatures(X, degree): expands the given $n \times 1$ matrix X into an $n \times d$ matrix of polynomial features of degree d. Note that the returned matrix will not include the zero-th power.
Note that the polyfeatures(X, degree) function maps the original univariate data into its higher-order powers. Specifically, X will be an $n \times 1$ matrix ($X \in \mathbb{R}^{n \times 1}$) and this function will return the polynomial expansion of this data, an $n \times d$ matrix. Note that this function will not add in the zero-th order feature (i.e., $x_0 = 1$). You should add the $x_0$ feature separately, outside of this function, before training the model. By not including the $x_0$ column, the polyfeatures function stays more general, so it could be applied to multivariate data as well. (If it did add the $x_0$ feature, we'd end up with multiple columns of 1s for multivariate data.)
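To make the intended behavior concrete, here is a minimal sketch of one way polyfeatures could be written (illustrative only, not the required implementation):

import numpy as np

def polyfeatures(X, degree):
    """Expand an n-by-1 array into an n-by-degree array [x, x^2, ..., x^degree].

    Deliberately omits the zero-th power; the column of ones is added later."""
    X = np.asarray(X, dtype=float).reshape(-1, 1)
    return np.hstack([X ** j for j in range(1, degree + 1)])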
Also, notice that the resulting features will be badly scaled if we use them in raw form. For example, with a polynomial of degree d = 8 and x = 20, the basis expansion yields $x^1 = 20$ while $x^8 = 2.56 \times 10^{10}$, an absolutely huge difference in range. Consequently, we will need to standardize the data before solving linear regression. Standardize the data in fit() after you perform the polynomial feature expansion. You'll need to apply the same standardization transformation in predict() before you apply it to new data.
[Figure 1: Fit of polynomial regression with $\lambda = 0$ and d = 8]
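The key pitfall is recomputing the statistics at prediction time: fit() must remember the training mean and standard deviation so that predict() can reuse them. A minimal sketch of that pairing, reusing the polyfeatures sketch above and assuming regLambda > 0 (the class and attribute names are ours, not the required API; leaving the intercept unpenalized is a design choice):

import numpy as np

class PolynomialRegressionSketch:
    """Illustrative only; the required API in polyreg.py may differ in details."""

    def __init__(self, degree=1, regLambda=1e-8):
        self.degree = degree
        self.regLambda = regLambda

    def fit(self, X, y):
        Xp = polyfeatures(X, self.degree)                # powers x^1 ... x^degree
        # Standardize with TRAINING statistics, and remember them for predict().
        self.mean, self.std = Xp.mean(axis=0), Xp.std(axis=0)
        Xp = (Xp - self.mean) / self.std
        A = np.hstack([np.ones((Xp.shape[0], 1)), Xp])   # prepend the x^0 column
        reg = self.regLambda * np.eye(A.shape[1])
        reg[0, 0] = 0                                    # leave the intercept unpenalized
        self.theta = np.linalg.solve(A.T @ A + reg, A.T @ y)

    def predict(self, X):
        # Apply the SAME transformation learned during training.
        Xp = (polyfeatures(X, self.degree) - self.mean) / self.std
        A = np.hstack([np.ones((Xp.shape[0], 1)), Xp])
        return A @ self.theta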
Run test_polyreg_univariate.py to test your implementation, which will plot the learned function. In this case, the script fits a polynomial of degree d = 8 with no regularization ($\lambda = 0$). From the plot, we see that the function fits the data well, but will not generalize well to new data points. Try increasing the amount of regularization, and examine the resulting effect on the function.
- [15 points] In this problem we will examine the bias-variance tradeoff through learning curves, which provide a valuable mechanism for evaluating it. Implement the learningCurve() function in polyreg.py to compute the learning curves for a given training/test set. The learningCurve(Xtrain, ytrain, Xtest, ytest, degree, regLambda) function should take in the training data (Xtrain, ytrain), the testing data (Xtest, ytest), and values for the polynomial degree d and regularization parameter $\lambda$.
The function should return two arrays, errorTrain (the array of training errors) and errorTest (the array of testing errors). The ith index (starting from 0) of each array should hold the training error (or testing error) for learning with i + 1 training instances. Note that the 0th index actually won't matter, since we typically start displaying the learning curves with two or more instances.
When computing the learning curves, you should learn on Xtrain[0:i] for $i = 1, \ldots, \text{numInstances(Xtrain)}$, each time computing the testing error over the entire test set. There is no need to shuffle the training data, or to average the error over multiple trials; just produce the learning curves for the given training/testing sets with the instances in their given order. Recall that the error for regression problems is given by
$$\text{error} = \frac{1}{n} \sum_{i=1}^{n} \left(h_\theta(x_i) - y_i\right)^2. \qquad (2)$$
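For concreteness, here is a sketch of the learning-curve loop under those conventions, reusing the PolynomialRegressionSketch class from above (again illustrative rather than the required implementation):

import numpy as np

def learningCurve(Xtrain, ytrain, Xtest, ytest, degree, regLambda):
    n = len(Xtrain)
    errorTrain, errorTest = np.zeros(n), np.zeros(n)
    # Start at i = 2: index 0 (one training point) is never displayed, and a
    # single point has zero variance, which would break standardization.
    for i in range(2, n + 1):
        model = PolynomialRegressionSketch(degree, regLambda)
        model.fit(Xtrain[0:i], ytrain[0:i])  # learn on the first i instances
        errorTrain[i - 1] = np.mean((model.predict(Xtrain[0:i]) - ytrain[0:i]) ** 2)
        errorTest[i - 1] = np.mean((model.predict(Xtest) - ytest) ** 2)  # full test set
    return errorTrain, errorTest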
Notice the following:
- The y-axis is using a log-scale and the ranges of the y-scale are all different for the plots. The dashed black line indicates the y = 1 line as a point of reference between the plots.
- The plot of the unregularized model with d = 1 shows poor training error, indicating a high bias (i.e., it is a standard univariate linear regression fit).
- The plot of the unregularized model ($\lambda = 0$) with d = 8 shows that the training error is low, but that the testing error is high. There is a huge gap between the training and testing errors caused by the model overfitting the training data, indicating a high variance problem.
- As the regularization parameter increases (e.g., $\lambda = 1$) with d = 8, we see that the gap between the training and testing error narrows, with both the training and testing errors converging to a low value. We can see that the model fits the data well and generalizes well, and therefore does not have either a high bias or a high variance problem. Effectively, it has a good tradeoff between bias and variance.
- Once the regularization parameter is too high ($\lambda = 100$), we see that the training and testing errors are once again high, indicating a poor fit. Effectively, there is too much regularization, resulting in high bias.
Please include both your code and the generated plots in your homework. Make absolutely certain that you understand these observations, and how they relate to the learning curve plots. In practice, we can choose the value for $\lambda$ via cross-validation to achieve the best bias-variance tradeoff.
Ridge Regression on MNIST
- In this problem we will implement a regularized least squares classifier for the MNIST data set. The task is to classify handwritten images of the digits 0 through 9.
You are NOT allowed to use any of the prebuilt classifiers in sklearn. Feel free to use any method from numpy or scipy. Remember: if you are inverting a matrix in your code, you are probably doing something wrong (Hint: look at scipy.linalg.solve).
Get the data from https://pypi.python.org/pypi/python-mnist. Load the data as follows:

import numpy as np
from mnist import MNIST

def load_dataset():
    mndata = MNIST('./data/')
    X_train, labels_train = map(np.array, mndata.load_training())
    X_test, labels_test = map(np.array, mndata.load_testing())
    # Scale pixel intensities from [0, 255] to [0, 1].
    X_train = X_train / 255.0
    X_test = X_test / 255.0
    return X_train, labels_train, X_test, labels_test
Each example has features $x_i \in \mathbb{R}^d$ (with $d = 28 \times 28 = 784$) and label $z_i \in \{0, \ldots, 9\}$. You can visualize a single example $x_i$ with imshow after reshaping it to its original $28 \times 28$ image shape (and noting that the label $z_i$ matches the digit shown). We wish to learn a predictor $\hat{f}$ that takes as input a vector in $\mathbb{R}^d$ and outputs an index in $\{0, \ldots, 9\}$.
We define our training and testing classification error on a predictor f as
$$\hat{\epsilon}_{\text{train}}(f) = \frac{1}{N_{\text{train}}} \sum_{(x,z)\in \text{Training Set}} \mathbf{1}\{f(x) \neq z\},$$
$$\hat{\epsilon}_{\text{test}}(f) = \frac{1}{N_{\text{test}}} \sum_{(x,z)\in \text{Test Set}} \mathbf{1}\{f(x) \neq z\}.$$
We will use one-hot encoding of the labels: for each pair (x, z), the original label $z \in \{0, \ldots, 9\}$ is mapped to the standard basis vector $e_z$, where $e_z$ is a vector of all zeros except for a 1 in the zth position. We adopt the notation where we have n data points in our training objective with features $x_i \in \mathbb{R}^d$ and labels one-hot encoded as $y_i \in \{0, 1\}^k$, where in this case k = 10 since there are 10 digits.
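For example, one-hot encoding is a one-liner with numpy:

import numpy as np

labels = np.array([3, 0, 9])  # example digit labels z_i
Y = np.eye(10)[labels]        # shape (n, 10); row i is the basis vector e_{z_i}
print(Y[0])                   # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]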
- [10 points] In this problem we will choose a linear classifier to minimize the regularized least squares objective:
$$\widehat{W} = \operatorname*{argmin}_{W \in \mathbb{R}^{d \times k}} \sum_{i=1}^{n} \left\|W^T x_i - y_i\right\|_2^2 + \lambda \|W\|_F^2$$
Note that $\|W\|_F$ corresponds to the Frobenius norm of W, i.e., $\|W\|_F^2 = \sum_{i=1}^{d} \sum_{j=1}^{k} W_{ij}^2$. To classify a point $x_i$ we will use the rule $\arg\max_{j=0,\ldots,9} e_j^T \widehat{W}^T x_i$. Note that if we let $X \in \mathbb{R}^{n \times d}$ have $x_i^T$ as its ith row and $Y \in \{0,1\}^{n \times k}$ have $y_i^T$ as its ith row, then the objective can be written in matrix form as $\|XW - Y\|_F^2 + \lambda \|W\|_F^2$. Show that
$$\widehat{W} = (X^T X + \lambda I)^{-1} X^T Y$$
- [10 points]
- Code up a function train that takes as input $X \in \mathbb{R}^{n \times d}$, $Y \in \{0,1\}^{n \times k}$, and $\lambda > 0$, and returns $\widehat{W}$.
- Code up a function predict that takes as input $W \in \mathbb{R}^{d \times k}$, $X' \in \mathbb{R}^{m \times d}$ and returns an m-length vector with the ith entry equal to $\arg\max_{j=0,\ldots,9} e_j^T W^T x'_i$, where $x'_i \in \mathbb{R}^d$ is a column vector representing the ith example from $X'$.
- Train $\widehat{W}$ on the MNIST training data with $\lambda = 10^{-4}$ and make label predictions on the test data. What are the training and testing errors? Note that they should both be about 15%.
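To make the expected shapes concrete, here is one possible sketch of train and predict that heeds the scipy.linalg.solve hint (a sketch under the definitions above, not the required implementation):

import numpy as np
import scipy.linalg

def train(X, Y, reg_lambda):
    """Solve (X^T X + lambda*I) W = X^T Y rather than forming an explicit inverse."""
    d = X.shape[1]
    A = X.T @ X + reg_lambda * np.eye(d)
    # A is symmetric positive definite for lambda > 0, so we can tell the solver.
    return scipy.linalg.solve(A, X.T @ Y, assume_a="pos")

def predict(W, Xprime):
    """ith entry is argmax_j e_j^T W^T x'_i, i.e. a row-wise argmax of X' W."""
    return np.argmax(Xprime @ W, axis=1)

# Example usage with the loaded MNIST data and one-hot labels Y_train:
# W_hat = train(X_train, Y_train, 1e-4)
# print(np.mean(predict(W_hat, X_train) != labels_train))  # training error
# print(np.mean(predict(W_hat, X_test) != labels_test))    # testing error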
[1] Bold text indicates files or functions that you will need to complete; you should not need to modify any of the other files.