[Solved] CSC411 Machine Learning & Data Mining ASSIGNMENT#1


In this assignment we will be working with the Boston Houses dataset. This dataset contains 506 entries. Each entry consists of a house price and 13 features for houses within the Boston area. We suggest working in Python and using the scikit-learn package to load the data.

Starter code written in Python is provided for each question.

1 Learning basics of regression in Python (3%)

This question will take you step-by-step through performing basic linear regression on the Boston Houses dataset. You need to submit modular code for this question. Your code needs to be functional once downloaded. Non-functional code will result in losing all marks for this question. If your code is non-modular or otherwise difficult to understand, you risk losing a significant portion of the marks, even if the code is functional.

Environment setup: For this question you are strongly encouraged to use the following python packages:

  • sklearn
  • matplotlib
  • numpy

It is strongly recommended that you download and install Anaconda 3.4 to manage the above installations. This is a Data Science package that can be downloaded from https://www.anaconda.com/download/.

You will submit a complete regression analysis for the Boston Housing data. To do that, here are the necessary steps:

  • Load the Boston housing data from the sklearn datasets module
  • Describe and summarize the data in terms of number of data points, dimensions, target, etc.
  • Visualization: present a single grid containing plots of each feature against the target. Choose the appropriate axes for dependent vs. independent variables. Hint: use the pyplot.tight_layout function to make your grid readable
  • Divide your data into training and test sets, where the training set consists of 80% of the data points (chosen at random). Hint: You may find numpy.random.choice useful
  • Write code to perform linear regression to predict the targets using the training data. Remember to add a bias term to your model.
  • Tabulate each feature along with its associated weight and present them in a table. Explain what the sign of the weight for the third feature (INDUS) means. Does the sign match what you expected? Why?
  • Test the fitted model on your test set and calculate the Mean Square Error of the result.
  • Suggest and calculate two more error measurement metrics; justify your choice.
  • Feature Selection: Based on your results, what are the most significant features that best predict the price? Justify your answer.
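The steps above can be sketched end-to-end as follows. This is a minimal outline, not the modular solution expected for submission; note also that sklearn.datasets.load_boston has been removed from recent scikit-learn releases, so this sketch uses random stand-in data of the same shape, with the original loading call shown in a comment:

```python
import numpy as np

# Stand-in for the Boston data: 506 points, 13 features.
# With an older scikit-learn, replace this with:
#   from sklearn.datasets import load_boston
#   X, y = load_boston(return_X_y=True)
rng = np.random.RandomState(0)
X = rng.randn(506, 13)
y = X @ rng.randn(13) + rng.randn(506)

# 80/20 train/test split chosen at random (numpy.random.choice).
n = X.shape[0]
train_idx = np.random.choice(n, size=int(0.8 * n), replace=False)
test_idx = np.setdiff1d(np.arange(n), train_idx)

def add_bias(X):
    """Prepend a column of ones so the model learns a bias term."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares fit via a linear solver (no explicit matrix inversion).
w, *_ = np.linalg.lstsq(add_bias(X[train_idx]), y[train_idx], rcond=None)

# Mean squared error on the held-out test set.
pred = add_bias(X[test_idx]) @ w
mse = np.mean((pred - y[test_idx]) ** 2)
print(w.shape, mse)
```

The weight vector has 14 entries (bias plus one weight per feature); the same split indices can be reused for any additional error metrics you report.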

2 Locally reweighted regression (6%)

  1. Given {(x^(1), y^(1)), …, (x^(N), y^(N))} and positive weights a^(1), …, a^(N), show that the solution to the weighted least squares problem

w* = argmin_w (1/2) Σ_{i=1}^N a^(i) (y^(i) − w^T x^(i))^2    (1)

is given by the formula

w* = (X^T A X)^{-1} X^T A y    (2)

where X is the design matrix (defined in class) and A is a diagonal matrix with A_ii = a^(i).
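The derivation requested here can be sketched in matrix form (a standard least-squares argument, not necessarily the full write-up expected for marks):

```latex
\frac{1}{2}\sum_{i=1}^{N} a^{(i)}\left(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)}\right)^2
  = \frac{1}{2}\,(\mathbf{y} - X\mathbf{w})^\top A\,(\mathbf{y} - X\mathbf{w})
% Setting the gradient with respect to w to zero:
\nabla_{\mathbf{w}} \;=\; -X^\top A\,(\mathbf{y} - X\mathbf{w}) \;=\; \mathbf{0}
\;\Longrightarrow\; X^\top A X\,\mathbf{w} \;=\; X^\top A\,\mathbf{y}
\;\Longrightarrow\; \mathbf{w}^* \;=\; \left(X^\top A X\right)^{-1} X^\top A\,\mathbf{y}
```

Since each a^(i) > 0, the quadratic form is positive semidefinite and the stationary point is a minimizer.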

  2. Locally reweighted least squares combines ideas from k-NN and linear regression. For each new test example x we compute distance-based weights for each training example,

a^(i) = exp(−||x − x^(i)||^2 / 2τ^2) / Σ_j exp(−||x − x^(j)||^2 / 2τ^2),

compute w* = argmin_w (1/2) Σ_i a^(i) (y^(i) − w^T x^(i))^2, and predict ŷ = x^T w*. Complete the implementation of locally reweighted least squares by providing the missing parts for q2.py.

Important things to notice while implementing: First, do not invert any matrix; use a linear solver (numpy.linalg.solve is one example). Second, notice that a^(i) = exp(A_i) / Σ_j exp(A_j) with A_i = −||x − x^(i)||^2 / 2τ^2, but if we use a^(i) = exp(A_i − B) / Σ_j exp(A_j − B) with B = max_j A_j, it is much more numerically stable, as exp(A_i) overflows/underflows easily. This is handled automatically in the scipy package with the scipy.misc.logsumexp function.
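The max-subtraction trick can be sketched as follows; this is a hand-rolled version of what logsumexp does (in current SciPy the function lives at scipy.special.logsumexp rather than scipy.misc):

```python
import numpy as np

def lrls_weights(x, X_train, tau):
    """Distance-based weights a^(i), computed stably in log-space."""
    # A_i = -||x - x^(i)||^2 / (2 tau^2)
    A = -np.sum((X_train - x) ** 2, axis=1) / (2.0 * tau ** 2)
    B = np.max(A)  # subtracting the max avoids overflow/underflow in exp
    # a^(i) = exp(A_i - B) / sum_j exp(A_j - B)
    expA = np.exp(A - B)
    return expA / np.sum(expA)

X_train = np.random.randn(100, 13)
a = lrls_weights(np.zeros(13), X_train, tau=10.0)
print(a.sum())  # the weights form a distribution over training points
```

Subtracting B leaves every ratio unchanged but guarantees the largest exponent is 0, so at least one term of the denominator is exactly 1.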

  3. Use k-fold cross-validation to compute the average loss for different values of τ in the range [10, 1000] when performing regression on the Boston Houses dataset. Plot these loss values for each choice of τ.
  4. How does this algorithm behave as τ → ∞? As τ → 0?
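One way to structure the cross-validation loop is sketched below. The LRLS predictor here is a minimal stand-in for the functions in q2.py (its name and signature are illustrative, not the starter code's), and the synthetic demo data stands in for the Boston set:

```python
import numpy as np

def lrls_predict(x, X_train, y_train, tau):
    """Predict y = x^T w*, where w* solves the weighted least squares problem."""
    A_log = -np.sum((X_train - x) ** 2, axis=1) / (2.0 * tau ** 2)
    a = np.exp(A_log - np.max(A_log))  # stable softmax weights
    a /= a.sum()
    A = np.diag(a)
    # Solve (X^T A X) w = X^T A y with a linear solver rather than inverting.
    w = np.linalg.solve(X_train.T @ A @ X_train, X_train.T @ A @ y_train)
    return x @ w

def kfold_loss(X, y, tau, k=5):
    """Average squared-error loss over k random folds for one value of tau."""
    idx = np.random.permutation(len(X))
    folds = np.array_split(idx, k)
    losses = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        preds = [lrls_predict(X[i], X[train], y[train], tau) for i in fold]
        losses.append(np.mean((np.array(preds) - y[fold]) ** 2))
    return np.mean(losses)

# Example: sweep tau over [10, 1000] (synthetic stand-in data here).
rng = np.random.RandomState(0)
X_demo = rng.randn(60, 5)
y_demo = X_demo @ np.ones(5) + 0.1 * rng.randn(60)
taus = [10.0, 100.0, 1000.0]
losses = [kfold_loss(X_demo, y_demo, t) for t in taus]
print(losses)
```

Plotting the resulting losses against τ gives the curve the question asks for; very small τ concentrates all weight on the nearest neighbour, while large τ recovers ordinary (unweighted) linear regression.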

3 Mini-batch SGD Gradient Estimator (6%)

Consider a dataset D of size n consisting of (x, y) pairs. Consider also a model M with parameters θ to be optimized with respect to a loss function L(x, y, θ) = (1/n) Σ_{i=1}^n ℓ(x^(i), y^(i), θ).

We will aim to optimize L using mini-batches drawn randomly from D of size m. The indices of these points are contained in the set I = {i_1, …, i_m}, where each index is distinct and drawn uniformly without replacement from {1, …, n}. We define the loss function for a single mini-batch as

L_I(x, y, θ) = (1/m) Σ_{i∈I} ℓ(x^(i), y^(i), θ).    (3)

  1. Given a set {a_1, …, a_n} and random mini-batches I of size m, show that E_I[(1/m) Σ_{i∈I} a_i] = (1/n) Σ_{i=1}^n a_i.
  2. Show that E_I[∇L_I(x, y, θ)] = ∇L(x, y, θ).
  3. Write, in a sentence, the importance of this result.
  4. (a) Write down the gradient, ∇L above, for a linear regression model with cost function ℓ(x, y, θ) = (y − w^T x)^2.

(b) Write code to compute this gradient.
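Part (b) might look like the following sketch. It assumes the design matrix X already includes any bias column; for the cost above, the gradient of the mini-batch average (1/m) Σ (y_i − w^T x_i)^2 with respect to w is (2/m) X^T (Xw − y):

```python
import numpy as np

def lin_reg_gradient(X, y, w):
    """Gradient of (1/m) * sum_i (y_i - w^T x_i)^2 with respect to w."""
    m = X.shape[0]
    return (2.0 / m) * X.T @ (X @ w - y)

# Example usage on random data:
rng = np.random.RandomState(0)
X = rng.randn(50, 13)
y = rng.randn(50)
w = rng.randn(13)
print(lin_reg_gradient(X, y, w).shape)  # one partial derivative per weight
```

A quick finite-difference check (perturbing each w_j by a small epsilon) is a good way to convince yourself the analytic gradient is right before using it in later parts.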

  5. Using your code from the previous section, for m = 50 and K = 500 compute

(1/K) Σ_{k=1}^K ∇L_{I_k}(x, y, θ),

where I_k is the mini-batch sampled for the kth time.

Randomly initialize the weight parameters for your model from a N(0, I) distribution. Compare the value you have computed to the true gradient, ∇L, using both the squared distance metric and cosine similarity. Which is a more meaningful measure in this case and why?

[Note: Cosine similarity between two vectors a and b is given by cos(θ) = a^T b / (||a||_2 ||b||_2).]
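The comparison can be sketched as follows, again with random stand-in data of the Boston dataset's shape (506 × 13) rather than the dataset itself:

```python
import numpy as np

def batch_gradient(X, y, w):
    """(2/m) X^T (Xw - y): gradient of the mean squared cost."""
    return (2.0 / X.shape[0]) * X.T @ (X @ w - y)

def squared_distance(a, b):
    return np.sum((a - b) ** 2)

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

n, d, m, K = 506, 13, 50, 500
rng = np.random.RandomState(0)
X = rng.randn(n, d)
y = X @ rng.randn(d) + rng.randn(n)
w = rng.randn(d)  # weights drawn from N(0, I)

# Average the mini-batch gradient over K random batches of size m.
est = np.zeros(d)
for _ in range(K):
    idx = rng.choice(n, size=m, replace=False)
    est += batch_gradient(X[idx], y[idx], w)
est /= K

true_grad = batch_gradient(X, y, w)
print(squared_distance(est, true_grad), cosine_similarity(est, true_grad))
```

Because each mini-batch gradient is an unbiased estimate, averaging K of them should land very close to the full-batch gradient in direction; the squared distance, by contrast, depends on the gradient's overall scale.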

  6. For a single parameter, w_j, compare the sample variance, σ_j, of the mini-batch gradient estimate for values of m in the range [0, 400] (using K = 500 again). Plot log σ_j against log m.
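A sketch of the variance experiment, once more on synthetic stand-in data (the plotting call is left as a comment since it needs a display):

```python
import numpy as np

def batch_gradient(X, y, w):
    """(2/m) X^T (Xw - y): gradient of the mean squared cost."""
    return (2.0 / X.shape[0]) * X.T @ (X @ w - y)

rng = np.random.RandomState(0)
n, d, K, j = 506, 13, 500, 0  # j: the single parameter being tracked
X = rng.randn(n, d)
y = X @ rng.randn(d) + rng.randn(n)
w = rng.randn(d)

ms = np.arange(5, 401, 5)  # mini-batch sizes (starting above 0)
variances = []
for m in ms:
    # Sample K mini-batch gradients and record the j-th coordinate of each.
    g_j = [batch_gradient(X[idx], y[idx], w)[j]
           for idx in (rng.choice(n, m, replace=False) for _ in range(K))]
    variances.append(np.var(g_j))

# Plot log(sigma_j) against log(m), e.g.:
# import matplotlib.pyplot as plt
# plt.plot(np.log(ms), np.log(variances)); plt.show()
```

On the log-log plot the points should fall near a line of slope roughly −1, since the variance of an average over m samples shrinks like 1/m (slightly faster here because sampling is without replacement).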
