In this assignment we will be working with the Boston Houses dataset. This dataset contains 506 entries. Each entry consists of a house price and 13 features for houses within the Boston area. We suggest working in python and using the scikit-learn package to load the data.
Starter code written in python is provided for each question.
1 Learning basics of regression in Python (3%)
This question will take you step-by-step through performing basic linear regression on the Boston Houses dataset. You need to submit modular code for this question. Your code needs to be functional once downloaded. Non-functional code will result in losing all marks for this question. If your code is non-modular or otherwise difficult to understand, you risk losing a significant portion of the marks, even if the code is functional.
Environment setup: For this question you are strongly encouraged to use the following python packages:
- sklearn
- matplotlib
- numpy
It is strongly recommended that you download and install Anaconda 3.4 to manage the above installations. This is a data science distribution that can be downloaded from https://www.anaconda.com/download/.
You will submit a complete regression analysis for the Boston Housing data. To do that, here are the necessary steps:
- Load the Boston housing data from the sklearn datasets module
- Describe and summarize the data in terms of number of data points, dimensions, target, etc
- Visualization: present a single grid containing plots of each feature against the target. Choose the appropriate axes for the dependent and independent variables. Hint: use the pyplot.tight_layout function to make your grid readable
- Divide your data into training and test sets, where the training set consists of 80% of the data points (chosen at random). Hint: You may find numpy.random.choice useful
- Write code to perform linear regression to predict the targets using the training data. Remember to add a bias term to your model.
- Tabulate each feature along with its associated weight. Explain what the sign of the weight for the third feature (INDUS) in this table means. Does the sign match what you expected? Why?
- Test the fitted model on your test set and calculate the Mean Square Error of the result.
- Suggest and calculate two more error measurement metrics; justify your choice.
- Feature Selection: Based on your results, what are the most significant features that best predict the price? Justify your answer.
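The steps above can be sketched as follows. This is a minimal outline, not a complete submission: in the assignment you would load X and y with sklearn.datasets.load_boston (note that load_boston was removed in scikit-learn 1.2), whereas this sketch uses a synthetic stand-in of the same shape so it runs on any scikit-learn version.

```python
import numpy as np

# Synthetic stand-in for the Boston data: 506 points, 13 features.
# In the assignment, replace this with sklearn.datasets.load_boston().
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
true_w = rng.normal(size=13)
y = X @ true_w + 1.0 + rng.normal(scale=0.1, size=506)

# 80/20 train/test split via a random permutation of indices.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train, test = idx[:n_train], idx[n_train:]

def add_bias(A):
    # Prepend a column of ones so the model learns an intercept (bias) term.
    return np.hstack([np.ones((len(A), 1)), A])

# Least-squares fit: w = argmin ||X_train w - y_train||^2.
w, *_ = np.linalg.lstsq(add_bias(X[train]), y[train], rcond=None)

# Mean squared error on the held-out test set.
y_pred = add_bias(X[test]) @ w
mse = np.mean((y_pred - y[test]) ** 2)
print(round(float(mse), 4))
```

The same pattern (split, add bias column, solve, evaluate) carries over directly once the real data is loaded; numpy.random.choice over the index range works equally well for the split.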
2 Locally reweighted regression (6%)
- Given {(x^(1), y^(1)), ..., (x^(N), y^(N))} and positive weights a^(1), ..., a^(N), show that the solution to the weighted least squares problem

w* = argmin_w (1/2) Σ_{i=1}^{N} a^(i) (y^(i) − w^T x^(i))^2 (1)

is given by the formula

w* = (X^T A X)^{−1} X^T A y (2)

where X is the design matrix (defined in class) and A is a diagonal matrix with A_ii = a^(i)
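As a sanity check on the claim (a sketch only, not a substitute for the full derivation the question asks for), writing the objective in matrix form makes the normal equations apparent:

```latex
\begin{aligned}
\mathcal{L}(w) &= \tfrac{1}{2}\sum_{i=1}^{N} a^{(i)}\bigl(y^{(i)} - w^\top x^{(i)}\bigr)^2
               = \tfrac{1}{2}\,(y - Xw)^\top A\,(y - Xw) \\
\nabla_w \mathcal{L} &= -X^\top A\,(y - Xw) = 0
   \;\Longrightarrow\; X^\top A X\, w = X^\top A y
   \;\Longrightarrow\; w^* = (X^\top A X)^{-1} X^\top A y
\end{aligned}
```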
- Locally reweighted least squares combines ideas from k-NN and linear regression. For each new test example x we compute distance-based weights for each training example

a^(i) = exp(−||x − x^(i)||^2 / 2τ^2) / Σ_j exp(−||x − x^(j)||^2 / 2τ^2),

compute w* = argmin_w (1/2) Σ_i a^(i) (y^(i) − w^T x^(i))^2, and predict ŷ = x^T w*. Complete the implementation of locally reweighted least squares by providing the missing parts for q2.py.
Important things to notice while implementing: First, do not invert any matrix; use a linear solver (numpy.linalg.solve is one example). Second, notice that

exp(A_i) / Σ_j exp(A_j) = exp(A_i − B) / Σ_j exp(A_j − B),

and if we use B = max_j A_j it is much more numerically stable, since exp overflows/underflows easily. This is handled automatically in the scipy package with the scipy.misc.logsumexp function (scipy.special.logsumexp in recent versions).
- Use k-fold cross-validation to compute the average loss for different values of τ in the range [10, 1000] when performing regression on the Boston Houses dataset. Plot these loss values for each choice of τ.
- How does this algorithm behave when τ → ∞? When τ → 0?
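The weight computation and the weighted solve described above can be sketched as below. The function names are illustrative, not the ones expected in q2.py; the sketch uses scipy.special.logsumexp (the current home of the scipy.misc.logsumexp function the handout mentions) and numpy.linalg.solve rather than a matrix inverse.

```python
import numpy as np
from scipy.special import logsumexp  # scipy.misc.logsumexp in older scipy

def lwr_weights(x, X_train, tau):
    """Distance-based weights a(i) for a test point x, computed in log space.

    a(i) = exp(A_i - logsumexp(A)) with A_i = -||x - x(i)||^2 / (2 tau^2);
    logsumexp subtracts max_j A_j internally, avoiding exp over/underflow.
    """
    A = -np.sum((X_train - x) ** 2, axis=1) / (2.0 * tau ** 2)
    return np.exp(A - logsumexp(A))

def lwr_predict(x, X_train, y_train, tau):
    # Solve (X^T A X) w = X^T A y with a linear solver instead of inverting.
    a = lwr_weights(x, X_train, tau)
    XTA = X_train.T * a  # broadcasting applies the diagonal of A
    w = np.linalg.solve(XTA @ X_train, XTA @ y_train)
    return x @ w
```

As τ → ∞ the weights become uniform and the method reduces to ordinary least squares; as τ → 0 the weight concentrates on the nearest training point, mimicking 1-NN.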
3 Mini-batch SGD Gradient Estimator (6%)
Consider a dataset D of size n consisting of (x, y) pairs. Consider also a model M with parameters θ to be optimized with respect to a loss function L(x, y, θ) = (1/n) Σ_{i=1}^{n} ℓ(x^(i), y^(i), θ).
We will aim to optimize L using mini-batches drawn randomly from D of size m. The indices of these points are contained in the set I = {i_1, ..., i_m}, where each index is distinct and drawn uniformly without replacement from {1, ..., n}. We define the loss function for a single mini-batch as

L_I(x, y, θ) = (1/m) Σ_{i∈I} ℓ(x^(i), y^(i), θ). (3)
- Given a set {a_1, ..., a_n} and random mini-batches I of size m, show that E_I[(1/m) Σ_{i∈I} a_i] = (1/n) Σ_{i=1}^{n} a_i
- Show that E_I[∇_θ L_I(x, y, θ)] = ∇_θ L(x, y, θ)
- Write, in a sentence, the importance of this result.
- (a) Write down the gradient, ∇_θ L above, for a linear regression model with cost function ℓ(x, y, θ) = (y − w^T x)^2.
(b) Write code to compute this gradient.
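For part (b), the averaged gradient can be sketched in a few lines (the function name here is illustrative). Since the per-example gradient of (y − w^T x)^2 with respect to w is −2(y − w^T x)x, averaging over the batch gives −(2/n) X^T (y − Xw):

```python
import numpy as np

def lin_reg_gradient(X, y, w):
    """Gradient of L(X, y, w) = (1/n) sum_i (y_i - w^T x_i)^2 w.r.t. w.

    Each example contributes -2 (y_i - w^T x_i) x_i; in matrix form the
    average over n examples is -(2/n) X^T (y - X w).
    """
    return -2.0 / len(y) * X.T @ (y - X @ w)
```

A quick check: at the least-squares solution the normal equations give X^T(y − Xw) = 0, so this gradient should vanish there.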
- Using your code from the previous section, for m = 50 and K = 500 compute

(1/K) Σ_{k=1}^{K} ∇L_{I_k}(x, y, θ),

where I_k is the mini-batch sampled for the kth time. Randomly initialize the weight parameters for your model from a N(0, I) distribution. Compare the value you have computed to the true gradient, ∇L, using both the squared distance metric and cosine similarity. Which is a more meaningful measure in this case and why?
[Note: Cosine similarity between two vectors a and b is given by cos(θ) = a^T b / (||a||_2 ||b||_2).]
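The comparison above can be sketched as follows on synthetic data of the Boston shape (13 features). All names here are illustrative; in the assignment you would plug in your own gradient code and the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, K = 500, 13, 50, 500
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)
w = rng.normal(size=d)  # weights initialized from N(0, I), as in the question

def grad(Xb, yb, w):
    # Gradient of the mean squared loss (1/|batch|) sum (y - w^T x)^2.
    return -2.0 / len(yb) * Xb.T @ (yb - Xb @ w)

# Average K mini-batch gradient estimates; each batch of m distinct indices.
grads = []
for _ in range(K):
    idx = rng.choice(n, size=m, replace=False)
    grads.append(grad(X[idx], y[idx], w))
est = np.mean(grads, axis=0)
true = grad(X, y, w)

sq_dist = float(np.sum((est - true) ** 2))
cosine = float(est @ true / (np.linalg.norm(est) * np.linalg.norm(true)))
print(sq_dist, cosine)
```

Because each mini-batch gradient is an unbiased estimate, the K-sample average lands very close to the true gradient in direction, while the squared distance depends on the gradient's overall scale.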
- For a single parameter, w_j, compare the sample variance, σ_j, of the mini-batch gradient estimate for values of m in the range [1, 400] (using K = 500 again). Plot log σ_j against log m.