**Generating Synthetic Data**

This assignment shows how we can extend ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions and radial basis functions. The function we want to fit is $y_{\mathrm{true}} = f_{\mathrm{true}}(x) = 6\,(\sin(x + 2) + \sin(2x + 4))$. This is a univariate function, as it has only one input variable. First, we generate synthetic input (data) $x_i$ by sampling $n = 750$ points from a uniform distribution on the interval $[-7.5, 7.5]$.
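Since the data-generation cell below calls f_true, here is a minimal sketch of that helper, implementing the formula above (the exact body is an assumption; only the formula and the name f_true come from the handout):

In [ ]:
import numpy as np  # Needed for the vectorized sin

def f_true(x):
    # True function: y = 6 * (sin(x + 2) + sin(2x + 4))
    return 6.0 * (np.sin(x + 2) + np.sin(2 * x + 4))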
We can generate a synthetic data set with Gaussian noise.

In [ ]:
import numpy as np                    # For all our math needs

n = 750                               # Number of data points
X = np.random.uniform(-7.5, 7.5, n)   # Training examples, in one dimension
e = np.random.normal(0.0, 5.0, n)     # Random Gaussian noise
y = f_true(X) + e                     # True labels with noise

Now, we plot the raw data as well as the true function (without noise).
Recall that we want to build a model that generalizes well to future data. In order to do that, we need to pick a model that trades off well between fit and complexity (that is, bias and variance). We randomly split the overall data set (D) into three subsets:

Training set: Dtrn consists of the actual training examples that will be used to train the model;

Validation set: Dval consists of validation examples that will be used to tune model hyperparameters (such as $\lambda > 0$ in ridge regression) in order to find the best trade-off between fit and complexity (that is, the value of $\lambda$ that produces the best model);

Test set: Dtst consists of held-out test examples used only to estimate how well the final model will perform on future, unseen data.
For this example, let us randomly partition the data into three non-intersecting sets: Dtrn = 60% of D, Dval = 10% of D, and Dtst = 30% of D.

In [ ]:
# scikit-learn has many tools and utilities for model selection
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt      # For the plots below

tst_frac = 0.3   # Fraction of examples to sample for the test set
val_frac = 0.1   # Fraction of examples to sample for the validation set

# First, we use train_test_split to partition (X, y) into training and test sets
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=tst_frac, random_state=42)

# Next, we use train_test_split to further partition (X_trn, y_trn) into training and validation sets
X_trn, X_val, y_trn, y_val = train_test_split(X_trn, y_trn, test_size=val_frac, random_state=42)

# Plot the three subsets
plt.figure()
plt.scatter(X_trn, y_trn, 12, marker='o', color='orange')
plt.scatter(X_val, y_val, 12, marker='o', color='green')
plt.scatter(X_tst, y_tst, 12, marker='o', color='blue')

1. **Regression with Polynomial Basis Functions**, 30 points.

This problem extends ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions. In order to learn nonlinear models using linear regression, we have to explicitly transform the data into a higher-dimensional space. The nonlinear hypothesis class we will consider is the set of $d$-degree polynomials of the form

$$f(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_d x^d,$$

or, equivalently, a linear combination of polynomial basis functions:

$$f(x) = [w_0, w_1, w_2, \ldots, w_d]^T \begin{bmatrix} 1 \\ x \\ x^2 \\ \vdots \\ x^d \end{bmatrix}.$$

The monomials $\{1, x, x^2, \ldots, x^d\}$ are called basis functions, and each basis function $x^k$ has a corresponding weight $w_k$ associated with it, for all $k = 1, \ldots, d$. We transform each univariate data point $x_i$ into a multivariate ($d$-dimensional) data point via $\phi(x_i) = [1, x_i, x_i^2, \ldots, x_i^d]$. When this transformation is applied to every data point, it produces the Vandermonde matrix:

$$\Phi = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^d \\ 1 & x_2 & x_2^2 & \cdots & x_2^d \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^d \end{bmatrix}.$$
a. Complete the Python function below that takes univariate data as input and computes a Vandermonde matrix of degree d. This transforms one-dimensional data into d-dimensional data in terms of the polynomial basis and allows us to model regression using a d-degree polynomial.
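A minimal sketch of one possible implementation (the name polynomial_transform is taken from the driver code further below; the body is an assumption, not the required solution):

In [ ]:
import numpy as np

def polynomial_transform(X, d):
    """Transform univariate data X into a Vandermonde matrix with columns X**0, ..., X**d."""
    # np.vander with increasing=True puts the constant column first
    return np.vander(X, N=d + 1, increasing=True)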
b. Complete the Python function below that takes a Vandermonde matrix $\Phi$ and the labels $y$ as input and learns weights via ordinary least squares regression. Specifically, given a Vandermonde matrix $\Phi$, implement the computation of $w = (\Phi^T \Phi)^{-1} \Phi^T y$. Remember that in Python, @ performs matrix multiplication, while * performs element-wise multiplication. Alternately, numpy.dot (https://docs.scipy.org/doc/numpy1.15.0/reference/generated/numpy.dot.html) also performs matrix multiplication.
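One way the closed-form solution might be implemented (the name train_model matches the driver code below; using np.linalg.solve instead of an explicit inverse is a design choice for numerical stability, not a requirement):

In [ ]:
import numpy as np

def train_model(Phi, y):
    """Learn OLS weights w = (Phi^T Phi)^{-1} Phi^T y."""
    # Solve the normal equations (Phi^T Phi) w = Phi^T y
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)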
c. Complete the Python function below that takes a Vandermonde matrix $\Phi$, corresponding labels $y$, and a linear regression model $w$ as input and evaluates the model using mean squared error. That is,

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - w^T \phi(x_i) \right)^2,$$

where $\phi(x_i)$ is the $i$-th row of $\Phi$.
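A minimal sketch of the evaluation function (the name evaluate_model matches the driver code below):

In [ ]:
import numpy as np

def evaluate_model(Phi, y, w):
    """Return the mean squared error of the predictions Phi @ w against the labels y."""
    y_pred = Phi @ w
    return np.mean((y - y_pred) ** 2)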
d. We can explore the effect of complexity by varying d = 3, 6, 9, ..., 24 to steadily increase the non-linearity of the models. For each model, we train using the transformed training data ($\Phi$, whose dimension increases), evaluate its performance on the transformed validation data, and estimate what our future accuracy will be using the test data.

From the plot of d vs. validation error below, which choice of d do you expect will generalize best?

In [ ]:
w = {}               # Dictionary to store all the trained models
validationErr = {}   # Validation error of the models
testErr = {}         # Test error of all the models

for d in range(3, 25, 3):                                    # Iterate over polynomial degree
    Phi_trn = polynomial_transform(X_trn, d)                 # Transform training data into d dimensions
    w[d] = train_model(Phi_trn, y_trn)                       # Learn model on training data

    Phi_val = polynomial_transform(X_val, d)                 # Transform validation data into d dimensions
    validationErr[d] = evaluate_model(Phi_val, y_val, w[d])  # Evaluate model on validation data

    Phi_tst = polynomial_transform(X_tst, d)                 # Transform test data into d dimensions
    testErr[d] = evaluate_model(Phi_tst, y_tst, w[d])        # Evaluate model on test data

# Plot all the models
plt.figure()
plt.plot(validationErr.keys(), validationErr.values(), marker='o', linewidth=3, markersize=12)
plt.plot(testErr.keys(), testErr.values(), marker='s', linewidth=3, markersize=12)
plt.xlabel('Polynomial degree', fontsize=16)
plt.ylabel('Validation/Test error', fontsize=16)
plt.xticks(list(validationErr.keys()), fontsize=12)
plt.legend(['Validation Error', 'Test Error'], fontsize=16)
plt.axis([2, 25, 15, 60])
Finally, let's visualize each learned model.
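One way this visualization might be done, reusing f_true, polynomial_transform, and the trained models w[d] from above (the grid range and axis limits are arbitrary plotting choices, not values fixed by the assignment):

In [ ]:
x_d = np.linspace(-8.0, 8.0, 1000)                   # Dense grid of inputs for plotting
plt.figure()
plt.plot(x_d, f_true(x_d), linewidth=3, color='k')   # True function, without noise
for d in range(3, 25, 3):
    Phi_d = polynomial_transform(x_d, d)             # Transform the grid into the polynomial basis
    plt.plot(x_d, Phi_d @ w[d], linewidth=2)         # Prediction of the learned d-degree model
plt.legend(['true'] + [str(d) for d in range(3, 25, 3)], fontsize=12)
plt.axis([-8, 8, -15, 15])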
2. **Regression with Radial Basis Functions**, 70 points

In the previous case, we considered a nonlinear extension to linear regression using a linear combination of polynomial basis functions, where each basis function was introduced as a feature $\phi(x) = x^k$. Now, we consider Gaussian radial basis functions of the form

$$\phi(x) = e^{-\gamma (x - \mu)^2},$$

whose shape is defined by its center $\mu$ and its width $\gamma > 0$. In the case of polynomial basis regression, the user's choice of the dimension $d$ determined the transformation and the model. For radial basis regression, we have to contend with deciding how many radial basis functions we should have, and what their center and width parameters should be. For simplicity, let's assume that $\gamma = 0.1$ is fixed. Instead of trying to identify the number of radial basis functions or their centers, we can treat **each data point as the center of a radial basis function**, which means that the model will be:

$$f(x) = [w_1, w_2, w_3, \ldots, w_n]^T \begin{bmatrix} e^{-\gamma (x - x_1)^2} \\ e^{-\gamma (x - x_2)^2} \\ e^{-\gamma (x - x_3)^2} \\ \vdots \\ e^{-\gamma (x - x_n)^2} \end{bmatrix}.$$

This transformation uses radial basis functions centered around the data points, $e^{-\gamma (x - x_i)^2}$, and each basis function has a corresponding weight $w_i$ associated with it, for all $i = 1, \ldots, n$. We transform each univariate data point $x_j$ into a multivariate ($n$-dimensional) data point via $\phi(x_j) = [\ldots, e^{-\gamma (x_j - x_i)^2}, \ldots]$. When this transformation is applied to every data point, it produces the radial-basis kernel:

$$\Phi = \begin{bmatrix} 1 & e^{-\gamma (x_1 - x_2)^2} & e^{-\gamma (x_1 - x_3)^2} & \cdots & e^{-\gamma (x_1 - x_n)^2} \\ e^{-\gamma (x_2 - x_1)^2} & 1 & e^{-\gamma (x_2 - x_3)^2} & \cdots & e^{-\gamma (x_2 - x_n)^2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e^{-\gamma (x_n - x_1)^2} & e^{-\gamma (x_n - x_2)^2} & e^{-\gamma (x_n - x_3)^2} & \cdots & 1 \end{bmatrix}.$$
a. Complete the Python function below that takes univariate data as input and computes a radial-basis kernel. This transforms one-dimensional data into n-dimensional data in terms of Gaussian radial-basis functions centered at each data point and allows us to model nonlinear (kernel) regression.
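A minimal sketch of one possible implementation; the function name radial_basis_transform and its two-argument form (points to transform, basis centers) are assumptions, since the handout does not show the exact signature here:

In [ ]:
import numpy as np

def radial_basis_transform(X, B, gamma=0.1):
    """Map each point of X to a vector of Gaussian RBF values, one per center in B."""
    # Pairwise squared distances: rows index points of X, columns index centers in B
    sq_dists = (X[:, np.newaxis] - B[np.newaxis, :]) ** 2
    return np.exp(-gamma * sq_dists)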
b. Complete the Python function below that takes a radial-basis kernel matrix $\Phi$, the labels $y$, and a regularization parameter $\lambda > 0$ as input and learns weights via ridge regression. Specifically, given a radial-basis kernel matrix $\Phi$, implement the computation of $w = (\Phi^T \Phi + \lambda I_n)^{-1} \Phi^T y$.
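A hedged sketch of the ridge-regression solver; the name train_ridge_model is an assumption (the handout does not show the function name here), and solving the regularized normal equations rather than forming an explicit inverse is a design choice:

In [ ]:
import numpy as np

def train_ridge_model(Phi, y, lam):
    """Learn ridge-regression weights w = (Phi^T Phi + lam * I)^{-1} Phi^T y."""
    n = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ y)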
c. As before, we can explore the trade-off between fit and complexity by varying $\lambda \in \{10^{-3}, 10^{-2}, \ldots, 1, \ldots, 10^{3}\}$. For each model, train using the transformed training data ($\Phi$) and evaluate its performance on the transformed validation and test data. Plot two curves: (i) $\lambda$ vs. validation error and (ii) $\lambda$ vs. test error, as above (see the sketch below). What are some ideal values of $\lambda$?

In [ ]: #

d. Plot the learned models as well as the true model, similar to the polynomial basis case above. How does the linearity of the model change with $\lambda$?

In [ ]: #

You have to submit a single .py file that contains all the code.
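A minimal sketch of how the $\lambda$ sweep for part c might be driven, mirroring the polynomial-degree loop above; the helper names (radial_basis_transform, train_ridge_model, evaluate_model) follow the earlier sketches and are assumptions wherever the handout does not fix them:

In [ ]:
lambdas = [10.0 ** p for p in range(-3, 4)]      # 1e-3, 1e-2, ..., 1, ..., 1e3

# Basis functions are centered at the training points, so all splits are
# transformed with respect to X_trn
Phi_trn_rbf = radial_basis_transform(X_trn, X_trn)
Phi_val_rbf = radial_basis_transform(X_val, X_trn)
Phi_tst_rbf = radial_basis_transform(X_tst, X_trn)

w_rbf, validationErr_rbf, testErr_rbf = {}, {}, {}
for lam in lambdas:
    w_rbf[lam] = train_ridge_model(Phi_trn_rbf, y_trn, lam)
    validationErr_rbf[lam] = evaluate_model(Phi_val_rbf, y_val, w_rbf[lam])
    testErr_rbf[lam] = evaluate_model(Phi_tst_rbf, y_tst, w_rbf[lam])

# Plot lambda vs. validation error and lambda vs. test error on a log-scaled x-axis
plt.figure()
plt.semilogx(lambdas, [validationErr_rbf[l] for l in lambdas], marker='o', linewidth=3)
plt.semilogx(lambdas, [testErr_rbf[l] for l in lambdas], marker='s', linewidth=3)
plt.xlabel('Regularization parameter (lambda)', fontsize=16)
plt.ylabel('Validation/Test error', fontsize=16)
plt.legend(['Validation Error', 'Test Error'], fontsize=16)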