[Solved] Stat435 Homework3

The homework comes with a test data set, test-data.R, in dump format. For the test data, $n = 100$, and $x_1, \ldots, x_n$ are equispaced in [0,2]. The true conditional expectation is $f(x) = \sin(x)$, and the error standard deviation is $\sigma = 0.4$.

1. Experiments with Turbo

Turbo is an expansion-based smoother that fits 2nd order (linear) splines. It is described in the article "Flexible Parsimonious Smoothing and Additive Modeling" by J.H. Friedman and B.W. Silverman (Technometrics, Vol. 31, No. 1, 1989, pp. 3-39).

  1. Define basis functions $B_i(x) = (x - x_i)_+$, $i = 1, \ldots, n-1$, and $B_n(x) = 1$. Write a function truncated.power.design.matrix(x) that generates the $n \times n$ design matrix for this set of basis functions.

  2. Install the package leaps and take a look at the documentation. Write a function regsubsets.fitted.values <- function(X, regsubsets.out, nterm) that computes the fitted values for a model with nterm terms.

  3. For the test data, produce a plot of the residual sum of squares as a function of the number $k$ of basis functions in the model.
  4. Plot the GCV score as a function of $k$. Surprised? Why? Explanation?
  5. F&S (pp. 9-10) propose to fix this problem by charging 3 degrees of freedom for each of $B_1, \ldots, B_{n-1}$ entered into the model. Plot this modified GCV score as a function of the number of basis functions in the model. Surprised? Problems with the F&S definition of GCV? (If you have trouble figuring out where the constant term is included in the model, you may charge 3 degrees of freedom for each of $B_1, \ldots, B_n$.)
  6. Restricting yourself to suitably small values of $k$, find the forward and backward models with the smallest (modified) GCV scores and plot them.
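
The GCV comparisons asked for above can be sketched in R as follows. This is only an illustration under stated assumptions: it takes the usual definition GCV = (RSS/n) / (1 - df/n)^2 and reads the F&S modification as charging a fixed number of degrees of freedom per term. The names gcv.score and df.per.term are hypothetical, and the RSS values are placeholders, not results from the test data.

```r
# Sketch: GCV score for a model with `nterm` basis functions.
# df.per.term = 1 gives the ordinary score; df.per.term = 3 mimics the
# F&S charge per term (an assumed reading of the modification).
gcv.score <- function(rss, n, nterm, df.per.term = 1) {
  df <- df.per.term * nterm
  score <- (rss / n) / (1 - df / n)^2
  score[df >= n] <- NA  # undefined once the charged df reach n
  score
}

# Placeholder RSS values for k = 1, ..., 4 terms (NOT computed from the test data)
rss.demo <- c(40, 20, 17, 16)
k <- seq_along(rss.demo)
gcv.plain <- gcv.score(rss.demo, n = 100, nterm = k)
gcv.fs <- gcv.score(rss.demo, n = 100, nterm = k, df.per.term = 3)
print(cbind(k, gcv.plain, gcv.fs))
```

In practice the RSS vector would come from the subset-selection fits, and plotting each score against k shows how the heavier charge changes the penalty for additional terms.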


2. Experiments with order 2 smoothing splines

Training data and basis functions as in (1) above. Define the $n \times n$ matrix $X$ by $X_{ij} = B_j(x_i)$.
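
As a minimal sketch, this matrix could be built in R as shown below, assuming the x values are sorted in increasing order. The function name is the one requested in problem 1; the five-point grid is an illustrative assumption and is not the test data.

```r
# Sketch: n x n design matrix with X[i, j] = B_j(x_i), where
# B_j(x) = (x - x_j)_+ for j = 1, ..., n-1 and B_n(x) = 1.
truncated.power.design.matrix <- function(x) {
  n <- length(x)
  # outer() evaluates the truncated linear function on every pair (x_i, x_j)
  X <- outer(x, x[-n], function(xi, xj) pmax(xi - xj, 0))
  cbind(X, 1)  # last column is the constant basis function B_n
}

# Illustrative grid (an assumption, not the homework's test data)
x.demo <- seq(0, 2, length.out = 5)
X.demo <- truncated.power.design.matrix(x.demo)
print(dim(X.demo))
```

With sorted x, every entry on or above the diagonal of the first n-1 columns is zero, so the first column is linear in x over the data range and the last column carries the constant term.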

An order 2 smoothing spline is a function of the form

$g(x) = \sum_j a_j B_j(x)$, where
$a = \operatorname{argmin}_a \|y - Xa\|^2 + \lambda a^T \Omega a$ with
$\Omega = \operatorname{diag}(0, 1, \ldots, 1, 0)$.

The vector of predicted values for the training sample is $\hat{y} = Xa$.

  • Show that

$\hat{y} = X(X^T X + \lambda \Omega)^{-1} X^T y = S_\lambda y$.

  • Read the data in the file test-data.R. Use the glmnet package to plot data and spline for $\lambda = 0, 1, 10, 10^6$. Verify (graphically) that the spline for $\lambda = 10^6$ is very close to the least squares line.
  • Use the glmnet package to find the optimal value of $\lambda$ by cross-validation. Print out $\lambda_{opt}$ and plot the corresponding spline.
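
As a cross-check on the smoother-matrix formula, the penalized fit can also be computed directly with base R's solve instead of glmnet. The sketch below is a hedged illustration: smoother.matrix is a hypothetical helper, the penalty matrix is taken to be diag(0, 1, ..., 1, 0), and the data are simulated stand-ins rather than the contents of test-data.R.

```r
# Sketch: smoother matrix S = X (X'X + lambda * Omega)^{-1} X'
# with Omega = diag(0, 1, ..., 1, 0), so the first (linear) and last
# (constant) coefficients are unpenalized.
smoother.matrix <- function(X, lambda) {
  n <- ncol(X)
  Omega <- diag(c(0, rep(1, n - 2), 0))
  X %*% solve(crossprod(X) + lambda * Omega, t(X))
}

# Simulated toy data (an assumption; the homework uses test-data.R instead)
set.seed(1)
x <- seq(0, 2, length.out = 20)
y <- sin(x) + rnorm(20, sd = 0.4)
X <- cbind(outer(x, x[-20], function(xi, xj) pmax(xi - xj, 0)), 1)

fit.interp <- drop(smoother.matrix(X, 0) %*% y)    # lambda = 0
fit.smooth <- drop(smoother.matrix(X, 1e6) %*% y)  # lambda = 10^6
fit.line <- unname(fitted(lm(y ~ x)))              # least squares line

print(max(abs(fit.interp - y)))
print(max(abs(fit.smooth - fit.line)))
```

With no penalty the n basis functions interpolate the n data points, while a very large penalty leaves only the unpenalized linear and constant terms, which is why the second fit should sit close to the least squares line.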

3. ISLR Section 6.8 Problem 1

4. ISLR Section 6.8 Problem 4
