Question 10

Introduction

Consider using data $(x_1, y_1), \ldots, (x_n, y_n)$ to estimate the coefficients $\beta_0$ and $\beta_1$ in the linear regression model

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$$

Least squares regression determines coefficient estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ which minimise the Residual Sum of Squares (RSS), given by

$$\mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$

The resulting coefficient estimates are unbiased estimators of the true coefficient values, but may have large variances, particularly when the number of covariates is large in comparison to the sample size. The lasso is an alternative procedure designed to overcome this problem. It determines coefficient estimates $\tilde{\beta}_0$ and $\tilde{\beta}_1$ which minimise a penalised Residual Sum of Squares (pRSS). Lasso coefficient estimators are typically biased, but may have a smaller variance, and hence a smaller mean squared error, than the least squares estimators.

The penalised Residual Sum of Squares for coefficient values $\beta_0$ and $\beta_1$; observed data $y = (y_1, y_2, \ldots, y_n)$; covariate values $x = (x_1, x_2, \ldots, x_n)$; and penalty parameter $\lambda \ge 0$ is given by

$$\mathrm{pRSS}(\beta_0, \beta_1; y, x, \lambda) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 + \lambda \lvert \beta_1 \rvert.$$

The coefficient values which minimise pRSS ($\tilde{\beta}_0$ and $\tilde{\beta}_1$) cannot be determined analytically for $\lambda > 0$, as pRSS is not differentiable as a function of $\beta_1$. Instead, $\tilde{\beta}_0$ and $\tilde{\beta}_1$ must be determined by numerical minimisation methods.

Task
Write two R functions that will enable you to calculate the optimal coefficient estimates $\tilde{\beta}_0$ and $\tilde{\beta}_1$ which minimise pRSS on the basis of a given set of data $y_1, y_2, \ldots, y_n$; corresponding covariate observations $x_1, x_2, \ldots, x_n$; and a parameter $\lambda \ge 0$.
Your first function should be called lasso.pRSS(), and should calculate the penalised Residual Sum of Squares.
The arguments to this function should be (in this order): beta, a vector of length 2 such that beta[1] is the value of $\beta_0$ and beta[2] is the value of $\beta_1$; y, the vector of observations; x, the vector of covariate values; and lambda, the penalty parameter $\lambda$. The value returned by the function should be a single non-negative number equal to $\mathrm{pRSS}(\beta_0, \beta_1; y, x, \lambda)$ as defined in the expression above.
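A minimal sketch of such a function, translating the pRSS expression above directly into R, might look like this:

```r
# Penalised residual sum of squares for the simple linear regression model.
#   beta   - length-2 vector: beta[1] is beta_0, beta[2] is beta_1
#   y      - vector of observations
#   x      - vector of covariate values (same length as y)
#   lambda - non-negative penalty parameter
lasso.pRSS <- function(beta, y, x, lambda) {
  resid <- y - beta[1] - beta[2] * x
  sum(resid^2) + lambda * abs(beta[2])
}
```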
Your second function should be called lasso.fit() and should calculate $\tilde{\beta}_0$ and $\tilde{\beta}_1$, the coefficient values which minimise pRSS, given a vector of observations $y_1, y_2, \ldots, y_n$; covariates $x_1, x_2, \ldots, x_n$; and $\lambda \ge 0$. The arguments to this function should be y, the vector of observations; x, the vector of covariates; and lambda, the value of $\lambda$. The structure of this function should be as follows:
1. Determine a starting value of beta from least squares regression, by regressing y on x using the lm() function and extracting the coefficient estimates $(\hat{\beta}_0, \hat{\beta}_1)$.
2. Call nlm() to minimise pRSS as defined in your lasso.pRSS() function, starting from the initial value obtained in step 1.
3. Extract the minimising value of $(\tilde{\beta}_0, \tilde{\beta}_1)$, the required number of iterations and the convergence code from the result of your nlm() call.
Your lasso.fit() function should return a list containing elements beta.tilde (a vector of length 2, where the first component is the estimated value of $\tilde{\beta}_0$ and the second component is the estimated value of $\tilde{\beta}_1$), n.iter (the required number of iterations of the numerical minimisation process) and code (the convergence code indicating why the numerical minimisation procedure halted).
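One possible sketch following these three steps, assuming the lasso.pRSS() definition above (nlm() forwards extra named arguments to the objective function, and reports the minimiser, iteration count and convergence code in its result):

```r
# Fit the lasso for simple linear regression by numerically minimising pRSS.
lasso.fit <- function(y, x, lambda) {
  # Step 1: least squares starting values (beta_0_hat, beta_1_hat) from lm()
  beta.start <- unname(coef(lm(y ~ x)))
  # Step 2: minimise lasso.pRSS() numerically, starting from the lm() fit;
  # y, x and lambda are passed through to lasso.pRSS() via nlm()'s ... argument
  res <- nlm(lasso.pRSS, p = beta.start, y = y, x = x, lambda = lambda)
  # Step 3: package the minimiser, iteration count and convergence code
  list(beta.tilde = res$estimate, n.iter = res$iterations, code = res$code)
}
```

For $\lambda > 0$, the slope component of beta.tilde returned by such a function would typically be shrunk towards zero relative to the least squares estimate, which is the behaviour the penalty term is designed to produce.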