Problem 1 [25%]
It is mentioned in Chapter 7 of ISL that a cubic regression spline with one knot at $\xi$ can be obtained using a basis of the form $x$, $x^2$, $x^3$, $[x - \xi]_+^3$, where $[x - \xi]_+^3 = (x - \xi)^3$ if $x > \xi$ and equals 0 otherwise. We will now show that a function of the form
$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 [x - \xi]_+^3$
is indeed a cubic regression spline, regardless of the values of $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$.
- Find a cubic polynomial
$f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3$
such that $f(x) = f_1(x)$ for all $x \le \xi$. Express $a_1, b_1, c_1, d_1$ in terms of $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$.
- Find a cubic polynomial
$f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3$
such that $f(x) = f_2(x)$ for all $x > \xi$. Express $a_2, b_2, c_2, d_2$ in terms of $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$. We have now established that $f(x)$ is a piecewise polynomial.
- Show that $f_1(\xi) = f_2(\xi)$. That is, $f(x)$ is continuous at $\xi$.
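Before working through the algebra, the continuity claim can be checked numerically. The sketch below is only a sanity check, not a proof: the coefficient values and the knot location are arbitrary assumptions chosen for illustration, and the helper names `truncated_cubic` and `f` are hypothetical.

```python
def truncated_cubic(x, knot):
    """Truncated power term [x - knot]^3_+: zero at or below the knot."""
    return (x - knot) ** 3 if x > knot else 0.0


def f(x, beta, knot):
    """f(x) = b0 + b1*x + b2*x^2 + b3*x^3 + b4*[x - knot]^3_+."""
    b0, b1, b2, b3, b4 = beta
    return b0 + b1 * x + b2 * x**2 + b3 * x**3 + b4 * truncated_cubic(x, knot)


# Arbitrary illustrative values (assumptions, not taken from the assignment).
beta = (1.0, -2.0, 0.5, 0.3, 4.0)
knot = 1.5

# Approach the knot from both sides; the gap should shrink toward zero.
for h in (1e-1, 1e-3, 1e-5):
    gap = abs(f(knot + h, beta, knot) - f(knot - h, beta, knot))
    print(f"h={h:g}  |f(knot+h) - f(knot-h)| = {gap:.3e}")
```

A shrinking gap is consistent with continuity at the knot for these particular values; the exercise asks for the general argument in terms of the coefficients.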
Problem 2 [25%]
Fit the linear, cubic, and natural regression splines discussed in Chapter 7 of ISL to the Auto data set. Is there evidence for non-linear relationships in this data set? Create some informative plots to justify your answer.
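One possible starting point is sketched below in Python with NumPy. It does not load the Auto data: the predictor and response are synthetic stand-ins (an assumption), the knot locations are arbitrary, and the helpers `cubic_spline_basis` and `rss` are hypothetical names. It compares a straight-line fit with a cubic regression spline built from the truncated power basis of Problem 1; natural splines and plotting are left out.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Design matrix with columns 1, x, x^2, x^3 and [x - k]^3_+ for each knot k."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def rss(design, y):
    """Residual sum of squares of the least-squares fit to y."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ coef) ** 2))


# Synthetic stand-in data (assumption): a rescaled predictor and a curved response.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = 1.0 / (0.2 + x) + rng.normal(0.0, 0.2, size=200)

linear_design = np.column_stack([np.ones_like(x), x])
spline_design = cubic_spline_basis(x, knots=[0.33, 0.66])

rss_linear = rss(linear_design, y)
rss_spline = rss(spline_design, y)
print(f"RSS, linear fit:       {rss_linear:.2f}")
print(f"RSS, cubic spline fit: {rss_spline:.2f}")
```

Because the linear model is nested in the spline model, the in-sample RSS of the spline can never be larger. Evidence for non-linearity should therefore rest on something stronger, such as cross-validated error, an F-test, or plots of the fitted curves against the data.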
Problem 3 [25%]
You will now derive the Bayesian connection to the lasso as discussed in Section 6.2.2 of ISL.
- Suppose that $y_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j + \epsilon_i$, where $\epsilon_1, \dots, \epsilon_n$ are independent and identically distributed from a normal distribution $N(0, 1)$. Write out the likelihood for the data as a function of the values of $\beta$.
- Assume that the prior for $\beta$: $\beta_1, \dots, \beta_p$ is that they are independent and identically distributed according to a Laplace distribution with mean zero and variance $c$. Write out the posterior for $\beta$ in this setting using Bayes' theorem.
- Argue that the lasso estimate is the value of $\beta$ with maximal probability under this posterior distribution. Compute the log of the probability in order to make this point. Hint: The denominator (= the probability of the data) can be ignored in computing the maximum probability.
- Suppose that $\epsilon_1, \dots, \epsilon_n$ are independent and identically distributed according to the Laplace distribution. What are the maximum likelihood/MAP estimates of $\beta_i$ under this assumption? Hint: See https://en.wikipedia.org/wiki/Least_absolute_deviations
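The last two bullets can be explored numerically. The sketch below makes two checks under stated assumptions; the toy data, the value of $c$, and all function names are hypothetical. First, with $N(0, 1)$ errors and a Laplace prior of variance $c$ (scale $b = \sqrt{c/2}$), the negative log posterior should differ from half the lasso objective with $\lambda = 2/b$ only by a constant that does not depend on $\beta$. Second, in the intercept-only special case, the sum of absolute deviations is minimized at the sample median.

```python
import math
import statistics


def residuals(beta0, beta, X, y):
    """y_i - beta0 - sum_j x_ij * beta_j for every observation i."""
    return [yi - beta0 - sum(xij * bj for xij, bj in zip(xi, beta))
            for xi, yi in zip(X, y)]


def neg_log_posterior(beta0, beta, X, y, c):
    """Negative log of likelihood times prior (the evidence term is dropped)."""
    b = math.sqrt(c / 2.0)  # a Laplace distribution with scale b has variance 2*b^2
    rss = sum(r * r for r in residuals(beta0, beta, X, y))
    neg_log_lik = 0.5 * rss + 0.5 * len(y) * math.log(2.0 * math.pi)
    neg_log_prior = sum(abs(bj) for bj in beta) / b + len(beta) * math.log(2.0 * b)
    return neg_log_lik + neg_log_prior


def lasso_objective(beta0, beta, X, y, lam):
    """RSS plus lam times the L1 norm of the slope coefficients."""
    rss = sum(r * r for r in residuals(beta0, beta, X, y))
    return rss + lam * sum(abs(bj) for bj in beta)


def sum_abs_dev(m, values):
    """Sum of absolute deviations of the values from the point m."""
    return sum(abs(v - m) for v in values)


# Toy data and prior variance (assumptions for illustration only).
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]]
y = [1.2, -0.7, 0.4, 2.5]
c = 0.5
lam = 2.0 / math.sqrt(c / 2.0)


def offset(beta0, beta):
    """Negative log posterior minus half the lasso objective."""
    return (neg_log_posterior(beta0, beta, X, y, c)
            - 0.5 * lasso_objective(beta0, beta, X, y, lam))


offset_a = offset(0.0, [0.0, 0.0])
offset_b = offset(0.3, [1.0, -2.0])
print("offsets at two different beta values:", offset_a, offset_b)

# Intercept-only case: compare absolute loss at the median and at the mean.
sample = [2.0, 3.0, 7.0, 8.0, 40.0]
med = statistics.median(sample)
avg = statistics.mean(sample)
print("absolute loss at median:", sum_abs_dev(med, sample))
print("absolute loss at mean:  ", sum_abs_dev(avg, sample))
```

If the two offsets agree, maximizing the posterior and minimizing the lasso objective pick out the same $\beta$ for this choice of $\lambda$. The exercise still asks for the argument in general form.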
Problem 4 [25%]
Based on a true story, according to The Drunkard's Walk: How Randomness Rules Our Lives by Leonard Mlodinow.
Suppose that you applied for life insurance and underwent a physical exam. The bad news is that your application was rejected because you tested positive for HIV. The test's sensitivity is 99.7% and its specificity is 98.5% [https://en.wikipedia.org/wiki/Diagnosis_of_HIV/AIDS#Accuracy_of_HIV_testing]. However, after studying the CDC website, you find that in your demographic group (age, gender, race, ...) only one in 10,000 people is infected. What is the probability that you actually have HIV?
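The calculation is a direct application of Bayes' theorem and can be sketched as follows. The function name `posterior_positive` is hypothetical; the three inputs are the prevalence, sensitivity, and specificity quoted in the problem statement.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(infected | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # P(positive and infected)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(positive and not infected)
    return true_pos / (true_pos + false_pos)


p = posterior_positive(prevalence=1 / 10_000, sensitivity=0.997, specificity=0.985)
print(f"P(HIV | positive test) = {p:.4%}")
```

With such a low prevalence, false positives from the large uninfected population outnumber true positives, so the posterior probability stays small despite the test's high sensitivity.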