MATH 185 Take-Home Exam 2
Due Sunday, June 9th, by 11:59 PM
AGREEMENT
By taking this exam, you agree not to discuss the exam with anyone, starting now, whether a classmate or anyone else, and whether in person or through other means, including electronic ones. Please do not post questions on Piazza. Unless otherwise specified, it is acceptable to copy-paste from the lecture or homework solution code.
Problem 1. (Bootstrap tests for goodness-of-fit) We saw in lecture that when it comes to goodness-of-fit (GOF) testing, it is quite natural to obtain a p-value by permutation. It is also possible, however, to use the bootstrap for that purpose. Consider the two-sample situation for simplicity, although this generalizes to any number of samples. Thus assume a situation where we observe $X_1, \dots, X_m$ iid from $F$ and (independently) $Y_1, \dots, Y_n$ iid from $G$, where $F$ and $G$ are two distributions on the real line. We want to test $F = G$ versus $F \ne G$. We may want to use a statistic $T = T(X_1, \dots, X_m, Y_1, \dots, Y_n)$ for that purpose, and the question is how to obtain a p-value for $T$ via a bootstrap. The idea is, as usual, to estimate the best null distribution and bootstrap from that distribution. A natural approach to estimate the null distribution is to simply combine the two samples as one, and estimate the corresponding distribution via the empirical distribution. We thus use the empirical distribution from the combined sample to bootstrap from.
A. Write a function bootGOFdiff(x, y, B = 2000) that takes in two samples as vectors x and y, and a number of replicates B (Monte Carlo samples from the estimated null distribution), and returns the bootstrap GOF p-value for the absolute difference in sample means, $T = |\bar{X} - \bar{Y}|$.
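The procedure described above can be sketched as follows. This is only an illustrative Python/NumPy sketch, not a reference solution: the name boot_gof_diff, the seed argument, and the add-one correction in the p-value (a common Monte Carlo convention) are assumptions that do not come from the exam text. Each bootstrap replicate draws m + n values with replacement from the pooled sample, treats the first m as the "X" sample and the remaining n as the "Y" sample, and recomputes the statistic.

```python
import numpy as np

def boot_gof_diff(x, y, B=2000, seed=None):
    """Bootstrap GOF p-value for T = |mean(x) - mean(y)| (illustrative sketch)."""
    rng = np.random.default_rng(seed)  # seed is an added convenience argument
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)

    # Observed value of the statistic.
    t_obs = abs(x.mean() - y.mean())

    # Estimated null distribution: empirical distribution of the pooled sample.
    pooled = np.concatenate([x, y])

    # B bootstrap samples of size m + n, drawn WITH replacement from the pool.
    draws = rng.choice(pooled, size=(B, m + n), replace=True)

    # First m columns play the role of X, the last n columns the role of Y.
    t_boot = np.abs(draws[:, :m].mean(axis=1) - draws[:, m:].mean(axis=1))

    # Add-one correction (assumed convention) keeps the p-value away from 0.
    return (np.sum(t_boot >= t_obs) + 1) / (B + 1)
```

The only difference from the permutation approach is sampling with replacement: a permutation test would reshuffle the pooled values without replacement instead of resampling them.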
B. Apply your function to the FIFA dataset to compare the wages of players ≤ 29 years old with those of older players (≥ 30 years old).
Problem 2. (Local Absolute Linear Regression) Local linear regression is a popular smoother. However, because it is based on squared errors, it is not robust. To make it more robust, one option is to use the absolute errors instead.
A. Write a function localAbsLinearRegression(x, y, h, xnew = x) that takes in paired vectors x (predictor) and y (response), and a bandwidth h, and computes the local absolute linear regression (use any kernel of your liking). The function is evaluated at the vector xnew (equal to x by default).
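One way to read this specification: at each evaluation point x0, choose the intercept a and slope b that minimize the kernel-weighted sum of absolute errors, sum_i K((x_i - x0)/h) |y_i - a - b (x_i - x0)|, and report a as the fitted value. The Python/NumPy sketch below follows that reading under several assumptions that are not in the exam text: a Gaussian kernel, the name local_abs_linear_regression, and an approximate solver (iteratively reweighted least squares) in place of an exact linear-programming or quantile-regression routine. The parameters n_iter and eps are hypothetical tuning knobs.

```python
import numpy as np

def local_abs_linear_regression(x, y, h, xnew=None, n_iter=50, eps=1e-6):
    """Local absolute linear regression via IRLS (approximate, illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Evaluate at x itself when xnew is not supplied.
    xnew = x if xnew is None else np.atleast_1d(np.asarray(xnew, dtype=float))

    fitted = np.empty(len(xnew))
    for j, x0 in enumerate(xnew):
        d = x - x0
        k = np.exp(-0.5 * (d / h) ** 2)            # Gaussian kernel weights (assumed)
        X = np.column_stack([np.ones_like(d), d])  # local design: intercept, slope
        w = k.copy()                               # first pass = ordinary local linear fit
        for _ in range(n_iter):
            # Weighted least squares step with the current weights w.
            sw = np.sqrt(w)
            beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            # Reweight so squared residuals mimic absolute residuals:
            # w_i * r_i^2 = k_i * |r_i| when w_i = k_i / |r_i|.
            r = np.abs(y - X @ beta)
            w = k / np.maximum(r, eps)             # eps guards against division by zero
        fitted[j] = beta[0]                        # intercept = fitted value at x0
    return fitted
```

Because the weights shrink for points with large residuals, a single gross outlier has little influence on the fit, which is the robustness property the problem is after. An exact solver would be the safer choice if precision at the optimum matters.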
B. Apply your function to the Boeing stock closing prices from 1/01/2018 to 6/01/2019; see the BA.csv file, which was downloaded from here (some dates are missing for some unknown reason). Plot the data and overlay the fitted curve for a few choices of bandwidth (identified in a legend).
C. Choose the bandwidth by 10-fold cross-validation.
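A minimal sketch of 10-fold cross-validation over a grid of candidate bandwidths is given below, in Python/NumPy. Everything named here is an assumption for illustration: cv_bandwidth is a hypothetical helper, fit stands for any smoother with the call shape fit(x_train, y_train, h, x_test), folds are assigned at random, and held-out error is measured by mean absolute error (chosen to match the robust criterion; squared error is another option). The simple kernel_mean smoother is only a stand-in so the example runs on its own.

```python
import numpy as np

def cv_bandwidth(x, y, bandwidths, fit, k=10, seed=0):
    """Pick the bandwidth with the smallest k-fold CV mean absolute error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Randomly split the indices into k folds of (nearly) equal size.
    folds = np.array_split(rng.permutation(len(x)), k)

    scores = []
    for h in bandwidths:
        fold_errors = []
        for test_idx in folds:
            train = np.ones(len(x), dtype=bool)
            train[test_idx] = False
            # Fit on the other k - 1 folds, predict on the held-out fold.
            pred = fit(x[train], y[train], h, x[test_idx])
            fold_errors.append(np.mean(np.abs(y[test_idx] - pred)))
        scores.append(np.mean(fold_errors))

    scores = np.array(scores)
    return bandwidths[int(np.argmin(scores))], scores

def kernel_mean(x, y, h, xnew):
    """Stand-in smoother (Gaussian kernel average), used only for the demo."""
    w = np.exp(-0.5 * ((xnew[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

# Demo on synthetic data (not the Boeing series).
demo_rng = np.random.default_rng(0)
x_demo = np.linspace(0, 1, 200)
y_demo = np.sin(2 * np.pi * x_demo) + demo_rng.normal(0, 0.1, 200)
best_h, cv_scores = cv_bandwidth(x_demo, y_demo, [0.02, 0.05, 2.0], kernel_mean)
```

Passing the smoother in as an argument keeps the cross-validation loop independent of how the local fit is computed, so the same loop can be reused with a local absolute linear regression function that follows the same call shape.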