R algorithm: Consider the two variables in the dataset Assign3.csv. We are interested in predicting the second variable Y given the first variable X.


1. We are interested in constructing a step-function learner as follows.
First draw a random number $U$ uniformly on the interval spanned by the minimum and maximum values of the inputs $x_1, \ldots, x_n$. Then use it to construct the following function, whose purpose is to give the prediction of $Y$ given $X = x$:
$$f(x) = \beta_1 I(U \le x) + \beta_2 I(U > x),$$
where $\beta_1$ and $\beta_2$ are unknown constants to be learned. $I(\text{statement})$ is the indicator function, which equals 1 when the statement is true and 0 otherwise.
a. Use two different methods to compute the estimate $\hat f(x) = \hat\beta_1 I(U \le x) + \hat\beta_2 I(U > x)$. Is $\hat f$ a strong learner?
b. Use one of the previous two methods to write an R function that takes as input $x$ and the data $x_1, \ldots, x_n, y_1, \ldots, y_n$ and gives as output $\hat f(x)$. Make sure the function can handle the case where $x$ contains more than one number.
c. Using three different runs of the previous function, create three different plots, each showing $\hat f$ together with the scatter plot of the data.
2. Write an R function that applies boosting to the previous step-function learner. The function should take as inputs the data, the number of boosting iterations $B$, the learning rate $\lambda$, and an optional argument giving the size of the test subsample in case a validation-set approach is needed.
As output, the function should give $\hat f_{\text{boost}}$, the boosted learner evaluated at the training data, and the training mean squared error (MSE) at each iteration $b = 1, \ldots, B$ of the boosting algorithm. If the size of the test subsample is greater than zero, the function should also output $\hat f_{\text{boost}}$ evaluated at the test sample and the test MSE at each iteration $b = 1, \ldots, B$.
a. Use that function to plot $\hat f_{\text{boost}}$ on top of the data scatter plot for $\lambda = 0.01$ and $B = 10000$. Show the same with different values of $B$.
b. Plot the training MSE vs. the number of iterations.
c. Was there overfitting when $B = 10000$?
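The indicators $I(U \le x)$ and $I(U > x)$ are never both equal to 1, so least squares for questions 1(a) and 1(b) decouples. $\hat\beta_1$ is the mean of the $y_i$ with $x_i \ge U$, and $\hat\beta_2$ is the mean of the $y_i$ with $x_i < U$.

The sketch below computes the fit in two ways: by group means, and by solving the normal equations for the two indicator columns. It then evaluates the fitted step at one or many points. Some caveats:
- It is written in Python rather than the R the assignment asks for.
- Assign3.csv is not reproduced here.
- The names `fit_step`, `fit_step_normal_eq`, `predict_step` and `step_learner`, and the toy numbers, are hypothetical.

Treat it as an illustration of the idea, not a reference solution.

```python
import random


def fit_step(x, y, u):
    """Method 1: least-squares fit of b1*I(u <= x) + b2*I(u > x) via group means."""
    right = [yi for xi, yi in zip(x, y) if u <= xi]  # points with I(U <= x_i) = 1
    left = [yi for xi, yi in zip(x, y) if u > xi]    # points with I(U >  x_i) = 1
    b1 = sum(right) / len(right) if right else 0.0   # empty side: fall back to 0.0
    b2 = sum(left) / len(left) if left else 0.0
    return b1, b2


def fit_step_normal_eq(x, y, u):
    """Method 2: solve the 2x2 normal equations (Z^T Z) b = Z^T y by Cramer's rule."""
    z1 = [1.0 if u <= xi else 0.0 for xi in x]  # column I(U <= x_i)
    z2 = [1.0 - v for v in z1]                  # column I(U >  x_i)
    a11 = sum(v * v for v in z1)
    a12 = sum(v * w for v, w in zip(z1, z2))    # always 0: the columns never overlap
    a22 = sum(w * w for w in z2)
    c1 = sum(v * yi for v, yi in zip(z1, y))
    c2 = sum(w * yi for w, yi in zip(z2, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("U puts every point on one side; Z^T Z is singular")
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


def predict_step(xnew, u, b1, b2):
    """Evaluate the fitted step at a single number or a list/tuple of numbers."""
    xs = list(xnew) if isinstance(xnew, (list, tuple)) else [xnew]
    return [b1 if u <= xi else b2 for xi in xs]


def step_learner(xnew, x, y, rng=random):
    """Draw U ~ Uniform(min x, max x), fit by group means, predict at xnew."""
    u = rng.uniform(min(x), max(x))
    b1, b2 = fit_step(x, y, u)
    return predict_step(xnew, u, b1, b2)


# Usage on hypothetical toy data (not Assign3.csv)
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0]
print(fit_step(x, y, 2.5))                            # (6.0, 2.0)
print(fit_step_normal_eq(x, y, 2.5))                  # (6.0, 2.0)
print(predict_step([0.0, 2.5, 10.0], 2.5, 6.0, 2.0))  # [2.0, 6.0, 6.0]
print(step_learner([1.5, 3.5], x, y, random.Random(0)))
```

Each call to `step_learner` draws a fresh $U$, so repeated runs give different steps. This is the behaviour that question 1(c) asks you to plot.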

Note: Although the algorithm is described in detail in both the slides and the textbook, its special case for the questions in this assignment is given here to make the implementation easier.
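Boosting algorithm:
1. Inputs:
A sample of covariates, i.e. inputs $x_1, \ldots, x_n$, and responses, i.e. outputs $y_1, \ldots, y_n$.
A weak learner $\hat f$.
A learning rate $\lambda > 0$.
2. Initialize:
Set $\hat f_{\text{boost}}(x) = 0$.
Compute the first learner $\hat f_0(x) = \hat\beta_1 I(U \le x) + \hat\beta_2 I(U > x)$ on the original data.
Set $r_i = y_i - \hat f_0(x_i)$ for $i = 1, \ldots, n$.
3. Do the following for $b = 1, \ldots, B$:
a. Given $x_1, \ldots, x_n$ as covariates and $r_1, \ldots, r_n$ as responses, fit a learner $\hat f_b$ by first sampling $U$ and then estimating $\hat f_b(x) = \hat\beta_1 I(U \le x) + \hat\beta_2 I(U > x)$.
b. Set $\hat f_{\text{boost}}(x) \leftarrow \hat f_{\text{boost}}(x) + \lambda \hat f_b(x)$.
c. Set $r_i \leftarrow r_i - \lambda \hat f_b(x_i)$.
4. Output: $\hat f_{\text{boost}}(x)$.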
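The boosting loop can be sketched in code as follows. As before, this is Python rather than R. The names `boost_step` and `predict_boost`, the dictionary of outputs, and the toy data are hypothetical.

Design choices and assumptions in this sketch:
- **Test subsample:** taken as a simple random holdout of `n_test` points. This is one reading of the "validation-set approach" in question 2.
- **Initialization:** the sketch starts from $\hat f_{\text{boost}} = 0$ and $r_i = y_i$. Its first learner is therefore fitted to the original data and plays the role of $\hat f_0$.
- **Scaling of the first fit:** the first fit is scaled by $\lambda$, like every later step. This is an assumption, since the note does not make clear where $\lambda$ enters the initialization. The benefit is that $r_i = y_i - \hat f_{\text{boost}}(x_i)$ holds at every iteration.
- **Plotting:** each fitted split is stored so that $\hat f_{\text{boost}}$ can also be evaluated on a grid.

```python
import math
import random


def fit_step(x, r, u):
    """Least-squares step fit to responses r: the mean of r on each side of U."""
    right = [ri for xi, ri in zip(x, r) if u <= xi]
    left = [ri for xi, ri in zip(x, r) if u > xi]
    b1 = sum(right) / len(right) if right else 0.0
    b2 = sum(left) / len(left) if left else 0.0
    return b1, b2


def mse(y, f):
    """Mean squared error between responses y and fitted values f."""
    return sum((a - b) ** 2 for a, b in zip(y, f)) / len(y)


def predict_boost(xnew, learners, lam):
    """Evaluate f_boost = sum over b of lam * f_b at each point of xnew."""
    out = []
    for xi in xnew:
        total = 0.0
        for u, b1, b2 in learners:
            total += lam * (b1 if u <= xi else b2)
        out.append(total)
    return out


def boost_step(x, y, B, lam, n_test=0, seed=None):
    """Boost the random-split step learner for B iterations (hypothetical API)."""
    if not 0 <= n_test < len(x):
        raise ValueError("n_test must leave at least one training point")
    rng = random.Random(seed)
    idx = list(range(len(x)))
    if n_test > 0:
        rng.shuffle(idx)  # random holdout: the first n_test shuffled indices form the test set
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    xtr, ytr = [x[i] for i in train_idx], [y[i] for i in train_idx]
    xte, yte = [x[i] for i in test_idx], [y[i] for i in test_idx]

    f_train, f_test = [0.0] * len(xtr), [0.0] * len(xte)  # f_boost starts at 0
    r = list(ytr)                                          # residuals start at y
    lo, hi = min(xtr), max(xtr)
    learners, train_mse, test_mse = [], [], []
    for _ in range(B):
        u = rng.uniform(lo, hi)       # step 3a: sample the split U
        b1, b2 = fit_step(xtr, r, u)  # step 3a: fit f_b to the residuals
        learners.append((u, b1, b2))
        fb = [b1 if u <= xi else b2 for xi in xtr]
        f_train = [f + lam * s for f, s in zip(f_train, fb)]  # step 3b
        r = [ri - lam * s for ri, s in zip(r, fb)]            # step 3c
        train_mse.append(mse(ytr, f_train))
        if n_test > 0:
            f_test = [f + lam * (b1 if u <= xi else b2) for f, xi in zip(f_test, xte)]
            test_mse.append(mse(yte, f_test))

    out = {"x_train": xtr, "f_train": f_train, "train_mse": train_mse, "learners": learners}
    if n_test > 0:
        out.update(x_test=xte, f_test=f_test, test_mse=test_mse)
    return out


# Usage on hypothetical toy data (not Assign3.csv)
gen = random.Random(1)
xs_demo = [gen.uniform(0, 10) for _ in range(200)]
ys_demo = [math.sin(xi) + gen.gauss(0, 0.3) for xi in xs_demo]
fit = boost_step(xs_demo, ys_demo, B=2000, lam=0.01, n_test=50, seed=42)
best_b = min(range(len(fit["test_mse"])), key=fit["test_mse"].__getitem__) + 1
print(fit["train_mse"][-1], fit["test_mse"][-1], best_b)
grid = [i / 10 for i in range(101)]
curve = predict_boost(grid, fit["learners"], 0.01)  # f_boost on a grid, for plotting
```

Each $\hat f_b$ is a least-squares fit to the current residuals, so in this sketch an update with $0 < \lambda \le 2$ cannot raise the training MSE. The training curve alone therefore cannot reveal overfitting. That is what the test MSE, and the iteration `best_b` that minimizes it, are for.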
