We model an individual's income at age 30 against the number of years of formal education using a linear model. The following data are collected:
Years of formal education (x)    Income ($k) (y)
 8                                8
12                               15
14                               16
16                               20
16                               25
20                               40
Where possible, solve the following questions in two ways: using matrix calculations as detailed in the lectures, and using the lm command in R.
- Plot the data; is a linear model appropriate?
- Write down the linear model in matrix form.
- Find the normal equations for this model.
- Solve the normal equations to obtain the least squares estimates of the parameters. Add the fitted regression line to your plot (using curve, for example).
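- This model is a simple linear regression model. Use the standard linear regression formulae,
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x},$$
to estimate the parameters again (where the bar indicates the mean). Check that you have the same answers as above.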
- Calculate the sample variance $s^2$.
- Estimate the average income of a person who has had 18 years of formal education.
- Calculate the standardised residual, leverage, and Cook's distance for the first observation. You may need the R functions rstandard, influence, and cooks.distance.
Check your numbers against the diagnostic plots produced by R.
- We know that the least squares estimator $b$ is an unbiased estimator for $\beta$. Show that $t^T b$ is an unbiased estimator for $t^T \beta$, where $t$ is a vector of constants.
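As a way of checking answers to the regression questions above, here is a minimal R sketch that carries out the matrix calculations and compares them with lm. The variable names (X, b, s2, h11, z1, D1, pred18) are illustrative choices, not notation from the lectures, and the divisor n - p for the sample variance assumes the usual unbiased estimator with p = 2 parameters.

```r
# Data from the table above
x <- c(8, 12, 14, 16, 16, 20)   # years of formal education
y <- c(8, 15, 16, 20, 25, 40)   # income at age 30 ($k)
n <- length(y)

# Design matrix: a column of ones (intercept) and the predictor
X <- cbind(1, x)
p <- ncol(X)

# Normal equations (X^T X) b = X^T y, solved directly
XtX <- t(X) %*% X
Xty <- t(X) %*% y
b <- solve(XtX, Xty)

# Simple linear regression formulae as a cross-check
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Sample variance: residual sum of squares over (n - p)
res <- y - X %*% b
s2 <- sum(res^2) / (n - p)

# Estimated mean income for 18 years of education: t^T b with t = (1, 18)
t_vec <- c(1, 18)
pred18 <- drop(crossprod(t_vec, b))

# Leverage, standardised residual and Cook's distance for observation 1
H <- X %*% solve(XtX) %*% t(X)          # hat matrix
h11 <- H[1, 1]
z1 <- res[1] / sqrt(s2 * (1 - h11))
D1 <- z1^2 * h11 / (p * (1 - h11))

# The same quantities from lm, for comparison
fit <- lm(y ~ x)
coef(fit)
summary(fit)$sigma^2
predict(fit, newdata = data.frame(x = 18))
c(hatvalues(fit)[1], rstandard(fit)[1], cooks.distance(fit)[1])
```

To draw the fitted line, something like `plot(x, y); curve(b[1] + b[2] * x, add = TRUE)` should work after running the block, and `plot(fit)` produces the standard diagnostic plots.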
R exercises
Read Sections 5.1-5.3 of spuRs, then attempt the exercises below.
- The (Euclidean) length of a vector $v = (a_0, \ldots, a_k)$ is the square root of the sum of squares of its coordinates, that is, $\sqrt{a_0^2 + \cdots + a_k^2}$. Write a function that returns the length of a vector.
- Last week you wrote a program to calculate h(x,n), the sum of a finite geometric series. Turn this program into a function that takes two arguments, x and n, and returns h(x,n).
Make sure you deal with the case x = 1.
- In this question we simulate the rolling of a die. To do this we use the function runif(1), which returns a random number in the range (0,1). To get a random integer in the range {1,2,3,4,5,6}, we use ceiling(6*runif(1)), or if you prefer, sample(1:6,size=1) will do the same job.
- Suppose that you are playing the gambling game of the Chevalier de Mere. That is, you are betting that you get at least one six in four throws of a die. Write a program that simulates one round of this game and prints out whether you win or lose.
Check that your program can produce a different result each time you run it.
- Turn the program that you wrote in part (a) into a function sixes, which takes the number of rolls n as its argument and returns TRUE if you obtain at least one six in n rolls of a fair die, and FALSE otherwise.
How would you give n the default value of 4?
- Now write a program that uses your function sixes from part (b), to simulate N plays of the game (each time you bet that you get at least one six in n rolls of a fair die). Your program should then determine the proportion of times you win the bet. This proportion is an estimate of the probability of getting at least one six in n rolls of a fair die.
Run the program for n = 4 and N = 100, 1000, and 10000, conducting several runs for each N value. How does the variability of your results depend on N?
The probability of getting no sixes in n rolls of a fair die is $(5/6)^n$, so the probability of getting at least one is $1 - (5/6)^n$. Modify your program so that it calculates the theoretical probability as well as the simulation estimate and prints the difference between them. How does the accuracy of your results depend on N?
- In part (c), instead of processing the simulated runs as we go, suppose we first store the results of every game in a file, then later postprocess the results. You should read spuRs Chapter 4 to see how to read and write text files.
Write a program to write the result of all N runs to a textfile sixes_sim.txt, with the result of each run on a separate line. For example, the first few lines of the textfile could look like
TRUE
FALSE
FALSE
TRUE
FALSE
...
Now write another program to read the textfile sixes_sim.txt and again determine the proportion of bets won.
This method of saving simulation results to a file is particularly important when each simulation takes a very long time (hours or days), in which case it is good to have a record of your results in case of a system crash.
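The die-rolling parts above could be sketched along the following lines. This is only one possible approach, not the intended solution: the helper name play_games is hypothetical, sample is used rather than ceiling(6*runif(1)), readLines and writeLines are used for the file handling (spuRs Chapter 4 may present other functions), and the file is written to R's temporary directory rather than the working directory.

```r
# TRUE if at least one six appears in n rolls of a fair die; n defaults to 4
sixes <- function(n = 4) {
  rolls <- sample(1:6, size = n, replace = TRUE)
  any(rolls == 6)
}

# Hypothetical helper: results (TRUE/FALSE) of N independent plays
play_games <- function(N, n = 4) {
  replicate(N, sixes(n))
}

set.seed(1)  # fixed seed only so that this sketch is reproducible
n <- 4
theory <- 1 - (5/6)^n   # theoretical probability of at least one six

# Compare the simulation estimate with the theoretical value
for (N in c(100, 1000, 10000)) {
  estimate <- mean(play_games(N, n))
  cat("N =", N, " estimate =", estimate,
      " difference =", estimate - theory, "\n")
}

# Store the result of each play on its own line, then read the file back
out_file <- file.path(tempdir(), "sixes_sim.txt")  # assumed location
results <- play_games(1000, n)
writeLines(as.character(results), out_file)

stored <- as.logical(readLines(out_file))
cat("Proportion of bets won (from file):", mean(stored), "\n")
```

Removing the set.seed line gives a different result on each run, which is what the first part of the question asks you to check.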