HW06
Your Name, Your Uniqname
Due Wednesday October 22, 2019 at 10pm on Canvas
For Question 2, you will need to load the file xy.csv. Set your working directory using Session -> Set Working Directory -> To Source File Location.
xy <- read.csv("xy.csv")
Question 1 (4 pts)
A Poisson process \(N(t)\) with rate parameter \(\lambda\) is such that \(N(0) = 0\) and, for any \(t > 0\), \[P(N(s + t) - N(s) = n) = \frac{e^{-\lambda t} (\lambda t)^n}{n!}, \quad n \ge 0, \; t, s > 0\]
Part (a) (1 pt)
Show that the mean of \(N(t)\) (remembering that \(t\) is a fixed index element) is \[E(N(t)) = \lambda t\] Recall that \[e^a = \sum_{i=0}^\infty \frac{a^i}{i!}\]
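A quick numerical sanity check can accompany the analytic derivation. The sketch below assumes illustrative values of the rate and time (they are not specified in Part (a)) and uses the fact that the definition implies N(t) is Poisson with mean parameter lambda * t. It is a check, not a proof.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Illustrative values (assumptions, not given in Part (a))
lambda <- 4
t_fix <- 10

# By the definition of the process, N(t) ~ Poisson(lambda * t)
n_t <- rpois(1e5, lambda * t_fix)

# The Monte Carlo mean should be close to lambda * t
c(simulated = mean(n_t), theoretical = lambda * t_fix)
```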
Part (b) (2 pts)
A compound Poisson process is a stochastic process \(\{X(t), t \ge 0\}\) such that \[X(t) = \sum_{i = 1}^{N(t)} Y_i\] where the \(Y_i\) are drawn independently from some other distribution.
Rizzo section 3.7 provides a way to simulate from a regular Poisson process. Use this method to simulate from a compound Poisson(4)-Gamma(shape = 2, rate = 4) process.
Estimate the mean of \(X(t)\) (by generating 1000 Poisson processes up to \(t = 10\)). Relate this mean to what you found in Part (a) and what you know about the mean of Gamma(2, 4).
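One possible way to set up the simulation is sketched below. It builds arrival times from cumulative exponential interarrival times, which is the general idea behind simulating a Poisson process; Rizzo's own code may differ in detail. The function name sim_compound and its argument names are hypothetical.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulate one value of X(t_max) for a compound Poisson(lambda)-Gamma(shape, rate)
# process. sim_compound is a hypothetical helper, not a function from any package.
sim_compound <- function(lambda, shape, rate, t_max) {
  n_t <- 0                        # running count of arrivals, i.e. N(t_max)
  current <- rexp(1, lambda)      # time of the first arrival
  while (current <= t_max) {
    n_t <- n_t + 1
    current <- current + rexp(1, lambda)  # add the next Exp(lambda) interarrival time
  }
  # Sum of N(t_max) Gamma jumps; sum(numeric(0)) is 0 when there are no arrivals
  sum(rgamma(n_t, shape = shape, rate = rate))
}

# 1000 independent replications of X(10)
x_t <- replicate(1000, sim_compound(lambda = 4, shape = 2, rate = 4, t_max = 10))
mean(x_t)
```

The value of mean(x_t) is the Monte Carlo estimate to compare against the result of Part (a) and the mean of Gamma(2, 4).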
Part (c) (1 pt)
For any type of \(Y_i\) (not just Gamma), \(\lambda\) and \(t\), find
\[E(X(t))\]
Question 2 (3 pts)
Consider sampling \(n\) pairs \((Y_i, X_i)\) from a very large population of size \(N\). We will assume that the population is so large that we can treat \(n/N \approx 0\), so that all pairs in our sample are effectively independent.
ggplot(xy, aes(x = x, y = y)) + geom_point()
For the population, you want to relate \(Y\) and \(X\) as a linear function: \[Y_i = \beta_0 + \beta_1 X_i + R_i\] where \[ \begin{aligned} \beta_1 &= \frac{\text{Cov}(X,Y)}{\text{Var}(X)} \\ \beta_0 &= E(Y) - \beta_1 E(X) \\ R_i &= Y_i - \beta_0 - \beta_1 X_i \end{aligned} \]
The line described by \(\beta_0\) and \(\beta_1\) is the population regression line. We don't get to observe \(R_i\) for our sample, but we can estimate \(\beta_0\) and \(\beta_1\) to get estimates of \(R_i\).
Part (a) (1 pt)
The lm function in R can estimate \(\beta_0\) and \(\beta_1\) using sample means and variances. Since these estimators are based on sample means, we can use the central limit theorem to justify confidence intervals for \(\beta_0\) and \(\beta_1\).
Use the lm function to estimate \(\beta_0\) and \(\beta_1\). Apply the confint function to the results to get 95% confidence intervals for the \(\beta\) parameters.
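The calls involved can be sketched on synthetic data. The data frame xy_demo below is a made-up stand-in for xy.csv (its coefficients and noise level are assumptions), with columns named x and y as in the plotting code above.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Synthetic stand-in for xy.csv (assumption: the real file has columns x and y)
xy_demo <- data.frame(x = runif(100, 0, 10))
xy_demo$y <- 1 + 2 * xy_demo$x + rnorm(100)

# Fit the regression of y on x
fit <- lm(y ~ x, data = xy_demo)
coef(fit)                    # estimates of beta_0 and beta_1

# 95% confidence intervals for both coefficients
ci <- confint(fit, level = 0.95)
ci
```

For the homework itself, the same two calls would be applied to the xy data frame loaded from xy.csv.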
Part (b) (2 pts)
You can use the coef function to get just the estimators \(\hat \beta_0\) and \(\hat \beta_1\). Use the boot package to get basic and percentile confidence intervals for just \(\beta_1\). You will need to write a custom function to give as the statistic argument to boot. Use at least 1000 bootstrap samples. You can use boot.ci for the confidence intervals.
Compare these intervals to part (a) and comment on the assumptions required for the bootstrap intervals.
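A minimal sketch of the custom statistic function follows, again on synthetic data standing in for xy.csv. The names xy_demo and slope_stat are hypothetical; boot calls the statistic with the data and a vector of resampled row indices.

```r
library(boot)
set.seed(1)  # arbitrary seed, for reproducibility only

# Synthetic stand-in for xy.csv (assumption: the real file has columns x and y)
xy_demo <- data.frame(x = runif(100, 0, 10))
xy_demo$y <- 1 + 2 * xy_demo$x + rnorm(100)

# Statistic for boot: refit the regression on the resampled rows
# and return only the slope estimate
slope_stat <- function(data, idx) {
  coef(lm(y ~ x, data = data[idx, ]))[["x"]]
}

# Nonparametric bootstrap that resamples (x, y) pairs
boot_out <- boot(xy_demo, statistic = slope_stat, R = 1000)

# Basic and percentile intervals for the slope
slope_ci <- boot.ci(boot_out, conf = 0.95, type = c("basic", "perc"))
slope_ci
```

Resampling whole rows keeps each (x, y) pair together, which matches the sampling-of-pairs setup in this question.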
Question 3 (4 pts)
Suppose that instead of sampling pairs, we first identified some important values of \(x\) that we wanted to investigate. Treating these values as fixed, we sampled a varying number of \(Y_i\) for each \(x\) value. For these data, we'll attempt to model the conditional distribution of \(Y \, | \, x\) as: \[Y \, | \, x = \beta_0 + \beta_1 x + \epsilon\] where \(\epsilon\) is assumed to be symmetric about zero (therefore, \(E(\epsilon) = 0\)) and the variance of \(\epsilon\) does not depend on \(x\) (a property called homoskedasticity). These assumptions are very similar to the population regression line model (as \(E(R_i) = 0\) by construction), but cover the case where we want to design the study on particular values (a common case is a randomized trial where \(x\) values are assigned from a known procedure and \(Y\) is measured after).
Part (a) (2 pts)
Let's start with some stronger assumptions and then relax them in the subsequent parts of the question.
Suppose we think that \(\epsilon\) follows a scaled \(t\)-distribution with 4 degrees of freedom (i.e., has fatter tails than the Normal distribution): \[\epsilon \sim \frac{\sigma}{\sqrt{2}} t(4) \Rightarrow \text{Var}(\epsilon) = \sigma^2\] (The \(\sqrt{2}\) is there just to scale the \(t\)-distribution to have a variance of 1. More generally, if we picked a different degrees of freedom parameter \(v\), this would be replaced with \(\sqrt{v/(v-2)}\).)
One way to get an estimate of the distribution of \(\hat \beta_1\) is the following algorithm:
1. Estimate \(\beta_0\), \(\beta_1\), and \(\sigma^2\) using linear regression.
2. For all the \(x_i\) in the sample, generate \(\hat y_i = \hat \beta_0 + \hat \beta_1 x_i\).
3. For \(B\) replications, generate \(Y_i^* = \hat y_i + \epsilon_i^*\), where \[\epsilon^* \sim \frac{\sqrt{\hat \sigma^2}}{\sqrt{2}} t(4)\]
4. For each replication, use linear regression to estimate \(\hat \beta_0^*\) and \(\hat \beta_1^*\).
5. Use the \(\alpha/2\) and \(1 - \alpha/2\) quantiles of the bootstrap distribution to get the confidence intervals: \[[2 \hat \beta_j - \hat \beta_j^*(1 - \alpha/2), \; 2 \hat \beta_j - \hat \beta_j^*(\alpha/2)], \quad j = 0, 1\] To avoid double subscripts I've written \(\hat \beta^*_j(1 - \alpha/2)\) as the upper \(1 - \alpha/2\) quantile of the bootstrap distribution (and likewise for the lower \(\alpha/2\) quantile).
You may note that this is a basic bootstrap interval. In fact, this procedure (fitting parameters, then simulating from a model) is known as a parametric bootstrap.
Use the algorithm above to generate confidence intervals for the \(\beta\) parameters. Compare them to the fully parametric intervals produced in Question 2(a).
Note: The boot function does have the option of performing a parametric bootstrap (sim = "parametric") using a user-supplied ran.gen function. Feel free to use this functionality, but you may find it easier to implement the algorithm directly.
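The five steps of the algorithm can be sketched directly, without the boot package. The fixed design and the true coefficients below are made up for illustration (assumptions, not the homework data), and names such as beta_star and ci_basic are hypothetical.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical fixed design: a few x values, each sampled a varying number of times
x <- rep(c(1, 2, 4, 8), times = c(10, 15, 20, 15))
y <- 1 + 2 * x + (1 / sqrt(2)) * rt(length(x), df = 4)

# Step 1: estimate beta_0, beta_1 and sigma by linear regression
fit <- lm(y ~ x)
beta_hat <- coef(fit)
sigma_hat <- summary(fit)$sigma   # square root of the estimated sigma^2

# Step 2: fitted values at the observed x_i
y_hat <- fitted(fit)

# Steps 3 and 4: simulate scaled t(4) errors, refit, keep both coefficients
B <- 1000
beta_star <- t(replicate(B, {
  eps_star <- sigma_hat / sqrt(2) * rt(length(x), df = 4)
  y_star <- y_hat + eps_star
  coef(lm(y_star ~ x))
}))

# Step 5: basic bootstrap intervals, 2 * estimate minus the opposite quantile
alpha <- 0.05
q <- apply(beta_star, 2, quantile, probs = c(1 - alpha / 2, alpha / 2))
ci_basic <- cbind(lower = 2 * beta_hat - q[1, ], upper = 2 * beta_hat - q[2, ])
ci_basic
```

Each row of ci_basic is the interval for one coefficient, with the intercept first and the slope second.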
Part (b) (2 pts)
As an alternative to sampling from an assumed distribution for \(\epsilon\), we can replace step (3) in the previous algorithm with
3. Draw a sample \(\epsilon_i^*\) (with replacement) from the residuals \(\hat \epsilon_i\) and make \(Y_i^* = \hat y_i + \epsilon_i^*\)
Implement this version of a parametric bootstrap. Feel free to use the boot package. Compare the results to Part (a) of this question.
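A direct implementation of the modified step (3) might look like the sketch below. It does not use the boot package, although that is also allowed. The design and coefficients are made up for illustration (assumptions), and names such as eps_hat and ci_resid are hypothetical.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical fixed design and response (assumptions, not the homework data)
x <- rep(c(1, 2, 4, 8), times = c(10, 15, 20, 15))
y <- 1 + 2 * x + rnorm(length(x))

fit <- lm(y ~ x)
beta_hat <- coef(fit)
y_hat <- fitted(fit)
eps_hat <- resid(fit)            # estimated errors, resampled below

# Modified step 3: resample residuals with replacement instead of drawing from t(4)
B <- 1000
beta_star <- t(replicate(B, {
  y_star <- y_hat + sample(eps_hat, replace = TRUE)
  coef(lm(y_star ~ x))
}))

# Basic bootstrap intervals, as in step 5 of the original algorithm
alpha <- 0.05
q <- apply(beta_star, 2, quantile, probs = c(1 - alpha / 2, alpha / 2))
ci_resid <- cbind(lower = 2 * beta_hat - q[1, ], upper = 2 * beta_hat - q[2, ])
ci_resid
```

The x values stay fixed across replications; only the residuals are reshuffled, which is consistent with treating the design as fixed.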
Question 4 (1 pt)
Read the paper "The Risk of Cancer Associated with Specific Mutations of BRCA1 and BRCA2 among Ashkenazi Jews". Briefly summarize the paper. Make sure to discuss the research question, data source, methods, and results. How did the authors use the bootstrap procedure in this paper?