[Solved] GU4206-GR5206 Lab5-empirical behavior of a common hypothesis testing

$25

File Name: GU4206-GR5206_Lab5-empirical_behavior_of_a_common_hypothesis_testing.zip
File Size: 640.56 KB

SKU: [Solved] GU4206-GR5206 Lab5-empirical behavior of a common hypothesis testing Category: Tag:
5/5 - (1 vote)

Make sure that you upload the PDF (or HTML) output after you have knitted the file. The files you upload to the Canvas page should be updated with commands you provide to answer each of the questions below. You can edit this file directly to produce your final solutions.

Goal

The goal of this lab is to investigate the empirical behavior of a common hypothesis testing procedure through simulation using R. We consider the traditional two-sample t-test.

Two-Sample T-Test

Consider an experiment testing if a 35 year old males heart rate statistically differs between a control group and a dosage group. Let X denote the control group and let Y denote the drug group. One common method used to solve this problem is the two-sample t-test. The null hypothesis for this study is:

H0 : 1 2 = 0,

where 0 is the hypothesized value. The assumptions of the two-sample t-test follow below:

Assumptions

  1. X1,X2,,Xm is a random sample from a normal distribution with mean 1 and variance 12.
  2. Y1,Y2,,Yn is a random sample from a normal distribution with mean 2 and variance 22.
  3. The X and Y samples are independent of one another.

Procedure

The test statistic is

tcalc = xqs12y+ sn220,

m

where x, y are the respective sample means and s21,s22 are the respective sample standard deviations.

The approximate degrees of freedom is

sm21 + s222 df = n

m1 n1

Under the null hypothesis, tcalc (or Tcalc) has a students t-distribution with df degrees of freedom.

Rejection rules

HA : 1 2 > 0 (upper-tailed) P(tcalc > T)
HA : 1 2 < 0 (lower-tailed) P(tcalc < T)
HA : 1 2 = 6 0 (two-tailed) 2 P(|tcalc| > T)

Alternative Hypothesis P-value calculation

Reject H0 when:

Pvalue

Tasks

  • Using the R function test, run the two sample t-test on the following simulated dataset. Note that the t.test function defaults a two-tailed alternative. Also briefly interpret the output.
set.seed(5) sigma=5Control <- rnorm(30,mean=10,sd=sigma)Dosage <- rnorm(35,mean=12,sd=sigma)#t.test()
  • Write a function called test.sim that simulates R different samples of X for control and R different samples of Y for the drug group and computes the proportion of test statistics that fall in the rejection region. The function should include the following: Inputs:
    • R is the number of simulated data sets (simulated test statistics). Let R have default 10,000.
    • Parameters mu1, mu2, sigma1 and sigma2 which are the respective true means and true standard deviations of X & Y . Let the parameters have respective defaults mu1=0, mu2=0,

sigma1=1 and sigma2=1.

  • Sample sizes n and m defaulted at m=n=30.
  • level is the significance level as a decimal with default at = . value is the hypothesized value defaulted at 0.
  • The output should be a list with the following labeled elements:
  • list vector of simulated t-statistics (this should have length R).
  • list vector of empirical p-values (this should have length R).
  • rate is a single number that represents the proportion of simulated test statistics that fell in the rejection region.

I started the function below:

t.test.sim <- function(R=10000, mu1=0,mu2=0, sigma1=1,sigma2=1, m=30,n=30, level=.05, value=0, direction=Two) {#Define empty lists statistic.list <- rep(0,R) pvalue.list <- rep(0,R)#for (i in 1:R) {# Sample realized data#Control <-#Dosage <-# Testing values#testing.procedure <-#statistic.list[i] <-#pvalue.list[i] <-#}#rejection.rate <-#return()}

Evaluate your function with the following inputs R=10,mu1=10,mu2=12,sigma1=5 and sigma2=5. 3) Assuming the null hypothesis

H0 : 1 2 = 0

is true, compute the empirical size (or rejection rate) using 10,000 simulated data sets. Use the function

t.test.sim to accomplish this task and store the object as sim. Output the empirical size quantity sim$rejection.rate. Comment on this value. What is it close to?

Note: use mu1=mu2=10 (i.e., the null is true). Also set sigma1=5,sigma2=5 and n=m=30.

  • Plot a histogram of the simulated P-values, i.e., hist(sim$pvalue.list). What is the probability distribution shown from this histogram? Does this surprise you?
  • Plot a histogram illustrating the empirical sampling distribution of the t-statistic, i.e., hist(sim$statistic.list,probabilit =TRUE). What is the probability distribution shown from this histogram?
  • Run the following four lines of code:

t.test.sim(R=1000,mu1=10,mu2=10,sigma1=5,sigma2=5)$rejection.rate

t.test.sim(R=1000,mu1=10,mu2=12,sigma1=5,sigma2=5)$rejection.rate

t.test.sim(R=1000,mu1=10,mu2=14,sigma1=5,sigma2=5)$rejection.rate

t.test.sim(R=1000,mu1=10,mu2=16,sigma1=5,sigma2=5)$rejection.rate Comment on the results.

  • Run the following four lines of code:

t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=10,n=10)$rejection.rate

t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=30,n=30)$rejection.rate

t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=50,n=50)$rejection.rate

t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=100,n=100)$rejection.rate Comment on the results.

  • Extra credit: Modify the test.sim() function to investigate how the power and size behave in the presence of heavy tailed data, i.e., investigate how robust the t-test is in the presence of violations from normality.

Hint: The Cauchy distribution and the students t-distribution with low df are both heavy tailed.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] GU4206-GR5206 Lab5-empirical behavior of a common hypothesis testing
$25