Make sure that you upload the PDF (or HTML) output after you have knitted the file. The files you upload to the Canvas page should include the commands you wrote to answer each of the questions below. You can edit this file directly to produce your final solutions.

Goal

The goal of this lab is to investigate the empirical behavior of a common hypothesis testing procedure through simulation using R. We consider the traditional two-sample t-test.

Two-Sample T-Test

Consider an experiment testing whether the heart rate of 35-year-old males differs significantly between a control group and a dosage group. Let $X$ denote the control group and let $Y$ denote the drug group. One common method used to solve this problem is the two-sample t-test. The null hypothesis for this study is:

$$H_0: \mu_1 - \mu_2 = \Delta_0,$$

where $\Delta_0$ is the hypothesized value. The assumptions of the two-sample t-test follow below:

Assumptions

- $X_1, X_2, \ldots, X_m$ is a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$.
- $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$.
- The $X$ and $Y$ samples are independent of one another.

Procedure

The test statistic is

$$t_{calc} = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}},$$

where $\bar{x}$, $\bar{y}$ are the respective sample means and $s_1^2$, $s_2^2$ are the respective sample variances.

The approximate degrees of freedom is

$$df = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$

Under the null hypothesis, $t_{calc}$ (or $T_{calc}$) has approximately a Student's t-distribution with $df$ degrees of freedom.
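As a check on the procedure, the statistic and the approximate degrees of freedom can be computed by hand and compared with what t.test reports. The sketch below uses hypothetical toy data (not the lab's dataset) and assumes the default unequal-variance form of t.test; the variable names are illustrative only.

```r
# Hypothetical toy data, chosen only for illustration.
set.seed(1)
x <- rnorm(12, mean = 10, sd = 5)   # plays the role of the X (control) sample
y <- rnorm(15, mean = 12, sd = 5)   # plays the role of the Y (drug) sample
delta0 <- 0                          # hypothesized difference in means

m <- length(x)
n <- length(y)
vx <- var(x) / m                     # sample variance of X divided by m
vy <- var(y) / n                     # sample variance of Y divided by n

# Test statistic and approximate (Welch-Satterthwaite) degrees of freedom
t.calc <- (mean(x) - mean(y) - delta0) / sqrt(vx + vy)
df <- (vx + vy)^2 / (vx^2 / (m - 1) + vy^2 / (n - 1))

# t.test() uses the unequal-variance form by default (var.equal = FALSE)
fit <- t.test(x, y, mu = delta0)
c(by.hand = t.calc, t.test = unname(fit$statistic))
c(by.hand = df, t.test = unname(fit$parameter))
```

The two columns of each printed vector should agree, which confirms that the degrees of freedom are generally not an integer and depend on the observed sample variances.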

Rejection rules


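Alternative Hypothesis	P-value calculation
$H_A: \mu_1 - \mu_2 > \Delta_0$ (upper-tailed)	$P(T > t_{calc})$
$H_A: \mu_1 - \mu_2 < \Delta_0$ (lower-tailed)	$P(T < t_{calc})$
$H_A: \mu_1 - \mu_2 \neq \Delta_0$ (two-tailed)	$2P(T > |t_{calc}|)$

Here $T$ denotes a random variable with a Student's t-distribution with $df$ degrees of freedom.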

Reject $H_0$ when:

$$\text{P-value} \leq \alpha$$
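In R, these tail areas come from pt, the cumulative distribution function of the t-distribution. For an upper-tailed alternative the p-value is the area to the right of the observed statistic, for a lower-tailed alternative the area to the left, and for a two-tailed alternative twice the area beyond its absolute value. The numbers below are hypothetical and serve only to illustrate the three calculations.

```r
# Hypothetical values for illustration only
t.calc <- 1.8    # an observed test statistic
df     <- 40     # its approximate degrees of freedom
alpha  <- 0.05   # significance level

p.upper <- pt(t.calc, df, lower.tail = FALSE)            # area to the right of t.calc
p.lower <- pt(t.calc, df)                                # area to the left of t.calc
p.two   <- 2 * pt(abs(t.calc), df, lower.tail = FALSE)   # twice the area beyond |t.calc|

p.values <- c(upper = p.upper, lower = p.lower, two.sided = p.two)
p.values
p.values <= alpha   # TRUE means the null hypothesis is rejected at level alpha
```

With these particular inputs the upper-tailed test rejects while the two-tailed test does not, which shows why the direction of the alternative has to be fixed before looking at the data.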

Tasks

1) Using the R function t.test, run the two-sample t-test on the following simulated dataset. Note that the t.test function defaults to a two-tailed alternative. Also briefly interpret the output.

set.seed(5)
sigma <- 5
Control <- rnorm(30, mean=10, sd=sigma)
Dosage <- rnorm(35, mean=12, sd=sigma)
#t.test()

2) Write a function called t.test.sim that simulates R different samples of $X$ for control and R different samples of $Y$ for the drug group and computes the proportion of test statistics that fall in the rejection region. The function should include the following:

Inputs:
- R is the number of simulated data sets (simulated test statistics). Let R have default 10,000.
- Parameters mu1, mu2, sigma1 and sigma2, which are the respective true means and true standard deviations of $X$ and $Y$. Let the parameters have respective defaults mu1=0, mu2=0, sigma1=1 and sigma2=1.
- Sample sizes n and m, defaulted at m=n=30.
- level is the significance level as a decimal, with default $\alpha = 0.05$.
- value is the hypothesized value, defaulted at 0.

The output should be a list with the following labeled elements:
- statistic.list is the vector of simulated t-statistics (this should have length R).
- pvalue.list is the vector of empirical p-values (this should have length R).
- rejection.rate is a single number that represents the proportion of simulated test statistics that fell in the rejection region.

I started the function below:

t.test.sim <- function(R=10000,
                       mu1=0, mu2=0,
                       sigma1=1, sigma2=1,
                       m=30, n=30,
                       level=.05,
                       value=0,
                       direction="Two") {
  # Define empty lists
  statistic.list <- rep(0,R)
  pvalue.list <- rep(0,R)
  #for (i in 1:R) {
    # Sample realized data
    #Control <-
    #Dosage <-
    # Testing values
    #testing.procedure <-
    #statistic.list[i] <-
    #pvalue.list[i] <-
  #}
  #rejection.rate <-
  #return()
}
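The skeleton above can be filled in several ways. The following is a minimal sketch of one possible completion, not the official solution. It is named t.test.sim.sketch to keep it apart from the skeleton, and the mapping from direction to the alternative argument of t.test (including the labels "Upper" and "Lower") is an assumption, since the handout only shows the value "Two".

```r
# One possible completion (an assumption, not the official solution).
t.test.sim.sketch <- function(R = 10000, mu1 = 0, mu2 = 0, sigma1 = 1, sigma2 = 1,
                              m = 30, n = 30, level = 0.05, value = 0,
                              direction = "Two") {
  # Assumed mapping from `direction` to t.test()'s `alternative` argument;
  # the labels "Upper" and "Lower" are hypothetical.
  alternative <- switch(direction,
                        Two = "two.sided", Upper = "greater", Lower = "less",
                        stop("direction must be 'Two', 'Upper' or 'Lower'"))
  statistic.list <- numeric(R)
  pvalue.list <- numeric(R)
  for (i in seq_len(R)) {
    # Sample realized data
    Control <- rnorm(m, mean = mu1, sd = sigma1)
    Dosage  <- rnorm(n, mean = mu2, sd = sigma2)
    # Testing values
    testing.procedure <- t.test(Control, Dosage, mu = value,
                                alternative = alternative)
    statistic.list[i] <- testing.procedure$statistic
    pvalue.list[i]    <- testing.procedure$p.value
  }
  # A statistic falls in the rejection region exactly when its p-value <= level
  rejection.rate <- mean(pvalue.list <= level)
  list(statistic.list = statistic.list,
       pvalue.list = pvalue.list,
       rejection.rate = rejection.rate)
}

set.seed(1)
out <- t.test.sim.sketch(R = 10, mu1 = 10, mu2 = 12, sigma1 = 5, sigma2 = 5)
str(out)
```

Counting p-values at or below level is equivalent to counting statistics in the rejection region, so the sketch never needs the critical value explicitly. A vectorized version with replicate would be shorter, but the loop mirrors the structure of the skeleton.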

Evaluate your function with the following inputs: R=10, mu1=10, mu2=12, sigma1=5 and sigma2=5.

3) Assuming the null hypothesis

$$H_0: \mu_1 - \mu_2 = 0$$

is true, compute the empirical size (or rejection rate) using 10,000 simulated data sets. Use the function t.test.sim to accomplish this task and store the object as sim. Output the empirical size quantity sim$rejection.rate. Comment on this value. What is it close to?

Note: use mu1=mu2=10 (i.e., the null is true). Also set sigma1=5,sigma2=5 and n=m=30.
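When commenting on the empirical size, it helps to know how much simulation noise to expect. The rejection rate is a proportion computed from R independent simulated tests, so if the test holds its nominal level its standard error follows the usual binomial formula. The short calculation below is a sketch under that assumption; mc.se is an illustrative name.

```r
R     <- 10000   # number of simulated data sets
level <- 0.05    # nominal significance level

# Binomial standard error of a proportion estimated from R independent trials
mc.se <- sqrt(level * (1 - level) / R)
mc.se

# Rough two-standard-error band around the nominal level
level + c(-2, 2) * mc.se
```

An empirical size that lands inside this band is consistent with the nominal level up to Monte Carlo error.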

4) Plot a histogram of the simulated P-values, i.e., hist(sim$pvalue.list). What is the probability distribution shown from this histogram? Does this surprise you?

5) Plot a histogram illustrating the empirical sampling distribution of the t-statistic, i.e., hist(sim$statistic.list, probability=TRUE). What is the probability distribution shown from this histogram?

6) Run the following four lines of code:

t.test.sim(R=1000,mu1=10,mu2=10,sigma1=5,sigma2=5)$rejection.rate
t.test.sim(R=1000,mu1=10,mu2=12,sigma1=5,sigma2=5)$rejection.rate
t.test.sim(R=1000,mu1=10,mu2=14,sigma1=5,sigma2=5)$rejection.rate
t.test.sim(R=1000,mu1=10,mu2=16,sigma1=5,sigma2=5)$rejection.rate

Comment on the results.

7) Run the following four lines of code:

t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=10,n=10)$rejection.rate
t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=30,n=30)$rejection.rate
t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=50,n=50)$rejection.rate
t.test.sim(R=10000,mu1=10,mu2=12,sigma1=10,sigma2=10,m=100,n=100)$rejection.rate

Comment on the results.
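The simulated rejection rates from the two sets of runs above can be compared with a theoretical benchmark. The sketch below uses power.t.test from the stats package, which assumes the equal-variance two-sample t-test with a common per-group sample size, so it only approximates the unequal-variance procedure used in this lab. The helper theory.power is a hypothetical wrapper introduced for illustration.

```r
# Theoretical power from stats::power.t.test(), which assumes the
# equal-variance two-sample t-test with a common per-group sample size.
# This is a benchmark under those assumptions, not the Welch test itself.
theory.power <- function(n, delta, sd, level = 0.05) {  # hypothetical helper
  power.t.test(n = n, delta = delta, sd = sd, sig.level = level,
               type = "two.sample", alternative = "two.sided",
               strict = TRUE)$power
}

# Growing mean difference at fixed n = 30 per group and sd = 5
deltas <- c(2, 4, 6)
by.delta <- sapply(deltas, function(d) theory.power(n = 30, delta = d, sd = 5))
names(by.delta) <- paste0("delta=", deltas)

# Growing sample size at fixed mean difference 2 and sd = 10
sizes <- c(10, 30, 50, 100)
by.n <- sapply(sizes, function(k) theory.power(n = k, delta = 2, sd = 10))
names(by.n) <- paste0("n=", sizes)

round(by.delta, 3)
round(by.n, 3)
```

Both vectors increase from left to right: power grows with the true mean difference and with the sample size. The case of no mean difference is left out because it corresponds to the size of the test rather than its power.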

Extra credit: Modify the t.test.sim() function to investigate how the power and size behave in the presence of heavy-tailed data, i.e., investigate how robust the t-test is to violations of normality.

Hint: The Cauchy distribution and the Student's t-distribution with low df are both heavy tailed.
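One way to start on the extra credit is to make the sampling distribution an argument. The sketch below estimates only the empirical size of the two-sided test when both groups are drawn from the same distribution; heavy.size and its rdist argument are hypothetical names, and R is kept small so the example runs quickly. Note that the Cauchy distribution has no finite mean, so "the null is true" is read here as both groups sharing the same distribution.

```r
# Sketch: empirical size of the two-sided unequal-variance t-test when both
# groups come from the same (possibly heavy-tailed) distribution.
# `heavy.size` and `rdist` are hypothetical names used for illustration.
heavy.size <- function(R = 2000, m = 30, n = 30, level = 0.05,
                       rdist = function(k) rt(k, df = 2)) {
  pvals <- replicate(R, t.test(rdist(m), rdist(n))$p.value)
  mean(pvals <= level)   # proportion of simulated tests that reject
}

set.seed(1)
size.normal <- heavy.size(rdist = rnorm)                       # baseline
size.t2     <- heavy.size(rdist = function(k) rt(k, df = 2))   # heavy tails
size.cauchy <- heavy.size(rdist = rcauchy)                     # no finite mean
c(normal = size.normal, t2 = size.t2, cauchy = size.cauchy)
```

To study power in the same way, a constant shift could be added to one of the two samples before calling t.test.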


[Solved] GU4206-GR5206 Lab5-empirical behavior of a common hypothesis testing
