[SOLVED] R C math graph statistic Kings College London


King's College London
This paper is part of an examination of the College counting towards the award of a degree. Examinations are governed by the College Regulations under the authority of the Academic Board.
Degree Programmes: MSc
Module Code: 7CCMMS61T
Module Title: Statistics for Data Analysis
Examination Period: January 2018 (Period 1)
Time Allowed: Two hours
Rubric: ANSWER ALL QUESTIONS. ANSWER EACH QUESTION ON A NEW PAGE OF YOUR ANSWER BOOK AND WRITE ITS NUMBER IN THE SPACE PROVIDED. A FORMULA SHEET IS PROVIDED.
Calculators: Calculators may be used. The following models are permitted: Casio fx83, Casio fx85.
Notes: Books, notes or other written material may not be brought into this examination.
PLEASE DO NOT REMOVE THIS PAPER FROM THE EXAMINATION ROOM
2018 King's College London
SOLUTIONS

January 2018 7CCMMS61T
1. A researcher measured the concentration of potassium in the blood of 50 patients after receiving a new drug. The following table summarises the data:

Interval    Absolute frequency
3-4         7
4-4.5       12
4.5-5       18
5-5.5       8
5.5-6       5

Answer
PARTIAL MARKS ARE AWARDED FOR WORKING, THROUGHOUT. Syllabus topic: descriptive statistics. Teaching outcome: similar problems have been seen in tutorials.
a. Calculate the relative frequency table and the relative cumulative frequency table for these data.
4 marks
Answer
2 marks for each.

Interval    Relative frequency    Relative cumulative frequency
3-4         0.14                  0.14
4-4.5       0.24                  0.38
4.5-5       0.36                  0.74
5-5.5       0.16                  0.90
5.5-6       0.10                  1.00
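
The two tables can be reproduced from the absolute frequencies alone. The following is a minimal Python sketch, not part of the original solutions; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and absolute frequencies from the potassium table.
intervals = [(3.0, 4.0), (4.0, 4.5), (4.5, 5.0), (5.0, 5.5), (5.5, 6.0)]
counts = [7, 12, 18, 8, 5]

n = sum(counts)                          # total number of patients
rel_freq = [c / n for c in counts]       # relative frequencies
cum_freq = list(accumulate(rel_freq))    # relative cumulative frequencies

for (lo, hi), f, F in zip(intervals, rel_freq, cum_freq):
    print(f"{lo:g}-{hi:g}: relative {f:.2f}, cumulative {F:.2f}")
```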
b. Draw appropriate graphical representations for the relative frequency table and the cumulative relative frequencies.
4 marks
Answer
2 marks for each.
[Figure: Empirical distribution function for Potassium data; y axis: relative cumulative frequency.]
c. Determine the modal class and the intervals containing the first quartile, median and third quartile.
3 marks
Answer
1 mark for the modal class, 2 marks for the quantile classes. Modal class: 4.5-5. Q1 lies in 4-4.5, Q2 in 4.5-5 and Q3 in 5-5.5.
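d. Calculate approximated values for the median, the mean and the variance of the data.
4 marks
Answer
1 mark for the median, 1 mark for the mean, 2 marks for the variance.
$$Q_2 \approx 4.5 + 0.5 \cdot \frac{0.5 - 0.38}{0.36} \approx 4.67$$
$$\bar{x} \approx 3.5 \cdot 0.14 + 4.25 \cdot 0.24 + 4.75 \cdot 0.36 + 5.25 \cdot 0.16 + 5.75 \cdot 0.1 = 4.635$$
$$s^2 \approx \frac{1}{50}\left[7(3.5 - 4.635)^2 + 12(4.25 - 4.635)^2 + 18(4.75 - 4.635)^2 + 8(5.25 - 4.635)^2 + 5(5.75 - 4.635)^2\right] = 0.405525$$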
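
As a check on the arithmetic, the grouped-data approximations can be computed directly. This Python sketch is an illustration under the usual assumptions (class midpoints stand in for the raw values, and the median is found by linear interpolation within its class); the helper `grouped_quantile` is a hypothetical name, not something from the exam.

```python
# Grouped data from the potassium table: (lower, upper) bounds and counts.
intervals = [(3.0, 4.0), (4.0, 4.5), (4.5, 5.0), (5.0, 5.5), (5.5, 6.0)]
counts = [7, 12, 18, 8, 5]
n = sum(counts)

# Mean and variance use class midpoints as stand-ins for the raw values.
midpoints = [(lo + hi) / 2 for lo, hi in intervals]
mean = sum(c * m for c, m in zip(counts, midpoints)) / n
variance = sum(c * (m - mean) ** 2 for c, m in zip(counts, midpoints)) / n


def grouped_quantile(p):
    """Linear interpolation inside the class that contains the p-quantile."""
    target = p * n
    below = 0  # observations in the classes before the current one
    for (lo, hi), c in zip(intervals, counts):
        if below + c >= target:
            return lo + (hi - lo) * (target - below) / c
        below += c
    return intervals[-1][1]


median = grouped_quantile(0.5)
print(round(median, 2), round(mean, 3), round(variance, 6))
```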
2. The distance X, measured in thousands of kilometers, that a model of electric car can travel with a newly charged battery is a random variable with density function
$$f_X(x) = \begin{cases} \dfrac{c}{(1+x)^2}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$$
Answer
Syllabus topic: distributions and random variables. Teaching outcome: the application is new but similar exercises have been done.
a. Find the values of parameter c for which $f_X(x)$ is a valid probability density function.
4 marks
Answer
2 marks for stating the condition, 2 marks for finding the value.
$$\int_0^\infty \frac{c}{(1+x)^2}\,dx = 1 \iff \left[-\frac{c}{1+x}\right]_0^\infty = 1 \iff c = 1.$$
b. Calculate the distribution function $F_X(x)$.
4 marks
Answer
2 marks for the expression of the integral, 2 marks for solving it.
$$F_X(x) = P(X \le x) = \begin{cases} 0, & \text{if } x \le 0, \\ \displaystyle\int_0^x \frac{1}{(1+y)^2}\,dy = 1 - \frac{1}{1+x}, & \text{if } x > 0. \end{cases}$$
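
The closed-form distribution function can be sanity-checked numerically. The sketch below assumes c = 1 and uses a simple midpoint rule; `midpoint_integral` is a hypothetical helper written for this illustration.

```python
def pdf(x, c=1.0):
    """Density c / (1 + x)^2 on x > 0, zero elsewhere."""
    return c / (1.0 + x) ** 2 if x > 0 else 0.0


def cdf(x):
    """Closed-form distribution function for c = 1."""
    return 1.0 - 1.0 / (1.0 + x) if x > 0 else 0.0


def midpoint_integral(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# The numerical integral of the density should match the closed form.
for x in (0.5, 1.0, 3.0):
    print(x, round(midpoint_integral(pdf, 0.0, x), 6), round(cdf(x), 6))
```

Because the CDF tends to 1 as x grows, the same check also supports the normalisation condition c = 1.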
c. Calculate the probability that the electric car can travel at least 1 thousand kilometers.
4 marks
Answer
$$P(X \ge 1) = 1 - P(X < 1) = 1 - F_X(1) = \frac{1}{1+1} = 0.5.$$
d. Calculate the first quartile Q1 and the third quartile Q3 of X.
6 marks
Answer
3 marks each. For $0 < \alpha < 1$ the quantile $q_\alpha$ is given by $F_X(q_\alpha) = \alpha$, i.e. $1 - \frac{1}{1+q_\alpha} = \alpha$, therefore $q_\alpha = \frac{\alpha}{1-\alpha}$. Then, $Q_1 = q_{0.25} = \frac{1}{3}$ and $Q_3 = q_{0.75} = 3$.
e. Calculate the probability that the distance travelled with a newly charged battery is between Q1 and Q3.
2 marks
Answer
$P(Q_1 \le X \le Q_3) = 0.5$ by definition.
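
The answers to parts c to e follow from the distribution function and its inverse. A short Python sketch, assuming c = 1 as found in part a (function names are illustrative):

```python
def cdf(x):
    """F_X(x) = 1 - 1 / (1 + x) for x > 0, and 0 otherwise."""
    return 1.0 - 1.0 / (1.0 + x) if x > 0 else 0.0


def quantile(alpha):
    """Solve F_X(q) = alpha for q, valid for 0 < alpha < 1."""
    return alpha / (1.0 - alpha)


q1 = quantile(0.25)
q3 = quantile(0.75)
p_at_least_1 = 1.0 - cdf(1.0)      # part c
p_between = cdf(q3) - cdf(q1)      # part e

print(q1, q3, p_at_least_1, p_between)
```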
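3. Consider the probability density function
$$f_X(x) = \begin{cases} \theta x^{\theta-1}, & \text{for } 0 < x < 1, \\ 0, & \text{else}, \end{cases}$$
with $\theta > 0$ unknown parameter and let X be a random variable with probability density function $f_X(x)$.
Answer
Syllabus topic: distributions, point estimation. Teaching outcome: The expression of the distribution is new but similar examples have been seen.
a. Calculate the expected value of X.
4 marks
Answer
2 marks for the expression of the integral, 2 marks for the value.
$$E[X] = \int_0^1 x f_X(x)\,dx = \int_0^1 \theta x^{\theta}\,dx = \frac{\theta}{\theta+1}.$$
b. Calculate the variance of X.
4 marks
Answer
2 marks for the expression of the integral, 2 marks for the value.
$$E[X^2] = \int_0^1 x^2 f_X(x)\,dx = \int_0^1 \theta x^{\theta+1}\,dx = \frac{\theta}{\theta+2}$$
and
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{\theta}{\theta+2} - \frac{\theta^2}{(\theta+1)^2} = \frac{\theta}{(\theta+2)(\theta+1)^2}.$$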
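
The two moment formulas can be checked numerically for a specific parameter value. The sketch below assumes the density has the form θ x^(θ-1) on (0, 1) and picks θ = 2.5 purely as an example; the integration helper is hypothetical.

```python
THETA = 2.5  # hypothetical parameter value chosen only for this check


def pdf(x, theta=THETA):
    """Density theta * x^(theta - 1) on (0, 1), zero elsewhere."""
    return theta * x ** (theta - 1.0) if 0.0 < x < 1.0 else 0.0


def midpoint_integral(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


mean_numeric = midpoint_integral(lambda x: x * pdf(x), 0.0, 1.0)
second_moment = midpoint_integral(lambda x: x * x * pdf(x), 0.0, 1.0)
var_numeric = second_moment - mean_numeric ** 2

mean_formula = THETA / (THETA + 1.0)
var_formula = THETA / ((THETA + 2.0) * (THETA + 1.0) ** 2)

print(mean_numeric, mean_formula)
print(var_numeric, var_formula)
```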
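Let $X_1, \ldots, X_n$ be a random sample from a population with probability density function $f_X(x)$.
c. Define the sample mean $\bar{X}$ and compute its expected value as a function of $\theta$.
3 marks
Answer
1 mark for the definition, 2 marks for the expected value.
$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \quad \text{and} \quad E[\bar{X}] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{\theta}{\theta+1}.$$
d. Find the expression of the estimator for $\theta$ based on the method of moments.
3 marks
Answer
$$\bar{X} = \frac{\hat\theta}{\hat\theta+1} \implies \hat\theta = \frac{\bar{X}}{1-\bar{X}}.$$
e. Find the expression of the maximum likelihood estimator for $\theta$.
6 marks
Answer
2 marks for the likelihood, 3 marks for the maximization, 1 mark for checking that it is a maximum.
$$L(\theta; X_1, \ldots, X_n) = \prod_{i=1}^n \theta x_i^{\theta-1} \quad \text{and} \quad l(\theta) = \ln L(\theta) = n \ln\theta + (\theta-1)\sum_{i=1}^n \ln x_i.$$
Then,
$$\frac{dl}{d\theta} = \frac{n}{\theta} + \sum_{i=1}^n \ln x_i = 0,$$
and this implies $\hat\theta = -\dfrac{n}{\sum_{i=1}^n \ln x_i}$. Moreover, $\dfrac{d^2 l}{d\theta^2} = -\dfrac{n}{\theta^2} < 0$, therefore the stationary point is a maximum and $\hat\theta$ is the maximum likelihood estimator.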
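
Both estimators can be tried on simulated data. The following Python sketch is only an illustration: it assumes the density θ x^(θ-1) on (0, 1), draws a sample by inverse-CDF sampling (since F(x) = x^θ there, X = U^(1/θ) for uniform U), and uses an arbitrary true value θ = 2.5 and seed.

```python
import math
import random


def simulate(theta, n, rng):
    """Inverse-CDF sampling: F(x) = x**theta on (0, 1), so X = U**(1/theta)."""
    # 1 - rng.random() lies in (0, 1], which keeps log(x) finite below.
    return [(1.0 - rng.random()) ** (1.0 / theta) for _ in range(n)]


def moment_estimate(xs):
    """Method of moments: solve xbar = theta / (theta + 1) for theta."""
    xbar = sum(xs) / len(xs)
    return xbar / (1.0 - xbar)


def ml_estimate(xs):
    """Maximum likelihood: theta_hat = -n / sum(log x_i)."""
    return -len(xs) / sum(math.log(x) for x in xs)


rng = random.Random(0)                 # arbitrary seed for reproducibility
sample = simulate(2.5, 20_000, rng)    # hypothetical true theta = 2.5
print(moment_estimate(sample), ml_estimate(sample))
```

With a sample this large, both estimates should land close to the value used for simulation.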
4. A company is investigating the production of one of their factories that produces metallic laminates. They measure the tensile strength of n samples of metallic laminates. They can assume that the measurements (in pounds per square inch, psi) $X_1, \ldots, X_n$ are independent and normally distributed random variables with unknown mean $\mu$ and variance $\sigma^2 = 16\ \text{psi}^2$ known from previous experiments.
Answer
Syllabus topic: confidence intervals, hypothesis testing. Teaching outcome: the application is seen for the first time and it requires basic knowledge of confidence intervals and testing procedure.
a. What is the minimum value of n such that the width of the 95% confidence interval for $\mu$ is not larger than 4 psi?
6 marks
Answer
3 marks for the expression of the width of the interval, 3 marks for the correct n. $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, with $\sigma^2 = 16$. The 95% confidence interval for $\mu$ is $\left[\bar{x} - z_{1-0.975}\,\sigma/\sqrt{n},\ \bar{x} + z_{1-0.975}\,\sigma/\sqrt{n}\right]$, where $\bar{x}$ is the sample mean and $z_{1-0.975} = 1.96$ is the 0.975 quantile of the standard normal distribution. Its width is therefore $2 z_{1-0.975}\,\sigma/\sqrt{n} = 2 \cdot 1.96 \cdot 4/\sqrt{n}$, and $2 \cdot 1.96 \cdot 4/\sqrt{n} \le 4$ implies $n \ge 15.37$. The smallest n that satisfies this condition is n = 16.
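
The sample-size bound can be recomputed in a few lines. This is a minimal sketch using Python's standard library; it takes the normal quantile from `statistics.NormalDist` rather than the rounded 1.96, so the bound differs from 15.37 only in the third decimal.

```python
import math
from statistics import NormalDist

sigma = 4.0        # known standard deviation, sqrt(16 psi^2)
max_width = 4.0    # largest acceptable interval width in psi

z = NormalDist().inv_cdf(0.975)                # 0.975 standard normal quantile
n_bound = (2.0 * z * sigma / max_width) ** 2   # width <= max_width iff n >= n_bound
n_min = math.ceil(n_bound)

print(round(z, 3), round(n_bound, 2), n_min)
```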
The employee tasked with the investigation decided to collect n = 20 measurements and they obtained a sample mean of $\bar{x} = 91$ psi.
b. Calculate the realization of the 95% confidence interval for $\mu$.
3 marks
Answer
$91 \pm 1.96 \cdot 4/\sqrt{20} = 91 \pm 1.75$, i.e. [89.25, 92.75].
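
The realized interval with known variance can be verified as follows. The sketch assumes the figures given in the question (n = 20, sample mean 91 psi, σ = 4 psi); variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 20
xbar = 91.0     # observed sample mean in psi
sigma = 4.0     # known standard deviation in psi

z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(round(half_width, 2), round(lower, 2), round(upper, 2))
```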
c. The company guarantees to their client that the mean tensile strength of their product is 95 psi. Describe the appropriate hypothesis test to check this hypothesis. Could a client complain on the basis of the observed measurements?
5 marks
Answer
3 marks for the appropriate test, 2 marks for carrying out the test.
$H_0: \mu = 95$ vs $H_1: \mu \ne 95$; the null hypothesis can be rejected at the 5% level because $\mu_0 = 95$ is outside of the 95% confidence interval.
d. Describe how the expression of the 95% confidence interval would change in the case where the variance of $X_1, \ldots, X_{20}$ is unknown. Compute the realization of the 95% confidence interval in this case, knowing that the employee reported a sample variance of 17 psi$^2$.
6 marks
Answer
3 marks for the expression of the interval, 3 marks for the realization. If the measurements are independent normal random variables with unknown mean and variance,
$$\frac{\bar{X} - \mu}{s/\sqrt{20}} \sim t_{19}, \quad \text{where } s^2 = \frac{1}{19}\sum_{i=1}^{20} (X_i - \bar{X})^2.$$
The expression of the 95% confidence interval is $\bar{X} \pm t_{0.025,19}\, s/\sqrt{20}$, where $t_{0.025,19}$ is the 0.975 quantile of the Student t distribution with 19 degrees of freedom.
The realization of the confidence interval is $91\ \text{psi} \pm 2.093 \cdot \sqrt{17}/\sqrt{20}\ \text{psi} = 91\ \text{psi} \pm 1.93\ \text{psi}$.
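
The t-based interval can be recomputed under one explicit assumption: that the reported 17 psi^2 is the unbiased sample variance (divisor n - 1), so the standard error is s/sqrt(n) with n = 20. The quantile 2.093 is taken from the text rather than computed, because the Python standard library has no Student t quantile function. Dividing by sqrt(19) instead of sqrt(20) would give a half-width of about 1.98.

```python
import math

n = 20
xbar = 91.0          # observed sample mean in psi
s2 = 17.0            # reported sample variance in psi^2 (assumed unbiased)
t_quantile = 2.093   # 0.975 quantile of t with 19 df, as quoted in the text

half_width = t_quantile * math.sqrt(s2 / n)
lower, upper = xbar - half_width, xbar + half_width

print(round(half_width, 2), round(lower, 2), round(upper, 2))
```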
5. A researcher is studying the speed of growth for some plant species. Data are collected by measuring the size of plants (size, in cm) after they have been allowed to grow in a laboratory for a certain number of days (days). Data are collected in three different laboratories and this is denoted in R with a factor lab with levels A, B and C.
Answer
Syllabus topic: linear regression. Teaching outcome: the application is new but similar examples have been seen.
a. The researcher decides to fit a linear model to the data using R:
plant1 <- lm(size ~ days + days:lab)
summary(plant1)

Call:
lm(formula = size ~ days + days:lab)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6464     5.3745   0.120    0.905
days         10.1404     0.3677  27.576  < 2e-16
days:labB     5.0770     0.3739  13.580 2.58e-13
days:labC    -3.1296     0.3739  -8.371 7.48e-09

Residual standard error: 15.09 on 26 degrees of freedom
Multiple R-squared: 0.9858, Adjusted R-squared: 0.9842
F-statistic: 601.5 on 3 and 26 DF, p-value: < 2.2e-16
Write down the mathematical expression of the linear regression model. What is the estimated daily increase in size for laboratory A, B and C? What is the estimate of the error variance?
6 marks
Answer
4 marks for the model, 2 marks for the estimated parameters. Let $Y_{ij}$ be the size and $X_{ij}$ be the number of days for the i-th observation of laboratory j, $i = 1, \ldots, n_j$, $j \in \{A, B, C\}$. The model is then
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_j X_{ij} + \varepsilon_{ij}$$
under the corner point constraint $\beta_A = 0$, where $\varepsilon_{ij}$ are i.i.d. $N(0, \sigma^2)$.
The estimated daily increase in size for laboratory A is then $\hat\beta_1 + \hat\beta_A = \hat\beta_1 = 10.14$. The ones for laboratory B and C are $\hat\beta_1 + \hat\beta_B = 10.14 + 5.08 = 15.22$ and $\hat\beta_1 + \hat\beta_C = 10.14 - 3.13 = 7.01$ respectively.
The estimate of the error variance is $s^2 = 15.09^2 = 227.71$.
Alternatively, the model can be defined using two dummy variables. Let $Y_k$ be the size of the k-th observation, $Z_{k1}$ be the number of days for the k-th observation, $Z_{k2}$ be equal to 1 if the k-th observation comes from laboratory B and 0 else, and $Z_{k3}$ be equal to 1 if the k-th observation comes from laboratory C and 0 else, for $k = 1, \ldots, 30$. Then, the model is
$$Y_k = \beta_0 + \beta_1 Z_{k1} + \beta_B Z_{k1} Z_{k2} + \beta_C Z_{k1} Z_{k3} + \varepsilon_k,$$
where $\varepsilon_k$ are i.i.d. $N(0, \sigma^2)$.
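
The per-laboratory slopes and the error variance follow directly from the summary output. In this Python sketch the coefficient values are copied from the R output; the sign of the days:labC estimate is taken as negative, consistent with the worked answer 10.14 - 3.13. The dictionary keys simply mirror R's coefficient labels.

```python
# Estimates copied from summary(plant1); days:labC assumed negative.
coef = {
    "(Intercept)": 0.6464,
    "days": 10.1404,
    "days:labB": 5.0770,
    "days:labC": -3.1296,
}
residual_se = 15.09  # residual standard error on 26 degrees of freedom

# Lab A is the reference level, so its slope is the plain `days` coefficient.
slopes = {
    "A": coef["days"],
    "B": coef["days"] + coef["days:labB"],
    "C": coef["days"] + coef["days:labC"],
}
error_variance = residual_se ** 2

print({lab: round(s, 2) for lab, s in slopes.items()}, round(error_variance, 2))
```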
b. Provide a 95% confidence interval for the daily increase in size in laboratory A.
4 marks
Answer
3 marks for the expression of the interval, 1 mark for the computation. Using model plant1, the daily increase in laboratory A is the parameter $\beta_1$ and a 95% confidence interval for $\beta_1$ is given by
$$\hat\beta_1 \pm t_{0.025;26}\, \mathrm{se}(\hat\beta_1) = 10.1404 \pm 2.056 \cdot 0.3677 = 10.1404 \pm 0.756.$$
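
The interval endpoints can be computed from the estimate and standard error in the summary table. The t quantile 2.056 is taken from the text, since the Python standard library does not provide Student t quantiles; the names below are illustrative.

```python
beta1_hat = 10.1404   # estimate of the `days` coefficient
se_beta1 = 0.3677     # its standard error from summary(plant1)
t_quantile = 2.056    # 0.975 quantile of t with 26 df, as quoted in the text

half_width = t_quantile * se_beta1
lower, upper = beta1_hat - half_width, beta1_hat + half_width

print(round(half_width, 3), round(lower, 3), round(upper, 3))
```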
c. Provide an estimate for the size of a plant that has been allowed to grow for 10 days in laboratory B.
2 marks
Answer
Using model plant1, $\hat{Y} = 0.6464 + (10.1404 + 5.077) \cdot 10 = 152.82$.
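
The same prediction can be written as a small function of the fitted coefficients. This is a sketch, not R's `predict`; `predict_size` is a hypothetical helper, and the days:labC coefficient is assumed negative, in line with the slope 7.01 reported for laboratory C.

```python
INTERCEPT = 0.6464
DAYS = 10.1404
# Slope adjustments relative to the reference laboratory A.
SLOPE_SHIFT = {"A": 0.0, "B": 5.0770, "C": -3.1296}


def predict_size(days, lab):
    """Fitted size in cm after `days` days in laboratory `lab`."""
    return INTERCEPT + (DAYS + SLOPE_SHIFT[lab]) * days


print(round(predict_size(10, "B"), 2))
```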
d. Check the diagnostic plots in Figure 1 and define the quantities that appear on the x and y axes of these plots. Do you spot any problems with the model assumptions?
6 marks
Answer
2 marks each for the plots description, 1 mark each for the comments.
The fitted values are defined as $\hat{Y}_k = \hat\beta_0 + \hat\beta_1 Z_{k1} + \hat\beta_B Z_{k1} Z_{k2} + \hat\beta_C Z_{k1} Z_{k3}$ and the residuals are $Y_k - \hat{Y}_k$, for $k = 1, \ldots, 30$. The plot of the residuals vs fitted values may suggest the presence of a quadratic trend in the residuals.
The Q-Q plot compares the sorted standardized residuals (y axis) with the corresponding theoretical quantiles of a standard normal distribution (x axis). The Q-Q plot does not highlight any problem with the normality assumption.
Figure 1: Diagnostics plots.
e. The researcher then tries to fit a second model which allows for different intercepts for the different laboratories:
plant2 <- lm(size ~ days * lab)
anova(plant1, plant2)
Analysis of Variance Table

Model 1: size ~ days + days:lab
Model 2: size ~ days * lab
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     26 5921.7
2     24 5103.0  2    818.72 1.9253 0.1677
Explain the test that is carried out by the anova command, specifying the null and the alternative hypothesis, the expression of the test statistic and how the p-value is computed. Which model is preferable?
7 marks
Answer
2 marks for the null and alternative hypothesis, 2 marks for the test statistic, 2 marks for the p-value, 1 mark for the conclusion. Let $M_0$ be the model fitted in R as plant1 and $M_1$ the model fitted as plant2. The anova command carries out a hypothesis test where $H_0$: data are generated from model $M_0$ vs $H_1$: data are generated from model $M_1$. The test statistic is
$$F_0 = \frac{(RSS_0 - RSS_1)/2}{RSS_1/24},$$
where $RSS_i$ denotes the residual sum of squares for the model $M_i$, 2 is the difference in the number of parameters and 24 is the residual degrees of freedom of plant2. Under the null hypothesis $F_0 \sim F_{2,24}$. The p-value is computed as $P(F_{2,24} > F_0) = P(F_{2,24} > 1.9253) = 0.1677$. For all the usual levels, we do not have evidence to reject the null hypothesis and therefore we prefer model $M_0$ (plant1).
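
The F statistic and its p-value can be recomputed from the residual sums of squares in the anova table. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2: for F with (2, d) degrees of freedom, P(F > x) = (1 + 2x/d)^(-d/2). The denominator uses the 24 residual degrees of freedom of the larger model, as in the table.

```python
# Residual sums of squares and residual degrees of freedom from the table.
rss0, df0 = 5921.7, 26   # plant1 (smaller model)
rss1, df1 = 5103.0, 24   # plant2 (larger model)

num_df = df0 - df1                               # extra parameters: 2
f0 = ((rss0 - rss1) / num_df) / (rss1 / df1)     # F statistic

# Survival function of F(2, df1); this closed form requires num_df == 2.
p_value = (1.0 + num_df * f0 / df1) ** (-df1 / 2.0)

print(round(f0, 4), round(p_value, 4))
```

For other numerator degrees of freedom the survival function needs the regularized incomplete beta function, which a statistics library would normally provide.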