King's College London
This paper is part of an examination of the College counting towards the award of a degree. Examinations are governed by the College Regulations under the authority of the Academic Board.
Degree Programmes: MSc
Module Code: 7CCMMS61T
Module Title: Statistics for Data Analysis
Examination Period: January 2019 (Period 1)
Time Allowed: Two hours
Rubric: ANSWER ALL QUESTIONS. ANSWER EACH QUESTION ON A NEW PAGE OF YOUR ANSWER BOOK AND WRITE ITS NUMBER IN THE SPACE PROVIDED. A FORMULA SHEET IS PROVIDED.
Calculators: Calculators may be used. The following models are permitted: Casio fx83, Casio fx85.
Notes: Books, notes or other written material may not be brought into this examination.
PLEASE DO NOT REMOVE THIS PAPER FROM THE EXAMINATION ROOM
© 2019 King's College London
SOLUTIONS
January 2019 7CCMMS61T
1. A company collected data about the number of daily calls received by their customer service in the last year, distinguishing between festivity days and working days. The absolute frequency table of the data is the following:

Number of calls   Festivity   Working days
0                 15          19
1                 54          24
2                 68          15
3                 68          2
4                 43          1
5                 30          0
6                 13          0
7                 10          0
8                 1           0
9                 1           0
10                1           0

Answer
PARTIAL MARKS ARE AWARDED FOR WORKING THROUGHOUT. Syllabus topic: descriptive statistics. Teaching outcome: similar problems have been seen in tutorials.
a. Compute the relative frequency table for these data. You can round off to the third decimal digit.
5 marks
Answer
The total number of observations is $N = 365$, obtained by summing all the values in the table. The relative frequency table is therefore:

Number of calls   Festivity   Working days
0                 0.041       0.052
1                 0.147       0.066
2                 0.186       0.041
3                 0.186       0.005
4                 0.117       0.003
5                 0.082       0
6                 0.035       0
7                 0.027       0
8                 0.0027      0
9                 0.0027      0
10                0.0027      0
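The table can be reproduced with a short R sketch. The object names (`festivity`, `working`, `counts`, `rel_freq`) are illustrative choices, not part of the exam. Note that `round()` rounds to the nearest value, so a few third decimals can differ by one unit from a table obtained by truncation.

```r
# Absolute frequencies from the question (rows: 0 to 10 calls per day).
festivity <- c(15, 54, 68, 68, 43, 30, 13, 10, 1, 1, 1)
working   <- c(19, 24, 15, 2, 1, 0, 0, 0, 0, 0, 0)

counts <- cbind(Festivity = festivity, Working = working)
rownames(counts) <- 0:10

N <- sum(counts)                # total number of observed days
rel_freq <- prop.table(counts)  # every cell divided by the grand total N

print(N)
print(round(rel_freq, 3))
```

`prop.table()` without a margin argument divides by the grand total, which is exactly the joint relative frequency asked for in part a.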
b. Compute the relative marginal frequency of the number of calls.
5 marks
Answer

Number of calls   Relative marginal frequency
0                 0.093
1                 0.213
2                 0.227
3                 0.191
4                 0.12
5                 0.082
6                 0.035
7                 0.027
8                 0.0027
9                 0.0027
10                0.0027
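c. Compute the mean and the standard deviation for the number of calls.
6 marks
Answer
3 marks for each.
$$\bar{x} = 0.093 \cdot 0 + 0.213 \cdot 1 + 0.227 \cdot 2 + 0.191 \cdot 3 + 0.12 \cdot 4 + 0.082 \cdot 5 + 0.035 \cdot 6 + 0.027 \cdot 7 + 0.0027 \cdot 8 + 0.0027 \cdot 9 + 0.0027 \cdot 10 = 2.64$$
$$\sigma^2 = 0.093 (0 - 2.64)^2 + 0.213 (1 - 2.64)^2 + 0.227 (2 - 2.64)^2 + 0.191 (3 - 2.64)^2 + 0.12 (4 - 2.64)^2 + 0.082 (5 - 2.64)^2 + 0.035 (6 - 2.64)^2 + 0.027 (7 - 2.64)^2 + 0.0027 (8 - 2.64)^2 + 0.0027 (9 - 2.64)^2 + 0.0027 (10 - 2.64)^2 = 3.267$$
$$\sigma = \sqrt{3.267} = 1.807$$
For comparison, working directly from the unrounded counts gives $\bar{x} = 955/365 \approx 2.62$, $\sigma^2 \approx 3.28$ and $\sigma \approx 1.81$.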
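As a check on parts b and c, the marginal distribution, mean and standard deviation can be computed from the raw counts in R. This is a minimal sketch with illustrative variable names. Because it uses unrounded frequencies, the results can differ in the second decimal from a hand calculation based on rounded values. The variance uses the frequency-weighted (population) form, as in the solution, not the $n - 1$ form of R's `var()`.

```r
calls     <- 0:10
festivity <- c(15, 54, 68, 68, 43, 30, 13, 10, 1, 1, 1)
working   <- c(19, 24, 15, 2, 1, 0, 0, 0, 0, 0, 0)

total <- festivity + working     # marginal absolute frequencies
p     <- total / sum(total)      # relative marginal frequencies

mean_calls <- sum(calls * p)                     # weighted mean
var_calls  <- sum(p * (calls - mean_calls)^2)    # population variance
sd_calls   <- sqrt(var_calls)

print(round(p, 3))
print(round(c(mean = mean_calls, variance = var_calls, sd = sd_calls), 3))
```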
d. Compute the conditional relative frequency distribution of the number of calls when the day is festive and represent it graphically.
9 marks
Answer
5 marks for the distribution, 4 marks for the graphical representation.
The total number of festive days in the dataset is 304, therefore the conditional relative frequency is

Number of calls   Conditional relative frequency for festive days
0                 15/304
1                 54/304
2                 68/304
3                 68/304
4                 43/304
5                 30/304
6                 13/304
7                 10/304
8                 1/304
9                 1/304
10                1/304

This distribution can be graphically represented as a barplot, see Figure 1.
Figure 1: Barplot of the conditional relative frequency distribution of the number of calls for festive days.
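A barplot of this kind can be drawn in base R. The sketch below is one possible way to produce it; the axis labels and title are illustrative choices, not taken from the original figure.

```r
calls     <- 0:10
festivity <- c(15, 54, 68, 68, 43, 30, 13, 10, 1, 1, 1)

n_festive    <- sum(festivity)         # number of festive days
cond_festive <- festivity / n_festive  # conditional relative frequencies
names(cond_festive) <- calls

barplot(cond_festive,
        xlab = "Number of calls",
        ylab = "Conditional relative frequency",
        main = "Festive days")
```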
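2. Consider the function $f(x)$ defined as:
$$f(x) = \begin{cases} 0 & \text{if } x < 0 \\ a - x & \text{if } 0 \le x \le a \\ 0 & \text{if } x > a \end{cases}$$
Answer
Syllabus topic: distributions and random variables. Teaching outcome: the application is new but similar exercises have been done.
a. Find the value of the parameter $a$ such that $f(x)$ is a valid probability density function.
4 marks
Answer
2 marks for stating the condition, 2 marks for finding the value.
$$\int_0^a f(x)\,dx = 1 \implies \frac{a^2}{2} = 1 \implies a = \sqrt{2}.$$
Let $X$ be a random variable with probability density function $f(x)$.
b. Compute the mean and the variance of $X$.
6 marks
Answer
3 marks for each.
$$E[X] = \int_0^a x(a - x)\,dx = \left[-\frac{x^3}{3} + a\frac{x^2}{2}\right]_0^a = -\frac{a^3}{3} + \frac{a^3}{2} = \frac{a^3}{6} = \frac{\sqrt{2}}{3}.$$
$$E[X^2] = \int_0^a x^2(a - x)\,dx = \left[-\frac{x^4}{4} + a\frac{x^3}{3}\right]_0^a = -\frac{a^4}{4} + \frac{a^4}{3} = \frac{a^4}{12} = \frac{1}{3}.$$
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{1}{3} - \frac{2}{9} = \frac{1}{9}.$$
c. Compute the cumulative distribution function of $X$.
5 marks
Answer
2 marks for the definition, 3 marks for the computation.
$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ \int_0^x f(t)\,dt = \sqrt{2}\,x - \frac{x^2}{2} & \text{if } 0 \le x \le \sqrt{2} \\ 1 & \text{if } x > \sqrt{2} \end{cases}$$
d. What is the probability of $X$ being between 0.25 and 0.5?
5 marks
Answer
2 marks for the expression, 3 marks for the computation.
$$P\left(\tfrac{1}{4} \le X \le \tfrac{1}{2}\right) = F\left(\tfrac{1}{2}\right) - F\left(\tfrac{1}{4}\right) = \frac{\sqrt{2}}{2} - \frac{1}{8} - \frac{\sqrt{2}}{4} + \frac{1}{32} \approx 0.26.$$
e. Compute the median of $X$.
5 marks
Answer
2 marks for the expression, 3 marks for the computation.
$$F(m) = 0.5 \implies \sqrt{2}\,m - \frac{m^2}{2} - \frac{1}{2} = 0 \implies m = \sqrt{2} - 1.$$
The other root of the quadratic, $\sqrt{2} + 1$, lies outside the support $[0, \sqrt{2}]$ and is discarded.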
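The results of Question 2 can be checked numerically in R. The sketch below assumes $a = \sqrt{2}$ and uses `integrate()` and `uniroot()` from base R. The function and variable names are illustrative, and the numerical values are approximations of the closed-form answers.

```r
a <- sqrt(2)

# Density: a - x on [0, a], zero elsewhere.
f <- function(x) ifelse(x >= 0 & x <= a, a - x, 0)

total_mass <- integrate(f, 0, a)$value                     # should be 1
mean_x     <- integrate(function(x) x * f(x), 0, a)$value  # sqrt(2) / 3
ex2        <- integrate(function(x) x^2 * f(x), 0, a)$value
var_x      <- ex2 - mean_x^2                               # 1 / 9

# Closed-form cumulative distribution function from part c.
cdf <- function(x) ifelse(x < 0, 0, ifelse(x > a, 1, a * x - x^2 / 2))

prob     <- cdf(0.5) - cdf(0.25)                           # part d
median_x <- uniroot(function(m) cdf(m) - 0.5, c(0, a), tol = 1e-10)$root

print(round(c(mass = total_mass, mean = mean_x, var = var_x,
              prob = prob, median = median_x), 4))
```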
3. A researcher working for a company that produces electronic components is studying the failure time (i.e. the number of hours the component can work before breaking down) for a new component. Let $T$ be the number of hours of service for the component before it breaks down. Let us assume that $T$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, and let $T_1, \dots, T_{10}$ be a random sample observed from 10 new components. Consider the following two estimators for the mean $\mu$ of $T$:
$$\hat{T}_1 = T_1, \qquad \hat{T}_2 = \frac{1}{10} \sum_{i=1}^{10} T_i.$$
Answer
Syllabus topic: point estimation, test of hypothesis. Teaching outcome: the application is new but similar examples have been seen for parts a. and b. Part c. has been discussed in the lectures.
a. Compute the bias and the variance of the two estimators. Which one is better and why?
5 marks
Answer
Both estimators are unbiased (2 marks), but $\hat{T}_2$ has the smaller variance ($\sigma^2/10 < \sigma^2$) and is to be preferred (3 marks).
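The bias and variance claims in part a follow from linearity of expectation and independence of the sample. A short derivation, writing $\hat{T}_1$ and $\hat{T}_2$ for the two estimators (a notational assumption), could read:

```latex
\begin{align*}
E[\hat{T}_1] &= E[T_1] = \mu, &
\operatorname{Var}(\hat{T}_1) &= \operatorname{Var}(T_1) = \sigma^2, \\
E[\hat{T}_2] &= \frac{1}{10}\sum_{i=1}^{10} E[T_i] = \mu, &
\operatorname{Var}(\hat{T}_2) &= \frac{1}{10^2}\sum_{i=1}^{10} \operatorname{Var}(T_i) = \frac{\sigma^2}{10}.
\end{align*}
```

Both biases are therefore zero, and the comparison reduces to the variances.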
The researcher found that the sample mean and the sample standard deviation computed from the random sample are 610 and 32 hours, respectively.
b. At the 5% level, is there enough evidence that the mean failure time is larger than 600 hours? Carry out the appropriate test of hypothesis.
10 marks
Answer
3 marks for the test, 3 marks for the test statistic, 3 marks for the rejection region, 1 mark for the correct decision.
This is a one-sided test with $H_0: \mu = 600$ and $H_1: \mu > 600$. The test statistic is
$$T = \frac{\bar{X}_{10} - 600}{S/\sqrt{10}} \sim t_9.$$
The rejection region for the test is therefore $\{T > t_{9,0.95}\} = \{T > 1.83\}$. In the observed sample, $T = 0.99$, so there is not enough evidence to reject the null hypothesis and we cannot conclude that the mean failure time is larger than 600 hours.
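The arithmetic of this one-sided t-test can be reproduced from the summary statistics alone. The sketch below uses only base R; the variable names are illustrative, and the p-value is an extra not required by the question.

```r
n    <- 10    # sample size
xbar <- 610   # sample mean (hours)
s    <- 32    # sample standard deviation (hours)
mu0  <- 600   # value of the mean under the null hypothesis

t_stat  <- (xbar - mu0) / (s / sqrt(n))               # observed test statistic
t_crit  <- qt(0.95, df = n - 1)                       # upper 5% point of t with 9 df
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE) # one-sided p-value

reject <- t_stat > t_crit
print(round(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value), 3))
print(reject)
```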
c. It is known that the failure time of previously used types of component had standard deviation 55 hours. Is the variance $\sigma^2$ of the failure time of the new type of component smaller than the variance of failure times of the previous component? Carry out the appropriate test of hypothesis.
10 marks
Answer
4 marks for the test, 3 marks for the test statistic, 2 marks for the rejection region, 1 mark for the correct decision.
This is a one-sided test with $H_0: \sigma^2 = 55^2$ and $H_1: \sigma^2 < 55^2$. The test statistic is
$$V = \frac{9 S^2}{55^2} \sim \chi^2_9$$
and the rejection region is $\{V < \chi^2_{9,0.05}\} = \{V < 3.33\}$. From the data, $V = 3.05 < 3.33$, and we have enough evidence to conclude that the variance of the failure time of the new component is smaller than before.
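The chi-square test for the variance can be checked in the same way. This is a minimal sketch assuming a 5% level, as in the rejection region above; variable names are illustrative.

```r
n      <- 10   # sample size
s      <- 32   # sample standard deviation (hours)
sigma0 <- 55   # standard deviation of the previous component type (hours)

v_stat <- (n - 1) * s^2 / sigma0^2   # observed test statistic
v_crit <- qchisq(0.05, df = n - 1)   # lower 5% point of chi-square with 9 df

reject <- v_stat < v_crit            # small values support a smaller variance
print(round(c(v_stat = v_stat, v_crit = v_crit), 3))
print(reject)
```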
4. A statistician is given the dataset horses.txt, which contains information about the maximum speed (in km/h) of 10 horses measured over 30 days. Four of these horses have been subjected to an experimental training while the other six underwent an existing training procedure. The variable speed in the dataset reports the measured maximum speed, the variable day the day on which the measurement was taken, subject is a factor that indicates the horse (coded from 1 to 10) and experimental is equal to 1 if the horse was undergoing the experimental training, 0 otherwise. Data are analysed with the following R code.

    horsesdata <- read.table("horses.txt")
    model1 <- lm(speed ~ day + day:experimental, data = horsesdata)
    summary(model1)

    Call:
    lm(formula = speed ~ day + day:experimental, data = horsesdata)

    Residuals:
        Min      1Q  Median      3Q     Max
    -4.9359 -1.1365  0.2224  1.4052  4.1729

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)      40.33343    0.25166 160.268  < 2e-16 ***
    day               0.27095    0.01526  17.758  < 2e-16 ***
    day:experimental  0.05866    0.01411   4.158  4.2e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.125 on 297 degrees of freedom
    Multiple R-squared:  0.6017,  Adjusted R-squared:  0.599
    F-statistic: 224.3 on 2 and 297 DF,  p-value: < 2.2e-16
Answer
Syllabus topic: multiple linear regression. Teaching outcome: the application is new but similar examples have been seen in tutorials and practical classes.
a. Write down the mathematical expression of the linear regression model and the estimates of all the parameters. What is the estimated speed of a horse after 10 days of the experimental training?
7 marks
Answer
3 marks for the model, 2 marks for the parameter estimates and 2 marks for the prediction.
Let $Y_i$ be the speed and $X_i$ the day of the $i$th observation, and let $E_i$ be 1 if the $i$th observation corresponds to a horse with the experimental treatment and 0 otherwise. The mathematical expression of the model is
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 E_i X_i + \epsilon_i,$$
with $\epsilon_i \sim N(0, \sigma^2)$ independently. The estimates of the parameters from the R output are $\hat{\beta}_0 = 40.33343$, $\hat{\beta}_1 = 0.27095$, $\hat{\beta}_2 = 0.05866$ and $\hat{\sigma}^2 = 2.125^2 = 4.515625$. The estimate of the speed of the horse after 10 days of experimental treatment is
$$\hat{Y} = 40.33343 + 0.27095 \cdot 10 + 0.05866 \cdot 10 = 43.62953.$$
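The prediction can be reproduced directly from the coefficient table. The helper `predict_speed()` below is a hypothetical illustration that hard-codes the printed estimates; with the fitted model object available, `predict()` would normally be used instead.

```r
# Estimates copied from the printed summary of model1.
beta <- c(intercept = 40.33343, day = 0.27095, day_experimental = 0.05866)

# Fitted speed for a given day and training group (experimental = 1 or 0).
predict_speed <- function(day, experimental) {
  unname(beta["intercept"] +
           beta["day"] * day +
           beta["day_experimental"] * day * experimental)
}

speed_experimental <- predict_speed(10, 1)  # horse on experimental training
speed_existing     <- predict_speed(10, 0)  # horse on existing training
sigma2_hat         <- 2.125^2               # squared residual standard error

print(c(speed_experimental, speed_existing, sigma2_hat))
```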
b. Does the analysis support the belief that the new experimental training is better than the existing training procedure? Carry out the appropriate test of hypothesis to justify your conclusion.
7 marks
Answer
2 marks for choosing the test, 3 marks for the test description and 2 marks for the correct decision.
To answer the question we need to carry out the test $H_0: \beta_2 = 0$ vs $H_1: \beta_2 > 0$. At level $\alpha$, we can reject the null hypothesis if
$$\frac{\hat{\beta}_2}{se(\hat{\beta}_2)} > t_{1-\alpha,297} \approx z_{1-\alpha}.$$
From the R output, the value of the test statistic is 4.158, which is larger than the quantile of the standard normal distribution for all usual levels $\alpha$. Therefore, we can conclude that the data support the belief that the new training procedure is better than the old one.
c. Provide a 95% confidence interval for the daily increment in speed for horses that undergo the existing (not experimental) training procedure.
6 marks
Answer
3 marks for the expression and 3 marks for the computation. From the R output,
$$\hat{\beta}_1 \pm se(\hat{\beta}_1)\, t_{0.975,297} \approx \hat{\beta}_1 \pm se(\hat{\beta}_1)\, z_{0.975} = 0.27095 \pm 0.01526 \cdot 1.96 = [0.2410404;\ 0.3008596].$$
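Both the one-sided test on the interaction coefficient and the confidence interval for the daily increment can be recomputed from the coefficient table in the R output. The sketch below hard-codes the printed estimates; the variable names are illustrative. It also shows the interval based on the exact t quantile, which is slightly wider than the normal approximation.

```r
df <- 297                 # residual degrees of freedom

# Test on the day:experimental coefficient.
t2          <- 4.158      # t value from the R output
crit_5pct   <- qt(0.95, df)                    # one-sided 5% critical value
p_one_sided <- pt(t2, df, lower.tail = FALSE)  # one-sided p-value

# 95% confidence interval for the day coefficient.
b1   <- 0.27095
se1  <- 0.01526
ci_z <- b1 + c(-1, 1) * 1.96 * se1             # normal approximation
ci_t <- b1 + c(-1, 1) * qt(0.975, df) * se1    # exact t quantile

print(c(crit_5pct = crit_5pct, p_one_sided = p_one_sided))
print(rbind(normal = ci_z, t = ci_t))
```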
d. The statistician would like to modify the model to take into account the fact that different horses can have different maximum speeds before the beginning of the training, at day 0. Write down the appropriate R code to fit this new model.
5 marks
Answer
2 marks for including the new variable, 3 marks for the correct code.

    model2 <- lm(speed ~ day + day:experimental + subject, data = horsesdata)
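To illustrate what this model estimates, the sketch below fits it to simulated data. Everything here is hypothetical: the data frame is generated, not read from horses.txt, and the true coefficients are arbitrary. It also assumes that the horse codes are stored as integers, in which case `subject` has to be converted to a factor so that each horse gets its own intercept rather than a single slope.

```r
set.seed(1)

# Simulated stand-in for horses.txt: 10 horses, 30 days each.
horsesdata <- expand.grid(day = 1:30, subject = 1:10)
horsesdata$experimental <- as.integer(horsesdata$subject <= 4)

baseline <- rnorm(10, mean = 40, sd = 2)   # horse-specific speed at day 0
horsesdata$speed <- baseline[horsesdata$subject] +
  0.27 * horsesdata$day +
  0.06 * horsesdata$day * horsesdata$experimental +
  rnorm(nrow(horsesdata), sd = 1)

# Integer codes 1 to 10 must be treated as categories, not as a number.
horsesdata$subject <- factor(horsesdata$subject)

model2 <- lm(speed ~ day + day:experimental + subject, data = horsesdata)

# Intercept, day, nine subject contrasts and the interaction term.
n_coef <- length(coef(model2))
print(n_coef)
print(df.residual(model2))
```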