Bayesian Methods Coursework
Dr. Simon Taylor
Deadline: 12noon Friday 21 April 2017 (Uploaded to MOODLE by 5pm)
The coursework for Bayesian Methods is in two parts and comprises two short reports. The first
report should be about six sides of single spaced A4 including figures and about four sides for the second
part. Use OpenBUGS for your analysis. The data and additional information are available via moodle.
In addition to your report, your final model for each part should be submitted via moodle in separate
files titled exam model.odc and birth weight model.odc. The model can be in either code or Doodle
formats, but do not include the data or initial values.
The allocation of marks is as follows:

               Part 1              Part 2
               Exam Performance    Low Birth Weight    Total
Introduction   5%                  5%                  10%
Methods        25%                 15%                 40%
Results        15%                 10%                 25%
Conclusion     5%                  10%                 15%
Total          50%                 45%                 95%
The remaining 5% is awarded based on the presentation of the report.
1 Exam Performance
The file exam data.odc contains the (normalised) exam performance of 4,059 students from 65 schools
in Inner London. The data are taken from:
Goldstein, H., Rasbash, J., et al. (1993). A multilevel analysis of school examination results. Oxford
Review of Education, 19, 425-433.
The variables in the data are:
school          School ID (1-65) to which the pupil belongs.
examscore       The normalised exam score for the pupil.
lrtest          The pupil's score in the LR test.
gender          Gender of the pupil (0=boy, 1=girl).
schooltype      School gender (1=mixed, 2=all boys school, 3=all girls school).
intakescore     The school's mean intake score.
VR              Pupil's Verbal Reasoning (VR) score band at intake (1=bottom 25%,
                2=middle 50%, 3=top 25%).
studentintake   Pupil's band intake score (1=bottom 25%, 2=middle 50%, 3=top 25%).
An initial model has been proposed for the data in exam initial model.odc. The model is a hierarchical
(multi-level) model with the examscore $y_i$ of pupil $i$ depending upon the school $s_i$ and the LR test score
$x_i$. Specifically:
$$y_i \sim N(\alpha_{s_i} + \beta x_i,\ 1/\tau) \quad (i = 1, 2, \ldots, 4059)$$
$$\alpha_j \sim N(\mu,\ 1/\phi) \quad (j = 1, 2, \ldots, 65)$$
According to the range of valid values, parameters $\beta$ and $\mu$ are assigned normal prior distributions, whilst
the precisions $\tau$ and $\phi$ are given gamma priors. The school-specific intercepts, $\alpha_j$, are defined according
to the hierarchical structure, but these are unknown variables and will need to be initialised when using
OpenBUGS.
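To make the two-level structure concrete, the following is a minimal Python sketch that simulates synthetic data from a model of this form. It assumes the notation alpha (school intercepts), beta (LR test slope), mu (mean of the school intercepts) and tau, phi (pupil-level and school-level precisions). The function name, the number of pupils per school and every parameter value are illustrative assumptions, not estimates from the exam data. OpenBUGS parameterises the normal distribution by its precision, so each standard deviation is one over the square root of the precision.

```python
import random
import statistics


def simulate_exam_scores(n_schools=65, pupils_per_school=10, mu=0.0, phi=4.0,
                         beta=0.5, tau=2.0, seed=1):
    """Draw synthetic data from a two-level (school, pupil) normal model.

    All default values are illustrative assumptions. phi and tau are
    precisions, so the matching standard deviations are 1/sqrt(precision).
    """
    rng = random.Random(seed)
    sd_school = phi ** -0.5  # between-school standard deviation
    sd_pupil = tau ** -0.5   # within-school (residual) standard deviation

    # Level 2: one intercept per school, alpha_j ~ N(mu, 1/phi)
    alpha = [rng.gauss(mu, sd_school) for _ in range(n_schools)]

    rows = []
    for j in range(n_schools):
        for _ in range(pupils_per_school):
            # Stand-in for a normalised LR test score
            x = rng.gauss(0.0, 1.0)
            # Level 1: y_i ~ N(alpha_{s_i} + beta * x_i, 1/tau)
            y = rng.gauss(alpha[j] + beta * x, sd_pupil)
            rows.append((j + 1, x, y))  # school IDs run from 1
    return alpha, rows


alpha, rows = simulate_exam_scores()
print("schools:", len(alpha), "pupils:", len(rows))
print("mean and sd of simulated school intercepts:",
      round(statistics.mean(alpha), 3), round(statistics.stdev(alpha), 3))
```

Pupils in the same school share one intercept, which is what induces correlation between their scores. The school-level precision controls how far the intercepts spread around their common mean.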
Note: Both code and Doodle versions of this model are given in a file separate from the data. Since the data set
is large, it is recommended that you keep the two files separate. Loading the model and the data is then
easier: click on the file, choose Edit > Select All and then click the relevant button on
the Specification Tool.
Use these data and OpenBUGS to answer the following:
1. If we are interested in predicting examscores:
(a) Fit the initial hierarchical model to the data. Justify your choice of prior distribution and
discuss the posterior estimates.
(b) Develop a sequence of models in a stepwise variable selection procedure for describing
examscores using the provided covariates. There are many possibilities and you are not expected
to consider all of these. However, the hierarchical model given above should feature in your
analysis, although you should feel free to change the prior distributions as appropriate. Choose
an optimal model that best describes the examscore against these covariates.
2. Use a node on the best model to find calibrated values for the predicted examscore for:
(a) A female pupil from school 30 with an LR test score of 0.5, a mid-band VR score and a mid-band
student intake score. She is attending an all-girls school that has an average intake score of
0.2687752.
(b) A male pupil from school 47 with an LR test score of -0.35, a low-band VR score and a low-band
student intake score. He is attending a mixed gender school that has an average intake score
of -0.139923.
These values should not be calculated manually from the best model, but directly evaluated
within OpenBUGS by use of a node for the unknown values.
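The idea behind a prediction node can be sketched outside OpenBUGS as follows. For each posterior draw of the parameters, one new examscore is drawn from the likelihood, and the collection of draws is then summarised. This Python sketch assumes only the initial model (school intercept plus LR test slope). The dictionary layout of the draws, the function names and the toy parameter draws are all hypothetical placeholders, not output from the coursework data.

```python
import random


def predictive_draws(posterior_draws, x_new, school, seed=0):
    """Return one predicted examscore per posterior draw.

    posterior_draws is a list of dicts with keys 'alpha' (dict mapping
    school ID to intercept), 'beta' (slope) and 'tau' (precision).
    This layout is an assumption made for illustration.
    """
    rng = random.Random(seed)
    out = []
    for d in posterior_draws:
        mean = d["alpha"][school] + d["beta"] * x_new
        sd = d["tau"] ** -0.5  # precision -> standard deviation
        out.append(rng.gauss(mean, sd))
    return out


def summarise(values, level=0.95):
    """Mean and equal-tailed interval from a list of draws."""
    s = sorted(values)
    n = len(s)
    lo = s[int((1 - level) / 2 * n)]
    hi = s[min(n - 1, int((1 + level) / 2 * n))]
    return sum(s) / n, lo, hi


# Toy stand-ins for MCMC output: made-up numbers, NOT fitted values.
rng = random.Random(42)
toy_draws = [
    {"alpha": {30: rng.gauss(0.2, 0.05)},
     "beta": rng.gauss(0.55, 0.02),
     "tau": rng.uniform(1.6, 2.0)}
    for _ in range(2000)
]

pred = predictive_draws(toy_draws, x_new=0.5, school=30)
mean, lo, hi = summarise(pred)
print("predictive mean:", round(mean, 3))
print("95% interval:", round(lo, 3), round(hi, 3))
```

Because each draw uses a different parameter vector, the resulting interval reflects both parameter uncertainty and pupil-level variation. A hand calculation that plugs in posterior means would not capture both.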
2 Low Birth Weight
The file birth weight data.odc contains data from a study to identify risk factors associated with giving
birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59
of whom had low birth weight babies. The data are taken from:
Hosmer Jr, D.W. and Lemeshow, S. (2000). Applied logistic regression, 2nd ed. John Wiley & Sons.
Four variables that the doctor thought to be of importance in predicting whether a baby has a low birth
weight were the mother's age, weight of the subject at her last menstrual period, race and the number
of physician visits during the first trimester of pregnancy.
The data file birth weight data.odc contains:
LOW    Low birth weight indicator (0=Birth Weight ≥ 2500 g, 1=Birth Weight < 2500 g).
AGE    Age of the mother in years.
LWT    Weight in pounds (lb) at the last menstrual period.
RACE   Race (1=White, 2=Black, 3=Other).
FTV    Number of physician visits during the first trimester.
The dependence of low birth weight outcome $y_i$ for subject $i$ on the explanatory variables $\{x_{1,i}, \ldots, x_{p,i}\}$
is described by the binary regression model:
$$y_i \sim \text{Bernoulli}(\pi_i) \quad (i = 1, 2, \ldots, 189)$$
$$\text{logit}(\pi_i) = \beta_0 + \beta_1 x_{1,i} + \ldots + \beta_p x_{p,i}.$$
The file birth weight null.odc defines the null binary regression model. Note the logit link function
used in the code.
1. The doctor suggests using normal prior distributions with mean 0 and precision 0.1 for all of the
unknown coefficients. Comment on the doctor's choice of prior distributions using the information
provided above or any other appropriate resource.
2. Using the doctor's suggested prior distributions, develop the null model to include all four covariates.
You might need to consider transformations of the covariates to assist with the performance of the
Gibbs sampler, in particular, using standardised mother's age and weight (e.g. the age minus the
mean age). Any changes in covariates should ideally be performed within OpenBUGS.
3. Considering each explanatory variable in turn, use the samples drawn from the posterior to calculate
the Bayes Factor for the hypothesis test and evaluate the validity of the doctor's statement. Clearly
state the hypotheses that you are investigating.
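One way a Bayes factor can be estimated from posterior samples is sketched below in Python. The sketch assumes a one-sided comparison for a single coefficient, H1: beta > 0 against H0: beta <= 0, which is only one possible reading of the hypothesis test. The Bayes factor is the posterior odds divided by the prior odds, and a normal prior centred on zero is symmetric, so the prior odds equal 1. The function name and the toy posterior draws are hypothetical. The block also converts the suggested precision of 0.1 into a standard deviation.

```python
import math
import random


def bayes_factor_positive(samples, prior_prob_positive=0.5):
    """Bayes factor for H1: beta > 0 against H0: beta <= 0.

    BF = posterior odds / prior odds. A prior symmetric about zero gives
    prior_prob_positive = 0.5, i.e. prior odds of 1.
    """
    n_pos = sum(1 for b in samples if b > 0)
    n_neg = len(samples) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("all samples lie on one side of zero; "
                         "the odds cannot be estimated from these draws")
    posterior_odds = n_pos / n_neg
    prior_odds = prior_prob_positive / (1 - prior_prob_positive)
    return posterior_odds / prior_odds


# A precision of 0.1 corresponds to variance 10, so the prior sd is 1/sqrt(0.1).
prior_sd = 1 / math.sqrt(0.1)
print("prior sd implied by precision 0.1:", round(prior_sd, 3))

# Toy stand-in for posterior draws of one coefficient: made-up numbers,
# NOT results from the birth weight data.
rng = random.Random(7)
toy_beta = [rng.gauss(-0.4, 0.3) for _ in range(5000)]
bf = bayes_factor_positive(toy_beta)
print("estimated Bayes factor (H1: beta > 0):", round(bf, 3))
```

A value below 1 favours H0 and a value above 1 favours H1. The estimate becomes unstable when very few samples fall on one side of zero, which is why the function refuses to return a value in that case.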