MATH6143 Survival Models Data Analysis Project
This assignment is worth 30% of the overall mark for the module.
Completed work should be uploaded (as a PDF) to Blackboard before 12.00 on Monday 6th
January 2020.
There is a strict page limit of eight sides of A4, which is easily sufficient to receive full credit. Text must not be smaller than 12pt font and margins must be no smaller than 2cm. It is permissible to include R plots directly in your submission. Other analysis done using R must be integrated into your text or presented in properly formatted tables; cutting and pasting large verbatim sections of output from R is not acceptable.
These questions involve the modelling of real data. There is not necessarily a single correct answer. Careful explanation and clear presentation are important.
All coursework must be carried out and written up independently. Data files can be downloaded from Blackboard.
1. (5 marks) The data below represent times (in days) for the CD4 count to relapse to a prespecified level for HIV+ patients under two treatment regimes (a two-drug regime and a triple-drug therapy). A + after the survival time indicates a right-censored observation.
Two-drug treatment (AZT+ddC)
85 32 38+ 45 4+ 84 49 180+ 87 75 102 39 12 11 80 35 6
Triple-drug treatment (AZT+ddC+saquinavir) (A + after the survival time indicates a censored observation)
22 2 48 85 160 238 56+ 94+ 51+ 12 171 80 180 4 90 180+ 3
(a) Compare survival in the two treatment groups using Kaplan-Meier estimates of the survivor function.
(b) For each group, present a 95% confidence interval for the probability that relapse occurs later than 3 months (92 days) after treatment.
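For orientation, the Kaplan-Meier estimate and a pointwise confidence interval at a fixed time can be sketched as below. This is a minimal illustration in Python rather than R, and it is not a model answer: the function names and the toy data are hypothetical, and the interval shown is the plain (untransformed) Greenwood interval, which is only one of several possible choices.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t), se)] at each distinct event time.

    times  : observed times
    events : 1 if the event was observed, 0 if right-censored
    se is the Greenwood standard error of S(t).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(1 for tt, e in data if tt == t and e == 1)  # events at t
        m = sum(1 for tt, _ in data if tt == t)             # all subjects leaving at t
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
            out.append((t, surv, surv * math.sqrt(greenwood_sum)))
        n_at_risk -= m
        i += m
    return out

def survival_at(km, t0):
    """Value of the step function S(t) and its standard error at time t0."""
    s, se = 1.0, 0.0
    for t, surv, err in km:
        if t <= t0:
            s, se = surv, err
    return s, se

def plain_interval(s, se, z=1.96):
    """Untransformed 95% interval, clipped to [0, 1]."""
    return max(0.0, s - z * se), min(1.0, s + z * se)

# Hypothetical toy sample (NOT the CD4 data): 0 marks a censored time.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 0]
km = kaplan_meier(toy_times, toy_events)
s, se = survival_at(km, 3.5)
lower, upper = plain_interval(s, se)
```

Intervals built on a transformed scale (for example log or log-log) avoid the clipping step and will generally differ from the plain interval, so the choice of interval type should be stated alongside the result.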
2. (7 marks) The data in file larynx.txt represent survival times (in years) for seventy males diagnosed with cancer of the larynx. Also recorded are two potential explanatory variables: stage of disease at diagnosis (a factor with four levels) and age at diagnosis, together with a variable indicating whether death was observed during the study.
(a) Investigate the dependence of survival on the explanatory variables using Cox proportional hazards models. Carefully present your chosen model, giving confidence intervals for any model parameters.
(b) By plotting the baseline survival function estimate (or baseline cumulative hazard estimate) on a suitable scale, assess whether or not you think that a Weibull accelerated failure model would be appropriate for these data.
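One commonly used scale for part (b) follows from the Weibull survivor function S(t) = exp(-(λt)^γ), for which log(-log S(t)) = γ log λ + γ log t, a straight line in log t with slope γ. The sketch below (Python, for illustration only; the helper names and the numerical values are hypothetical) transforms a set of (t, S(t)) pairs to that scale and fits a least-squares line, so that departures from linearity can be inspected.

```python
import math

def weibull_check_points(times, surv):
    """Map (t, S(t)) pairs to (log t, log(-log S(t))).

    Points with t <= 0 or S outside (0, 1) are skipped because the
    transformation is undefined there.
    """
    return [(math.log(t), math.log(-math.log(s)))
            for t, s in zip(times, surv) if t > 0 and 0.0 < s < 1.0]

def least_squares_line(points):
    """Ordinary least-squares slope and intercept through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical check: exact Weibull survivor values with shape 1.5 and rate 0.2.
shape, rate = 1.5, 0.2
ts = [0.5, 1.0, 2.0, 4.0, 8.0]
ss = [math.exp(-((rate * t) ** shape)) for t in ts]
slope, intercept = least_squares_line(weibull_check_points(ts, ss))
```

With exact Weibull values the fitted slope recovers the shape parameter; with an estimated baseline the points will scatter, and the judgement is whether the scatter is consistent with a straight line.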
3. (8 marks) The data in file duck.txt represent survival times (in days) after radio-tagging for fifty female black ducks. Also recorded are an indicator of whether death was observed (1=observed, 0=censored) and three potential explanatory variables (age in years, weight in grams and length in cm).
(a) Investigate the dependence of survival on the explanatory variables using Weibull regression models.
(b) Is the Weibull distribution a reasonable model?
(c) Investigate the dependence of survival on the explanatory variables using Cox proportional hazards models.
(d) Plot (on the same figure) the estimated survivor function for a one year old duck with weight 1200 and length 270, under your preferred model in each of parts (a) and (c).
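For part (d), the two survivor functions have different forms. A minimal sketch, assuming the Weibull model is written in the log-linear accelerated failure form log T = η + σW (W standard minimum extreme value, η the linear predictor, σ the scale), and the Cox model in the usual form S(t | x) = S0(t)^exp(η), is given below. It is in Python for illustration only; every coefficient shown is a made-up placeholder, not a fitted value, and the baseline S0 must correspond to the same covariate coding as η.

```python
import math

def weibull_aft_survivor(t, linear_predictor, scale):
    """S(t | x) when log T = linear_predictor + scale * W (assumed parametrisation)."""
    return math.exp(-((t / math.exp(linear_predictor)) ** (1.0 / scale)))

def cox_survivor(baseline_surv, linear_predictor):
    """S(t | x) = S0(t) ** exp(linear_predictor) under proportional hazards."""
    return baseline_surv ** math.exp(linear_predictor)

# Hypothetical coefficients (placeholders only, NOT estimates from duck.txt).
b0, b_age, b_weight, b_length = 2.0, 0.3, 0.001, 0.004
sigma = 0.8
eta = b0 + b_age * 1 + b_weight * 1200 + b_length * 270

# Survivor curve on a grid of days for the duck described in part (d).
grid = [5 * k for k in range(1, 13)]
weibull_curve = [weibull_aft_survivor(t, eta, sigma) for t in grid]
```

Evaluating both functions on a common time grid gives two curves that can be drawn on the same axes.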
4. (10 marks) The data in file mortality.csv represent numbers of deaths and central exposed to risk for male and female members of a large pension scheme, for age (at last birthday) x = 60,61,
(a) Calculate the crude central mortality rates (mx) for male and female pensioners, and compare log mx for males and females, by plotting both sets of values on the same axes.
(b) Calculate the corresponding qx values under both (i) constant force of mortality within each year of age, and (ii) uniform distribution of deaths within each year of age. Hence, calculate a life table (with l60 = 10000) for males and for females, under both assumptions. [In your report, it is sufficient to give the values of lx at 5-year intervals, that is l60, l65, l70, . . .]
(c) Calculate the complete and curtate life expectancies for males and females at age 60.
(d) For both males and females, use a formal statistical test to compare the death rates in this insured population with the whole population of England and Wales (you will need to download ELT17 from the ONS website).
(e) For the male population only, use a Gompertz log-linear model to produce a set of graduated central mortality rates from the crude mortality data. Compare crude and graduated rates by plotting both sets of log mx values on the same axes.
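The arithmetic behind parts (a) to (c) can be sketched as follows, using the standard relations qx = 1 - exp(-mx) under a constant force of mortality and qx = mx / (1 + mx/2) under a uniform distribution of deaths, together with the recursion l(x+1) = lx (1 - qx). This is an illustrative Python sketch; the function names and the three rows of numbers are hypothetical and are not taken from mortality.csv.

```python
import math

def q_constant_force(m):
    """q_x when the force of mortality is constant over the year of age."""
    return 1.0 - math.exp(-m)

def q_udd(m):
    """q_x when deaths are uniformly distributed over the year of age."""
    return m / (1.0 + 0.5 * m)

def life_table(qs, radix=10000.0):
    """l_x values starting from the radix, one entry more than qs."""
    lx = [radix]
    for q in qs:
        lx.append(lx[-1] * (1.0 - q))
    return lx

def curtate_expectation(lx):
    """Curtate expectation at the first age, truncated at the end of the table."""
    return sum(lx[1:]) / lx[0]

# Hypothetical deaths and central exposed to risk for three consecutive ages.
deaths = [12, 15, 19]
exposure = [1000.0, 950.0, 900.0]
m = [d / e for d, e in zip(deaths, exposure)]   # crude central rates m_x

table_cf = life_table([q_constant_force(v) for v in m])
table_udd = life_table([q_udd(v) for v in m])
```

Because the data stop at a finite age, any expectation computed this way is truncated at the last tabulated age unless an assumption is made about mortality beyond it; that assumption should be stated in the report.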