- Consider the prostate cancer dataset available on eLearning as prostate cancer.csv. It contains data on 97 men with advanced prostate cancer; the variables are described in Figure 1. We would like to understand how PSA level is related to the other predictors in the dataset. Note that vesinv is a qualitative variable, while gleason may be treated as a quantitative variable.
Build a reasonably good linear model for these data, taking PSA level as the response variable. Carefully justify every choice you make in building the model, and be sure to verify the model assumptions. If a transformation of the response is necessary, try the natural log transformation. Use the final model to predict the PSA level for a patient whose quantitative predictors are set to their sample means and whose qualitative predictors are set to their most frequent category.
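The prediction step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `reference_patient` and `predict_psa` are hypothetical, the coefficient dictionary stands in for whatever final model you fit, and the back-transform assumes the response was modelled as log(PSA). Note that exponentiating a prediction made on the log scale gives an estimate closer to the median than to the mean of PSA.

```python
import math
from statistics import mean, mode

# Column names follow Figure 1; gleason is treated as quantitative.
QUANTITATIVE = ["cancervol", "weight", "age", "benpros", "capspen", "gleason"]
QUALITATIVE = ["vesinv"]


def reference_patient(rows):
    """Build the prediction point: sample mean for each quantitative
    predictor, most frequent category for each qualitative predictor.

    `rows` is a list of dicts, one per patient (hypothetical layout).
    """
    patient = {name: mean([float(r[name]) for r in rows]) for name in QUANTITATIVE}
    for name in QUALITATIVE:
        patient[name] = mode([r[name] for r in rows])
    return patient


def predict_psa(coefs, patient, log_response=True):
    """Evaluate a fitted linear model at `patient`.

    `coefs` maps "intercept" and predictor names to fitted coefficients
    (placeholders here; they must come from your own final model).
    Predictors absent from `coefs` are treated as dropped from the model.
    vesinv is assumed to be coded 0/1, so it enters as a dummy variable.
    """
    eta = coefs["intercept"]
    for name, beta in coefs.items():
        if name != "intercept":
            eta += beta * float(patient[name])
    # Undo the natural log transformation if the model was fit on log(psa).
    return math.exp(eta) if log_response else eta
```

Only predictors that survive model selection need to appear in `coefs`, so the same function works regardless of which variables the final model keeps.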
| Header | Name | Description |
|---|---|---|
| subject | ID | 1 to 97 |
| psa | PSA level | Serum prostate-specific antigen level (mg/ml) |
| cancervol | Cancer volume | Estimate of prostate cancer volume (cc) |
| weight | Weight | Prostate weight (gm) |
| age | Age | Age of patient (years) |
| benpros | Benign prostatic hyperplasia | Amount of benign prostatic hyperplasia (cm²) |
| vesinv | Seminal vesicle invasion | Presence (1) or absence (0) of seminal vesicle invasion |
| capspen | Capsular penetration | Degree of capsular penetration (cm) |
| gleason | Gleason score | Pathologically determined grade of disease (6, 7 or 8) |
Figure 1: List of variables in the prostate cancer data
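As a starting point, the variables listed in Figure 1 could be read into typed records along these lines. This is a minimal sketch using only the standard library; it assumes the CSV has a header row whose column names match the "Header" column of Figure 1 exactly, which should be checked against the actual file. The function name `read_prostate` is hypothetical.

```python
import csv

# Columns read as numbers; gleason is treated as quantitative.
NUMERIC = ["psa", "cancervol", "weight", "age", "benpros", "capspen", "gleason"]


def read_prostate(handle):
    """Parse an open CSV file (or any iterable of text lines) into a
    list of dicts, one per patient, with numeric values converted."""
    rows = []
    for record in csv.DictReader(handle):
        row = {name: float(record[name]) for name in NUMERIC}
        # vesinv is qualitative: 0 = absence, 1 = presence of invasion.
        row["vesinv"] = int(record["vesinv"])
        rows.append(row)
    return rows
```

The subject ID is deliberately left out of the parsed records because it is an identifier, not a predictor.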