STAT 614 Week 13: Non-nested Comparisons
Class Lab and Homework
Richard Ressler 2019-11-20
Instructions: Create your solutions in an R Markdown document and knit it directly to PDF (not via HTML). Upload both the PDF file and the .Rmd document to Blackboard. Points will be deducted for a missing document or for a PDF that was not knitted directly from R Markdown.
Learning Outcomes
Choose Between Non-nested Models
I. Case Study: Cereal Data
You work for a nonprofit organization that specializes in nutrition and health. Your boss wants you to explore the cereals dataset from the plspm package, reproduced in the cereals.csv file in the data folder. The variables include:
mfr: Manufacturer of cereal
type: cold or hot
calories: calories per serving
protein: grams of protein
fat: grams of fat
sodium: milligrams of sodium
fiber: grams of dietary fiber
carbo: grams of complex carbohydrates
sugars: grams of sugars
potass: milligrams of potassium
vitamins: vitamins and minerals (0, 25, or 100), indicating the typical percentage of FDA recommended
shelf: display shelf (1, 2, or 3, counting from the floor)
weight: weight in ounces of one serving
cups: number of cups in one serving
rating: a rating of the cereals
You are primarily interested in understanding how to predict ratings for new cereals.
1. Load the data. Review the variables. What are the observational units? What is the response variable?
2. Convert any variables that you think should be factors into factors.
3. Conduct EDA to screen the variables and identify candidates for explaining ratings.
Step-wise regression: start with a complicated model that includes many candidate explanatory variables.
Look at the p-values (for testing H0: the coefficient is 0).
Drop the one with the largest p-value.
Continue until all p-values are less than some threshold (usually 0.05).
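The elimination loop described above can be sketched as follows. This illustration is not the required R workflow. It uses Python with NumPy on simulated data rather than the cereal data. The helper names (`design`, `slope_pvalues`, `backward_eliminate`) are hypothetical. The p-values come from a normal approximation, whereas R's `summary(lm(...))` uses the t distribution.

```python
import math

import numpy as np


def design(X, cols):
    """Intercept column followed by the selected predictor columns."""
    n = X.shape[0]
    return np.column_stack([np.ones(n)] + [X[:, j] for j in cols])


def slope_pvalues(X, y, cols):
    """Two-sided p-values for the slopes of an OLS fit on the given columns.

    Assumption: a normal approximation to the t distribution, which is close
    when n is large; R's summary(lm(...)) uses the exact t distribution.
    """
    Xd = design(X, cols)
    n, k = Xd.shape
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # least-squares estimates
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - k)                  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    t = beta / se
    p = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t]
    return p[1:]                                      # drop the intercept


def backward_eliminate(X, y, names, alpha=0.05):
    """Refit, drop the slope with the largest p-value, stop when all < alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        p = slope_pvalues(X, y, keep)
        worst = max(range(len(keep)), key=lambda i: p[i])
        if p[worst] < alpha:
            break                                     # every remaining slope passes
        keep.pop(worst)
    return [names[j] for j in keep]


# Simulated data (not the cereal data): only x1 and x2 affect the response.
rng = np.random.default_rng(614)
n = 200
X = rng.normal(size=(n, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
names = ["x1", "x2", "x3", "x4"]
selected = backward_eliminate(X, y, names)
print(selected)
```

Each pass refits the model, and a variable's p-value can change after another variable is dropped. That is why the procedure removes only one variable per step.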
4. Take a step
5. Take another step.
6. Take another step.
7. Replot
8. Use the step() function to do this automatically. What do you get?
9. Try again with a reduced model.
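For comparison with item 8, here is a simplified sketch of the search that `step()` performs for a linear model. It scores every one-term deletion by AIC, makes the deletion that lowers AIC the most, and stops when no deletion lowers it. The sketch makes these assumptions:

- It uses Python with NumPy on simulated data, with hypothetical helper names.
- It computes AIC on the scale R's `extractAIC()` uses for `lm` fits, n*log(RSS/n) + 2k, where k counts the intercept and slopes.
- It only deletes terms. By contrast, `step()` with its default `direction = "both"` may also add back a term it dropped earlier.

```python
import numpy as np


def design(X, cols):
    """Intercept column followed by the selected predictor columns."""
    n = X.shape[0]
    return np.column_stack([np.ones(n)] + [X[:, j] for j in cols])


def extract_aic(X, y, cols):
    """n*log(RSS/n) + 2k, k = intercept + slopes (R's extractAIC scale for lm)."""
    Xd = design(X, cols)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(np.sum((y - Xd @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n) + 2 * Xd.shape[1]


def step_backward(X, y, names):
    """Greedy backward search: make the one-term deletion that lowers AIC most."""
    keep = list(range(X.shape[1]))
    current = extract_aic(X, y, keep)
    while keep:
        # AIC of each candidate model with exactly one current term removed
        trials = [(extract_aic(X, y, [c for c in keep if c != j]), j) for j in keep]
        best_aic, best_j = min(trials)
        if best_aic >= current:
            break                       # no single deletion improves AIC
        keep.remove(best_j)
        current = best_aic
    return [names[j] for j in keep], current


# Simulated data (not the cereal data): only x1 and x3 affect the response.
rng = np.random.default_rng(13)
n = 150
X = rng.normal(size=(n, 5))
y = 10 + 4 * X[:, 0] + 2 * X[:, 2] + rng.normal(size=n)
names = ["x1", "x2", "x3", "x4", "x5"]
selected, aic = step_backward(X, y, names)
print(selected, round(float(aic), 2))
```

For large n, deleting one term lowers AIC roughly when the absolute value of its t-statistic is below √2 (a p-value of about 0.16). That is a looser bar than 0.05, so the AIC search often keeps more terms than p-value elimination.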
II. Homework
1. Continue to iterate and try different models until you have TWO that are similar in terms of adjusted R².
2. Compare these models using BIC (Bayesian Information Criterion) and AIC (Akaike Information Criterion).
3. Calculate Mallows' Cp for the models that can be built from a subset of the original variables.
Remember: an all-subsets search is only feasible with fewer than about p = 10 explanatory variables, since there are 2^p possible models.
4. Plot Cp on the y-axis and the number of parameters on the x-axis.
5. Select a model. Provide a rationale.
6. Summarize your results. Include the final model with its coefficients, p-values, and confidence intervals. Pick the variable with the largest effect and explain its effect on the ratings.
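Items 1 through 5 can all be scored in a single pass over the 2^p subsets. The sketch below assumes Python with NumPy, simulated data, and hypothetical helper names. Here k counts the intercept plus the slopes, and the criteria are:

- adjusted R² = 1 - [RSS/(n - k)] / [TSS/(n - 1)]
- AIC = n*log(RSS/n) + 2k
- BIC = n*log(RSS/n) + log(n)*k
- Mallows' Cp = RSS/σ̂² - n + 2k, with σ̂² estimated from the full model

The AIC and BIC formulas follow the scale of R's `extractAIC()`. They differ from `AIC()` and `BIC()` only by a constant that cancels when comparing models fit to the same data.

```python
import itertools
import math

import numpy as np


def fit_rss(X, y, cols):
    """Residual sum of squares and coefficient count (intercept + slopes)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return float(np.sum((y - Xd @ beta) ** 2)), Xd.shape[1]


def all_subsets(X, y, names):
    """Score every subset of the predictors: 2**p fits, so keep p small."""
    n, p = X.shape
    tss = float(np.sum((y - y.mean()) ** 2))
    rss_full, k_full = fit_rss(X, y, range(p))
    sigma2_full = rss_full / (n - k_full)      # error variance from the full model
    rows = []
    for size in range(p + 1):
        for cols in itertools.combinations(range(p), size):
            rss, k = fit_rss(X, y, cols)
            rows.append({
                "terms": [names[j] for j in cols],
                "k": k,
                "adj_r2": 1 - (rss / (n - k)) / (tss / (n - 1)),
                "aic": n * math.log(rss / n) + 2 * k,
                "bic": n * math.log(rss / n) + math.log(n) * k,
                "cp": rss / sigma2_full - n + 2 * k,
            })
    return rows


# Simulated data (not the cereal data): x1 and x2 matter, x3 and x4 are noise.
rng = np.random.default_rng(2019)
n = 120
X = rng.normal(size=(n, 4))
y = 50 + 5 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=2, size=n)
rows = all_subsets(X, y, ["x1", "x2", "x3", "x4"])

for crit in ("aic", "bic", "cp"):
    print(crit, min(rows, key=lambda r: r[crit])["terms"])
print("adj_r2", max(rows, key=lambda r: r["adj_r2"])["terms"])

# (number of parameters, Cp) pairs for the plot in item 4
points = [(r["k"], r["cp"]) for r in rows]
```

Lower AIC, BIC, and Cp are better, as is higher adjusted R². The full model always has Cp = k exactly. Candidates whose Cp falls close to the line Cp = k on the plot of `points` show little evidence of bias.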
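For item 6, one way to assemble the summary is sketched below. It reports estimates, standard errors, p-values, and confidence intervals for each term. It also adds standardized slopes for judging which variable has the largest effect. The sketch makes these assumptions:

- It uses Python with NumPy on simulated data, with a hypothetical helper name.
- It uses normal quantiles instead of the t quantiles that R's `confint()` uses, so intervals are slightly narrower when n is small.
- It measures "largest effect" by standardization (slope × SD(x) / SD(y)). This is one common choice, not the only one.

```python
import math
from statistics import NormalDist

import numpy as np


def summarize_ols(X, y, names, level=0.95):
    """Estimates, SEs, p-values, CIs, and standardized slopes for an OLS fit.

    Assumption: normal quantiles and normal-based p-values as a large-sample
    approximation; R's summary() and confint() use the t distribution.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    k = Xd.shape[1]
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    z = NormalDist().inv_cdf(0.5 + level / 2)         # about 1.96 for 95%
    sd_y = y.std(ddof=1)
    table = []
    for i, term in enumerate(["(Intercept)"] + list(names)):
        row = {
            "term": term,
            "estimate": float(beta[i]),
            "se": float(se[i]),
            "p_value": math.erfc(abs(beta[i] / se[i]) / math.sqrt(2)),
            "ci": (float(beta[i] - z * se[i]), float(beta[i] + z * se[i])),
        }
        if i > 0:
            # Standardized slope: SDs of y per one-SD change in the predictor
            row["std_estimate"] = float(beta[i]) * X[:, i - 1].std(ddof=1) / sd_y
        table.append(row)
    return table


# Simulated data (not the cereal data): two predictors on very different scales.
rng = np.random.default_rng(1120)
n = 200
x1 = rng.normal(scale=10, size=n)
x2 = rng.normal(scale=1, size=n)
y = 20 + 0.5 * x1 + 3.0 * x2 + rng.normal(size=n)
table = summarize_ols(np.column_stack([x1, x2]), y, ["x1", "x2"])
for row in table:
    print(row)
largest = max(table[1:], key=lambda r: abs(r["std_estimate"]))
print("largest standardized effect:", largest["term"])
```

In this simulation x2 has the larger raw slope, but x1 has the larger standardized effect because it varies over a much wider range. For the chosen variable, the raw estimate is the expected change in rating per one-unit increase, holding the other variables in the model fixed.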