Assignment 4
In 2014, Allstate published the data on Kaggle.com for the open Allstate Purchase Prediction Challenge. The data contain the transaction history of customers who ended up purchasing a policy. For each Customer ID, you are given the quote history and the coverage options the customer purchased.
The data are available on Blackboard as Purchase_Likelihood.csv.
- It contains 665,249 observations on 97,009 unique Customer IDs.
- The nominal target variable is insurance, which has the categories 0, 1, and 2.
- The nominal features are (categories are inside the parentheses):
  - group_size. The number of people who will be covered under the policy (1, 2, 3, or 4).
  - homeowner. Whether the customer owns a home (0 = No, 1 = Yes).
  - married_couple. Whether the customer group contains a married couple (0 = No, 1 = Yes).
Question 1 (35 points)
You will build a multinomial logistic model with the following model specifications.
- Enter the six effects to the model in this sequence:
  - group_size
  - homeowner
  - married_couple
  - group_size * homeowner
  - group_size * married_couple
  - homeowner * married_couple
- Include the Intercept term in the model.
- The optimization method is Newton.
- The maximum number of iterations is 100.
- The tolerance level is 1e-8.
- Use the sympy.Matrix().rref() method to identify the non-aliased parameters.
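The last specification can be illustrated with a minimal sketch. It assumes a tiny, hypothetical model matrix (an intercept plus two dummy columns of a two-level feature), not the assignment data. The pivot columns returned by rref() are taken as the non-aliased columns, and every other column is treated as aliased.

```python
import sympy

# Hypothetical toy model matrix: intercept, dummy for level A, dummy for level B.
# The two dummy columns sum to the intercept column, so one of them is redundant.
model_matrix = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]

# rref() returns the reduced row echelon form and a tuple of pivot column indices.
reduced_form, pivot_columns = sympy.Matrix(model_matrix).rref()

# Pivot columns are linearly independent of the columns to their left.
non_aliased = list(pivot_columns)
aliased = [j for j in range(len(model_matrix[0])) if j not in pivot_columns]

# Keep only the non-aliased columns before fitting the model.
reduced_matrix = [[row[j] for j in non_aliased] for row in model_matrix]

print("Non-aliased columns:", non_aliased)
print("Aliased columns:", aliased)
```

On the full data the same call can be slow because sympy works in exact arithmetic, so applying it to the distinct rows of the model matrix may be a practical shortcut.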
Please answer the following questions based on your model.
a. (5 points) List the aliased columns that you found in your model matrix.
b. (5 points) How many degrees of freedom does your model have?
c. (20 points) After entering each model effect, calculate the Deviance test statistic, its degrees of freedom, and its significance value between the current model and the previous model. List your Deviance test results by the model effects in a table.
d. (5 points) Calculate the Feature Importance Index as the negative base-10 logarithm of the significance value. List your indices by the model effects.
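The Deviance test in (c) and the index in (d) can be sketched as follows. The sketch assumes the test statistic is twice the gain in log-likelihood between nested models and that it is referred to a chi-square distribution. The function name deviance_test and the log-likelihood values are hypothetical placeholders, not results from the assignment data.

```python
import math
from scipy.stats import chi2

def deviance_test(llk_prev, df_prev, llk_curr, df_curr):
    """Compare the current model against the previous (nested) model.

    llk_* are maximized log-likelihoods; df_* are the model degrees of freedom.
    """
    deviance = 2.0 * (llk_curr - llk_prev)
    df = df_curr - df_prev
    # Upper-tail chi-square probability is the significance value.
    p_value = chi2.sf(deviance, df)
    # Feature Importance Index: negative base-10 logarithm of the significance.
    # If the significance underflows to zero, report infinity instead of failing.
    importance = -math.log10(p_value) if p_value > 0.0 else float("inf")
    return deviance, df, p_value, importance

# Hypothetical log-likelihoods and degrees of freedom for two nested models.
deviance, df, p_value, importance = deviance_test(-105.0, 2, -100.0, 4)
print(f"Deviance = {deviance:.4f}, DF = {df}, "
      f"Significance = {p_value:.6f}, Importance = {importance:.4f}")
```

Calling the function once per entered effect, each time passing the previous model's values, gives one row of the requested table.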
Question 2
Please answer the following questions based on your multinomial logistic model in Question 1.
a. (10 points) For each of the sixteen possible value combinations of the three features, calculate the predicted probabilities for insurance = 0, 1, 2 based on your multinomial logistic model. List your answers in a table with proper labeling.
b. (5 points) Based on your answers in (a), what value combination of group_size, homeowner, and married_couple will maximize the odds value Prob(insurance = 1) / Prob(insurance = 0)? What is that maximum odds value?
c. Based on your model, what is the odds ratio for group_size = 3 versus group_size = 1, and insurance = 2 versus insurance = 0? (Hint: The odds ratio is the odds Prob(insurance = 2) / Prob(insurance = 0) given group_size = 3, divided by the odds Prob(insurance = 2) / Prob(insurance = 0) given group_size = 1.)
d. Based on your model, what is the odds ratio for homeowner = 1 versus homeowner = 0, and insurance = 0 versus insurance = 1?
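The odds and odds-ratio arithmetic in (b) through (d) can be sketched with a small table of predicted probabilities. The probabilities below are hypothetical placeholders keyed by (group_size, homeowner, married_couple), and the helper name odds is an assumption for illustration.

```python
# Hypothetical predicted probabilities for two feature combinations,
# keyed by (group_size, homeowner, married_couple). Not model output.
predicted = {
    (1, 0, 0): {0: 0.25, 1: 0.5, 2: 0.25},
    (3, 0, 0): {0: 0.125, 1: 0.5, 2: 0.375},
}

def odds(probs, numerator, denominator):
    """Odds of target category `numerator` versus `denominator`."""
    return probs[numerator] / probs[denominator]

# Odds ratio for group_size = 3 versus 1 and insurance = 2 versus 0,
# with homeowner and married_couple held at the same levels in both rows.
odds_group_3 = odds(predicted[(3, 0, 0)], 2, 0)
odds_group_1 = odds(predicted[(1, 0, 0)], 2, 0)
odds_ratio = odds_group_3 / odds_group_1

# Combination that maximizes Prob(insurance = 1) / Prob(insurance = 0).
best_combination = max(predicted, key=lambda combo: odds(predicted[combo], 1, 0))
max_odds = odds(predicted[best_combination], 1, 0)

print("Odds ratio:", odds_ratio)
print("Best combination:", best_combination, "with odds", max_odds)
```

Because the model in Question 1 includes interaction effects, an odds ratio computed this way may depend on the levels at which the other features are held, so it is worth stating those levels alongside the answer.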
Question 3 (40 points)
You will build a Naive Bayes model without any smoothing. In other words, the Laplace/Lidstone alpha is zero. Please answer the following questions based on your model.
a. Show in a table the frequency counts and the Class Probabilities of the target variable.
b. Show the crosstabulation table of the target variable by the feature group_size. The table contains the frequency counts.
c. Show the crosstabulation table of the target variable by the feature homeowner. The table contains the frequency counts.
d. Show the crosstabulation table of the target variable by the feature married_couple. The table contains the frequency counts.
e. Calculate the Cramer's V statistics for the above three crosstabulation tables. Based on these Cramer's V statistics, which feature has the largest association with the target insurance?
f. For each of the sixteen possible value combinations of the three features, calculate the predicted probabilities for insurance = 0, 1, 2 based on the Naive Bayes model. List your answers in a table with proper labeling.
g. Based on your model, what value combination of group_size, homeowner, and married_couple will maximize the odds value Prob(insurance = 1) / Prob(insurance = 0)? What is that maximum odds value?
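Two of the calculations above can be sketched in plain Python: the Cramer's V statistic of a crosstabulation, and the Naive Bayes predicted probabilities with alpha equal to zero. The function names and the toy inputs are hypothetical, and the sketch assumes every row and column total in the crosstabulation is positive.

```python
import math
from collections import Counter

def cramers_v(table):
    """Cramer's V for a crosstabulation given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(chi_square / (n * (smaller_dim - 1)))

def naive_bayes_probabilities(rows, targets, query):
    """Predicted class probabilities for `query` with no smoothing (alpha = 0)."""
    class_counts = Counter(targets)
    n = len(targets)
    scores = {}
    for cls, cls_count in class_counts.items():
        score = cls_count / n  # class probability
        for j, value in enumerate(query):
            matches = sum(1 for row, t in zip(rows, targets)
                          if t == cls and row[j] == value)
            score *= matches / cls_count  # Prob(feature j = value | class)
        scores[cls] = score
    total = sum(scores.values())  # zero only if the query is unseen in every class
    return {cls: score / total for cls, score in scores.items()}

# Hypothetical toy inputs, not the assignment data.
toy_table = [[10, 20], [20, 10]]
toy_rows = [(1, 0), (1, 1), (2, 1), (1, 1)]
toy_targets = [0, 0, 1, 1]

v_statistic = cramers_v(toy_table)
probabilities = naive_bayes_probabilities(toy_rows, toy_targets, (1, 1))
print("Cramer's V:", v_statistic)
print("Predicted probabilities:", probabilities)
```

Without smoothing, a feature value that never occurs within a class drives that class's probability to exactly zero, which is worth checking for before computing odds.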