Marketing Analytics Homework 2
Individual Assignment
MS Section: Due 1 PM, Monday, September 16th
MBA Section: Due 1 PM, Tuesday, September 17th
In this assignment, you will explore how omitted variables bias regression estimates in a real-world dataset. Specifically, you will estimate a causal model of demand in the canned soup category. This type of model is the key input to a pricing decision, which will be the subject of the fourth homework assignment.
You will be graded on both your code and your written answers. When evaluating the code, the grader will take on the role of a co-worker. Code will be evaluated on both how correct and how clear it is. By correctness, I mean that the code fulfills the requirements of the question. By clarity, I mean that the grader should be able to understand what your code does within 30 seconds of reading it. As discussed in class, clarity is aided by clear comments, good variable names, proper indentation, and short lines.
The written portions will be evaluated on the use of data and analysis to support your statements, and on the quality of the writing.
Assignment Materials for Download:
1. An R Markdown template titled Homework2Template.Rmd
2. A data file in .csv format titled Homework 2 Data 436(R).csv
Submission Checklist:
To help us grade the assignments efficiently and correctly, we ask that you submit your assignment in a specific format. A complete submission for this assignment is an email to [email protected] containing the following:
o An R Markdown (.Rmd) file, based on the template for this assignment.
o An .html file, generated by knitting the .Rmd file in RStudio.
o All file names should be [last name], [first name].[file extension], where you replace everything in the square brackets with the appropriate values (e.g., your last name) and delete the square brackets.
o Do not archive or combine the files. Each file should be a separate attachment in the email.
o Do not send the assignment to my work address. Thank you!
Data Dictionary:
units refers to the total units purchased,
weekInYearNum refers to the week within a particular year in which the purchase was made,
totalRevenue refers to the total price that a consumer paid for all the units they purchased,
storeNum is a categorical variable that represents the store at which the purchase was made,
productNum is a categorical variable that represents which product was sold,
isFeature and isDisplay are Boolean variables indicating whether the product was featured (for example, in a mail-out) or separately displayed (for example, on a large cardboard display) in the store. These
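To show how these columns fit together, here is a minimal R sketch on a few made-up rows. The data frame name `soup` and every value in it are hypothetical; the real data come from the course CSV file. It derives the price per can used throughout Part 1.

```r
# Hypothetical toy rows that mimic the columns in the data dictionary.
# With the real file you would instead use something like:
#   soup <- read.csv("Homework 2 Data 436(R).csv")
soup <- data.frame(
  units         = c(2, 4, 1, 6),
  weekInYearNum = c(1, 1, 2, 2),
  totalRevenue  = c(2.58, 4.76, 1.29, 5.94),
  storeNum      = c(1, 1, 4, 4),
  productNum    = c(10, 11, 10, 11),
  isFeature     = c(0, 1, 0, 1),
  isDisplay     = c(0, 0, 1, 1)
)

# Price per can = total amount paid divided by the number of cans bought
soup$pricePerCan <- soup$totalRevenue / soup$units

soup$pricePerCan
```

Note that storeNum and productNum are stored as plain numbers here. That is why the assignment asks you to wrap them in factor(): otherwise lm would treat store 4 as "four times" store 1.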
Part 1: Omitted Variable Bias (20 marks)
In this section, you will investigate omitted variable bias in a real-world dataset by running regressions with progressively more independent variables. For example, in part d, the regression should have three independent variables: pricePerCan, isFeature, and isDisplay. You can complete this section using only the log, summary, and lm functions.
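To see the mechanism before working with the real data, here is a minimal R simulation sketch. It assumes a made-up world in which stores tend to cut prices in the same weeks they feature a product, and features also raise sales on their own. The true coefficients (-1.0 for price, 0.8 for isFeature), the seed, and the object names are all hypothetical. When isFeature is left out, the price coefficient absorbs part of the feature effect.

```r
set.seed(436)  # arbitrary seed so the simulation is reproducible
n <- 5000

# Assumed data-generating process (for illustration only):
# features come with price cuts, and features also lift sales directly.
isFeature   <- rbinom(n, size = 1, prob = 0.3)
pricePerCan <- 1.5 - 0.4 * isFeature + rnorm(n, sd = 0.2)
logUnits    <- 1 - 1.0 * pricePerCan + 0.8 * isFeature + rnorm(n, sd = 0.3)
sim <- data.frame(units = exp(logUnits), pricePerCan, isFeature)

# Short regression omits isFeature; long regression controls for it
shortFit <- lm(log(units) ~ pricePerCan, data = sim)
longFit  <- lm(log(units) ~ pricePerCan + isFeature, data = sim)

coef(shortFit)["pricePerCan"]  # noticeably below the true -1.0
coef(longFit)["pricePerCan"]   # close to the true -1.0
```

In this simulated world, isFeature is negatively correlated with price and positively correlated with log(units). Omitting it therefore pushes the price coefficient further below zero. The real dataset may show a different pattern, which is what parts c through f ask you to check.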
a) Load the data and calculate the price per can (pricePerCan) by dividing totalRevenue by units.
b) Run a regression where log(units) is the dependent variable and pricePerCan is the independent variable.
c) Repeat the regression in part 1b, and add isFeature as an independent variable. How did incorporating isFeature change the price coefficient? Explain the direction of the change in terms of isFeature's correlation with y and with prices. Answer in at most 4 short sentences. (4 marks)
d) Repeat the regression in part 1c, and add isDisplay as an independent variable. How did incorporating isDisplay change the price coefficient? Explain the direction of the change in terms of isDisplay's correlation with y and with prices. Answer in at most 4 short sentences. (4 marks)
e) Repeat the regression in part 1d, and add factor(storeNum) as an independent variable. How did incorporating storeNum change the price coefficient? Explain the direction of the change in terms of storeNum's correlation with y and with prices. Answer in 2-4 short sentences. Hint: storeNum should be treated as a categorical variable. (4 marks)
f) Repeat the regression in part 1e, and add factor(productNum) as an independent variable. How did incorporating productNum change the price coefficient? Explain the direction of the change in terms of productNum's correlation with y and with prices. To shorten your output, do not call the summary function here. Answer in 2-4 short sentences. Hint: productNum should be treated as a categorical variable. (4 marks)
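Parts e and f rely on R's factor() to add one dummy variable per store or product. The minimal simulation sketch below assumes, purely for illustration, that higher-priced stores also attract more shoppers, so store identity is correlated with both price and sales. The store price levels, demand shifts, seed, and object names are all hypothetical. Adding factor(storeNum) compares prices and sales within each store, which removes that confound.

```r
set.seed(436)  # arbitrary seed so the simulation is reproducible
nStores  <- 5
perStore <- 1000
storeNum <- rep(1:nStores, each = perStore)

# Assumed store-level differences: pricier stores also sell more
storePrice  <- c(1.0, 1.2, 1.4, 1.6, 1.8)[storeNum]
storeDemand <- c(0.0, 0.5, 1.0, 1.5, 2.0)[storeNum]

pricePerCan <- storePrice + rnorm(length(storeNum), sd = 0.2)
logUnits    <- 2 - 1.0 * pricePerCan + storeDemand +
  rnorm(length(storeNum), sd = 0.3)
sim <- data.frame(units = exp(logUnits), pricePerCan, storeNum)

# Pooled fit ignores stores; fixed-effects fit adds one dummy per store
pooledFit <- lm(log(units) ~ pricePerCan, data = sim)
storeFit  <- lm(log(units) ~ pricePerCan + factor(storeNum), data = sim)

coef(pooledFit)["pricePerCan"]  # pulled upward by the store confound
coef(storeFit)["pricePerCan"]   # close to the true -1.0
names(coef(storeFit))           # dummies are named "factor(storeNum)2", ...
```

The same pattern carries over to factor(productNum) in part f, with one dummy per product instead of per store. With many products the coefficient list gets long, which is why part f skips summary().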
Part 2: Interaction Effects (16 marks)
Throughout this section, I will refer to the analysis in part 1f as the initial regression. Recall that in 1f, log(units) was the dependent variable and the independent variables were pricePerCan, isFeature, isDisplay, factor(storeNum), and factor(productNum). Throughout this section, do not use the summary function, as it will make your output too long to submit; just report the lm output. Answer each question in 2-3 short sentences.
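In R's formula syntax, `:` adds only the product of two variables, while `*` adds both variables plus their product. The initial regression already contains pricePerCan, isFeature, and isDisplay, so appending a `:` term is enough. The short sketch below checks this with terms(). The placeholder response `y`, the helper string `initialRhs`, and the data frame name `soup` are hypothetical names, not part of the template.

```r
# `a * b` expands to a + b + a:b
attr(terms(y ~ pricePerCan * isFeature), "term.labels")

# Hypothetical helper: the right-hand side of the initial regression (part 1f)
initialRhs <- paste("pricePerCan + isFeature + isDisplay",
                    "+ factor(storeNum) + factor(productNum)")

# Initial regression plus the price-by-feature interaction (part 2a)
formula2a <- as.formula(paste("log(units) ~", initialRhs,
                              "+ pricePerCan:isFeature"))
attr(terms(formula2a), "term.labels")

# With the real data, printing the fitted object (not summary()) shows only
# the call and the coefficients, e.g. lm(formula2a, data = soup)
```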
a) Run the initial regression, and include an interaction between isFeature and pricePerCan as an additional independent variable. In quantitative terms, what does the coefficient estimate on this interaction mean? (4 marks)
b) Run the initial regression, and include an interaction between isDisplay and pricePerCan as an additional independent variable. In quantitative terms, what does the coefficient estimate on this interaction mean? (4 marks)
c) Run the initial regression, and include an interaction between isDisplay and isFeature as an additional independent variable. In quantitative terms, what does the coefficient estimate on this interaction mean? (4 marks)
d) Run the initial regression, and include an interaction between pricePerCan and factor(storeNum) as an additional independent variable. In quantitative terms, what does the coefficient estimate for the interaction between the 4th store and prices (i.e., pricePerCan:factor(storeNum)4 in the output) mean? (4 marks)
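Because the dependent variable is log(units), each slope is a semi-elasticity. A coefficient b implies roughly a 100 * (exp(b) - 1) percent change in units per one-unit change in the regressor. With an interaction, the price slope for featured items is the pricePerCan coefficient plus the interaction coefficient. The simulation sketch below assumes, for illustration only, that featured items are more price-sensitive (a true slope of -1.5 versus -1.0). All names, numbers, and the seed are hypothetical.

```r
set.seed(436)  # arbitrary seed so the simulation is reproducible
n <- 4000

# Assumed truth: price slope is -1.0 normally and -1.5 when featured
isFeature   <- rbinom(n, size = 1, prob = 0.4)
pricePerCan <- runif(n, min = 0.8, max = 1.8)
logUnits    <- 1 - 1.0 * pricePerCan + 0.9 * isFeature -
  0.5 * pricePerCan * isFeature + rnorm(n, sd = 0.3)
sim <- data.frame(units = exp(logUnits), pricePerCan, isFeature)

fit <- lm(log(units) ~ pricePerCan + isFeature + pricePerCan:isFeature,
          data = sim)
b <- coef(fit)

# Price slope (change in log units per $1) without and with a feature
slopePlain    <- unname(b["pricePerCan"])
slopeFeatured <- unname(b["pricePerCan"] + b["pricePerCan:isFeature"])

# Approximate % change in units from a $0.10 price increase
pctPlain    <- 100 * (exp(0.10 * slopePlain) - 1)
pctFeatured <- 100 * (exp(0.10 * slopeFeatured) - 1)
c(plain = pctPlain, featured = pctFeatured)
```

A similar reading applies to the store interaction. Under R's default treatment coding, pricePerCan:factor(storeNum)4 is the difference between store 4's price slope and the slope in the baseline (first-level) store. Store 4's own slope is therefore the pricePerCan coefficient plus that interaction. For an interaction of two dummies, the coefficient is the extra change in log(units) when both are on, beyond the sum of their separate effects.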