8/27/2020 Assignment #1
Assignment #1
Submit Assignment
Due Friday by 10am Points 100 Submitting a file upload File Types rmd and pdf Available after Aug 21 at 8pm
All manipulation of the data must be done with R. If you do not include an R chunk for each task, you will receive no points for that task.
The Charles Book Club (CBC) was established in December of 1986, on the premise that a book club could differentiate itself through a deep understanding of its customer base and by delivering uniquely tailored offerings. CBC focused on selling specialty books by direct marketing through a variety of channels, including media advertising (TV, magazines, newspapers) and mailing. CBC is strictly a distributor and does not publish any of the books that it sells. In line with its commitment to understanding its customer base, CBC built and maintained a detailed database of its club members. Upon enrollment, readers were required to fill out an insert and mail it to CBC. Through this process, CBC has created an active database of 500,000 readers. CBC acquired most of these customers through advertising in specialty magazines.
CBC sent mailings to its club members each month containing its latest offering.
On the surface, CBC looked like they were very successful, mailing volume was increasing, book selection was diversifying and growing, their customer database was increasing; however, their bottom line profits were falling. The decreasing profits led CBC to revisit their original plan of using database marketing to improve its mailing yields and to stay profitable.
The dataset is presented to you in a format that you will encounter frequently in practice, a Microsoft Excel
spreadsheet: Charles_BookClub_modified.xls . You will need to first clean the data before attempting to do any analytics or visualization.
Gender: 0 = Female and 1 = Male.
M=Monetary value: Total money spent on CBC books.
F=Frequency: Total number of purchases in the chosen period (used as a proxy for frequency.) R=Last purchase (recency of purchase): Months since last purchase.
First purchase: Months since first purchase.
ChildBks: Number of childrens books purchased. YouthBks: Number of youth books purchased. CookBks: Number of cookbooks purchased. DoltYBks: Number of do-it-yourself books purchased. RefBks: Number of reference books purchased. ArtBks: Number of art books purchased.
GeogBks: Number of geography books purchased. ItalCook: Number of Italian cookbooks purchased. ItalHAtlas: Number of Italian atlases purchased. ItalArt: Number of Italian art books purchased.
Florence: A choice variable indicating whether the customer purchased the The Art History of Florence. 1 corresponds to a purchase and 0 corresponds to a nonpurchase.
Your submission will consist of an .Rmd and a .pdf document. Use headings formatted with R markdown syntax for each of the following tasks and questions. Use the R markdown text blocks to comment on your code in lieu of R comments in the {r} chunks. On all your plots be sure to put a meaningful title. Be sure to repeat the task instruction or question and identify the task or question by the numbers in this list:
1. How many observations are in the dataset?
2. List the variables and show their type.
3. How many observations have no books purchased (including Florence)? Delete those observations before proceding to the next step.
4. Recode the gender variable as a factor with two levels: M and F.
5. Recode the Florence variable as a factor with two levels: Yes and No.
6. Calculate the mean, standard deviation, skewness and kurtosis for monetary value.
7. Are there any outliers? If yes, how do you know?
8. If you detect the presence of any outliers, change them to missing.
9. Are there any missing values? If yes, count them and report the counts for each variable that has missing values.
10. Impute the missing values using mean substitution. Do this with one of the apply() family of functions (apply, sapply, lapply, etc.)
11. Produce a histogram of the continuous variables and display them all on one page.
12. Produce a bar chart for the counts of each of the ten books purchased.
13. Using dplyr filter() and mutate() verbs, construct 27 categories of RFM where each category is composed of combinations of values of High,
Medium and Low for R, F, and M. To do this you must divide each of these three variables into thirds (boundaries at 33 and 66 percentiles for each). At the end of the process youll have three new factors, one for R, one for F, and one for M with three levels each, High, Medium, and
https://smu.instructure.com/courses/78193/assignments/401065?module_item_id=718967
1/4
8/27/2020 Assignment #1
Low. Use these factors to produce three tables of means of M, one for each level of M. Each table will be a 3X3 cross tabulation of frequency
level by recency level.
14. What is the median monetary value per visit by gender?
15. Make a bubble plot of recency on the x-axis, monetary value on the y-axis, where the size of the points are proportional to first purchase and the
points are colored by gender.
Assignment Rubric
https://smu.instructure.com/courses/78193/assignments/401065?module_item_id=718967
2/4
8/27/2020 Assignment #1
Criteria
Ratings
Pts
Submitted ontime
20.0 pts Full Marks
0.0 pts No Marks
20.0 pts
Hadley Wikham Style Guide Compliance
functions are verbs; variables are nouns; indents correct;
curly braces positioned correctly;
variable and function names all lower case; correct use of spaces;
line lengths <= 80 15.0 pts Full Marks0.0 pts No Marks 15.0 ptsAll chunks run without errors 35.0 pts Full Marks0.0 pts No Marks35.0 ptsQ1How many observations are in the dataset? 1.5 pts Full Marks0.0 pts No Marks1.5 ptsQ2List the variables and show their type. 1.0 pts Full Marks0.0 pts No Marks1.0 ptsQ3How many observations have no books purchased (including Florence)? Delete those observations before proceding to the next step. 2.0 pts Full Marks0.0 pts No Marks2.0 ptsQ4Recode the gender variable as a factor with two levels: “M” and “F”. 1.5 pts Full Marks0.0 pts No Marks1.5 ptsQ5Recode the Florence variable as a factor with two levels: “Yes” and “No”. 1.5 pts Full Marks0.0 pts No Marks1.5 ptsQ6Calculate the mean, standard deviation, skewness and kurtosis for monetary value. 1.5 pts Full Marks0.0 pts No Marks1.5 ptsQ7Are there any outliers? If yes, how do you know? 1.0 pts Full Marks0.0 pts No Marks1.0 ptsQ8If you detect the presence of any outliers, change them to missing. 2.0 pts Full Marks0.0 pts No Marks2.0 ptsQ9Are there any missing values? If yes, count them and report the counts for each variable that has missing values. 1.0 pts Full Marks0.0 pts No Marks1.0 ptsQ10Impute the missing values using mean substitution. Do this with one of the apply() family of functions (apply, sapply, lapply, etc.) 2.0 pts Full Marks0.0 pts No Marks 2.0 ptshttps://smu.instructure.com/courses/78193/assignments/401065?module_item_id=7189673/4 8/27/2020 Assignment #1 CriteriaRatings Pts Q11Produce a histogram of the continuous variables and display them all on one page. 2.0 pts Full Marks0.0 pts No Marks 2.0 ptsQ12Produce a bar chart for the counts of each of the ten books purchased.2.0 pts Full Marks0.0 pts No Marks 2.0 pts Q13Using dplyr filter() and mutate() verbs, construct 27 categories of RFM where each category is composed of combinations of values of “High”, “Medium” and “Low” for R, F, and M. To do this you must divide each of these three variables into thirds (boundaries at 33 and 66 percentiles for each). At the end of the process you’ll have three new factors, one for R, one for F, and one for M with three levels each, High, Medium, and Low. Use these factors to produce three tables of means of M, one for each level of M. Each table will be a 3X3 cross tabulation of frequency level by recency level. 5.0 pts Full Marks0.0 pts No Marks 5.0 ptsQ14What is the median monetary value per visit by gender? 2.0 pts Full Marks0.0 pts No Marks2.0 ptsQ15Make a bubble plot of recency on the x-axis, monetary value on the y-axis, where the size of the points are proportional to first purchase and the points are colored by gender. 4.0 pts Full Marks0.0 pts No Marks 4.0 pts Total Points: 100.0 https://smu.instructure.com/courses/78193/assignments/401065?module_item_id=7189674/4
Reviews
There are no reviews yet.