Homework 2
Due Feb 12th, 2019 Tuesday by 5:00 PM submitted in canvas
Instructions:
1. Answer all the questions in the homework
2. The number in parentheses gives the points for the question, e.g. (2pt) means two points
3. Follow the sub points carefully
i) Datasets anywhere in the document appear in bold letters
ii) Variables in the datasets appear italicized
4. Please submit only one Word or PDF file per group on Canvas by the due date
5. Please report only the output for each question. You may wish to provide the code at the end of the Word or PDF document. Do not submit any R project code files (.R extension files).
6. Only one member of the group makes the submission on Canvas. List the group name and the members' names in the submitted file itself
1. Predicting personal loan acceptance using the k-NN algorithm
Universal Bank is a relatively young bank growing rapidly in terms of overall customer acquisition. The majority of these customers are liability customers (depositors) with varying sizes of relationship with the bank. The customer base of asset customers (borrowers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business. In particular, it wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors).
A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise smarter campaigns with better target marketing. The goal is to use k-NN to predict whether a new customer will accept a loan offer. This will serve as the basis for the design of a new campaign.
The file personal_loan.csv contains data on 5000 customers. The variables are described below:
Variable                  Description
age                       Customer age in years
experience                Experience in years
income                    Income in thousands of dollars
ccavg                     Spending on credit cards
family                    Family size
education                 Education (undergrad, graduate, advanced)
mortgage                  Value of house mortgage in thousands of dollars
securities                1 if customer has a securities account with the bank, 0 otherwise
cd_account                1 if customer has a certificate of deposit account with the bank, 0 otherwise
online                    1 if customer uses Internet banking facilities, 0 otherwise
credit_card               1 if customer uses a credit card issued by the bank, 0 otherwise
personal_loan (response)  "accept" if customer accepted the loan, "reject" otherwise
a) Summarize the variables in the data. For numeric variables, compute minimum, mean, median, maximum and standard deviation. For character and binary variables (securities_account, cd_account, online, credit_card), report the count of levels. (5pt)
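The summaries in part (a) can be sketched with Python's standard library. Python is used here only for illustration, and the same steps carry over directly to R. The helper names `summarize_numeric` and `count_levels` are hypothetical. The values are made up and are not from personal_loan.csv. The standard deviation is the sample SD with an n - 1 denominator, which is also what R's `sd()` computes.

```python
import statistics
from collections import Counter

def summarize_numeric(values):
    """Minimum, mean, median, maximum and sample standard deviation of one column."""
    return {
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
    }

def count_levels(values):
    """Count of each level of a character or binary column."""
    return dict(sorted(Counter(values).items()))

# Toy, made-up values for illustration only (not taken from personal_loan.csv).
ages = [25, 40, 35, 50, 45]
online = [1, 0, 1, 1, 0]

print(summarize_numeric(ages))
print(count_levels(online))
```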
b) Partition the data into training (60%) and validation (40%) sets. Perform a 3-NN classification on the training data and predict the loan status in the validation data. (15pt)
(Hint/Instructions:
I. Set the seed to 30
II. Create dummy variables for the variable education. Use the dummy variables rather than the character variable in the model
III. Do not normalize the binary/dummy variables. Only normalize the numeric variables)
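The following is a minimal plain-Python sketch of the part (b) pipeline. It rests on several assumptions. Education is expanded into one 0/1 dummy per level. "Normalize" is taken to mean z-scoring the numeric columns with the training-set mean and SD. Distance is Euclidean. The helper names (`add_education_dummies`, `partition`, `fit_scaler`, `scale`, `knn_predict`) are hypothetical. Python's `random.Random(30)` will not reproduce the split that R's `set.seed(30)` produces. The rows are toy values, not the real data.

```python
import math
import random
import statistics
from collections import Counter

EDU_LEVELS = ["undergrad", "graduate", "advanced"]  # levels listed in the variable table

def add_education_dummies(row):
    """Drop the character column education and add one 0/1 dummy per level."""
    out = {k: v for k, v in row.items() if k != "education"}
    for level in EDU_LEVELS:
        out["edu_" + level] = int(row["education"] == level)
    return out

def partition(rows, train_frac, seed):
    """Shuffle row order with a fixed seed, then split into training and validation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]

def fit_scaler(train, numeric_cols):
    """Mean and SD of each numeric column, computed on the training set only."""
    return {c: (statistics.mean([r[c] for r in train]),
                statistics.stdev([r[c] for r in train])) for c in numeric_cols}

def scale(row, scaler):
    """z-score the numeric columns; binary and dummy columns stay 0/1."""
    out = dict(row)
    for c, (mu, sd) in scaler.items():
        out[c] = (row[c] - mu) / sd
    return out

def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    dists = sorted(
        (math.dist([t[f] for f in features], [x[f] for f in features]), y)
        for t, y in zip(train_X, train_y)
    )
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]

# Toy, made-up rows for illustration only (not taken from personal_loan.csv).
TOY = [
    {"age": 25, "income": 30, "education": "undergrad", "online": 0, "personal_loan": "reject"},
    {"age": 30, "income": 35, "education": "undergrad", "online": 1, "personal_loan": "reject"},
    {"age": 35, "income": 40, "education": "graduate", "online": 0, "personal_loan": "reject"},
    {"age": 45, "income": 150, "education": "advanced", "online": 1, "personal_loan": "accept"},
    {"age": 50, "income": 160, "education": "graduate", "online": 1, "personal_loan": "accept"},
    {"age": 55, "income": 170, "education": "advanced", "online": 0, "personal_loan": "accept"},
]
NUMERIC = ["age", "income"]  # toy subset of the numeric columns

rows = [add_education_dummies(r) for r in TOY]
train, valid = partition(rows, 0.6, seed=30)
scaler = fit_scaler(train, NUMERIC)  # statistics from the training rows only
features = [c for c in rows[0] if c != "personal_loan"]
train_X = [scale(r, scaler) for r in train]
train_y = [r["personal_loan"] for r in train]
preds = [knn_predict(train_X, train_y, scale(r, scaler), 3, features) for r in valid]
print(list(zip([r["personal_loan"] for r in valid], preds)))
```

Fitting the scaler on the training rows alone keeps information from the validation set out of the model. The validation rows are then transformed with those same training statistics.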
c) Report the confusion matrix, overall accuracy, specificity and sensitivity measures for the validation data (10pt)
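The part (c) measures follow directly from the four cells of the confusion matrix. This sketch treats "accept" as the positive class. That is an assumption: sensitivity and specificity swap if "reject" is chosen as positive, so state the convention you use. The function name `confusion_metrics` is hypothetical, and the labels below are made up.

```python
def confusion_metrics(actual, predicted, positive="accept"):
    """Confusion-matrix counts plus accuracy, sensitivity and specificity."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "matrix": {"TP": tp, "FN": fn, "FP": fp, "TN": tn},
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual accepts predicted as accept
        "specificity": tn / (tn + fp),  # share of actual rejects predicted as reject
    }

# Made-up labels for illustration only.
actual = ["accept", "accept", "reject", "reject", "reject"]
predicted = ["accept", "reject", "reject", "reject", "accept"]
m = confusion_metrics(actual, predicted)
print("               predicted accept  predicted reject")
print(f"actual accept  {m['matrix']['TP']:>16}  {m['matrix']['FN']:>16}")
print(f"actual reject  {m['matrix']['FP']:>16}  {m['matrix']['TN']:>16}")
print(m["accuracy"], m["sensitivity"], m["specificity"])
```

Sensitivity is undefined when the validation set has no actual accepts. Specificity is undefined when it has no actual rejects.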
d) Run the k-NN algorithm varying k from 1 to 20 and track the overall accuracy on the validation data. Report the results in a table. Which value of k gives the best predictive performance? (15pt)
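Part (d) is a loop over k that scores each k on the same validation set. This sketch assumes the features are already normalized and stored as plain lists. The names `knn_predict`, `accuracy_by_k` and `best_k` are hypothetical. Breaking accuracy ties in favour of the smallest k is only one possible rule. The points are made up, not drawn from the loan data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(t, x) for t in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy_by_k(train_X, train_y, valid_X, valid_y, k_values=range(1, 21)):
    """Validation accuracy for each k, returned as (k, accuracy) rows for a table."""
    table = []
    for k in k_values:
        preds = [knn_predict(train_X, train_y, x, k) for x in valid_X]
        correct = sum(p == y for p, y in zip(preds, valid_y))
        table.append((k, correct / len(valid_y)))
    return table

def best_k(table):
    """k with the highest accuracy; ties go to the smallest k (one possible rule)."""
    return max(table, key=lambda row: (row[1], -row[0]))[0]

# Toy, already-normalized points (made up): 15 "reject" near x = 0, 15 "accept" near x = 5.
train_X = [[i * 0.1, 0.0] for i in range(15)] + [[5 + i * 0.1, 0.0] for i in range(15)]
train_y = ["reject"] * 15 + ["accept"] * 15
valid_X = [[0.5, 0.1], [5.5, 0.1]]
valid_y = ["reject", "accept"]

table = accuracy_by_k(train_X, train_y, valid_X, valid_y)
for k, acc in table:
    print(f"k = {k:2d}  accuracy = {acc:.3f}")
print("best k:", best_k(table))
```

On real data the accuracy usually varies with k. Very small k tends to overfit the training set, and very large k tends to smooth away real structure.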
e) Choose the k from your answer to part (d) of this question. Use this k and run the algorithm on the entire data to predict the loan status for the following customer: (5pt)
age = 40
experience = 10
income = 84
family = 2
ccavg = 2
education = graduate
mortgage = 0
securities_account = 0
cd_account = 0
online = 1
credit_card = 1
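In part (e) the new customer has to go through the same preprocessing as the modelling data. Education becomes dummies, and only the numeric columns are normalized. Because the algorithm is rerun on all rows, the normalization statistics come from the entire dataset. This sketch covers only that encoding step. The encoded record would then be passed to the same k-NN routine with the k chosen in part (d).

A few assumptions apply:
- The (mean, SD) pairs are placeholders, not values computed from personal_loan.csv.
- Treating family as numeric is an assumption.
- The names `column_stats` and `encode` are hypothetical.

```python
import statistics

# Columns that get normalized; treating family as numeric is an assumption.
NUMERIC = ["age", "experience", "income", "family", "ccavg", "mortgage"]
EDU_LEVELS = ["undergrad", "graduate", "advanced"]

# The customer from part (e), copied from the question.
new_customer = {"age": 40, "experience": 10, "income": 84, "family": 2, "ccavg": 2,
                "education": "graduate", "mortgage": 0, "securities_account": 0,
                "cd_account": 0, "online": 1, "credit_card": 1}

def column_stats(rows, cols):
    """Mean and SD of each numeric column over every row of the data."""
    return {c: (statistics.mean([r[c] for r in rows]),
                statistics.stdev([r[c] for r in rows])) for c in cols}

def encode(row, stats):
    """Education dummies plus z-scores for numeric columns; binary columns stay 0/1."""
    out = {}
    for key, value in row.items():
        if key == "education":
            continue
        if key in stats:
            mu, sd = stats[key]
            out[key] = (value - mu) / sd
        else:
            out[key] = value
    for level in EDU_LEVELS:
        out["edu_" + level] = int(row["education"] == level)
    return out

# Placeholder (mean, SD) pairs, made up for illustration; real ones would come from
# column_stats(all_rows, NUMERIC) on the full personal_loan data.
stats = {"age": (45.0, 10.0), "experience": (20.0, 10.0), "income": (80.0, 40.0),
         "family": (2.0, 1.0), "ccavg": (2.0, 1.0), "mortgage": (50.0, 100.0)}
assert set(stats) == set(NUMERIC)  # every numeric column needs statistics
print(encode(new_customer, stats))
```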
2. Predicting Software Reselling Profits using Linear Regression
Tayko Software is a software catalog firm that sells games and educational software. It started out as a software manufacturer and then added third-party titles to its offerings. It recently revised its collection of items in a new catalog, which it mailed out to its customers. This mailing yielded 2000 purchases. Based on these data, Tayko wants to devise a linear regression model for predicting the amount that a purchasing customer will spend. The file tayko.csv contains information on the 2000 purchases. The variables are described below:
Variable                  Description
freq                      Number of transactions in the preceding year
last_update               Number of days since last update to customer record
web                       1 if customer purchased by web order at least once, 0 otherwise
gender                    1 if customer is male, 0 otherwise
address_res               1 if it is a residential address, 0 otherwise
address_us                1 if it is a US address, 0 otherwise
Spending (response)       Amount spent by customer in test mailing (dollars)
a) Summarize the variables in the data. For numeric variables, compute minimum, mean, median, maximum and standard deviation. For binary variables (web, gender, address_res, address_us), report the count of levels. (5pt)
b) Report the mean and standard deviation of spending for each level of web and, separately, for each level of gender (4pt)
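Part (b) is a grouped summary: split spending by the levels of one variable, then take the mean and SD within each group. In this sketch the column name `spending` is an assumption; the table lists it as "Spending". The function name `mean_sd_by` is hypothetical, and the rows are made up, not taken from tayko.csv.

```python
import statistics
from collections import defaultdict

def mean_sd_by(rows, group_col, value_col="spending"):
    """Mean and sample SD of value_col within each level of group_col."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_col]].append(r[value_col])
    return {level: (statistics.mean(v), statistics.stdev(v))
            for level, v in sorted(groups.items())}

# Toy, made-up rows for illustration only (not taken from tayko.csv).
TOY = [
    {"web": 1, "gender": 0, "spending": 100},
    {"web": 1, "gender": 1, "spending": 200},
    {"web": 0, "gender": 1, "spending": 50},
    {"web": 0, "gender": 0, "spending": 70},
]
for col in ("web", "gender"):
    for level, (mean, sd) in mean_sd_by(TOY, col).items():
        print(f"{col} = {level}: mean = {mean:.2f}, sd = {sd:.2f}")
```

`statistics.stdev` raises an error for a group with a single observation. That will not matter with 2000 purchases split on binary variables, but it can in tiny test data.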
c) Partition the data into training (70%) and validation (30%) sets. Run a linear regression on the training data. Choose one numeric variable and one binary variable from the model results and interpret their coefficients in your own words. Which variables are not statistically significant? What is the value of R-squared? Interpret it in your own words. (15pt)
(Hint/Instruction: Set the seed to 30)
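The following plain-Python sketch fits ordinary least squares by solving the normal equations X'X b = X'y. It also computes R-squared as 1 - SS_res / SS_tot. It is an illustration only. The helper names (`partition`, `solve`, `fit_ols`, `predict`, `r_squared`) are hypothetical, and the toy data are made up. Regression software normally uses a QR decomposition rather than normal equations, for better numerical stability. This sketch does not compute standard errors or p-values; the regression summary output of your software reports those for the significance question. Python's seeded shuffle will not reproduce an R split made after `set.seed(30)`.

```python
import random

def partition(rows, train_frac, seed):
    """Shuffle row order with a fixed seed, then split into training and validation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ols(X, y):
    """Coefficients [intercept, b1, ..., bp] from the normal equations."""
    Xd = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(coef, X):
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]

def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Made-up data: one numeric predictor, one binary predictor, y = 2 + 3*x1 + 5*x2 exactly.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [2 + 3 * x1 + 5 * x2 for x1, x2 in X]

train, valid = partition(list(zip(X, y)), 0.7, seed=30)
coef = fit_ols([x for x, _ in train], [t for _, t in train])
print("coefficients:", coef)
print("training R-squared:", r_squared([t for _, t in train], predict(coef, [x for x, _ in train])))
```

For a numeric predictor, the coefficient is the expected change in the response per one-unit increase, holding the other predictors fixed. For a 0/1 predictor, it is the expected difference between the two groups.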
d) Predict the spending in the validation data and report all the accuracy measures. (6pt)
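"All the accuracy measures" is not spelled out here. This sketch assumes the common set ME, RMSE, MAE, MPE and MAPE, which is what R's `forecast::accuracy()` reports for a numeric prediction if that is the tool in use. The error is taken as actual minus predicted. The function name `accuracy_measures` is hypothetical, and the numbers are made up.

```python
import math

def accuracy_measures(actual, predicted):
    """ME, RMSE, MAE, MPE and MAPE, with error = actual - predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # Percentage measures are undefined when an actual value is 0.
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

# Made-up actual and predicted spending, for illustration only.
print(accuracy_measures([100, 200, 400], [110, 180, 400]))
```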
e) Run K-fold cross-validation with K = 10 and report the mean and standard deviation of the RMSE and MAPE measures. Compare these measures (smaller, larger, etc.) with those obtained in part (c) (20pt)
(Hint/Instructions: Please try to code the K-fold cross-validation algorithm on your own. I have explained the concept and demonstrated the code in class. Please meet me if you have trouble building the code)
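Since the hint asks you to write the cross-validation yourself, this is only one possible structure, sketched in Python. It shuffles the row indices once and deals them into K folds. Each fold is held out in turn while the model is fit on the remaining rows. RMSE and MAPE are computed on each held-out fold, then the mean and SD are taken across folds.

A few assumptions apply:
- The helper names are hypothetical.
- The model is a stand-in simple line `fit_line`, not the part (c) regression.
- The data are made up.
- Python's seeded shuffle will not match an R fold assignment made after `set.seed(30)`.

```python
import math
import random
import statistics

def k_fold_indices(n, k, seed):
    """Shuffle 0..n-1 once, then deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def cross_validate(X, y, k, fit, predict, seed=30):
    """Hold out each fold in turn, fit on the rest, score on the held-out fold."""
    scores = {"rmse": [], "mape": []}
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        pred = predict(model, [X[i] for i in fold])
        actual = [y[i] for i in fold]
        scores["rmse"].append(rmse(actual, pred))
        scores["mape"].append(mape(actual, pred))
    # (mean, SD) of each measure across the k folds
    return {m: (statistics.mean(v), statistics.stdev(v)) for m, v in scores.items()}

# Hypothetical stand-in model: least-squares line y = a + b*x on a single predictor.
def fit_line(X, y):
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    b = sum((x - mx) * (v - my) for x, v in zip(xs, y)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def predict_line(model, X):
    a, b = model
    return [a + b * row[0] for row in X]

# Made-up data: a straight line plus alternating +/-1 noise.
X = [[i] for i in range(1, 31)]
y = [10 + 2 * i + (-1) ** i for i in range(1, 31)]
result = cross_validate(X, y, 10, fit_line, predict_line)
print(result)
```

A single train/validation split gives one RMSE and one MAPE. Cross-validation gives K of each, so their spread shows how much the single-split figures could vary with a different partition.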