Assignment #3
Analysis of FinTech Data
Download the data from the following link.
http://www.dropbox.com/sh/62m5jr0t4vpbeyp/AAASXqr3ZUlC71b3FYDxmLJJa?dl=0
The data is from a FinTech company in Hong Kong that lends money to loan applicants. The company has a decision engine that automatically classifies each loan application as approve, reject, or manual review. When the engine classifies an application as manual review, human reviewers make a subjective judgement to reclassify the case as approve or reject. Fields in the dataset include:
id: loan application ID
loan_amount: requested amount in HKD
tenor: requested repayment period in months
age: the applicant's age at the time of the application
month_of_service: employment period of the current job
residential_status: rent, own, others
monthly_repayment: amount of monthly repayment for other loans
monthly_income: average monthly income during the last three months
self_employed: whether the applicant is self-employed
bankrupted: whether the applicant is or has been bankrupt
housewife: whether the applicant is a housewife
currently_employed: whether the applicant currently holds a full-time job
channel: loan application channel
language: tc (traditional Chinese), en (English)
manual_review: whether the application was manually reviewed
approved: whether the application was approved
manual_approved: whether the application was manually approved
credit_score: the applicant's credit score (higher is better)
friends_facebook: the number of Facebook friends (the value NA indicates that the applicant declined to share his/her account with the company)
time_application: time of the day when the application was submitted
location: the location of the applicant when s/he submitted the application
default: whether the repayment is overdue as of June 2017
NOTE: Use all available resources to solve the problems. Solutions to most of the coding problems can be found on the Internet, so search online if you get stuck.
Q1. Make a new variable, named automatic_approved, which has the value t if the application was approved by the decision engine, f if it was rejected by the decision engine, and NA if it was reviewed manually. How many cases were approved and how many were rejected by the decision engine? How many were classified as manual review?
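As a rough illustration of the Q1 logic only: the assignment asks for R, so the Python sketch below is meant to be translated, not submitted. It assumes that manual_review and approved are coded as the strings "t" and "f" (the handout does not state the coding), and the five rows are invented toy data, not values from the dataset.

```python
from collections import Counter

# Toy rows invented for illustration; the "t"/"f" coding is an assumption.
rows = [
    {"id": 1, "manual_review": "f", "approved": "t"},
    {"id": 2, "manual_review": "f", "approved": "f"},
    {"id": 3, "manual_review": "t", "approved": "t"},
    {"id": 4, "manual_review": "t", "approved": "f"},
    {"id": 5, "manual_review": "f", "approved": "t"},
]


def automatic_approved(row):
    """Return "t" or "f" for engine decisions, None (NA) for manually reviewed cases."""
    if row["manual_review"] == "t":
        return None  # the engine passed the case to a human reviewer
    return row["approved"]  # no manual review, so the outcome is the engine's own


for row in rows:
    row["automatic_approved"] = automatic_approved(row)

# Tally engine approvals ("t"), engine rejections ("f"), and manual reviews (None).
counts = Counter(row["automatic_approved"] for row in rows)
print(counts)
```

In R the same rule could be written with ifelse() and the counts obtained with table(..., useNA = "ifany").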
Q2. Compare the automatically approved cases and the automatically rejected cases. Conduct statistical tests on variables available in the dataset to answer the following subquestions.
1) Are they different in loan_amount?
2) Are they different in tenor?
3) Are they different in age?
4) Are they different in month_of_service?
5) Are they different in residential_status?
6) Are they different in monthly_income?
7) Are they different in bankrupted?
8) Are they different in currently_employed?
9) Are they different in channel?
10) Are they different in language?
11) Are they different in credit_score?
12) Are they different in friends_facebook?
13) Are they different in location_application?
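One possible way to approach the subquestions above is a two-sample test for numeric fields (for example loan_amount or credit_score) and a chi-square test of independence for categorical fields (for example residential_status or language). The choice of Welch's t-test and Pearson's chi-square here is an assumption, not something the assignment prescribes. The Python sketch below only computes the test statistics and degrees of freedom by hand to show what the tests measure. The helper names welch_t and chi_square are hypothetical, and the numbers are invented toy values.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof


# Toy numeric field: approved group vs. rejected group (invented values).
t_stat, t_df = welch_t([2, 4, 6], [1, 2, 3])

# Toy categorical field: rows = approved/rejected, columns = two categories.
chi_stat, chi_df = chi_square([[10, 20], [20, 10]])

print(t_stat, t_df, chi_stat, chi_df)
```

In R, t.test() and chisq.test() report these statistics together with p-values. A rank-based alternative such as wilcox.test() may be more appropriate for skewed fields like monthly_income.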
Q3. Make a new variable, named automatic_approved_dummy, which has the value 1 if automatic_approved = t, and 0 otherwise. Develop a regression model for approval by the decision engine with automatic_approved_dummy as the dependent variable (DV). Include all relevant independent variables in the model.
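The sketch below illustrates the two steps of Q3 under stated assumptions: it builds the dummy so that both f and NA map to 0, as the question specifies, and then fits a logistic regression with a single predictor by plain gradient ascent. Logistic regression is one reasonable reading of "regression model" for a binary DV, not the only one. The predictor values (imagined as a standardized credit_score) and labels are invented toy data, and fit_logistic is a hypothetical helper. A real answer would include all relevant variables and be written in R.

```python
import math

# Toy data: automatic_approved labels (None stands for NA) and one standardized predictor.
labels = ["f", "f", None, "t", "f", "t", "t", "t"]
xs = [-1.5, -1.0, -0.75, -0.5, 0.0, 0.5, 1.0, 1.5]

# Dummy is 1 only when automatic_approved is "t"; "f" and NA both become 0.
dummy = [1 if label == "t" else 0 for label in labels]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # residual on the probability scale
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


b0, b1 = fit_logistic(xs, dummy)
print(b0, b1)  # a positive b1 means higher values raise the odds of approval
```

In R, the equivalent model would typically be fitted with glm(automatic_approved_dummy ~ ..., family = binomial), which also reports standard errors and p-values for each coefficient.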
Q4. Based on the analysis results above, describe the logic that the decision engine appears to use when it approves an application.
Q5. Develop the best classification model you can to reduce the company's manual review workload. Which classification model will you choose? What are the sensitivity and specificity of your model? Provide a table that contains the sensitivity and specificity of each model you tried.
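For reference, sensitivity is the share of actual positives that the model predicts as positive, and specificity is the share of actual negatives that it predicts as negative. The Python sketch below computes both from paired actual and predicted labels. It assumes that "approved" is the positive class (coded 1), which the assignment does not specify, and the label vectors are invented toy values. The function name sensitivity_specificity is hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels coded 1 = positive, 0 = negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 4 actual approvals and 6 actual rejections (invented values).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Both figures should be computed on held-out data rather than on the rows used to fit the model, so that the comparison table reflects out-of-sample performance.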
Q6. Given that your classification model is not perfect, the managers are concerned that a new decision engine based on it could accept applications that should be rejected, or reject applications that should be accepted. What do you suggest to address their concerns?
Guideline for Assignment 3:
Submit your answer and R-code used for the analysis to YSCEC. Please include your student number and name in the header of the document. Your answer sheet should not exceed two A4 pages. Do not forget to submit the signed declaration.