The tasks in this assignment involve visualizing data in R and drawing business insights from the results.
Part I. Employees Leaving (part1.csv)
Fields in the dataset include:
1. Satisfaction level
2. Last evaluation
3. Number of projects
4. Average monthly working hours
5. Time spent at the company (years)
6. Whether they have had a work accident
7. Whether they have had a promotion in the last 5 years
8. Departments (column sales)
9. Salary
10. Whether the employee has left
Questions:
Q1. Load the data to your R system. How many variables and observations are in the data?
Q2. Generate the descriptive statistics for each variable.
Q3. Visualize the relationships between variables. Can you find any interesting relationships?
Q4. Compare the employees who have left with those who have remained. Visualize the comparison with a histogram.
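One possible way to approach Q1 to Q4 in base R is sketched below. It is a starting point under stated assumptions, not a model answer: the column names satisfaction_level, average_monthly_hours and left are guesses at the headers in part1.csv, and left is assumed to be coded 1 for employees who left and 0 for those who remained. If the file is not found, the sketch builds a small made-up stand-in so that it still runs; anything computed from that stand-in says nothing about the real data.

```r
# Load the data. The fallback is a made-up stand-in (NOT the real data) so the
# sketch runs without the file. Column names are assumptions.
if (file.exists("part1.csv")) {
  hr <- read.csv("part1.csv")
} else {
  set.seed(1)
  hr <- data.frame(
    satisfaction_level    = runif(200),
    average_monthly_hours = round(runif(200, 100, 300)),
    left                  = rbinom(200, 1, 0.25)  # assumed coding: 1 = left
  )
}

# Q1: number of observations (rows) and variables (columns)
dim(hr)
str(hr)

# Q2: descriptive statistics for every variable
summary(hr)

# Q3: pairwise relationships among the numeric variables
num <- hr[sapply(hr, is.numeric)]
round(cor(num, use = "pairwise.complete.obs"), 2)
pairs(num)

# Q4: overlaid histograms of one variable for those who remained vs. left.
# Shared breaks keep the two histograms comparable.
breaks <- pretty(range(hr$satisfaction_level, na.rm = TRUE), n = 20)
blue <- rgb(0, 0, 1, 0.4)
red  <- rgb(1, 0, 0, 0.4)
hist(hr$satisfaction_level[hr$left == 0], breaks = breaks, col = blue,
     main = "Satisfaction level by status", xlab = "Satisfaction level")
hist(hr$satisfaction_level[hr$left == 1], breaks = breaks, col = red,
     add = TRUE)
legend("topleft", legend = c("Remained", "Left"), fill = c(blue, red))
```

The same overlay can be repeated for other variables, such as average monthly working hours, by swapping the column used in the two hist() calls and in the breaks calculation.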
Part II. FinTech Data (part2.csv)
The data comes from a FinTech company in Hong Kong that lends money to loan applicants. The company's decision engine automatically classifies each loan application as approve, reject, or manual review. When the engine classifies an application as manual review, reviewers make a subjective judgement and reclassify the case as approve or reject.
Fields in the dataset include:
id: loan application ID
loan_amount: requested amount in HKD
tenor: requested repayment period in months
age: the applicant's age at the time of the application
month_of_service: employment period of the current job
residential_status: rent, own, others
monthly_repayment: amount of monthly repayment for other loans
monthly_income: average monthly income during the last three months
self_employed: whether the applicant is self-employed
bankrupted: whether the applicant is or has been bankrupt
housewife: whether the applicant is a housewife
currently_employed: whether the applicant is employed full-time
channel: loan application channel
language: tc (traditional Chinese), en (English)
manual_review: whether the application was manually reviewed
approved: whether the application was approved
manual_approved: whether the application was manually approved
credit_score: the applicant's credit score (higher is better)
friends_facebook: the number of Facebook friends (the value NA indicates that the applicant declined to share his/her Facebook account with the company)
time_application: time of the day when the application was submitted
location: the location of the applicant when s/he submitted the application
default: whether the repayment is overdue as of June 2017
Questions:
Q1. Load the data to your R system. How many variables and observations are in the data?
Q2. How many are currently employed? How many are self-employed among the currently employed?
Q3. What is the average monthly income of the whole sample? What is the average monthly income of the currently employed?
Q4. Generate the histogram of loan_amount. Can you find any interesting pattern from the graph? Can you guess the reason why the graph has the shape?
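The questions above could be approached along the following lines in base R. This is a hedged sketch rather than a definitive solution: the field names come from the list above, but the coding of currently_employed and self_employed as 1/0 (or TRUE/FALSE) is an assumption that should be checked against the file. If part2.csv is not found, a small made-up stand-in is generated so that the code runs; its values are invented and carry no information about the real applications.

```r
# Load the data. The fallback is a made-up stand-in (NOT the real data) so the
# sketch runs without the file.
if (file.exists("part2.csv")) {
  loans <- read.csv("part2.csv")
} else {
  set.seed(2)
  loans <- data.frame(
    loan_amount        = sample(c(10000, 20000, 50000, 100000), 300,
                                replace = TRUE),
    monthly_income     = round(runif(300, 8000, 60000)),
    currently_employed = rbinom(300, 1, 0.8),  # assumed coding: 1 = yes
    self_employed      = rbinom(300, 1, 0.1)   # assumed coding: 1 = yes
  )
}

# Q1: number of observations (rows) and variables (columns)
dim(loans)

# Q2: currently employed applicants, and the self-employed among them
employed      <- which(loans$currently_employed == 1)
n_employed    <- length(employed)
n_self_in_emp <- sum(loans$self_employed[employed] == 1, na.rm = TRUE)
n_employed
n_self_in_emp

# Q3: average monthly income, whole sample vs. currently employed
mean(loans$monthly_income, na.rm = TRUE)
mean(loans$monthly_income[employed], na.rm = TRUE)

# Q4: histogram of the requested loan amount; many narrow bins make
# spikes at particular amounts easier to see
hist(loans$loan_amount, breaks = 50,
     main = "Requested loan amount", xlab = "Loan amount (HKD)")
```

Using which() to build the index drops rows where currently_employed is NA, so missing values do not leak into the subset used for Q2 and Q3.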