[SOLVED] Homework 2 (Deadline: Feb 16, 2022)

1. The Institute for Statistics Education at Statistics.com offers online courses in statistics and business analytics, and is seeking information that will help in packaging and sequencing courses. Consider the data in the file CourseTopics.csv. These data are for purchases of online statistics courses at Statistics.com. Each row represents the courses attended by a single customer. The firm wishes to assess alternative sequencings and bundling of courses. Use association rules to analyze these data (with support = 0.01, and confidence = 0.5), and interpret the first two of the resulting rules (ranked by the lift ratio).
rm(list=ls())
# Load and clean data.

course.df <- read.csv("Coursetopics.csv")
course.mat <- as.matrix(course.df)
head(course.mat, 10)
##       Intro DataMining Survey Cat.Data Regression Forecast DOE SW
##  [1,]     1          1      0        0          0        0   0  0
##  [2,]     0          0      1        0          0        0   0  0
##  [3,]     0          1      0        1          1        0   0  1
##  [4,]     1          0      0        0          0        0   0  0
##  [5,]     1          1      0        0          0        0   0  0
##  [6,]     0          1      0        0          0        0   0  0
##  [7,]     1          0      0        0          0        0   0  0
##  [8,]     0          0      0        1          0        1   1  1
##  [9,]     1          0      0        0          0        0   0  0
## [10,]     0          0      0        1          0        0   0  0

library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
##     abbreviate, write

# Recast incidence matrix into transactions list.
course.trans <- as(course.mat, "transactions")

# Generate rules with the highest lift.
options(digits = 2, scipen = 1)
rules <- apriori(course.trans, parameter = list(supp = 0.01, conf = 0.5, target = "rules"))
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1     10  rules TRUE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 3
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[8 item(s), 365 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [54 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

inspect(head(sort(rules, by = "lift"), 5))
##     lhs                                rhs          support confidence lift
## [1] {Intro, Regression, Forecast}   => {DataMining} 0.014   0.71       4.05
## [2] {Intro, Survey, DOE}            => {Cat.Data}   0.011   0.80       3.84
## [3] {Intro, DataMining, Cat.Data}   => {Regression} 0.016   0.75       3.66
## [4] {Intro, DataMining, Regression} => {Forecast}   0.014   0.50       3.65
## [5] {Intro, Survey, Cat.Data}       => {Forecast}   0.014   0.50       3.65
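
To make the three rule measures concrete, the sketch below recomputes support, confidence, and lift by hand in base R. It uses only the ten rows printed by head(course.mat, 10), so the numbers are illustrative and do not match the full 365-transaction results; the helper rule.metrics and the object course.sample are names made up for this example, not part of arules.

```r
# Illustrative only: the ten rows shown by head(course.mat, 10).
course.sample <- matrix(c(
  1, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 0, 0, 0,
  0, 1, 0, 1, 1, 0, 0, 1,
  1, 0, 0, 0, 0, 0, 0, 0,
  1, 1, 0, 0, 0, 0, 0, 0,
  0, 1, 0, 0, 0, 0, 0, 0,
  1, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 1, 0, 1, 1, 1,
  1, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 1, 0, 0, 0, 0
), ncol = 8, byrow = TRUE,
dimnames = list(NULL, c("Intro", "DataMining", "Survey", "Cat.Data",
                        "Regression", "Forecast", "DOE", "SW")))

# Hypothetical helper: measures for the rule lhs => rhs on a 0/1 incidence matrix.
rule.metrics <- function(mat, lhs, rhs) {
  has.lhs <- rowSums(mat[, lhs, drop = FALSE]) == length(lhs)  # rows containing every lhs item
  has.rhs <- rowSums(mat[, rhs, drop = FALSE]) == length(rhs)  # rows containing every rhs item
  support    <- mean(has.lhs & has.rhs)      # share of rows containing lhs and rhs together
  confidence <- support / mean(has.lhs)      # P(rhs | lhs)
  lift       <- confidence / mean(has.rhs)   # confidence relative to the base rate of rhs
  c(support = support, confidence = confidence, lift = lift)
}

m <- rule.metrics(course.sample, lhs = "Intro", rhs = "DataMining")
print(m)
```

A lift above 1 means customers who took the lhs courses took the rhs course more often than customers overall. For example, the reported lift of 4.05 for the first rule says those customers were about four times as likely as the average customer to take DataMining.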
2. The file UniversalBankFull.csv contains data on 5000 customers of Universal Bank. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer's response to the last personal loan campaign (Personal.Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign. In this question, we focus on two predictors, Online (whether or not the customer is an active user of online banking services) and CreditCard (whether the customer holds a credit card issued by the bank), and the outcome Personal.Loan. Partition the data into history (60%) and future (40%) sets. Consider the task of classifying a new customer (in the future set) who owns a bank credit card and is actively using online banking services. Using the naive Bayes classifier, find P(Personal.Loan = 1 | CreditCard = 1, Online = 1) and P(CreditCard = 0 | Personal.Loan = 1).
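
As a rough cross-check on the first probability, the naive Bayes posterior can be assembled by hand: multiply each class prior by the class-conditional probabilities of Online = 1 and CreditCard = 1, then normalize. The sketch below assumes the rounded priors and conditional probabilities that appear in the model printout in the solution, so the result is only approximate; the variable names are illustrative.

```r
# Rounded values taken from the naive Bayes printout (assumed here, not refit).
prior     <- c("0" = 0.902, "1" = 0.098)  # P(Personal.Loan = k)
p.online1 <- c("0" = 0.59,  "1" = 0.59)   # P(Online = 1 | Personal.Loan = k)
p.cc1     <- c("0" = 0.29,  "1" = 0.30)   # P(CreditCard = 1 | Personal.Loan = k)

# Naive Bayes: joint score per class, assuming the predictors are
# conditionally independent given the class.
joint <- prior * p.online1 * p.cc1

# Normalize so the two class posteriors sum to 1.
posterior <- joint / sum(joint)
print(posterior["1"])  # approximate P(Personal.Loan = 1 | CreditCard = 1, Online = 1)
```

With these rounded inputs the posterior comes out near 0.10. The second quantity, P(CreditCard = 0 | Personal.Loan = 1), needs no calculation: it is the CreditCard = 0 entry in the row for class 1 of the conditional probability table.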
rm(list=ls())
#load the data
bank.df <- read.csv("UniversalBankFull.csv")
# consider only the required variables
bank.df <- bank.df[ , c(13, 14, 10)]
bank.df$Online <- as.factor(bank.df$Online)
bank.df$CreditCard <- as.factor(bank.df$CreditCard)
bank.df$Personal.Loan <- as.factor(bank.df$Personal.Loan)
str(bank.df)
## 'data.frame': 5000 obs. of 3 variables:
##  $ Online       : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 2 1 2 1 ...
##  $ CreditCard   : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...
##  $ Personal.Loan: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...

# partition the data into history (60%) and future (40%) sets
# set the seed for the random number generator for reproducing the partition.
set.seed(12345)
ntotal <- length(bank.df$Personal.Loan)
# Sample row numbers randomly.
nhistory.index <- sort(sample(ntotal, round(ntotal * 0.6)))
history.df <- bank.df[nhistory.index, ]
future.df <- bank.df[-nhistory.index, ]

# check if variables in the dataset are correctly identified for their types
str(bank.df)
## 'data.frame': 5000 obs. of 3 variables:
##  $ Online       : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 2 1 2 1 ...
##  $ CreditCard   : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...
##  $ Personal.Loan: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
str(history.df)
## 'data.frame': 3000 obs. of 3 variables:
##  $ Online       : Factor w/ 2 levels "0","1": 1 2 1 2 1 2 1 1 1 2 ...
##  $ CreditCard   : Factor w/ 2 levels "0","1": 2 1 2 1 1 1 1 1 2 1 ...
##  $ Personal.Loan: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...

# Find P(Personal.Loan = 1|CreditCard=1, Online=1)
library(e1071)
loan.nb <- naiveBayes(Personal.Loan ~ Online + CreditCard, data = history.df)
## predict probabilities
loan.pred.prob <- predict(loan.nb, newdata = future.df, type = "raw")
loan.combined.df <- data.frame(actual = future.df$Personal.Loan, loan.pred.prob)
str(loan.combined.df)
## 'data.frame': 2000 obs. of 3 variables:
##  $ actual: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
##  $ X0    : num 0.903 0.903 0.903 0.903 0.904 ...
##  $ X1    : num 0.0974 0.0974 0.0974 0.0974 0.0963 ...
head(loan.combined.df[future.df$Online == 1 & future.df$CreditCard == 1, ])
##   actual X0 X1

# Find P(CreditCard=0|Personal.Loan=1)
loan.nb  # print the fitted model to read off the conditional probability tables
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
##     0     1
## 0.902 0.098
##
## Conditional probabilities:
##    Online
## Y      0    1
##   0 0.41 0.59
##   1 0.41 0.59
##
##    CreditCard
## Y      0    1
##   0 0.71 0.29
##   1 0.70 0.30

3. Draw 40000 random variables following the standard normal distribution. Plot the histogram.
rm(list=ls())
set.seed(100)
r40000 <- rnorm(40000)
hist(r40000, breaks = 200, probability = T, xlab = "value", ylab = "density")
[Figure: Histogram of r40000; x-axis: value, y-axis: density]

4. A human resource manager at a small university in the US has been considering a change to the structure of employee benefits (in terms of healthcare coverage and pension savings). To get an idea of how receptive the faculty, administrators, and staff members might be to the proposed changes, she has decided to conduct a survey in which n = 188 respondents could register their support or opposition. Use R and the data set benefits.csv to answer the following questions:
a. Find the 95% confidence interval estimate of p.
b. What sample size would you recommend to achieve a margin of error of 0.02, with confidence 0.99? (use p = 1/2)
rm(list=ls())
benefit.df <- read.csv('./benefits.csv') # read data
head(benefit.df)
##   agree
## 1     1
## 2     0
## 3     0
## 4     1
## 5     1
## 6     1
str(benefit.df)
## 'data.frame': 188 obs. of 1 variable:
##  $ agree: int 1 0 0 1 1 1 1 1 1 1 ...
t.test(benefit.df, conf.level = 0.95)
##  One Sample t-test
##
## data: benefit.df
## t = 18, df = 187, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.56 0.70
## sample estimates:
## mean of x
##      0.63

PME <- 0.02
conf.level <- 0.99
alpha <- 1 - conf.level
p <- 0.5  # planning value given in the question (p = 1/2)
z = qnorm(1 - alpha/2)
n = z^2*p*(1-p)/PME^2
ceiling(n)  # round up to a whole number of respondents
## [1] 4147
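
The interval in part (a) can also be approximated with the usual normal (Wald) formula for a proportion, p.hat ± z * sqrt(p.hat * (1 - p.hat) / n). The sketch below assumes the rounded sample proportion 0.63 from the t-test output and n = 188 from the question, so it is a sanity check and not a replacement for the computation on benefits.csv. It also repeats the part (b) sample-size formula, with rounding up as an explicit step.

```r
# Assumed inputs: rounded estimate from the output above and n from the question.
p.hat <- 0.63
n.obs <- 188

# Part (a): Wald 95% interval for a proportion.
z95 <- qnorm(1 - 0.05 / 2)
se  <- sqrt(p.hat * (1 - p.hat) / n.obs)
ci  <- c(lower = p.hat - z95 * se, upper = p.hat + z95 * se)
print(round(ci, 2))

# Part (b): n = z^2 * p * (1 - p) / E^2 with p = 1/2, E = 0.02, confidence 0.99.
z99 <- qnorm(1 - 0.01 / 2)
n.required <- ceiling(z99^2 * 0.5 * (1 - 0.5) / 0.02^2)
print(n.required)
```

Rounding up is the conservative choice, because a fractional respondent cannot be sampled and rounding down would leave the margin of error slightly above the target.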
