An organization has collected data on customer visits, transactions, operating system, and gender and desires to build a model to predict revenue. For the moment, the goal is to prepare the data for modeling. Analyze the data set in the following manner:

1. Either install Base R and RStudio on your computer or create an account at RStudio.cloud and then learn how to build R Markdown Notebooks to execute your code and organize your output into a readable report. For those working on Windows, you may also use Microsoft R Open.
2. Download this data set and then upload the data into RStudio Cloud. Each row represents a customer's interactions with the organization's web store. The first column is the number of visits of a customer, the second the number of transactions of that customer, the third column is the customer's operating system, and the fourth column is the customer's reported gender, while the last column is revenue, i.e., the total amount spent by that customer.
3. Calculate the following summative statistics: total transaction amount (revenue), mean number of visits, median revenue, standard deviation of revenue, most common gender. Exclude any cases where there is a missing value.
4. Create a bar/column chart of gender (x-axis) versus revenue (y-axis). Omit missing values, i.e., where gender is NA or missing.
5. What is the Pearson Moment of Correlation between number of visits and revenue? Comment on the correlation.
6. Which columns have missing data? How did you recognize them? How would you impute missing values?
7. Impute missing transaction and gender values. Use the mean for transaction (rounded to the nearest whole number) and the mode for gender.
8. Split the data set into two equally sized data sets where one can be used for training a model and the other for validation. Take every odd-numbered case and add it to the training data set, and every even-numbered case and add it to the validation data set, i.e., rows 1, 3, 5, 7, etc. are training data while rows 2, 4, 6, etc. are validation data.
9. Calculate the mean revenue for the training and the validation data sets and compare them. Comment on the difference.
10. For many data mining and machine learning tasks, there are packages in R. Use the sample() function to split the data set, so that 60% is used for training, 20% is used for testing, and another 20% is used for validation. To ensure that your code is reproducible and that everyone gets the same result, use the number 77654 as your seed for the random number generator. Use the code fragment below for reference:
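The assignment's own reference fragment is not included in this copy. As a stand-in, the following is a minimal base R sketch of how steps 3 and 5 through 10 might be approached. It is not the course's fragment: the data frame `df`, its column names (`visits`, `transactions`, `os`, `gender`, `revenue`), its values, and the helper `stat_mode()` are all made up for illustration, and the real data set may use different names.

```r
# Toy data frame: column names and values are hypothetical, not the real data set.
df <- data.frame(
  visits       = c(7, 20, 22, 24, 1, 13, 23, 14, 11, 24),
  transactions = c(0, 1, 1, NA, 0, 1, 2, 1, NA, 2),
  os           = c("Android", "iOS", "iOS", "iOS", "Android",
                   "Android", "iOS", "Android", "Android", "iOS"),
  gender       = c("Male", NA, "Female", "Female", "Male",
                   "Male", "Male", "Male", "Female", "Male"),
  revenue      = c(0, 577, 850, 1050, 0, 460, 1150, 828, 334, 1299),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value (base R has no such function).
stat_mode <- function(x) {
  counts <- table(x[!is.na(x)])
  names(counts)[which.max(counts)]
}

# Step 3: summary statistics on rows without any missing value.
complete       <- df[complete.cases(df), ]
total_revenue  <- sum(complete$revenue)
mean_visits    <- mean(complete$visits)
median_revenue <- median(complete$revenue)
sd_revenue     <- sd(complete$revenue)
common_gender  <- stat_mode(complete$gender)

# Step 5: Pearson correlation between visits and revenue.
visits_revenue_cor <- cor(df$visits, df$revenue,
                          method = "pearson", use = "complete.obs")

# Step 6: count missing values per column to see which columns are affected.
missing_counts <- colSums(is.na(df))

# Step 7: impute transactions with the rounded mean and gender with the mode.
imputed <- df
imputed$transactions[is.na(imputed$transactions)] <-
  round(mean(df$transactions, na.rm = TRUE))
imputed$gender[is.na(imputed$gender)] <- stat_mode(df$gender)

# Step 8: odd-numbered rows go to training, even-numbered rows to validation.
odd_rows <- seq(1, nrow(imputed), by = 2)
train    <- imputed[odd_rows, ]
valid    <- imputed[-odd_rows, ]

# Step 9: compare mean revenue between the two halves.
mean_revenue_train <- mean(train$revenue)
mean_revenue_valid <- mean(valid$revenue)

# Step 10: random 60/20/20 split with sample() and the seed given in the task.
set.seed(77654)
n         <- nrow(imputed)
shuffled  <- sample(n)                 # random permutation of the row indices
n_train   <- round(0.6 * n)
n_test    <- round(0.2 * n)
train_idx <- shuffled[1:n_train]
test_idx  <- shuffled[(n_train + 1):(n_train + n_test)]
valid_idx <- shuffled[(n_train + n_test + 1):n]

train_60 <- imputed[train_idx, ]
test_20  <- imputed[test_idx, ]
valid_20 <- imputed[valid_idx, ]
```

Shuffling the row indices once and slicing the permutation keeps the three subsets disjoint, which would not be guaranteed by three separate calls to `sample()`. Calling `set.seed()` immediately before `sample()` is what makes the split repeatable from run to run.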