- Download the breast-cancer-dataset.csv from your D2L Assignment 1 link. Complete the following tasks:
- Read the file in SAS and display the contents using the import and print procedures.
- Develop a decision tree-based classification model using the hpsplit procedure of SAS.
- Navigate the contents of Results View by clicking on HPSplit breastcancer-dataset, and then by selecting Model Assessment. Examine the confusion matrix, fit statistics, and variable importance.
- Using the confusion matrix, compute the following assessment metrics: accuracy, recall, and precision (see lecture for formulas).
Condition for marks: 3 points for accuracy, 1 point for precision, and 1 point for recall.
- Change the grow algorithm to gini and recompute the metrics from question 2. Which criterion builds the more accurate classifier, entropy or gini?
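The metric computations in the tasks above can be cross-checked outside SAS. The following is a minimal Python sketch, not part of the SAS workflow itself: the class name `ConfusionMatrix` and every cell count shown are hypothetical placeholders, to be replaced with the values reported under Model Assessment. It assumes the standard definitions of accuracy, precision, and recall, which should be verified against the lecture formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts of a binary confusion matrix (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    def accuracy(self) -> float:
        # Share of all observations that were classified correctly.
        total = self.tp + self.fp + self.fn + self.tn
        return self._ratio(self.tp + self.tn, total)

    def precision(self) -> float:
        # Share of predicted positives that are actually positive.
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        # Share of actual positives that were predicted positive.
        return self._ratio(self.tp, self.tp + self.fn)


# Placeholder counts only; substitute the counts from each HPSplit run.
runs = {
    "entropy": ConfusionMatrix(tp=80, fp=20, fn=10, tn=90),
    "gini": ConfusionMatrix(tp=78, fp=18, fn=12, tn=92),
}

for criterion, cm in runs.items():
    print(
        f"{criterion}: accuracy={cm.accuracy():.3f} "
        f"precision={cm.precision():.3f} recall={cm.recall():.3f}"
    )
```

Keeping one `ConfusionMatrix` per grow criterion makes the entropy and gini runs directly comparable on the same three metrics.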
Reference: UCI Machine Learning Repository [Breastcancer dataset]