- Build a decision tree by taking as input a maximum depth and by randomly splitting the dataset as 80/20 split i.e., 80% for training and 20% for testing. Provide the accuracy by averaging over 10 random 80/20 splits. Consider that particular tree which provides the best test accuracy as the desired one.
- What is the best possible depth limit to be used for your dataset. Provide a plot
explaining the same.
- Perform the pruning operation over the tree obtained in question 2 using a valid
statistical test for comparison.
- Print the final decision tree obtained from question 3 following the hierarchical levels of
data attributes as nodes of the tree.
- A brief report explaining the procedure and the results
Dataset:
- COVID-19 percentage rate (aggregated):
It contains the time-series percentage increase in COVID-19 cases worldwide. The attributes are date, confirmed cases, recovered cases, number of deaths, and increase rate of confirmed cases. Target attribute is the rate of increase in confirmed cases. Filename: PercentageIncreaseCOVIDWorldwide.csv
Reviews
There are no reviews yet.