GOAL: Become familiar with the use of the WEKA workbench to invoke several data mining schemes, working through the graphical interfaces provided with Weka: the Explorer for Part 1, the Explorer and the Command Line Interface for Part 2, and the Experimenter for Part 3. See the Weka home page for documentation.
1. Use the following learning schemes, with their default settings, to analyse the weather data (in weather.arff): ZeroR (majority class), OneR, NaiveBayesSimple, and J48.
For the test options, first choose Use training set, then choose Percentage split with the default 66% split. Report each model's percent error rate.
Which of these classifiers are you more likely to trust when determining whether to play, and why? What can you say about the accuracy obtained when testing on the training set versus testing on the held-out percentage split?
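To see what the ZeroR baseline in the exercise above actually computes, here is a small sketch. The play labels below are the 14 class values of the classic weather dataset (9 "yes", 5 "no"); ZeroR simply predicts the majority class, so its training-set error is the share of minority-class instances. (This is an illustrative re-implementation, not Weka's own code.)

```python
from collections import Counter

# Class labels of the 14 instances in the standard weather.arff data
# (9 "yes", 5 "no"); listed here only to illustrate the computation.
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

def zero_r_error(labels):
    """ZeroR predicts the majority class; its error rate is the
    fraction of instances belonging to any other class."""
    majority, count = Counter(labels).most_common(1)[0]
    return 1.0 - count / len(labels)

print(f"ZeroR training-set error: {zero_r_error(play):.1%}")  # 35.7%
```

Note that under Percentage split, Weka by default shuffles the data before splitting, so the split result depends on the random seed; only the Use training set figure is exactly reproducible by hand like this.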
2. Preparing and mining the data
A. Take the file genes-leukemia.csv and convert it to the Weka file genes-leukemia.arff. Convert the file either with a text editor such as emacs (the brute-force way) or with a Weka converter that turns a .csv file into an .arff file (the better way).
B. The target field is CLASS. Run J48 on genes-leukemia.arff with the Use training set option.
C. Use genes-leukemia.arff to create two subsets: genes-leukemia-train.arff, containing the first 38 samples (s1–s38), and genes-leukemia-test.arff, containing the remaining 34 samples (s39–s72).
D. Train J48 on genes-leukemia-train.arff and specify Use training set as the test option. What decision tree do you get? What is its accuracy?
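For step A, Weka's own converter (weka.core.converters.CSVLoader) is the recommended route. As a rough illustration of what the conversion involves, the sketch below turns CSV text into ARFF, treating every attribute as nominal with its value set taken from the data; the sample input is hypothetical, and real CSVLoader infers attribute types more carefully. Step C is then just row slicing on the @data section (e.g. first 38 rows vs. the rest).

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Minimal CSV -> ARFF sketch: every column becomes a nominal
    attribute whose value set is collected from the data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        values = sorted({r[i] for r in data})
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(r) for r in data]
    return "\n".join(lines)

# Hypothetical two-row sample in the shape of genes-leukemia.csv
sample = "SAMPLE,CLASS\ns1,ALL\ns2,AML\n"
print(csv_to_arff(sample, "genes-leukemia"))
```

Running this prints an @relation line, one @attribute line per column, and the @data rows, which is the structure the brute-force emacs edit has to produce by hand.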
E. Now evaluate the same classifier with genes-leukemia-test.arff as the test set (the Supplied test set option). What decision tree do you get, and how does its accuracy compare to the one in the previous question?
F. Now remove the field Source from the classifier's input (click the checkmark next to Source, then click Remove). Remove Source from both the training set and the test set. Repeat steps D and E.
What do you observe? Does the accuracy on the test set improve, and if so, why do you think it does?
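Step F, done in the Explorer's Preprocess panel, amounts to dropping one column from both files. A minimal sketch of that operation (with hypothetical column names and values) is below; the comment states the intuition the question is probing, namely that an identifier-like field such as Source lets the tree split on values that never generalize to unseen samples.

```python
def remove_attribute(header, rows, name):
    """Drop one attribute (column) from the header and from every data
    row -- the same effect as removing an attribute in Weka's
    Preprocess panel. Dropping an identifier-like field such as
    Source prevents the tree from splitting on values specific to the
    training samples."""
    idx = header.index(name)
    new_header = header[:idx] + header[idx + 1:]
    new_rows = [r[:idx] + r[idx + 1:] for r in rows]
    return new_header, new_rows

# Hypothetical columns and rows standing in for genes-leukemia data
header = ["Source", "Gene1", "CLASS"]
rows = [["lab1", "2.3", "ALL"], ["lab2", "1.1", "AML"]]
h, r = remove_attribute(header, rows, "Source")
print(h)  # ['Gene1', 'CLASS']
print(r)  # [['2.3', 'ALL'], ['1.1', 'AML']]
```

Remember to apply the removal to both the training and the test file so their attribute lists stay compatible.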
G. Which classifier gives the highest accuracy on the test set?