COMP60711: Data Engineering – Part 2 Lab1

GOAL: Become familiar with the WEKA workbench by using it to invoke several data mining schemes. Use the graphical interfaces provided with Weka: the Explorer for Part 1, the Explorer and the Command Line Interface for Part 2, and the Experimenter for Part 3. See the Weka home page for documentation.

1. Use the following learning schemes, with their default settings, to analyse the weather data (in weather.arff): ZeroR (majority class), OneR, NaiveBayesSimple, and J48.

For test options, first choose “Use training set”, then choose “Percentage split” with the default 66% split. Report each model's percentage error rate.

Which of these classifiers are you most likely to trust when deciding whether to play? Why? What can you say about accuracy when testing on the training set compared with testing on the held-out part of a percentage split?
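To make the two test options concrete, here is a minimal sketch of how a ZeroR-style majority-class predictor would be scored under each of them. The labels are hypothetical and only illustrate the arithmetic; the helper names `zero_r` and `error_rate` are made up for this example and are not Weka functions. Weka may also shuffle the data before a percentage split, so its reported numbers can differ from a plain "first 66%" cut like this one.

```python
from collections import Counter

def zero_r(train_labels):
    """Return the majority class of the training labels (what ZeroR predicts)."""
    return Counter(train_labels).most_common(1)[0][0]

def error_rate(predicted, actual_labels):
    """Percentage of labels that differ from the single predicted class."""
    wrong = sum(1 for label in actual_labels if label != predicted)
    return 100.0 * wrong / len(actual_labels)

# Hypothetical class labels, for illustration only.
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

# "Use training set": the model is built and tested on the same data.
train_error = error_rate(zero_r(labels), labels)

# "Percentage split": build on the first 66%, test on the remainder.
cut = round(len(labels) * 0.66)
split_error = error_rate(zero_r(labels[:cut]), labels[cut:])

print(f"training-set error: {train_error:.1f}%")
print(f"percentage-split error: {split_error:.1f}%")
```

With so few instances, the held-out part is tiny, so the split estimate can swing a lot depending on which rows land in it.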

2. Preparing and mining the data

A. Take the file genes-leukemia.csv and convert it to the Weka file genes-leukemia.arff.
Convert the file either with a text editor such as emacs (the brute-force way) or find a Weka command that converts a .csv file to .arff (a better way).
B. The target field is CLASS. Use J48 on genes-leukemia.arff with the “Use training set” option.
C. Use genes-leukemia.arff to create two subsets:
  genes-leukemia-train.arff, with the first 38 samples (s1 … s38) of the data;
  genes-leukemia-test.arff, with the remaining 34 samples (s39 … s72).
D. Train J48 on genes-leukemia-train.arff and specify “Use training set” as the test option. What decision tree do you get? What is its accuracy?
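The conversion and the split in the steps above can be sketched in plain Python to show what an ARFF file contains. This is only an illustration under simplifying assumptions: every column is declared nominal, values are assumed to contain no commas or spaces, and the toy CSV, the relation name, and the function names `csv_to_arff` and `split_arff` are hypothetical. For the real file, Weka's own converter (for example the `weka.core.converters.CSVLoader` class) is the more reliable route, since it also detects numeric columns.

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) to ARFF text.

    Simplification: each attribute is declared nominal, listing the
    distinct values seen in its column.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        values = sorted({row[i] for row in data})
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"

def split_arff(arff_text, n_first):
    """Split ARFF text into (first n_first data rows, remaining rows).

    Both halves keep the same header, so the attribute declarations
    of the two files stay identical.
    """
    lines = arff_text.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.strip().lower() == "@data") + 1
    header = lines[:start]
    data = [line for line in lines[start:] if line.strip()]
    first = "\n".join(header + data[:n_first]) + "\n"
    rest = "\n".join(header + data[n_first:]) + "\n"
    return first, rest

# Hypothetical three-sample stand-in for genes-leukemia.csv.
csv_text = "ID,G1,CLASS\ns1,high,ALL\ns2,low,AML\ns3,high,ALL\n"
arff = csv_to_arff(csv_text, "toy")
train, test = split_arff(arff, 2)  # the lab itself would use 38
print(train)
print(test)
```

Splitting after the conversion, rather than converting two CSV halves separately, keeps the nominal value lists the same in both files, which matters when one file is later supplied as the test set for a model trained on the other.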

E. Now specify genes-leukemia-test.arff as the test set.
What decision tree do you get and how does its accuracy compare to the one in the previous question?

F. Now remove the field “Source” from the data (tick the checkbox next to Source and click Remove). Remove “Source” from both the training set and the test set, then repeat steps D and E.
What do you observe? Does the accuracy on the test set improve and if so, why do you think it does?

G. Which classifier gives the highest accuracy on the test set?
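Step F depends on the same attribute being removed from both files, so that the training and test sets still describe identical columns. A minimal sketch of that idea, outside Weka, is shown here; the toy header, the rows, and the helper `drop_column` are all hypothetical and stand in for the much wider real data.

```python
def drop_column(header, rows, name):
    """Return copies of header and rows without the column called name."""
    idx = header.index(name)  # raises ValueError if the column is absent

    def keep(seq):
        return seq[:idx] + seq[idx + 1:]

    return keep(header), [keep(row) for row in rows]

# Hypothetical toy data; the real files have many more gene columns.
header = ["ID", "Source", "G1", "CLASS"]
train_rows = [["s1", "labA", "high", "ALL"], ["s2", "labB", "low", "AML"]]
test_rows = [["s3", "labA", "low", "AML"]]

# Apply the identical removal to both sets.
train_header, train_clean = drop_column(header, train_rows, "Source")
test_header, test_clean = drop_column(header, test_rows, "Source")

print(train_header, train_clean)
print(test_header, test_clean)
```

Removing the column from only one of the two sets would leave them with mismatched attributes, and the trained tree could no longer be evaluated on the test file.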
