[SOLVED] algorithm network Coursework Assignment 2

Coursework Assignment 2

The Diabetes data set (provided in .arff format and available on Blackboard) contains medical information about patients. The task is to predict whether or not each patient has diabetes (Diabetes: Yes or No).

Each instance represents an individual patient and their medical attributes, along with the diabetes classification.

Number of Instances: 768

Number of Attributes: 9

1. Pregnancies: Number of pregnancies
2. PG Concentration: Plasma glucose at 2 hours in an oral glucose tolerance test
3. Diastolic BP: Diastolic Blood Pressure (mm Hg)
4. Tri Fold Thick: Triceps Skin Fold Thickness (mm)
5. Serum Ins: 2-Hour Serum Insulin (mu U/ml)
6. BMI: Body Mass Index (weight in kg / (height in m)^2)
7. DP Function: Diabetes Pedigree Function
8. Age: Age (years)
9. Diabetes: Whether or not the person has diabetes
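
To make the file layout concrete, here is a minimal sketch of how an ARFF file with these nine attributes might look and how it could be read without Weka. The attribute names follow the list above, but the relation name, the exact spelling used in the real file, and the two data rows are assumptions invented for illustration; they are not taken from the Blackboard file.

```python
import shlex

# Hypothetical ARFF content: the two data rows are made-up illustrative values.
ARFF_TEXT = """\
@relation diabetes
@attribute Pregnancies numeric
@attribute 'PG Concentration' numeric
@attribute 'Diastolic BP' numeric
@attribute 'Tri Fold Thick' numeric
@attribute 'Serum Ins' numeric
@attribute BMI numeric
@attribute 'DP Function' numeric
@attribute Age numeric
@attribute Diabetes {Yes,No}
@data
2,120,70,30,100,28.5,0.45,35,No
5,165,80,35,200,36.2,0.80,52,Yes
"""


def parse_arff(text):
    """Parse a simple dense ARFF file whose last attribute is the nominal class."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            values = line.split(",")
            # Every attribute except the class is numeric in this sketch.
            row = {n: float(v) for n, v in zip(names[:-1], values[:-1])}
            row[names[-1]] = values[-1]
            rows.append(row)
        elif line.lower().startswith("@attribute"):
            # shlex handles quoted attribute names that contain spaces.
            names.append(shlex.split(line)[1])
        elif line.lower() == "@data":
            in_data = True
    return names, rows


names, rows = parse_arff(ARFF_TEXT)
print(len(names), "attributes;", len(rows), "instances")
```

In Weka itself the file is simply opened from the Explorer's Preprocess tab; the sketch only shows how the header declares the attributes and how each data row lines up with them.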

You should use the Weka data mining package, which is installed on the university computers and is also available to download from: http://www.cs.waikato.ac.nz/~ml/weka/

You should hand in a report covering the following:

1. Select a suitable tree-building algorithm and build a model. Describe the validation method you are using (the data split into training and test sets). Interpret the output (the accuracy rates and other metrics, which attributes were used to make predictions, and how many nodes and leaves you obtained).
2. Give a detailed technical description of the classification model (which algorithm is used, the tree induction method, and which attribute selection criterion is used and how). Include a diagram showing the structure of the model that you built.
3. Vary the following parameters of the algorithm and report the changes in tree structure and accuracy rates:
   a. Set the REP (Reduced Error Pruning) parameter to TRUE. Explain the meaning of this operation. Report and discuss any change in the model structure and accuracy.
   b. Change the confidence factor to 15%. Report and discuss any impact.
   c. Set the unpruned parameter to TRUE. Report and discuss the impact. Discuss the pruning method used for this algorithm.
4. Use two other models of your choice (for example, neural networks or SVM) to predict the diabetes class. Compare the results and discuss possible reasons for better or worse performance.
5. Show a confusion matrix for the model and interpret it. Show a ROC curve and a lift chart and interpret them.
6. Convert a subtree path of the decision tree into a set of rules along the following attributes: Plasma, Mass, Age, Plasma, Pedigree, Class = Yes.
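
The final task, turning a root-to-leaf path into a rule, can be sketched as follows. Each test on the path becomes one condition, and the conditions are joined with AND. The attribute order follows the path named above, but every threshold and comparison direction below is a placeholder assumption; the real values must be read from the tree that Weka builds.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}

# Hypothetical path: thresholds are placeholders, not values from a real tree.
PATH = [
    ("Plasma", ">", 127.0),
    ("Mass", ">", 29.9),
    ("Age", ">", 30.0),
    ("Plasma", "<=", 157.0),
    ("Pedigree", ">", 0.4),
]
LEAF_CLASS = "Yes"


def path_to_rule(path, leaf_class):
    """Join the tests along a root-to-leaf path into one IF-THEN rule."""
    conditions = " AND ".join(f"{attr} {op} {thr}" for attr, op, thr in path)
    return f"IF {conditions} THEN Class = {leaf_class}"


def rule_fires(path, instance):
    """Return True when the instance satisfies every test on the path."""
    return all(OPS[op](instance[attr], thr) for attr, op, thr in path)


rule = path_to_rule(PATH, LEAF_CLASS)
print(rule)

# A made-up patient record, used only to show how the rule is applied.
patient = {"Plasma": 140.0, "Mass": 33.0, "Age": 45.0, "Pedigree": 0.6}
print(rule_fires(PATH, patient))
```

Because the same attribute can be tested more than once on a path (Plasma appears twice here), the two conditions can be merged into a single interval when the rule is written up.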
