[Solved] NCTU-CS Assignment #2 - Decision Tree & Random Forest & KNN


Objective

  1. Data Input
  2. Data Preprocessing
    • Transform data format and shape so your model can process them.
    • Shuffle the data.
    • Transform the label format so you can perform the two required tasks described in the Data section below.
  3. Model Construction
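
The preprocessing steps in item 2 could be sketched roughly as follows. This is a minimal illustration in plain Python, assuming each record is a dict of raw strings keyed by column name. The helper names shuffle_rows, is_numeric, and encode are hypothetical and not part of the assignment. Numeric columns are min-max scaled and every other column is one-hot encoded, which is only one of several reasonable ways to treat the two feature types differently.

```python
import random


def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of the rows (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


def is_numeric(value):
    """True if the raw string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def encode(rows, label_column):
    """One-hot encode categorical columns and min-max scale numeric ones.

    Returns (feature_names, X, y), where X is a list of float vectors
    and y holds the untouched label values.
    """
    columns = [c for c in rows[0] if c != label_column]
    names, encoders = [], []
    for col in columns:
        values = [r[col] for r in rows]
        if all(is_numeric(v) for v in values):
            nums = [float(v) for v in values]
            lo, hi = min(nums), max(nums)
            span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
            names.append(col)
            encoders.append(lambda r, c=col, lo=lo, span=span: (float(r[c]) - lo) / span)
        else:
            # One 0/1 indicator per observed category.
            for cat in sorted(set(values)):
                names.append(f"{col}={cat}")
                encoders.append(lambda r, c=col, cat=cat: 1.0 if r[c] == cat else 0.0)
    X = [[enc(r) for enc in encoders] for r in rows]
    y = [r[label_column] for r in rows]
    return names, X, y
```

In this sketch the scaling ranges are computed from whatever rows are passed in. When validating, it is usually safer to compute them from the training subset only and reuse them on the validation subset.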

For every model, you need to perform the two tasks described in the Data section below.

    • The data consists of both categorical and numerical features, and you have to treat them differently.
    • Three models must be constructed: Decision Tree, Random Forest, and K-Nearest Neighbor.
      1. For the Decision Tree model, you may use the following ID3 algorithm pseudocode. 10%
         ID3 (Examples, Target_Attribute, Attributes)
             Create a root node for the tree
             If all examples are positive, Return the single-node tree Root, with label = +.
             If all examples are negative, Return the single-node tree Root, with label = -.
             If the number of predicting attributes is empty, then Return the single-node tree Root,
             with label = most common value of the target attribute in the examples.
             Otherwise Begin
                 A ← The Attribute that best classifies examples.
                 Decision Tree attribute for Root = A.
                 For each possible value, vi, of A,
                     Add a new tree branch below Root, corresponding to the test A = vi.
                     Let Examples(vi) be the subset of examples that have the value vi for A
                     If Examples(vi) is empty
                         Then below this new branch add a leaf node with label = most common target value in the examples
                     Else below this new branch add the subtree ID3 (Examples(vi), Target_Attribute, Attributes - {A})
             End
             Return Root

Note that you may implement any decision tree algorithm; you are not restricted to ID3. However, you must state which algorithm you used.
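
As one possible reading of the ID3 pseudocode, the sketch below builds a tree over categorical attributes using entropy and information gain as the "best classifies" criterion. It assumes rows are dicts of attribute values and that labels may be any hashable values rather than only + and -. The function names and the dict-based node layout are illustrative choices, not something the assignment prescribes, and numeric attributes would need to be binned or split by threshold first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the examples on attr."""
    total = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attributes):
    """Return a leaf label, or a node dict with keys attr, default, children."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the examples are pure or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return majority
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in set(r[best] for r in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = id3([r for r, _ in pairs], [l for _, l in pairs], remaining)
    # "default" plays the role of the empty-Examples(vi) leaf: values never
    # seen under this node fall back to the most common label here.
    return {"attr": best, "default": majority, "children": children}


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree
```

Because each node records the attribute it tests, the path followed by predict can also be printed to explain why a given sample received its label.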

      2. For the Random Forest model, you must construct multiple Decision Tree models from randomly selected data (from the training subset) and perform voting for prediction. 10%
        • For the data selection, all of the following methods are acceptable. Choose one to implement.
          • Randomly select features
          • Randomly select samples
          • Both
        • The number of trees must be greater than or equal to 3. You need to try at least 3 different numbers of trees and compare the results.
        • Understand the difference between K-fold cross-validation and Random Forest. If you confuse one with the other, you won't receive the score for this part.
      3. For the KNN model, you need to modify the categorical features so that distances can be calculated. You may also need to normalize every feature for your KNN model to work as expected. 10%
        • You need to try at least 3 different K values and compare their results.
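
For the Random Forest item above, the sketch below shows the "both" option: each tree is trained on a bootstrap sample of the training rows and a random subset of features, and predictions are combined by majority vote. To stay short and self-contained it uses a one-level stump as a stand-in base learner, which is an assumption of this example. A real submission would plug in its own decision tree. All function names here are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, features):
    """Stand-in base learner: a one-level tree on the single best feature.

    A full decision tree would normally take its place.
    """
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for f in features:
        table = {}
        for r, l in zip(rows, labels):
            table.setdefault(r[f], Counter())[l] += 1
        # Training examples classified correctly if each value predicts its majority label.
        correct = sum(c.most_common(1)[0][1] for c in table.values())
        if best is None or correct > best[0]:
            best = (correct, f, {v: c.most_common(1)[0][0] for v, c in table.items()})
    _, feature, mapping = best
    return lambda row: mapping.get(row[feature], default)


def train_forest(rows, labels, n_trees, n_features, seed=0):
    """Train n_trees base learners on random samples and random feature subsets."""
    rng = random.Random(seed)
    all_features = sorted(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature subset for this tree.
        feats = rng.sample(all_features, n_features)
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def forest_predict(forest, row):
    """Majority vote over the predictions of every tree."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]
```

This also illustrates the distinction the assignment warns about. K-fold cross-validation partitions the data into disjoint folds to estimate how well one model generalizes, whereas the forest samples with replacement to train many models whose votes form a single prediction. Calling train_forest with several values of n_trees gives the comparison across forest sizes.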
  4. Validation 5%
    • Two validation methods need to be implemented.
      1. Holdout validation with the ratio
      2. K-fold cross-validation with
          • Show the prediction and reasoning for one sample in the validation set. 10%
        • Random Forest
          • Describe the difference between boosting and bagging. 10%
        • KNN
          • Pick 2 features, then draw and describe the KNN decision boundaries. 10%
            • You can either retrain the model on the 2 chosen features or hold every other feature at a fixed value.
          • Show the prediction and reasoning for one sample in the validation set. 10%
      2. Finish during class 20%
        • Submit your report and source code to the newE3 system before class ends.
        • Finish time will be determined by the submission time.
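
The two validation methods named above, holdout and K-fold cross-validation, could be implemented along the following lines. The text here does not preserve the holdout ratio or the number of folds, so this sketch leaves both as required parameters (train_fraction and k) rather than guessing them. The function names are hypothetical, and the splits are expressed as index lists so they work with any row representation.

```python
import random


def holdout_split(n_samples, train_fraction, seed=0):
    """Shuffle the indices once and cut them into (train, validation)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, validation) index lists, one pair per fold.

    Every sample lands in exactly one validation fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With K-fold, a model is trained once per fold and the per-fold accuracies are typically averaged to give one validation score.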

      Data: Student Performance Data Set

      • Data can be downloaded here:
      • Please note that the last column (G3) is the label.
      • Both provided datasets (Mathematics and Portuguese language) are acceptable. Choose one to analyze.
      • Following this paper, you will have to do 2 classification tasks:
        • Binary classification pass if
      • Attributes:
          1. school - student's school (binary: GP - Gabriel Pereira or MS - Mousinho da Silveira)
          2. sex - student's sex (binary: F - female or M - male)
          3. age - student's age (numeric: from 15 to 22)
          4. address - student's home address type (binary: U - urban or R - rural)
          5. famsize - family size (binary: LE3 - less or equal to 3 or GT3 - greater than 3)
          6. Pstatus - parents' cohabitation status (binary: T - living together or A - apart)
          7. Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
          8. Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
          9. Mjob - mother's job (nominal: teacher, health care related, civil services (e.g. administrative or police), at_home or other)
          10. Fjob - father's job (nominal: teacher, health care related, civil services (e.g. administrative or police), at_home or other)
          11. reason - reason to choose this school (nominal: close to home, school reputation, course preference or other)
          12. guardian - student's guardian (nominal: mother, father or other)
          13. traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
          14. studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
          15. failures - number of past class failures (numeric: n if 1<=n<3, else 4)
          16. schoolsup - extra educational support (binary: yes or no)
          17. famsup - family educational support (binary: yes or no)
          18. paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
          19. activities - extra-curricular activities (binary: yes or no)
          20. nursery - attended nursery school (binary: yes or no)
          21. higher - wants to take higher education (binary: yes or no)
          22. internet - Internet access at home (binary: yes or no)
          23. romantic - with a romantic relationship (binary: yes or no)
          24. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
          25. freetime - free time after school (numeric: from 1 - very low to 5 - very high)
          26. goout - going out with friends (numeric: from 1 - very low to 5 - very high)
          27. Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
          28. Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
          29. health - current health status (numeric: from 1 - very bad to 5 - very good)
          30. absences - number of school absences (numeric: from 0 to 93)
          31. G1 - first period grade (numeric: from 0 to 20)
          32. G2 - second period grade (numeric: from 0 to 20)
          33. G3 - final grade (numeric: from 0 to 20, output target)
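
Because the attributes above mix binary, nominal, and numeric types, a plain Euclidean distance does not apply directly to KNN. The sketch below shows one way to handle this: numeric features are min-max scaled using ranges from the training rows, and each categorical feature contributes 0 when the values match and 1 when they differ. This particular distance, the function names, and the assumption that numeric values are already parsed into numbers are choices made for this example only. One-hot encoding followed by Euclidean distance would be an equally valid alternative.

```python
import math
from collections import Counter


def min_max_ranges(rows, numeric):
    """Per-feature (min, span) taken from the training rows, for scaling to [0, 1]."""
    ranges = {}
    for f in numeric:
        vals = [r[f] for r in rows]
        lo, hi = min(vals), max(vals)
        ranges[f] = (lo, (hi - lo) or 1.0)  # guard against constant columns
    return ranges


def mixed_distance(a, b, numeric, categorical, ranges):
    """Euclidean-style distance over scaled numeric and 0/1 categorical terms."""
    total = 0.0
    for f in numeric:
        lo, span = ranges[f]
        total += ((a[f] - lo) / span - (b[f] - lo) / span) ** 2
    for f in categorical:
        total += 0.0 if a[f] == b[f] else 1.0
    return math.sqrt(total)


def knn_predict(train_rows, train_labels, query, k, numeric, categorical):
    """Return (majority label, the k nearest (row, label) pairs)."""
    ranges = min_max_ranges(train_rows, numeric)
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: mixed_distance(pair[0], query, numeric, categorical, ranges),
    )
    neighbors = ranked[:k]
    vote = Counter(label for _, label in neighbors).most_common(1)[0][0]
    return vote, neighbors
```

Returning the neighbors alongside the vote makes it straightforward to show the reasoning behind a single prediction, and calling knn_predict with different k values supports the required comparison.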
