CS771A Assignment 2

The Boys
Rajarshi Dutta (200762), Udvas Basak (201056), Shivam Pandey (200938)
1 Solution 1
1.1 Theory

Figure 1: Illustration of a working ID3 algorithm
We have used the Iterative Dichotomiser 3 (ID3) decision tree algorithm, a widely used algorithm for supervised classification tasks. The algorithm recursively partitions the training data on its attributes until the splits are as pure as possible (the resulting pure nodes are the leaves). ID3 builds the tree top-down with a greedy strategy: at each node, the attribute that produces the best split is selected.
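As a minimal, generic sketch of this greedy top-down construction (the example representation and helper names here are illustrative and not taken from our submit.py):

import math
from collections import Counter, defaultdict

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(examples, attributes):
    # Greedy step: pick the attribute with the highest information gain.
    labels = [y for _, y in examples]
    base = entropy(labels)

    def gain(attr):
        groups = defaultdict(list)
        for x, y in examples:
            groups[x[attr]].append(y)
        weighted = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
        return base - weighted

    return max(attributes, key=gain)

def build_tree(examples, attributes):
    # Recursively partition (feature_dict, label) pairs until nodes are pure.
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: (majority) label
    attr = best_attribute(examples, attributes)        # best split at this node
    children = defaultdict(list)
    for x, y in examples:
        children[x[attr]].append((x, y))
    rest = [a for a in attributes if a != attr]
    return (attr, {value: build_tree(subset, rest) for value, subset in children.items()})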
1.2 Our approach
The decision tree constructed to solve the given Wordle-Solvr problem consists of the following components:
1.2.1 process_node
• In the process_node function, every non-root node of the constructed ID3 tree is passed to the get_entropy function, which returns the index into the vocabulary that minimizes the weighted entropy after splitting, i.e. maximizes the information gain. This index identifies the most appropriate query to ask at that step.
• All words are extracted from the all_words list, and the reveal function computes the mask produced by playing the query against each candidate word. Each mask is stored as a key in split_dict, whose value is an array of the indices of the words that produce that mask when queried; a sketch of this partitioning step follows this list.
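The reveal rule shown below is a simplified exact/elsewhere/absent mask and may differ from the one provided with the assignment; the helper name split_by_mask is an illustration, not the name used in submit.py.

from collections import defaultdict

def reveal(query, secret):
    # Simplified Wordle-style mask: 2 = correct letter in the correct position,
    # 1 = letter occurs elsewhere in the secret word, 0 = letter absent.
    # (Duplicate-letter edge cases are ignored; this is only an illustration.)
    return tuple(2 if q == s else (1 if q in secret else 0)
                 for q, s in zip(query, secret))

def split_by_mask(query, candidate_indices, all_words):
    # split_dict maps each mask to the indices of the candidates producing that mask.
    split_dict = defaultdict(list)
    for i in candidate_indices:
        split_dict[reveal(query, all_words[i])].append(i)
    return split_dict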
1.2.2 get_entropy
• The get_entropy function iterates over all words remaining at the node (the possible candidate queries) and passes the array to the calc_entropy function, which returns the entropy of the current node.
• A second dictionary is maintained that maps each mask to the number of candidate words which produce that mask when the query is played against them. These counts are used to compute the weighted sum of the child entropies after splitting the node.
• Subtracting this weighted sum of child entropies from the initial entropy of the parent node gives the information gain, which is used to choose the best split among all words (Eq. 1; see the sketch after this list).
Gain(S, A) = E(S) − Σ_{v ∈ V} (|S_v| / |S|) · E(S_v) (1)
where V = set of possible values of attribute A,
S = set of examples X,
S_v = subset of S for which X_A = v
• The information gain corresponds to the mutual information between the attribute and the class labels of S.
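Putting the two previous points together, the selection loop might look as follows. This sketch reuses split_by_mask from the earlier sketch and assumes every remaining word is equally likely, so a set of k candidates has entropy log2(k); the actual get_entropy/calc_entropy in submit.py may weight words differently.

import math

def information_gain(parent_entropy, split_dict, node_size):
    # Weighted sum of child entropies after the split, subtracted from the
    # parent entropy, gives the information gain of this query (Eq. 1).
    weighted = sum(len(bucket) / node_size * math.log2(len(bucket))
                   for bucket in split_dict.values())
    return parent_entropy - weighted

def best_query(candidate_indices, all_words):
    # Return the index of the word whose query maximizes information gain
    # (equivalently, minimizes the weighted entropy after splitting).
    parent_entropy = math.log2(len(candidate_indices))
    return max(candidate_indices,
               key=lambda q: information_gain(parent_entropy,
                                              split_by_mask(all_words[q],
                                                            candidate_indices,
                                                            all_words),
                                              len(candidate_indices)))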
1.2.3 calc_entropy
• This function simply calculates the Shannon entropy of a node from the formula below, where p_c is the probability (fraction) of class label c among the examples assigned to the node by the corresponding split on their attribute.
E(S) = −Σ_c p_c · log2(p_c) (2)
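With the sum over class proportions written out explicitly, a small stand-alone version of this computation could be (names are illustrative):

import math
from collections import Counter

def shannon_entropy(labels):
    # E = -sum_c p_c * log2(p_c), where p_c is the fraction of examples
    # in the node carrying class label c.
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

# Example: a node split evenly between two labels has entropy 1 bit.
assert shannon_entropy(["a", "a", "b", "b"]) == 1.0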
2 Solution 2
The entire algorithm has been implemented in the submit.py file.
