Classification

What is classification?
• Examples:


• Analyze data to learn which loan applicants are risky and which are safe
• Analyze breast cancer data to predict which of three specific treatments is best
• The classification task is aimed at predicting the class (category or label) of data
• It is a prediction problem
• Classes are discrete values that are not characterized by an order relationship

Classification: a two-step process
Learning (training) step
• Each tuple/sample is assumed to belong to a predefined class, as determined by the
class label attribute
• The set of tuples used for model construction is the training set
• The model is represented as classification rules, decision trees, or mathematical formulae
Classification step
• Estimate accuracy of the model
• The known label of each test sample is compared with the classified result from the model
• Accuracy rate is the percentage of test set samples that are correctly classified by the model
• Test set is independent of training set (otherwise overfitting)
• If the accuracy is acceptable, use the model to classify new data

Learning step: example
[Figure: the training data (name, rank, years, tenured) is fed to a classification algorithm, which produces a classifier (model), e.g. the rule below]
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

Classification step: example
[Figure: the classifier is first applied to the testing data to estimate its accuracy, and then to unseen data, e.g. the tuple (Jeff, Professor, 4), which is classified as tenured = 'yes']

Decision tree (1)
• Flow-chart like structure
• Each internal node denotes a test on an attribute
• Each branch represents an outcome of the test
• Each leaf node represents a label/class
• Given a tuple X, its attribute values are tested against the decision tree, visiting a path in the tree
• The leaf node reached holds the predicted class for X
• Decision trees can easily be converted into classification rules
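As an illustration of the last point, the sketch below walks every root-to-leaf path of a tree and emits one IF-THEN rule per leaf. It is a minimal Python sketch, not code from the slides; the nested-dict tree representation is a hypothetical choice made only for this example.

def tree_to_rules(node, conditions=()):
    """Yield one rule string per leaf: IF <conditions> THEN class = <label>."""
    if not isinstance(node, dict):            # a leaf holds the class label
        cond = " AND ".join(conditions) or "TRUE"
        yield f"IF {cond} THEN class = '{node}'"
        return
    (attribute, branches), = node.items()     # internal node: one attribute test
    for value, child in branches.items():
        yield from tree_to_rules(child, conditions + (f"{attribute} = '{value}'",))

# Example: the buys_computer tree with age at the root (as in the slides).
tree = {"age": {
    "<=30":   {"student": {"no": "no", "yes": "yes"}},
    "31..40": "yes",
    ">40":    {"credit_rating": {"excellent": "no", "fair": "yes"}},
}}

for rule in tree_to_rules(tree):
    print(rule)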

Decision tree (2)
Why are decision tree classifiers so popular?
• Their construction does not require domain knowledge or parameter setting
• They can handle multidimensional data
• Their representation is intuitive
• They usually have good accuracy

Decision tree: example
[Figure: a decision tree for the buys_computer data. The root tests age; the branch "<=30" leads to a student? test, the branch "31..40" leads directly to a 'yes' leaf, and the branch ">40" leads to a credit_rating? test with branches excellent / fair]

Decision tree induction: algorithm
• Principle
• Greedy algorithm
• Tree is constructed in a top-down recursive divide-and-conquer manner
• Algorithm
• At start, all the training examples are at the root
• Tuples are partitioned recursively based on selected attributes
• Test attributes are selected on the basis of a heuristic or statistical measure
(e.g., information gain)
• Stopping conditions
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning – majority voting is
employed for classifying the leaf
• There are no tuples left in a branch
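A minimal Python sketch of this greedy, top-down, divide-and-conquer procedure for categorical attributes, using information gain as the selection measure. The data layout (one dict per tuple) and helper names are assumptions made for illustration, not the notation of the slides.

from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected bits needed to classify a tuple in D."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain."""
    def info_a(attr):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[attr], []).append(y)
        return sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return max(attributes, key=lambda a: entropy(labels) - info_a(a))

def build_tree(rows, labels, attributes):
    # Stopping condition: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping condition: no attributes left -> majority voting at the leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    node = {attr: {}}
    remaining = [a for a in attributes if a != attr]
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append((row, y))
    # Only observed attribute values create branches, so empty branches
    # (the third stopping condition on the slide) do not arise in this sketch.
    for value, part in partitions.items():
        sub_rows, sub_labels = zip(*part)
        node[attr][value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return node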

Partitioning scenarios
• Discrete values
• Continuous values
• Discrete values producing a binary tree

Decision tree induction: example (1)
[Figure: the 14-tuple buys_computer training set (age, income, student, credit_rating, buys_computer); age is chosen as the splitting attribute at the root, partitioning the tuples into the branches <=30, 31..40, and >40]

Decision tree induction: example (2)
[Figure: the age partitions are split further; the "<=30" partition is split on student? and the ">40" partition on credit_rating? (excellent / fair), while the "31..40" partition already contains only 'yes' tuples]

Decision tree induction: example (3)
[Figure: the final decision tree, with age at the root, a student? test under "<=30", a 'yes' leaf under "31..40", and a credit_rating? test (excellent / fair) under ">40"]

Attribute Selection Measure
• Heuristic for selecting the splitting criterion that best separates a given data partition of class-labeled training tuples into individual classes
• Ideally each partition would be pure, that is, all tuples in a given partition belong to the same class
• Provides a ranking of attributes
• Select the attribute with highest score for the measure
• Determine a split point or a splitting subset
• Split tuples according to the selected criterion
• Popular measures: information gain, gain ratio, gini index, …

Information gain: entropy
• Entropy measures the uncertainty associated with a random variable
• For a discrete random variable Y taking m distinct values {y1, …, ym}:
H(Y) = − Σ_{i=1..m} pi log(pi), where pi = P(Y = yi)
• Higher entropy → higher uncertainty
• Lower entropy → lower uncertainty

Information gain
• Choose the attribute with highest information gain
• Minimizes the information needed to classify the tuples in the resulting
partitions
• Reflects least randomness or impurity in partitions
• Information gain minimizes the expected number of tests for classification

Information gain: Info(D)
• Expected information needed to classify a tuple in D is
Info(D) = − Σ_{i=1..m} pi log2(pi)
• m: number of classes
• pi: probability that an arbitrary tuple in D belongs to class Ci, estimated as
|Ci,D|/|D| where Ci,D is the set of tuples in class Ci in D
• log2: base 2 is used since information is encoded in bits
• Info(D) is the entropy of D
• Info(D) measures the average amount of information needed to identify the class label of a tuple in D
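A small Python sketch of this formula, assuming the class frequencies have already been counted. The 9/5 split used below is the one from the buys_computer example in the next slides.

from math import log2

def info(class_counts):
    """Info(D) = -sum_i pi * log2(pi), with pi = |Ci,D| / |D|."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

# The buys_computer training set used below: 9 'yes' tuples, 5 'no' tuples.
print(round(info([9, 5]), 3))   # 0.94, i.e. Info(D) ≈ 0.940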

Info(D): example
[Table: the 14-tuple buys_computer training set with attributes age (youth / middle aged / senior), income, student, credit_rating and class buys_computer]
C1 represents 'yes'
C2 represents 'no'

Info(D): example
[Table: the same 14-tuple buys_computer training set]
C1 represents 'yes', |C1| = 9
C2 represents 'no', |C2| = 5
Info(D) = − (9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

Information gain: InfoA(D)
• If A is a discrete attribute with v values {a1, …, av}, D is split into v partitions {D1, …, Dv}
• Measure the amount of information still needed to obtain an exact classification after attribute A has been used for partitioning:
InfoA(D) = Σ_{j=1..v} (|Dj| / |D|) × Info(Dj)
• It measures the expected information required to classify a tuple in D based on the partitioning of A
• The smaller the expected information still required, the greater the purity of the partitions

InfoA(D): example
[Table: the same training set, partitioned on age]
Consider attribute age
• D1 represents 'youth' (2 yes, 3 no)
• D2 represents 'middle aged' (4 yes, 0 no)
• D3 represents 'senior' (3 yes, 2 no)
Infoage(D) = (5/14) × (− (2/5) log2(2/5) − (3/5) log2(3/5)) + (4/14) × (− (4/4) log2(4/4)) + (5/14) × (− (3/5) log2(3/5) − (2/5) log2(2/5)) = 0.694

Information gain: Gain(A)
• Information gain for attribute A is the difference between the original information requirement and the one obtained after partitioning on A:
Gain(A) = Info(D) − InfoA(D)
• The attribute with the highest gain is chosen as the splitting attribute at node N

Gain(A): example
Info(D) = 0.940
Infoage(D) = 0.694
Gain(age) = 0.940 − 0.694 = 0.246
Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
Age has the highest gain and is selected

Continuous valued attributes
• If A is a continuous attribute, it is necessary to find the best split point
• Order the values of A in increasing order
• The midpoint between each pair of adjacent values is considered as a possible split point: (ai + ai+1)/2 is the split point between ai and ai+1
• Evaluate InfoA(D) for each split point and choose the one with minimum InfoA(D)
• Once the split point has been identified:
• D1: set of tuples with A <= split_point
• D2: set of tuples with A > split_point
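A Python sketch reproducing the Gain(age) computation from the worked example above; the per-partition class counts (2/3, 4/0, 3/2) are the ones read off the training set.

from math import log2

def info(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# (yes, no) counts per age partition: youth, middle aged, senior.
partitions = [(2, 3), (4, 0), (3, 2)]
n = sum(sum(p) for p in partitions)                        # 14 tuples overall

info_d = info([9, 5])
info_age = sum(sum(p) / n * info(p) for p in partitions)

print(f"Info(D)     = {info_d:.3f}")     # 0.940
print(f"Infoage(D)  = {info_age:.3f}")   # 0.694
print(f"Gain(age)   = {info_d - info_age:.3f}")
# 0.247 without intermediate rounding; the slides report 0.940 - 0.694 = 0.246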

Gain ratio
• Information gain measure is biased towards attributes with a large number of values
• Gain ratio applies a normalization to information gain using a split information value computed as
SplitInfoA(D) = − Σ_{j=1..v} (|Dj| / |D|) × log2(|Dj| / |D|)
GainRatio(A) = Gain(A) / SplitInfoA(D)

GainRatio(A): example
[Table: the same 14-tuple buys_computer training set]
Consider attribute income
• D1 represents 'high' (4 tuples)
• D2 represents 'medium' (6 tuples)
• D3 represents 'low' (4 tuples)
SplitInfoincome(D) = − (4/14) log2(4/14) − (6/14) log2(6/14) − (4/14) log2(4/14) = 1.557
GainRatio(income) = 0.029 / 1.557 = 0.019
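A small Python sketch reproducing the SplitInfo and GainRatio(income) numbers above.

from math import log2

def split_info(partition_sizes):
    """SplitInfoA(D) = -sum_j |Dj|/|D| * log2(|Dj|/|D|)."""
    total = sum(partition_sizes)
    return -sum(s / total * log2(s / total) for s in partition_sizes if s > 0)

# income splits the 14 tuples into high (4), medium (6), low (4).
si = split_info([4, 6, 4])
print(f"SplitInfoincome(D) = {si:.3f}")          # 1.557
print(f"GainRatio(income)  = {0.029 / si:.3f}")  # 0.019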

Overfitting and tree pruning
• Overfitting: An induced tree may overfit the training data
• Too many branches, some may reflect anomalies due to noise or outliers
• Poor accuracy for unseen samples
• Two approaches to avoid overfitting
• Prepruning: Halt tree construction early: do not split a node if this would result in the goodness measure falling below a threshold
• Difficult to choose an appropriate threshold
• Postpruning: Remove branches from a “fully grown” tree—get a sequence of progressively pruned trees
• Use a set of data different from the training data to decide which is the “best pruned tree”
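One common way to realize postpruning is reduced-error pruning on a held-out pruning set: each subtree is replaced by a majority-class leaf whenever doing so does not hurt accuracy on the pruning data. The sketch below illustrates that idea under the same nested-dict tree representation used in the earlier sketches; it is an illustrative variant, not necessarily the exact procedure intended in these slides.

from collections import Counter

def classify(node, row):
    """Follow the path for row until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        (attr, branches), = node.items()
        node = branches.get(row[attr])
    return node

def accuracy(node, rows, labels):
    return sum(classify(node, r) == y for r, y in zip(rows, labels)) / len(labels)

def prune(node, rows, labels):
    """Bottom-up: replace a subtree by a majority-class leaf if that does not
    reduce accuracy on the held-out pruning set."""
    if not isinstance(node, dict) or not labels:
        return node
    (attr, branches), = node.items()
    for value in branches:
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        branches[value] = prune(branches[value],
                                [rows[i] for i in idx], [labels[i] for i in idx])
    leaf = Counter(labels).most_common(1)[0][0]
    if accuracy(leaf, rows, labels) >= accuracy(node, rows, labels):
        return leaf
    return node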

Tree pruning: example

Bayesian classification
• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
• Foundation: Based on Bayes’ Theorem.
• Performance: A simple Bayesian classifier, naïve Bayesian classifier, has comparable
performance with decision tree and selected neural network classifiers
• Incremental: Each training example can incrementally increase/decrease the probability that an hypothesis is correct — prior knowledge can be combined with observed data
• Standard: Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Bayes’ Theorem
• X is a tuple, called evidence
• H is an hypothesis, such as the fact that X belongs to class C
• Classification problem: determine P(H|X)
• P(H|X) is the probability that H holds given evidence of tuple X (posterior
probability)
• According to Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)

Probabilities
• P(H) is the prior probability that H holds, independently of X
• the probability that a customer will buy a computer (H), regardless of age, income, or any other information
• P(X) is the probability that tuple X is observed
• the probability that a person from our set of customers is 35 years old and earns 40k (X)
• P(H|X) is the posterior probability of H conditioned on X
• the probability that the customer will buy a computer (H) given that we know the customer's age, which is 35, and income, which is 40k (X)
• P(X|H) is the posterior probability of X conditioned on H
• the probability that customer X is 35 years old and earns 40k (X), given that we know that the customer will buy a computer (H)

Naïve Bayesian Classification (1)
• D: training set of tuples, each represented by an n-dimensional vector X=(x1,…,xn) over attributes A1,…,An
• C1, …, Cm: set of m classes
• Given a tuple X, it belongs to the class having the highest posterior
probability conditioned on X:
P(Ci|X) > P(Cj|X) for 1<=j<=m, j<>i
• Note that P(Ci|X) = P(X|Ci) P(Ci) / P(X), and that P(X) is constant for all classes; thus we need to maximize P(X|Ci) P(Ci)

Naïve Bayesian Classification (2)
• If class prior probabilities are not known, it is common to assume that P(C1) = P(C2) = … = P(Cm), and we then only need to maximize P(X|Ci)
• For datasets with many attributes, it is expensive to compute P(X|Ci)
• Naive assumption: class conditional independence, hence
P(X|Ci) = Π_{k=1..n} P(xk|Ci) = P(x1|Ci) × … × P(xn|Ci)

Estimating P(xk|Ci)
• P(xk|Ci) is estimated from the training tuples
• Categorical attributes: the number of tuples of class Ci in D having value xk for attribute Ak, divided by |Ci,D|
• Continuous valued attributes: assume that Ak has a Gaussian (normal) distribution with mean μ and standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) e^(−(x−μ)² / (2σ²))
P(xk|Ci) = g(xk, μCi, σCi)
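A small Python sketch of the Gaussian estimate for a continuous attribute. The mean, standard deviation, and test value below are hypothetical numbers chosen only for illustration, not values from the slides.

from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    """g(x, mu, sigma): normal density used for P(xk | Ci) with a continuous attribute."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# Hypothetical example: if ages in class 'buys_computer = yes' had mean 38 and
# standard deviation 12, the likelihood of a customer aged 35 would be:
print(gaussian(35, 38, 12))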

Naïve Bayesian Classifier: example (1)
[Table: the 14-tuple buys_computer training set]
C1: buys_computer = 'yes'
C2: buys_computer = 'no'
Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Naïve Bayesian Classifier: example (2)
P(Ci):
P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357
Compute P(xk|Ci) for each class:
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
Compute P(X|Ci) [X = (age <= 30, income = medium, student = yes, credit_rating = fair)]:
P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
Compute P(X|Ci) × P(Ci):
P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.028
P(X | buys_computer = "no") × P(buys_computer = "no") = 0.007
→ X belongs to class buys_computer = "yes"

0-probability problem
• Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero
• Suppose a dataset with 1000 tuples where income = low (0 tuples), income = medium (990), and income = high (10)
• Use the Laplacian correction (or Laplacian estimator)
• Adding 1 to each case:
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
• The "corrected" probability estimates are close to their "uncorrected" counterparts

Naïve Bayesian Classifier: pros and cons
• Advantages
• Easy to implement
• Good results obtained in most of the cases
• Disadvantages
• Assumption: class conditional independence, therefore loss of accuracy
• Practically, dependencies exist among variables
• E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
• Dependencies among these cannot be modeled by the Naïve Bayes classifier

Accuracy and error measures
• How can we measure the accuracy of a classifier?
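A Python sketch that reproduces the worked naïve Bayes example above, using the conditional probabilities estimated on the slides; the Laplacian correction is omitted because no zero counts occur for this tuple.

priors = {"yes": 9 / 14, "no": 5 / 14}

# P(attribute = value | class), as estimated from the 14-tuple training set.
likelihoods = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit_rating=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit_rating=fair": 2 / 5},
}

x = ["age<=30", "income=medium", "student=yes", "credit_rating=fair"]

scores = {}
for c in priors:
    p_x_given_c = 1.0
    for feature in x:                       # class conditional independence
        p_x_given_c *= likelihoods[c][feature]
    scores[c] = p_x_given_c * priors[c]     # proportional to P(Ci | X)

print(scores)                               # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))          # 'yes'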
