Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
January 28, 2015
Today:
• Naïve Bayes
  • discrete-valued Xi's
  • Document classification
• Gaussian Naïve Bayes
  • real-valued Xi's
  • Brain image classification

Readings:
Required:
• Mitchell: “Naïve Bayes and Logistic Regression” (available on class website)
Optional:
• Bishop 1.2.4
• Bishop 4.2

Recently:
• Bayes classifiers to learn P(Y|X)
• MLE and MAP estimates for parameters of P
• Conditional independence
• Naïve Bayes → make Bayesian learning practical

Next:
• Text classification
• Naïve Bayes and continuous variables Xi:
  • Gaussian Naïve Bayes classifier
• Learn P(Y|X) directly
  • Logistic regression, Regularization, Gradient ascent
• Naïve Bayes or Logistic Regression?
• Generative vs. Discriminative classifiers

Naïve Bayes in a Nutshell

Bayes rule:
P(Y=y_k | X_1, …, X_n) = P(Y=y_k) P(X_1, …, X_n | Y=y_k) / Σ_j P(Y=y_j) P(X_1, …, X_n | Y=y_j)

Assuming conditional independence among the Xi's:
P(Y=y_k | X_1, …, X_n) = P(Y=y_k) Π_i P(X_i | Y=y_k) / Σ_j P(Y=y_j) Π_i P(X_i | Y=y_j)

So, the classification rule for Xnew = < X1, …, Xn > is:
Ynew ← argmax_{y_k} P(Y=y_k) Π_i P(X_i^new | Y=y_k)

Example: Live in Sq Hill? P(S|G,D,B)
• S=1 iff live in Squirrel Hill
• G=1 iff shop at SH Giant Eagle
• D=1 iff Drive or Carpool to CMU
• B=1 iff Birthday is before July 1

P(S=1) :            P(S=0) :
P(D=1 | S=1) :      P(D=0 | S=1) :
P(D=1 | S=0) :      P(D=0 | S=0) :
P(G=1 | S=1) :      P(G=0 | S=1) :
P(G=1 | S=0) :      P(G=0 | S=0) :
P(B=1 | S=1) :      P(B=0 | S=1) :
P(B=1 | S=0) :      P(B=0 | S=0) :

Tom: D=1, G=0, B=0

P(S=1|D=1,G=0,B=0) =
  P(S=1) P(D=1|S=1) P(G=0|S=1) P(B=0|S=1)
  / [ P(S=1) P(D=1|S=1) P(G=0|S=1) P(B=0|S=1) + P(S=0) P(D=1|S=0) P(G=0|S=0) P(B=0|S=0) ]
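To make the arithmetic concrete, here is a minimal Python sketch of the posterior computation. The probability values are hypothetical placeholders, since the slide leaves the actual estimates blank as an exercise:

# Hypothetical parameter values; the slide leaves these blank.
p_s1 = 0.3                   # P(S=1): prior on living in Squirrel Hill
p_d1 = {1: 0.2, 0: 0.7}      # P(D=1 | S=s): drive or carpool to CMU
p_g1 = {1: 0.8, 0: 0.1}      # P(G=1 | S=s): shop at SH Giant Eagle
p_b1 = {1: 0.5, 0: 0.5}      # P(B=1 | S=s): birthday before July 1

def posterior_s1(d, g, b):
    """P(S=1 | D=d, G=g, B=b) under the Naive Bayes assumption."""
    def joint(s):
        prior = p_s1 if s == 1 else 1 - p_s1
        pd = p_d1[s] if d == 1 else 1 - p_d1[s]
        pg = p_g1[s] if g == 1 else 1 - p_g1[s]
        pb = p_b1[s] if b == 1 else 1 - p_b1[s]
        return prior * pd * pg * pb
    return joint(1) / (joint(1) + joint(0))

print(posterior_s1(d=1, g=0, b=0))   # Tom: D=1, G=0, B=0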

Another way to view Naïve Bayes (Boolean Y):

P(Y=1 | X_1, …, X_n) / P(Y=0 | X_1, …, X_n) = [ P(Y=1) / P(Y=0) ] Π_i [ P(X_i | Y=1) / P(X_i | Y=0) ]

Decision rule: is this quantity greater or less than 1?

Naïve Bayes: classifying text documents
• Classify which emails are spam?
• Classify which emails promise an attachment?
How shall we represent text documents for Naïve Bayes?


Learning to classify documents: P(Y|X)
• Y discrete valued, e.g., Spam or not
• X = < X1, …, Xn > = document
• Xi is a random variable describing…

Answer 1: Xi is boolean, 1 if word i is in the document, else 0
e.g., Xpleased = 1
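A minimal sketch of this boolean representation; the vocabulary and whitespace tokenizer below are illustrative assumptions, not part of the slide:

vocabulary = ["i", "am", "pleased", "attachment", "free", "winner"]   # illustrative

def boolean_features(document):
    """Xi = 1 if word i occurs anywhere in the document, else 0."""
    words = set(document.lower().split())
    return {w: int(w in words) for w in vocabulary}

print(boolean_features("I am pleased to send the attachment"))
# {'i': 1, 'am': 1, 'pleased': 1, 'attachment': 1, 'free': 0, 'winner': 0}

Note the representation throws away word order and counts: a word occurring ten times looks the same as a word occurring once.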
Issues?

Answer 2:
• Xi represents the ith word position in the document
• X1 = “I”, X2 = “am”, X3 = “pleased”
• and, let’s assume the Xi are iid (independent, identically distributed)

Learning to classify documents: P(Y|X) (the “Bag of Words” model)
• Y discrete valued, e.g., Spam or not
• X = < X1, …, Xn > = document
• Xi are iid random variables. Each represents the word at its position i in the document
• Generating a document according to this distribution = rolling a 50,000-sided die, once for each word position in the document (see the sketch below)
• The observed counts for each word follow a ??? distribution
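A sketch of this generative story in Python, with a toy 5-word vocabulary standing in for the 50,000-word one:

import random

vocab = ["oil", "gas", "africa", "the", "price"]   # toy 5-sided die
theta = [0.10, 0.10, 0.05, 0.60, 0.15]             # P(word | Y=yk), sums to 1

def generate_document(length):
    """Roll the len(vocab)-sided die once per word position (iid draws)."""
    return random.choices(vocab, weights=theta, k=length)

doc = generate_document(20)
counts = {w: doc.count(w) for w in vocab}
print(doc)
print(counts)   # these counts are the quantity whose distribution the slide asks about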

Multinomial Distribution
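For reference: over W outcomes with parameters θ1, …, θW (θi ≥ 0, Σi θi = 1), the multinomial gives the probability of observing counts c1, …, cW in n = Σi ci independent draws:

P(c_1, \ldots, c_W \mid \theta_1, \ldots, \theta_W) = \frac{n!}{c_1! \, c_2! \cdots c_W!} \prod_{i=1}^{W} \theta_i^{c_i}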

Multinomial Bag of Words
word      count
aardvark  0
about     2
all       2
Africa    1
apple     0
anxious   0
…
gas       1
…
oil       1
…
Zaire     0

MAP estimates for bag of words

MAP estimate for the multinomial, under a Dirichlet(β_1, …, β_W) prior:
θ̂_i = ( c_i + β_i − 1 ) / ( n + Σ_j (β_j − 1) )

What β’s should we choose?
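A sketch of the smoothed estimate in code, assuming a symmetric Dirichlet prior with a single hyperparameter beta (beta = 2 corresponds to classic add-one / Laplace smoothing):

def map_word_probs(counts, beta=2.0):
    """MAP estimate theta_i = (c_i + beta - 1) / (n + W*(beta - 1))
    under a symmetric Dirichlet(beta) prior over W word probabilities."""
    n = sum(counts.values())
    w = len(counts)
    denom = n + w * (beta - 1.0)
    return {word: (c + beta - 1.0) / denom for word, c in counts.items()}

print(map_word_probs({"oil": 1, "gas": 1, "the": 8, "africa": 0}))
# with beta=2 this is (c+1)/(n+W), e.g. "africa" gets 1/14 rather than 0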

Naïve Bayes Algorithm – discrete Xi

• Train_Naïve_Bayes(examples)
    for each value yk:
      estimate π_k ≡ P(Y = yk)
      for each value xij of each attribute Xi:
        estimate θ_ijk ≡ P(Xi = xij | Y = yk)
        (for documents: the probability that word xij appears in position i, given Y = yk)

• Classify(Xnew):
    Ynew ← argmax_{yk} π_k Π_i P(Xi^new | Y = yk)

* Additional assumption: word probabilities are position independent, i.e., P(Xi = w | Y = yk) is the same for all positions i
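A minimal sketch of the algorithm above for discrete features, with add-one smoothing and log probabilities to avoid underflow (identifiers are illustrative):

import math
from collections import defaultdict

def train_naive_bayes(examples, n_values=2):
    """examples: list of (features, label) pairs, features a dict name -> value.
    Returns the prior estimates and a smoothed conditional-probability function."""
    label_counts = defaultdict(int)
    value_counts = defaultdict(int)            # (label, feature, value) -> count
    for features, y in examples:
        label_counts[y] += 1
        for name, value in features.items():
            value_counts[(y, name, value)] += 1
    n = len(examples)
    priors = {y: c / n for y, c in label_counts.items()}
    def cond_prob(y, name, value):             # add-one smoothed P(Xi=value | Y=y)
        return (value_counts[(y, name, value)] + 1) / (label_counts[y] + n_values)
    return priors, cond_prob

def classify(features, priors, cond_prob):
    """argmax_yk of log P(Y=yk) + sum_i log P(Xi | Y=yk)."""
    def log_posterior(y):
        return math.log(priors[y]) + sum(
            math.log(cond_prob(y, name, value)) for name, value in features.items())
    return max(priors, key=log_posterior)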

For code and data, see
www.cs.cmu.edu/~tom/mlbook.html
click on “Software and Data”

What if we have continuous Xi? E.g., image classification: Xi is the real-valued ith pixel

Naïve Bayes requires P(Xi | Y=yk), but Xi is real (continuous)

Common approach: assume P(Xi | Y=yk) follows a Normal (Gaussian) distribution

Gaussian Distribution (also called “Normal”)

p(x) = ( 1 / (σ √(2π)) ) exp( −(x − μ)² / (2σ²) )

p(x) is a probability density function, whose integral (not sum) is 1

What if we have continuous Xi? Gaussian Naïve Bayes (GNB): assume

P(Xi = x | Y = yk) = ( 1 / (σ_ik √(2π)) ) exp( −(x − μ_ik)² / (2σ_ik²) )

Sometimes assume the variance
• is independent of Y (i.e., σ_i),
• or independent of Xi (i.e., σ_k),
• or both (i.e., σ)

Gaussian Naïve Bayes Algorithm – continuous Xi (but still discrete Y)

• Train_Naïve_Bayes(examples)
    for each value yk:
      estimate* π_k ≡ P(Y = yk)
      for each attribute Xi: estimate the class-conditional mean μ_ik and variance σ_ik

• Classify(Xnew):
    Ynew ← argmax_{yk} π_k Π_i N(Xi^new; μ_ik, σ_ik)

* probabilities must sum to 1, so we need to estimate only n−1 parameters…
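A sketch of GNB with NumPy, estimating a separate μ_ik and σ_ik per class and feature; the small variance floor is an added assumption to avoid division by zero:

import numpy as np

def train_gnb(X, y):
    """X: (m, n) array of real-valued features; y: (m,) array of discrete labels.
    Returns classes, priors, and per-class means and variances (MLE)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    variances = np.array([X[y == k].var(axis=0) for k in classes]) + 1e-9  # floor
    return classes, priors, means, variances

def classify_gnb(x, classes, priors, means, variances):
    """argmax_k of log P(Y=k) + sum_i log N(x_i; mu_ik, sigma_ik^2)."""
    log_post = (np.log(priors)
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    return classes[np.argmax(log_post)]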

Estimating Parameters: Y discrete, Xi continuous

Maximum likelihood estimates:

μ̂_ik = ( Σ_j X_i^j δ(Y^j = yk) ) / ( Σ_j δ(Y^j = yk) )
σ̂_ik² = ( Σ_j (X_i^j − μ̂_ik)² δ(Y^j = yk) ) / ( Σ_j δ(Y^j = yk) )

where j indexes the training examples, i the feature, k the class, and δ(Y^j = yk) = 1 if Y^j = yk, else 0

How many parameters must we estimate for Gaussian Naïve Bayes if Y has k possible values, and X = < X1, …, Xn >?
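One way to count (this answer is not on the slide), assuming a separate mean and variance per class and per feature:

(k − 1) class priors + nk means μ_ik + nk variances σ_ik² = 2nk + k − 1 parameters

Under the shared-variance assumptions from two slides back, the variance term shrinks to n (σ_i), k (σ_k), or 1 (σ).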

GNB Example: Classify a person’s cognitive state, based on brain image
• reading a sentence or viewing a picture?
• reading the word describing a “Tool” or “Building”?
• answering the question, or getting confused?

[Figure: brain map of mean activations over all training examples for Y=“bottle”; color scale runs from below-average to high fMRI activation]

Y is the mental state (reading “house” or “bottle”); the Xi are the voxel activities, so this is a plot of the μ’s defining P(Xi | Y=“bottle”)

Classification task: is person viewing a “tool” or “building”?
[Figure: bar chart of per-participant classification accuracies (participants p1–p12), with a line marking statistically significant accuracy at p<0.05]

Where is information encoded in the brain?
[Figure: accuracies of cubical 27-voxel classifiers centered at each significant voxel, in the range 0.7–0.8]

Naïve Bayes: What you should know
• Designing classifiers based on Bayes rule
• Conditional independence
  – What it is
  – Why it’s important
• Naïve Bayes assumption and its consequences
  – Which (and how many) parameters must be estimated under different generative models (different forms for P(X|Y))
    • and why this matters
• How to train Naïve Bayes classifiers
  – MLE and MAP estimates
  – with discrete and/or continuous inputs Xi

Questions to think about:
• Can you use Naïve Bayes for a combination of discrete and real-valued Xi?
• How can we easily model just 2 of n attributes as dependent?
• What does the decision surface of a Naïve Bayes classifier look like?
• How would you select a subset of Xi’s?
