
Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
February 25, 2015
Today:
Graphical models
Bayes Nets:
Inference Learning EM
Readings:
Bishop chapter 8
Mitchell chapter 6

Midterm
In class on Monday, March 2
Closed book
You may bring an 8.5" x 11" cheat sheet of notes
Covers all material through today
Be sure to come on time. We'll start at 11am sharp.

Bayesian Networks Definition
A Bayes network represents the joint probability distribution over a collection of random variables.
A Bayes network is a directed acyclic graph together with a set of conditional probability distributions (CPDs):
Each node denotes a random variable
Edges denote dependencies
For each node Xi, its CPD defines P(Xi | Pa(Xi))
The joint distribution over all variables is defined to be
P(X1, …, Xn) = ∏i P(Xi | Pa(Xi))
where Pa(X) = immediate parents of X in the graph
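Why a DAG plus CPDs is enough to define a full joint distribution deserves one line of justification. Applying the chain rule along any variable ordering consistent with the DAG, then invoking the assumption that each node is conditionally independent of its non-descendants given its parents:

```latex
P(X_1,\dots,X_n) \;=\; \prod_{i=1}^{n} P(X_i \mid X_1,\dots,X_{i-1})  % chain rule
                 \;=\; \prod_{i=1}^{n} P(X_i \mid Pa(X_i))            % BN assumption
```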

What You Should Know
Bayes nets are a convenient representation for encoding dependencies / conditional independence
BN = graph plus parameters of CPDs
Defines the joint distribution over the variables
Can calculate everything else from that, though inference may be intractable
Reading conditional independence relations from the graph:
Each node is conditionally independent of its non-descendants, given only its parents
X and Y are conditionally independent given Z if Z d-separates every path connecting X to Y
Marginal independence: special case where Z = {}

Inference in Bayes Nets
In general, intractable (NP-complete)
For certain cases, tractable:
Assigning probability to a fully observed set of variables
Or if just one variable is unobserved
Or for singly connected graphs (i.e., no undirected loops): belief propagation
Sometimes use Monte Carlo methods
Generate many samples according to the Bayes net distribution, then count up the results
Variational methods for tractable approximate solutions

Example
Bird flu and Allergies both cause Sinus problems
Sinus problems cause Headaches and a runny Nose

Prob. of joint assignment: easy
Suppose we are interested in the joint assignment <F=f, A=a, S=s, H=h, N=n>. What is P(f, a, s, h, n)?
(let's use p(a, b) as shorthand for p(A=a, B=b))
From the factorization defined by the network:
P(f, a, s, h, n) = P(f) P(a) P(s | f, a) P(h | s) P(n | s)
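As a concrete check, the product above can be computed directly from the CPTs. This minimal sketch uses the Flu / Allergy / Sinus / Headache / Nose structure from the example; the numeric CPT entries are made-up placeholders, not values from the lecture:

```python
# Hypothetical CPTs for the Flu/Allergy/Sinus/Headache/Nose network.
# Structure from the lecture example; all numeric values are made up.
P_F = 0.05                                       # P(F=1)
P_A = 0.20                                       # P(A=1)
P_S = {(0, 0): 0.05, (0, 1): 0.40, (1, 0): 0.60, (1, 1): 0.80}  # P(S=1|F,A)
P_H = {0: 0.10, 1: 0.70}                         # P(H=1|S)
P_N = {0: 0.05, 1: 0.80}                         # P(N=1|S)

def b(p, v):
    """P(X=v) for a boolean variable with P(X=1) = p."""
    return p if v == 1 else 1.0 - p

def joint(f, a, s, h, n):
    """P(f,a,s,h,n) = P(f) P(a) P(s|f,a) P(h|s) P(n|s)."""
    return b(P_F, f) * b(P_A, a) * b(P_S[(f, a)], s) * b(P_H[s], h) * b(P_N[s], n)

print(joint(1, 1, 1, 0, 1))  # one CPT lookup per node: O(n) work
```

Each fully observed joint assignment costs one CPT lookup per node, which is why this case is easy.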

Prob. of marginals: not so easy
How do we calculate P(N=n)?
(let's use p(a, b) as shorthand for p(A=a, B=b))
We must sum the joint distribution over all values of the other variables:
P(N=n) = Σ_f Σ_a Σ_s Σ_h P(f, a, s, h, n)
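A brute-force sketch of that marginalization, assuming the joint() function and hypothetical CPTs from the previous snippet are in scope:

```python
from itertools import product

# P(N=n) = sum over f, a, s, h of P(f, a, s, h, n).
# Assumes joint() and the made-up CPTs from the previous sketch are in scope.
def marginal_N(n):
    return sum(joint(f, a, s, h, n) for f, a, s, h in product((0, 1), repeat=4))

print(marginal_N(1))  # 2**4 terms here; exponential in the number of summed-out variables
```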

Generating a sample from joint distribution: easy
How can we generate random samples drawn according to P(F,A,S,H,N)?
Hint: to draw a random sample of F with P(F=1) = θ: draw a value r uniformly from [0, 1]; if r < θ then output F=1, else F=0.

Solution: draw a random value f for F using its CPD; then draw values for A, for S | A,F, for H | S, and for N | S.

Note we can estimate marginals like P(N=n) by generating many samples from the joint distribution, then counting the fraction of samples for which N=n. Similarly for anything else we care about, e.g. P(F=1 | H=1, N=0): a weak but general method for estimating any probability term.
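A minimal Python sketch of this ancestral-sampling scheme, reusing the same made-up CPT numbers as before; the Monte Carlo estimates at the end illustrate the "weak but general" counting method just described:

```python
import random

# Hypothetical CPTs (same made-up numbers as the earlier sketch).
P_F = 0.05
P_A = 0.20
P_S = {(0, 0): 0.05, (0, 1): 0.40, (1, 0): 0.60, (1, 1): 0.80}
P_H = {0: 0.10, 1: 0.70}
P_N = {0: 0.05, 1: 0.80}

def draw(p):
    """Sample a Bernoulli(p) value: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def sample():
    """Ancestral sampling: draw each node after its parents are drawn."""
    f = draw(P_F)
    a = draw(P_A)
    s = draw(P_S[(f, a)])
    h = draw(P_H[s])
    n = draw(P_N[s])
    return f, a, s, h, n

samples = [sample() for _ in range(100_000)]

# Marginal P(N=1): fraction of samples with N=1.
print(sum(n for *_, n in samples) / len(samples))

# Conditional P(F=1 | H=1, N=0): keep only samples consistent with the evidence.
ev = [f for f, a, s, h, n in samples if h == 1 and n == 0]
print(sum(ev) / len(ev) if ev else float("nan"))
```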
Inference in Bayes Nets
In general, intractable (NP-complete)
For certain cases, tractable:
Assigning probability to a fully observed set of variables
Or if just one variable is unobserved
Or for singly connected graphs (i.e., no undirected loops): variable elimination, belief propagation
Often use Monte Carlo methods
e.g., generate many samples according to the Bayes net distribution, then count up the results: Gibbs sampling
Variational methods for tractable approximate solutions
(see the Graphical Models course, 10-708)

Learning of Bayes Nets
Four categories of learning problems:
Graph structure may be known / unknown
Variable values may be fully observed / partly unobserved
Easy case: learn parameters when the graph structure is known and the data is fully observed
Interesting case: graph known, data partly observed
Gruesome case: graph structure unknown, data partly unobserved

Learning CPTs from Fully Observed Data
Example: consider learning the parameter θ_{s|ij} ≡ P(S=1 | F=i, A=j) in the Flu / Allergy / Sinus / Headache / Nose network
The maximum likelihood estimate, where k indexes training examples, is
θ̂_{s|ij} = Σ_k δ(f_k=i, a_k=j, s_k=1) / Σ_k δ(f_k=i, a_k=j)
Remember why? δ(x) = 1 if x = true, 0 if x = false

MLE estimate of θ_{s|ij} from fully observed data
Maximum likelihood estimate: θ̂ = argmax_θ log P(data | θ)
The log likelihood decomposes into a separate term for each CPD, so each parameter can be estimated from the counts of its own node and parent values
Our case: the Flu / Allergy / Sinus / Headache / Nose network

Estimate θ from partly observed data
What if F, A, H, N are observed, but not S?
Let X be all observed variable values (over all examples)
Let Z be all unobserved variable values
Can't calculate the MLE θ̂ = argmax_θ log P(X, Z | θ), because Z is unobserved. WHAT TO DO?
EM seeks* to estimate θ̂ = argmax_θ E_{Z|X,θ}[log P(X, Z | θ)]
* EM is guaranteed to find a local maximum
Here, observed X = {F, A, H, N} and unobserved Z = {S}

EM Algorithm – Informally
EM is a general procedure for learning from partly observed data
Given observed variables X and unobserved Z (X = {F, A, H, N}, Z = {S}):
Begin with an arbitrary choice for the parameters θ
Iterate until convergence:
E step: estimate the values of the unobserved Z, using θ
M step: use the observed values plus the E-step estimates to derive a better θ
Guaranteed to find a local maximum; each iteration increases the expected log likelihood

EM Algorithm – Precisely
Define Q(θ' | θ) = E_{Z|X,θ}[log P(X, Z | θ')]
Iterate until convergence:
E step: use X and the current θ to calculate P(Z | X, θ)
M step: replace the current θ by argmax_{θ'} Q(θ' | θ)
Guaranteed to find a local maximum; each iteration increases the likelihood

E Step: Use X, θ to Calculate P(Z | X, θ)
Observed X = {F, A, H, N}, unobserved Z = {S}
How? This is a Bayes net inference problem.

EM and estimating θ_{s|ij}
Observed X = {F, A, H, N}, unobserved Z = {S}
E step: calculate P(Z_k | X_k; θ) for each training example k
M step: update all relevant parameters; for example
θ̂_{s|ij} = Σ_k E[S_k | f_k, a_k, h_k, n_k; θ] δ(f_k=i, a_k=j) / Σ_k δ(f_k=i, a_k=j)
Recall the fully observed MLE was θ̂_{s|ij} = Σ_k δ(f_k=i, a_k=j, s_k=1) / Σ_k δ(f_k=i, a_k=j)

EM and estimating θ
More generally, given an observed set X and an unobserved set Z of boolean values:
E step: calculate, for each training example k, the expected value of each unobserved variable
M step: calculate estimates similar to the MLE, but replacing each count by its expected count

Using Unlabeled Data to Help Train a Naive Bayes Classifier
Learn P(Y|X); Y is observed for some examples and unobserved (?) for others:

Y X1 X2 X3 X4
1 0 0 1 1
0 0 1 0 0
0 0 0 1 0
? 0 1 1 0
? 0 1 0 1

E step: calculate, for each training example k, the expected value of the unobserved variable Y
M step: calculate estimates similar to the MLE, but replacing each count by its expected count
(let's use y^(k) to indicate the value of Y on the kth example)
The fully observed MLE would be, for example:
P̂(X_i=1 | Y=1) = Σ_k δ(x_i^(k)=1, y^(k)=1) / Σ_k δ(y^(k)=1)

Experimental Evaluation (from [Nigam et al., 2000])
Newsgroup postings: 20 newsgroups, 1000 documents per group
Web page classification: student, faculty, course, project; 4199 web pages
Reuters newswire articles: 12,902 articles, 90 topic categories

20 Newsgroups
Words w ranked by P(w | Y=course) / P(w | Y=¬course), using one labeled example per class
[20 Newsgroups results figure]

Bayes Nets – What You Should Know
Representation
Bayes nets represent the joint distribution as a DAG plus conditional distributions
D-separation lets us decode the conditional independence assumptions
Inference
NP-hard in general
For some graphs and some queries, exact inference is tractable
Approximate methods too, e.g., Monte Carlo methods, …
Learning
Easy for a known graph with fully observed data (MLEs, MAP estimates)
EM for partly observed data with a known graph
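Finally, to make the EM recipe above concrete, here is a minimal end-to-end sketch for the running example (S hidden; F, A, H, N observed). The training tuples and initial parameters are hypothetical placeholders; the E step is exact Bayes net inference over the single hidden variable, and the M step replaces counts with expected counts as described above:

```python
from itertools import product

# Hypothetical training data: (f, a, h, n) tuples; S is never observed.
data = [(0, 1, 1, 1), (0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 0, 0), (0, 0, 1, 0)]

# Arbitrary initial parameters (the slides say: begin with an arbitrary theta).
theta = {
    "F": 0.5, "A": 0.5,
    "S": {(i, j): 0.5 for i, j in product((0, 1), repeat=2)},  # P(S=1|F=i,A=j)
    "H": {0: 0.5, 1: 0.5},                                     # P(H=1|S=s)
    "N": {0: 0.5, 1: 0.5},                                     # P(N=1|S=s)
}

def b(p, v):
    """P(X=v) for a boolean variable with P(X=1) = p."""
    return p if v == 1 else 1.0 - p

def posterior_S(f, a, h, n):
    """E step for one example: P(S=1 | f,a,h,n; theta) by Bayes net inference.
    P(f)P(a) is common to both values of S, so it cancels in the normalization."""
    w = {s: b(theta["S"][(f, a)], s) * b(theta["H"][s], h) * b(theta["N"][s], n)
         for s in (0, 1)}
    return w[1] / (w[0] + w[1])

for _ in range(50):  # iterate E and M steps until (approximate) convergence
    es = [posterior_S(*x) for x in data]          # expected value of S_k per example
    # M step: MLE-style ratios with expected counts replacing hard counts.
    for i, j in product((0, 1), repeat=2):
        num = sum(e for (f, a, h, n), e in zip(data, es) if (f, a) == (i, j))
        den = sum(1 for (f, a, h, n) in data if (f, a) == (i, j))
        if den:
            theta["S"][(i, j)] = num / den
    for s in (0, 1):
        w = [e if s == 1 else 1 - e for e in es]  # weight of each example toward S=s
        tot = sum(w)
        if tot:
            theta["H"][s] = sum(wi * h for wi, (f, a, h, n) in zip(w, data)) / tot
            theta["N"][s] = sum(wi * n for wi, (f, a, h, n) in zip(w, data)) / tot
    # F and A are fully observed, so their estimates are plain MLE fractions.
    theta["F"] = sum(f for f, a, h, n in data) / len(data)
    theta["A"] = sum(a for f, a, h, n in data) / len(data)

print(theta["S"], theta["H"], theta["N"])
```

As the lecture notes, this converges to a local maximum of the expected log likelihood, so in practice one would restart from several initial parameter choices.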
