Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
February 23, 2015
Today:
Graphical models
Bayes Nets:
Representing distributions
Conditional independencies
Simple inference
Simple learning
Readings:
Bishop chapter 8, through 8.2
Mitchell chapter 6
Bayes Nets define Joint Probability Distribution in terms of this graph, plus parameters
Benefits of Bayes Nets:
Representthefulljointdistributioninfewer parameters, using prior knowledge about dependencies
Algorithmsforinferenceandlearning
Bayesian Networks Definition
A Bayes network represents the joint probability distribution over a collection of random variables
A Bayes network is a directed acyclic graph and a set of conditional probability distributions (CPDs)
Eachnodedenotesarandomvariable
Edgesdenotedependencies
For each node Xi its CPD defines P(Xi | Pa(Xi))
Thejointdistributionoverallvariablesisdefinedtobe
Pa(X) = immediate parents of X in the graph
Bayesian Network
StormClouds
Nodes = random variables
A conditional probability distribution (CPD) is associated with each node N, defining P(N | Parents(N))
Parents
P(W|Pa)
P(W|Pa)
L, R
0
1.0
L, R
0
1.0
L, R
0.2
0.8
L, R
0.9
0.1
Lightning
Rain
Thunder
WindSurf
WindSurf
The joint distribution over all variables:
Bayesian Networks
CPDforeachnodeXi describes P(Xi | Pa(Xi))
Chain rule of probability: But in a Bayes net:
StormClouds
Inference in Bayes Nets
Parents
P(W|Pa)
P(W|Pa)
L, R
0
1.0
L, R
0
1.0
L, R
0.2
0.8
L, R
0.9
0.1
Lightning
Rain
WindSurf
Thunder
WindSurf
P(S=1, L=0, R=1, T=0, W=1) =
StormClouds
Learning a Bayes Net
Parents
P(W|Pa)
P(W|Pa)
L, R
0
1.0
L, R
0
1.0
L, R
0.2
0.8
L, R
0.9
0.1
Lightning
Rain
WindSurf
Thunder
WindSurf
Consider learning when graph structure is given, and data = { } What is the MLE solution? MAP?
Algorithm for Constructing Bayes Network
Choose an ordering over variables, e.g., X1, X2, Xn
Fori=1ton
Add Xi to the network
Select parents Pa(Xi) as minimal subset of X1 Xi-1 such that
Notice this choice of parents assures
(by chain rule)
(by construction)
Example
BirdfluandAllegiesbothcauseNasalproblems NasalproblemscauseSneezesandHeadaches
What is the Bayes Network for X1,X4 with NO assumed conditional independencies?
What is the Bayes Network for Naive Bayes?
What do we do if variables are mix of discrete and real valued?
Bayes Network for a Hidden Markov Model
Implies the future is conditionally independent of the past, given the present
Unobserved state:
Observed output:
St-2 St-1
Ot-2 Ot-1
St St+1 St+2
Ot Ot+1 Ot+2
Conditional Independence, Revisited
Wesaid:
Each node is conditionally independent of its non-descendents,
given its immediate parents.
Doesthisrulegiveusalloftheconditionalindependence relations implied by the Bayes network?
No!
E.g., X1 and X4 are conditionally indep given {X2, X3} But X1 and X4 not conditionally indep given X3
For this, we need to understand D-separation
X1
X4
X2
X3
Easy Network 1: Head to Tail
prove A cond indep of B given C? ie., p(a,b|c) = p(a|c) p(b|c)
A
C
B
lets use p(a,b) as shorthand for p(A=a, B=b)
Easy Network 2: Tail to Tail
A
C
B
prove A cond indep of B given C? ie., p(a,b|c) = p(a|c) p(b|c)
lets use p(a,b) as shorthand for p(A=a, B=b)
Easy Network 3: Head to Head
A
C
B
prove A cond indep of B given C? ie., p(a,b|c) = p(a|c) p(b|c)
lets use p(a,b) as shorthand for p(A=a, B=b)
Easy Network 3: Head to Head
prove A cond indep of B given C? NO!
Summary:
p(a,b)=p(a)p(b)
p(a,b|c)NotEqualp(a|c)p(b|c)
Explaining away. e.g.,
A=earthquake B=breakIn
C=motionAlarm
A
C
B
X and Y are conditionally independent given Z, if and only if X and Y are D-separated by Z.
[Bishop, 8.2.2] Suppose we have three sets of random variables: X, Y and Z
X and Y are D-separated by Z (and therefore conditionally indep, given Z) iff every path from every variable in X to every variable in Y is blocked
A path from variable X to variable Y is blocked if it includes a node in Z such that either
AZBAZB
1.arrows on the path meet either head-to-tail or tail-to-tail at the node and this node is in Z
2.or, the arrows meet head-to-head at the node, and neither the node, nor
any of its descendants, is in Z
ACB
D
X and Y are D-separated by Z (and therefore conditionally indep, given Z) iff every path from every variable in X to every variable in Y is blocked
A path from variable A to variable B is blocked if it includes a node such that either
1. arrows on the path meet either head-to-tail or tail-to-tail at the node and this node is in Z
2. or, the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in Z
X1 indep of X3 given X2? X3 indep of X1 given X2? X4 indep of X1 given X2?
X1
X4
X2
X3
X and Y are D-separated by Z (and therefore conditionally indep, given Z) iff every path from any variable in X to any variable in Y is blocked by Z
A path from variable A to variable B is blocked by Z if it includes a node such that either
1. arrows on the path meet either head-to-tail or tail-to-tail at the node and this node is in Z
2. the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in Z
X4 indep of X1 given X3?
X4 indep of X1 given {X3, X2}? X4 indep of X1 given {}?
X1
X4
X2
X3
X and Y are D-separated by Z (and therefore conditionally indep, given Z) iff every path from any variable in X to any variable in Y is blocked
A path from variable A to variable B is blocked if it includes a node such that either
1.arrows on the path meet either head-to-tail or tail-to-tail at the node and this node is in Z
2.or, the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in Z
a indep of b given c? a indep of b given f ?
Markov Blanket
from [Bishop, 8.2]
What You Should Know
Bayesnetsareconvenientrepresentationfor encoding dependencies / conditional independence
BN=GraphplusparametersofCPDs
Defines joint distribution over variables Can calculate everything else from that Though inference may be intractable
Readingconditionalindependencerelationsfromthe graph
Each node is cond indep of non-descendents, given only its parents
D-separation
Explaining away
Inference in Bayes Nets
Ingeneral,intractable(NP-complete)
Forcertaincases,tractable
Assigning probability to fully observed set of variables
Or if just one variable unobserved
Or for singly connected graphs (ie., no undirected loops)
Belief propagation
Formultiplyconnectedgraphs
Junction tree
SometimesuseMonteCarlomethods
Generate many samples according to the Bayes Net distribution, then count up the results
Variationalmethodsfortractableapproximate solutions
Reviews
There are no reviews yet.