Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
February 18, 2015
Today:
•Graphical models
•Bayes Nets:
•Representing distributions
•Conditional independencies
•Simple inference
•Simple learning
Readings:
•Bishop chapter 8, through 8.2

Graphical Models
•Key Idea:
–Conditional independence assumptions useful
–but Naïve Bayes is extreme!
–Graphical models express sets of conditional independence assumptions via graph structure
–Graph structure plus associated parameters define joint probability distribution over set of variables
•Two types of graphical models:
–Directed graphs (aka Bayesian Networks)
–Undirected graphs (aka Markov Random Fields)

Graphical Models – Why Care?
•Among most important ML developments of the decade
•Graphical models allow combining:
–Prior knowledge in form of dependencies/independencies
–Prior knowledge in form of priors over parameters
–Observed training data
•Principled and ~general methods for
–Probabilistic inference
–Learning
•Useful in practice
–Diagnosis, help systems, text analysis, time series models, …

Conditional Independence
Definition: X is conditionally independent of Y given Z, if the probability distribution governing X is independent of the value of Y, given the value of Z
Which we often write as P(X | Y, Z) = P(X | Z)
E.g., P(Thunder | Rain, Lightning) = P(Thunder | Lightning)

Marginal Independence
Definition: X is marginally independent of Y if
(for all values x, y) P(X=x, Y=y) = P(X=x) P(Y=y)
Equivalently, if P(X | Y) = P(X)
Equivalently, if P(Y | X) = P(Y)

Represent Joint Probability Distribution over Variables

Describe network of dependencies

Bayes Nets define Joint Probability Distribution in terms of this graph, plus parameters
Benefits of Bayes Nets:
•Represent the full joint distribution in fewer parameters, using prior knowledge about dependencies
•Algorithms for inference and learning

Bayesian Networks Definition
A Bayes network represents the joint probability distribution over a collection of random variables
A Bayes network is a directed acyclic graph and a set of conditional probability distributions (CPD’s)
•Each node denotes a random variable
•Edges denote dependencies
•For each node Xi its CPD defines P(Xi | Pa(Xi))
•The joint distribution over all variables is defined to be
P(X1, …, Xn) = ∏i P(Xi | Pa(Xi))
Pa(X) = immediate parents of X in the graph

Bayesian Network
Nodes = random variables
A conditional probability distribution (CPD) is associated with each node N, defining P(N | Parents(N))

[Figure: network with nodes StormClouds, Lightning, Rain, Thunder, WindSurf; edges StormClouds → Lightning, StormClouds → Rain, Lightning → Thunder, and {Lightning, Rain} → WindSurf]

CPD for WindSurf:
Parents    P(W|Pa)   P(¬W|Pa)
L, R       0         1.0
L, ¬R      0         1.0
¬L, R      0.2       0.8
¬L, ¬R     0.9       0.1
The joint distribution over all variables:
P(S, L, R, T, W) = P(S) P(L|S) P(R|S) P(T|L) P(W|L,R)

Bayesian Network
What can we say about conditional independencies in a Bayes Net?
One thing is this:
Each node is conditionally independent of its non-descendents, given only its immediate parents.
[Network diagram and CPD table for WindSurf as on the previous slide.]

Some helpful terminology
Parents = Pa(X) = immediate parents
Antecedents = parents, parents of parents, …
Children = immediate children
Descendents = children, children of children, …

Bayesian Networks
•CPD for each node Xi describes P(Xi | Pa(Xi))
Chain rule of probability says that in general:
P(X1, …, Xn) = ∏i P(Xi | X1, …, Xi−1)
But in a Bayes net:
P(X1, …, Xn) = ∏i P(Xi | Pa(Xi))

How Many Parameters?

[Network diagram and CPD table for WindSurf as above.]
To define joint distribution in general?
To define joint distribution for this Bayes Net?
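As a quick check (assuming all five variables are binary and the edges shown above): an unrestricted joint distribution over five binary variables needs 2^5 − 1 = 31 independent parameters, while this Bayes net needs only 1 (for P(S)) + 2 (for P(L | S)) + 2 (for P(R | S)) + 2 (for P(T | L)) + 4 (for P(W | L, R)) = 11.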

Inference in Bayes Nets

[Network diagram and CPD table for WindSurf as above.]
P(S=1, L=0, R=1, T=0, W=1) = P(S=1) P(L=0|S=1) P(R=1|S=1) P(T=0|L=0) P(W=1|L=0, R=1), where the last factor is 0.2 from the CPD table.
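As a rough illustration (not from the slides), here is a small Python sketch of this computation. Only the CPD for WindSurf is given above, so the other CPD values below are made-up placeholders; the joint probability of a fully observed assignment is simply one CPD lookup per node, multiplied together.

# Hypothetical CPD values except P(W | L, R), which is taken from the table above.
P_S = {1: 0.6, 0: 0.4}                                        # P(StormClouds)   (assumed)
P_L_given_S = {1: {1: 0.3, 0: 0.7}, 0: {1: 0.05, 0: 0.95}}    # P(Lightning | S) (assumed)
P_R_given_S = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}      # P(Rain | S)      (assumed)
P_T_given_L = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.05, 0: 0.95}}    # P(Thunder | L)   (assumed)
P_W_given_LR = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.2, (0, 0): 0.9}  # P(W=1 | L, R) from the table

def joint(s, l, r, t, w):
    """P(S=s, L=l, R=r, T=t, W=w) = P(s) P(l|s) P(r|s) P(t|l) P(w|l,r)."""
    pw1 = P_W_given_LR[(l, r)]
    pw = pw1 if w == 1 else 1.0 - pw1
    return P_S[s] * P_L_given_S[s][l] * P_R_given_S[s][r] * P_T_given_L[l][t] * pw

print(joint(s=1, l=0, r=1, t=0, w=1))   # P(S=1, L=0, R=1, T=0, W=1)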

Learning a Bayes Net

[Network diagram and CPD table for WindSurf as above.]
Consider learning when the graph structure is given, and data = { … }. What is the MLE solution? MAP?
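A minimal Python sketch of what such learning could look like for the WindSurf CPD, assuming the data are fully observed tuples (s, l, r, t, w): the MLE is just a conditional relative frequency, and adding pseudo-counts gives a simple smoothed, MAP-style estimate under a Beta prior. The tiny data set at the bottom is hypothetical.

from collections import Counter

def estimate_cpd_W(data, alpha=0.0):
    """Estimate P(W=1 | L, R) from fully observed examples (s, l, r, t, w).
    alpha = 0 gives the MLE; alpha > 0 adds alpha pseudo-counts per outcome
    (a simple smoothed / MAP-style estimate under a Beta prior)."""
    counts_w1 = Counter()   # counts of W=1 for each (l, r) parent configuration
    counts_all = Counter()  # total counts for each (l, r)
    for (s, l, r, t, w) in data:
        counts_all[(l, r)] += 1
        counts_w1[(l, r)] += w
    cpd = {}
    for lr in [(1, 1), (1, 0), (0, 1), (0, 0)]:
        denom = counts_all[lr] + 2 * alpha
        # With no data and no smoothing the estimate is undefined for this configuration.
        cpd[lr] = (counts_w1[lr] + alpha) / denom if denom > 0 else None
    return cpd

# Tiny hypothetical data set of (s, l, r, t, w) tuples:
data = [(1, 0, 1, 0, 1), (1, 0, 1, 0, 0), (0, 0, 0, 0, 1), (1, 1, 1, 1, 0)]
print(estimate_cpd_W(data))             # MLE
print(estimate_cpd_W(data, alpha=1.0))  # smoothed / MAP-style estimate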

Algorithm for Constructing Bayes Network
•Choose an ordering over variables, e.g., X1, X2, … Xn
•For i = 1 to n
–Add Xi to the network
–Select parents Pa(Xi) as minimal subset of X1 … Xi−1 such that
P(Xi | Pa(Xi)) = P(Xi | X1, …, Xi−1)
Notice this choice of parents assures
P(X1, …, Xn) = ∏i P(Xi | X1, …, Xi−1)   (by chain rule)
             = ∏i P(Xi | Pa(Xi))   (by construction)
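A sketch of this construction loop in Python, assuming a hypothetical oracle is_cond_indep(x, others, given) that can answer the required conditional-independence queries (in practice this knowledge would come from the domain or from statistical tests). Note that the resulting structure depends on the ordering chosen.

from itertools import combinations

def build_network(variables, is_cond_indep):
    """Add variables in the given order; pick as Pa(Xi) a minimal subset of the
    earlier variables such that Xi is independent of the remaining earlier
    variables given Pa(Xi). `is_cond_indep(x, others, given)` is a hypothetical
    oracle (it should return True when `others` is empty)."""
    parents = {}
    for i, x in enumerate(variables):
        predecessors = variables[:i]
        for size in range(len(predecessors) + 1):      # try smallest parent sets first
            found = False
            for pa in combinations(predecessors, size):
                rest = [v for v in predecessors if v not in pa]
                if is_cond_indep(x, rest, list(pa)):   # P(x | pa) = P(x | all predecessors)
                    parents[x] = list(pa)
                    found = True
                    break
            if found:
                break
    return parents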

Example
•Bird flu and Allergies both cause Nasal problems
•Nasal problems cause Sneezes and Headaches

What is the Bayes Network for X1,…X4 with NO assumed conditional independencies?
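One possible answer, using the ordering X1, X2, X3, X4: make every earlier variable a parent of each later one, giving the fully connected DAG with factorization P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3), which encodes no conditional independence assumptions.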

What is the Bayes Network for Naïve Bayes?
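For Naïve Bayes, the class variable Y is the single parent of every feature Xi (Y → Xi for each i), so the network encodes P(Y, X1, …, Xn) = P(Y) ∏i P(Xi | Y): the features are conditionally independent given the class.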

What do we do if variables are a mix of discrete and real valued?

Bayes Network for a Hidden Markov Model
Implies the future is conditionally independent of the past, given the present
Unobserved state: St-2, St-1, St, St+1, St+2
Observed output: Ot-2, Ot-1, Ot, Ot+1, Ot+2
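Written out for a full sequence of states S1, …, ST with outputs O1, …, OT, this graph encodes the factorization
P(S1, …, ST, O1, …, OT) = P(S1) ∏t=2..T P(St | St−1) ∏t=1..T P(Ot | St)
which is exactly the statement that each state depends only on the previous state and each output only on the current state.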

What You Should Know
•Bayes nets are a convenient representation for encoding dependencies / conditional independence
•BN = graph plus parameters of CPD's
–Defines joint distribution over variables
–Can calculate everything else from that
–Though inference may be intractable
•Reading conditional independence relations from the graph
–Each node is cond indep of non-descendents, given only its parents
–‘Explaining away’
See Bayes Net applet: http://www.cs.cmu.edu/~javabayes/Home/applet.html

Inference in Bayes Nets
•In general, intractable (NP-complete)
•For certain cases, tractable
–Assigning probability to fully observed set of variables
–Or if just one variable unobserved
–Or for singly connected graphs (i.e., no undirected loops)
•Belief propagation
•For multiply connected graphs
•Junction tree
•Sometimes use Monte Carlo methods
–Generate many samples according to the Bayes Net distribution, then count up the results
•Variational methods for tractable approximate solutions

Example
•Bird flu and Allergies both cause Sinus problems
•Sinus problems cause Headaches and runny Nose

Prob. of joint assignment: easy
•Suppose we are interested in joint assignment
What is P(f,a,s,h,n)?
let’s use p(a,b) as shorthand for p(A=a, B=b)
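Using the graph above (F and A are the parents of S; S is the parent of H and N), the joint factors as
p(f, a, s, h, n) = p(f) p(a) p(s | f, a) p(h | s) p(n | s)
so the probability of any fully observed assignment is just a product of five CPD lookups.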

Prob. of marginals: not so easy
•How do we calculate P(N=n)?
let’s use p(a,b) as shorthand for p(A=a, B=b)
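Computing a marginal requires summing the joint over all settings of the unobserved variables:
P(N=n) = Σf Σa Σs Σh p(f) p(a) p(s | f, a) p(h | s) p(n | s)
With binary variables this is 2^4 = 16 terms here (the sum over h actually collapses to 1), and in general the number of terms grows exponentially with the number of unobserved variables.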

Generating a sample from joint distribution: easy
How can we generate random samples drawn according to P(F,A,S,H,N)?
let’s use p(a,b) as shorthand for p(A=a, B=b)

Generating a sample from joint distribution: easy
Note we can estimate marginals like P(N=n) by generating many samples from the joint distribution, then counting the fraction of samples for which N=n.
Similarly for anything else we care about, e.g., P(F=1 | H=1, N=0).
→ a weak but general method for estimating any probability term…
let’s use p(a,b) as shorthand for p(A=a, B=b)
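A Python sketch of this ancestral-sampling idea for the flu/allergies network, using made-up CPD values: each variable is sampled from its CPD given the already-sampled values of its parents, and marginals or conditionals are then estimated by counting.

import random

# Illustrative (assumed) CPDs for the network F -> S <- A, S -> H, S -> N.
P_F = 0.05                                                     # P(F=1)
P_A = 0.30                                                     # P(A=1)
P_S = {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.6, (0, 0): 0.05}    # P(S=1 | F, A)
P_H = {1: 0.8, 0: 0.1}                                         # P(H=1 | S)
P_N = {1: 0.7, 0: 0.05}                                        # P(N=1 | S)

def bern(p):
    return 1 if random.random() < p else 0

def sample():
    # Sample variables in topological order, each given its parents' sampled values.
    f = bern(P_F)
    a = bern(P_A)
    s = bern(P_S[(f, a)])
    h = bern(P_H[s])
    n = bern(P_N[s])
    return f, a, s, h, n

samples = [sample() for _ in range(100_000)]
# Estimate P(N=1) as the fraction of samples with N=1.
print(sum(n for (_, _, _, _, n) in samples) / len(samples))
# Estimate P(F=1 | H=1, N=0) by counting only the matching samples.
match = [(f, a, s, h, n) for (f, a, s, h, n) in samples if h == 1 and n == 0]
print(sum(f for (f, _, _, _, _) in match) / len(match) if match else "no matching samples")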

Prob. of marginals: not so easy
But sometimes the structure of the network allows us to be clever → avoid exponential work
e.g., chain
A → B → C → D → E
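A Python sketch of this trick for a binary chain A → B → C → D → E with made-up CPT values: instead of summing the joint over all 2^4 settings of A…D to get P(E), we sum out one variable at a time, which takes work linear in the length of the chain.

# Illustrative (assumed) parameters for a binary chain A -> B -> C -> D -> E.
P_A = {0: 0.4, 1: 0.6}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
P_D_given_C = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.3, 1: 0.7}}
P_E_given_D = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}

# Sum out one variable at a time (variable elimination along the chain).
message = dict(P_A)                          # message(a) = P(A=a)
for cpd in [P_B_given_A, P_C_given_B, P_D_given_C, P_E_given_D]:
    # new_message(child) = sum_parent P(child | parent) * message(parent)
    message = {child: sum(cpd[parent][child] * message[parent] for parent in message)
               for child in (0, 1)}
print(message)                               # message(e) = P(E=e)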

Inference in Bayes Nets
•In general, intractable (NP-complete)
•For certain cases, tractable
–Assigning probability to fully observed set of variables
–Or if just one variable unobserved
–Or for singly connected graphs (i.e., no undirected loops)
•Variable elimination
•Belief propagation
•For multiply connected graphs
•Junction tree
•Sometimes use Monte Carlo methods
–Generate many samples according to the Bayes Net distribution, then count up the results
•Variational methods for tractable approximate solutions
