
Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
February 25, 2015
Today:
•Graphical models
•Bayes Nets:
–Inference
–Learning
–EM
Readings:
•Bishop chapter 8
•Mitchell chapter 6

Midterm
•In class on Monday, March 2
•Closed book
•You may bring an 8.5×11 “cheat sheet” of notes
•Covers all material through today
•Be sure to come on time. We’ll start precisely at 12 noon

Bayesian Networks Definition
A Bayes network represents the joint probability distribution over a collection of random variables
A Bayes network is a directed acyclic graph and a set of conditional probability distributions (CPD’s)
•Each node denotes a random variable
•Edges denote dependencies
•For each node Xi its CPD defines P(Xi | Pa(Xi))
•The joint distribution over all variables is defined to be the product of each node’s CPD:
P(X1, …, Xn) = Πi P(Xi | Pa(Xi)), where Pa(X) = immediate parents of X in the graph

What You Should Know
•Bayes nets are a convenient representation for encoding dependencies / conditional independence
•BN = graph plus parameters of CPD’s
–Defines joint distribution over variables
–Can calculate everything else from that
–Though inference may be intractable
•Reading conditional independence relations from the graph
–Each node is cond. indep. of its non-descendants, given only its parents
–X and Y are conditionally independent given Z if Z D-separates every path connecting X to Y
–Marginal independence: special case where Z = {}
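
As a reminder of what these bullets assert (standard definitions, not specific to these slides):
X and Y are conditionally independent given Z  ⇔  P(X, Y | Z) = P(X | Z) · P(Y | Z), equivalently P(X | Y, Z) = P(X | Z)
Marginal independence is the Z = {} special case: P(X, Y) = P(X) · P(Y)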

Inference in Bayes Nets
•In general, intractable (NP-complete)
•For certain cases, tractable
–Assigning probability to fully observed set of variables
–Or if just one variable unobserved
–Or for singly connected graphs (i.e., no undirected loops)
•Belief propagation
•Sometimes use Monte Carlo methods
–Generate many samples according to the Bayes Net distribution, then count up the results
•Variational methods for tractable approximate solutions

Example
•Bird flu and Allergies both cause Sinus problems
•Sinus problems cause Headaches and runny Nose
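
To make the next few slides concrete, here is a minimal Python sketch of this network. It is a sketch only: the graph structure follows the slide, but the numeric CPT values are invented for illustration.

import random

# Hypothetical parameters for the Flu / Allergy / Sinus / Headache / Nose network.
# The structure follows the slide; these numbers are made up for illustration.
P_F1 = 0.05                        # P(Flu = 1)
P_A1 = 0.20                        # P(Allergy = 1)
P_S1 = {(0, 0): 0.05, (0, 1): 0.60,
        (1, 0): 0.70, (1, 1): 0.90}   # P(Sinus = 1 | Flu, Allergy)
P_H1 = {0: 0.10, 1: 0.80}             # P(Headache = 1 | Sinus)
P_N1 = {0: 0.05, 1: 0.75}             # P(Nose = 1 | Sinus)

def bern(p1, value):
    """P(X = value) for a binary variable with P(X = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def joint_prob(f, a, s, h, n):
    """P(f, a, s, h, n): product of each node's CPD given its parents."""
    return (bern(P_F1, f) * bern(P_A1, a) * bern(P_S1[(f, a)], s)
            * bern(P_H1[s], h) * bern(P_N1[s], n))

def sample():
    """Draw one joint sample by sampling parents before children."""
    f = int(random.random() < P_F1)
    a = int(random.random() < P_A1)
    s = int(random.random() < P_S1[(f, a)])
    h = int(random.random() < P_H1[s])
    n = int(random.random() < P_N1[s])
    return f, a, s, h, n

joint_prob evaluates a fully observed assignment; sample draws parents before children, which is the sampling procedure described a few slides below.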

Prob. of joint assignment: easy
•Suppose we are interested in a joint assignment of all five variables
What is P(f,a,s,h,n)?
let’s use p(a,b) as shorthand for p(A=a, B=b)
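
Reading the factorization off this graph (this is just the Bayes net definition applied to the example):
P(f, a, s, h, n) = P(f) · P(a) · P(s | f, a) · P(h | s) · P(n | s)
Each factor is a single CPD lookup, so a fully observed assignment costs one multiplication per node; in the sketch above this is joint_prob(f, a, s, h, n).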

Prob. of marginals: not so easy
•How do we calculate P(N=n)?
let’s use p(a,b) as shorthand for p(A=a, B=b)
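
The marginal requires summing the joint over every assignment of the remaining variables:
P(N = n) = Σf Σa Σs Σh P(f) · P(a) · P(s | f, a) · P(h | s) · P(n | s)
Here that is only 2⁴ = 16 terms, but the number of terms grows exponentially with the number of variables summed out, which is why exact inference is hard in general and why the sampling trick on the next slides is useful.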

Generating a sample from joint distribution: easy
How can we generate random samples drawn according to P(F,A,S,H,N)?
Hint: random sample of F according to P(F=1) = θF=1:
•draw a value of r uniformly from [0,1]
•if r < θ then output F=1, else F=0
Solution:
•draw a random value f for F, using its CPD
•then draw values for A, for S|A,F, for H|S, for N|S

Generating a sample from joint distribution: easy
Note we can estimate marginals like P(N=n) by generating many samples from the joint distribution, then counting the fraction of samples for which N=n
Similarly for anything else we care about, e.g. P(F=1 | H=1, N=0)
→ a weak but general method for estimating any probability term

Inference in Bayes Nets
•In general, intractable (NP-complete)
•For certain cases, tractable
–Assigning probability to fully observed set of variables
–Or if just one variable unobserved
–Or for singly connected graphs (i.e., no undirected loops)
•Variable elimination
•Belief propagation
•Often use Monte Carlo methods
–e.g., generate many samples according to the Bayes Net distribution, then count up the results
–Gibbs sampling
•Variational methods for tractable approximate solutions
see Graphical Models course 10-708

Learning of Bayes Nets
•Four categories of learning problems
–Graph structure may be known / unknown
–Variable values may be fully observed / partly unobserved
•Easy case: learn parameters when graph structure is known and data is fully observed
•Interesting case: graph known, data partly observed
•Gruesome case: graph structure unknown, data partly unobserved

Learning CPTs from Fully Observed Data
•Example: consider learning the parameter θs|ij ≡ P(S=1 | F=i, A=j)
•Max Likelihood Estimate is
θ̂s|ij = Σk δ(f(k)=i, a(k)=j, s(k)=1) / Σk δ(f(k)=i, a(k)=j), where k indexes training examples
•Remember why?
δ(x) = 1 if x=true, = 0 if x=false

MLE estimate of θs|ij from fully observed data
•Maximum likelihood estimate: choose the θ that maximizes the probability of the observed data
•Our case: the count ratio above, computed from fully observed Flu/Allergy/Sinus/Headache/Nose examples

Estimate θs|ij from partly observed data
•What if F, A, H, N are observed, but not S?
•Can’t calculate the MLE
•Let X be all observed variable values (over all examples)
•Let Z be all unobserved variable values
•Can’t calculate the MLE, because the counts involving S are never observed
•WHAT TO DO?

Estimate θs|ij from partly observed data
•What if F, A, H, N are observed, but not S?  Can’t calculate the MLE
•EM seeks* to estimate: the θ that maximizes E_{Z|X,θ}[ log P(X, Z | θ) ]
•here, observed X = {F,A,H,N}, unobserved Z = {S}
* EM guaranteed to find local maximum

EM Algorithm – Informally
EM is a general procedure for learning from partly observed data
Given observed variables X, unobserved Z (X={F,A,H,N}, Z={S})
Begin with arbitrary choice for parameters θ
Iterate until convergence:
•E Step: estimate the values of unobserved Z, using θ
•M Step: use observed values plus E-step estimates to derive a better θ
Guaranteed to find local maximum. Each iteration increases log P(X | θ)

EM Algorithm – Precisely
EM is a general procedure for learning from partly observed data
Given observed variables X, unobserved Z (X={F,A,H,N}, Z={S})
Define Q(θ’ | θ) = E_{Z|X,θ}[ log P(X, Z | θ’) ]
Iterate until convergence:
•E Step: Use X and current θ to calculate P(Z|X,θ)
•M Step: Replace current θ by argmax_{θ’} Q(θ’ | θ)
Guaranteed to find local maximum. Each iteration increases log P(X | θ)

E Step: Use X, θ, to Calculate P(Z|X,θ)
observed X = {F,A,H,N}, unobserved Z = {S}
•How? Bayes net inference problem.
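
Spelling that inference out for a single training example with observed values f, a, h, n (a step the slide implies rather than writes out):
P(S=1 | f, a, h, n; θ) = P(f) · P(a) · P(S=1 | f, a) · P(h | S=1) · P(n | S=1) / Σs∈{0,1} P(f) · P(a) · P(s | f, a) · P(h | s) · P(n | s)
The P(f) · P(a) factors cancel, so only the CPDs that mention S actually matter; this posterior is exactly the expected count used in the M step on the next slide.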

EM and estimating θs|ij
observed X = {F,A,H,N}, unobserved Z = {S}
E step: Calculate P(Z(k) | X(k); θ) for each training example, k
M step: update all relevant parameters. For example:
θ̂s|ij = Σk δ(f(k)=i, a(k)=j) · P(S(k)=1 | f(k), a(k), h(k), n(k); θ) / Σk δ(f(k)=i, a(k)=j)
Recall the MLE was the same ratio with δ(s(k)=1, f(k)=i, a(k)=j) in the numerator; EM replaces that unobserved count with its expected value.

EM and estimating θ, more generally
Given observed set X, unobserved set Z of boolean values
E step: Calculate, for each training example k, the expected value of each unobserved variable
M step: Calculate estimates similar to MLE, but replacing each count by its expected count

Using Unlabeled Data to Help Train Naïve Bayes Classifier
Learn P(Y|X) when some examples are labeled and some are not:
Y   X1  X2  X3  X4
1   0   0   1   1
0   0   1   0   0
0   0   0   1   0
?   0   1   1   0
?   0   1   0   1

EM and estimating θ for this classifier
Given observed set X, unobserved set Y of boolean values
E step: Calculate, for each training example k, the expected value of the unobserved variable Y
M step: Calculate estimates similar to MLE, but replacing each count by its expected count
let’s use y(k) to indicate value of Y on kth example
MLE would be θ̂Y=1 = Σk δ(y(k)=1) / (number of examples); under EM, δ(y(k)=1) is replaced by P(Y(k)=1 | x(k); θ) for the unlabeled examples

Experimental Evaluation (from [Nigam et al., 2000])
•Newsgroup postings
–20 newsgroups, 1000/group
•Web page classification
–student, faculty, course, project
–4199 web pages
•Reuters newswire articles
–12,902 articles
–90 topic categories

20 Newsgroups
[figure: words w ranked by P(w|Y=course) / P(w|Y ≠ course)]
[figure: 20 Newsgroups results using one labeled example per class]

Bayes Nets – What You Should Know
•Representation
–Bayes nets represent joint distribution as a DAG + conditional distributions
–D-separation lets us decode conditional independence assumptions
•Inference
–NP-hard in general
–For some graphs, some queries, exact inference is tractable
–Approximate methods too, e.g., Monte Carlo methods, …
•Learning
–Easy for known graph, fully observed data (MLE’s, MAP est.)
–EM for partly observed data, known graph
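
To make the EM procedure concrete, here is a minimal, self-contained Python sketch of the semi-supervised Naïve Bayes case above. It is an illustrative sketch, not Nigam et al.’s system: the five data rows mirror the small table above, and the smoothing constant and iteration count are invented.

# EM for Naive Bayes with binary label Y and binary features X1..X4,
# where some labels are missing (None).
data = [
    (1,    [0, 0, 1, 1]),
    (0,    [0, 1, 0, 0]),
    (0,    [0, 0, 1, 0]),
    (None, [0, 1, 1, 0]),
    (None, [0, 1, 0, 1]),
]
N_FEATURES = 4
EPS = 1e-3  # small prior count, keeps probabilities away from 0 and 1

def nb_joint(y, x, prior, cond):
    """P(Y=y) * prod_i P(Xi=xi | Y=y) under the Naive Bayes assumption."""
    p = prior if y == 1 else 1.0 - prior
    for i, xi in enumerate(x):
        p *= cond[y][i] if xi == 1 else 1.0 - cond[y][i]
    return p

def em(data, n_iters=50):
    prior = 0.5                                      # P(Y=1), arbitrary start
    cond = {y: [0.5] * N_FEATURES for y in (0, 1)}   # cond[y][i] = P(Xi=1 | Y=y)
    for _ in range(n_iters):
        # E step: expected value of Y for each example (its label if observed).
        q = []
        for y, x in data:
            if y is not None:
                q.append(float(y))
            else:
                p1 = nb_joint(1, x, prior, cond)
                p0 = nb_joint(0, x, prior, cond)
                q.append(p1 / (p1 + p0))
        # M step: MLE-style ratios with counts replaced by expected counts.
        n1 = sum(q)
        n0 = len(data) - n1
        prior = (n1 + EPS) / (len(data) + 2 * EPS)
        for i in range(N_FEATURES):
            exp1 = sum(qk * x[i] for qk, (_, x) in zip(q, data))
            exp0 = sum((1 - qk) * x[i] for qk, (_, x) in zip(q, data))
            cond[1][i] = (exp1 + EPS) / (n1 + 2 * EPS)
            cond[0][i] = (exp0 + EPS) / (n0 + 2 * EPS)
    return prior, cond

prior, cond = em(data)
print("estimated P(Y=1):", round(prior, 3))

Each E step is a tiny inference problem (the posterior over the missing Y), and each M step is the usual count-ratio estimate with expected counts, matching the recipe on the “EM and estimating θ” slides.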
