Bias, Variance and Error

Bias and Variance
Given an algorithm that outputs an estimate θ̂ for a parameter θ, we define:
– the bias of the estimator: Bias(θ̂) = E[θ̂] − θ
– the variance of the estimator: Var(θ̂) = E[(θ̂ − E[θ̂])²]

E.g., consider the estimator θ̂ = (# heads)/n for the probability of heads, based on n independent coin flips. What is its bias? What is its variance?
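As a quick numerical check (not from the slides; the values of θ and n are arbitrary choices for illustration), a minimal simulation of this estimator shows its bias is 0 and its variance is θ(1 − θ)/n:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 0.7, 20, 100_000   # true P(heads), flips per experiment, experiments

# MLE estimator: fraction of heads in n independent flips
estimates = rng.binomial(n, theta, size=trials) / n

print("empirical bias:    ", estimates.mean() - theta)   # ≈ 0 (unbiased)
print("empirical variance:", estimates.var())             # ≈ theta*(1-theta)/n
print("theoretical var:   ", theta * (1 - theta) / n)
```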

Bias and Variance
Given an algorithm that outputs an estimate θ̂ for θ, we define (as above):
– the bias of the estimator: Bias(θ̂) = E[θ̂] − θ
– the variance of the estimator: Var(θ̂) = E[(θ̂ − E[θ̂])²]

Comparing the MLE estimator with a smoothed (MAP-style) estimator for the coin-flip probability: which estimator has higher bias? Which has higher variance?
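A hedged illustration of the comparison (the Beta prior pseudo-counts and the values of θ and n below are assumptions, not taken from the slides): the sketch contrasts the MLE with a MAP estimate under a Beta(α, β) prior, which for small n typically has higher bias but lower variance.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, trials = 0.7, 10, 100_000
alpha = beta = 3                                     # assumed Beta prior pseudo-counts

heads = rng.binomial(n, theta, size=trials)
mle = heads / n                                      # MLE: #heads / n
map_ = (heads + alpha - 1) / (n + alpha + beta - 2)  # mode of the Beta posterior

for name, est in [("MLE", mle), ("MAP", map_)]:
    print(f"{name}: bias={est.mean() - theta:+.4f}  variance={est.var():.4f}")
# Typically: the MAP estimate has larger |bias| but smaller variance than the MLE for small n.
```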

Bias – Variance decomposition of error
Reading: Bishop chapter 9.1, 9.2
• Consider a simple regression problem f: X → Y, with
      y = f(x) + ε
  where f is deterministic and ε is noise drawn from N(0, σ).

Define the expected prediction error:
      E_D[(y − h_D(x))²]
  where the expectation is taken over training sets D, and h_D(x) is the learned estimate of f(x).

Sources of error
What if we have a perfect learner and infinite data?
– Our learned h(x) satisfies h(x) = f(x)
– We still have remaining, unavoidable error: σ², due to the noise ε

Sources of error
• What if we have only n training examples?
• What is our expected error,
  – taken over random training sets of size n, drawn from the distribution D = p(x, y)?

Sources of error

Putting these together, the expected error decomposes into three terms:

    E_D[(y − h_D(x))²] = σ²                                (unavoidable error)
                        + (f(x) − E_D[h_D(x)])²            (bias²)
                        + E_D[(h_D(x) − E_D[h_D(x)])²]     (variance)
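The decomposition can be checked numerically. Below is a minimal sketch (not from the slides): it fits a degree-1 polynomial to noisy samples of a sine function over many random training sets and estimates the noise, bias², and variance terms at one test point. The function, sample size, and model are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                      # true deterministic function f(x)
sigma = 0.3                     # noise std dev; y = f(x) + N(0, sigma)
n, trials = 20, 2000            # training-set size and number of random training sets
x_test = 1.0                    # point at which we evaluate the decomposition

preds = np.empty(trials)
for t in range(trials):
    # draw a random training set D of size n
    x = rng.uniform(0, np.pi, n)
    y = f(x) + rng.normal(0, sigma, n)
    # learn a simple (high-bias) model: degree-1 polynomial fit
    w = np.polyfit(x, y, deg=1)
    preds[t] = np.polyval(w, x_test)           # h_D(x_test)

bias_sq = (f(x_test) - preds.mean()) ** 2      # (f(x) - E_D[h_D(x)])^2
variance = preds.var()                          # E_D[(h_D(x) - E_D[h_D(x)])^2]
noise = sigma ** 2                              # unavoidable error

# The expected squared error at x_test should be close to noise + bias^2 + variance.
empirical_err = np.mean((f(x_test) + rng.normal(0, sigma, trials) - preds) ** 2)
print(f"noise={noise:.3f} bias^2={bias_sq:.3f} variance={variance:.3f}")
print(f"sum={noise + bias_sq + variance:.3f} vs empirical error={empirical_err:.3f}")
```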

L2 vs. L1 Regularization

Gaussian P(W)  → L2 regularization
Laplace P(W)   → L1 regularization

[Figure: contours of constant P(Data|W) and constant P(W) in the (w1, w2) plane for the Gaussian and Laplace priors.]
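A minimal sketch of the practical difference, using scikit-learn's Ridge (L2 penalty) and Lasso (L1 penalty) on synthetic data; the data-generation details and regularization strengths are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]                 # only 3 of 20 features are relevant
y = X @ true_w + rng.normal(scale=0.5, size=200)

l2 = Ridge(alpha=1.0).fit(X, y)               # corresponds to a Gaussian prior on weights
l1 = Lasso(alpha=0.1).fit(X, y)               # corresponds to a Laplace prior on weights

print("L2: nonzero weights =", np.sum(np.abs(l2.coef_) > 1e-6))  # typically all 20, but shrunk
print("L1: nonzero weights =", np.sum(np.abs(l1.coef_) > 1e-6))  # typically few (sparse)
```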

Summary
• Bias of parameter estimators
• Variance of parameter estimators
• We can define analogous notions for estimators (learners) of functions
• Expected error in learned functions comes from
  – unavoidable error (invariant of training set size, due to noise)
  – bias (can be caused by incorrect modeling assumptions)
  – variance (decreases with training set size)
• MAP estimates generally more biased than MLE
  – but bias vanishes as training set size → ∞
• Regularization corresponds to producing MAP estimates
  – L2 / Gaussian prior / leads to smaller weights
  – L1 / Laplace prior / leads to fewer non-zero weights

Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
February 18, 2015
Today:
•Graphical models
•Bayes Nets:
•Representing distributions
•Conditional independencies
•Simple inference
•Simple learning
Readings:
•Bishop chapter 8, through 8.2

Graphical Models
•Key Idea:
–Conditional independence assumptions useful
–but Naïve Bayes is extreme!
–Graphical models express sets of conditional independence assumptions via graph structure
–Graph structure plus associated parameters define joint probability distribution over set of variables
•Two types of graphical models:
–Directed graphs (aka Bayesian Networks)
–Undirected graphs (aka Markov Random Fields)

Graphical Models – Why Care?
• Among most important ML developments of the decade
• Graphical models allow combining:
  – Prior knowledge in form of dependencies/independencies
  – Prior knowledge in form of priors over parameters
  – Observed training data
• Principled and ~general methods for
  – Probabilistic inference
  – Learning
• Useful in practice
  – Diagnosis, help systems, text analysis, time series models, …

Conditional Independence
Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y, given the value of Z:

    (∀ i, j, k)  P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)

which we often write as

    P(X | Y, Z) = P(X | Z)

E.g., P(Thunder | Rain, Lightning) = P(Thunder | Lightning)
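A minimal sketch (not from the slides) that checks this definition numerically on a small hand-built joint distribution over three binary variables, where X depends on Y only through Z; the particular probabilities below are assumptions for illustration.

```python
from itertools import product

# Assumed toy CPDs for binary Z, Y, X where X depends on Y only through Z.
p_z = {0: 0.6, 1: 0.4}
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}    # p_y_given_z[z][y]
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # p_x_given_z[z][x]

# Full joint P(X, Y, Z) built from the factorization above.
joint = {(x, y, z): p_z[z] * p_y_given_z[z][y] * p_x_given_z[z][x]
         for x, y, z in product([0, 1], repeat=3)}

def cond_prob_x(x, y, z):
    """P(X = x | Y = y, Z = z) computed directly from the joint table."""
    num = joint[(x, y, z)]
    den = sum(joint[(xp, y, z)] for xp in [0, 1])
    return num / den

# Verify P(X | Y, Z) = P(X | Z): the conditional should not depend on y.
for x, z in product([0, 1], repeat=2):
    vals = [cond_prob_x(x, y, z) for y in [0, 1]]
    print(f"P(X={x} | Y=*, Z={z}) = {vals}, and P(X={x} | Z={z}) = {p_x_given_z[z][x]}")
```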

Marginal Independence
Definition: X is marginally independent of Y if

    P(X, Y) = P(X) P(Y)

Equivalently, if P(X | Y) = P(X)
Equivalently, if P(Y | X) = P(Y)

Represent Joint Probability Distribution over Variables

Describe network of dependencies

Bayes Nets define Joint Probability Distribution in terms of this graph, plus parameters
Benefits of Bayes Nets:
• Represent the full joint distribution in fewer parameters, using prior knowledge about dependencies
• Algorithms for inference and learning

Bayesian Networks Definition
A Bayes network represents the joint probability distribution over a collection of random variables
A Bayes network is a directed acyclic graph and a set of conditional probability distributions (CPD’s)
• Each node denotes a random variable
• Edges denote dependencies
• For each node Xi, its CPD defines P(Xi | Pa(Xi))
• The joint distribution over all variables is defined to be

    P(X1, …, Xn) = ∏i P(Xi | Pa(Xi))

where Pa(X) = immediate parents of X in the graph.
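A minimal sketch (not from the slides) of how a Bayes net's joint probability is just a product of local CPD lookups. It assumes the StormClouds example structure S → {L, R}, L → T, {L, R} → W; every CPD value below is a made-up placeholder except the WindSurf CPD, which matches the table shown on the following slides.

```python
from itertools import product

# Bayes net = DAG structure + one CPD per node.  A minimal representation:
# each node maps to (parent list, dict of P(node=1 | parent values)).
net = {
    "S": ([],         {(): 0.5}),                                  # P(S=1)   (placeholder)
    "L": (["S"],      {(1,): 0.3, (0,): 0.05}),                    # P(L=1|S) (placeholder)
    "R": (["S"],      {(1,): 0.6, (0,): 0.1}),                     # P(R=1|S) (placeholder)
    "T": (["L"],      {(1,): 0.9, (0,): 0.01}),                    # P(T=1|L) (placeholder)
    "W": (["L", "R"], {(1, 1): 0.0, (1, 0): 0.0,
                       (0, 1): 0.2, (0, 0): 0.9}),                 # P(W=1|L,R) from the slides
}

def joint(assignment, net):
    """P(X1,...,Xn) = prod_i P(Xi | Pa(Xi)) for a full assignment of 0/1 values."""
    prob = 1.0
    for var, (parents, cpd) in net.items():
        p1 = cpd[tuple(assignment[par] for par in parents)]   # P(var = 1 | parent values)
        prob *= p1 if assignment[var] == 1 else 1.0 - p1
    return prob

# Sanity check: the joint sums to 1 over all 2^5 full assignments.
total = sum(joint(dict(zip(net, vals)), net) for vals in product([0, 1], repeat=len(net)))
print(total)  # 1.0
```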

Bayesian Network
StormClouds example.

[Figure: directed acyclic graph over the nodes StormClouds, Lightning, Rain, Thunder, and WindSurf.]

Nodes = random variables.
A conditional probability distribution (CPD) is associated with each node N, defining P(N | Parents(N)). For example, the CPD for WindSurf (W), whose parents are Lightning (L) and Rain (R):

Parents    P(W | Pa)    P(¬W | Pa)
L, R       0            1.0
L, ¬R      0            1.0
¬L, R      0.2          0.8
¬L, ¬R     0.9          0.1

The joint distribution over all variables:

    P(StormClouds, Lightning, Rain, Thunder, WindSurf) = ∏i P(Xi | Pa(Xi))

Bayesian Network
StormClouds example (same graph and WindSurf CPD as above).

What can we say about conditional independencies in a Bayes Net?

One thing is this:
Each node is conditionally independent of its non-descendants, given only its immediate parents.

Some helpful terminology
Parents = Pa(X) = immediate parents
Antecedents = parents, parents of parents, …
Children = immediate children
Descendants = children, children of children, …

Bayesian Networks
• CPD for each node Xi describes P(Xi | Pa(Xi))

Chain rule of probability says that in general:

    P(X1, …, Xn) = ∏i P(Xi | X1, …, Xi−1)

But in a Bayes net:

    P(X1, …, Xn) = ∏i P(Xi | Pa(Xi))

How Many Parameters?

(StormClouds example; same graph and WindSurf CPD as above.)
How many parameters to define the joint distribution in general?
How many to define the joint distribution for this Bayes Net?
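A worked count, assuming the structure StormClouds → {Lightning, Rain}, Lightning → Thunder, {Lightning, Rain} → WindSurf (an assumption based on the standard version of this example): the full joint over 5 Boolean variables needs 2⁵ − 1 = 31 independent parameters, while the Bayes net needs only 1 (for S) + 2 (L | S) + 2 (R | S) + 2 (T | L) + 4 (W | L, R) = 11.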

Inference in Bayes Nets

(StormClouds example; same graph and WindSurf CPD as above.)
P(S=1, L=0, R=1, T=0, W=1) =
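Assuming the structure noted above, the factorization expands this query as P(S=1) · P(L=0 | S=1) · P(R=1 | S=1) · P(T=0 | L=0) · P(W=1 | L=0, R=1); only the last factor, 0.2, can be read off the CPD table in this excerpt, the remaining CPDs are on the original slides. With the placeholder CPDs from the sketch after the Bayes-net definition, the query is a single call:

```python
# Illustrative only: uses the made-up CPD values from the earlier sketch.
print(joint({"S": 1, "L": 0, "R": 1, "T": 0, "W": 1}, net))  # 0.5*0.7*0.6*0.99*0.2 ≈ 0.0416
```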

Learning a Bayes Net

(StormClouds example; same graph and WindSurf CPD as above.)
Consider learning when the graph structure is given, and the data consists of fully observed examples {⟨s, l, r, t, w⟩}. What is the MLE solution? MAP?
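A minimal sketch (not from the slides; the training examples and the smoothing pseudo-counts are made-up assumptions) of estimating one CPD, P(W = 1 | L, R), from fully observed data by counting: the MLE is the empirical fraction per parent configuration, and a MAP-style estimate with a Beta prior just adds pseudo-counts.

```python
from collections import Counter

# Fully observed training examples as dicts; values here are made up.
data = [
    {"S": 1, "L": 0, "R": 1, "T": 0, "W": 1},
    {"S": 1, "L": 0, "R": 1, "T": 0, "W": 0},
    {"S": 0, "L": 0, "R": 0, "T": 0, "W": 1},
    {"S": 1, "L": 1, "R": 1, "T": 1, "W": 0},
]

def estimate_cpd_w(data, alpha=0.0):
    """Estimate P(W=1 | L, R). alpha=0 gives the MLE; alpha>0 gives a
    MAP-style estimate with Beta(alpha+1, alpha+1) smoothing pseudo-counts."""
    counts, totals = Counter(), Counter()
    for ex in data:
        pa = (ex["L"], ex["R"])
        totals[pa] += 1
        counts[pa] += ex["W"]
    return {pa: (counts[pa] + alpha) / (totals[pa] + 2 * alpha) for pa in totals}

print("MLE:", estimate_cpd_w(data))           # empirical fractions
print("MAP:", estimate_cpd_w(data, alpha=1))  # smoothed toward 1/2
```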
