[SOLVED] R C math graph network Bayesian Gaussian Processes Part I The Linear Algebra of Inference

$25

File Name: R_C_math_graph_network_Bayesian_Gaussian_Processes__Part_I_The_Linear_Algebra_of_Inference.zip
File Size: 847.8 KB


Gaussian Processes, Part I: The Linear Algebra of Inference
Philipp Hennig
MLSS 2013, 29 August 2013
Max Planck Institute for Intelligent Systems, Department of Empirical Inference
Tübingen, Germany

Carl Friedrich Gauss (1777–1855)
Paying Tolls with a Bell

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)

The Gaussian distribution
Multivariate Form

\mathcal{N}(x; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)

x, \mu \in \mathbb{R}^N, \qquad \Sigma \in \mathbb{R}^{N \times N}

\Sigma is positive semidefinite, i.e. v^\top \Sigma v \ge 0 for all v \in \mathbb{R}^N
(Hermitian, all eigenvalues \ge 0)
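As a small aside (not from the slides), the density is usually evaluated in log space through a Cholesky factor of Sigma rather than by forming its inverse explicitly; the anonymous function below is a minimal sketch with made-up argument names.

% illustration (not part of the slides): log N(x; mu, Sigma) via a Cholesky
% factor, for column vectors x, mu and a positive definite Sigma
logmvnpdf = @(x, mu, Sigma) ...
    - 0.5 * numel(x) * log(2*pi) ...
    - sum(log(diag(chol(Sigma)))) ...
    - 0.5 * sum((chol(Sigma)' \ (x - mu)).^2);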

Why Gaussian?
an experiment

nothing in the real world is Gaussian (except sums of i.i.d. variables)
But nothing in the real world is linear either!
Gaussians are for inference what linear maps are for algebra.

Closure Under Multiplication
multiple Gaussian factors form a Gaussian

\mathcal{N}(x; a, A)\,\mathcal{N}(x; b, B) = \mathcal{N}(x; c, C)\,\mathcal{N}(a; b, A + B)
\quad\text{with}\quad C = (A^{-1} + B^{-1})^{-1}, \qquad c = C(A^{-1}a + B^{-1}b)

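As a quick illustration (not from the slides), the parameters of the product can be computed directly; a, A, b, B below are made-up example values.

% sketch (not part of the slides): combining two Gaussian factors over x
a = [0; 0];   A = [2, 0; 0, 2];          % first factor  N(x; a, A)
b = [1; 1];   B = [1, 0.5; 0.5, 1];      % second factor N(x; b, B)
C = inv(inv(A) + inv(B));                % covariance of the product
c = C * (A \ a + B \ b);                 % mean of the product
% the remaining normalization constant is N(a; b, A + B)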

Closure under Linear Maps
Linear Maps of Gaussians are Gaussians

p(z) = \mathcal{N}(z; \mu, \Sigma) \;\Longrightarrow\; p(Az) = \mathcal{N}(Az; A\mu, A\Sigma A^\top)

Here: A = (1, -0.5)
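A small Monte Carlo check of this closure property (not from the slides); mu, Sigma, and A below are made-up values matching the projection shown here.

% sketch (not part of the slides): empirical check of p(Az) = N(A*mu, A*Sigma*A')
mu    = [0; 1];   Sigma = [2, 0.5; 0.5, 1];   A = [1, -0.5];
z     = bsxfun(@plus, mu, chol(Sigma)' * randn(2, 1e5));   % samples of z
Az    = A * z;
[mean(Az), A * mu]               % empirical vs. analytic mean
[var(Az),  A * Sigma * A']       % empirical vs. analytic variance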

Closure under Marginalization
projections of Gaussians are Gaussian

projection with A = (1 \;\; 0):

\int \mathcal{N}\!\left(\begin{pmatrix} x \\ y \end{pmatrix}; \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}\right) dy = \mathcal{N}(x; \mu_x, \Sigma_{xx})

this is the sum rule: \int p(x, y)\, dy = \int p(y \mid x)\, p(x)\, dy = p(x)

so every finite-dimensional Gaussian is a marginal of infinitely many more

Closure under Conditioning
cuts through Gaussians are Gaussians

p(x \mid y) = \frac{p(x, y)}{p(y)} = \mathcal{N}\!\left(x;\ \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y),\ \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right)

this is the product rule
so Gaussians are closed under the rules of probability
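For concreteness (not from the slides), here is the conditioning formula applied to a made-up two-dimensional joint with one x-block and one y-block.

% sketch (not part of the slides): conditioning a joint Gaussian on y
mu_x = 0;   mu_y = 1;                       % joint mean blocks
S_xx = 4;   S_xy = 1.5;   S_yy = 2;         % joint covariance blocks
y    = 2.5;                                 % observed value
m_cond = mu_x + S_xy / S_yy * (y - mu_y);   % E[x | y]
v_cond = S_xx - S_xy / S_yy * S_xy;         % var[x | y]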

Bayesian Inference
explaining away

prior: p(x) = \mathcal{N}(x; \mu, \Sigma) = \mathcal{N}\!\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}; \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}, \begin{pmatrix} 3^2 & 0 \\ 0 & 3^2 \end{pmatrix}\right)

likelihood: p(y \mid x, \sigma) = \mathcal{N}(y; A^\top x, \sigma^2) = \mathcal{N}\!\left(6; (1 \;\; 0.6)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, 2\right)

posterior: p(x \mid y) \propto p(x)\, p(y \mid x)
= \mathcal{N}\!\left(x;\ \mu + \Sigma A (A^\top \Sigma A + \sigma^2)^{-1}(y - A^\top \mu),\ \Sigma - \Sigma A (A^\top \Sigma A + \sigma^2)^{-1} A^\top \Sigma\right)
= \mathcal{N}\!\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}; \begin{pmatrix} 3.9 \\ 2.3 \end{pmatrix}, \begin{pmatrix} 3.4 & -3.4 \\ -3.4 & 7.0 \end{pmatrix}\right)
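A minimal numerical sketch (not from the slides) that reproduces the explaining-away example with the prior and likelihood values read off this slide; the results come out close to the rounded numbers printed above.

% sketch (not part of the slides): the explaining-away example in numbers
mu    = [1; 0.5];   Sigma = 9 * eye(2);         % prior
A     = [1; 0.6];   y = 6;   s2 = 2;            % observation y = A'x + noise
G     = A' * Sigma * A + s2;                    % scalar innovation variance
mpost = mu + Sigma * A * ((y - A' * mu) / G);   % close to (3.9, 2.3)
Vpost = Sigma - (Sigma * A) * (A' * Sigma) / G; % close to [3.4 -3.4; -3.4 7.0]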

What can we do with this?
linear regression

given y \in \mathbb{R}^N and p(y \mid f), what is f?

A prior
over linear functions

f(x) = w_1 + w_2 x = \phi(x)^\top w, \qquad \phi(x) = \begin{pmatrix} 1 \\ x \end{pmatrix}

p(w) = \mathcal{N}(w; \mu, \Sigma) \;\Longrightarrow\; p(f) = \mathcal{N}(f; \phi_x^\top \mu,\ \phi_x^\top \Sigma\, \phi_x)


The posterior
over linear functions

p(y \mid w, X) = \mathcal{N}(y; \phi_X^\top w, \sigma^2 I)

p(w \mid y, X) = \mathcal{N}\!\left(w;\ \mu + \Sigma\phi_X(\phi_X^\top \Sigma \phi_X + \sigma^2 I)^{-1}(y - \phi_X^\top \mu),\ \Sigma - \Sigma\phi_X(\phi_X^\top \Sigma \phi_X + \sigma^2 I)^{-1}\phi_X^\top \Sigma\right)

p(f(x) \mid y, X) = \mathcal{N}\!\left(f(x);\ \phi_x^\top\mu + \phi_x^\top\Sigma\phi_X(\phi_X^\top \Sigma \phi_X + \sigma^2 I)^{-1}(y - \phi_X^\top \mu),\ \phi_x^\top\Sigma\phi_x - \phi_x^\top\Sigma\phi_X(\phi_X^\top \Sigma \phi_X + \sigma^2 I)^{-1}\phi_X^\top \Sigma\phi_x\right)

% prior on w
F     = 2;                                   % number of features
phi   = @(a) bsxfun(@power, a, 0:F-1);       % features of x
mu    = zeros(F, 1);                         % p(w) = N(w; mu, Sigma)
Sigma = eye(F);

% prior on f(x)
n     = 100;  x = linspace(-6, 6, n)';       % test points
phix  = phi(x);
m     = phix * mu;
kxx   = phix * Sigma * phix';                % p(f(x)) = N(m, kxx)
s     = bsxfun(@plus, m, chol(kxx + 1.0e-8 * eye(n))' * randn(n, 3));  % samples from prior
stdpi = sqrt(diag(kxx));                     % marginal stddev, for plotting

load data.mat;  N = length(Y);               % gives Y, X, sigma

% posterior on Y = f(X)
phiX  = phi(X);                              % features of data
M     = phiX * mu;
kXX   = phiX * Sigma * phiX';                % p(f(X)) = N(M, kXX)
G     = kXX + sigma^2 * eye(N);              % p(Y) = N(M, kXX + sigma^2 I)
R     = chol(G);                             % most expensive step: O(N^3)
kxX   = phix * Sigma * phiX';                % cov(f(x), f(X)) = kxX
A     = kxX / R;                             % precompute for reuse
mpost = m + A * (R' \ (Y - M));              % posterior mean: m + kxX (kXX + sigma^2 I)^-1 (Y - M)
vpost = kxx - A * A';                        % posterior cov:  kxx - kxX (kXX + sigma^2 I)^-1 kXx
spost = bsxfun(@plus, mpost, chol(vpost + 1.0e-8 * eye(n))' * randn(n, 3));  % samples
stdpo = sqrt(diag(vpost));                   % marginal stddev, for plotting

A More Realistic Dataset
General Linear Regression

f(x) = \phi(x)^\top w, \qquad \phi = ?

f(x) = w_1 + w_2 x = \phi(x)^\top w, \qquad \phi(x) = \begin{pmatrix} 1 \\ x \end{pmatrix}


Cubic Regression

phi = @(a) bsxfun(@power, a, 0:3);

f(x) = \phi(x)^\top w, \qquad \phi(x) = (1,\ x,\ x^2,\ x^3)^\top

Septic Regression?

phi = @(a) bsxfun(@power, a, 0:7);

f(x) = \phi(x)^\top w, \qquad \phi(x) = (1,\ x,\ x^2,\ \dots,\ x^7)^\top

Fourier Regression

phi = @(a) 2 * [cos(bsxfun(@times, a/8, 0:8)), sin(bsxfun(@times, a/8, 1:8))];

\phi(x) = (\cos x,\ \cos 2x,\ \cos 3x,\ \dots,\ \sin x,\ \sin 2x,\ \dots)^\top

Step Regression

phi = @(a) 1 - 2 * bsxfun(@lt, a, linspace(-8, 8, 16));

\phi(x) = \bigl(1 - 2\,\mathbb{I}(x < c_1),\ \dots,\ 1 - 2\,\mathbb{I}(x < c_{16})\bigr)^\top, \qquad c_l \text{ equally spaced in } [-8, 8]

Another Kind of Step Regression

phi = @(a) bsxfun(@gt, a, linspace(-8, 8, 16));

\phi(x) = \bigl(\mathbb{I}(x > c_1),\ \dots,\ \mathbb{I}(x > c_{16})\bigr)^\top, \qquad c_l \text{ equally spaced in } [-8, 8]

V Regression

phi = @(a) bsxfun(@minus, abs(bsxfun(@minus, a, linspace(-8, 8, 16))), linspace(-8, 8, 16));

\phi(x) = \bigl(|x - c_1| - c_1,\ \dots,\ |x - c_{16}| - c_{16}\bigr)^\top, \qquad c_l \text{ equally spaced in } [-8, 8]

Legendre Regression

phi = @(a) bsxfun(@times, legendre(13, a/8)', 0.15.^(0:13));

\phi(x) = \bigl(b^0 P_0(x),\ b^1 P_1(x),\ \dots,\ b^{13} P_{13}(x)\bigr)^\top, \qquad P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n}\bigl(x^2 - 1\bigr)^n

Eiffel Tower Regression

phi = @(a) exp(-abs(bsxfun(@minus, a, -8:1:8)));

\phi(x) = \bigl(e^{-|x+8|},\ e^{-|x+7|},\ e^{-|x+6|},\ \dots\bigr)^\top

Bell Curve Regression

phi = @(a) exp(-0.5 * bsxfun(@minus, a, -8:1:8).^2);

\phi(x) = \bigl(e^{-\frac{1}{2}(x+8)^2},\ e^{-\frac{1}{2}(x+7)^2},\ e^{-\frac{1}{2}(x+6)^2},\ \dots\bigr)^\top

Multiple Inputs
all this works for \phi in multiple dimensions, too

\phi_l: \mathbb{R}^N \to \mathbb{R}, \qquad f: \mathbb{R}^N \to \mathbb{R}

Multiple Outputs
learning paths: slightly more confusing, but no algebraic problem

\phi: \mathbb{R} \to \mathbb{R}^F, \qquad f: \mathbb{R} \to \mathbb{R}^M, \qquad \mathrm{cov}\bigl(f_i(t_k), f_j(t_l)\bigr) = \phi_i(t_k)^\top \Sigma\, \phi_j(t_l)

\bigl(f_1(t_1), \dots, f_1(t_N),\ f_2(t_1), \dots, f_2(t_N),\ \dots,\ f_M(t_1), \dots, f_M(t_N)\bigr)
are just some covarying Gaussian variables; requires careful matrix algebra
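As one simple special case (not the general construction on this slide), a separable model in which every output shares the same input kernel can be written with a Kronecker product; the kernel, the output covariance B, and the inputs below are all made up for this sketch.

% sketch (not part of the slides): a separable two-output covariance,
% cov(f_i(t), f_j(t')) = B(i,j) * k(t, t')
k = @(a,b) exp(-0.25 * bsxfun(@minus, a, b').^2);   % shared input kernel
B = [1, 0.8; 0.8, 1];                               % made-up output covariance
t = linspace(0, 1, 50)';
K = kron(B, k(t, t));                               % joint covariance of [f_1(t); f_2(t)]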

How many features should we use?
let's look at that algebra again

p(f(x) \mid y, X) = \mathcal{N}\!\left(f(x);\ \phi_x^\top\mu + \phi_x^\top\Sigma\phi_X(\phi_X^\top \Sigma \phi_X + \sigma^2 I)^{-1}(y - \phi_X^\top \mu),\ \phi_x^\top\Sigma\phi_x - \phi_x^\top\Sigma\phi_X(\phi_X^\top \Sigma \phi_X + \sigma^2 I)^{-1}\phi_X^\top \Sigma\phi_x\right)

there's no lonely \phi in there: all objects involving \phi are of the form
  the mean function  m(x) = \phi_x^\top \mu
  the kernel         k(x, x') = \phi_x^\top \Sigma\, \phi_{x'}
once these are known, the cost is independent of the number of features

remember the code:
  M   = phiX * mu;              m   = phix * mu;
  kXX = phiX * Sigma * phiX';   kxx = phix * Sigma * phix';
  kxX = phix * Sigma * phiX';
  % p(f(X)) = N(M, kXX),  p(f(x)) = N(m, kxx),  cov(f(x), f(X)) = kxX


% prior
F   = 2;                                   % number of features
phi = @(a) bsxfun(@power, a, 0:F-1);       % features (only used inside k)
k   = @(a,b) phi(a) * phi(b)';             % kernel
mu  = @(a) zeros(size(a, 1), 1);           % mean function

% belief on f(x)
n     = 100;  x = linspace(-6, 6, n)';     % test points
m     = mu(x);
kxx   = k(x, x);                           % p(f(x)) = N(m, kxx)
s     = bsxfun(@plus, m, chol(kxx + 1.0e-8 * eye(n))' * randn(n, 3));  % samples from prior
stdpi = sqrt(diag(kxx));                   % marginal stddev, for plotting

load data.mat;  N = length(Y);             % gives Y, X, sigma

% posterior on Y = f(X)
M     = mu(X);
kXX   = k(X, X);                           % p(f(X)) = N(M, kXX)
G     = kXX + sigma^2 * eye(N);            % p(Y) = N(M, kXX + sigma^2 I)
R     = chol(G);                           % most expensive step: O(N^3)
kxX   = k(x, X);                           % cov(f(x), f(X)) = kxX, precompute for reuse
A     = kxX / R;
mpost = m + A * (R' \ (Y - M));            % posterior mean: m + kxX (kXX + sigma^2 I)^-1 (Y - M)
vpost = kxx - A * A';                      % posterior cov:  kxx - kxX (kXX + sigma^2 I)^-1 kXx
spost = bsxfun(@plus, mpost, chol(vpost + 1.0e-8 * eye(n))' * randn(n, 3));  % samples
stdpo = sqrt(diag(vpost));                 % marginal stddev, for plotting
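Once the code is written purely in terms of k and mu, any Mercer kernel can be swapped in directly; a small sketch (not from the slides), with made-up amplitude and length-scale values:

% sketch (not part of the slides): plugging a kernel in directly,
% without ever writing down the (possibly infinite) feature set
theta  = 5;  lambda = 2;                                          % made-up hyperparameters
k      = @(a,b) theta * exp(-0.5 * bsxfun(@minus, a, b').^2 / lambda^2);
mu     = @(a) zeros(size(a, 1), 1);
% the inference code above runs unchanged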

Features are cheap, so let's use a lot
an example (D.J.C. MacKay, 1998)

For simplicity, let's fix \Sigma = \sigma^2 \frac{c_{\max} - c_{\min}}{F} I and use the features

phi = @(a) exp(-0.5 * bsxfun(@minus, a, -8:1:8).^2 ./ lambda^2);

\phi_l(x) = \exp\left(-\frac{(x - c_l)^2}{2\lambda^2}\right)

The elements of \phi_x^\top \Sigma\, \phi_{x'} are

k(x_i, x_j) = \frac{\sigma^2 (c_{\max} - c_{\min})}{F} \sum_{l=1}^{F} \phi_l(x_i)\, \phi_l(x_j)
            = \frac{\sigma^2 (c_{\max} - c_{\min})}{F} \sum_{l=1}^{F} \exp\left(-\frac{(x_i - c_l)^2}{2\lambda^2}\right) \exp\left(-\frac{(x_j - c_l)^2}{2\lambda^2}\right)
            = \frac{\sigma^2 (c_{\max} - c_{\min})}{F} \exp\left(-\frac{(x_i - x_j)^2}{4\lambda^2}\right) \sum_{l=1}^{F} \exp\left(-\frac{\bigl(c_l - \tfrac{1}{2}(x_i + x_j)\bigr)^2}{\lambda^2}\right)

Now increase F, so that the density of features in (c_{\min}, c_{\max}) grows and the sum becomes an integral:

k(x_i, x_j) \to \sigma^2 \exp\left(-\frac{(x_i - x_j)^2}{4\lambda^2}\right) \int_{c_{\min}}^{c_{\max}} \exp\left(-\frac{\bigl(c - \tfrac{1}{2}(x_i + x_j)\bigr)^2}{\lambda^2}\right) dc

Let c_{\min} \to -\infty, c_{\max} \to \infty:

k(x_i, x_j) = \sigma^2 \sqrt{\pi\lambda^2}\, \exp\left(-\frac{(x_i - x_j)^2}{4\lambda^2}\right)
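A quick numerical check of this limit (not from the slides); the length scale, the integration range, and the two test inputs below are made-up values.

% sketch (not part of the slides): the finite feature sum approaches the
% squared-exponential limit as the number of features F grows
lambda = 1;  cmin = -8;  cmax = 8;  xi = 0.3;  xj = -0.7;
for F = [10, 100, 1000]
    c  = linspace(cmin, cmax, F);
    kF = (cmax - cmin) / F * sum(exp(-(xi - c).^2 / (2*lambda^2)) ...
                              .* exp(-(xj - c).^2 / (2*lambda^2)));
    fprintf('F = %4d:  k = %.4f\n', F, kF);
end
fprintf('limit:      k = %.4f\n', sqrt(pi) * lambda * exp(-(xi - xj)^2 / (4*lambda^2)));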

Exponentiated Squares

phi = @(a) exp(-0.5 * bsxfun(@minus, a, linspace(-8, 8, 10)).^2 ./ ell^2);   % 10 features
phi = @(a) exp(-0.5 * bsxfun(@minus, a, linspace(-8, 8, 30)).^2 ./ ell^2);   % 30 features
k   = @(a,b) 5 * exp(-0.25 * bsxfun(@minus, a, b').^2);                      % infinitely many features

a.k.a. radial basis function, squared-exponential kernel

What just happened?
kernelization: to infinitely many features

Definition. A function k: \mathcal{X} \times \mathcal{X} \to \mathbb{R} is a Mercer kernel if, for any finite collection X = \{x_1, \dots, x_N\}, the matrix k_{XX} \in \mathbb{R}^{N \times N} with elements k_{XX,ij} = k(x_i, x_j) is positive semidefinite.

Lemma. Any kernel that can be written as
k(x, x') = \int \phi_l(x)\, \phi_l(x')\, dl
is a Mercer kernel (assuming the integral is over a positive measure).

Proof: for X \in \mathcal{X}^N and v \in \mathbb{R}^N,
v^\top k_{XX} v = \sum_i \sum_j v_i v_j \int \phi_l(x_i)\, \phi_l(x_j)\, dl = \int \Bigl(\sum_i v_i \phi_l(x_i)\Bigr)^2 dl \ge 0.

Gaussian process priors

Definition. Let \mu: \mathcal{X} \to \mathbb{R} be any function and k: \mathcal{X} \times \mathcal{X} \to \mathbb{R} a Mercer kernel. A Gaussian process p(f) = \mathcal{GP}(f; \mu, k) is a probability distribution over the function f: \mathcal{X} \to \mathbb{R}, such that every finite restriction to function values f_X = \{f(x_1), \dots, f(x_N)\} is a Gaussian distribution p(f_X) = \mathcal{N}(f_X; \mu_X, k_{XX}).
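To make the Mercer property concrete (not from the slides), the positive semidefiniteness of a Gram matrix can be checked numerically; the kernel and the random input set below are made up.

% sketch (not part of the slides): empirical check of the Mercer property
k   = @(a,b) exp(-0.25 * bsxfun(@minus, a, b').^2);   % squared-exponential kernel
X   = randn(20, 1);                                   % an arbitrary finite collection
kXX = k(X, X);
min(eig((kXX + kXX') / 2))                            % >= 0 up to round-off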

Those step functions

phi = @(a) bsxfun(@gt, a, linspace(-8, 8, 5))   / sqrt(5);
phi = @(a) bsxfun(@gt, a, linspace(-8, 8, 20))  / sqrt(20);
phi = @(a) bsxfun(@gt, a, linspace(-8, 8, 100)) / sqrt(100);
k   = @(a,b) theta.^2 * bsxfun(@min, a + 8, (b + 8)') / 16;     % infinitely many features

\mathrm{cov}\bigl(f(x_i), f(x_j)\bigr) = \int_{c_{\min}}^{c_{\max}} \mathbb{I}(x_i > c)\, \mathbb{I}(x_j > c)\, dc = \min(x_i, x_j) - c_{\min}

a.k.a. the Wiener process

[factor graph over f_1, \dots, f_5 with observations y_1, \dots, y_5]
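A minimal sketch (not from the slides) that draws sample paths from this Wiener-process kernel, with a made-up value for theta:

% sketch (not part of the slides): sample paths under the Wiener kernel
theta = 1;                                              % made-up scale
k = @(a,b) theta.^2 * bsxfun(@min, a + 8, (b + 8)') / 16;
t = linspace(-8, 8, 200)';
K = k(t, t);
s = chol(K + 1e-8 * eye(numel(t)))' * randn(numel(t), 3);   % three sample paths
plot(t, s);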

Those other step functions (Wahba, 1990)

phi = @(a) 1 - 2 * bsxfun(@lt, a, linspace(-8, 8, 5));
phi = @(a) 1 - 2 * bsxfun(@lt, a, linspace(-8, 8, 20));
phi = @(a) 1 - 2 * bsxfun(@lt, a, linspace(-8, 8, 100));
k   = @(a,b) 1 + c - 2 * c * abs(bsxfun(@minus, a, b')) / 16;   % infinitely many features

\mathrm{cov}\bigl(f(x_i), f(x_j)\bigr) = 1 + b \int_0^1 \bigl(1 - 2\,\mathbb{I}(x_i < c)\bigr)\bigl(1 - 2\,\mathbb{I}(x_j < c)\bigr)\, dc = 1 + b - 2b\,|x_i - x_j|
(inputs rescaled to [0, 1]; the division by 16 in the code plays that role)

a.k.a. linear splines

Those linear features (Wahba, 1990)

phi = @(a) bsxfun(@minus, abs(bsxfun(@minus, a, linspace(-8, 8, 5))),   linspace(-8, 8, 5));
phi = @(a) bsxfun(@minus, abs(bsxfun(@minus, a, linspace(-8, 8, 20))),  linspace(-8, 8, 20));
phi = @(a) bsxfun(@minus, abs(bsxfun(@minus, a, linspace(-8, 8, 100))), linspace(-8, 8, 100));

With infinitely many such features (Wahba, 1990):

k = @(a,b) theta.^2 * (1 + (1 + c) * bsxfun(@times, a + 8, (b + 8)') / 16 ...
    + c/3 * ((abs(bsxfun(@minus, a, b')) / 16).^3 ...
             - bsxfun(@plus, ((a + 8) / 16).^3, (((b + 8)') / 16).^3)));

\mathrm{cov}\bigl(f(x_i), f(x_j)\bigr) = 1 + x_i x_j + b \int_0^1 (x_i - c)\,\mathbb{I}(x_i > c)\,(x_j - c)\,\mathbb{I}(x_j > c)\, dc

a.k.a. cubic splines

Exponentially suppressed polynomials (Minka, 2000)

phi = @(a) bsxfun(@times, bsxfun(@power, a/9, 0:1),  c.^(0:1));
phi = @(a) bsxfun(@times, bsxfun(@power, a/9, 0:2),  c.^(0:2));
phi = @(a) bsxfun(@times, bsxfun(@power, a/9, 0:10), c.^(0:10));
k   = @(a,b) theta.^2 ./ (1 - c * bsxfun(@times, a/8, (b/8)'));        % infinitely many features

\mathrm{cov}\bigl(f(x_i), f(x_j)\bigr) = \sum_{l=0}^{L} b^l x_i^l x_j^l \;\xrightarrow{L \to \infty}\; \frac{1}{1 - b\, x_i x_j}, \qquad 0 < b < 1, \quad x_i, x_j \in (-1, 1)
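A one-line numerical check of the geometric-series identity behind this kernel (not from the slides), with made-up values of b, x_i, x_j:

% sketch (not part of the slides): sum_l b^l x^l y^l  ->  1 / (1 - b*x*y)
b = 0.5;  x = 0.7;  y = -0.4;  L = 0:50;
[sum(b.^L .* x.^L .* y.^L),  1 / (1 - b*x*y)]   % truncated sum vs. closed form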

Exponentially decaying periodic features (T. Minka, 2000)

phi = @(a) [bsxfun(@times, cos(bsxfun(@times, a/8, 0:2)),  c.^(0:2)),  bsxfun(@times, sin(bsxfun(@times, a/8, 1:2)),  c.^(1:2))];
phi = @(a) [bsxfun(@times, cos(bsxfun(@times, a/8, 0:20)), c.^(0:20)), bsxfun(@times, sin(bsxfun(@times, a/8, 1:20)), c.^(1:20))];
phi = @(a) [bsxfun(@times, cos(bsxfun(@times, a/8, 0:50)), c.^(0:50)), bsxfun(@times, sin(bsxfun(@times, a/8, 1:50)), c.^(1:50))];

\mathrm{cov}\bigl(f(x_i), f(x_j)\bigr) = \sum_{l=0}^{L} b^l \bigl[\cos(2\pi l x_i)\cos(2\pi l x_j) + \sin(2\pi l x_i)\sin(2\pi l x_j)\bigr], \qquad 0 < b < 1

Exponentially decaying periodic features (T. Minka, 2000)

k = @(a,b) theta.^2 * 2/pi * asin(2 * c * bsxfun(@times, a, b') ./ ...
    sqrt(bsxfun(@times, 1 + 2 * c * a.^2, (1 + 2 * c * b.^2)')));

\mathrm{cov}\bigl(f(x_i), f(x_j)\bigr) = \sum_{l=0}^{\infty} b^l \bigl[\cos(2\pi l x_i)\cos(2\pi l x_j) + \sin(2\pi l x_i)\sin(2\pi l x_j)\bigr]
= \frac{1}{2} + \frac{1 - b^2}{2\bigl(1 - 2b\cos(2\pi(x_i - x_j)) + b^2\bigr)}, \qquad 0 < b < 1

White Noise
the limit of block functions

\lim_{\lambda \to 0} \int \mathbb{I}\bigl(|x_i - c| < \lambda\bigr)\, \mathbb{I}\bigl(|x_j - c| < \lambda\bigr)\, dc = 0

but we're cheating a little: the height of the blocks goes to 0!
white noise is a concept, more than a proper limit
if you make no assumptions, you learn nothing

That gcd kernel

k = @(a,b) gcd(bsxfun(@times, a, ones(size(b'))), bsxfun(@times, ones(size(a)), b'));

Summary

Gaussians are closed under
  linear projection / marginalization (the sum rule)
  linear restriction / conditioning (the product rule)
they provide the linear algebra of inference
combine with nonlinear features \phi to get nonlinear regression
in fact, the number of features can be infinite: nonparametric Gaussian process regression

Tomorrow:
  so what are kernels? What is the set of kernels?
  how should we design GP models?
  how powerful are those models?

Bibliography

D.J.C. MacKay
Introduction to Gaussian Processes
in Bishop, C.M. (ed.), Neural Networks and Machine Learning, Springer, 1998

C.E. Rasmussen & C.K.I. Williams
Gaussian Processes for Machine Learning
MIT Press, 2006

T. Minka
Deriving quadrature rules from Gaussian processes
Technical Report, 2000

G. Wahba
Spline Models for Observational Data
SIAM CBMS-NSF Regional Conference Series in Applied Mathematics, 1990
