1 [50 Marks] Expectation Maximisation
Consider a model with continuous observed variables $x \in \mathbb{R}^D$ and hidden variables $t \in \{0,1\}^K$ and $z \in \mathbb{R}^Q$. The hidden variable $t$ is a $K$-dimensional binary random variable with a 1-of-$K$ representation, where $t_k \in \{0,1\}$ and $\sum_k t_k = 1$, i.e. exactly one component of $t$ is equal to 1 while all others are equal to 0. The prior distribution over $t$ is given by
$$p(t_k = 1 \mid \theta) = \pi_k, \tag{1}$$
where the mixing weights satisfy $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$. This can also be written in the form
$$p(t \mid \theta) = \prod_{k=1}^{K} \pi_k^{t_k}. \tag{2}$$
The hidden variable $z$ is a $Q$-dimensional continuous random variable with prior distribution
$$p(z \mid \theta) = p(z) = \mathcal{N}(0, I). \tag{3}$$
The conditional likelihood of $x$ given $z$ and $t_k = 1$ is a Gaussian defined as
$$p(x \mid z, t_k = 1, \theta) = \mathcal{N}(x \mid W_k z + b_k, \Psi), \tag{4}$$
where $W_k \in \mathbb{R}^{D \times Q}$, $b_k \in \mathbb{R}^{D}$ and $\Psi \in \mathbb{R}^{D \times D}$ is a diagonal covariance matrix. Another way to express this is
$$p(x \mid z, t, \theta) = \prod_{k=1}^{K} \mathcal{N}(x \mid W_k z + b_k, \Psi)^{t_k}. \tag{5}$$
Let us collectively denote the set of all observed variables by $X$ and the hidden variables by $Z$ and $T$. The joint distribution is denoted by $p(Z, T, X \mid \theta)$, and is governed by the set of model parameters $\theta$.
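For reference, if the $N$ data-points are assumed i.i.d. (an assumption made explicit here for clarity), the joint distribution factorises as
$$p(Z, T, X \mid \theta) = \prod_{n=1}^{N} p(t_n \mid \theta)\, p(z_n)\, p(x_n \mid z_n, t_n, \theta),$$
with the three factors given by Eqs. (2), (3) and (5) respectively.

To build intuition for the generative process, the following is a minimal ancestral-sampling sketch in Python. The function name `sample_model` and the array shapes are illustrative assumptions, not part of the assignment specification.

```python
import numpy as np

def sample_model(N, pi, W, b, Psi, seed=None):
    """Ancestral sampling sketch: t_n ~ Cat(pi), z_n ~ N(0, I),
    x_n | z_n, t_nk = 1 ~ N(W_k z_n + b_k, Psi).

    Assumed shapes: pi (K,), W (K, D, Q), b (K, D), Psi (D, D) diagonal.
    """
    rng = np.random.default_rng(seed)
    K, D, Q = W.shape
    X = np.empty((N, D))
    for n in range(N):
        k = rng.choice(K, p=pi)        # draw mixture component, Eq. (2)
        z = rng.standard_normal(Q)     # draw hidden factor, Eq. (3)
        X[n] = rng.multivariate_normal(W[k] @ z + b[k], Psi)  # Eq. (4)
    return X
```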
In the questions below, unless otherwise stated explicitly, you must show all your working. Omission of details or derivations may result in a reduction of the corresponding marks.
a) [5 marks] Draw the graphical representation for this probabilistic model, making sure to include the parameters in the graph. (Non-random variables can be included similarly to random variables, except that circles are not drawn around them.)
b) [5 marks] In terms of $K$, $D$ and $Q$, give an expression for the number of parameters we are required to estimate under this model.
c) [10 marks] In the E-step of the expectation maximisation (EM) algorithm, we are required to compute the expected sufficient statistics of the posterior over the hidden variables. The posterior responsibility of mixture component $k$ for a data-point $n$ is expressed as
$$r_{nk} \stackrel{\text{def}}{=} p(t_{nk} = 1 \mid x_n, \theta^{\text{old}}) = \mathbb{E}_{p(t_{nk} \mid x_n, \theta^{\text{old}})}[t_{nk}]. \tag{6}$$
The conditional posterior over the local hidden factor $z_n$ is a Gaussian with mean $m_{nk}$ and covariance $C_{nk}$,
$$p(z_n \mid t_{nk} = 1, x_n, \theta^{\text{old}}) = \mathcal{N}(z_n \mid m_{nk}, C_{nk}). \tag{7}$$
The covariance is given by
$$C_{nk} = S_{nk} - m_{nk} m_{nk}^{\top}, \tag{8}$$
where
$$m_{nk} \stackrel{\text{def}}{=} \mathbb{E}_{p(z_n \mid t_{nk}=1, x_n, \theta^{\text{old}})}[z_n], \quad \text{and} \quad S_{nk} \stackrel{\text{def}}{=} \mathbb{E}_{p(z_n \mid t_{nk}=1, x_n, \theta^{\text{old}})}[z_n z_n^{\top}]. \tag{9}$$

i) [5 marks] Give analytical expressions for the responsibilities $r_{nk}$ and the expected sufficient statistics $m_{nk}$ and $S_{nk}$ in terms of the old model parameters $\theta^{\text{old}}$. (A numerical sketch of these quantities is given after part e) below.)
ii) [1 mark] To de-clutter notation and simplify the subsequent analysis, it is helpful to introduce an augmented factor loading matrix and hidden factor vector,
$$\tilde{W}_k \stackrel{\text{def}}{=} \begin{bmatrix} W_k & b_k \end{bmatrix}, \quad \text{and} \quad \tilde{z} \stackrel{\text{def}}{=} \begin{bmatrix} z \\ 1 \end{bmatrix}, \tag{10}$$
so that $W_k z + b_k = \tilde{W}_k \tilde{z}$. Accordingly, give expressions for the sufficient statistics of the conditional posterior on the augmented hidden factor vectors,
$$\tilde{m}_{nk} \stackrel{\text{def}}{=} \mathbb{E}_{p(z_n \mid t_{nk}=1, x_n, \theta^{\text{old}})}[\tilde{z}_n], \quad \text{and} \quad \tilde{S}_{nk} \stackrel{\text{def}}{=} \mathbb{E}_{p(z_n \mid t_{nk}=1, x_n, \theta^{\text{old}})}[\tilde{z}_n \tilde{z}_n^{\top}].$$
Note you need only express these in terms of $m_{nk}$ and $S_{nk}$.
iii) [4 marks] Show that the sufficient statistics of the joint posterior factorise as follows,
$$\mathbb{E}_{p(z_n, t_n \mid x_n, \theta^{\text{old}})}[t_{nk} \tilde{z}_n] = r_{nk} \tilde{m}_{nk}, \quad \text{and} \quad \mathbb{E}_{p(z_n, t_n \mid x_n, \theta^{\text{old}})}[t_{nk} \tilde{z}_n \tilde{z}_n^{\top}] = r_{nk} \tilde{S}_{nk}. \tag{11}$$
d) [10 marks] Write down the full expression for the expected complete-data log likelihood (also known as the auxiliary function) for this model,
$$\mathcal{Q}(\theta, \theta^{\text{old}}) \stackrel{\text{def}}{=} \mathbb{E}_{p(Z, T \mid X, \theta^{\text{old}})}[\log p(Z, T, X \mid \theta)]. \tag{12}$$

e) [20 marks] Optimise the auxiliary function $\mathcal{Q}$ with respect to the model parameters $\theta$ to obtain the M-step updates. Show all your working and highlight each individual update equation.
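For part c) i), a numerical sanity check may be useful once you have your own derivation. The sketch below assumes the standard factor-analysis identities, namely $r_{nk} \propto \pi_k \mathcal{N}(x_n \mid b_k, W_k W_k^{\top} + \Psi)$, posterior precision $I + W_k^{\top} \Psi^{-1} W_k$, and posterior mean $m_{nk} = C_k W_k^{\top} \Psi^{-1}(x_n - b_k)$. These identities are stated here as assumptions to check your results against, not as the model answer.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, W, b, Psi):
    """Sketch of the E-step quantities r_nk, m_nk, S_nk (Eqs. 6-9),
    under the assumed factor-analysis identities stated above.

    Assumed shapes: X (N, D), pi (K,), W (K, D, Q), b (K, D), Psi (D, D).
    """
    N, D = X.shape
    K, _, Q = W.shape
    Psi_inv = np.linalg.inv(Psi)

    r = np.empty((N, K))        # responsibilities, Eq. (6)
    m = np.empty((N, K, Q))     # posterior means, Eq. (9)
    C = np.empty((K, Q, Q))     # posterior covariances (independent of n here)
    for k in range(K):
        # Component marginal: x_n | t_nk = 1 ~ N(b_k, W_k W_k^T + Psi)
        r[:, k] = pi[k] * multivariate_normal.pdf(X, mean=b[k], cov=W[k] @ W[k].T + Psi)
        # Posterior over z_n for component k, Eq. (7)
        C[k] = np.linalg.inv(np.eye(Q) + W[k].T @ Psi_inv @ W[k])
        m[:, k, :] = (X - b[k]) @ (Psi_inv @ W[k] @ C[k])
    r /= r.sum(axis=1, keepdims=True)   # normalise over components
    # Second moments S_nk = C_k + m_nk m_nk^T, consistent with Eq. (8)
    S = C[None] + np.einsum('nki,nkj->nkij', m, m)
    return r, m, S
```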
2 [50 Marks] Practical Part
See the Jupyter notebook comp9418 ass2.ipynb.