COMP2610/COMP6261
Tutorial 5 Sample Solutions
Tutorial 5: Probabilistic inequalities and Mutual Information
Young Lee and Bob Williamson
Tutors: Debashish Chakraborty and Zakaria Mhammedi
Week 5 (21st-25th August), Semester 2, 2017
1. Consider a discrete variable X taking on values from the set X . Let pi be the probability of each state, with
i = 1, . . . , |X |. Denote the vector of probabilities by p. We saw in lectures that the entropy of X satisfies:
\[ H(X) \le \log |\mathcal{X}|, \]
with equality if and only if $p_i = \frac{1}{|\mathcal{X}|}$ for all $i$, i.e. $\mathbf{p}$ is uniform. Prove the above statement using Gibbs' inequality, which says
\[ \sum_{i=1}^{|\mathcal{X}|} p_i \log_2 \frac{p_i}{q_i} \ge 0 \]
for any probability distributions $\mathbf{p}, \mathbf{q}$ over $|\mathcal{X}|$ outcomes, with equality if and only if $\mathbf{p} = \mathbf{q}$.
Solution.
Gibbs' inequality tells us that for any two probability vectors $\mathbf{p} = (p_1, \ldots, p_{|\mathcal{X}|})$ and $\mathbf{q} = (q_1, \ldots, q_{|\mathcal{X}|})$:
\[ \sum_{i=1}^{|\mathcal{X}|} p_i \log \frac{p_i}{q_i} \ge 0 \]
with equality if and only if $\mathbf{p} = \mathbf{q}$. If we take $\mathbf{q}$ to be the vector representing the uniform distribution $q_1 = \ldots = q_{|\mathcal{X}|} = \frac{1}{|\mathcal{X}|}$, then we get
\[ 0 \le \sum_{i=1}^{|\mathcal{X}|} p_i \log \frac{p_i}{1/|\mathcal{X}|} = \sum_{i=1}^{|\mathcal{X}|} p_i \log p_i + \sum_{i=1}^{|\mathcal{X}|} p_i \log |\mathcal{X}| = -H(\mathbf{p}) + \log |\mathcal{X}| \]
with equality if and only if p is the uniform distribution. Moving H(p) to the other side gives the inequality.
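As a quick numerical illustration (not part of the original solution), the following Matlab snippet checks both Gibbs' inequality and the resulting entropy bound; the distribution p is an arbitrary example choice.

% Numerical sanity check of Gibbs' inequality and the entropy bound.
% The distribution p below is an arbitrary illustrative choice.
p = [0.5 0.3 0.2];            % any distribution over 3 outcomes
q = ones(1, 3) / 3;           % uniform distribution over 3 outcomes
D = sum(p .* log2(p ./ q))    % Gibbs: D >= 0, equality iff p == q
H = -sum(p .* log2(p))        % entropy H(p)
H + D - log2(3)               % = 0, confirming H(p) = log2(3) - D <= log2(3)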
2. Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy of X by justifying the following steps:
\[ H(X, g(X)) \overset{(a)}{=} H(X) + H(g(X) | X) \overset{(b)}{=} H(X); \]
\[ H(X, g(X)) \overset{(c)}{=} H(g(X)) + H(X | g(X)) \overset{(d)}{\ge} H(g(X)). \]
Thus $H(g(X)) \le H(X)$.
Solution.
(a) This is using the chain rule of entropy, i.e. H(X,Y) = H(X) + H(Y | X) where Y = g(X).
(b) Given X, we can determine g(X), since it is a fixed function of X. This means no uncertainty remains about g(X) when X is given. Thus $H(g(X) | X) = \sum_x p(x) H(g(X) | X = x) = 0$, since each term $H(g(X) | X = x)$ is the entropy of a deterministic outcome, which is zero.
(c) This is also using the chain rule of entropy, i.e. H(X,Y) = H(Y) + H(X | Y) where Y = g(X).
(d) In this case, $H(X | g(X)) \ge 0$, since the conditional entropy of a discrete random variable is non-negative. If g is a one-to-one mapping, then $H(X | g(X)) = 0$ and equality holds, i.e. $H(X, g(X)) = H(g(X))$.
Combining parts (b) and (d), we obtain $H(X) \ge H(g(X))$.
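A concrete example may help (an illustration, not part of the original solution): take X uniform on {-1, 0, 1} and g(x) = x^2, so g collapses -1 and 1 into a single outcome.

% Illustrative check: X uniform on {-1,0,1}, g(x) = x.^2.
pX  = [1/3 1/3 1/3];              % distribution of X over {-1,0,1}
HX  = -sum(pX .* log2(pX))        % H(X) = log2(3), about 1.585 bits
pGX = [1/3 2/3];                  % g(X) = 0 w.p. 1/3, g(X) = 1 w.p. 2/3
HGX = -sum(pGX .* log2(pGX))      % about 0.918 bits, so H(g(X)) <= H(X)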
3. Random variables X, Y, Z are said to form a Markov chain in that order (denoted by $X \to Y \to Z$) if their joint probability distribution can be written as:
\[ p(X, Y, Z) = p(X)\, p(Y | X)\, p(Z | Y) \]
(a) Suppose (X,Y,Z) forms a Markov chain. Is it possible for $I(X;Y) = I(X;Z)$? If yes, give an example of X, Y, Z where this happens. If no, explain why not.
(b) Suppose (X,Y,Z) does not form a Markov chain. Is it possible for $I(X;Y) \ge I(X;Z)$? If yes, give an example of X, Y, Z where this happens. If no, explain why not.
Solution.
(a) Yes; pick Z = Y.
Reason: The data processing inequality guarantees $I(X;Y) \ge I(X;Z)$. Here we want to verify that equality is possible. If we look at the proof of the data processing inequality, we just need to find a Z where $I(X;Y | Z) = 0$.
For Z = Y, intuitively, conditioning on Z, the reduction in uncertainty in X when we know Y is zero, because Z already tells us everything that Y can. Formally, $I(X;Y) = I(X;Z)$ because Z = Y, so (X, Z) and (X, Y) have the same joint distribution. Note: to formally check that Z is conditionally independent of X given Y, we can check $p(Z = z, X = x | Y = y) = p(Z = z | Y = y)\, p(X = x | Y = y)$ for all possible x, y, z. The reason is that the left and right hand sides are zero when $y \ne z$; and when $y = z$, they both equal $p(X = x | Y = y)$, as $p(Z = z | X = x, Y = y) = 1$ in this case.
(b) Yes; pick X, Z independent, and let Y = X + Z (assuming the outcomes are numeric).
Reason: Z is not conditionally independent of X given Y; intuitively, knowing X + Z and X tells us what Z is. So (X,Y,Z) does not form a Markov chain. However, since X and Z are independent, I(X;Z) = 0. Since mutual information is non-negative, $I(X;Y) \ge 0 = I(X;Z)$.
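To make part (b) concrete, here is a small numerical check (an illustration, not part of the original solution) with X and Z independent fair bits and Y = X + Z; it computes I(X;Y) from the joint distribution via I(X;Y) = H(X) + H(Y) - H(X,Y).

% X, Z iid Bernoulli(1/2), Y = X + Z; joint p(x,y) found by enumeration.
pXY = [1/4 1/4 0;                 % row x = 0: y = 0, 1, 2
       0 1/4 1/4];                % row x = 1: y = 0, 1, 2
pX = sum(pXY, 2); pY = sum(pXY, 1);
H  = @(p) -sum(p(p > 0) .* log2(p(p > 0)));   % entropy helper
I  = H(pX) + H(pY) - H(pXY(:))    % = 0.5 bits > 0 = I(X;Z)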
4. If $X \to Y \to Z$, then show that
(a) $I(X;Z) \le I(X;Y)$
(b) $I(X;Y | Z) \le I(X;Y)$
Solution.
Proof given in Lecture 9.
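For reference, a brief sketch of the standard argument (the full proof is in Lecture 9): expand $I(X; Y, Z)$ by the chain rule in two ways,
\[ I(X; Y, Z) = I(X;Z) + I(X;Y | Z) = I(X;Y) + I(X;Z | Y). \]
Since $X \to Y \to Z$, X and Z are conditionally independent given Y, so $I(X;Z | Y) = 0$. Hence $I(X;Z) + I(X;Y | Z) = I(X;Y)$, and because both terms on the left are non-negative, each is at most $I(X;Y)$, giving (a) and (b).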
5. A coin is known to land heads with probability $\frac{1}{5}$. The coin is flipped N times for some even integer N.
(a) Using Markov's inequality, provide a bound on the probability of observing $\frac{N}{2}$ or more heads.
(b) Using Chebyshev's inequality, provide a bound on the probability of observing $\frac{N}{2}$ or more heads. Express your answer in terms of N.
(c) For $N \in \{2, 4, \ldots, 20\}$, in a single plot, show the bounds from parts (a) and (b), as well as the exact probability of observing $\frac{N}{2}$ or more heads.
Solution.
Let $X_1, \ldots, X_N$ represent the N flips as independent Bernoulli random variables, where $X_i = 1$ represents observing a head on the i-th flip and $X_i = 0$ represents observing a tail. Let $\bar{X}_N = \frac{1}{N} \sum_{i=1}^N X_i$. Then the probability of observing $\frac{N}{2}$ or more heads can be expressed as $p(\bar{X}_N \ge \frac{1}{2})$, and $p(X_i = 1) = \frac{1}{5}$ for each i.
(a) Using Markov's inequality,
\[ p\left(\bar{X}_N \ge \frac{1}{2}\right) \le \frac{E[\bar{X}_N]}{\frac{1}{2}} = \frac{\frac{1}{N}\sum_{i=1}^N E[X_i]}{\frac{1}{2}} = \frac{\frac{1}{5}}{\frac{1}{2}} = \frac{2}{5} \]
so the probability of observing $\frac{N}{2}$ or more heads is at most $\frac{2}{5}$.
(b) We need to calculate the variance of a Bernoulli random variable: $\mathrm{Var}(X) = p(1-p)$, so
\[ \mathrm{Var}[X_i] = \frac{1}{5}\left(1 - \frac{1}{5}\right) = \frac{4}{25}. \]
Using the definition of variance and its properties,
\[ \mathrm{Var}(\bar{X}_N) = \mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^N X_i\right] = \frac{\sum_{i=1}^N \mathrm{Var}[X_i]}{N^2} = \frac{N \cdot \frac{4}{25}}{N^2} = \frac{4}{25N} \]
Using Chebyshev's inequality,
\[ p\left(\left|\bar{X}_N - E[\bar{X}_N]\right| \ge \epsilon\right) \le \frac{\mathrm{Var}(\bar{X}_N)}{\epsilon^2} \]
with $E[\bar{X}_N] = \frac{1}{5}$ and $\epsilon = \frac{3}{10}$:
\[ p\left(\left|\bar{X}_N - \frac{1}{5}\right| \ge \frac{3}{10}\right) \le \frac{\frac{4}{25N}}{\left(\frac{3}{10}\right)^2} = \frac{16}{9N}. \]
Since $\bar{X}_N \ge \frac{1}{2}$ implies $\bar{X}_N - \frac{1}{5} \ge \frac{3}{10}$, the event $\{\bar{X}_N \ge \frac{1}{2}\}$ is contained in the event above, and therefore
\[ p\left(\bar{X}_N \ge \frac{1}{2}\right) \le \frac{16}{9N}. \]
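As a quick sanity check anticipating part (c) (an illustration, not required by the question), one can compare the two bounds to the exact binomial tail for a single value of N:

% Compare both bounds to the exact tail probability at N = 10.
N = 10;
markov = 2/5                           % Markov bound (constant in N)
cheby  = 16 / (9*N)                    % Chebyshev bound, about 0.178
exact  = 1 - binocdf(N/2 - 1, N, 1/5)  % P(at least N/2 heads), about 0.033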
(c) The exact probability of observing k heads is given by the binomial distribution:
\[ P(X = k) = \binom{N}{k} \left(\frac{1}{5}\right)^k \left(\frac{4}{5}\right)^{N-k} \]
So the probability of seeing N/2 or more heads is
\[ P(X \ge N/2) = \sum_{k=N/2}^{N} P(X = k) = \sum_{k=N/2}^{N} \binom{N}{k} \left(\frac{1}{5}\right)^k \left(\frac{4}{5}\right)^{N-k} \]
Another way to calculate the exact probability is
\[ p\left(\bar{X}_N \ge \frac{1}{2}\right) = 1 - p\left(\bar{X}_N < \frac{1}{2}\right). \]
This can be done in Matlab using 1 - binocdf(floor(0.5.*n - 0.5), n, 0.2). Here floor(0.5.*n - 0.5) simply rounds n down to an integer strictly less than n/2 for each value of n. For example, n = 10 gives floor(0.5.*n - 0.5) = 4, which is what we want.
The code for the plot is included below:

n = 2:2:20;

% Markov inequality (a constant bound)
y_m = 2/5;

% Chebyshev inequality
y_c = 16 ./ (9 .* n);

% Exact probabilities
y_e = 1 - binocdf(floor(0.5 .* n - 0.5), n, 0.2);

plot(n, y_c, 'k')
hold on;
plot([2 20], [y_m y_m], 'm')
hold on;
plot(n, y_e, 'b')
hold on;
set(gca, 'fontsize', 14)

title('Markov''s bound, Chebyshev''s bound and exact probability of observing more than N/2 heads')
ylabel('Probability')
xlabel('Number of coin flips (N)')
legend('Chebyshev', 'Markov', 'Exact');