
ECE 50024 / STAT 59800 Homework 3


Exercise 1
Suppose that we are given a dataset $D \overset{\text{def}}{=} \{x_n\}_{n=1}^{N}$, where each sample $x_n \in \mathbb{R}^d$ is an i.i.d. copy of the random variable $X$. For simplicity, we assume that the distribution of $X$ is a multi-dimensional Gaussian with mean $\mu \in \mathbb{R}^d$ and covariance $\Sigma \in \mathbb{R}^{d \times d}$. We further assume that the mean vector $\mu$ is known and given. Therefore, the likelihood of observing a sample $x_n$ is fully controlled by the covariance matrix, i.e.,
$$p(x_n \mid \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x_n - \mu)^T \Sigma^{-1} (x_n - \mu) \right\}. \tag{1}$$

Taking all the samples in the dataset $D$ into consideration, the likelihood of $D$ is
$$p(D \mid \Sigma) = \prod_{n=1}^{N} \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x_n - \mu)^T \Sigma^{-1} (x_n - \mu) \right\}. \tag{2}$$

The goal of this analytical exercise is to derive the maximum-likelihood estimate of $\Sigma$:

$$\widehat{\Sigma}_{\text{ML}} = \operatorname*{argmax}_{\Sigma} \; p(D \mid \Sigma). \tag{3}$$
To make things simpler, we assume in this exercise that $\Sigma$ and $\widetilde{\Sigma} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T$ are invertible.

(a) Recall that the trace operator is defined as $\operatorname{tr}[A] = \sum_{i=1}^{d} [A]_{i,i}$. Prove the matrix identity

$$x^T A x = \operatorname{tr}[A x x^T], \tag{4}$$

where $A \in \mathbb{R}^{d \times d}$.
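As a quick numerical sanity check (not a substitute for the proof), the identity in (4) can be verified in a few lines of Python; the random test matrix and vector below are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
x = rng.standard_normal(d)

lhs = x @ A @ x                     # quadratic form x^T A x
rhs = np.trace(A @ np.outer(x, x))  # tr[A x x^T]
print(np.isclose(lhs, rhs))         # expect: True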
(b) Show that the likelihood function in (2) can be written as

$$p(D \mid \Sigma) = \frac{1}{(2\pi)^{Nd/2}}\,|\Sigma^{-1}|^{N/2} \exp\left\{ -\frac{1}{2} \operatorname{tr}\!\left[ \Sigma^{-1} \sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T \right] \right\}. \tag{5}$$
(c) Let $\widetilde{\Sigma} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T$, let $A = \Sigma^{-1} \widetilde{\Sigma}$, and let $\lambda_1, \ldots, \lambda_d$ be the eigenvalues of $A$. Show that the result from the previous part leads to

$$p(D \mid \Sigma) = \frac{1}{(2\pi)^{Nd/2}\,|\widetilde{\Sigma}|^{N/2}} \left( \prod_{i=1}^{d} \lambda_i \right)^{N/2} \exp\left\{ -\frac{N}{2} \sum_{i=1}^{d} \lambda_i \right\}. \tag{6}$$

Hint: For a matrix $A$ with eigenvalues $\lambda_1, \ldots, \lambda_d$, $\operatorname{tr}[A] = \sum_{i=1}^{d} \lambda_i$.
(d) Find $\lambda_1, \ldots, \lambda_d$ such that (6) is maximized.

(e) With the choice of $\lambda_i$ given in (d), prove that the ML estimate $\widehat{\Sigma}_{\text{ML}}$ is

$$\widehat{\Sigma}_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu)(x_n - \mu)^T. \tag{7}$$
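A hedged numerical check of (7): if the log-likelihood is evaluated as in (2) (dropping the constant term), the estimate should score at least as well as any perturbation of it. All names below are illustrative.

import numpy as np

rng = np.random.default_rng(1)
d, N = 3, 5000
mu = np.zeros(d)
X = rng.multivariate_normal(mu, np.diag([1.0, 2.0, 3.0]), size=N)  # synthetic data

Sigma_ml = (X - mu).T @ (X - mu) / N  # the candidate estimate from (7)

def log_likelihood(Sigma):
    # Gaussian log-likelihood of D, dropping the constant -Nd/2 * log(2*pi)
    diff = X - mu
    quad = np.einsum('ni,ij,nj->', diff, np.linalg.inv(Sigma), diff)
    return -0.5 * N * np.linalg.slogdet(Sigma)[1] - 0.5 * quad

# Perturbing the estimate should not increase the likelihood.
print(log_likelihood(Sigma_ml) >= log_likelihood(Sigma_ml + 0.1 * np.eye(d)))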
(f) What would be an alternative way of finding $\widehat{\Sigma}_{\text{ML}}$? You do not need to prove it; just briefly mention the idea.
(g) If $\mu$ is also estimated from the data as $\widehat{\mu} = \frac{1}{N} \sum_{n=1}^{N} x_n$, the ML estimate $\widehat{\Sigma}_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \widehat{\mu})(x_n - \widehat{\mu})^T$ will be a biased estimate of the covariance matrix because $\mathbb{E}[\widehat{\Sigma}_{\text{ML}}] \neq \Sigma$. Can you suggest an unbiased estimate $\widehat{\Sigma}_{\text{unbias}}$ such that $\mathbb{E}[\widehat{\Sigma}_{\text{unbias}}] = \Sigma$? No need to prove; just state the result.

Exercise 2
In this exercise I want you to implement a Bayesian decision rule for a (super classical) problem of image segmentation. The image we work with consists of a cat and some grass.¹ The size of this image is 500 × 375 pixels. The left-hand side of Figure 1 shows the image, and the right-hand side of Figure 1 shows a manually labeled "ground truth". Your task is to do as much as you can to extract the cat from the grass, and compare your result with the "ground truth".

Figure 1: The "Cat and Grass" image.
Preparation Steps (No need to hand in)
First of all, go to the course website and download the data. Write a Python script to read the data and convert it into a data matrix.

import numpy as np

train_cat = np.matrix(np.loadtxt('train_cat.txt', delimiter=','))
train_grass = np.matrix(np.loadtxt('train_grass.txt', delimiter=','))
¹ Image Source: http://www.robots.ox.ac.uk/vgg/data/pets/
The data matrices are $64 \times K_1$ and $64 \times K_0$, respectively, where $K_1$ is the number of training samples for Class 1 (cat), and $K_0$ is the number of training samples for Class 0 (grass).

Throughout this exercise, you need to read images and extract patches. To read an image, you can call the cv2 library or you can call plt.imread. For example, you can do

import matplotlib.pyplot as plt

Y = plt.imread('cat_grass.jpg') / 255

The decision making of this problem is performed for every pixel. Therefore, you need to write a for loop to loop through all the pixels of the image. Moreover, you need to extract the 8 × 8 neighbors surrounding each pixel. This can be done using the following commands:

M, N = Y.shape
for i in range(M-8):
    for j in range(N-8):
        block = Y[i:i+8, j:j+8]  # This is an 8x8 block
        #
        # Something
        #

To make your life easier, it is okay to set the running index i in range(M-8), neglecting the boundary pixels. In this case, the ground truth mask will have 8 fewer rows and 8 fewer columns.

The Bayesian decision rule we are going to implement is based on the posterior distribution. We define the likelihood functions:
$$p_{X|Y}(x \mid C_1) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_1|^{1/2}} \exp\left\{ -\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) \right\},$$

$$p_{X|Y}(x \mid C_0) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_0|^{1/2}} \exp\left\{ -\frac{1}{2}(x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0) \right\}, \tag{8}$$

and also the prior distributions $p_Y(C_1) = \pi_1$ and $p_Y(C_0) = \pi_0$. For simplicity, we assume that $\pi_1 = \frac{K_1}{K_1 + K_0}$ and $\pi_0 = \frac{K_0}{K_1 + K_0}$. The Bayesian decision rule says that

$$p_{Y|X}(C_1 \mid x) \;\underset{C_0}{\overset{C_1}{\gtrless}}\; p_{Y|X}(C_0 \mid x), \tag{9}$$
which is based on the posterior distribution.

(a) Substitute the multi-dimensional Gaussian likelihoods (8) and the priors $\pi_1$ and $\pi_0$ into (9). Show that the decision rule is equivalent to

$$-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) + \log \pi_1 - \frac{1}{2} \log |\Sigma_1| \;\underset{C_0}{\overset{C_1}{\gtrless}}\; -\frac{1}{2}(x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0) + \log \pi_0 - \frac{1}{2} \log |\Sigma_0|.$$
(b) Estimate $\mu_1$, $\mu_0$, $\Sigma_1$, $\Sigma_0$, $\pi_1$ and $\pi_0$ in Python. Report:

(i) The first 2 entries of the vector $\mu_1$ and the first 2 entries of the vector $\mu_0$.
(ii) The first 2 × 2 entries of the matrix $\Sigma_1$ and the first 2 × 2 entries of the matrix $\Sigma_0$.
(iii) The values of $\pi_1$ and $\pi_0$.
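One possible way to compute these estimates, sketched under the assumption that train_cat and train_grass are the $64 \times K_1$ and $64 \times K_0$ matrices loaded earlier, with one training patch per column; note that np.cov normalizes by $K - 1$ rather than $K$.

import numpy as np

# Work with plain ndarrays (np.matrix quirks make slicing awkward).
cat = np.asarray(train_cat)      # shape (64, K1)
grass = np.asarray(train_grass)  # shape (64, K0)

K1, K0 = cat.shape[1], grass.shape[1]
mu1 = cat.mean(axis=1)           # Class 1 (cat) sample mean
mu0 = grass.mean(axis=1)         # Class 0 (grass) sample mean
Sigma1 = np.cov(cat)             # 64 x 64 sample covariance, Class 1
Sigma0 = np.cov(grass)           # 64 x 64 sample covariance, Class 0
pi1 = K1 / (K1 + K0)             # prior for Class 1
pi0 = K0 / (K1 + K0)             # prior for Class 0

print(mu1[:2], mu0[:2])              # report item (i)
print(Sigma1[:2, :2], Sigma0[:2, :2])  # report item (ii)
print(pi1, pi0)                      # report item (iii)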
(c) Write a double for loop to loop through the pixels of the testing image. At each pixel location, consider an 8 × 8 neighborhood. This will be the testing vector $x \in \mathbb{R}^{64}$. Dump this testing vector $x$ into the decision rule you proved in (a), and determine whether the testing vector belongs to Class 1 or Class 0. Repeat this for the other pixel locations.

for i in range(M-8):
    for j in range(N-8):
        block = Y[i:i+8, j:j+8]
        #
        # Something
        #
        prediction[i, j] = 0  # Something: replace with your decision

If you do everything right, you will get a binary image. Submit this predicted binary image. Remark: My program runs for about 10-15 seconds. If your code takes forever to run, something is wrong.
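The loop body might look like the following sketch, assuming mu1, mu0, Sigma1, Sigma0, pi1, pi0 from part (b); precomputing the inverses and log-determinants outside the loop is what keeps the runtime in the 10-15 second range. The flattening order of the patch must match how the training patches were vectorized (row-major is assumed here).

import numpy as np

# Precompute the expensive pieces of the rule from part (a) once.
Sigma1_inv, Sigma0_inv = np.linalg.inv(Sigma1), np.linalg.inv(Sigma0)
logdet1 = np.linalg.slogdet(Sigma1)[1]
logdet0 = np.linalg.slogdet(Sigma0)[1]

M, N = Y.shape
prediction = np.zeros((M-8, N-8))
for i in range(M-8):
    for j in range(N-8):
        x = Y[i:i+8, j:j+8].flatten()  # testing vector in R^64
        d1, d0 = x - mu1, x - mu0
        g1 = -0.5 * d1 @ Sigma1_inv @ d1 + np.log(pi1) - 0.5 * logdet1
        g0 = -0.5 * d0 @ Sigma0_inv @ d0 + np.log(pi0) - 0.5 * logdet0
        prediction[i, j] = 1 if g1 > g0 else 0  # Class 1 = cat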
(d) Consider the ground truth image truth.png. Report the mean absolute error (MAE) between your prediction and the ground truth:

$$\text{MAE} = \frac{1}{\#\text{ of pixels}} \sum_{i,j} \big| \text{prediction}[i,j] - \text{truth}[i,j] \big|.$$

Report your MAE. Remark: Because we are not dealing with the boundary pixels (which explains why I set i in range(M-8)), when computing the MAE you need to set the true mask to truth[0:M-8, 0:N-8].
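A short sketch of the MAE computation, assuming prediction from part (c) and that truth.png loads as a 0/1 mask (if it loads with values in 0-255, divide by 255 first):

import matplotlib.pyplot as plt
import numpy as np

truth = plt.imread('truth.png')  # assumed to be a 0/1 mask
truth = truth[0:M-8, 0:N-8]      # crop away the unhandled boundary
mae = np.mean(np.abs(prediction - truth))
print(mae)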
(e) Go to the internet and download an image with similar content: an animal on grass or something like that. Apply your classifier to the image, and submit your resulting mask. You probably do not have the ground truth mask, so please just show the predicted mask. Does it perform well? If not, what could go wrong? Write one to two bullet points to explain your findings. Please be brief.

Exercise 3
The objective of this exercise is to plot the ROC curve. You may want to read Chapters 9.4 and 9.5 of Prof. Stanley Chan's book.

(a) The Bayesian decision rule you derived in Exercise 2 is actually equivalent to the likelihood ratio test

$$\frac{p_{X|Y}(x \mid C_1)}{p_{X|Y}(x \mid C_0)} \;\underset{C_0}{\overset{C_1}{\gtrless}}\; \tau, \tag{10}$$

for some threshold constant $\tau$. Determine the $\tau$ that corresponds to the decision rule in Exercise 2.
(b) Implement this likelihood ratio test rule for different values of $\tau$. For every $\tau$, compute the number of true positives and the number of false positives. Then, we can define the probability of detection $p_D(\tau)$ and the probability of false alarm $p_F(\tau)$ as:

$$p_D(\tau) = \frac{\#\text{ of true positives}}{\text{total } \#\text{ of positives in the ground truth}}, \qquad p_F(\tau) = \frac{\#\text{ of false positives}}{\text{total } \#\text{ of negatives in the ground truth}}.$$

Plot the ROC curve. That is, plot $p_D(\tau)$ as a function of $p_F(\tau)$. Your ROC curve should cover the range $[0, 1] \times [0, 1]$. Remark: Generating this ROC curve will take a minute or two.

(c) On your ROC curve, mark a red dot to indicate the operating point of the Bayesian decision rule.
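One way to organize the sweep for (b), sketched under the assumption that the per-pixel log-likelihood ratio llr was saved during the loop of Exercise 2(c); thresholding the log-ratio against $\log \tau$ avoids overflowing the exponentials. The sweep range below is a guess and may need widening.

import numpy as np
import matplotlib.pyplot as plt

# llr[i, j] = log p(x|C1) - log p(x|C0) for the patch at (i, j);
# truth is the cropped 0/1 mask from part (d) of Exercise 2.
pos = truth > 0.5
neg = ~pos
pD, pF = [], []
for log_tau in np.linspace(-200, 200, 200):
    pred = llr >= log_tau
    pD.append(np.sum(pred & pos) / np.sum(pos))
    pF.append(np.sum(pred & neg) / np.sum(neg))

plt.plot(pF, pD)
plt.xlabel('pF')
plt.ylabel('pD')
plt.show()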
(d) Implement a linear regression classifier for this problem, and plot the ROC curve. The idea is to construct a matrix system:

$$\underbrace{\begin{bmatrix} X_1 \\ X_0 \end{bmatrix}}_{=A} \theta = \underbrace{\begin{bmatrix} \mathbf{1} \\ -\mathbf{1} \end{bmatrix}}_{=b},$$

where $X_1 \in \mathbb{R}^{K_1 \times d}$ and $X_0 \in \mathbb{R}^{K_0 \times d}$ are the training data matrices of Class 1 and Class 0. Solve the regression problem

$$\widehat{\theta} = \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \, \|A\theta - b\|^2.$$
During testing, write a double for loop to go through all the 8 × 8 neighbors of the image pixels. The decision rule per neighbor is

$$\widehat{\theta}^T x \;\underset{C_0}{\overset{C_1}{\gtrless}}\; \tau, \tag{11}$$

where $\widehat{\theta}$ is the trained model parameter, and $x$ is the testing 8 × 8 neighbor. By varying the threshold $\tau$, you can obtain another ROC curve. Plot it.
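A least-squares sketch for part (d), assuming train_cat and train_grass as loaded earlier (patches as columns, hence the transposes); np.linalg.lstsq is one of several ways to solve this regression problem.

import numpy as np

# Rows of A are training patches; b holds the +1/-1 class labels.
X1 = np.asarray(train_cat).T    # K1 x 64, Class 1 (cat)
X0 = np.asarray(train_grass).T  # K0 x 64, Class 0 (grass)
A = np.vstack([X1, X0])
b = np.concatenate([np.ones(X1.shape[0]), -np.ones(X0.shape[0])])

theta, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution

# Per-neighbor decision: compare theta @ x against a threshold tau,
# then sweep tau to trace out the second ROC curve.

Exercise 4: Project Check Point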
For this checkpoint, you need to play with the existing implementation that you identified in the previous checkpoint. First, please paste the paragraphs you wrote for Check Point 2 into your Overleaf repository for the final project, if you haven't done so. Then, add a new section for this checkpoint. In the new section, write about the following things:

• What are the technical issues you encounter when you run the existing implementation?
• Have you overcome these issues? If so, what did you do that successfully removed the issue? If not, what have you tried so far?
• Generate some preliminary results, e.g., plots, numbers, etc. using the existing implementation. Present the preliminary results in the document and describe the results.

Please attach the PDF generated from the repository to the end of your homework. This checkpoint aims to motivate you to understand your paper in concrete detail. Don't panic if you find it very hard to run the existing implementation; you can state what you have tried for this checkpoint and make the most of office hours to get your issue resolved.
