Let us assume a two-class classification problem where the classes are modeled by 2D
Gaussian distributions G(μ1, Σ1) and G(μ2, Σ2).
1. Generate 100,000 samples from each 2D Gaussian distribution (i.e., 200,000 samples
total) using the following parameters (i.e., each sample (x, y) can be thought of as a feature
vector):
μ1 = (1, 1)ᵀ, Σ1 = [1 0; 0 1]
μ2 = (4, 4)ᵀ, Σ2 = [1 0; 0 1]
Notation: each sample is a feature vector (x, y)ᵀ; a mean vector is written μ = (μx, μy)ᵀ,
and a covariance matrix is written Σ = [σx² 0; 0 σy²] (i.e., diagonal, with variances σx²
and σy²).
Note: this is not the same as sampling the 2D Gaussian functions; see “Generating
Gaussian Random Numbers” on the course’s webpage for more information on how to
generate the samples using the Box-Muller transformation. A link to C code has been
provided on the webpage. Since the code generates samples for 1D distributions, you
would need to call the function twice to get a 2D sample (x, y); use (μx, σx) for the x
sample and (μy, σy) for the y sample.
Note: ranf() is not defined in the standard library; you will need to implement it
yourself using rand(). For example:
/* ranf - return a random double in the [0, m] range. */
double ranf(double m) {
    return (m * rand()) / (double)RAND_MAX;
}
(m = 1 in our case)
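For reference, a minimal C sketch of the Box-Muller step (the function name is
illustrative; ranf() is assumed to be defined as above):

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

double ranf(double m);   /* defined above */

/* Draw one sample from a 1D Gaussian N(mu, sigma^2) using the
   Box-Muller transformation. */
double gauss_sample(double mu, double sigma)
{
    double u1, u2;
    do { u1 = ranf(1.0); } while (u1 == 0.0);   /* avoid log(0) */
    u2 = ranf(1.0);
    return mu + sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

A 2D sample is then (x, y) = (gauss_sample(μx, σx), gauss_sample(μy, σy)).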
a. Assuming P(ω1) = P(ω2)
i. Design a Bayes classifier for minimum error.
ii. Plot the Bayes decision boundary together with the generated samples
to better visualize and interpret the classification results.
iii. Report (i) the number of misclassified samples for each class separately
and (ii) the total number of misclassified samples.
iv. Plot the Chernoff bound as a function of β and find the optimum β that
minimizes the bound (a computational sketch is given after part (b) below).
v. Calculate the Bhattacharyya bound. Is it close to the experimental error?
b. Repeat part (a) for P(ω1) = 0.2 and P(ω2) = 0.8. For comparison purposes, use
exactly the same 200,000 samples from (a) in these experiments.
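For parts (iv) and (v), the following minimal C sketch may help (the function name
and parameter passing are illustrative). It computes the Chernoff exponent k(β) for
two Gaussians with diagonal covariances, which all parameter sets in this assignment
have; the bound is then P(error) ≤ P(ω1)^β P(ω2)^(1−β) e^(−k(β)), and β = 0.5 gives
the Bhattacharyya bound.

#include <math.h>

/* Chernoff exponent k(beta) for two 2D Gaussians with means
   (m1x, m1y), (m2x, m2y) and diagonal covariances diag(s1x, s1y),
   diag(s2x, s2y). Minimize over beta in (0, 1) by a simple sweep;
   beta = 0.5 gives the Bhattacharyya exponent. */
double chernoff_k(double beta,
                  double m1x, double m1y, double s1x, double s1y,
                  double m2x, double m2y, double s2x, double s2y)
{
    double sx = beta * s1x + (1.0 - beta) * s2x;   /* blended variances */
    double sy = beta * s1y + (1.0 - beta) * s2y;
    double dx = m2x - m1x, dy = m2y - m1y;
    double quad = 0.5 * beta * (1.0 - beta) * (dx * dx / sx + dy * dy / sy);
    double logdet = 0.5 * log((sx * sy) /
                     (pow(s1x * s1y, beta) * pow(s2x * s2y, 1.0 - beta)));
    return quad + logdet;
}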
2. Repeat parts (1.a) and (1.b) using the following parameters (i.e., you need to generate
new sample sets):
μ1 = (1, 1)ᵀ, Σ1 = [1 0; 0 1]
μ2 = (4, 4)ᵀ, Σ2 = [4 0; 0 8]
3. Repeat part (2.b) (i.e., P(ω1) ≠ P(ω2)) using the minimum-distance classifier and
compare your results (i.e., misclassified samples) with those obtained in part (2.b). For
comparison purposes, use exactly the same 200,000 samples as in part 2.
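A minimal C sketch of the minimum-distance rule (names are illustrative): assign a
sample to the class whose mean is closest in Euclidean distance, ignoring the
covariances and priors.

/* Squared Euclidean distance from (x, y) to a class mean (mx, my). */
static double sq_dist(double x, double y, double mx, double my)
{
    return (x - mx) * (x - mx) + (y - my) * (y - my);
}

/* Returns 1 or 2: the class whose mean is closer to the sample. */
int min_dist_classify(double x, double y,
                      double m1x, double m1y, double m2x, double m2y)
{
    return (sq_dist(x, y, m1x, m1y) <= sq_dist(x, y, m2x, m2y)) ? 1 : 2;
}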
1. In the previous assignment, you designed a Bayes classifier assuming the following 2D
Gaussian distributions:
μ1 = (1, 1)ᵀ, Σ1 = [1 0; 0 1]
μ2 = (4, 4)ᵀ, Σ2 = [1 0; 0 1]
In this assignment, you will assume that you do not know the true parameters of the
Gaussian distributions and that you need to estimate them from the training data using
the Maximum Likelihood (ML) approach.
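Recall that the ML estimates of a Gaussian’s parameters are the sample mean and the
(1/n) sample covariance. A minimal C sketch (array and function names are illustrative):

/* ML estimates for a 2D Gaussian: x[i], y[i] hold the n training
   samples of one class. */
void ml_estimate(const double *x, const double *y, int n,
                 double mu[2], double cov[2][2])
{
    int i;
    mu[0] = mu[1] = 0.0;
    for (i = 0; i < n; i++) { mu[0] += x[i]; mu[1] += y[i]; }
    mu[0] /= n; mu[1] /= n;

    cov[0][0] = cov[0][1] = cov[1][1] = 0.0;
    for (i = 0; i < n; i++) {
        double dx = x[i] - mu[0], dy = y[i] - mu[1];
        cov[0][0] += dx * dx; cov[0][1] += dx * dy; cov[1][1] += dy * dy;
    }
    cov[0][0] /= n; cov[0][1] /= n; cov[1][1] /= n;
    cov[1][0] = cov[0][1];   /* covariance matrix is symmetric */
}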
a. Using the same 200,000 samples from the previous assignment, estimate the
parameters of each distribution using ML. Then, classify all 200,000 samples
assuming P (ω1) = P (ω2); count the number of misclassified samples and compare
your results to those obtained in assignment 1.
b. Repeat experiment (1.a) using 1/100 of the samples from each distribution (randomly
selected) to estimate the parameters of that distribution using ML. Then, classify all
200,000 samples assuming P (ω1) = P (ω2); count the number of misclassified
samples and compare your results to those obtained in experiment (1.a).
2. Repeat problem 1 using the samples (same as in Assignment 1) from the following 2D
Gaussian distributions:
μ1 = (1, 1)ᵀ, Σ1 = [1 0; 0 1]
μ2 = (4, 4)ᵀ, Σ2 = [4 0; 0 8]
3. Face detection using skin color is a popular approach. While color images are typically
in RGB format, most techniques transform RGB to a different color space (e.g.,
chromatic, HSV, etc.). This is because RGB values are sensitive to brightness changes
caused by varying illumination.
a. Implement the skin-color methodology of [Yang96 “A Real-time Face Tracker”]
which uses the chromatic color space. To build the skin color model, use
Training_1.ppm (and ref1.ppm), shown in Figure 1, which are available from the
course’s webpage. To test your method, use Training_3.ppm (and ref3.ppm) and
Training_6.ppm (and ref6.ppm), which are also available from the course’s
webpage. To quantitatively evaluate the performance of your method, generate
ROC plots (i.e., false positives (FP) vs. false negatives (FN)) by varying the
skin-color threshold. A FP is a non-face pixel classified as skin color, while a FN
is a face pixel classified as non-skin color. To compute the FPs and FNs for each
test image, use the corresponding reference images.
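A minimal C sketch of the chromatic-space conversion (assuming the usual
normalized-rg definition, r = R/(R+G+B) and g = G/(R+G+B), on which the Gaussian
skin model of [Yang96] is built):

/* Chromatic (normalized rg) coordinates: brightness is factored out
   by dividing each component by R + G + B. */
void rgb_to_rg(double R, double G, double B, double *r, double *g)
{
    double s = R + G + B;
    if (s == 0.0) { *r = 0.0; *g = 0.0; return; }   /* black pixel */
    *r = R / s;
    *g = G / s;
}

A pixel is then classified as skin if its Gaussian likelihood in (r, g) exceeds the
threshold being varied for the ROC plot.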
b. Repeat (3.a) using the YCbCr color space. In the YCbCr color space, the
luminance information is contained in the Y component, and the chrominance
information is in Cb and Cr. Therefore, Y should not be used in the skin-color
model. The RGB components can be converted to the YCbCr components using
the following transformation:
Y  =  0.299R + 0.587G + 0.114B
Cb = −0.169R − 0.332G + 0.500B
Cr =  0.500R − 0.419G − 0.081B
For comparison purposes, plot the ROC curves in the same graph.
Figure 1. Training_1.ppm and ref1.ppm images.
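A minimal C sketch of the conversion in part (b), using the coefficients given above
(Y is computed only for reference, since it does not enter the skin model):

/* RGB -> YCbCr using the transformation above; only Cb and Cr are
   used in the skin-color model. */
void rgb_to_ycbcr(double R, double G, double B,
                  double *Y, double *Cb, double *Cr)
{
    *Y  =  0.299 * R + 0.587 * G + 0.114 * B;
    *Cb = -0.169 * R - 0.332 * G + 0.500 * B;
    *Cr =  0.500 * R - 0.419 * G - 0.081 * B;
}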
Note: Irfanview is a nice tool for image display/manipulation. Sample code to read/write
color images in PPM format can be found on my CS 302 webpage:
https://www.cse.unr.edu/~bebis/CS302/
Information on the PPM image file format can be found here:
https://paulbourke.net/dataformats/ppm/
https://www.cse.unr.edu/~bebis/CS302/Lectures/IP.ppt
In this project, you will implement the eigenface approach [2] and perform experiments to
evaluate its performance and the effect of several factors on recognition performance.
1. Eigenface implementation
Read carefully and understand the steps of the eigenface approach. Use jacobi.c from
“Numerical Recipes in C” for computing the eigenvalues/eigenvectors of a symmetric
matrix (Warning: the [0] location of an array is NOT used in “Numerical Recipes”; start
storing your data at location [1]). Your program should run in two modes: training and
testing.
Training: In training mode, your program will read in the training face images and compute
the average face and eigenfaces. It will then project each training face image onto the
eigenspace and compute its representation in that space (i.e., the coefficients of
projection Ωk , k=1,2,..,M, where M is the number of training face images). Finally, your
program will store into a file the coefficients Ωk , the average face, and the eigenfaces.
Testing: In testing mode, your program will read in the coefficients Ωk , the average face,
and the eigenfaces. Then, it will decide how many eigenfaces to keep (i.e., this could be
done in an interactive mode where the user determines the percentage of the information
to be preserved). Use the images in a test set (see below) to evaluate face recognition
performance. Given a test image, your program will need to project it onto the eigenspace
and compute its projection coefficients Ω. To recognize the face in the test image, you will
need to find the closest match Ωk to Ω (i.e., the distance in face space (difs)). Let
ek = ||Ωk − Ω||, where the distance is computed using the Mahalanobis distance.
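Note (a standard simplification, stated here for convenience): since the
eigen-coefficients are decorrelated with variances equal to the eigenvalues λi, the
Mahalanobis distance in eigenspace reduces to a weighted Euclidean distance,
ek² = Σi (Ωi − Ωk,i)² / λi, for i = 1, ..., K,
where K is the number of eigenfaces kept.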
Very important: to make sure that your program works correctly, try the following: given
an image I, (i) project it onto the eigenspace; (ii) reconstruct it using all eigenfaces; let’s
call the reconstructed image Î; (iii) compute ||I − Î|| (i.e., the distance from face space
(dffs), using the Euclidean distance). The difference should be very small; if it is not, then
your code is not working correctly. Do not proceed unless you have been able to verify
this step.
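In the notation of [2], with average face Ψ and the eigenfaces stacked as the columns of
a matrix U, this check amounts to Ω = Uᵀ(I − Ψ), Î = Ψ + UΩ, and dffs = ||I − Î||.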
2. Datasets
To test eigenface recognition, you will use images from the FERET face database [1].
FERET contains a large number of images acquired during different photo sessions and
has a good variety of gender, ethnicity and age groups. The lighting conditions, face
orientation and time of capture vary. In this project, you will concentrate on frontal face
poses named as fa (frontal image) or fb (alternative frontal image, taken during a different
photo session). All faces have been normalized with regard to orientation, position, and
size. Also, they have been masked to include only the face region (i.e., upper body and
background were cropped out). The first subset (fa) contains 1204 images from 867
subjects, while the second subset (fb) contains 1196 images from 866 subjects (i.e.,
there is one subject in fa who is not in fb). You have been provided with two different sizes
for each image: low resolution (16 x 20) and high resolution (48 x 60). All datasets can be
downloaded from the course’s webpage:
FA_L (fa, low resolution), FA_H (fa, high resolution)
FB_L (fb, low resolution), FB_H (fb, high resolution)
The file naming convention for the FERET database is as follows:
nnnnn_yymmdd_xx_q.pgm
where nnnnn is a five digit integer that uniquely identifies the subject, yymmdd indicates
the year, month, and date when the photo was taken, xx is a lowercase character string
(i.e., either fa or fb), and q is a flag (e.g., indicating whether the subject wears glasses – not
always present).
3. Experiments
(a) Use fa_H for training (i.e., to compute the eigenfaces and build the gallery set) and
fb_H for testing. So, there will be 1203 images for training and 1196 images for testing
(query).
(a.I) Show (as images) the following:
o The average face
o The eigenfaces corresponding to the 10 largest eigenvalues.
o The eigenfaces corresponding to the 10 smallest eigenvalues.
(a.II) Choose the top eigenvectors (eigenfaces) preserving 80% of the information
in the data as the basis. Project both training and query images onto this basis
after subtracting the average face to obtain the eigen-coefficients. Then, compute
the Mahalanobis distance between the eigen-coefficient vectors for each pair of
training and query images as the matching distance. Please note that for each
query image, there will be 1203 matching distances (i.e., obtained by matching the
query with each image in the gallery dataset). Choose the top N gallery face
images (N is a parameter; see below) having the highest similarity to the query
face (i.e., the N smallest matching distances). If a gallery image of the query’s
subject is among the N most similar faces retrieved, the query is considered a
correct match; otherwise, it is considered an incorrect match.
Count the number of correct matches and divide it by the total number of images in
the test set (i.e., 1196) to report the identification accuracy. Draw the Cumulative
Match Characteristic (CMC) curve [1] by varying N from 1 to 50. CMC shows the
probability of the query being among the top N faces retrieved from the gallery. The
faster the CMC curve approaches the value one, the better the matching algorithm is.
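A minimal C sketch of the CMC computation (names are illustrative), assuming you
have already determined, for each query, the 1-based rank of its correct gallery match
in the sorted list of matching distances:

/* cmc[N] = fraction of queries whose correct match appears within the
   top N retrieved gallery faces; cmc must hold max_n + 1 entries
   (index 0 is unused). rank[q] is the 1-based rank of query q's
   correct gallery match. */
void compute_cmc(const int *rank, int num_queries, double *cmc, int max_n)
{
    int N, q;
    for (N = 1; N <= max_n; N++) {
        int correct = 0;
        for (q = 0; q < num_queries; q++)
            if (rank[q] <= N)
                correct++;
        cmc[N] = (double)correct / num_queries;
    }
}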
(a.III) Assuming N=1, show 3 query images which are correctly matched,
along with the corresponding best-matched training samples.
(a.IV) Assuming N=1, show 3 query images which are incorrectly matched,
along with the corresponding mismatched training samples.
(a.V) Repeat (a.II – a.IV) by keeping the top eigenvectors corresponding to
90% and 95% of the information in the data. Plot the CMC curves on the same
graph for comparison purposes. If there are significant differences in terms of
identification accuracy in (a.II) and (a.V), try to explain why. If there are no
significant differences, explain why too.
(b) In this experiment, you will test the performance of the eigenface approach on faces
not in the gallery set (i.e., intruders). For this, remove all the images of the first 50 subjects
from fa_H; let’s call the reduced set fa2_H. Perform recognition using fa2_H for training
(gallery) and fb_H for testing (query). In this experiment, use the eigenvectors
corresponding to 95% of the information in the data. To reject intruders, you would need to
threshold ek (i.e., accept the match only if ek < T). In this case, the choice of the threshold
T is very important. A high threshold value will increase False Positives (FP) while a low
threshold value will decrease the number of True Positives (TP). To find out what is a good
threshold value, you would need to vary the value of T and compute (FP, TP) for each
value. Then, you would need to plot the (FP, TP) values in a graph (i.e., ROC graph; see
below).
[ROC graph: true positive rate (y-axis) vs. false positive rate (x-axis), where
true positive rate = # true positives / # non-intruders (positives) and
false positive rate = # false positives / # intruders (negatives).]
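A minimal C sketch of one ROC point (names are illustrative; a non-intruder accepted
below the threshold is counted here as a true positive, matching the rates defined above,
though a stricter count could also require the identity to be correct):

void roc_point(const double *e, const int *intruder, int num_queries,
               double T, double *tp_rate, double *fp_rate)
{
    /* e[q]: distance ek of query q to its best gallery match;
       intruder[q]: 1 if query q's subject was removed from the gallery. */
    int tp = 0, fp = 0, pos = 0, neg = 0, q;
    for (q = 0; q < num_queries; q++) {
        if (intruder[q]) { neg++; if (e[q] < T) fp++; }
        else             { pos++; if (e[q] < T) tp++; }
    }
    *tp_rate = (double)tp / pos;   /* # true positives / # non-intruders */
    *fp_rate = (double)fp / neg;   /* # false positives / # intruders */
}

Sweep T over the observed range of ek values to trace the full curve.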
Graduate Students Only – Experiments using low-resolution face images.
(c) Repeat experiment (a) using fa_L for training (gallery) and fb_L for testing.
(d) Remove all the images of the first 50 subjects from fa_L; let’s call the reduced set
fa2_L. Repeat experiment (b) using fa2_L for training (gallery) and fb_L for testing.
(e) What is the effect of using low-resolution images? Are there any significant differences
in identification performance? Explain.
References
[1] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, “The FERET Evaluation
Methodology for Face-Recognition Algorithms”, IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 22, no. 10, pp. 1090-1104, 2000.
[2] M. Turk and A. Pentland, “Face Recognition Using Eigenfaces”, IEEE Conference on
Computer Vision and Pattern Recognition, 1991.
In this assignment, you will experiment with two different classifiers for gender classification:
SVMs and a Bayesian classifier.
Data Set and experiments: The dataset to be used in your experiments contains 400 frontal
images from 400 distinct people, representing different races, with different facial expressions,
and under different lighting conditions. The 400 images have been equally divided between
males and females. Histogram equalization has been applied to each normalized image to
account for different lighting conditions.
The data, which is available from the course’s webpage, contains images of two different sizes: 16×20 and 48×60; you would need to experiment with
each image size separately and compare your results. For each classifier, you need to report
the average error rate using a three-fold cross-validation procedure. For this, we have randomly
divided the dataset three times as follows:
Fold 1: Training (69M, 65F), Validation (73M, 60F), Test (58M, 75F)
Fold 2: Training (62M, 72F), Validation (58M, 75F), Test (80M, 53F)
Fold 3: Training (71M, 63F), Validation (67M, 66F), Test (62M, 71F)
Note that the validation set is typically used for parameter optimization. Since you will not need
to optimize any parameters in this assignment, use both the validation set and test set for
testing purposes by simply combining them into one set. Using each fold, compute the test error
and then average all three errors to report the average error.
For each image, we have pre-computed its eigen-face representation; you should be
training/testing each classifier using the first 30 eigen-features only (i.e., the ones corresponding
to the top 30 eigenvectors). The file naming convention for each file is as follows: trPCA_xx for
training, valPCA_xx for validation, and tsPCA_xx for testing; see “descr” file for more
information. Note that the eigenvalues and eigenvectors have been provided in the files EVs_xx
and PCs_xx for completeness; however, you will not need them in your experiments.
Experiment 1: Apply Support Vector Machines (SVMs) for gender classification. You will be
using the LibSVM implementation. Experiment both with polynomial and RBF kernels as well as
different C values. For consistency, try d=1, 2, and 3 for the polynomial kernel (note that
LibSVM provides two extra parameters for the polynomial kernel; to be consistent with the
lecture, set γ=1 and c0=0). In the case of the RBF kernel, try σ=1, 10, and 100. For the C value,
try C=1, 10, 100, and 1,000. Report your best results both for the 16×20 and 48×60
datasets. Warning: make sure that the data is provided in the format required by LibSVM;
otherwise, it will not work correctly.
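For example (the file names are illustrative; the flags are LibSVM’s standard
command-line options), a degree-2 polynomial kernel with γ=1, c0=0, and C=100, and an
RBF kernel with σ=10, could be trained as:

svm-train -t 1 -d 2 -g 1 -r 0 -c 100 train_16x20.txt poly_model
svm-train -t 2 -g 0.005 -c 100 train_16x20.txt rbf_model

Note that LibSVM parameterizes the RBF kernel as exp(−γ||u−v||²), so σ = 1, 10, and 100
correspond to γ = 1/(2σ²) = 0.5, 0.005, and 0.00005. Each line of the input files must
follow LibSVM’s sparse format, e.g., “+1 1:0.43 2:-1.20 ... 30:0.05” for a 30-feature
sample.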
Experiment 2: For comparison purposes, apply the Bayes classifier for the same problem.
Model the male and female classes using a Gaussian distribution and use ML estimation to
estimate the parameters for each class. Use equal prior probabilities (i.e., P(ω1) = P(ω2)).
Compare your results with those obtained using SVMs.
1. Cover Page. The cover page should contain Project title, Project number, Course
number, Student’s name, Date due, and Date handed in.
2. Technical discussion. This section should include the techniques used and the principal
equations (if any) implemented.
3. Discussion of results. A discussion of results should include major findings in terms of
the project objectives, and make clear reference to any figures generated.
4. Division of work: Include a statement that describes how the work was divided between
the two group members.
5. Program listings. Include listings of all programs written by the student. Standard
routines and other material obtained from other sources should be acknowledged by
name, but their listings should not be included.
A hard copy is required for items 1-4, submitted to the instructor at the beginning of class on
the due date. Item 5 should be emailed to the instructor, as a zip file, before class on the due
date.
