The objective of this lab is for you to explore the behavior of Gaussian classifiers in Matlab by applying them to some datasets. The TA will first demonstrate the results that Gaussian classifiers (of various covariance matrix types) give on toy datasets and the MNIST dataset, including the corresponding ROC curve or confusion matrix. Then you will replicate those results and further explore the datasets and classifiers. We provide you with the following:
- The script lab02.m sets up the problem (toy dataset or MNIST) and plots various figures. The actual algorithms are implemented in the functions below.
- Functions from the Gaussian mixture tools train a Gaussian classifier and do other things.
Look at those functions and try to understand how they work and how they produce the plots. This will help you see how the machine learning algorithms are implemented, become proficient in Matlab, and produce useful plots.
I Datasets
Construct your own toy datasets in 1D and 2D, such as Gaussian classes with more or less overlap, or classes with curved shapes as in the 2moons dataset. You will also use the MNIST dataset of handwritten digits.
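As a starting point, here is a minimal sketch (not part of the provided code) for building a 2D toy dataset of two Gaussian classes with adjustable overlap. It assumes the Statistics Toolbox functions mvnrnd and gscatter are available; the variable names X and Y are arbitrary choices for this sketch.

    N   = 200;                              % points per class
    mu1 = [0 0];  mu2 = [2 1];              % class means; move them closer for more overlap
    S1  = [1 0.5; 0.5 1];  S2 = 0.5*eye(2); % class covariances
    X = [mvnrnd(mu1, S1, N); mvnrnd(mu2, S2, N)];   % stack the samples of both classes
    Y = [ones(N,1); 2*ones(N,1)];           % class labels in {1,2}
    gscatter(X(:,1), X(:,2), Y);            % quick look at the two classes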
II Using a Gaussian classifier
In a Gaussian classifier, for each class $k = 1,\dots,K$, the class-conditional probability distribution $p(x|C_k)$ for a point $x \in \mathbb{R}^D$ is a Gaussian distribution with parameters $(\mu_k, \Sigma_k)$ (mean vector of $D \times 1$ and covariance matrix of $D \times D$). Also, each class has another parameter, its proportion $\pi_k = p(C_k) \in [0,1]$ (its prior distribution). We will consider 6 types of covariance matrix $\Sigma_k$:
          | Non-shared (separately for each class) | Shared (equal for all classes)
full      | F: $\Sigma_k$                          | f: $\Sigma_k = \Sigma$ for all $k$
diagonal  | D: $\Sigma_k = \mathbf{D}_k$           | d: $\Sigma_k = \mathbf{D}$ for all $k$
isotropic | I: $\Sigma_k = \sigma_k^2 I$           | i: $\Sigma_k = \sigma^2 I$ for all $k$
To use a Gaussian classifier, we need to solve two problems:
- Training: to learn the parameters from a training set. This is given by the maximum likelihood estimate (MLE). For the prior distribution and mean vectors, this is:

  prior distribution: $\pi_k = \frac{N_k}{N}$,  mean vector: $\mu_k = \frac{1}{N_k} \sum_{n \in \text{class } k} x_n$,

  where $N$ is the total number of training points, $N_k$ is the number of points in class $k$, and the sum above is over the data points in class $k$. For the covariance matrix, the MLE depends on the type of covariance:
  F: $\Sigma_k = \frac{1}{N_k} \sum_{n \in \text{class } k} (x_n - \mu_k)(x_n - \mu_k)^T$ (the covariance of the points in class $k$);  f: $\Sigma = \sum_{k=1}^{K} \pi_k \Sigma_k$.
  D: $\mathbf{D}_k = \mathrm{diag}(\Sigma_k)$ (the diagonal elements of $\Sigma_k$);  d: $\mathbf{D} = \sum_{k=1}^{K} \pi_k \mathbf{D}_k$.
  I: $\sigma_k^2 = \frac{1}{D} \sum_{d=1}^{D} \sigma_{kd}^2$, where $\sigma_{kd}^2$ is the $d$th diagonal element of $\Sigma_k$;  i: $\sigma^2 = \sum_{k=1}^{K} \pi_k \sigma_k^2$.
So the shared-case MLE is the weighted average of the non-shared MLE over the classes, using $\pi_k$ as weights. Also, we ensure each covariance matrix is full-rank by adding to its diagonal a small positive number, e.g. $10^{-10}$.
- Testing: to compute the posterior probabilities $p(C_k|x)$ for a test point $x$ (typically not in the training set, although it can be any point in $\mathbb{R}^D$). This is done by one of the provided functions; see lab02.m.
Given these posterior probabilities, we can classify $x$ as $\arg\max_{k \in \{1,\dots,K\}} p(C = k|x)$, or construct an ROC curve (for $K = 2$) or confusion matrix (for any $K$) over a test set. Both steps are illustrated in the sketches below.
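To make the training step concrete, here is a minimal Matlab sketch of the MLE formulas above. It is a sketch only: the variable names X, Y, cov_type, p, mu, S are arbitrary choices, and the lab's own training function may be organised differently. It uses implicit expansion, so it assumes Matlab R2016b or later.

    [N, D] = size(X);  K = max(Y);                      % X is NxD, Y contains labels in {1,...,K}
    p  = zeros(K,1);  mu = zeros(K,D);  S = zeros(D,D,K);
    for k = 1:K
      Xk = X(Y==k,:);  Nk = size(Xk,1);
      p(k)    = Nk / N;                                 % prior pi_k = N_k / N
      mu(k,:) = mean(Xk,1);                             % mean vector mu_k
      Sk = (Xk - mu(k,:))' * (Xk - mu(k,:)) / Nk;       % full covariance of the points in class k
      switch lower(cov_type)                            % cov_type is one of 'F','f','D','d','I','i'
        case 'f', S(:,:,k) = Sk;                        % F/f: full
        case 'd', S(:,:,k) = diag(diag(Sk));            % D/d: diagonal elements of Sigma_k
        case 'i', S(:,:,k) = mean(diag(Sk)) * eye(D);   % I/i: isotropic, sigma_k^2 * I
      end
    end
    if any(cov_type == 'fdi')                           % lowercase = shared: pi_k-weighted average
      Ssh = zeros(D);
      for k = 1:K, Ssh = Ssh + p(k)*S(:,:,k); end
      S = repmat(Ssh, [1 1 K]);
    end
    for k = 1:K, S(:,:,k) = S(:,:,k) + 1e-10*eye(D); end  % keep each covariance full-rank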
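And a companion sketch for the testing step: computing the posteriors by Bayes' rule and classifying a set of test points Xt (an MxD matrix, a hypothetical name). It assumes the Statistics Toolbox function mvnpdf and reuses the hypothetical p, mu, S from the previous sketch.

    M = size(Xt,1);  K = numel(p);
    px = zeros(M,K);                        % px(m,k) = p(x_m | C_k)
    for k = 1:K
      px(:,k) = mvnpdf(Xt, mu(k,:), S(:,:,k));
    end
    post = px .* p';                        % p(x|C_k) p(C_k), up to the normalising constant p(x)
    post = post ./ sum(post,2);             % posteriors p(C_k|x); each row sums to 1
    [~, yhat] = max(post, [], 2);           % classify as argmax_k p(C_k|x)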
For each classifier, we plot the following figures:
- For 1D datasets: $p(x|C)$, $p(x|C)\,p(C)$, $p(C|x)$ for each class $C = 1,\dots,K$, and $\max_C p(C|x)$ for each $x$.
- For 2D datasets: contour plot of $p(x|C)$ for each class and the class boundaries.
- For any dataset: we plot either the confusion matrix (for K > 2) or the ROC curve (for K = 2) and give the area-under-the-curve (AUC) value.
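If you want to compute the confusion matrix and the ROC curve/AUC yourself, rather than rely on the provided plotting code, here is a minimal sketch. Yt denotes the true test labels, and post and yhat are the hypothetical outputs of the testing sketch above.

    K = size(post,2);
    C = zeros(K);                           % C(i,j) = number of class-i points classified as class j
    for i = 1:K
      for j = 1:K
        C(i,j) = sum(Yt==i & yhat==j);
      end
    end
    if K == 2                               % ROC curve: sweep a threshold on p(C_1|x)
      thr = sort(unique(post(:,1)), 'descend');
      tpr = zeros(numel(thr),1);  fpr = zeros(numel(thr),1);
      for t = 1:numel(thr)
        pred1 = post(:,1) >= thr(t);        % points classified as class 1 at this threshold
        tpr(t) = sum(pred1 & Yt==1) / sum(Yt==1);   % true positive rate
        fpr(t) = sum(pred1 & Yt==2) / sum(Yt==2);   % false positive rate
      end
      auc = trapz([0; fpr; 1], [0; tpr; 1]);        % area under the ROC curve
      plot(fpr, tpr); xlabel('false positive rate'); ylabel('true positive rate');
    end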
Explore the classifiers and plots with different datasets, numbers of classes, classes with more or less overlap, etc. See the end of the file lab02.m for suggestions of things to explore.