- Let $X = \{x_1,\dots,x_n\}$ be a set of $n$ samples drawn i.i.d. from a univariate distribution with density function $p(x|\theta)$, where $\theta$ is an unknown parameter. In general, $\theta$ will belong to a specified subset of $\mathbb{R}$, the set of real numbers. For the following choices of $p(x|\theta)$, derive the maximum likelihood estimate of $\theta$ based on the samples $X$:[1]
- $p(x|\theta) = \theta x^{\theta-1}$, $0 \le x \le 1$, $0 < \theta < \infty$.
- $p(x|\theta) = \frac{1}{\theta}$, $0 \le x \le \theta$, $\theta > 0$.
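A hand-derived estimate can be sanity-checked numerically by maximizing the log-likelihood over a grid of candidate values. The sketch below assumes the first density reads $p(x|\theta) = \theta x^{\theta-1}$ on $[0, 1]$; the helper names, the "true" parameter value, and the grid are illustrative choices, not part of the assignment.

```python
import numpy as np

def log_likelihood(theta, x):
    """Log-likelihood of i.i.d. samples x under the assumed density theta * x**(theta - 1)."""
    return len(x) * np.log(theta) + (theta - 1.0) * np.sum(np.log(x))

def grid_mle(x, grid):
    """Return the grid value of theta with the largest log-likelihood."""
    values = np.array([log_likelihood(t, x) for t in grid])
    return grid[np.argmax(values)]

rng = np.random.default_rng(0)
true_theta = 2.5  # hypothetical value, for illustration only

# Inverse-CDF sampling: F(x) = x**theta on [0, 1], so x = u**(1/theta).
# Using 1 - uniform keeps u in (0, 1] and avoids log(0).
u = 1.0 - rng.uniform(size=5000)
x = u ** (1.0 / true_theta)

grid = np.linspace(0.01, 10.0, 2000)
theta_grid = grid_mle(x, grid)
print(theta_grid)  # compare with the closed-form estimate derived by hand
```

A grid search only approximates the maximizer to within the grid spacing, so it is useful as a check on a derivation rather than as a replacement for it.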
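- Let $X = \{x_1,\dots,x_n\}$, $x_i \in \mathbb{R}^d$, be a set of $n$ samples drawn i.i.d. from a multivariate Gaussian distribution in $\mathbb{R}^d$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. Recall that the density function of a multivariate Gaussian distribution is given by:
$$p(x|\mu,\Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right).$$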
- Derive the maximum likelihood estimates for the mean $\mu$ and covariance $\Sigma$ based on the sample set $X$.[1]
- Let $\hat{\mu}_n$ be the maximum likelihood estimate of the mean. Is $\hat{\mu}_n$ a biased estimate of the true mean $\mu$? Clearly justify your answer by computing $E[\hat{\mu}_n]$.
- Let $\hat{\Sigma}_n$ be the maximum likelihood estimate of the covariance matrix. Is $\hat{\Sigma}_n$ a biased estimate of the true covariance $\Sigma$? Clearly justify your answer by computing $E[\hat{\Sigma}_n]$.
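An analytical bias argument can be cross-checked by simulation. The sketch below assumes the standard Gaussian maximum likelihood estimates (the sample mean and the scatter matrix normalized by $n$) and averages them over many small samples; the true parameters, sample size, and trial count are made-up values chosen only for illustration.

```python
import numpy as np

def gaussian_mle(X):
    """Return the sample mean and the 1/n-normalized covariance of the rows of X."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    sigma_hat = centered.T @ centered / n  # divides by n, not n - 1
    return mu_hat, sigma_hat

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                  # hypothetical true mean
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical true covariance
n, trials = 5, 20000

mu_sum = np.zeros(2)
sigma_sum = np.zeros((2, 2))
for _ in range(trials):
    X = rng.multivariate_normal(mu, sigma, size=n)
    mu_hat, sigma_hat = gaussian_mle(X)
    mu_sum += mu_hat
    sigma_sum += sigma_hat

print(mu_sum / trials)     # Monte Carlo estimate of E[mu_hat]
print(sigma_sum / trials)  # Monte Carlo estimate of E[sigma_hat]
```

Comparing the two printed averages with `mu` and `sigma` for a small `n` shows whether each estimator is centered on the true parameter; a simulation supports, but does not replace, the requested computation of the expectations.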
- Table 1 specifies the misclassification costs for a 3-class problem including a Reject option. Assume that a model has been trained using training data, and the model can output posterior probabilities P(C1|xtest),P(C2|xtest),P(C3|xtest) for any given test point xtest.
- Assume the cost parameter in Table 1 is set to 10. For a given xtest, let the posterior probabilities for the three classes be: P(C1|xtest) = 0.5, P(C2|xtest) = 0.25, P(C3|xtest) = 0. Using Table 1, compute the risks for predicting xtest to be C1, C2, C3, and Reject respectively. Including Reject as a possible option, what would your predicted class for xtest be? You have to show details of your computation and justify your answer.
Table 1: Misclassification costs for a 3-class problem including a Reject option. (Table body, organized by predicted class, is not reproduced here.)
- Assume the cost parameter in Table 1 is set to 5. For a given xtest, let the posterior probabilities for the three classes be: P(C1|xtest) = 0.4, P(C2|xtest) = 0.5, P(C3|xtest) = 0. Using Table 1, compute the risks for predicting xtest to be C1, C2, C3, and Reject respectively. Including Reject as a possible option, what would your predicted class for xtest be? You have to show details of your computation and justify your answer.
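The risk of each action is the posterior-weighted sum of its costs, and the predicted class is the action with the smallest risk. The sketch below illustrates that computation with a hypothetical loss matrix and hypothetical posteriors; neither is taken from Table 1 or from the questions above, so the numbers must be replaced with the actual ones.

```python
import numpy as np

actions = ["C1", "C2", "C3", "Reject"]

# Hypothetical costs, NOT the entries of Table 1.
# Rows are actions (C1, C2, C3, Reject); columns are the true class (C1, C2, C3).
loss = np.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.3, 0.3, 0.3],  # assumed constant cost of rejecting
])

# Hypothetical posteriors P(C1|x), P(C2|x), P(C3|x); they sum to 1.
posterior = np.array([0.5, 0.3, 0.2])

# Expected risk of each action: R(a|x) = sum_k loss[a, k] * P(C_k|x).
risks = loss @ posterior
for name, risk in zip(actions, risks):
    print(f"R({name}|x) = {risk:.2f}")

best = actions[int(np.argmin(risks))]
print("Predicted:", best)
```

With these made-up numbers the reject action has the lowest risk; with other costs or posteriors the decision can differ, which is why each risk should be written out explicitly.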
Programming assignment:
The next problem involves programming. For Question 3, we will be using the 2-class classification datasets Boston50 and Boston75 and the 10-class classification dataset Digits, all of which were used in Homework 1.
- We will develop two parametric classifiers by modeling each class's conditional distribution $p(x|C_i)$ as a multivariate Gaussian with (a) a full covariance matrix $\Sigma_i$ and (b) a diagonal covariance matrix $\Sigma_i$. In particular, using the training data, we will compute the maximum likelihood estimates of the class prior probabilities $p(C_i)$ and the class conditional probabilities $p(x|C_i)$ based on the maximum likelihood estimates of the mean $\mu_i$ and the (full/diagonal) covariance $\Sigma_i$ for each class $C_i$. The classification will be done based on the following discriminant function:
$$g_i(x) = \log p(C_i) + \log p(x|C_i).$$
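For a Gaussian class-conditional density, the discriminant can be evaluated in log space without forming the density itself. The sketch below is one possible way to do this with NumPy; the function name `discriminant` and the toy parameters are illustrative assumptions, not part of the required interface.

```python
import numpy as np

def discriminant(x, prior, mean, cov):
    """Evaluate log p(C_i) + log N(x | mean, cov) for a single class."""
    d = mean.shape[0]
    diff = x - mean
    # slogdet returns (sign, log|cov|) and avoids overflow in the determinant.
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term (x - mean)^T cov^{-1} (x - mean), without an explicit inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    log_density = -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)
    return np.log(prior) + log_density

# Toy two-class example with made-up parameters.
x = np.array([0.9, 1.1])
params = [
    (0.5, np.array([0.0, 0.0]), np.eye(2)),
    (0.5, np.array([1.0, 1.0]), np.eye(2)),
]
scores = [discriminant(x, p, m, c) for p, m, c in params]
predicted = int(np.argmax(scores))
print(predicted)
```

Working with log-determinants and linear solves rather than determinants and matrix inverses tends to be more stable numerically, which matters for higher-dimensional data such as Digits.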
We will develop code for a class MultiGaussClassify with two key functions:
MultiGaussClassify.fit(self,X,y,diag) and MultiGaussClassify.predict(self,X).
For fit(self,X,y,diag), the inputs (X,y) are respectively the feature matrix and class labels, and diag is boolean (TRUE or FALSE) which indicates whether the estimated class covariance matrices should be a full matrix (diag=FALSE) or a diagonal matrix (diag=TRUE).
For predict(X), the input X is the feature matrix corresponding to the test set and the output should be the predicted labels for each point in the test set.
For the class, the __init__(self,k,d) function can initialize the parameters for each class to be a uniform prior, zero mean, and identity covariance, i.e., $p(C_i) = 1/k$, $\mu_i = 0$, and $\Sigma_i = I$ for $i = 1,\dots,k$. Here, the number of classes k and the dimensionality d of the features are passed as arguments to the constructor of MultiGaussClassify.
We will compare the performance of three models:
- MultiGaussClassify with full class covariance matrices,
- MultiGaussClassify with diagonal covariance matrices, and
- LogisticRegression[2]
applied to three datasets: Boston50, Boston75, and Digits. Using my_cross_val with 5-fold cross-validation, report the error rates in each fold as well as the mean and standard deviation of error rates across folds for the three models applied to the three classification datasets. You will have to submit (a) code and (b) a summary of results:
(a) Code: You will have to submit code for MultiGaussClassify as well as a wrapper code hw2q3(). For the class, please use the following template:
class MultiGaussClassify:
    def __init__(self, k, d):
    def fit(self, X, y, diag=False):
    def predict(self, X):
Your class MultiGaussClassify should not inherit from any base class in sklearn. Again, the three functions you must implement in the MultiGaussClassify class are __init__, fit, and predict.
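One way the template could be filled in is sketched below. It is a minimal illustration, not a reference solution: the attribute names, the check that all k classes appear in the training labels, and the small ridge term `eps` added to each covariance (to keep it invertible when a feature is constant within a class) are assumptions that the assignment does not specify.

```python
import numpy as np

class MultiGaussClassify:
    def __init__(self, k, d):
        self.k, self.d = k, d
        self.eps = 1e-6                      # assumed ridge term, not from the assignment
        self.classes_ = np.arange(k)
        self.priors = np.full(k, 1.0 / k)    # uniform prior
        self.means = np.zeros((k, d))        # zero means
        self.covs = np.stack([np.eye(d) for _ in range(k)])  # identity covariances

    def fit(self, X, y, diag=False):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        if len(self.classes_) != self.k:
            raise ValueError("expected all k classes in the training labels")
        for i, c in enumerate(self.classes_):
            Xc = X[y == c]
            self.priors[i] = Xc.shape[0] / X.shape[0]
            self.means[i] = Xc.mean(axis=0)
            centered = Xc - self.means[i]
            cov = centered.T @ centered / Xc.shape[0]  # maximum likelihood (1/n) estimate
            if diag:
                cov = np.diag(np.diag(cov))            # keep only the variances
            self.covs[i] = cov + self.eps * np.eye(self.d)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.empty((X.shape[0], self.k))
        for i in range(self.k):
            diff = X - self.means[i]
            _, logdet = np.linalg.slogdet(self.covs[i])
            # Row-wise Mahalanobis distances via a linear solve.
            maha = np.sum(diff * np.linalg.solve(self.covs[i], diff.T).T, axis=1)
            log_density = -0.5 * (self.d * np.log(2.0 * np.pi) + logdet + maha)
            scores[:, i] = np.log(self.priors[i]) + log_density
        return self.classes_[np.argmax(scores, axis=1)]
```

The ridge term slightly biases the covariance estimates, so its size is a trade-off between numerical stability and fidelity to the maximum likelihood estimate; other regularization choices are equally plausible.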
The wrapper code hw2q3() (main file) has no input and is used to prepare the datasets and make calls to my_cross_val(method,X,y,k) to generate the error rate results for each dataset and each method. The code for my_cross_val(method,X,y,k) must be yours (e.g., code you developed in HW1 with modifications as needed), and you cannot use cross_val_score() in sklearn. For the method argument in my_cross_val, you can call the method corresponding to MultiGaussClassify with full covariance matrix as just multigaussclassify and the method corresponding to MultiGaussClassify with diagonal covariance matrix as multigaussdiagclassify.
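A k-fold routine of this kind could look like the sketch below. It assumes that `method` is an object exposing `fit(X, y)` and `predict(X)`; if `method` is instead passed as a name such as multigaussclassify, a lookup step would be needed first, and the diagonal variant would need a thin wrapper that calls `fit` with `diag=True`. The shuffling and fixed seed are illustrative choices.

```python
import numpy as np

def my_cross_val(method, X, y, k):
    """Return the test error rate of `method` on each of k folds."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(0)  # fixed seed: an illustrative choice
    indices = rng.permutation(X.shape[0])
    folds = np.array_split(indices, k)  # k nearly equal-sized index groups
    errors = np.empty(k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        method.fit(X[train_idx], y[train_idx])
        pred = method.predict(X[test_idx])
        errors[i] = np.mean(pred != y[test_idx])  # fraction misclassified
    return errors
```

Returning the per-fold error rates as an array leaves the mean and standard deviation to the caller, so the same function can serve both the terminal output and the summary tables.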
The results should be printed to the terminal (do not generate an additional file in the folder). Make sure the calls to my_cross_val(method,X,y,k) are made in the following order, and print a line to the terminal before each call to show which method and dataset are being used:
- MultiGaussClassify with full covariance matrix on Boston50,
- MultiGaussClassify with full covariance matrix on Boston75,
- MultiGaussClassify with full covariance matrix on Digits,
- MultiGaussClassify with diagonal covariance matrix on Boston50,
- MultiGaussClassify with diagonal covariance matrix on Boston75,
- MultiGaussClassify with diagonal covariance matrix on Digits,
- LogisticRegression with Boston50,
- LogisticRegression with Boston75, and
- LogisticRegression with Digits.
For example, the first call to my_cross_val(method,X,y,k) should result in the following output:
Error rates for MultiGaussClassify with full covariance matrix on Boston50:
Fold 1: ###
Fold 2: ###
Fold 5: ###
Mean: ###
Standard Deviation: ###
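Output in this layout could be produced by a small helper such as the one sketched below. The function name `format_report`, the four-decimal formatting, and the use of the population standard deviation (NumPy's default, `ddof=0`) are assumptions; the error rates passed in are made-up placeholders, not results.

```python
import numpy as np

def format_report(title, errors):
    """Build the per-fold report text for one method/dataset pair."""
    errors = np.asarray(errors, dtype=float)
    lines = [f"Error rates for {title}:"]
    lines += [f"Fold {i}: {e:.4f}" for i, e in enumerate(errors, start=1)]
    lines.append(f"Mean: {errors.mean():.4f}")
    lines.append(f"Standard Deviation: {errors.std():.4f}")  # ddof=0 by default
    return "\n".join(lines)

# Placeholder error rates, for illustration only.
print(format_report(
    "MultiGaussClassify with full covariance matrix on Boston50",
    [0.10, 0.12, 0.08, 0.11, 0.09],
))
```

Building the text first and printing it afterwards keeps the formatting easy to check and reuse for all nine method/dataset combinations.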
(b) Summary of results: For each dataset and each method, report the test set error rates for each of the k = 5 folds, the mean error rate over the k folds, and the standard deviation of the error rates over the k folds. Make a table to present the results for each method and each dataset (9 tables in total). Each of the first five columns of a table represents a fold; add two columns at the end to show the overall mean error rate and standard deviation over the k folds. For example:
Error rates for MGC with full cov matrix on Boston50:

| Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean | SD |
|---|---|---|---|---|---|---|
| # | # | # | # | # | # | # |
[1] You have to show the details of your derivation. A correct answer without the details will not get any credit. You can use material from the Matrix Cookbook and/or the textbook for your derivation.
[2] You should use LogisticRegression from scikit-learn, similar to HW1.