- Generate 100 random training points from each of the following two distributions: N(20,5) and N(35,5). Write a program that employs the Parzen window technique with a Gaussian kernel to estimate the density, p̂(x), using all 200 points. Note that this density conforms to a single bimodal distribution.
- [15 points] Plot the estimated density function for each of the following window widths: h = 0.01,0.1,1,10. [Note: You can estimate the density at discrete values of x in the [0,55] interval with a step-size of 1.]
- [10 points] Repeat the above after generating 500 training points from each of the two distributions, and then 1,000 training points from each of the two distributions.
- [5 points] Discuss how the estimated density changes as a function of the window width and the number of training points.
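As a starting point, the Parzen estimate p̂(x) = (1/n) Σᵢ (1/h) K((x − xᵢ)/h) with a Gaussian kernel K can be sketched in Python/NumPy as below. The function and variable names are illustrative, and the sketch treats the second parameter of N(20,5) as the standard deviation; confirm this against the course convention before using it.

```python
import numpy as np

def parzen_density(samples, xs, h):
    """Gaussian-kernel Parzen window estimate of p(x) at each point in xs."""
    n = len(samples)
    u = (xs[:, None] - samples[None, :]) / h          # (len(xs), n) scaled differences
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # Gaussian kernel values
    return k.sum(axis=1) / (n * h)                    # average of scaled kernels

rng = np.random.default_rng(0)
# 100 points from each component; second parameter assumed to be the std. dev.
samples = np.concatenate([rng.normal(20, 5, 100), rng.normal(35, 5, 100)])
xs = np.arange(0, 56, 1.0)                            # x in [0, 55], step-size 1
densities = {h: parzen_density(samples, xs, h) for h in [0.01, 0.1, 1, 10]}
```

Each entry of `densities` can then be plotted against `xs` (e.g., with `matplotlib.pyplot.plot`) to compare the four window widths.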
- Consider the dataset available here. It consists of two-dimensional patterns, x = [x1, x2]^t, pertaining to 3 classes (ω1, ω2, ω3). The feature values are indicated in the first two columns, while the class labels are specified in the last column. The priors of all 3 classes are the same, and a 0-1 loss function is assumed. Partition this dataset into a training set (the first 250 patterns of each class) and a test set (the remaining 250 patterns of each class).
- [10 points] Let
p([x1, x2]^t|ω1) ~ N([0,0]^t, 4I), p([x1, x2]^t|ω2) ~ N([10,0]^t, 4I), p([x1, x2]^t|ω3) ~ N([5,5]^t, 5I),
where I is the 2 × 2 identity matrix. What is the error rate on the test set when the Bayes decision rule is employed for classification? Report the confusion matrix as well.
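Since all three covariances are spherical (σᵢ²I) and the priors are equal, the Bayes decision rule reduces to maximizing the discriminant gᵢ(x) = −‖x − μᵢ‖²/(2σᵢ²) − ln σᵢ² over i. A minimal sketch, with illustrative function names and a simple confusion-matrix helper (rows = true class, columns = predicted class):

```python
import numpy as np

means = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
variances = np.array([4.0, 4.0, 5.0])     # sigma_i^2 in Sigma_i = sigma_i^2 * I

def bayes_classify(X):
    """Bayes rule for spherical Gaussian classes with equal priors."""
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (m, 3)
    g = -d2 / (2 * variances) - np.log(variances)  # discriminants, constants dropped
    return g.argmax(axis=1) + 1                    # class labels are 1..3

def confusion(y_true, y_pred, n_classes=3):
    """Confusion matrix: C[i, j] = count of true class i+1 predicted as j+1."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t - 1, p - 1] += 1
    return C
```

The error rate is then `1 - np.trace(C) / C.sum()` for the confusion matrix `C` computed on the test set.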
- [15 points] Suppose p([x1, x2]^t|ωi) ~ N(μi, Σi), i = 1,2,3, where the μi's and Σi's are unknown. Use the training set to compute the MLE of the μi's and the Σi's. What is the error rate on the test set when the Bayes decision rule using the estimated parameters is employed for classification? Report the confusion matrix as well.
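For a multivariate Gaussian, the MLEs are the sample mean and the 1/n-normalized sample covariance, applied per class to that class's 250 training patterns. A small sketch (names are illustrative; the demo data below is synthetic, not the assignment dataset):

```python
import numpy as np

def mle_gaussian(X):
    """MLE of the mean and covariance of a multivariate Gaussian from rows of X."""
    mu = X.mean(axis=0)
    # MLE normalizes by n, not the unbiased n - 1
    Sigma = (X - mu).T @ (X - mu) / len(X)
    return mu, Sigma

# Synthetic sanity check: estimates should approach the true parameters
rng = np.random.default_rng(1)
X = rng.multivariate_normal([5.0, 5.0], 5.0 * np.eye(2), size=2000)
mu_hat, Sigma_hat = mle_gaussian(X)
```

The estimated (μ̂i, Σ̂i) then replace the true parameters in the Gaussian discriminants of the previous part.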
- [15 points] Suppose the form of the distributions of p([x1, x2]^t|ωi), i = 1,2,3 is unknown. Assume that the training dataset can be used to estimate the density at a point using the Parzen window technique (a spherical Gaussian kernel with h = 1). What is the error rate on the test set when the Bayes decision rule is employed for classification? Report the confusion matrix as well.
- [10 points] Implement the 1-nearest neighbor (1-NN) method for classifying the patterns in the test set. What is the error rate of the 1-NN method on the test set? Report the confusion matrix as well.
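The 1-NN rule assigns each test pattern the label of its nearest training pattern. One compact way to express it under the Euclidean metric (function name is illustrative):

```python
import numpy as np

def nn1_classify(X_train, y_train, X_test):
    """1-nearest-neighbor: label each test point by its closest training point."""
    # Squared Euclidean distance from every test point to every training point
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]
```

For the 750-pattern training set here this brute-force distance matrix is perfectly adequate; a KD-tree would only matter at much larger scale.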
- [20 points] The iris (flower) dataset consists of 150 4-dimensional patterns belonging to three classes (setosa=1, versicolor=2, and virginica=3). There are 50 patterns per class. The 4 features correspond to (a) sepal length in cm, (b) sepal width in cm, (c) petal length in cm, and (d) petal width in cm. Note that the class labels are indicated at the end of every pattern.
Design a K-NN classifier for this dataset. Choose the first 25 patterns of each class for training the classifier (i.e., these are the prototypes) and the remaining 25 patterns of each class for testing the classifier. [Note: Any ties in the K-NN classification scheme should be broken at random.]
- In order to study the effect of K on the performance of the classifier, report the confusion matrix for K=1,5,9,13,17,21.
- Plot the classification accuracy as a function of K. Discuss your observations.
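A K-NN classifier with the random tie-breaking the assignment calls for can be sketched as follows (names are illustrative; the tie-break draws uniformly among the tied classes):

```python
import numpy as np

def knn_classify(X_train, y_train, X_test, k, rng=None):
    """K-NN under the Euclidean metric; voting ties are broken at random."""
    if rng is None:
        rng = np.random.default_rng(0)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    preds = []
    for row in d2:
        labels = y_train[np.argsort(row)[:k]]         # labels of the k nearest
        classes, counts = np.unique(labels, return_counts=True)
        winners = classes[counts == counts.max()]     # majority vote
        preds.append(rng.choice(winners))             # random tie-break
    return np.array(preds)
```

Running this for K = 1, 5, 9, 13, 17, 21 on the 75 iris test patterns and plotting `(preds == y_test).mean()` against K gives the requested accuracy curve.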
- [10 points] Based on the notation developed in class, write down the Sequential Backward Selection (SBS) algorithm and the Sequential Floating Backward Selection (SFBS) algorithm.
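The written answer should follow the class notation, but the greedy structure of SBS can also be expressed in code. A minimal sketch, where `J` is a placeholder criterion function over feature subsets (SFBS would additionally attempt conditional re-inclusion of previously discarded features after each exclusion, backtracking whenever that improves J):

```python
def sbs(features, J, target_size):
    """Sequential Backward Selection: start from the full feature set and
    greedily discard the feature whose removal leaves the criterion J highest."""
    S = set(features)
    while len(S) > target_size:
        # the 'least useful' feature: its removal hurts J the least
        worst = max(S, key=lambda f: J(S - {f}))
        S.remove(worst)
    return S
```

For example, with the toy criterion `J = lambda S: -sum(S)`, `sbs({1, 2, 3, 4}, J, 2)` discards features 4 and 3 in turn.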