EE5934 Assignment 1: K-Nearest Neighbours


In this assignment, you are required to implement a k-NN classifier to perform image classification on the USPS dataset. First, set up the environment and load the dataset as follows.

In [1]:

# Run this cell to set up the environment and data
import os
import sys
sys.path.append('../../homework/')

import matplotlib.pyplot as plt
import numpy as np

%load_ext autoreload
%autoreload 2

# Load the USPS data
usps_dir = '../data/usps.h5'
import h5py
with h5py.File(usps_dir, 'r') as hf:
    train = hf.get('train')
    trainX = train.get('data')[:]
    trainy = train.get('target')[:]
    test = hf.get('test')
    testX = test.get('data')[:]
    testy = test.get('target')[:]

# A small subset of USPS for you to run kNN on,
# as kNN is a little bit slow
sub_trainX, sub_trainy = trainX[:5000], trainy[:5000]
sub_testX, sub_testy = testX[:1000], testy[:1000]

kNN

Next, implement the k-NN classifier which consists of the following two stages:

  • Training stage: k-NN classifier loads the training data and remembers it;
  • Testing stage: k-NN classifies every test image by comparing with all training images and transferring the labels of the k most similar training examples.
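The two stages above can be sketched as a small class. This is an illustrative implementation only; the actual `KNN` class in `homework/Assignment1/knn.py` may differ in details such as the distance computation or tie-breaking:

```python
import numpy as np

class KNN:
    """Minimal k-NN classifier sketch (illustrative, not the official solution)."""

    def train(self, X, y):
        # Training stage: k-NN simply memorizes the training data.
        self.X_train = X
        self.y_train = y

    def predict(self, X, k=1):
        # Testing stage: compare each test point with all training points.
        # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2ab + ||b||^2,
        # computed without explicit loops.
        d2 = (np.sum(X ** 2, axis=1)[:, None]
              - 2 * X @ self.X_train.T
              + np.sum(self.X_train ** 2, axis=1)[None, :])
        # Indices of the k nearest training points for each test point
        nn = np.argsort(d2, axis=1)[:, :k]
        # Majority vote over the neighbours' labels
        y_pred = np.empty(len(X), dtype=self.y_train.dtype)
        for i, idx in enumerate(nn):
            labels, counts = np.unique(self.y_train[idx], return_counts=True)
            y_pred[i] = labels[np.argmax(counts)]
        return y_pred
```

Note that all the cost of k-NN is paid at prediction time: `predict` is O(n_test x n_train) in distance evaluations, which is why the assignment runs on a subset of USPS.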

Training Stage:

In [2]:

from knn import KNN

classifier = KNN()
classifier.train(sub_trainX, sub_trainy)

Testing Stage:

Implement the k-NN classifier by completing the class method KNN.predict in homework/Assignment1/knn.py and evaluate the k-NN classification error.

In [3]:

y_pred = classifier.predict(sub_testX, k=1)

acc = np.sum(y_pred == sub_testy) / len(sub_testy)
print('Accuracy: ', acc)

Accuracy: 0.942

Cross Validation:

To find the best k, it is tempting to try different values of k directly on the test data. However, doing so overfits the test set, and the resulting test accuracy is no longer an honest estimate of generalization.

Read Section 5.3.1 of the Deep Learning book for more on cross-validation, a technique for estimating a model's performance on unseen data.

Complete the cell below to carry out cross-validation as follows: split the training data and the corresponding labels into 5 subsets (folds), then perform 5-fold cross-validation to test different values of k.

In [4]:

# k_to_accuracies is a dictionary to hold the classification accuracies obtained when
# running cross-validation for different values of k. After running cross-validation,
# k_to_accuracies[k] would comprise 5 classification accuracy values found for the
# particular value of k.
k_to_accuracies = {}

################################################################################
# TODO: WRITE CODE FOR THE FOLLOWING                                           #
# Perform 5-fold cross validation to find the best value of k as follows: for  #
# each value of k being considered, run the k-NN algorithm 5 times where in    #
# each run, one fold is used as validation data while the other folds are      #
# used as training data. Store the accuracies for all folds for each value     #
# of k in the k_to_accuracies dictionary.                                      #
################################################################################
k_cands = [1, 2, 3, 4, 5, 7, 9, 12, 15, 18, 22, 26, 30]
trainX_folds = np.split(sub_trainX, 5)
trainy_folds = np.split(sub_trainy, 5)

for k in k_cands:
    accs = []
    for i in range(5):
        trainX_fold = trainX_folds.copy()
        trainy_fold = trainy_folds.copy()
        valX_fold = trainX_fold.pop(i)
        valy_fold = trainy_fold.pop(i)
        trainX_fold = np.concatenate(trainX_fold)
        trainy_fold = np.concatenate(trainy_fold)
        classifier.train(trainX_fold, trainy_fold)
        y_pred = classifier.predict(valX_fold, k)
        acc = np.sum(y_pred == valy_fold) / len(valy_fold)
        accs.append(acc)
    k_to_accuracies[k] = accs
################################################################################
# END OF YOUR CODE                                                             #
################################################################################

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))

k = 1, accuracy = 0.960000
k = 1, accuracy = 0.975000
k = 1, accuracy = 0.950000
k = 1, accuracy = 0.953000
k = 1, accuracy = 0.961000
k = 2, accuracy = 0.957000
k = 2, accuracy = 0.975000
k = 2, accuracy = 0.947000
k = 2, accuracy = 0.937000
k = 2, accuracy = 0.948000
k = 3, accuracy = 0.957000
k = 3, accuracy = 0.973000
k = 3, accuracy = 0.950000
k = 3, accuracy = 0.954000
k = 3, accuracy = 0.958000
k = 4, accuracy = 0.951000
k = 4, accuracy = 0.972000
k = 4, accuracy = 0.946000
k = 4, accuracy = 0.948000
k = 4, accuracy = 0.952000
k = 5, accuracy = 0.955000
k = 5, accuracy = 0.969000
k = 5, accuracy = 0.946000
k = 5, accuracy = 0.950000
k = 5, accuracy = 0.954000
k = 7, accuracy = 0.950000
k = 7, accuracy = 0.967000
k = 7, accuracy = 0.943000
k = 7, accuracy = 0.938000
k = 7, accuracy = 0.953000
k = 9, accuracy = 0.947000
k = 9, accuracy = 0.964000
k = 9, accuracy = 0.938000
k = 9, accuracy = 0.936000
k = 9, accuracy = 0.943000
k = 12, accuracy = 0.940000
k = 12, accuracy = 0.960000
k = 12, accuracy = 0.931000
k = 12, accuracy = 0.932000
k = 12, accuracy = 0.936000
k = 15, accuracy = 0.931000
k = 15, accuracy = 0.953000
k = 15, accuracy = 0.920000
k = 15, accuracy = 0.926000
k = 15, accuracy = 0.935000
k = 18, accuracy = 0.933000
k = 18, accuracy = 0.953000
k = 18, accuracy = 0.919000
k = 18, accuracy = 0.926000
k = 18, accuracy = 0.933000
k = 22, accuracy = 0.926000
k = 22, accuracy = 0.946000
k = 22, accuracy = 0.915000
k = 22, accuracy = 0.917000
k = 22, accuracy = 0.928000
k = 26, accuracy = 0.923000
k = 26, accuracy = 0.942000
k = 26, accuracy = 0.908000
k = 26, accuracy = 0.910000
k = 26, accuracy = 0.927000
k = 30, accuracy = 0.918000
k = 30, accuracy = 0.938000
k = 30, accuracy = 0.904000
k = 30, accuracy = 0.906000
k = 30, accuracy = 0.920000

In [5]:

# Visualize the results above.
################################################################################
# TODO: WRITE CODE FOR THE FOLLOWING                                           #
# To better understand the influence of different k values,                    #
# show the above printed results in the scatter plot and then plot the trend   #
# with error bars that correspond to standard deviation.                       #
################################################################################
import matplotlib.pyplot as plt

# Scatter plot of all five fold accuracies for each k
for k in k_cands:
    plt.scatter([k] * 5, k_to_accuracies[k])

# Mean accuracy per k, with error bars of one standard deviation
acc_means = []
acc_stds = []
for k in k_cands:
    acc_means.append(np.mean(k_to_accuracies[k]))
    acc_stds.append(np.std(k_to_accuracies[k]))
plt.errorbar(k_cands, acc_means, yerr=acc_stds)
################################################################################
# END OF YOUR CODE                                                             #
################################################################################
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()

In [6]:

################################################################################
# TODO:                                                                        #
# Based on the cross-validation results above, identify the best value for k   #
# and apply it to best_k below. Then, retrain the classifier using all the     #
# training data, and test it on the test data.                                 #
################################################################################
best_k = 1
################################################################################
# END OF YOUR CODE                                                             #
################################################################################

classifier = KNN()
classifier.train(sub_trainX, sub_trainy)
y_test_pred = classifier.predict(sub_testX, k=best_k)

# Compute and display the accuracy
accuracy = np.sum(y_test_pred == sub_testy) / len(sub_testy)
print('accuracy: %f' % accuracy)

accuracy: 0.942000

Inline question:

In practice, why do we often choose an odd number for k? Give your answer below.

As the cross-validation figure shows, accuracy has noticeable valleys at even values such as k = 2, 4, and 12. Ambiguity arises when a test point's k nearest neighbours split their votes equally between two classes, forcing an arbitrary tie-break. Choosing an odd k avoids such two-way ties (although with more than two classes, as in USPS, ties remain possible).
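The tie problem can be demonstrated with a tiny majority-vote sketch. The neighbour labels below are hypothetical, chosen purely to illustrate the even-k ambiguity, not taken from the USPS run:

```python
import numpy as np

def majority_vote(labels):
    """Return all classes with the maximum vote count.

    A result with more than one entry means the vote is tied and the
    classifier must break the tie arbitrarily.
    """
    vals, counts = np.unique(labels, return_counts=True)
    return vals[counts == counts.max()]

# k = 2: one vote for class 3, one for class 7 -> two-way tie
print(majority_vote(np.array([3, 7])))     # [3 7]

# k = 3: class 7 wins the vote 2-1, no ambiguity
print(majority_vote(np.array([3, 7, 7])))  # [7]
```

With an odd k and two candidate classes the vote count can never split evenly, which is exactly the even-k failure mode visible in the cross-validation plot.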
