The goal of the assignment is to implement a system for image classification. In other words, this system should tell whether an object of a given class is present in an image. You will perform 5-class ({1: airplanes, 2: birds, 3: ships, 4: horses, 5: cars}) image classification based on the bag-of-words approach [1] using SIFT features. The STL-10 dataset [2] will be used for the task. For each class, the test sub-directories contain 800 images and the training sub-directories contain 500 images. Images are represented as RGB, 96x96 pixels.
Hint
In a real scenario, the public data you use often deviates from your task. You need to figure this out and re-arrange the labels as required, using stl10_input.py as a reference.
Download the dataset from http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz. There are five files: test_X.bin, test_y.bin, train_X.bin, train_y.bin and unlabeled_X.bin. For the project, you will only use the train and test partitions. Download the dataset and familiarize yourself with it by figuring out which images and labels you need for the aforementioned 5 classes. Note that you do not need the fold indices.
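As a rough sketch of the reading and re-labeling step (not the official stl10_input.py), the binary files can be loaded with NumPy and the labels remapped to the 5 classes used here. The STL-10 label ids assumed below (1 airplane, 2 bird, 3 car, 7 horse, 9 ship) should be checked against the dataset documentation and stl10_input.py.

import numpy as np

def read_stl10(images_path, labels_path):
    """Read STL-10 binary images (N x 96 x 96 x 3, uint8) and labels (1..10)."""
    with open(labels_path, 'rb') as f:
        labels = np.fromfile(f, dtype=np.uint8)
    with open(images_path, 'rb') as f:
        raw = np.fromfile(f, dtype=np.uint8)
    # Images are stored channel-first and column-major; reshape and transpose accordingly.
    images = raw.reshape(-1, 3, 96, 96).transpose(0, 3, 2, 1)
    return images, labels

# Assumed STL-10 label ids: 1 airplane, 2 bird, 3 car, 7 horse, 9 ship (verify!).
# Remap to the assignment's labels {1: airplanes, 2: birds, 3: ships, 4: horses, 5: cars}.
STL10_TO_TASK = {1: 1, 2: 2, 9: 3, 7: 4, 3: 5}

def select_five_classes(images, labels):
    """Keep only the five classes of interest and remap their labels."""
    mask = np.isin(labels, list(STL10_TO_TASK))
    remapped = np.array([STL10_TO_TASK[l] for l in labels[mask]])
    return images[mask], remapped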
1.1 Training Phase
Training must be conducted over the training set. Keep in mind that using more samples in training will likely result in better performance. However, if your computational resources are limited and/or your system is slow, it is OK to use fewer training samples to save time.
[1] http://www.robots.ox.ac.uk/~az/icvss08_az_bow.pdf
[2] https://cs.stanford.edu/~acoates/stl10/
1.2 Testing Phase
You have to test your system using the specified subset of test images. All 800 test images should be used at once for testing to observe the full performance. Again, exclude them from training for a fair comparison.
2 Bag-of-Words based Image Classification
A Bag-of-Words based image classification system consists of the following steps:
- Feature extraction and description
- Building a visual vocabulary
- Quantifying features using the visual dictionary (encoding)
- Representing images by frequencies of visual words
- Training the classifier
We will consider each step in detail.
2.1 Feature Extraction and Description
SIFT descriptors can be extracted from either (1) densely sampled regions or (2) key points. You can use the SIFT-related functions in OpenCV for feature extraction.
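A minimal sketch of both options using OpenCV (assuming a build where cv2.SIFT_create is available, e.g. opencv-python >= 4.4); the grid step and patch size for dense sampling are arbitrary choices here, not values prescribed by the assignment:

import cv2
import numpy as np

sift = cv2.SIFT_create()

def sift_from_keypoints(image_rgb):
    """Option (2): descriptors at detected key points."""
    gray = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2GRAY)
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return descriptors  # shape (num_keypoints, 128), or None if nothing is found

def sift_dense(image_rgb, step=8, size=8.0):
    """Option (1): descriptors computed on a regular grid of 'keypoints'."""
    gray = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2GRAY)
    keypoints = [cv2.KeyPoint(float(x), float(y), size)
                 for y in range(step, gray.shape[0] - step, step)
                 for x in range(step, gray.shape[1] - step, step)]
    keypoints, descriptors = sift.compute(gray, keypoints)
    return descriptors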
2.2 Building Visual Vocabulary
Here, we will obtain visual words by clustering feature descriptors, so that each cluster center is a visual word, as shown in Figure 1. Take a subset (at most half) of all training images (this subset should contain images from ALL categories), extract SIFT descriptors from all of these images, and run k-means clustering (you can use your favourite k-means implementation) on these SIFT descriptors to build the visual vocabulary. The remaining training images, which are not used for building the dictionary, will later be encoded with it and used to train the classifiers (see Section 2.5). If your computational resources are limited, you can also use fewer images, say 100 from each class (exclusive from the previous subset). The pre-defined number of clusters will be the size of your vocabulary. Set it to different sizes (500, 1000 and 2000).
Figure 1: An illustration of learning the visual dictionary. Note: (1) Code-words is another term for visual words. (2) The figure is from Josef Sivic; the SIFT feature space is used.
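One possible sketch of the clustering step, using scikit-learn's MiniBatchKMeans (any k-means implementation is acceptable per the text above); the vocabulary size of 500 is one of the required settings:

import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptor_list, vocab_size=500):
    """Stack SIFT descriptors from the dictionary subset and cluster them.

    descriptor_list: list of (n_i, 128) arrays, one per training image.
    Returns the fitted k-means model; its cluster centers are the visual words.
    """
    all_descriptors = np.vstack([d for d in descriptor_list if d is not None])
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans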
2.3 Encoding Features Using Visual Vocabulary
Once we have a visual vocabulary, we can represent each image as a collection of visual words. For this purpose, we need to extract feature descriptors (SIFT) and then assign each descriptor to the closest visual word from the vocabulary.
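Assigning each descriptor to its closest visual word can be done with the fitted k-means model from the sketch above, or with an explicit nearest-neighbour search over the cluster centers, as in this small sketch:

import numpy as np
from scipy.spatial.distance import cdist

def encode_descriptors(descriptors, vocabulary):
    """Map each SIFT descriptor to the index of its closest visual word.

    descriptors: (n, 128) array; vocabulary: (vocab_size, 128) cluster centers.
    Returns an (n,) array of visual-word indices.
    """
    distances = cdist(descriptors, vocabulary)   # pairwise Euclidean distances
    return np.argmin(distances, axis=1)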
2.4 Representing Images by Frequencies of Visual Words
The next step is quantization. The idea is to represent each image by a histogram of its visual words; see Figure 2 for an overview. Check out matplotlib's hist function. Since different images can have different numbers of features, the histograms should be normalized.
Figure 2: Schematic representation of Bag-Of-Words system.
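A small sketch of the histogram representation (NumPy's bincount is used here instead of matplotlib's hist, which remains useful for plotting the result); L1 normalization is one reasonable choice:

import numpy as np

def bow_histogram(word_indices, vocab_size=500):
    """Build a normalized bag-of-words histogram for one image.

    word_indices: visual-word index of every descriptor in the image.
    """
    counts = np.bincount(word_indices, minlength=vocab_size).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts  # L1-normalize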
2.5 Classification
We will train one binary classifier per object class; here we take the Support Vector Machine (SVM) as an example, so we will have 5 binary classifiers. Take images from the training set of the related class (these should be the ones you did not use for dictionary calculation). Represent them with histograms of visual words as discussed in the previous section. Use at least 50 training images per class, or more, but remember to debug your code first! If you use the default setting, you should have 50 histograms of size 500. These will be your positive examples. Then, obtain histograms of visual words for images from the other classes, again about 50 images per class, as negative examples; this gives you 200 negative examples. Now you are ready to train a classifier. Repeat this for each class. To classify a new image, calculate its visual-word histogram as described in Section 2.4 and use the trained SVM classifiers to assign it to the most probable object class. (Note that for proper SVM scores you would need cross-validation to get a proper estimate of the SVM parameters; in this assignment, you do not have to experiment with this cross-validation step.)
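A sketch of the one-vs-rest training and scoring using scikit-learn's SVC (any SVM implementation would do; the linear kernel is an arbitrary default here, not a value prescribed by the assignment):

import numpy as np
from sklearn.svm import SVC

def train_binary_svms(histograms, labels, classes=(1, 2, 3, 4, 5)):
    """Train one binary SVM per class on bag-of-words histograms.

    histograms: (n_images, vocab_size) array; labels: (n_images,) class ids.
    Returns a dict mapping class id -> fitted classifier.
    """
    classifiers = {}
    for c in classes:
        binary_targets = (labels == c).astype(int)   # 1 = positive class, 0 = rest
        clf = SVC(kernel='linear')                   # simple default choice
        clf.fit(histograms, binary_targets)
        classifiers[c] = clf
    return classifiers

def classify(histogram, classifiers):
    """Assign a new image to the class whose SVM gives the highest score."""
    scores = {c: clf.decision_function(histogram.reshape(1, -1))[0]
              for c, clf in classifiers.items()}
    return max(scores, key=scores.get)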
2.6 Evaluation
To evaluate your system, you should take all the test images from all classes and rank them based on each binary classifier. In other words, you should classify each test image with each classifier and then sort the images by classification score. As a result, you will have five ranked lists of test images. Ideally, images with airplanes would be at the top of the list produced by your airplane classifier, images with cars at the top of the list produced by your car classifier, and so on.
In addition to the qualitative analysis, you should measure the performance of the system quantitatively with the Mean Average Precision over all classes. The Average Precision for a single class c is defined as

AP(c) = \frac{1}{m_c} \sum_{i=1}^{n} \frac{f_c(x_i)}{i},   (1)

where n is the number of images (n = 50 × 5 = 250), m_c is the number of images of class c (m_c = 50), x_i is the i-th image in the ranked list X = {x_1, x_2, ..., x_n}, and finally, f_c is a function which returns the number of images of class c in the first i images if x_i is of class c, and 0 otherwise. To illustrate, if we want to retrieve R and we get the sequence [R, R, T, R, T, T, R, T], then n = 8, m_c = 4, and AP = (1/4)(1/1 + 2/2 + 3/4 + 4/7) ≈ 0.83.
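A sketch of ranking the test images with each binary classifier and computing Average Precision exactly as in Eq. (1); the variable names are illustrative and reuse the classifiers from the previous sketch:

import numpy as np

def average_precision(ranked_labels, c):
    """Average Precision for class c, given labels sorted by classifier score (Eq. 1)."""
    ranked_labels = np.asarray(ranked_labels)
    m_c = np.sum(ranked_labels == c)
    hits = 0
    ap = 0.0
    for i, label in enumerate(ranked_labels, start=1):
        if label == c:
            hits += 1
            ap += hits / i      # f_c(x_i) / i, non-zero only when x_i is of class c
    return ap / m_c

def mean_average_precision(classifiers, test_histograms, test_labels):
    """Rank all test images per binary classifier and average the per-class APs."""
    test_labels = np.asarray(test_labels)
    aps = []
    for c, clf in classifiers.items():
        scores = clf.decision_function(test_histograms)
        order = np.argsort(-scores)                 # highest score first
        aps.append(average_precision(test_labels[order], c))
    return float(np.mean(aps))

On the example sequence [R, R, T, R, T, T, R, T], average_precision returns (1/4)(1/1 + 2/2 + 3/4 + 4/7) ≈ 0.83, matching Eq. (1).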