1. Objectives
The objectives of this project are to give you hands-on experience with multimedia programming and to develop an image retrieval application. The project is interesting because you learn how to find a particular object (a football in our case) in a set of images. You are given an image retrieval program written in C++ with OpenCV and are asked to extend it with additional features. The work involves first extracting different features from the input image and then improving the image/object matching performance by combining the extracted features in different ways.
2. Requirements
You are given an OpenCV-based demo program. The package includes two image datasets (dataset1 and dataset2). dataset1 contains a large number of images, some of which contain footballs, and is used for the Image Retrieval task. dataset2 contains only football images and is used for the Object Detection task.
Using the matching methods provided in the demo program, you can correctly retrieve only a few of the images that contain the desired object (i.e., a football in this case) or locate only some of the desired objects. You are asked to improve the retrieval performance of this program by adding more feature extractors.
2.1 Basic Requirements
Environment: OpenCV 2.4.13, Visual Studio 2017
Task 1: Image Retrieval
Find the images containing footballs (i.e., images 990.jpg, 991.jpg, ..., 999.jpg) in dataset1. Each time, you pick one of these football images as the input image to the program. The program will return n images (n is set to 10 by default, but you may change it). As there are a total of 10 football images in dataset1, the final retrieval performance is computed as the average of the 10 retrieval results.
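The "return n images" step can be sketched as a top-n selection over per-image distances. This is only an illustrative sketch in plain C++: the function name topNMatches and the distances vector (one dissimilarity value per dataset image, smaller meaning more similar) are assumptions, not part of the demo program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of the n most similar images,
// best match first. distances[i] is an assumed dissimilarity between the
// query image and dataset image i (smaller means more similar).
std::vector<std::size_t> topNMatches(const std::vector<double>& distances,
                                     std::size_t n = 10) {
    assert(n > 0);  // asking for zero results is treated as a caller error

    // Start with the identity ordering 0, 1, 2, ...
    std::vector<std::size_t> order(distances.size());
    std::iota(order.begin(), order.end(), std::size_t(0));

    // Never request more results than there are images.
    n = std::min(n, order.size());

    // Only the first n positions need to be sorted.
    std::partial_sort(order.begin(), order.begin() + n, order.end(),
                      [&distances](std::size_t a, std::size_t b) {
                          if (distances[a] != distances[b]) {
                              return distances[a] < distances[b];
                          }
                          return a < b;  // break ties by image index
                      });
    order.resize(n);
    return order;
}
```

If the demo program produces similarity scores (larger is better) rather than distances, the comparison would need to be reversed.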
Improvement on the Precision
The target of this requirement is to achieve an average retrieval precision of 60%. This means that, given an input football image, the program returns some matched images from dataset1, and at least 60% of these returned images contain footballs.
Precision = (number of images that are retrieved AND contain a football) / (number of retrieved images)
Improvement on the Recall
The target of this requirement is to be able to retrieve an average of 60% of all the images in dataset1 containing footballs.
Recall = (number of images that are retrieved AND contain a football) / (number of images in the dataset that contain footballs)
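As a worked illustration of the two definitions, the following sketch computes precision and recall for a single query and then averages them over several queries. The names RetrievalScore, scoreQuery and averageScore are hypothetical, and representing images by integer ids (e.g., 990 for 990.jpg) is an assumption made for the example. Whether the query image itself should count as a retrieved image is not specified above, so the sketch leaves that choice to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct RetrievalScore {
    double precision;
    double recall;
};

// retrieved: ids of the images returned for one query.
// relevant:  ids of all dataset images that contain a football.
RetrievalScore scoreQuery(const std::vector<int>& retrieved,
                          const std::set<int>& relevant) {
    assert(!retrieved.empty() && !relevant.empty());

    // Count each retrieved image once, even if it was returned twice.
    const std::set<int> unique(retrieved.begin(), retrieved.end());

    std::size_t hits = 0;  // retrieved AND containing a football
    for (std::set<int>::const_iterator it = unique.begin(); it != unique.end(); ++it) {
        if (relevant.count(*it) > 0) {
            ++hits;
        }
    }

    RetrievalScore score;
    score.precision = static_cast<double>(hits) / unique.size();
    score.recall = static_cast<double>(hits) / relevant.size();
    return score;
}

// Averages the per-query scores, e.g., over the 10 football queries.
RetrievalScore averageScore(const std::vector<RetrievalScore>& perQuery) {
    assert(!perQuery.empty());
    RetrievalScore mean = {0.0, 0.0};
    for (std::size_t i = 0; i < perQuery.size(); ++i) {
        mean.precision += perQuery[i].precision;
        mean.recall += perQuery[i].recall;
    }
    mean.precision /= perQuery.size();
    mean.recall /= perQuery.size();
    return mean;
}
```

With n = 10 returned images and 10 football images in the dataset, the two denominators coincide, so precision and recall are equal for each query; they diverge as soon as n is changed.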
Task 2: Object Detection
Detect and locate the football in each image in dataset2. Use the given football image (filename: football.png) as input and generate bounding boxes to indicate the locations of the football in the images, as shown in the demo program.
Improvement on Top 10 Detection Accuracy
Top 10 accuracy means that at least one of the top 10 detected bounding boxes must match the ground truth bounding box according to the intersection over union (IoU) metric. IoU is the area of the intersection of two bounding boxes divided by the area of their union. (Of these two bounding boxes, one is the ground truth bounding box provided by us and the other is detected by your program.) For each image, if the best IoU among the top 10 returned bounding boxes is greater than 0.1, we consider the image a correct detection. The final accuracy is the fraction of images that are considered correct detections. To be exact, if all 10 images are considered correct detections, your algorithm has 100% accuracy. See http://www.mathworks.com/help/vision/ref/bboxoverlapratio.html for more information.
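The IoU computation and the top 10 check described above can be sketched in plain C++ as follows. The Box struct (top-left corner plus width and height) and the function names are assumptions made for this example; they are not taken from the demo program, which may use OpenCV's own rectangle type instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box: top-left corner (x, y), width w, height h.
struct Box {
    double x, y, w, h;
};

// Intersection area divided by union area; 0 when the boxes do not overlap.
double iou(const Box& a, const Box& b) {
    assert(a.w >= 0 && a.h >= 0 && b.w >= 0 && b.h >= 0);
    const double overlapW =
        std::max(0.0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const double overlapH =
        std::max(0.0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const double inter = overlapW * overlapH;
    const double uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// Best IoU among the first topK detections, which are assumed to be
// ordered from most to least confident.
double bestIou(const std::vector<Box>& ranked, const Box& truth,
               std::size_t topK = 10) {
    double best = 0.0;
    const std::size_t limit = std::min(topK, ranked.size());
    for (std::size_t i = 0; i < limit; ++i) {
        best = std::max(best, iou(ranked[i], truth));
    }
    return best;
}

// An image counts as a correct detection when the best IoU among the
// top 10 boxes is strictly greater than the threshold (0.1 here).
bool isCorrectDetection(const std::vector<Box>& ranked, const Box& truth,
                        double threshold = 0.1) {
    return bestIou(ranked, truth) > threshold;
}
```

For example, two 2x2 boxes offset by one pixel in each direction overlap in a 1x1 region, giving an IoU of 1 / (4 + 4 - 1) = 1/7, which is above the 0.1 threshold.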
Note: You should use the same settings to test all the images in dataset2 and report your accuracy.
Improvement on IoU
This requirement is to improve the localization accuracy measured by IoU. The higher the IoU you obtain, the higher the mark you will receive. The final IoU score is computed by averaging the IoU obtained for each of the images in dataset2.
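The averaging step can be sketched as below. The function name meanIou is hypothetical, and the sketch assumes one IoU value has already been computed per image; which detected box supplies that value (for instance, the top-ranked one) is not stated here and is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// perImageIou holds one IoU value per image in the dataset.
// Returns the arithmetic mean of those values.
double meanIou(const std::vector<double>& perImageIou) {
    assert(!perImageIou.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < perImageIou.size(); ++i) {
        sum += perImageIou[i];
    }
    return sum / perImageIou.size();
}
```

An image on which the detector misses the football entirely contributes an IoU of 0 and so pulls the average down; it should not simply be left out of the vector.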
2.2 Advanced Requirements
You are expected to extend the program into an application. The extension can be done along two directions: technical improvement and/or UI design. The technical improvement may include reducing the retrieval time and improving the retrieval performance with new techniques (such as machine learning methods, high-dimensional data indexing techniques, efficient searching of sub-regions of each image instead of using a sliding window, or a crawler to obtain images from the internet). A UI may include real-time display of the regions of each image being compared and their scores, or allowing users to select different objects to be retrieved from the database.