CS4185 Multimedia Technologies and Applications
Course Project
1. Objectives
The objectives of this course project are for students to gain some hands-on experience in multimedia programming and to develop an image retrieval application. The project is interesting because we learn how to find a particular object (a football, in our case) in a set of images. You are given an image retrieval program written using C++/OpenCV and are asked to extend it with additional features. The project involves first extracting different features from the input image, and then improving the image/object matching performance by combining the extracted features in different ways.
2. Requirements of the Course Project
This course project can be carried out as an individual or group project. The maximum number of members in each group is 3. However, we expect more work and better results from a group with more people, and the responsibility of each group member should be clearly indicated in the report.
You are given an OpenCV-based demo program. The package includes two image datasets: dataset1 and dataset2. dataset1 contains many images, some of which contain footballs, for the Image Retrieval task. dataset2 contains only football images, for the Object Detection task.
Using the matching methods given in the demo program, you can correctly retrieve only a few of the images that contain the desired object (i.e., a football in this case), or locate only some of the desired objects. You are asked to improve the retrieval performance of this program by adding more feature extractors.
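One simple way to picture "combining extracted features" is a weighted sum of per-feature distances followed by ranking. The sketch below is only an illustration and is not taken from the demo program: the `ImageScore` struct, the functions `combinedDistance` and `topN`, and the weights are all hypothetical, and it assumes the per-feature distances have already been computed and scaled to comparable ranges (smaller means more similar).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: one candidate image and its distance to the query
// image under each feature extractor (smaller means more similar).
struct ImageScore {
    std::string filename;
    std::vector<double> featureDistances;
};

// Weighted sum of per-feature distances. The weights are assumptions that
// would have to be tuned by experiment.
double combinedDistance(const std::vector<double>& distances,
                        const std::vector<double>& weights) {
    assert(distances.size() == weights.size());
    double total = 0.0;
    for (std::size_t i = 0; i < distances.size(); ++i) {
        total += weights[i] * distances[i];
    }
    return total;
}

// Return the filenames of the n candidates with the smallest combined distance.
std::vector<std::string> topN(std::vector<ImageScore> candidates,
                              const std::vector<double>& weights,
                              std::size_t n) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [&weights](const ImageScore& a, const ImageScore& b) {
                         return combinedDistance(a.featureDistances, weights) <
                                combinedDistance(b.featureDistances, weights);
                     });
    std::vector<std::string> result;
    for (std::size_t i = 0; i < candidates.size() && i < n; ++i) {
        result.push_back(candidates[i].filename);
    }
    return result;
}
```

Changing the weights changes the ranking, so comparing precision and recall across a few weight settings is one way to judge which combination of features works best.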
There are two levels of requirements for the project, basic and advanced, to cater for students of different backgrounds and interests. The basic requirements are designed for all students to practice some multimedia programming skills. The advanced requirements are for students who would like to go further and create an application, and are more flexible in terms of what you would like to do. The basic requirements and advanced requirements account for 80% and 25%, respectively, of the grade for this project. The total final mark is capped at 100%.
2.1 Basic Requirements (80%)
Students are required to finish the following two tasks in the basic requirements:
Task 1: Image Retrieval
Find the images containing footballs (i.e., images 990.jpg, 991.jpg, ..., 999.jpg) in dataset1. Each time, you pick one of these football images as the input image to the program. The program will return n images; n is set to 10 by default, but you may change it. As there are a total of 10 football images in dataset1, the final retrieval performance is computed as the average of the 10 retrieval results.
Improvement on the Precision (20%)
The target of this requirement is to achieve an average retrieval precision of 60%. This means that, given an input football image, the program returns some matched images from dataset1, and at least 60% of these returned images contain footballs. 30% precision gets 5% of the marks, 60% precision gets 20% of the marks, etc.
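To make the precision figure concrete, here is a minimal sketch of how it could be computed for a single query. The names `isFootballImage` and `precision` are hypothetical, and the helper simply assumes, as stated in Task 1, that the football images in dataset1 are 990.jpg to 999.jpg. The evaluation code in the demo program may be organized differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption from the task description: the football images in dataset1
// are exactly 990.jpg, 991.jpg, ..., 999.jpg.
bool isFootballImage(const std::string& filename) {
    for (int id = 990; id <= 999; ++id) {
        if (filename == std::to_string(id) + ".jpg") {
            return true;
        }
    }
    return false;
}

// Precision = (returned images that contain a football) / (returned images).
double precision(const std::vector<std::string>& returned) {
    assert(!returned.empty());
    int relevant = 0;
    for (const std::string& name : returned) {
        if (isFootballImage(name)) {
            ++relevant;
        }
    }
    return static_cast<double>(relevant) / static_cast<double>(returned.size());
}
```

For example, if 6 of the 10 returned filenames fall in the 990.jpg to 999.jpg range, this function gives a precision of 0.6 for that query.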
Improvement on the Recall (20%)
The target of this requirement is to be able to retrieve an average of 60% of all the images in dataset1 containing footballs. 30% recall gets 5% of the marks, 60% recall gets 20% of the marks, etc.
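Recall and the averaging over the 10 query images can be sketched in the same spirit. The function names `recall` and `averageScore` are hypothetical, and the list of relevant filenames is assumed to be supplied by the caller. Whether the query image itself counts as a relevant result is not stated here, so check the demo program's evaluation code for that detail.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Recall = (relevant images that were returned) / (all relevant images).
double recall(const std::vector<std::string>& returned,
              const std::vector<std::string>& relevant) {
    assert(!relevant.empty());
    int found = 0;
    for (const std::string& name : relevant) {
        if (std::find(returned.begin(), returned.end(), name) != returned.end()) {
            ++found;
        }
    }
    return static_cast<double>(found) / static_cast<double>(relevant.size());
}

// Average of the per-query scores (one score per football query image).
double averageScore(const std::vector<double>& perQuery) {
    assert(!perQuery.empty());
    double sum = 0.0;
    for (double value : perQuery) {
        sum += value;
    }
    return sum / static_cast<double>(perQuery.size());
}
```

Note that with n = 10 returned images and 10 relevant images, precision and recall for a query are the same number; they diverge once n is changed.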
Task 2: Object Detection
Detect and locate the football in each image in dataset2. Use the given football image (filename: football.png) as input and generate bounding boxes to indicate the locations of the football in the images, as shown in the demo program.
Improvement on Top-10 Detection Accuracy (20%)
Top-10 accuracy means that one of the top 10 detected bounding boxes should correctly match the ground-truth bounding box, as measured by the intersection over union (IoU) metric. IoU is the area of the intersection of two bounding boxes divided by the area of their union. Of the two bounding boxes, one is the ground-truth bounding box provided by us and the other is detected by your program. For each image, if the best IoU among the top 10 returned bounding boxes is more than 0.1, we consider this image a correct detection. The final accuracy is the fraction of images that are considered correct detections. To be exact, if all 10 images are considered correct detections, your algorithm has 100% accuracy. See
http://www.mathworks.com/help/vision/ref/bboxoverlapratio.html for more information.
Note: You should use the same settings to test all the images in dataset2 and report your accuracy. Evaluation code has already been included in the demo program. 40% accuracy gets 5% of the marks, 70% accuracy gets 20% of the marks, etc.
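The IoU and Top-10 accuracy definitions above can be sketched as follows. This is an illustration only and not the evaluation code shipped with the demo program: the `Box` struct and the functions `iou`, `bestTop10Iou`, and `top10Accuracy` are hypothetical names, boxes are assumed to be axis-aligned, and the detections for each image are assumed to be sorted best first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box: top-left corner (x, y), width and height.
struct Box {
    double x;
    double y;
    double width;
    double height;
};

// IoU = area of intersection / area of union.
double iou(const Box& a, const Box& b) {
    const double left = std::max(a.x, b.x);
    const double top = std::max(a.y, b.y);
    const double right = std::min(a.x + a.width, b.x + b.width);
    const double bottom = std::min(a.y + a.height, b.y + b.height);
    const double intersection =
        std::max(0.0, right - left) * std::max(0.0, bottom - top);
    const double unionArea =
        a.width * a.height + b.width * b.height - intersection;
    return unionArea > 0.0 ? intersection / unionArea : 0.0;
}

// Best IoU among the first 10 detections for one image.
double bestTop10Iou(const std::vector<Box>& detections, const Box& groundTruth) {
    double best = 0.0;
    const std::size_t limit = std::min<std::size_t>(detections.size(), 10);
    for (std::size_t i = 0; i < limit; ++i) {
        best = std::max(best, iou(detections[i], groundTruth));
    }
    return best;
}

// Fraction of images whose best top-10 IoU is strictly above the threshold
// (0.1 in the project description).
double top10Accuracy(const std::vector<double>& bestIouPerImage, double threshold) {
    assert(!bestIouPerImage.empty());
    int correct = 0;
    for (double value : bestIouPerImage) {
        if (value > threshold) {
            ++correct;
        }
    }
    return static_cast<double>(correct) / static_cast<double>(bestIouPerImage.size());
}
```

As a worked case, two 10 x 10 boxes offset by 5 pixels in both directions overlap in a 5 x 5 region, so their IoU is 25 / (100 + 100 - 25) = 1/7, which is above the 0.1 threshold.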
Improvement on IoU (20%)
This requirement is to improve the localization accuracy as measured by IoU. The higher the IoU you get, the higher the mark you will receive. The final IoU score is computed by averaging the IoU obtained from each of the images in dataset2. A 10% accuracy improvement gets 5% of the marks, a 20% improvement gets 10% of the marks, and an improvement of 30% or above gets 20% of the marks.
2.2 Advanced Requirements (25%)
Students are expected to extend the program into an application. The extension can be done along two directions: technical improvement and/or UI design. The technical improvement may include speeding up the retrieval time and advancing the retrieval performance with new techniques, such as machine learning methods, high-dimensional data indexing techniques, efficient searching of sub-regions of each image instead of using a sliding window, or a crawler to obtain images from the internet. A UI may include real-time display of the regions of each image being compared and their scores, or allowing users to select different objects to be retrieved from the database.
3. Grading
The coursework component contributes 40% of the final course mark/grade. Attendance contributes 5%. For the remaining 35%, I will select whichever of the following distributions maximizes your coursework mark:
15% for course project, 20% for quiz
17.5% for course project, 17.5% for quiz
20% for course project, 15% for quiz
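The selection among the three distributions is simple arithmetic, sketched below for illustration. The function name `bestProjectQuizContribution` is hypothetical, and the sketch assumes the project and quiz scores are expressed as fractions between 0 and 1; the result is the contribution, out of 35 percentage points, to the final course mark.

```cpp
#include <algorithm>
#include <cassert>

// projectScore and quizScore are assumed to be fractions in [0, 1].
// Returns the best contribution (out of 35 percentage points) over the
// three project/quiz weight distributions listed above.
double bestProjectQuizContribution(double projectScore, double quizScore) {
    assert(projectScore >= 0.0 && projectScore <= 1.0);
    assert(quizScore >= 0.0 && quizScore <= 1.0);
    const double optionA = 15.0 * projectScore + 20.0 * quizScore;
    const double optionB = 17.5 * projectScore + 17.5 * quizScore;
    const double optionC = 20.0 * projectScore + 15.0 * quizScore;
    return std::max(optionA, std::max(optionB, optionC));
}
```

Because the middle distribution is the average of the other two, the maximum is always reached by the first or the third: the quiz-heavy split wins when the quiz score is higher, and the project-heavy split wins when the project score is higher.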
Note that we will use a PC with the following configuration to grade the course projects:
Windows with Visual Studio 2017
OpenCV 2.4.13
Unfortunately, we do not have a Mac to grade the course projects. I understand that SCM students may not have a PC for the course project. I have asked cslab to install the above tools on all the PCs in room MMW2410 in the cslab, so you may use those PCs for your course project if you like.
4. Submission Details
Due date: November 10, 2019
Each group needs to submit the following items on a CD or a USB drive, together with a hard-copy report summarizing the work (see Report below):
Program:
A source subdirectory containing all the source files and the necessary files.
A binary subdirectory containing the executable file of the program and relevant files, including image files or libraries. The executable file should output the retrieved results (e.g., the list of retrieved images, the precision and recall values, and the IoU in the Object Detection task). Note that it is important to make sure that we only need to click on the executable file to run the program. You will need to try the executable file on a different machine before you submit the work. We will not be able to give you marks if we fail to run your executable file.
A readme file with instructions on how to compile and execute the program.
Demo:
A demo video that guides the marker through the main contributions of the work. This video should be captured while you are running the program, so that we can see the inputs and the outputs.
Report:
The purpose of this report is just to indicate the main contributions of the work. We will not be marking the report itself. Instead, the report should show us what has been done so that we may grade the work appropriately. Hence, there is no need to submit a large report. It can be just a few pages providing the following information:
1. A cover that indicates your names and student IDs.
2. A brief description of the final program, including the main modules and the relationships among these modules. The description may be in the form of short paragraphs or a flow diagram.
3. A list of features added to the original program, including the names of the modified modules (in reference to point 2 above), brief explanations, and screen captures of the results.
4. A listing of your entire program output. The demo has been rewritten to output some required information. You should report this information, including the precision and recall for each football image and their average values in the Image Retrieval task, and the IoU of the top 10 detected bounding boxes for each image in the Object Detection task. You may organize these results into several tables if you prefer.
5. Responsibilities of each group member (if applicable), including:
   - The programmer of each added function
   - The author of each major section of the report
   - The person who has done the survey, group coordination, etc.
Note that your submission must contain the above items. Marks may be deducted if any is missing. There is no need to submit the image database.