CS6476 Project 4: Depth Estimation Using Stereo


The goal of this project is to create stereo depth estimation algorithms, both classical and deep learning based. For classical stereo depth estimation algorithms, you will be using deterministic functions to compare patches and compute a disparity map. For deep learning based algorithms, you will be using a learning method to estimate the disparity map. There will be two parts in this project, the first of which is described in this handout. You will implement functions in part1_*.py to generate random patches, evaluate the similarity of those patches, and then compute the disparity map for several images. The corresponding notebook for this section is part1_simple_stereo.ipynb. Part 2 of this project (including its corresponding handout) will be released separately.

Setup

  1. Install Miniconda. It doesn't matter whether you use Python 2 or 3, because we will create our own Python 3 environment anyway.
  2. Download and extract the part 1 starter code.
  3. Create a conda environment using the command below, substituting your OS (linux, mac, or win). On Windows, open the installed Conda prompt to run it; on macOS and Linux, you can just use a terminal window:

     conda env create -f proj4_env_<OS>.yml

     If you run into issues building the environment, try running conda update --all first.
  4. This will create an environment named cs6476_proj4. Activate it with the Windows command activate cs6476_proj4, or the macOS/Linux command conda activate cs6476_proj4 (or source activate cs6476_proj4).
  5. Install the project package by running pip install -e . inside the repo folder. This may be unnecessary for every project, but it is good practice when setting up a new conda environment that may use pip dependencies.
  6. Run the notebook using jupyter notebook ./proj4_code/part1_simple_stereo.ipynb.
  7. After implementing all functions, ensure that all sanity checks pass by running pytest proj4_unit_tests inside the repo folder.
  8. Complete part 2 (template and instructions to be released separately).

1 Simple stereo by matching patches

Introduction

We know that there is some encoding of depth when images are captured using a stereo rig, much like human eyes. You can try a simple experiment to see the stereo effect in action: look at a scene with only your left eye open, then quickly switch to looking with only your right eye. You should notice a horizontal shift in the perceived image. Can you comment on how this shift differs for different objects when you do the experiment? Is it related to the depth of the objects in some way?

In this section, we will generate a disparity map, which is the map of horizontal shifts estimated at each pixel. We will start working on a simple algorithm, which will then be improved to calculate more accurate disparity maps.

The notebook corresponding to this part is part1_simple_stereo.ipynb.

1.1 Random dot stereogram

It was once believed that in order to perceive depth, one must either match feature points (like SIFT features) between left and right images, or rely upon cues such as shadows.

A random dot stereogram eliminates all other depth cues, and hence proves that a stereo setup is sufficient to get an idea of the depth of the scene. A random dot stereogram is generated by the following steps:

  1. Create the left image with random dots at each pixel (0/1 values).
  2. Create the right image as a copy of the left image.
  3. Select a region in the right image and shift it horizontally.
  4. Add a random pattern in the right image in the empty region created after the shift.

In part1a_random_stereogram.py, you will implement generate_random_stereogram() to generate a random dot stereogram for the given image size.
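As an illustration of these four steps, here is a minimal sketch in PyTorch. The function name matches the handout, but the signature, the shifted region's placement, and the shift amount are assumptions for illustration; follow the starter code's docstring for the real interface.

    import torch

    def generate_random_stereogram(im_size=(51, 51), disparity=5):
        """Sketch: random dot stereogram with a central square region
        shifted horizontally in the right image."""
        H, W = im_size
        left = (torch.rand(H, W) > 0.5).float()  # step 1: random 0/1 dots
        right = left.clone()                     # step 2: copy of the left image
        # step 3: shift a central region left by `disparity` pixels
        r0, r1 = H // 4, 3 * H // 4
        c0, c1 = W // 4, 3 * W // 4
        right[r0:r1, c0 - disparity:c1 - disparity] = left[r0:r1, c0:c1]
        # step 4: refill the hole the shift leaves behind with random dots
        right[r0:r1, c1 - disparity:c1] = (torch.rand(r1 - r0, disparity) > 0.5).float()
        return left, right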

1.2 Similarity measure

To compare patches between left and right images, we will need two kinds of similarity functions:

  1. Sum of squared differences (SSD):

$$\mathrm{SSD}(A,B) = \sum_{i \in [0,H),\; j \in [0,W)} (A_{ij} - B_{ij})^2 \tag{1}$$

  2. Sum of absolute differences (SAD):

$$\mathrm{SAD}(A,B) = \sum_{i \in [0,H),\; j \in [0,W)} |A_{ij} - B_{ij}| \tag{2}$$

where A and B are two patches of height H and width W.

In part1b_similarity_measures.py, you will implement the following:

  • ssd_similarity_measure(): Calculate SSD distance.
  • sad_similarity_measure(): Calculate SAD distance.
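For reference, minimal implementations of Equations (1) and (2) could look like the following, assuming the patches arrive as same-shaped torch tensors (the actual signatures are specified in the starter code):

    import torch

    def ssd_similarity_measure(patch1: torch.Tensor, patch2: torch.Tensor) -> float:
        """Sum of squared differences between two same-sized patches (Eq. 1)."""
        return torch.sum((patch1 - patch2) ** 2).item()

    def sad_similarity_measure(patch1: torch.Tensor, patch2: torch.Tensor) -> float:
        """Sum of absolute differences between two same-sized patches (Eq. 2)."""
        return torch.sum(torch.abs(patch1 - patch2)).item()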

1.3 Disparity maps

We are now ready to write code for a simple algorithm for stereo matching. You will need to follow the steps visualized in Figure 1:

Figure 1: Example of a stereo algorithm.

  1. Pick a patch in the left image (red block), P1.
  2. Place the patch at the same (x,y) coordinates in the right image (red block). Because this is binocular stereo, the match for P1 in the right image can only lie to the left of this position, so we search leftward starting from here. Make sure you understand this point well before proceeding further.
  3. Slide the block of candidates to the left (indicated by the different pink blocks). The search area is restricted by the parameter max_search_bound in the code. The candidates will overlap.
  4. We will pick the candidate patch with the minimum similarity error (green block). The horizontal shift from the red block to the green block in this image is the disparity value for the center of P1 in the left image.

Note: the images have already been rectified, so we only need to search along a single horizontal scan line.

In part1c_disparity_map.py, you will implement calculate_disparity_map() (please read the documentation carefully!) to calculate the disparity value at each pixel by searching a small patch around a pixel from the left image in the right image.
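To make the four steps concrete, here is a minimal sketch of the search loop. It assumes grayscale torch images, leaves a border of half the block size at zero, and reuses the SAD sketch from Section 1.2; the starter code's docstring defines the real signature and edge handling.

    import torch

    def calculate_disparity_map(left, right, block_size=9,
                                sim_fn=sad_similarity_measure,
                                max_search_bound=50):
        """Sketch: brute-force block matching along rectified scan lines."""
        H, W = left.shape
        half = block_size // 2
        disparity = torch.zeros(H, W)
        for y in range(half, H - half):
            for x in range(half, W - half):
                patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
                best_err, best_d = float("inf"), 0
                # slide the candidate window to the LEFT in the right image
                for d in range(min(max_search_bound, x - half) + 1):
                    patch_r = right[y - half:y + half + 1,
                                    x - d - half:x - d + half + 1]
                    err = sim_fn(patch_l, patch_r)
                    if err < best_err:
                        best_err, best_d = err, d
                disparity[y, x] = best_d
        return disparity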

Figure 2: (a) Convex error profile. (b) Non-convex error profile.

1.4 Error profile analysis

Before computing the full disparity map, we will first analyze the similarity error distribution between patches. You will have to find two examples which display a close-to-convex error profile, and a highly non-convex profile, respectively. For reference, we provide the plots we obtained (see Figure 2). Based on your output visualizations and understanding of the process, answer the reflection questions in the report.

1.5 Real life stereo images

You will iterate through pairs of images from the dataset and calculate the disparity map for each pair. The code is already given to you; you just need to compare the disparity maps and answer the reflection questions in the report.

1.6 Smoothing

One issue with the above results is that they aren't very smooth. Pixels next to each other on the same surface can have vastly different disparities, making the results look very noisy and patchy in some areas. Intuitively, neighboring pixels should have a smooth transition in disparity (unless they lie at an object boundary or occlusion).

In this part, we try to improve our results through the use of a smoothing constraint. The smoothing method we use is called semi-global matching (SGM), or semi-global block matching. Previously, we picked the disparity for a pixel based on the minimum matching cost of the block under some metric (SSD or SAD). The basic idea of SGM is to penalize disparity computations that are very different from their pixel-wise neighbors, by adding a penalty term on top of the matching cost term. SGM approximately minimizes a global (over the entire image) energy function:

$$E(D) = \sum_{p}\Big(C(p, D_p) + \sum_{q} P_T(|D_p - D_q|)\Big)$$

Here $C(p, D_p)$ is the matching cost for pixel $p$ with disparity $D_p$, $q$ ranges over the neighbors of $p$, and $P_T(\cdot)$ is a penalty function penalizing the difference in disparities. You can read more about how this method works and is optimized in Semi-Global Matching: Motivation, Developments, and Applications and Stereo Processing by Semi-Global Matching and Mutual Information.
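In the cited papers, the penalty $P_T$ is typically two-level: a small constant for neighboring disparities that differ by exactly one pixel (allowing slanted surfaces) and a larger constant for bigger jumps (preserving depth discontinuities). As a hedged illustration only (the constants below are made up; the project's SGM module defines its own):

    def penalty(d_p: int, d_q: int, P1: float = 5.0, P2: float = 70.0) -> float:
        """Two-level SGM-style penalty on neighboring disparities.
        P1 and P2 here are illustrative values, not the project's settings."""
        diff = abs(d_p - d_q)
        if diff == 0:
            return 0.0
        return P1 if diff == 1 else P2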

You will not need to implement SGM yourself. But to help you understand SGM, you will implement a function which computes the cost volume. You have already written code to compute the disparity map; now you will extend that code to compute the cost volume. Instead of taking the argmin of the similarity error profile, we will store the tensor of error profiles at each pixel location along the third dimension. If we have an input image of dimension (H, W, C) and a max search bound of D, the cost_volume will be a tensor of dimension (H, W, D). The cost volume at pixel (i, j) is the error profile obtained for the patch in the left image centered at (i, j).

In part1c_disparity_map.py, you will implement calculate_cost_volume() to calculate the cost volume.
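Conceptually, calculate_cost_volume() is the previous search loop with the argmin removed. The sketch below extends the hypothetical calculate_disparity_map() sketch from Section 1.3 under the same assumptions; the starter code's docstring defines the real interface.

    import torch

    def calculate_cost_volume(left, right, max_search_bound=50, block_size=9,
                              sim_fn=sad_similarity_measure):
        """Sketch: keep the full error profile per pixel instead of its argmin."""
        H, W = left.shape
        half = block_size // 2
        # unreachable candidates keep a large default cost
        cost_volume = torch.full((H, W, max_search_bound), 255.0)
        for y in range(half, H - half):
            for x in range(half, W - half):
                patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
                for d in range(min(max_search_bound, x - half + 1)):
                    patch_r = right[y - half:y + half + 1,
                                    x - d - half:x - d + half + 1]
                    cost_volume[y, x, d] = sim_fn(patch_l, patch_r)
        return cost_volume

Taking cost_volume.argmin(dim=2) recovers the simple disparity map from the previous part; SGM instead aggregates these costs with the smoothness penalty before selecting a disparity.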

Testing

We have provided a set of tests for you to evaluate your implementation. We have included tests for part 1 inside part1_simple_stereo.ipynb so you can check your progress as you implement each function. When you're done with the entire project, you can run additional tests with pytest proj4_unit_tests inside the root directory of the project, as well as checking against the tests on Gradescope. Your grade on the coding portion of the project will be further evaluated with a set of tests not provided to you.

Project 4 part 2

The goal of this project is to create stereo depth estimation algorithms, both classical and deep learning based. In part 1, you implemented classical stereo depth estimation, using a deterministic function to compare patches and compute a disparity map. In part 2, you will implement deep learning based algorithms to estimate the disparity map. Specifically, you will 1) implement patch generation and the architecture of the MC-CNN model in part2_*.py and work through part2_disparity.ipynb, making sure you pass all the sanity checks for part 2 before starting training; and 2) use part2_mc_cnn.ipynb to go through training and visualize the results of your model.

Setup

You can skip steps 1-5 if you've already started part 1.

  1. Install Miniconda. It doesn't matter whether you use Python 2 or 3, because we will create our own Python 3 environment anyway.
  2. Download and extract the project starter code.
  3. Create a conda environment using the command below, substituting your OS (linux, mac, or win). On Windows, open the installed Conda prompt to run it; on macOS and Linux, you can just use a terminal window:

     conda env create -f proj4_env_<OS>.yml

  4. This will create an environment named cs6476_proj4. Activate it with the Windows command activate cs6476_proj4, or the macOS/Linux command conda activate cs6476_proj4 (or source activate cs6476_proj4).
  5. Install the project package by running pip install -e . inside the repo folder. This may be unnecessary for every project, but it is good practice when setting up a new conda environment that may use pip dependencies.
  6. Section 2:
     • Part 1: Run the notebook using jupyter notebook ./proj4_code/part2_disparity.ipynb.
     • Part 2: Run the notebook part2_mc_cnn.ipynb on Colab, uploading the zipped semiglobalmatching and proj4_code folders as proj4.zip to the Colab runtime.
  7. Once you are done executing the Colab notebook and are satisfied with your visualization, run the final cell, which will generate a .pth model file. Download it and make sure you save that file in your proj4_code folder when submitting, as we will evaluate your model's performance with a hidden Gradescope test.
  8. Generate the zip folder for the code portion of your submission once you've finished the project using python zip_submission.py --gt_username <your_gt_username>.

2 Learning-based stereo matching

In the previous section, you saw how we can use simple concepts like SAD and SSD to compute matching costs between two patches and produce disparity maps. Now let's try something different: instead of using SAD or SSD to measure similarity, we will train a neural network and learn a similarity measure directly from the data.

Introduction

You'll implement what was proposed in [Zbontar & LeCun, 2015] and evaluate how it performs compared to classical cost matching approaches. The paper proposes several network architectures; we will be using the accurate architecture for the Middlebury stereo dataset. This dataset provides a ground truth disparity map for each stereo pair, which means we know exactly where the match is supposed to be on the epipolar line. This allows us to extract many such matches and train the network to identify which pairs of patches should match and which shouldn't. You should definitely read the paper in more detail if you're curious about how it works.

You don't have to worry about the dataset; we provide images in a ready-to-use format (with rectification). In fact, you won't be doing much coding in this part. Rather, you should focus on experimenting and thinking about why the results come out the way they do. Your report carries a lot of weight in this part, so try to be as clear as possible.

Note: The network in Part 2.2.1 can take around 15-30 minutes to train on Colab. We suggest you start early and don't wait until the last minute.

2.1 PyTorch functions on CPU

In this part, we will implement the MCNET network architecture described in the paper (see Figure 1), generate patches for the training process, and calculate disparity with MCNET. The corresponding notebook for this part is part2_disparity.ipynb.

2.1.1 Network architecture

MCNET

We will follow the description of the accurate network for the Middlebury dataset. The inputs to the network are 2 image patches, coming from the left and right images. Each passes through a series of convolution + ReLU layers. The extracted features are then concatenated and passed through additional fully connected + ReLU layers. The output is a single real number between 0 and 1, indicating the similarity between the two input patches [Zbontar & LeCun, 2015]. Since training from scratch would take a really long time to converge, you'll train from our pre-trained network instead. In order to load up the pre-trained network, you must first implement the architecture exactly as described below.

Figure 1: Visualization of network architecture.

For efficiency we will convolve both input images in the same batch (this means that the batch dimension of the network input will be 2 × batch_size). After the convolutional layers, we reshape the features into [batch_size, conv_out], where conv_out is the flattened output size of the convolutional layers. This is then passed through a series of fully connected layers and finally a sigmoid layer to bound the output value to [0, 1].

Here is an example of a network with num_conv_layers = 1 and num_fc_layers = 2:

    conv_layers = nn.Sequential(
        nn.Conv2d(in_channel, num_feature_map, kernel_size=kernel_size,
                  stride=1, padding=(kernel_size // 2)),
        nn.ReLU(),
    )

    fully_connected_layers = nn.Sequential(
        nn.Linear(conv_out, num_hidden_units),
        nn.ReLU(),
        nn.Linear(num_hidden_units, 1),
        nn.Sigmoid(),
    )

    conv_feature_batch = conv_layers(input_batch)
    conv_feature_batch = conv_feature_batch.reshape((batch_size, conv_out))
    output_batch = fully_connected_layers(conv_feature_batch)

In part2a_network.py, you will implement the following network architecture:

• MCNET: Implement the network architecture as described in the paper.
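As a rough illustration of that structure (not the graded specification), an MCNET skeleton might look like the sketch below. Every layer count and size here is a placeholder assumption; match the handout's and starter code's exact spec when implementing.

    import torch.nn as nn

    class MCNET(nn.Module):
        """Sketch: shared conv tower over stacked left/right patches,
        then FC layers on the flattened, concatenated features."""

        def __init__(self, in_channel=1, num_feature_map=112, kernel_size=3,
                     num_conv_layers=5, num_hidden_units=384, num_fc_layers=3,
                     window_size=11):
            super().__init__()
            convs, c = [], in_channel
            for _ in range(num_conv_layers):
                convs += [nn.Conv2d(c, num_feature_map, kernel_size=kernel_size,
                                    stride=1, padding=kernel_size // 2),
                          nn.ReLU()]
                c = num_feature_map
            self.conv_layers = nn.Sequential(*convs)
            # padding preserves spatial size, so each patch flattens to
            # num_feature_map * window_size^2; the factor 2 is the
            # left/right concatenation
            conv_out = 2 * num_feature_map * window_size * window_size
            fcs, f_in = [], conv_out
            for _ in range(num_fc_layers - 1):
                fcs += [nn.Linear(f_in, num_hidden_units), nn.ReLU()]
                f_in = num_hidden_units
            fcs += [nn.Linear(f_in, 1), nn.Sigmoid()]
            self.fully_connected_layers = nn.Sequential(*fcs)

        def forward(self, x):
            # x: (2 * batch_size, C, ws, ws), left and right patches stacked
            feat = self.conv_layers(x)
            batch_size = x.shape[0] // 2
            # how rows pair up here depends on how the batch is stacked;
            # the starter code fixes that convention
            feat = feat.reshape(batch_size, -1)
            return self.fully_connected_layers(feat)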

2.1.2 Patch generation

In part2b_patch.py, you will implement gen_patch() to extract a patch from an image.
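A minimal sketch of such a patch extractor, assuming a (C, H, W) tensor and a center far from the border (the starter code defines the real signature and boundary conventions):

    import torch

    def gen_patch(image: torch.Tensor, x: int, y: int, ws: int = 11) -> torch.Tensor:
        """Sketch: crop a ws x ws patch centered at (x, y) from a (C, H, W)
        image. Assumes the center is far enough from the border; the real
        implementation must handle edge cases per its docstring."""
        half = ws // 2
        return image[:, y - half:y + half + 1, x - half:x + half + 1]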

2.1.3 Disparity map calculation with MCNET

The core logic for calculating disparity with MCNET remains the same, but we will have to do a few things differently. It should take around 1-2 minutes to generate the disparity map if implemented correctly. The steps required are as follows:

  1. We will operate on convolutional features instead of raw pixels. Pass the images through the convolutional block of MCNET to obtain the features.
  2. Pick a patch in the left image features, P1.
  3. Calculate the search space of corresponding patches in the right image features:
     • As before, place the patch at the corresponding location in the right image features, and slide it to obtain a sequence of candidate window patches.
     • Concatenate these patches along the 0th dimension to form a batch of patches.
  4. Compute the similarity values over the entire window using the similarity function provided to you. All the similarity values over the window will be returned as a single (k × 1) tensor.
  5. Pick the patch with the minimum similarity error.

Note: It is important that the similarity calculation happens in parallel over the entire search window; otherwise, the disparity calculation will take a really long time in the subsequent part.

In part2c_disparity.py, you will implement mc_cnn_similarity() and calculate_mccnn_cost_volume() to calculate the disparity value at each pixel using MCNET.

Note: Before proceeding to the next part, ensure that all sanity checks for this part pass by running part2_disparity.ipynb with jupyter notebook and by running pytest proj4_unit_tests.
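Putting steps 2-5 together, the batched search at a single pixel might look like the sketch below; the feature tensors, window geometry, and the similarity callback (played by mc_cnn_similarity() in the project) are stand-ins for the starter code's real interfaces.

    import torch

    def batched_candidate_search(left_feat, right_feat, x, y, ws,
                                 max_search_bound, similarity_fn):
        """Sketch of steps 2-5 at one pixel: build all candidate windows as a
        single batch and score them in one parallel call. `similarity_fn`
        maps two (k, C, ws, ws) batches to a (k, 1) error tensor."""
        half = ws // 2
        k = min(max_search_bound, x - half) + 1
        # slide the window leftward and stack candidates on the 0th dim
        candidates = torch.stack([
            right_feat[:, y - half:y + half + 1, x - d - half:x - d + half + 1]
            for d in range(k)
        ], dim=0)                                      # (k, C, ws, ws)
        lefts = left_feat[:, y - half:y + half + 1,
                          x - half:x + half + 1].unsqueeze(0).expand_as(candidates)
        errors = similarity_fn(lefts, candidates)      # (k, 1), in parallel
        return int(torch.argmin(errors))               # disparity with min error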

2.2 Training and evaluation on Google Colab

In this part, we will train the MCNET architecture and evaluate its overall performance.

Setup

We will be using Google Colab, a cloud-based Jupyter notebook environment. You can choose to run this section locally as well, especially if you have a good GPU, but the assignment is designed to run on Colab with a GPU (the project is doable without a GPU, but a GPU makes the process much faster and frustration-free). These are the steps to follow:

  1. Upload the notebook (part2_mc_cnn.ipynb) to Google Colab.
  2. Zip the semiglobalmatching and proj4_code folders into proj4.zip and upload it to the Colab runtime.
  3. Unzip the uploaded zip using !unzip -qq uploaded_file.zip -d ./
  4. In Colab, make sure you select GPU in the menu (Runtime → Change runtime type → Hardware accelerator).

You will need to follow the instructions in Setup, Compute Requirements, and DataLoader in the notebook to download the necessary data and set up the environment in Google Colab.

2.2.1 Train MCNET

In this part, we will train a neural network that learns to classify a pair of patches as a positive vs. negative match. Your task is to train the best network you can by experimenting with the learning parameters. The following experiments are required for this part:

• Experiment with the learning rate: try using large (> 1) vs. small (< 1e-5) values. Based on your output visualizations, answer the reflection questions in the report.
• Experiment with the window size: in the previous part, we used a window size of 11 as suggested in the paper, meaning that the inputs to the network are patches of size 11 × 11. This corresponds to the block size used when performing stereo matching later on. You can experiment with other window sizes, namely 5 × 5, 9 × 9, and 15 × 15, and compare the performance.
• Tune the training parameters and pick the combination of hyperparameters with the best disparity map visualization. Show the training loss plot in the report and answer the reflection questions there. Typically, models with an average error of around 20 tend to pass the Gradescope tests. A sketch of such a hyperparameter sweep follows this list.
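For instance, a hypothetical sweep over the learning-rate and window-size settings above could look like this, reusing the MCNET sketch from Section 2.1.1 with dummy data; the notebook's real training loop, dataset, loss, and optimizer may differ:

    import torch
    import torch.nn as nn

    # Hypothetical sweep mirroring the experiments above.
    for lr in (10.0, 1e-3, 1e-6):                  # large (>1) vs. small (<1e-5)
        net = MCNET(window_size=11)                # also try 5, 9, and 15
        optimizer = torch.optim.Adam(net.parameters(), lr=lr)
        loss_fn = nn.BCELoss()                     # positive vs. negative match
        patches = torch.randn(2 * 32, 1, 11, 11)   # dummy stacked left/right patches
        labels = torch.randint(0, 2, (32, 1)).float()
        for _ in range(10):                        # a few dummy steps
            optimizer.zero_grad()
            loss = loss_fn(net(patches), labels)
            loss.backward()
            optimizer.step()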

2.2.2 Evaluate stereo matching

In this part, we will again generate the disparity map, but this time from our newly trained matching cost network. We will use calculate_mc_cnn_disparity from part2c_disparity.py for this. Note that all the required functions in Part 2.1 need to be implemented correctly before starting this part.

Hint: You don't have to re-train the network every time you want to evaluate, as long as your saved model is in the Colab file system. Don't forget to change load_path to your best model.

We will then evaluate your trained network as a stereo matching cost using the metrics from the Middlebury stereo leaderboard. For the bicycle image, you should see an improvement when using the trained network vs. SAD cost matching.

• avgerr: average absolute error in pixels (lower is better)
• bad1: percentage of bad pixels whose error is > 1 (lower is better)
• bad2: percentage of bad pixels whose error is > 2 (lower is better)
• bad4: percentage of bad pixels whose error is > 4 (lower is better)
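These metrics are simple to compute from a predicted and a ground-truth disparity map; a minimal sketch, ignoring any invalid-pixel masking the real evaluation may apply:

    import torch

    def stereo_metrics(pred: torch.Tensor, gt: torch.Tensor) -> dict:
        """Sketch of the Middlebury-style metrics listed above."""
        err = torch.abs(pred - gt)
        return {
            "avgerr": err.mean().item(),
            "bad1": 100.0 * (err > 1).float().mean().item(),
            "bad2": 100.0 * (err > 2).float().mean().item(),
            "bad4": 100.0 * (err > 4).float().mean().item(),
        }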

Evaluate stereo matching with SGM

You will use the semi-global matching module from part 1 and calculate_mccnn_cost_volume in part2c_disparity.py to evaluate the disparity maps generated by the SAD method and the MC-CNN model. Based on your outputs, answer the reflection questions in the report.

3 Writeup

For this project (and all other projects), you must do a project report using the template slides provided to you. Do not change the order of the slides or remove any slides, as this will affect the grading process on Gradescope and you will be deducted points. In the report you will describe your algorithm and any decisions you made in writing it a particular way. Then you will show and discuss the results of your algorithm. The template slides provide guidance for what you should include in your report. A good writeup doesn't just show results; it tries to draw conclusions from the experiments. You must convert the slide deck into a PDF for your submission.

Testing

We have provided a set of tests for you to evaluate your implementation. We have included tests inside part1_simple_stereo.ipynb and part2_disparity.ipynb so you can check your progress as you implement each section. Your grade on the coding portion of the project will be further evaluated with a set of tests not provided to you.

Bells & whistles (extra points)

Note that while we're closely following the proposed matching cost network from the paper, we are still skipping several bells-and-whistles post-processing components used in the paper, so the results are still far from perfect. You are free to add any components you think would be useful (they don't have to be from the paper). There is a maximum of 10 pts extra credit for any interesting experiments beyond the outline we have. Be creative, and be clear and concise about the extra things you did in the report. Here are some starting points:

• Data augmentation for training: Rather than cropping patches directly, you can augment the data with slight rotation/affine transformations to make the network more robust to noise and perspective change. Be careful not to transform so much that you lose the precision of the match. To earn extra credit, you must explain the augmentation you do and show the improvement gained by adding it.
• Experiment with training from scratch: Try to get strong performance by training from scratch using datasets like KITTI2015.

If you choose to do anything extra, include your code implementation in proj4_code/extra_credit.py, and add slides after the slides given in the template deck to describe your implementation, results, and analysis. Adding slides in between the report template slides will cause issues with Gradescope, and you will be deducted points. You will not receive full credit for your extra credit implementations if they are not described adequately in your writeup.
