Figure 1: You will design a camera obscura with your cellphone. (a) Camera obscura (pinhole, screen, cellphone camera). (b) Example design.
In this assignment, you will design a camera obscura with your cellphone camera. The camera obscura is a dark chamber (box) with a small pinhole through which light is projected onto the opposite side of the chamber (the screen), creating an upside-down image.
1. Build a lightproof dark chamber: the chamber should be illuminated only by the light entering through the pinhole. Experiment with different chamber sizes if possible. Cover the screen with white paper and the remaining inside surfaces with black paper. Be creative!
2. Make a pinhole on the side opposite the screen. Start with a small hole (diameter < 1 mm) and adjust its size to admit more light. The trade-off is that a bigger hole gives a brighter but blurrier image.
3. Set the camera focus to manual mode (AF→MF) and adjust the focus so that an object at the distance between the pinhole and the chamber screen appears sharp.
4. Set the camera sensitivity (>ISO 800).
5. Set the camera exposure time (>8 sec).
6. Make an additional hole through which your cellphone camera can look inside. Place your cellphone camera close to the pinhole without occluding it. Make sure this hole is completely light-sealed.
7. Take a picture and adjust the pinhole size and camera settings to obtain a sharper image.
For an Android phone, you can control the exposure time and sensitivity easily; Camera FV-5 Lite is an app that can capture a long-exposure image. For iOS, exposure control is highly limited; there are apps that simulate the long-exposure effect by taking many images, which produces a noisy image. You may instead borrow an Android phone or an old digital camera. You can also refer to the website: https://graphics.cs.cmu.edu/courses/15-463/2015_fall/hw/proj5-camera/.
Note: Lighting is extremely important. Given Minnesota's weather, it is difficult to find a nice cloudless day, so plan your outdoor data capture ahead for a sunny day.
(1) Describe your design (dimensions) with images and your camera settings. Share your awesome photos.
3 Where am I?
Figure 2: You will use your camera obscura to estimate depth. (a) Lateral view. (b) Camera obscura image. (c) Original image.
Using this camera obscura, you will estimate the depth of a 3D object (its distance from the pinhole). Take a picture containing two friends (A and B) whose heights in meters you know, standing at different distances from the camera as shown in Figure 2(a).
(1) Given the height of A in meter (HA) and pixel (hA), derive and compute the distance
from A to the pinhole.
(2) Given the height of B in meter (HB) and pixel (hB), derive and compute the distance
from B to the pinhole.
Note: You can measure pixel distance using an image viewer software, e.g., irfanview,
or MATLAB.
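For (1) and (2), a minimal MATLAB sketch of the similar-triangles relation; the variable names and values are illustrative assumptions, not prescribed, and s (meters per pixel on the imaged screen) must be measured from your own setup:

% Similar triangles: H_A / Z_A = (h_A * s) / f  =>  Z_A = f * H_A / (h_A * s)
f   = 0.30;      % pinhole-to-screen distance in meters (example value)
s   = 2.0e-4;    % meters per pixel in the image of the screen (example value)
H_A = 1.75;      % friend A's height in meters
h_A = 120;       % friend A's height in pixels
Z_A = f * H_A / (h_A * s);   % distance from A to the pinhole, in meters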
Figure 3: You will use your camera obscura to simulate the dolly zoom effect. (a) Geometry. (b) Image cropping.
4 Dolly Zoom
You will simulate the dolly zoom effect using your camera obscura. Take at least two pictures from different camera locations, e.g., stepping back by ∆d (say, 5 steps), as shown in Figure 3(a). A and B will appear smaller than in the first image since ∆d > 0. You can apply the zoom-in effect by scaling and cropping the image such that A appears the same size, as shown in Figure 3(b). You may find a reference here: https://www-users.cs.umn.edu/~hspark/CSci5980/Lec1_Supp_DollyZoom.pdf.
Write-up:
(1) Predict the height of B in the second image given hB in the first image. Reason
about the prediction. Hint: You may need to compute ∆d with the information in the
second image.
(2) Confirm the prediction by measuring the height of B in the second image.
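One hedged way to form the prediction in (1), assuming f_px denotes the pinhole-to-screen distance expressed in pixels (f/s in the sketch above) and h_A1, h_A2, h_B1 are measured pixel heights (all names are hypothetical):

% Recover the camera displacement from A's apparent size change, then predict B.
delta_d = f_px * H_A * (1/h_A2 - 1/h_A1);   % Delta d from A's two pixel heights
z_B     = f_px * H_B / h_B1;                % B's depth, as in Section 3
h_B2    = f_px * H_B / (z_B + delta_d);     % predicted pixel height of B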
Useful MATLAB functions:
(a) Load image: im = imread(filename).
(b) Save image: imwrite(im, filename).
(c) Resize image: im = imresize(im, scale).
(d) Display image: imshow(im) or imshow(filename).
(e) Measure the distance in image: display the image in a figure, select the data cursor
icon in the figure toolbar, and click on the image to get the coordinates of certain pixels.
Calculate the distance with the pixel coordinates.
Figure 1: Lens distortion rectification. Original image, undistorted image, and examples with k > 0 and k < 0.
Using your cellphone camera, you will estimate intrinsic camera parameters (focal
length, principal points, and lens distortion parameter).
(1) Derive the K matrix using your camera specifications.
(2) Calibrate lens distortion using the radial distortion model (single parameter, k1).
Show the original image, undistorted image, and two image examples of wrong k1 as
shown in Figure 1.
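A minimal MATLAB sketch of single-parameter radial undistortion by inverse warping; the interface and variable names are assumptions, not a required API:

% For each pixel of the undistorted image, apply the forward distortion model
% r_d = r_u * (1 + k1 * r_u^2) in normalized coordinates and sample the
% original (distorted) image there.
[H, W, C] = size(im);
[uu, vv] = meshgrid(1:W, 1:H);
xy = K \ [uu(:)'; vv(:)'; ones(1, numel(uu))];     % normalized coordinates
r2 = xy(1,:).^2 + xy(2,:).^2;
xy_d = [xy(1,:).*(1 + k1*r2); xy(2,:).*(1 + k1*r2); ones(1, numel(uu))];
uv_d = K * xy_d;                                    % back to pixel coordinates
im_undist = zeros(size(im), 'like', im);
for c = 1:C
    im_undist(:,:,c) = interp2(double(im(:,:,c)), ...
        reshape(uv_d(1,:), H, W), reshape(uv_d(2,:), H, W));
end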
3 Projective Line
Figure 2: (a) You will use your cellphone camera image to compute camera poses with respect to the ground plane and measure the height of your friend given the height of an object. (b) You will paste the UMN logo onto the ground plane.
Take a picture of your friend together with several 3D objects, such as street lamps and chairs, where two orthogonal directions on the ground plane are visible, as shown in Figure 2(a). Apply lens undistortion to make the straight lines straight.
(1) Derive and compute two vanishing points and a vanishing line, and visualize them
on your image similar to Figure 2(a).
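For (1), a minimal sketch using homogeneous coordinates (p1..p4 are assumed to be hand-clicked endpoints of two image lines that are parallel on the ground plane):

% Two image lines through parallel scene edges meet at a vanishing point.
l1 = cross([p1 1]', [p2 1]');            % line through the first pair of points
l2 = cross([p3 1]', [p4 1]');            % line through the second pair of points
v1 = cross(l1, l2);  v1 = v1 / v1(3);    % first vanishing point
% repeat with the other ground-plane direction to get v2, then:
l_inf = cross(v1, v2);                   % vanishing line of the ground plane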
(2) Compute the camera rotation matrix, R, and visualize the 3D camera axes with respect to the ground plane axes using the MATLAB plot3 function. Give a geometric interpretation of the computed rotation matrix.
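One common construction for (2), sketched under the assumption that v1 and v2 are the homogeneous vanishing points of two orthogonal ground-plane directions:

r1 = K \ v1;  r1 = r1 / norm(r1);     % ground-plane X direction in the camera frame
r2 = K \ v2;  r2 = r2 / norm(r2);     % ground-plane Y direction in the camera frame
r3 = cross(r1, r2);                   % ground-plane normal
R  = [r1 r2 r3];                      % ground axes expressed in camera coordinates
[U, ~, V] = svd(R);  R = U * V';      % project to the nearest orthogonal matrix
% (flip the sign of a column if needed so that det(R) = 1)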
(3) Measure the heights of at least three 3D objects given your friend's height using the cross ratio. Verify the height measurements.
(4) Project the UMN logo onto the ground plane or any planar surface. You are free to choose a different logo or image.
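A minimal sketch of one way to paste the logo for (4), assuming corners_img holds four clicked ground-plane points ordered to match the logo corners, and using the Image Processing Toolbox (fitgeotrans/imwarp); the file name and variable names are hypothetical:

logo = imread('umn_logo.png');                         % hypothetical file name
[hL, wL, ~] = size(logo);
corners_logo = [1 1; wL 1; wL hL; 1 hL];
tform = fitgeotrans(corners_logo, corners_img, 'projective');
warped = imwarp(logo, tform, 'OutputView', imref2d([size(im,1) size(im,2)]));
mask = repmat(any(warped > 0, 3), [1 1 3]);            % where the logo landed
composite = im;  composite(mask) = warped(mask);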
4 Panoramic Image
Figure 3: Given a collection of input images, you will create a panoramic image by projecting onto a cylindrical surface. (a) Input images. (b) Geometry (camera center, cylindrical surface, direction p as a function of φ and h). (c) Cylindrical coordinates (φ, h). (d) Panoramic image.
You will create a panoramic image from multiple images (at least 8) taken by your cellphone camera using a cylindrical projection, as shown in Figure 3(a). The panoramic image will be created in (φ, h), where φ and h are the angle and height coordinates of the cylindrical surface, respectively, as shown in Figure 3(c). Note that the radius and height of the cylinder are set to the focal length and the height of the image, respectively.
(1) Express the direction vector p_{φ,h} = [p_x  p_y  p_z]^T using φ and h, as shown in Figures 3(b) and 3(c).
(2) Given the first and second images, compute the homography ^2H_1 using 4 correspondences, and the relative rotation from the first to the second image, ^2R_1, where the first image rotation is the identity matrix I_3:

λu_2 = ^2H_1 u_1,    µu_2 = K ^2R_1 p_{φ,h}    (1)

where u_1 ↔ u_2 are corresponding points in the first and second images.
(3) For all images, compute the rotation matrix ^iR_1 and visualize the camera Z axis in 3D using the MATLAB plot3 function. Hint: ^iR_1 = ^iR_{i-1} ^{i-1}R_{i-2} ... ^2R_1.
(4) Create a panoramic image by copying the RGB value at the original image coordinate (u, v) to the cylindrical coordinate (φ, h), as shown in Figure 3(d). You may blend overlapping pixels by taking their average.
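A minimal sketch of the warp for one image, assuming f = K(1,1) is the focal length in pixels, R_i1 holds ^iR_1 (the rotation from the reference frame to image i), and nPhi, phi_min, dPhi define the panorama's angular sampling (all hypothetical names):

[H, W, ~] = size(im);
f = K(1,1);                              % cylinder radius = focal length
pano = zeros(H, nPhi, 3);                % panorama canvas
for cc = 1:nPhi
    phi = phi_min + (cc-1) * dPhi;       % angle of this panorama column
    for rr = 1:H
        h = rr - H/2;                    % height on the cylinder
        p = [f*sin(phi); h; f*cos(phi)]; % direction p_{phi,h} in the reference frame
        uvw = K * R_i1 * p;              % project into image i
        u = uvw(1)/uvw(3);  v = uvw(2)/uvw(3);
        if uvw(3) > 0 && u >= 1 && u <= W && v >= 1 && v <= H
            pano(rr, cc, :) = im(round(v), round(u), :);   % copy the RGB value
        end
    end
end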
1. Rotate your camera about a fixed rotation center (no translation). Translation of the camera produces misalignment.
2. Choose the 4 correspondences very carefully.
3. Lens distortion may introduce misalignment.
4. Objects at a far distance often work better.
Figure 1: (a) You will use the vanishing points in a single image to calibrate your camera. (b) You will use multiple images of checkerboard patterns to calibrate your camera.
You will calibrate your camera (focal length, principal points, and lens distortion parameters).
(1) (Single Image Calibration) Take a single image in which you can estimate three vanishing points v_∞^X, v_∞^Y, and v_∞^Z, as shown in Figure 3(c). Compute f, p_x, and p_y using the vanishing points.
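For (1), a sketch of the standard construction assuming zero skew, unit aspect ratio, and v1, v2, v3 given as 3x1 homogeneous vanishing points of mutually orthogonal directions:

% With K = [f 0 px; 0 f py; 0 0 1], orthogonality gives (vi - p)'*(vj - p) = -f^2
% for i ~= j, where p = [px; py]. Hence p is the orthocenter of the triangle
% (v1, v2, v3), and f follows from any one of the constraints.
A = [ (v1(1:2) - v2(1:2))';
      (v2(1:2) - v3(1:2))' ];
b = [ (v1(1:2) - v2(1:2))' * v3(1:2);
      (v2(1:2) - v3(1:2))' * v1(1:2) ];
p = A \ b;                                    % principal point [px; py]
f = sqrt(-(v1(1:2) - p)' * (v2(1:2) - p));    % focal length in pixels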
(2) (Multiple Image Calibration) Print out the checkerboard pattern shown in Figure 1(b) (https://www-users.cs.umn.edu/~hspark/CSci5980/code/pattern.pdf). Derive f, p_x, and p_y based on homographies from multiple images (at least 5 images). Compute the intrinsic parameters by solving the least-squares system.
(3) (MATLAB Toolbox Calibration) Use the existing MATLAB Calibration Toolbox
(https://www.vision.caltech.edu/bouguetj/calib_doc/) to compute the intrinsic
parameters (at least 20 images from different views). Verify the estimate with your
solutions in (1) and (2).
3 Rotation Interpolation
Figure 2: Using two images of pure rotation (a and b), you will generate an interpolated view by sampling rotations (c). (a) Left image. (b) Right image. (c) Interpolated view.
In HW 2, we generated a panoramic image from multiple images. In this problem, you will use two images of pure rotation (Figures 2(a) and 2(b)) to generate many views by interpolating between rotations, as shown in Figure 2(c).
(1) Compute the pure rotation, ^RR_L, from the homography between the two images, ^RH_L.
(2) Interpolate between the two rotations, {^iR_L}_{i=1}^{N}, from I_3 (left image) to ^RR_L (right image), where N > 20 is the number of interpolated rotations, using SLERP.
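A minimal SLERP sketch for (2), assuming R_RL is the rotation from (1) and using rotm2quat/quat2rotm (Robotics System Toolbox; any quaternion conversion can be substituted). The degenerate case theta ≈ 0 is not handled:

q0 = [1 0 0 0];                          % quaternion of the identity rotation
q1 = rotm2quat(R_RL);                    % quaternion of the right-image rotation
d  = q0 * q1';                           % dot product between the quaternions
theta = acos(min(1, abs(d)));
Rs = cell(1, N);
for i = 1:N
    t  = (i-1) / (N-1);                  % t = 0 at the left image, 1 at the right
    qi = (sin((1-t)*theta)*q0 + sin(t*theta)*sign(d)*q1) / sin(theta);
    Rs{i} = quat2rotm(qi / norm(qi));    % interpolated rotation iR_L
end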
(3) Generate interpolated images similar to Figure 2(c).
• Given ^iR_L, compute the homography from the left image to the interpolated image, ^iH_L, and warp it: im1 = ImageWarping(imageLeft, iHL).
• Compute the homography from the right image to the interpolated image, i.e., ^iH_R = ^iH_L (^RH_L)^{-1}. Warp the image: im2 = ImageWarping(imageRight, iHR).
• Composite the two images based on distance: imInterpolated = w*im1 + (1-w)*im2, where the weight w is inversely proportional to the distance.
4 Tour into Your Picture
Figure 3: (a) Given a single image taken by a cellphone (source image), you will navigate into the image. (b) You rectify the image such that the Y axis of the camera aligns with the surface normal of the ground plane. (c) You define the five principal planes using vanishing points. (d) A 3D box can be texture mapped (3D reconstruction). (e) Tour into the picture.
Take a photo with your camera of a scene in which you can observe the Z vanishing point. You will virtually tour into your photo, creating a 3D sensation.
(1) Rectification: Rectify your image such that the surface normal of the ground plane is aligned with the Y axis of your camera, using the homography between the ground plane and the camera. Derive and compute the homography for the rectification. Visualize the rectified image similar to Figure 3(b).
(2) Z vanishing point: Given the rectified image, derive and compute the Z vanishing point, p (Figure 3(c)), using the four edges from the ground and ceiling planes and linear least squares (the intersection of four lines), i.e., Ax = b.
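A minimal sketch of the linear least-squares intersection, assuming each of the four lines k is given by two clicked endpoints pts_a(k,:) and pts_b(k,:) in pixel coordinates (hypothetical names):

% Each line contributes one equation  n_k' * p = n_k' * a_k  (n_k is its normal);
% stacking four of them gives A p = b, solved in the least-squares sense.
A = zeros(4, 2);  bvec = zeros(4, 1);
for k = 1:4
    d = pts_b(k,:) - pts_a(k,:);         % line direction
    n = [-d(2), d(1)];                   % line normal
    A(k,:)  = n;
    bvec(k) = n * pts_a(k,:)';
end
p = A \ bvec;                            % least-squares Z vanishing point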
1. Define the four corners (u11, u12, u21, u22) of the purple plane (Figure 3(c)) in the image.
2. Define the depth of the four corners, e.g., d = 1, and compute the 3D locations of the corners, U = dK^{-1}u.
3. Express the 3D locations of the frontal plane, V11, V12, V21, and V22, using d, p, K, u11, u12, u21, u22, v11, v12, v21, v22.
4. Express a 3D point, X, on the 3D plane (U11, U12, U21, and U22) using two parameters, e.g., X = a1µ1 + a2µ2 + a3. Apply this to the remaining four planes.
1. Derive and compute the homography from each plane to the rectified image, i.e., ^sH_plane1, ^sH_plane2, ^sH_plane3, ^sH_plane4, ^sH_plane5.
Hint: λu = KX = K [a_1  a_2  a_3] [µ_1; µ_2; 1] = ^sH_plane [µ_1; µ_2; 1].
2. Derive and compute the homography from each plane to a target image of pure rotation about the Y axis of the camera by 20 degrees (^tH_plane1, ^tH_plane2, ^tH_plane3, ^tH_plane4, ^tH_plane5). Derive and compute the homography from the rectified image to the target image. Hint: ^tH_s = ^tH_plane (^sH_plane)^{-1}.
3. Derive and compute the homography from each plane to a target image of pure translation along the Z axis of the camera by 0.2d. Derive and compute the homography from the rectified image to the target image.
(4) Generate a camera trajectory by interpolating rotation and translation (at least 5 discrete rotations and translations) and generate a video of the navigation.
One of the key skills to learn in computer vision is the ability to use other open-source code, which allows you not to re-invent the wheel. We will use VLFeat by A. Vedaldi and B. Fulkerson (2008) for SIFT extraction from your images. Install VLFeat from the following:
https://www.vlfeat.org/install-matlab.html
Run vl_demo_sift_basic to double check that the installation is complete.
(NOTE) You will use this library only for SIFT feature extraction and its visualization. All following visualizations and algorithms must be done by your own code.
Figure 1: Given your two cellphone images (left and right), you will extract SIFT
descriptors and visualize them using VLFeat.
You will extract David Lowe's SIFT (Scale Invariant Feature Transform) features from your cellphone images as shown in Figure 1. First, take a pair of pictures with your calibrated camera (the intrinsic parameter K and radial distortion parameter k are pre-calibrated) as follows:
1. Take the first picture and another one after moving one step to the right (1 m).
2. Common 3D objects across different depths, e.g., buildings, the ground plane, and trees, should appear in both images.
3. The two images have to be similar enough that appearance-based image matching can be applied, i.e., the SIFT features do not break, while maintaining the baseline between cameras (at least 1 m), i.e., similar camera orientation and sufficient translation.
4. Avoid a 3D scene dominated by one planar surface, e.g., looking at the ground plane.
(SIFT visualization) Use VLFeat to visualize SIFT features with scale and orientation, as shown in Figure 1. You may want to plot up to 500 feature points. You may want to follow this tutorial:
https://www.vlfeat.org/overview/sift.html
Figure 2: You will match points between I1 and I2 using SIFT features. (a) Matching from I1 to I2. (b) Matching from I2 to I1. (c) Matching from I1 to I2 after ratio test. (d) Matching from I2 to I1 after ratio test. (e) Bidirectional matching between I1 and I2.
(NOTE) From this point, you cannot use any function provided by VLFeat.
A SIFT feature is composed of a scale, an orientation, and a 128-dimensional local feature descriptor (integer), f ∈ Z^128. You will use the SIFT features to match between two images, I1 and I2.
(1) (Nearest neighbor search) Let the two sets of features be {f_1, ..., f_{N1}} from I1 and {g_1, ..., g_{N2}} from I2, where N1 and N2 are the numbers of features in images 1 and 2, respectively. Compute the nearest neighbor per feature and visualize both directions ({f_1, ..., f_{N1}} → {g_1, ..., g_{N2}} and {g_1, ..., g_{N2}} → {f_1, ..., f_{N1}}) as shown in Figures 2(a) and 2(b). Note that the distance between two features is defined as d = ||f − g||. You may use the knnsearch function in MATLAB (a combined sketch follows item (3)).
(2) (Ratio test) Filter out matches using the ratio test, i.e., keep a match if d_{i,j1}/d_{i,j2} < 0.7 and discard it otherwise, where d_{i,j1} and d_{i,j2} are the distances to the first and second nearest neighbors of the i-th feature, respectively. Visualize the matches after the ratio test as shown in Figures 2(c) and 2(d).
(3) (Bidirectional match) Visualize bidirectionally consistent matches as shown in Figure 2(e). Compare the number of matches from (1) to (3).
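A minimal sketch covering (1)-(3), assuming f1 and f2 are N1 x 128 and N2 x 128 descriptor matrices (the variable names are assumptions):

[idx12, d12] = knnsearch(double(f2), double(f1), 'K', 2);   % two NNs, I1 -> I2
[idx21, ~  ] = knnsearch(double(f1), double(f2), 'K', 1);   % one NN,  I2 -> I1
keep = d12(:,1) ./ d12(:,2) < 0.7;        % ratio test
i1 = find(keep);  i2 = idx12(keep, 1);
mutual = idx21(i2) == i1;                 % bidirectional consistency
matches = [i1(mutual), i2(mutual)];       % [index in I1, index in I2]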
Figure 3: You will visualize epipole and epipolar lines for each image.
Compute a fundamental matrix between I1 and I2.
(1) (Fundamental matrix) Complete the following function to compute a fundamental matrix linearly:
F = ComputeFundamentalMatrix(u, v)
Input: u and v are N_f × 2 matrices of 2D correspondences, where N_f is the number of 2D correspondences, u ↔ v.
Output: F ∈ R^{3×3} is a rank-2 fundamental matrix.
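A minimal sketch of one possible linear (eight-point) implementation; Hartley normalization of the coordinates is omitted here but recommended:

function F = ComputeFundamentalMatrix(u, v)
% u, v: Nf x 2 correspondences such that [v 1] * F * [u 1]' = 0 for each match.
A = [v(:,1).*u(:,1), v(:,1).*u(:,2), v(:,1), ...
     v(:,2).*u(:,1), v(:,2).*u(:,2), v(:,2), ...
     u(:,1),         u(:,2),         ones(size(u,1),1)];
[~, ~, V] = svd(A);
F = reshape(V(:,end), 3, 3)';       % smallest right singular vector, row-wise
[Uf, Sf, Vf] = svd(F);              % enforce rank 2
Sf(3,3) = 0;
F = Uf * Sf * Vf';
end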
(2) (Epipole and epipolar line) Pick 8 random correspondences, u_r ↔ v_r, compute the fundamental matrix F_r, and visualize the epipole and the epipolar lines for the rest of the feature points in both images, as shown in Figure 3. Pick different sets of correspondences and visualize the different epipolar lines.
Estimate the fundamental matrix using RANSAC.
(1) (RANSAC with fundamental matrix) Write a RANSAC algorithm for the fundamental matrix estimation given N matches from Section 4 using the following pseudo
code:
Algorithm 1 GetInliersRANSAC
1: n ← 0
2: for i = 1 : M do
3: Choose 8 correspondences, ur and vr, randomly from u and v.
4: Fr = ComputeFundamentalMatrix(ur, vr)
5: Compute the number of inliers, nr, with respect to Fr.
6: if nr > n then
7: n ← nr
8: F = Fr
9: end if
10: end for
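One hedged way to compute nr in line 5 (the 2-pixel threshold and the symmetric distance are choices, not requirements):

% Symmetric point-to-epipolar-line distance for every correspondence.
uh = [u, ones(size(u,1),1)]';   vh = [v, ones(size(v,1),1)]';
l2 = Fr * uh;                    % epipolar lines in image 2
l1 = Fr' * vh;                   % epipolar lines in image 1
d2 = abs(sum(vh .* l2, 1)) ./ sqrt(l2(1,:).^2 + l2(2,:).^2);
d1 = abs(sum(uh .* l1, 1)) ./ sqrt(l1(1,:).^2 + l1(2,:).^2);
nr = sum(d1 < 2 & d2 < 2);       % e.g., a 2-pixel threshold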
(2) (Epipole and epipolar line) Using the fundamental matrix, visualize epipole and
epipolar lines.
Figure 4: Four configurations of camera pose from a fundamental matrix.
(3) (Camera pose estimation) Compute the 4 configurations of relative camera pose:
[R1 C1 R2 C2 R3 C3 R4 C4] = CameraPose(F, K)
Input: F is the fundamental matrix and K is the intrinsic parameter matrix.
Output: R1, C1, ..., R4, C4 are the rotations and camera centers (represented in the world coordinate system).
(4) Visualize the 4 configurations in 3D as shown in Figure 4.
Given the four configurations of relative camera pose, you will find the best camera pose by verifying them through 3D point triangulation.
(1) (Linear triangulation) Write code that computes the 3D point given a correspondence, u ↔ v, and two camera projection matrices:
[X] = LinearTriangulation(P1, u, P2, v)
Input: P1, P2 ∈ R^{3×4} are the two camera projection matrices, and u ↔ v ∈ R^2 are their 2D correspondence.
Output: X ∈ R^3 is the triangulated 3D point.
Hint: Use the triangulation method based on a linear solve.
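A minimal DLT-style sketch (skew builds the 3x3 skew-symmetric matrix of a vector; this is one possible formulation, not the only one):

function X = LinearTriangulation(P1, u, P2, v)
% [u;1] x (P1*[X;1]) = 0 and [v;1] x (P2*[X;1]) = 0, stacked and solved by SVD.
skew = @(a) [0 -a(3) a(2); a(3) 0 -a(1); -a(2) a(1) 0];
A = [skew([u(:); 1]) * P1;
     skew([v(:); 1]) * P2];
[~, ~, V] = svd(A);
Xh = V(:, end);
X  = Xh(1:3) / Xh(4);
end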
(2) (Cheirality) Write code that computes the number of 3D points in front of the two cameras. The condition of a 3D point being in front of a camera is called cheirality:
idx = CheckCheirality(Y, C1, R1, C2, R2)
Input: Y is an n × 3 matrix that contains n 3D points, and C1, R1 / C2, R2 are the first and second camera poses (camera center and rotation).
Output: idx is the set of indices of 3D points that satisfy cheirality.
Hint: a point must satisfy r_3(X − C) > 0, where r_3 is the 3rd row of the rotation matrix (the z-axis of the camera), C is the camera position, and X is the 3D point.
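A minimal sketch of the cheirality test following the hint (it assumes the rows of R1 and R2 express the camera axes, and uses implicit expansion, MATLAB R2016b+):

function idx = CheckCheirality(Y, C1, R1, C2, R2)
% Keep points with positive depth along the z-axis of both cameras.
d1 = (Y - C1(:)') * R1(3,:)';     % r3 * (X - C) for camera 1, for all points
d2 = (Y - C2(:)') * R2(3,:)';     % r3 * (X - C) for camera 2
idx = find(d1 > 0 & d2 > 0);
end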
(3) (Camera pose disambiguation) Based on cheirality, find the correct camera pose.
Visualize 3D camera pose and 3D points together as shown in Figure 5.
Hint: Use plot3 and DisplayCamera.m to visualize them.
Figure 5: You will visualize four camera pose configurations with the point cloud. (a) nValid = 10. (b) nValid = 488. (c) nValid = 0. (d) nValid = 0.
(4) (Reprojection) Project the 3D points to each camera and visualize the reprojections, û, and the measurements, u, on the image as shown in Figure 6, i.e.,

λ [û; 1] = K R [I  −C] [X; 1]

where X is the reconstructed point.
Figure 6: Visualization of measurements and reprojections.
Figure 1: You will capture your cellphone images to reconstruct camera pose and 3D
points.
In this assignment, you will use your cellphone images (more than 5) to reconstruct 3D camera poses and points with full bundle adjustment. Make sure you have enough baseline (translation) between images for a well-conditioned fundamental matrix while retaining enough correspondences between images. Avoid a scene dominated by a planar surface, i.e., the images need to contain many 3D objects as shown in Figure 1.
You will write a full pipeline of the structure from motion algorithm, including matching, camera pose estimation using the fundamental matrix, PnP, triangulation, and bundle adjustment. Each nonlinear optimization is initialized by a linear least-squares estimate. The pipeline is described in Algorithm 1.
Algorithm 1 Structure from Motion
1: [Mx, My] = GetMatches(I_1, ..., I_N)
2: Normalize the coordinates in Mx and My, i.e., x = K^{-1} x.
3: Select two images I_{i1} and I_{i2} for the initial pair reconstruction.
4: [R, C, X] = CameraPoseEstimation([Mx(:,i1) My(:,i1)], [Mx(:,i2) My(:,i2)])
5: P = {P_{i1}, P_{i2}} where P_{i1} = [I_3  0], P_{i2} = R [I_3  −C]
6: R = {i1, i2}
7: while |R| < N do
8:   i = GetBestFrame(Mx, My, R)
9:   [R_i, C_i] = PnP_RANSAC([Mx(:,i) My(:,i)], X)
10:  [R_i, C_i] = PnP_Nonlinear(R_i, C_i, [Mx(:,i) My(:,i)], X)
11:  P_i = R_i [I_3  −C_i]
12:  for f = 1 : |R| do
13:    U = FindUnreconstructedPoints(X, R_f, i, Mx, My)
14:    for j = 1 : |U| do
15:      u = [Mx(U_j, i), My(U_j, i)] and v = [Mx(U_j, R_f), My(U_j, R_f)]
16:      x = LinearTriangulation(u, P_i, v, P_{R_f})
17:      x = NonlinearTriangulation(x, u, R_i, C_i, v, R_{R_f}, C_{R_f})
18:      X = X ∪ x
19:    end for
20:  end for
21:  P = P ∪ P_i and R = R ∪ i
22:  [P, X] = BundleAdjustment(P, X, R, Mx, My)
23: end while
Given a set of images, I_1, ..., I_N, you will find matches across all images, where N is the number of images, similar to HW #4. Pick a reference image, I_ref, and match it with the other images using SIFT features from VLFeat, i.e., I_ref ↔ I_1, ..., I_ref ↔ I_N (no need to match I_ref ↔ I_ref).
Your matches should be outlier free, i.e., obtained via bidirectional kNN matching → ratio test → inliers from fundamental-matrix-based RANSAC. Based on the matches, you will build measurement matrices Mx and My:
[Mx, My] = GetMatches(I_1, ..., I_N)
Mx: F × N matrix storing the x coordinates of the correspondences
My: F × N matrix storing the y coordinates of the correspondences
The f-th feature point in image I_i corresponds to a point in image I_j. The x and y coordinates of the correspondence are stored at the (f, i) and (f, j) elements of Mx and My, respectively. If (f, i) does not correspond to any point in image I_k, set the (f, k) entry to -1 to indicate no match, as shown in Figure 2.
Important: For better numerical stability, you can transform the measurements to normalized coordinates by multiplying by K^{-1}, i.e., x = K^{-1} x, where x is the 2D measured point in homogeneous coordinates. You can run structure from motion in the normalized coordinates by factoring out K. When visualizing projections in the image, the coordinates need to be transformed back to the original coordinates by multiplying by K.
Figure 2: The f-th feature point in image I_i corresponds to a point in image I_j. The x and y coordinates of the correspondence are stored at the (f, i) and (f, j) elements of Mx and My, respectively. If (f, i) does not correspond to any point in image I_k, set -1 to indicate no match.
You will write camera pose estimation code that takes the correspondences between two images, I_{i1} and I_{i2}, where i1 and i2 are the indices of the initial images selected manually for reconstruction.
[R, C, X] = CameraPoseEstimation(u1, u2)
R and C: the relative transformation of the i2 image
u1 and u2: 2D-2D correspondences
As studied in HW #4, you will compute:
1. The fundamental matrix via RANSAC on the correspondences in Mx(:,i1), My(:,i1) and Mx(:,i2), My(:,i2)
2. The essential matrix from the fundamental matrix
3. Four configurations of camera pose given the essential matrix
4. Disambiguation via cheirality (using 3D point linear triangulation):
X = LinearTriangulation(u, P_i, v, P_j)
Write-up:
Figure 3: Camera pose estimation. (a) Inlier matches. (b) 3D camera pose (top view and oblique view).
(1) Visualize inlier matches as shown in Figure 3(a).
(2) Visualize camera pose and 3D reconstructed points in 3D as shown in Figure 3(b).
You will write nonlinear triangulation code. Given the linear estimate of the triangulated point, you will refine the 3D point X = [X  Y  Z]^T to minimize the geometric error (reprojection error) via iterative nonlinear least-squares estimation,

∆X = ( (∂f(X)/∂X)^T (∂f(X)/∂X) )^{-1} (∂f(X)/∂X)^T (b − f(X)).    (1)
Write-up:
(1) Derive the point Jacobian, i.e., ∂f(X)/∂X, and write the following code:
df_dX = JacobianX(K, R, C, X)
(2) Write code to refine the 3D point by minimizing the reprojection error, and visualize the reprojection error reduction similar to Figure 5.
X = NonlinearTriangulation(X, u1, R1, C1, u2, R2, C2)
Algorithm 2 Nonlinear Point Refinement
1: b = [u_1^T  u_2^T]^T
2: for j = 1 : nIters do
3:   Build the point Jacobian, ∂f(X)/∂X.
4:   Compute f(X).
5:   ∆X = ( (∂f(X)/∂X)^T (∂f(X)/∂X) + λI )^{-1} (∂f(X)/∂X)^T (b − f(X))
6:   X = X + ∆X
7: end for
You will register an additional image, I_j, using 2D-3D correspondences.
Write-up:
(1) (3D-2D correspondences) Given the 3D triangulated points, find the 2D-3D matches, X ↔ u.
(2) (Perspective-n-Point algorithm) Write code that computes the 3D camera pose from 3D-2D correspondences:
[R, C] = LinearPnP(u, X)
X: n × 3 matrix containing n 3D reconstructed points
u: n × 2 matrix containing the n corresponding 2D points in the additional image I_3
R and C: rotation and translation for the additional image.
Hint: After the linear solve, rectify the rotation matrix such that det(R) = 1 and scale C according to the rectification.
(3) (RANSAC PnP) Write a RANSAC algorithm for the camera pose registration (PnP)
given n matches using the following pseudo code:
Algorithm 3 PnP RANSAC
1: nInliers ← 0
2: for i = 1 : M do
3: Choose 6 correspondences, Xr and ur, randomly from X and u.
4: [Rr, tr] = LinearPnP(ur, Xr)
5: Compute the number of inliers, nr, with respect to Rr, tr.
6: if nr > nInliers then
7: nInliers ← nr
8: R = Rr and t = tr
9: end if
10: end for
Visualize 3D registered pose as shown in Figure 4.
Figure 4: Additional image registration. (a) Front view. (b) Top view.
(4) (Reprojection) Visualize measurement and reprojection to verify the solution.
Given the initial estimate R_i and t_i, you will refine the camera pose to minimize the geometric error (reprojection error) via iterative nonlinear least-squares estimation,

∆p = ( (∂f(p)/∂p)^T (∂f(p)/∂p) )^{-1} (∂f(p)/∂p)^T (b − f(p)),

f(p) = [u_1/w_1, v_1/w_1, ..., u_n/w_n, v_n/w_n]^T,  where [u_j; v_j; w_j] = R_i [I_3  −C] [X_j; 1],  and  b = [x_1, y_1, ..., x_n, y_n]^T,    (2)

where p = [C^T  q^T]^T, C ∈ R^3 is the camera optical center, and q ∈ S^3 is the quaternion representation of the camera rotation.
It is possible to minimize the overshooting by adding damping, λ, as follows:

∆p = ( (∂f(p)/∂p)^T (∂f(p)/∂p) + λI )^{-1} (∂f(p)/∂p)^T (b − f(p)),    (3)

where λ is the damping parameter. You can try λ ∈ [0, 10].
Note that the conversion between a quaternion and a rotation matrix is given as follows:

R = [ 1 − 2q_z^2 − 2q_y^2    2q_y q_x − 2q_z q_w    2q_y q_w + 2q_z q_x
      2q_x q_y + 2q_w q_z    1 − 2q_z^2 − 2q_x^2    2q_z q_y − 2q_x q_w
      2q_x q_z − 2q_w q_y    2q_y q_z + 2q_w q_x    1 − 2q_y^2 − 2q_x^2 ],

q = [ q_w
      (R_32 − R_23)/(4q_w)
      (R_13 − R_31)/(4q_w)
      (R_21 − R_12)/(4q_w) ],  where q_w = sqrt(1 + R_11 + R_22 + R_33)/2,  ||q|| = 1.    (4)
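A minimal MATLAB transcription of Equation (4); Quaternion2Rotation is the name used in Algorithm 4 below, while Rotation2Quaternion is a hypothetical name. No special-casing of q_w ≈ 0 is included:

function R = Quaternion2Rotation(q)
% q = [qw qx qy qz], assumed to have unit norm.
qw = q(1); qx = q(2); qy = q(3); qz = q(4);
R = [1-2*qz^2-2*qy^2,  2*qy*qx-2*qz*qw,  2*qy*qw+2*qz*qx;
     2*qx*qy+2*qw*qz,  1-2*qz^2-2*qx^2,  2*qz*qy-2*qx*qw;
     2*qx*qz-2*qw*qy,  2*qy*qz+2*qw*qx,  1-2*qy^2-2*qx^2];
end

function q = Rotation2Quaternion(R)
% Valid when qw is not close to zero.
qw = sqrt(1 + R(1,1) + R(2,2) + R(3,3)) / 2;
q  = [qw, (R(3,2)-R(2,3))/(4*qw), (R(1,3)-R(3,1))/(4*qw), (R(2,1)-R(1,2))/(4*qw)];
q  = q / norm(q);
end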
Write-up:
(1) Derive the Jacobian of the rotation with respect to the quaternion using Equation (4), i.e., ∂R/∂q, and write the following code. Note: ignore the normalization ||q|| = 1.
dR_dq = JacobianQ(q)
(2) Derive the Jacobian of the projection with respect to the rotation using Equation (2), i.e., ∂f(p)_j/∂R, where f(p)_j = [u_j/w_j  v_j/w_j]^T, and write the following code. Note: use the vectorized form of the rotation matrix.
df_dR = JacobianR(R, C, X)
(3) Derive the expression for ∂f(p)_j/∂q using the chain rule.
(4) Derive the Jacobian of the projection with respect to the camera center using Equation (2), i.e., ∂f(p)_j/∂C, and write the following code.
df_dC = JacobianC(R, C, X)
(5) Write code to refine the camera pose by minimizing the reprojection error, and visualize the reprojection error reduction as shown in Figure 5:
[R, C] = PnP_Nonlinear(R, C, u, X)
Algorithm 4 Nonlinear Camera Pose Refinement
1: p = [C^T  q^T]^T
2: for j = 1 : nIters do
3:   C = p_{1:3}, q = p_{4:7}, R = Quaternion2Rotation(q)
4:   Build the camera pose Jacobian for all points, ∂f(p)_j/∂p = [ ∂f(p)_j/∂C   ∂f(p)_j/∂q ].
5:   Compute f(p).
6:   ∆p = ( (∂f(p)/∂p)^T (∂f(p)/∂p) + λI )^{-1} (∂f(p)/∂p)^T (b − f(p)) using Equation (3).
7:   p = p + ∆p
8:   Normalize the quaternion scale, p_{4:7} = p_{4:7}/||p_{4:7}||.
9: end for
Figure 5: Nonlinear refinement reduces the reprojection error (0.19 → 0.11). (a) Reprojection error. (b) Close-up. Legend: measurement, linear estimate (reproj: 0.199104), nonlinear estimate (reproj: 0.119272).
8 Bundle Adjustment
You will write nonlinear refinement code that simultaneously optimizes camera poses and 3D points, exploiting the sparse structure of the Jacobian matrix.
[P, X] = BundleAdjustment(P, X, R, Mx, My)
For example, consider 3 camera poses and 2 points. The Jacobian matrix can be written as

J = [ ∂f(p1,X1)/∂p1   0_{2×7}          0_{2×7}          ∂f(p1,X1)/∂X1   0_{2×3}
      0_{2×7}          ∂f(p2,X1)/∂p2   0_{2×7}          ∂f(p2,X1)/∂X1   0_{2×3}
      0_{2×7}          0_{2×7}          ∂f(p3,X1)/∂p3   ∂f(p3,X1)/∂X1   0_{2×3}
      ∂f(p1,X2)/∂p1   0_{2×7}          0_{2×7}          0_{2×3}          ∂f(p1,X2)/∂X2
      0_{2×7}          ∂f(p2,X2)/∂p2   0_{2×7}          0_{2×3}          ∂f(p2,X2)/∂X2
      0_{2×7}          0_{2×7}          ∂f(p3,X2)/∂p3   0_{2×3}          ∂f(p3,X2)/∂X2 ]
  = [ J_p   J_X ]    (5)

where J_p and J_X are the Jacobians for the cameras and points, respectively, and λ ∈ [0, 10].
The normal equation, J^T J ∆x = J^T (b − f(x)), can be decomposed into

[ A    B ] [ ∆p̂ ]   =   [ e_p ]
[ B^T  D ] [ ∆X̂ ]       [ e_X ],    (6)

where

A = J_p^T J_p + λI,  B = J_p^T J_X,  D = J_X^T J_X + λI,
e_p = J_p^T (b − f(x)),  e_X = J_X^T (b − f(x)),

and p̂ = [p_1^T ... p_I^T]^T and X̂ = [X_1^T ... X_M^T]^T, where I and M are the numbers of images and points, respectively.
The decomposed normal equation in Equation (6) allows us to efficiently compute the inverse of J^T J using the Schur complement of D:

∆p̂ = (A − B D^{-1} B^T)^{-1} (e_p − B D^{-1} e_X),
∆X̂ = D^{-1} (e_X − B^T ∆p̂),

where D is a block-diagonal matrix whose inverse can be computed efficiently by inverting its small diagonal blocks:

D = diag(d_1, ..., d_M),   D^{-1} = diag(d_1^{-1}, ..., d_M^{-1}).    (7)
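A minimal sketch of that solve, assuming A, B, Dinv, ep, and eX have already been assembled as in Algorithm 5 below:

S       = A - B * Dinv * B';             % Schur complement of D
dp_hat  = S \ (ep - B * Dinv * eX);      % camera-pose update (Delta p-hat)
dX_hat  = Dinv * (eX - B' * dp_hat);     % point update (Delta X-hat)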
The bundle adjustment algorithm is summarized in Algorithm 5. Note that not all points are visible from all cameras. You need to account for visibility, i.e., if a point is not visible from a camera, the corresponding rows of the Jacobian J and of the measurement vector b are omitted.
Algorithm 5 Bundle Adjustment
1: p̂ = [p_1^T ... p_I^T]^T and X̂ = [X_1^T ... X_M^T]^T
2: for iter = 1 : nIters do
3:   Empty J_p, J_X, b, f, Dinv.
4:   for i = 1 : M do
5:     d = 0_{3×3}
6:     for j = 1 : I do
7:       if the i-th point is visible from the j-th image then
8:         J1 = 0_{2×7I} and J2 = 0_{2×3M}
9:         J1(:, 7(j−1)+1 : 7j) = ∂f(p_j, X_i)/∂p_j
10:        J2(:, 3(i−1)+1 : 3i) = ∂f(p_j, X_i)/∂X_i
11:        J_p = [J_p^T  J1^T]^T and J_X = [J_X^T  J2^T]^T
12:        d = d + (∂f(p_j, X_i)/∂X_i)^T (∂f(p_j, X_i)/∂X_i)
13:        b = [b^T  u_ij^T]^T
14:        f = [f^T  x_ij^T]^T where µ [x_ij; 1] = R_j [I  −C_j] [X_i; 1]
15:      end if
16:    end for
17:    d = d + λI
18:    Dinv = blkdiag(Dinv, d^{-1})
19:  end for
20:  e_p = J_p^T (b − f)
21:  e_X = J_X^T (b − f)
22:  A = J_p^T J_p + λI, B = J_p^T J_X, D^{-1} = Dinv
23:  ∆p̂ = (A − B D^{-1} B^T)^{-1} (e_p − B D^{-1} e_X)
24:  Normalize the quaternions.
25:  ∆X̂ = D^{-1} (e_X − B^T ∆p̂)
26: end for
Write-up: You will first start with two images and 10 3D points to test your bundle
adjustment program.
(1) Derive Jp and JX.
(2) Run Algorithm 5 and visualize the reprojection error similar to Figure 5.
9 Putting All Things Together
Write-up: You will run with all images and 3D points based on Algorithm 1.
(1) Visualize 3D camera pose and points as shown in Figure 6.
(2) Visualize reprojection for all images.
Figure 6: You will reconstruct all images and 3D points using structure from motion (top view and oblique view).