[Solved] TTK4255 Midterm Project

$25

File Name: TTK4255_Midterm_Project.zip
File Size: 216.66 KB


Midterm project TTK4255: Robotic Vision (2021)

Figure 1: Results for a single image, showing the estimated coordinate frames in addition to the observed and predicted locations of the markers. The markers here are called AprilTags; these can be robustly detected thanks to their high contrast and error-tolerant coding system. Each marker is coded with a unique ID, which is used to establish 2D-3D correspondences.

About the project

You should be familiar with the Quanser 3-DOF Helicopter from Homework 3. As you will recall, the helicopter consists of an arm and a rotor carriage with three rotational degrees of freedom:

- Yaw (ψ): Rotation around an axis perpendicular to the mounting platform.
- Pitch (θ): Rotation of the arm up or down.
- Roll (φ): Rotation of the rotor carriage around the arm.

In this project you will estimate these angles in a pre-recorded image sequence of one of the helicopters in the lab at ITK. During recording, the helicopter was augmented with fiducial markers, which makes it easy to establish 2D-3D point correspondences. However, instead of deriving a linear algorithm to estimate the angles, you will use the more versatile approach of directly minimizing the reprojection error between the markers' predicted and detected image locations.

To minimize the reprojection error you will implement the Gauss-Newton and Levenberg-Marquardt methods, which are frequently used in the robotic vision community. In Part 2 and Part 3 you will further investigate the use of these algorithms to estimate the pose of the camera and to calibrate the helicopter model itself.

You do not need to include any text, derivation or figure that is not explicitly requested, nor will you receive any points for it. Keep the report simple and provide only what is asked for. You can write the report using any convenient tool, as long as it contains the requested answers and the name of whoever you worked with (if you did not work alone). Also upload a zip archive of all your code.

Resources

Non-linear least squares is used in Szeliski 6 (Feature-based alignment) and 7 (Structure from motion), but these chapters can be hard to follow without a background in numerical optimization. A summary of the necessary optimization theory is therefore given in Part 1. Alternatively, the booklet by Madsen, Nielsen and Tingleff (available on Blackboard) is a good introduction to non-linear least squares.

- Madsen, Nielsen, and Tingleff. Methods for non-linear least squares problems. 2004.

A further overview of vision-related applications of non-linear least squares, along with some more advanced topics, can be found in Hartley and Zisserman Appendix 6 (available on Blackboard).

Provided data and code

You will do all the tasks on the following dataset included in the zip:

- video0000.jpg to video0350.jpg: Undistorted image sequence.
- K.txt: Camera intrinsic matrix K.
- heli_points.txt: Homogeneous 3D coordinates of the markers in the helicopter model.
- platform_to_camera.txt: Transformation matrix $T^{camera}_{platform}$.
- detections.txt: Marker detections. The jth row corresponds to the jth image and contains 7 tuples of the form $(m_i, u_i, v_i)$, where $m_i$ is 1 if marker i was detected and 0 otherwise, and $(u_i, v_i)$ is the marker's pixel coordinates.
- logs.txt: Recorded angles from the motor encoders. Each row contains a timestamp (in seconds), yaw, pitch and roll. The logs have been time-synchronized with the images.
- platform_corners_metric.txt: Metric coordinates of four points on the platform (Part 2).
- platform_corners_image.txt: Measured pixel coordinates of the above points (Part 2).

Note that the image locations of the markers have been extracted for you. Additionally, the images have had lens distortion removed, such that they satisfy a pinhole camera model with the provided intrinsic matrix K. The zip contains helper code to load the dataset in Python and in Matlab.
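As a minimal sketch of how these files might be parsed with NumPy, based on the formats described above (the hand-out helper code already does the equivalent, and its variable names may differ):

```python
import numpy as np

# Sketch of loading the dataset (the hand-out helpers do the equivalent).
K           = np.loadtxt('K.txt')                   # 3x3 intrinsic matrix
T_platform  = np.loadtxt('platform_to_camera.txt')  # 4x4 T^camera_platform
heli_points = np.loadtxt('heli_points.txt')         # homogeneous 3D marker coordinates
detections  = np.loadtxt('detections.txt')          # one row per image
logs        = np.loadtxt('logs.txt')                # timestamp, yaw, pitch, roll

# Each detections row holds 7 tuples (m_i, u_i, v_i), flattened.
row     = detections[0]
weights = row[0::3]                              # m_i: 1 if marker i was detected
uv      = np.vstack((row[1::3], row[2::3]))      # pixel coordinates, shape (2, 7)
```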
Figure 2: Quanser 3-DOF helicopter and its three degrees of freedom. The arrows indicate the direction of increasing yaw, pitch and roll.

Figure 3: Helicopter coordinate frames (base, platform, hinge, arm and rotors).

Figure 4: Helicopter dimensions (11.45 cm, 32.5 cm, 5.0 cm, 65.0 cm and 3.0 cm).

Part 1: Estimate the helicopter angles (10 points)

In this part you will implement an iterative algorithm to estimate the helicopter's yaw, pitch and roll. Compared with the DLT, this algorithm is more flexible and can easily incorporate constraints and other knowledge, such as motion dynamics and information from other sensors. The algorithm is based on the assumption that an optimal estimate of the helicopter angles will cause the projection of points in the helicopter model to fall close to their corresponding observations. This criterion is formulated as a least squares problem, where the quantity to be minimized is the sum of squared reprojection errors,

$E(\mathbf{p}) = \sum_i \lVert \hat{\mathbf{u}}_i(\mathbf{p}) - \mathbf{u}_i \rVert^2$.    (1)

Here, E is called the objective function, p contains the unknown parameters that we want to estimate, and û_i(p) and u_i are corresponding pairs of predicted and observed image locations.

For the Quanser helicopter, the parameter vector is p = (ψ, θ, φ) and û_i is computed from the pinhole projection

$\tilde{\mathbf{u}}_i = K \mathbf{X}^c_i$    (2)

where X^c_i is the ith marker's coordinates in the helicopter model transformed into the camera frame:

$\mathbf{X}^c_i = \begin{cases} T^{camera}_{arm}(\psi, \theta)\, \mathbf{X}^{arm}_i & i \in \{0, 1, 2\}, \\ T^{camera}_{rotors}(\psi, \theta, \phi)\, \mathbf{X}^{rotors}_i & i \in \{3, 4, 5, 6\}. \end{cases}$    (3)

Likewise, u_i is the observed image location of the ith marker.

Note that although the markers have a square shape, and could thereby provide four correspondences each, you will only use a single corner of each marker. These points are shown in the front page figure.

Because û_i(p) is a non-linear function of p, this is called a non-linear least squares problem. Unlike linear least squares, there is no universally good algorithm to solve for the global minimum. Although the field has recently discovered efficient global solvers for several interesting non-linear problems, these make strong assumptions about the underlying model. In the absence of such a solver, the standard approach is to use a generic non-linear optimization method. These tend to be iterative, in that they start from an initial estimate and refine it over a number of iterations, so as to decrease the error. Such methods can at best guarantee convergence to a local minimum.

The Matlab Optimization Toolbox (notably fmincon) and IPOPT are well-tested software packages for solving generic optimization problems. Ceres, GTSAM and g2o are packages developed specifically for non-linear least squares problems that arise in computer vision, and utilize their special structure for computational efficiency. The basis for these latter packages is the Gauss-Newton and Levenberg-Marquardt methods, which you will implement in this part. These methods start at an initial estimate and iteratively apply small corrections, obtained by solving a linear least squares problem constructed at the current estimate.

The Gauss-Newton method

The derivation of the Gauss-Newton method is enlightening for understanding the more widely applied method of Levenberg and Marquardt. First, it will be convenient to write our objective function as

$E(\mathbf{p}) = \sum_{i=1}^{n} r_i(\mathbf{p})^2$    (4)

where r_i(p) is called a residual, and is a scalar-valued function of the parameters. The key idea in both methods is to linearize each residual using its Taylor expansion around a running estimate p. At each iteration, this gives a local quadratic approximation of E of the form

$E(\mathbf{p} + \boldsymbol{\delta}) \approx \sum_{i=1}^{n} \left( r_i(\mathbf{p}) + \frac{\partial r_i}{\partial \mathbf{p}}(\mathbf{p})\, \boldsymbol{\delta} \right)^2 =: \hat{E}(\boldsymbol{\delta})$    (5)

where δ is a small step. The local approximation Ê(δ) has the form of a linear least squares objective function, with δ as the variables, and is therefore much easier to minimize. The Gauss-Newton step is the minimizer of Ê(δ), which can be obtained by solving the linear system

$J^T J \boldsymbol{\delta}_{GN} = -J^T \mathbf{r}$    (6)

where $\mathbf{r} = [r_1(\mathbf{p}) \cdots r_n(\mathbf{p})]^T$ is the vector containing all the residuals and J is the Jacobian containing their partial derivatives evaluated at p,

$J_{ij} = \frac{\partial r_i}{\partial p_j}(\mathbf{p})$.    (7)

For example, if $\mathbf{r} \in \mathbb{R}^n$ and $\mathbf{p} \in \mathbb{R}^3$ then $J \in \mathbb{R}^{n \times 3}$:

$J = \begin{bmatrix} \partial r_1/\partial p_1 & \partial r_1/\partial p_2 & \partial r_1/\partial p_3 \\ \vdots & \vdots & \vdots \\ \partial r_n/\partial p_1 & \partial r_n/\partial p_2 & \partial r_n/\partial p_3 \end{bmatrix}$.    (8)

The matrix J^T J is called the (approximate) Hessian and Eq. (6) is called the normal equations. These can be solved by standard linear algebra techniques, e.g. numpy.linalg.solve in Python or the backslash operator in Matlab, assuming that J^T J is invertible. The Gauss-Newton method updates the current estimate by moving some amount α (the step size) in the direction of δ_GN,

$\mathbf{p} \leftarrow \mathbf{p} + \alpha \boldsymbol{\delta}_{GN}$,    (9)

and repeats the above process at the new estimate.

If J^T J is positive definite and J^T r ≠ 0, then there exists a non-zero step size that decreases the objective function. However, a step size of 1 is optimal only if the linearization is exact. If the accuracy of the linearization is poor, this step may actually increase the objective function, even if J^T J is positive definite, indicating that a smaller step size should have been used. Automatically determining the step size is a key part of a Gauss-Newton implementation, but we will see that the Levenberg-Marquardt method simultaneously provides automatic step size control and addresses several other issues.
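As a sketch of the above, a basic Gauss-Newton loop might look as follows. The callables residualsfn and jacobianfn are hypothetical names for functions returning r and J at a given p, and step_size is the α of Eq. (9):

```python
import numpy as np

def gauss_newton(residualsfn, jacobianfn, p, step_size=0.25, num_iterations=100):
    # residualsfn(p) returns the residual vector r of Eq. (4);
    # jacobianfn(p) returns the Jacobian J of Eq. (7) evaluated at p.
    for _ in range(num_iterations):
        r = residualsfn(p)
        J = jacobianfn(p)
        # Solve the normal equations (Eq. 6): J^T J delta = -J^T r.
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        p = p + step_size*delta   # Eq. (9): move by the step size along delta
    return p
```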
The Levenberg-Marquardt method

The Gauss-Newton method has some shortcomings. Mainly, it requires a strategy to determine a good step size, and there is the possibility that δ is not a descent direction (J^T J may not be positive definite). The Levenberg-Marquardt method is a small modification that addresses both issues simultaneously. The difference is that the step size is omitted (fixed to 1) and the normal equations are replaced with

$(J^T J + \lambda I)\, \boldsymbol{\delta}_{LM} = -J^T \mathbf{r}$    (10)

where λ > 0 is a scalar called the damping parameter, which is determined by the following rules. In a given iteration, if the step δ_LM obtained by solving (10) leads to a reduced error,

$E(\mathbf{p} + \boldsymbol{\delta}_{LM}) < E(\mathbf{p})$,    (11)

then the step is accepted and λ is decreased (e.g. divided by 3) before the next iteration. Otherwise, λ is increased (e.g. multiplied by 2) and the normal equations are solved again. This is repeated until a step is found for the current iteration that leads to a reduced error.

An interpretation of the damping parameter can be found in Hartley and Zisserman A6.2, p. 601. Most important to note is that any positive value of λ guarantees that J^T J + λI is positive definite, and thereby that δ_LM is a descent direction (locally).

A typical initial value of λ is 10^-3 times the maximum of the elements on the diagonal of J^T J. In practice it's also a good idea to add a termination condition to prevent unnecessary iterations. A common choice is to stop when the change in the parameter vector between two successive iterations is less than a desired precision, measured e.g. using the norm of the accepted step ||δ_LM||.

Computing partial derivatives

Partial derivatives can be computed by deriving analytical expressions by hand and transcribing the expressions to code. Symbolic processing software, like Matlab or SymPy in Python, can automatically derive the analytical expressions for you, although these are usually not simplified that well. There is also the option of automatic differentiation, which is the ability of the language or a library to automatically compute the partial derivative of a function with respect to its inputs.

For this project, the most straightforward option may be to use the finite difference approximation. For a function of a scalar parameter, the central finite difference approximation of its partial derivative is

$\frac{\partial f(p)}{\partial p} \approx \frac{f(p + \epsilon) - f(p - \epsilon)}{2\epsilon}$    (12)

where ε is a small change in p. For functions of vector-valued parameters $\mathbf{p} \in \mathbb{R}^d$, the above formula is applied to each parameter p_i, i = 1, ..., d, one at a time while keeping the other variables fixed.

Note that setting ε either too high or too low can lead to instability. The best value depends on the scale of each parameter. In Part 1, the value 10^-5 seems to work fine.
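Tying the last two sections together, here is a minimal sketch of Levenberg-Marquardt with a central finite-difference Jacobian. The update rules for λ (divide by 3 on an accepted step, multiply by 2 otherwise), the initial value 10^-3 times the maximum diagonal of J^T J, and the termination test on ||δ_LM|| follow the description above; everything else (names, default arguments) is our own choice:

```python
import numpy as np

def finite_difference_jacobian(residualsfn, p, epsilon=1e-5):
    # Central finite differences (Eq. 12), one parameter at a time.
    r0 = residualsfn(p)
    J = np.empty((len(r0), len(p)))
    for j in range(len(p)):
        e = np.zeros(len(p))
        e[j] = epsilon
        J[:,j] = (residualsfn(p + e) - residualsfn(p - e))/(2.0*epsilon)
    return J

def levenberg_marquardt(residualsfn, p, num_iterations=100, tolerance=1e-3):
    r = residualsfn(p)
    J = finite_difference_jacobian(residualsfn, p)
    JTJ, JTr = J.T @ J, J.T @ r
    lam = 1e-3*np.amax(np.diag(JTJ))   # typical initial damping value
    for _ in range(num_iterations):
        # Damped normal equations (Eq. 10); retry with larger lambda on failure.
        while True:
            delta = np.linalg.solve(JTJ + lam*np.eye(len(p)), -JTr)
            if np.sum(residualsfn(p + delta)**2) < np.sum(r**2):  # Eq. (11)
                p = p + delta
                lam /= 3.0      # step accepted: relax the damping
                break
            lam *= 2.0          # step rejected: increase the damping, solve again
        if np.linalg.norm(delta) < tolerance:
            break               # accepted step is below the desired precision
        r = residualsfn(p)
        J = finite_difference_jacobian(residualsfn, p)
        JTJ, JTr = J.T @ J, J.T @ r
    return p
```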
Task 1.1 (1 point)

In order to apply the described methods to our problem, we need to define the residuals r_i(p). At a first glance, you may be tempted to define these as the norms of the vector differences,

$\mathbf{r}(\mathbf{p}) = \begin{bmatrix} \lVert \hat{\mathbf{u}}_1(\mathbf{p}) - \mathbf{u}_1 \rVert & \cdots & \lVert \hat{\mathbf{u}}_n(\mathbf{p}) - \mathbf{u}_n \rVert \end{bmatrix}^T$,    (13)

which gives one scalar residual per point correspondence. However, a better choice is to define two residuals per correspondence, one for the horizontal difference and one for the vertical difference,

$\mathbf{r}(\mathbf{p}) = \begin{bmatrix} (\hat{u}_1(\mathbf{p}) - u_1) & \cdots & (\hat{u}_n(\mathbf{p}) - u_n) & (\hat{v}_1(\mathbf{p}) - v_1) & \cdots & (\hat{v}_n(\mathbf{p}) - v_n) \end{bmatrix}^T$.    (14)

Mathematically, both choices define the exact same objective function (the sum of squared residuals), but give rise to very different linearizations. Speculate on why the linearization produced by the first choice might be problematic.

The Gauss-Newton method can be derived generally for vector-valued residuals without this seemingly ad-hoc concatenation. The general formulation allows e.g. each 2D pixel error to be weighted by its 2x2 inverse covariance matrix (see Szeliski 6.1.1), which is used in the previously mentioned software packages. However, this is more cumbersome to implement in Python and Matlab. If the residuals are unweighted, there is no difference with concatenation.

Task 1.2 (1 point)

The hand-out code contains a class called Quanser, which provides the complete helicopter model from HW3 and a utility function to visualize the reprojected frames and points. The partially implemented member method residuals is meant to compute the vector of residuals r as described above. The result should be an array of length 2n, where n is the number of markers; the first n elements should be the horizontal differences and the last n elements should be the vertical differences.

Finish the implementation of residuals and run the part1 script. It should print the residuals on the first image using the optimal angles. Check that they are small (within 10 pixels) and include these numbers in your report.

Note that not every marker is detected in each image, so some of the residuals may be invalid. Instead of returning only the valid residuals, it can be useful to keep the length of r the same for each image, and handle invalid residuals by multiplying the corresponding entries of r by 0 before returning the array. The hand-out code provides a vector called weights which you can use to achieve this.

Task 1.3 (2 points)

Write a function that implements the Gauss-Newton method. See the hand-out code for some tips. Modify the part1 script to run the method up to image number 86, using a fixed step size of α = 0.25 and 100 iterations. The part1 script should generate a figure showing your estimated angles against the motor encoder logs (the helicopter's trajectory). Include only this figure in your report.

Task 1.4 (1 point)

Run Gauss-Newton beyond image number 86. In Matlab or Python, you should see a warning indicating that J^T J is singular (not invertible). Explain why this happens here and when it can happen in general.

Task 1.5 (3 points)

Write a function that implements the Levenberg-Marquardt method.

Initialize ψ = θ = φ = 0 and run the method on only the first image. Use a maximum of 100 iterations and print the angle estimates and the value of λ in each iteration. What happens with λ? How many iterations does it take before the estimates stabilize to 0.001 radians precision?

Finally, modify the part1 script to run Levenberg-Marquardt on the entire image sequence, as sketched below. Include the plot of the estimated angles against the motor encoder logs.
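For the last part of Task 1.5, the per-image loop might be structured as follows. The Quanser class and its residuals method are from the hand-out code, but the exact signature assumed here (uv, weights and the three angles) is a guess and may differ from yours; levenberg_marquardt is the sketch from earlier. Reusing the previous image's estimate as the initialization for the next image is a natural warm start:

```python
import numpy as np
from quanser import Quanser   # hand-out code; the module name may differ

# Sketch of the per-image loop. Adapt the residuals call to the actual
# hand-out interface.
quanser = Quanser()
detections = np.loadtxt('detections.txt')
trajectory = np.zeros((len(detections), 3))
p = np.zeros(3)                            # initialize (psi, theta, phi) to zero
for j, row in enumerate(detections):
    weights = row[0::3]
    uv = np.vstack((row[1::3], row[2::3]))
    resfn = lambda p: quanser.residuals(uv, weights, p[0], p[1], p[2])
    p = levenberg_marquardt(resfn, p)      # previous estimate as warm start
    trajectory[j] = p
```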
Task 1.6 (1 point)

If your Levenberg-Marquardt implementation is correct, you will no longer get the above warning. Why? Is this a good way to handle the underlying issue, or can you imagine a better strategy? (One or two sentences is an acceptable answer.)

Task 1.7 (1 point)

Ignore the rotor carriage and consider the helicopter as just the arm with two degrees of freedom (ψ, θ). Suppose only a single marker is observed and that this marker is on the arm. This gives a single point correspondence (when we ignore the remaining three corners of the marker). Argue that there then exist up to two physical configurations (ψ, θ) that are local minimizers of the reprojection error (i.e. consistent with the marker observation). You may include a sketch to support your argument. (A photograph of a paper drawing is fine.)

Part 2: Estimate the platform pose (5 points)

In Part 1, the transformation $T^{camera}_{platform}$ was given to you. Here you will investigate how it can be estimated. As input, you receive the metric coordinates of four coplanar points on the platform (the same that you defined in Homework 3) and their corresponding pixel coordinates in the image. These are provided in matching order in platform_corners_metric.txt and platform_corners_image.txt.

Task 2.1 (2 points)

Estimate $T^{camera}_{platform}$ using the linear algorithm from Homework 4. Compute the predicted image locations of the platform points using the following two transformations:

(a) $\tilde{\mathbf{u}} = K H [X\ Y\ 1]^T$
(b) $\tilde{\mathbf{u}} = K [R\ \mathbf{t}] [X\ Y\ 0\ 1]^T$

where (X, Y) are the metric coordinates, H is the matrix returned by the direct linear transform, and [R t] is the pose extracted from H, with R being a valid rotation matrix. Include a figure of both sets of image locations. Also compute their reprojection errors and explain why the reprojection errors of (a) should all be exactly zero, and why the reprojection errors of (b) are all, except for one point, non-zero.

Task 2.2 (2 points)

Estimate $T^{camera}_{platform}$ by minimizing the sum of squared reprojection errors with respect to R and t, using Levenberg-Marquardt. Compare the resulting reprojection errors with those in the previous task.

Tip: You will need to parameterize the pose as a vector. However, it is not obvious how to handle the rotation matrix due to the constraints between its entries. Some options are described in Szeliski 2.1.4, but a simple solution is to use a local parameterization around a reference rotation matrix R_0, e.g.

$R(\mathbf{p}) = R_x(p_1) R_y(p_2) R_z(p_3) R_0$    (15)

where p_1, p_2, p_3 are small angles. The reference rotation R_0 may be obtained from the linear estimate. This parameterization suffers from gimbal lock in general, but for small angles it is fine. With three additional parameters for t, you obtain a minimal (six-dimensional) parameterization of $T^{camera}_{platform}$ (see the sketch at the end of this part).

Task 2.3 (1 point)

Suppose you only have three point correspondences instead of four. Show that there can then be more than one physically plausible transformation $T^{camera}_{platform}$ that minimizes the reprojection error, by finding a specific example of a different local minimum for this particular dataset. Include the 4x4 matrix in your report along with a short description of how you found it.
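As referenced in Task 2.2, the local parameterization of Eq. (15) might be wrapped up as below. This is a sketch only; the helper names are ours, R_0 and t_0 come from the linear estimate of Task 2.1, and the residual ordering follows Eq. (14):

```python
import numpy as np

def rotate_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1,0,0],[0,c,-s],[0,s,c]])

def rotate_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c,0,s],[0,1,0],[-s,0,c]])

def rotate_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c,-s,0],[s,c,0],[0,0,1]])

def pose_from_parameters(p, R0, t0):
    # p holds three small angles and a translation update (6 parameters).
    T = np.eye(4)
    T[:3,:3] = rotate_x(p[0]) @ rotate_y(p[1]) @ rotate_z(p[2]) @ R0  # Eq. (15)
    T[:3, 3] = t0 + p[3:]
    return T

def reprojection_residuals(p, R0, t0, K, X, uv):
    # X: 4xN homogeneous platform points, uv: 2xN measured pixel coordinates.
    X_c = pose_from_parameters(p, R0, t0) @ X
    u_hat = (K @ X_c[:3]) / X_c[2]          # pinhole projection
    d = u_hat[:2] - uv
    return np.hstack((d[0], d[1]))          # concatenated as in Eq. (14)
```

With this, p = np.zeros(6) is the natural initialization, and the optimized p gives the refined pose via pose_from_parameters.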
Part 3: Calibrate the model via batch optimization (10 points)

This part is much more challenging and requires strong programming and debugging abilities, along with a good amount of self-direction. We do not expect everyone to complete this part.

The helicopter model will not perfectly match the physical reality. Besides inaccurate measurements of the various lengths and the 3D marker locations, the structural assumptions themselves may also be violated; the shaft may not be exactly perpendicular to the platform, the hinge may not be located exactly on the yaw axis, and so on. The consequence of this is an inaccurate estimate of the angles.

This can be addressed by performing batch optimization, whereby the static aspects of the model are jointly estimated with the variable degrees of freedom. This is similar to camera calibration, where the intrinsics are estimated from images from different viewpoints, under the assumption that only the camera pose varies from image to image, while the intrinsics are shared. Likewise, we assume that the helicopter model has a set of static parameters that are shared between the images. If the model is not over-parameterized, and if the helicopter undergoes sufficient motion over many images, then there should be a unique set of parameters that minimizes the sum of reprojection errors over all the images.

Suppose that the helicopter model has m static parameters. The angles can change in each image, so for a sequence of length l there are 3l dynamic parameters, giving a total of m + 3l optimization variables. This is a much larger optimization problem than before, and while you could use your methods from Part 1, they will be prohibitively slow due to the need to invert J^T J, which is now roughly a 1000x1000 matrix if you use all the images. However, the problem turns out to have a similar bipartite structure as the bundle adjustment problem (Szeliski 7.4), and can be solved much faster by exploiting sparsity.

In particular, notice that dynamic variables from different images are independent; adjusting the angles (ψ, θ, φ) for one image does not affect the reprojection error in a different image. This results in a sparse Jacobian, and consequently a sparse matrix J^T J. If we partition and order the variables appropriately, then J^T J will have a 3l x 3l block, corresponding to the dynamic variables, which is block-diagonal with blocks of size 3x3. Inverting a block-diagonal matrix is easy, as you simply invert each block independently. The Schur complement lets you exploit this fact for solving the original system; see e.g. the Wikipedia entry or Hartley and Zisserman A6.3.1, and the sketch after the tips below.

Tips:
- Print the objective function value and make sure that it decreases in each iteration.
- When computing the Schur complement it's very easy to forget the minus sign in the normal equations: $J^T J \boldsymbol{\delta} = -J^T \mathbf{r}$.
- Don't store or operate with the 3l x 3l block as a dense matrix; store the individual 3x3 blocks and simplify the computations involving them.
- Add a damping term to the diagonal of each 3x3 block, as well as to the m x m block.
- (For Python users) Don't create a reference where a copy was intended.

Figure 5: Form of the Jacobian J and the approximate Hessian A = J^T J for l = 8 images and m static parameters. The shaded rectangles indicate blocks of possibly non-zero entries, while the unshaded areas are all zeros. Note that the proportions here are not indicative of the matrices that you will obtain in your implementation; when you use all the available images, the 3l x 3l block-diagonal matrix will be much bigger than the m x m static variable block.
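The following sketch shows one damped solve of the partitioned normal equations using the Schur complement, assuming the variable ordering of Figure 5 (static variables first, then three angles per image). All names are ours, and the surrounding Levenberg-Marquardt logic (accepting or rejecting the step, updating the damping) is omitted:

```python
import numpy as np

def solve_normal_equations_schur(A11, A12, blocks, b1, b2, mu):
    """Solve the damped, partitioned normal equations
        [A11+mu*I      A12    ] [d1]   [b1]
        [A12^T      A22+mu*I  ] [d2] = [b2]
    where A22 is block-diagonal with the given list of 3x3 blocks, and
    (b1, b2) is the partitioned right-hand side -J^T r (mind the minus sign).
    Returns the static step d1 (length m) and the dynamic step d2 (length 3l)."""
    m = A11.shape[0]
    l = len(blocks)
    # Invert each damped 3x3 block independently (never form the 3l x 3l matrix).
    block_inverses = [np.linalg.inv(B + mu*np.eye(3)) for B in blocks]
    # Accumulate the Schur complement S = A11' - A12 A22'^{-1} A12^T and its
    # right-hand side b1 - A12 A22'^{-1} b2, block by block.
    S = A11 + mu*np.eye(m)
    rhs = b1.copy()                 # copy, not a reference (see the tips above)
    for j in range(l):
        A12_j = A12[:, 3*j:3*j+3]   # m x 3 slice coupling statics to image j
        A12_j_inv = A12_j @ block_inverses[j]
        S   -= A12_j_inv @ A12_j.T
        rhs -= A12_j_inv @ b2[3*j:3*j+3]
    d1 = np.linalg.solve(S, rhs)    # reduced m x m system for the statics
    # Back-substitute for the dynamic variables, one image at a time.
    d2 = np.empty(3*l)
    for j in range(l):
        A12_j = A12[:, 3*j:3*j+3]
        d2[3*j:3*j+3] = block_inverses[j] @ (b2[3*j:3*j+3] - A12_j.T @ d1)
    return d1, d2
```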
Task 3.1 (9 points)

To begin with, let the five lengths indicated in Fig. 4 and the 3D marker coordinates be optimization variables rather than constants. This gives m = 5 + 3n static variables, with n = 7 being the number of markers. Implement batch optimization in a new script and estimate the static model parameters. You will need a reasonable initial estimate of all the variables; you may use the inaccurate helicopter model and the estimated trajectory from Part 1 for this. Modify the Quanser class to use the new lengths and marker coordinates, and rerun the part1 script. Include all the generated figures in your report.

While the above parameterization will greatly reduce the error, it still has some room for improvement. Define an even more general helicopter model by letting the three rotation axes (yaw, pitch and roll) be fully parameterized. To avoid over-parameterization, each rotation axis should have four static variables, corresponding to two rotational and two translational degrees of freedom. (The third rotation is given by ψ, θ or φ, and any translation not perpendicular to the rotated axis is redundant.) Repeat the batch optimization and run the part1 script with the improved model. Include all the generated figures.

Task 3.2 (1 point)

With the second, more general, model from above, you should be able to achieve a mean reprojection error below 0.2 pixels. However, you may still observe a small biased error between the encoder logs and the vision estimates. The magnitude of the error varies smoothly over the trajectory, and is therefore not explained by potential synchronization errors (i.e. a horizontal and vertical offset). What could be the source of this remaining error?
