Figure 1: You will design a camera obscura with your cellphone. (a) Camera obscura (cellphone camera, screen, pinhole). (b) Example design.

In this assignment, you will design a camera obscura with your cellphone camera. The camera obscura is a dark chamber (box) with a small pinhole: light entering through the pinhole is mapped onto the opposite side of the chamber (the screen), which creates an upside-down image.

1. Build a lightproof dark chamber: the chamber will be illuminated only by the light from the pinhole. Play with different sizes of chambers if possible. Cover the screen with white paper and the rest of the inside surfaces with black paper. Be creative!
2. Make a pinhole on the side opposite the screen. Start with a small hole (diameter …).
3. …
4. Set the camera sensitivity (ISO 800).
5. Set the camera exposure time (>8 sec).
6. Make an additional hole through which your cellphone camera can look inside. Locate your cellphone camera close to the pinhole without occluding the pinhole. Make sure this hole is completely light sealed.
7. Take a picture, and adjust the pinhole size and camera settings to get a sharper image.

For an Android phone, you can control the exposure time and sensitivity easily; Camera FV-5 Lite is an app that can grab a long-exposure image. For iOS, the exposure control is highly limited; there are apps that simulate the long-exposure effect by taking many images, but this creates a noisy image. You may borrow an Android phone or an old digital camera. You can also refer to this website: https://graphics.cs.cmu.edu/courses/15-463/2015_fall/hw/proj5-camera/.

Note: Lighting is extremely important. Given Minnesota's weather, it is difficult to find a nice cloudless day, so plan your outdoor capture ahead for a sunny day.

(1) Describe your design (dimensions) with images and your camera settings. Share your awesome photos.

3 Where am I?

Figure 2: You will use your camera obscura to estimate depth. (a) Lateral view. (b) Camera obscura image. (c) Original image.
Using this camera obscura, you will estimate the depth of a 3D object (its distance from the pinhole). You will take a picture containing two friends (A and B) whose heights in meters you know, standing at different distances from the camera as shown in Figure 2(a).

(1) Given the height of A in meters (HA) and in pixels (hA), derive and compute the distance from A to the pinhole. (2) Given the height of B in meters (HB) and in pixels (hB), derive and compute the distance from B to the pinhole.

Note: You can measure pixel distances using image viewer software, e.g., IrfanView, or MATLAB.

Figure 3: (a) Geometry. (b) Image cropping.

4 Dolly Zoom

You will simulate the dolly zoom effect using your camera obscura. You will take at least two pictures with different camera locations, e.g., taking five steps back, ∆d, as shown in Figure 3(a). A and B will appear smaller than in the first image since ∆d > 0. You can apply a zoom-in effect by scaling and cropping the image such that A appears the same size, as shown in Figure 3(b). You may find a reference here: https://www-users.cs.umn.edu/~hspark/CSci5980/Lec1_Supp_DollyZoom.pdf.

Write-up: (1) Predict the height of B in the second image given hB in the first image, and reason about the prediction. Hint: you may need to compute ∆d from the information in the second image. (2) Confirm the prediction by measuring the height of B in the second image.

Useful MATLAB functions: (a) Load image: im = imread(filename). (b) Save image: imwrite(im, filename). (c) Resize image: im = imresize(im, scale). (d) Display image: imshow(im) or imshow(filename). (e) Measure a distance in an image: display the image in a figure, select the data cursor icon in the figure toolbar, and click on the image to get the coordinates of certain pixels; calculate the distance from the pixel coordinates.

Figure 1: Lens distortion rectification (original image, undistorted image, k > 0, k < 0).
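Both the depth estimate in Section 3 and the dolly-zoom prediction in Section 4 follow from pinhole similar triangles, h/f = H/d. A minimal Python sketch (the course code is MATLAB; the focal length and measurements below are hypothetical example numbers):

```python
# Sketch (not the assignment solution): pinhole similar triangles.
# f_px is the pinhole-to-screen distance expressed in the same pixel
# units as the measured image heights.
def distance_from_height(H_m, h_px, f_px):
    """d = f * H / h: distance of a person of height H imaged h pixels tall."""
    return f_px * H_m / h_px

def height_after_step_back(H_m, d_m, delta_d_m, f_px):
    """Predicted pixel height after moving the camera back by delta_d."""
    return f_px * H_m / (d_m + delta_d_m)

# Hypothetical numbers: a 1.8 m tall friend imaged 180 px tall, f = 500 px.
d_A = distance_from_height(1.8, 180.0, 500.0)        # 5.0 m from the pinhole
h_after = height_after_step_back(1.8, 5.0, 1.0, 500.0)  # 150.0 px after 1 m back
```

The same two relations, written for B in both images, give the dolly-zoom prediction asked for in the write-up.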
Using your cellphone camera, you will estimate the intrinsic camera parameters (focal length, principal point, and lens distortion parameter).

(1) Derive the K matrix using your camera specification. (2) Calibrate the lens distortion using the radial distortion model (single parameter, k1). Show the original image, the undistorted image, and two example images with wrong k1, as shown in Figure 1.

3 Projective Line

Figure 2: (a) You will use your cellphone camera image to compute the camera pose with respect to the ground plane and measure the height of your friend given the height of an object. (b) You will paste the UMN logo onto the ground plane.

Take a picture of your friend with many 3D objects such as street lamps and chairs, where two orthogonal directions on the ground plane are visible as shown in Figure 2(a). Apply lens undistortion to make the straight lines straight.

(1) Derive and compute the two vanishing points and the vanishing line, and visualize them on your image similar to Figure 2(a). (2) Compute the camera rotation matrix, R, and visualize the 3D camera axes with respect to the ground plane axes using the MATLAB plot3 function. Give a geometric interpretation of the computed rotation matrix. (3) Measure the heights of at least three 3D objects given your friend's height using the cross ratio. Verify the height measurements. (4) Project the UMN logo onto the ground plane or any planar surface. You are also free to choose a different logo or image.

4 Panoramic Image

Figure 3: Given a collection of input images (a), you will create a panoramic image by projecting onto a cylindrical surface: (b) geometry, (c) cylindrical coordinate, (d) panoramic image.

You will create a panoramic image from multiple images (at least 8) taken by your cellphone camera using a cylindrical projection as shown in Figure 3(a).
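The cylindrical projection can be sketched in Python under one common convention (an assumption, not prescribed by the handout): the cylinder axis is the camera Y axis, the radius is the focal length f, and h is measured from the image center. The intrinsics below are hypothetical:

```python
import numpy as np

# Sketch: direction vector of a point on the cylindrical surface at angle
# phi and height h, for a cylinder of radius f about the camera Y axis.
def direction_vector(phi, h, f):
    return np.array([f * np.sin(phi), h, f * np.cos(phi)])

def project(K, p):
    """Project a direction vector into pixel coordinates, u ~ K p."""
    u = K @ p
    return u[:2] / u[2]

# Hypothetical intrinsics for illustration only.
f = 800.0
K = np.array([[f, 0.0, 320.0],
              [0.0, f, 240.0],
              [0.0, 0.0, 1.0]])
# phi = 0, h = 0 is the cylinder point straight ahead: it projects to the
# principal point.
u = project(K, direction_vector(0.0, 0.0, f))
```

Copying pixels from the original image coordinate back to each (φ, h) sample of the cylinder is then a lookup through this projection.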
The panoramic image will be created in (φ, h) coordinates, where φ and h are the angle and height coordinates of the cylindrical surface, respectively, as shown in Figure 3(c). Note that the radius and height of the cylinder are set to the focal length and the height of the image, respectively.

(1) Express the direction vector p_(φ,h) = [px py pz]ᵀ using φ and h as shown in Figures 3(b) and 3(c).

(2) Given the first and second images, compute the homography 2H1 using 4 correspondences, and the relative rotation from the first to the second image, 2R1, where the first image rotation is the identity matrix I3:

λu2 = 2H1 u1,   µu2 = K 2R1 p_(φ,h),   (1)

where u1 ↔ u2 are corresponding points in the first and second images.

(3) For all images, compute the rotation matrix iR1 and visualize the camera Z axis in 3D using the MATLAB plot3 function. Hint: iR1 = iR(i−1) (i−1)R(i−2) · · · 2R1.

(4) Create a panoramic image by copying the RGB value at the original image coordinate (u, v) to the cylindrical coordinate (φ, h) as shown in Figure 3(d). You may blend overlapping pixels by taking the average.

Tips:
1. Rotate your camera about a fixed rotation center (no translation). Translation of your camera produces misalignment.
2. Choose the 4 correspondences very carefully.
3. Lens distortion may introduce misalignment.
4. Objects at a far distance often work better.

Figure 1: (a) You will use the vanishing points in a single image to calibrate your camera. (b) You will use multiple images of a checkerboard pattern to calibrate your camera.

You will calibrate your camera (focal length, principal point, and lens distortion parameters).

(1) (Single Image Calibration) Take a single image in which you can estimate three vanishing points, vX∞, vY∞, and vZ∞, as shown in Figure 1(a). Compute f, px, and py using the vanishing points.

(2) (Multiple Image Calibration) Print out the checkerboard pattern shown in Figure 1(b) (https://www-users.cs.umn.edu/~hspark/CSci5980/code/pattern.pdf).
Derive f, px, and py based on homographies from multiple images (at least 5), and compute the intrinsic parameters by solving the least-squares system.

(3) (MATLAB Toolbox Calibration) Use the existing MATLAB Calibration Toolbox (https://www.vision.caltech.edu/bouguetj/calib_doc/) to compute the intrinsic parameters (at least 20 images from different views). Verify the estimate against your solutions in (1) and (2).

3 Rotation Interpolation

Figure 2: Using two images of pure rotation, (a) left image and (b) right image, you will generate an interpolated view (c) by sampling rotations.

In HW 2, we generated a panoramic image from multiple images. In this problem, you will use two images of pure rotation (Figures 2(a) and 2(b)) to generate many views by interpolating between the rotations as shown in Figure 2(c).

(1) Compute the pure rotation RR_L from the homography between the two images, RH_L. (2) Interpolate between the two rotations, {iR_L} for i = 1, ..., N, from I3 (left image) to RR_L (right image), where N > 20 is the number of interpolated rotations, using SLERP. (3) Generate interpolated images similar to Figure 2(c):
• Given iR_L, compute the homography from the left image to the interpolated image, iH_L, and warp it: im1 = ImageWarping(imageLeft, iHL).
• Compute the homography from the right image to the interpolated image, i.e., iH_R = iH_L (RH_L)⁻¹. Warp the image: im2 = ImageWarping(imageRight, iHR).
• Composite the two images based on distance: imInterpolated = w*im1+(1-w)*im2, where the weight w is inversely proportional to the distance.

4 Tour into Your Picture

Figure 3: (a) Given a single image taken by a cellphone, you will navigate into the image. (b) You rectify the image such that the Y axis of the camera aligns with the surface normal of the ground plane. (c) You define the five principal planes using vanishing points (corners u11, u12, u21, u22, v11, v12, v21, v22, and vanishing point p).
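The rotation-interpolation step of Section 3 uses SLERP on unit quaternions. A Python sketch (the course code is MATLAB; the endpoint quaternions below are synthetic examples, not course data):

```python
import numpy as np

# Minimal SLERP sketch on unit quaternions: interpolates from q0 to q1
# with parameter t in [0, 1].
def slerp(q0, q1, t):
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    if dot < 0:                     # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                # nearly parallel: fall back to lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# Synthetic endpoints: identity, and a 1-radian rotation about the Y axis.
qI = np.array([1.0, 0.0, 0.0, 0.0])
qR = np.array([np.cos(0.5), 0.0, np.sin(0.5), 0.0])
q_mid = slerp(qI, qR, 0.5)          # half-angle rotation about the same axis
```

Sampling t = 0, 1/N, ..., 1 and converting each quaternion back to a rotation matrix gives the interpolated rotations {iR_L}.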
Figure 3 (continued): (d) A 3D box can be texture mapped. (e) Tour into the picture.

Take a photo with your camera of a scene where you can observe the Z vanishing point. You will virtually tour into your photo, which creates a 3D sensation.

(1) Rectification: Rectify your image such that the surface normal of the ground plane is aligned with the Y axis of your camera, using the homography between the ground plane and the camera. Derive and compute the homography for the rectification. Visualize the rectified image similar to Figure 3(b).

(2) Z vanishing point: Given the rectified image, derive and compute the Z vanishing point, p (Figure 3(c)), using the four edges from the ground and ceiling planes via linear least squares (the intersection of four lines), i.e., Ax = b.

(3) Define the five principal planes and the per-plane homographies:
1. Define the four corners (u11, u12, u21, u22) of the purple plane (Figure 3(c)) in the image.
2. Define the depth of the four corners, e.g., d = 1, and compute the 3D locations of the corners, U = dK⁻¹u.
3. Express the 3D locations of the frontal plane, V11, V12, V21, and V22, using d, p, K, u11, u12, u21, u22, v11, v12, v21, v22.
4. Express a 3D point, X, in the 3D plane (U11, U12, U21, U22) using two parameters, e.g., X = a1µ1 + a2µ2 + a3. Apply this to the remaining four planes.
Then:
1. Derive and compute the homography from each plane to the rectified image, i.e., sH_plane1, sH_plane2, sH_plane3, sH_plane4, sH_plane5. Hint: λu = KX = K [a1 a2 a3] [µ1 µ2 1]ᵀ = sH_plane [µ1 µ2 1]ᵀ.
2. Derive and compute the homography from each plane to a target image of pure rotation about the Y axis of the camera by 20 degrees (tH_plane1, ..., tH_plane5). Derive and compute the homography from the rectified image to the target image. Hint: tH_s = tH_plane (sH_plane)⁻¹.
3. Derive and compute the homography from each plane to a target image of pure translation along the Z axis of the camera by 0.2d. Derive and compute the homography from the rectified image to the target image.
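The Z-vanishing-point step above (intersecting four lines in the least-squares sense, Ax = b) can be sketched in Python; the two synthetic lines below are hypothetical stand-ins for the detected ground/ceiling edges:

```python
import numpy as np

# Sketch: intersect several image lines in the least-squares sense (Ax = b).
# Each line is given by two homogeneous points; l = x1 x x2, and the
# vanishing point p satisfies l . [x, y, 1] = 0 for every line.
def vanishing_point(lines):
    """lines: list of (x1, x2) pairs of homogeneous 2D points."""
    A, b = [], []
    for x1, x2 in lines:
        l = np.cross(x1, x2)        # line coefficients [a, b, c]
        A.append(l[:2])             # a x + b y = -c
        b.append(-l[2])
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x                        # (x, y) of the vanishing point

# Two synthetic lines that both pass through (2, 3).
p = vanishing_point([
    (np.array([0.0, 1.0, 1.0]), np.array([2.0, 3.0, 1.0])),
    (np.array([0.0, 5.0, 1.0]), np.array([2.0, 3.0, 1.0])),
])
```

With four (noisy) edges instead of two exact ones, the same least-squares solve returns the best-fit intersection.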
(4) Generate a camera trajectory by interpolating rotation and translation (at least 5 discrete rotations and translations) and generate a video of the navigation.

One of the key skills to learn in computer vision is the ability to use other open source code, which allows you to avoid re-inventing the wheel. We will use VLFeat by A. Vedaldi and B. Fulkerson (2008) for SIFT extraction from your images. Install VLFeat from the following: https://www.vlfeat.org/install-matlab.html. Run vl_demo_sift_basic to double check that the installation is complete. (NOTE) You will use this library only for SIFT feature extraction and its visualization. All following visualizations and algorithms must be done by your own code.

Figure 1: Given your two cellphone images (left and right), you will extract SIFT descriptors and visualize them using VLFeat.

You will extract David Lowe's SIFT (Scale Invariant Feature Transform) features from your cellphone images as shown in Figure 1. First, take a pair of pictures with your calibrated camera (intrinsic parameter, K, and radial distortion parameter, k, are pre-calibrated) as follows:
1. Take the first picture, and another one after moving one step to the right (1 m).
2. Common 3D objects across different depths, e.g., buildings, the ground plane, and trees, should appear in both images.
3. The two images have to be similar enough that appearance-based image matching can be applied, i.e., SIFT feature matching does not break, while maintaining the baseline between the cameras (at least 1 m), i.e., similar camera orientation and sufficient translation.
4. Avoid a 3D scene dominated by one planar surface, e.g., looking at the ground plane.

(SIFT visualization) Use VLFeat to visualize SIFT features with scale and orientation as shown in Figure 1. You may want to plot up to 500 feature points.
You may want to follow this tutorial: https://www.vlfeat.org/overview/sift.html

Figure 2: You will match points between I1 and I2 using SIFT features. (a) Matching from I1 to I2. (b) Matching from I2 to I1. (c) Matching from I1 to I2 after the ratio test. (d) Matching from I2 to I1 after the ratio test. (e) Bidirectional matching between I1 and I2.

(NOTE) From this point, you cannot use any function provided by VLFeat. A SIFT feature is composed of scale, orientation, and a 128-dimensional local feature descriptor (integer), f ∈ Z^128. You will use the SIFT features to match between the two images, I1 and I2.

(1) (Nearest neighbor search) Let the two sets of features be {f1, ..., fN1} from I1 and {g1, ..., gN2} from I2, where N1 and N2 are the number of features in images 1 and 2, respectively. Compute the nearest neighbor per feature and visualize both directions ({f1, ..., fN1} → {g1, ..., gN2} and {g1, ..., gN2} → {f1, ..., fN1}) as shown in Figures 2(a) and 2(b). Note that the distance between two features is defined as d = ||f − g||. You may use the knnsearch function in MATLAB.

(2) (Ratio test) Filter out matches using the ratio test, i.e., keep a match only if the ratio d_ij1/d_ij2 between the distances to the first and second nearest neighbors is below a threshold, and visualize the remaining matches as shown in Figures 2(c) and 2(d).

(3) (Bidirectional matching) Keep only the matches that are consistent in both directions, as shown in Figure 2(e).

(1) (Fundamental matrix) Estimate the fundamental matrix F from the bidirectional matches using RANSAC:

1: n ← 0
2: for i = 1 : M do
3:   Choose 8 correspondences randomly.
4:   Compute a candidate fundamental matrix, Fr, from the 8 correspondences.
5:   Compute the number of inliers, nr, with respect to Fr.
6:   if nr > n then
7:     n ← nr
8:     F = Fr
9:   end if
10: end for

(2) (Epipole and epipolar lines) Using the fundamental matrix, visualize the epipole and epipolar lines.

Figure 4: Four configurations of camera pose from a fundamental matrix.

(3) (Camera pose estimation) Compute the 4 configurations of relative camera pose:
[R1 C1 R2 C2 R3 C3 R4 C4] = CameraPose(F, K)
Input: F is the fundamental matrix and K is the intrinsic parameter.
Output: R1 C1, ..., R4 C4 are the rotations and camera centers (represented in the world coordinate system).

(4) Visualize the 4 configurations in 3D as shown in Figure 4.
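The nearest-neighbor search and ratio test can be sketched in Python (the course code is MATLAB; the 0.7 ratio threshold and the toy 2D descriptors below are illustrative choices, not prescribed by the handout):

```python
import numpy as np

# Sketch of nearest-neighbor matching with the ratio test: keep feature i
# only if its best match is much closer than its second-best match.
def match_ratio_test(F, G, ratio=0.7):
    """F: N1 x d descriptors, G: N2 x d descriptors. Returns (i, j) matches."""
    matches = []
    for i, f in enumerate(F):
        d = np.linalg.norm(G - f, axis=1)   # distances from f to all of G
        j1, j2 = np.argsort(d)[:2]          # best and second-best neighbors
        if d[j1] < ratio * d[j2]:
            matches.append((i, j1))
    return matches

# Toy descriptors: each row of F has one clearly-closest row in G.
F = np.array([[0.0, 0.0], [5.0, 5.0]])
G = np.array([[0.1, 0.0], [5.0, 5.1], [0.2, 0.0]])
matches = match_ratio_test(F, G)
```

Running the same routine with F and G swapped, and keeping only pairs present in both directions, gives the bidirectional matching of Figure 2(e).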
Given four configurations of relative camera pose, you will find the best camera pose by verifying them through 3D point triangulation.

(1) (Linear triangulation) Write code that computes the 3D point given a correspondence, u ↔ v, and two camera projection matrices:
[X] = LinearTriangulation(P1,u,P2,v)
Input: P1, P2 ∈ R^(3×4) are the two camera projection matrices, and u ↔ v ∈ R^2 is their 2D correspondence.
Output: X ∈ R^3 is the triangulated 3D point.
Hint: use the linear triangulation method, i.e., stack the constraints [u]ₓ P1 X = 0 and [v]ₓ P2 X = 0 (where [·]ₓ is the skew-symmetric cross-product matrix and the points are in homogeneous coordinates) and solve the resulting linear system.

(2) (Cheirality) Write code that computes the number of 3D points in front of the two cameras. The condition of a 3D point being in front of a camera is called cheirality:
idx = CheckCheirality(Y,C1,R1,C2,R2)
Input: Y is an n × 3 matrix that includes n 3D points, and C1,R1/C2,R2 are the first and second camera poses (camera center and rotation).
Output: idx is the set of indices of 3D points that satisfy cheirality.
Hint: the point must satisfy r3(X − C) > 0, where r3 is the 3rd row of the rotation matrix (z-axis of the camera), C is the camera position, and X is the 3D point.

(3) (Camera pose disambiguation) Based on cheirality, find the correct camera pose. Visualize the 3D camera pose and 3D points together as shown in Figure 5. Hint: use plot3 and DisplayCamera.m to visualize them.

Figure 5: You will visualize the four camera pose configurations with the point cloud. (a) nValid = 10. (b) nValid = 488. (c) nValid = 0. (d) nValid = 0.

(4) (Reprojection) Project the 3D points to each camera and visualize the reprojection, û, and the measurement, u, on the image as shown in Figure 6, i.e.,

λ [û; 1] = KR [I3 −C] [X; 1],

where X is the reconstructed point.
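A Python sketch of linear triangulation under the standard cross-product formulation (stacking [u]ₓ P1 X = 0 and [v]ₓ P2 X = 0 and taking the SVD null space); the cameras and 3D point below are synthetic:

```python
import numpy as np

def skew(a):
    """Cross-product matrix [a]_x such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0]])

# Sketch of LinearTriangulation: the homogeneous point [X; 1] spans the
# null space of the stacked constraint matrix.
def linear_triangulation(P1, u, P2, v):
    A = np.vstack([skew(np.append(u, 1.0)) @ P1,
                   skew(np.append(v, 1.0)) @ P2])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Synthetic check: a known 3D point seen by two cameras with a baseline.
K = np.eye(3)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
u = (P1 @ np.append(X_true, 1))[:2] / (P1 @ np.append(X_true, 1))[2]
v = (P2 @ np.append(X_true, 1))[:2] / (P2 @ np.append(X_true, 1))[2]
X = linear_triangulation(P1, u, P2, v)
```

With noise-free projections the triangulated point recovers X_true up to numerical precision; the cheirality check then keeps the pose for which the triangulated points have positive depth in both cameras.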
Figure 6: Visualization of measurements and reprojections.

Figure 1: You will capture your cellphone images to reconstruct camera poses and 3D points.

In this assignment, you will use your cellphone images (more than 5) to reconstruct 3D camera poses and points with full bundle adjustment. Make sure you have enough baseline (translation) between images for a well-conditioned fundamental matrix while retaining enough correspondences between images. Avoid a scene dominated by a planar surface, i.e., the images need to contain many 3D objects as shown in Figure 1.

You will write a full pipeline of the structure from motion algorithm, including matching, camera pose estimation using the fundamental matrix, PnP, triangulation, and bundle adjustment. Each nonlinear optimization is initialized by a linear least-squares estimate. The pipeline is described in Algorithm 1.

Algorithm 1 Structure from Motion
1: [Mx, My] = GetMatches(I1, · · · , IN)
2: Normalize the coordinates in Mx and My, i.e., x = K⁻¹x.
3: Select two images Ii1 and Ii2 for the initial pair reconstruction.
4: [R, C, X] = CameraPoseEstimation([Mx(:,i1) My(:,i1)], [Mx(:,i2) My(:,i2)])
5: P = {Pi1, Pi2} where Pi1 = [I3 0], Pi2 = R[I3 −C]
6: R = {i1, i2}
7: while |R| < N do
8:   i = GetBestFrame(Mx, My, R)
9:   [Ri, Ci] = PnP_RANSAC([Mx(:,i) My(:,i)], X)
10:  [Ri, Ci] = PnP_Nonlinear(Ri, Ci, [Mx(:,i) My(:,i)], X)
11:  Pi = Ri[I3 −Ci]
12:  for f = 1 : |R| do
13:    U = FindUnreconstructedPoints(X, Rf, i, Mx, My)
14:    for j = 1 : |U| do
15:      u = [Mx(Uj, i), My(Uj, i)] and v = [Mx(Uj, Rf), My(Uj, Rf)]
16:      x = LinearTriangulation(u, Pi, v, PRf)
17:      x = NonlinearTriangulation(x, u, Ri, Ci, v, RRf, CRf)
18:      X = X ∪ x
19:    end for
20:  end for
21:  P = P ∪ Pi and R = R ∪ i
22:  [P, X] = BundleAdjustment(P, X, R, Mx, My)
23: end while

Given a set of images, I1, · · · , IN, you will find matches across all images, where N is the number of images, similar to HW #4.
Pick a reference image, Iref, and match it with the other images using SIFT features from VLFeat, i.e., Iref ↔ I1, · · · , Iref ↔ IN (no need to match Iref ↔ Iref). Make sure your matches are outlier free, i.e., bidirectional knn match → ratio test → inliers from fundamental matrix based RANSAC. Based on the matches, you will build measurement matrices, Mx and My:

[Mx, My] = GetMatches(I1, · · · , IN)
Mx: F × N matrix storing the x coordinates of correspondences
My: F × N matrix storing the y coordinates of correspondences

The f-th feature point in image Ii corresponds to a point in image Ij. The x and y coordinates of the correspondence are stored at the (f, i) and (f, j) elements of Mx and My, respectively. If (f, i) does not correspond to any point in image Ik, you set −1 to indicate no match, as shown in Figure 2.

Important: For better numerical stability, you can transform the measurements to normalized coordinates by multiplying by K⁻¹, i.e., x = K⁻¹x, where x is a 2D measured point in homogeneous coordinates. You can run structure from motion in the normalized coordinates by factoring out K. When visualizing a projection in the image, the coordinates need to be transformed back to the original coordinates by multiplying by K.

Figure 2: Layout of the measurement matrices Mx and My; −1 indicates no match.

You will write a camera pose estimation code that takes correspondences between two images, Ii1 and Ii2, where i1 and i2 are the indices of the initial images to reconstruct, selected manually:

[R, C, X] = CameraPoseEstimation(u1, u2)
R and C: the relative transformation of the i2 image
u1 and u2: 2D-2D correspondences

As studied in HW #4, you will compute: 1.
the fundamental matrix via RANSAC on the correspondences, Mx(:,i1), My(:,i1) and Mx(:,i2), My(:,i2); 2. the essential matrix from the fundamental matrix; 3. the four configurations of camera pose given the essential matrix; 4. disambiguation via cheirality (using 3D point linear triangulation): X = LinearTriangulation(u, Pi, v, Pj).

Write-up:
Figure 3: Camera pose estimation. (a) Inlier matches. (b) 3D camera pose (top view and oblique view).
(1) Visualize the inlier matches as shown in Figure 3(a). (2) Visualize the camera pose and 3D reconstructed points in 3D as shown in Figure 3(b).

You will write a nonlinear triangulation code. Given the linear estimate of the triangulated point, you will refine the 3D point X = [X Y Z]ᵀ to minimize the geometric error (reprojection error) via iterative nonlinear least squares,

∆X = ( (∂f(X)/∂X)ᵀ (∂f(X)/∂X) )⁻¹ (∂f(X)/∂X)ᵀ (b − f(X)).   (1)

Write-up: (1) Derive the point Jacobian, ∂f(X)/∂X, and write the following code: df_dX = JacobianX(K, R, C, X). (2) Write code to refine the 3D point by minimizing the reprojection error, and visualize the reprojection error reduction similar to Figure 5: X = NonlinearTriangulation(X, u1, R1, C1, u2, R2, C2)

Algorithm 2 Nonlinear Point Refinement
1: b = [u1ᵀ u2ᵀ]ᵀ
2: for j = 1 : nIters do
3:   Build the point Jacobian, ∂f(X)/∂X.
4:   Compute f(X).
5:   ∆X = ( (∂f(X)/∂X)ᵀ (∂f(X)/∂X) + λI )⁻¹ (∂f(X)/∂X)ᵀ (b − f(X))
6:   X = X + ∆X
7: end for

You will register an additional image, Ij, using 2D-3D correspondences.

Write-up: (1) (3D-2D correspondences) Given the 3D triangulated points, find 2D-3D matches, X ↔ u. (2) (Perspective-n-Point algorithm) Write code that computes the 3D camera pose from the 3D-2D correspondences:
[R, C] = LinearPnP(u, X)
X: n × 3 matrix containing n 3D reconstructed points
u: n × 2 matrix containing n 2D points in the additional image
R and C: rotation and translation for the additional image.
Hint: After the linear solve, rectify the rotation matrix such that det(R) = 1, and scale C according to the rectification.

(3) (RANSAC PnP) Write a RANSAC algorithm for the camera pose registration (PnP) given n matches using the following pseudocode:

Algorithm 3 PnP RANSAC
1: nInliers ← 0
2: for i = 1 : M do
3:   Choose 6 correspondences, Xr and ur, randomly from X and u.
4:   [Rr, tr] = LinearPnP(ur, Xr)
5:   Compute the number of inliers, nr, with respect to Rr, tr.
6:   if nr > nInliers then
7:     nInliers ← nr
8:     R = Rr and t = tr
9:   end if
10: end for

Visualize the 3D registered pose as shown in Figure 4.

Figure 4: Additional image registration. (a) Front view. (b) Top view.

(4) (Reprojection) Visualize the measurement and reprojection to verify the solution.

Given the initial estimates Ri and ti, you will refine the camera pose to minimize the geometric error (reprojection error) via iterative nonlinear least squares,

∆p = ( (∂f(p)/∂p)ᵀ (∂f(p)/∂p) )⁻¹ (∂f(p)/∂p)ᵀ (b − f(p)),
f(p) = [u1/w1  v1/w1  · · ·  un/wn  vn/wn]ᵀ,   [u v w]ᵀ = Ri [I3 −C] [X; 1],   b = [x1 y1 · · · xn yn]ᵀ,   (2)

where p = [Cᵀ qᵀ]ᵀ, C ∈ R³ is the camera optical center, and q ∈ S³ is the quaternion representation of the camera rotation. It is possible to minimize overshooting by adding damping, λ, as follows:

∆p = ( (∂f(p)/∂p)ᵀ (∂f(p)/∂p) + λI )⁻¹ (∂f(p)/∂p)ᵀ (b − f(p)),   (3)

where λ is the damping parameter. You can try λ ∈ [0, 10].

Note that the conversion between a quaternion q = [qw qx qy qz]ᵀ and a rotation matrix is given as follows:

R = [ 1 − 2qz² − 2qy²    −2qzqw + 2qyqx     2qyqw + 2qzqx
      2qxqy + 2qwqz      1 − 2qz² − 2qx²    2qzqy − 2qxqw
      2qxqz − 2qwqy      2qyqz + 2qwqx      1 − 2qy² − 2qx² ],

q = [ qw,  (R32 − R23)/(4qw),  (R13 − R31)/(4qw),  (R21 − R12)/(4qw) ]ᵀ,
where qw = √(1 + R11 + R22 + R33) / 2 and ||q|| = 1.   (4)

Write-up: (1) Derive the Jacobian of the rotation with respect to the quaternion using Equation (4), i.e., ∂R/∂q, and write the following code. Note: ignore the normalization ||q|| = 1.
dR_dq = JacobianQ(q)
(2) Derive the Jacobian of the projection with respect to the rotation using Equation (2), i.e., ∂f(p)_j/∂R, where f(p)_j = [uj/wj  vj/wj]ᵀ, and write the following code. Note: use the vectorized form of the rotation matrix. df_dR = JacobianR(R, C, X)
(3) Derive the expression for ∂f(p)_j/∂q using the chain rule.
(4) Derive the Jacobian of the projection with respect to the camera center using Equation (2), i.e., ∂f(p)_j/∂C, and write the following code. df_dC = JacobianC(R, C, X)
(5) Write code to refine the camera pose by minimizing the reprojection error, and visualize the reprojection error reduction as shown in Figure 5: [R, C] = PnP_Nonlinear(R, C, u, X)

Algorithm 4 Nonlinear Camera Pose Refinement
1: p = [Cᵀ qᵀ]ᵀ
2: for j = 1 : nIters do
3:   C = p1:3, q = p4:7, R = Quaternion2Rotation(q)
4:   Build the camera pose Jacobian for all points, ∂f(p)_j/∂p = [∂f(p)_j/∂C  ∂f(p)_j/∂q].
5:   Compute f(p).
6:   ∆p = ( (∂f(p)/∂p)ᵀ (∂f(p)/∂p) + λI )⁻¹ (∂f(p)/∂p)ᵀ (b − f(p)) using Equation (3).
7:   p = p + ∆p
8:   Normalize the quaternion scale, p4:7 = p4:7/||p4:7||.
9: end for

Figure 5: Nonlinear refinement reduces the reprojection error (0.199 → 0.119). (a) Reprojection error. (b) Close-up.

8 Bundle Adjustment

You will write a nonlinear refinement code that simultaneously optimizes the camera poses and 3D points using the sparse nature of the Jacobian matrix:

[P, X] = BundleAdjustment(P, X, R, Mx, My)

For example, consider 3 camera poses and 2 points. The Jacobian matrix can be written as

J = [ ∂f(p1,X1)/∂p1   0(2×7)          0(2×7)          ∂f(p1,X1)/∂X1   0(2×3)
      0(2×7)          ∂f(p2,X1)/∂p2   0(2×7)          ∂f(p2,X1)/∂X1   0(2×3)
      0(2×7)          0(2×7)          ∂f(p3,X1)/∂p3   ∂f(p3,X1)/∂X1   0(2×3)
      ∂f(p1,X2)/∂p1   0(2×7)          0(2×7)          0(2×3)          ∂f(p1,X2)/∂X2
      0(2×7)          ∂f(p2,X2)/∂p2   0(2×7)          0(2×3)          ∂f(p2,X2)/∂X2
      0(2×7)          0(2×7)          ∂f(p3,X2)/∂p3   0(2×3)          ∂f(p3,X2)/∂X2 ]
  = [ Jp  JX ],   (5)

where Jp and JX are the Jacobians for the cameras and points, respectively, and λ ∈ [0, 10].
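The quaternion-rotation conversion of Equation (4), used by Quaternion2Rotation in Algorithm 4, can be sketched in Python; the test rotation below is a synthetic rotation about the Y axis:

```python
import numpy as np

# Sketch of the quaternion <-> rotation conversion for the pose
# parameterization p = [C^T q^T]^T, with q = [q_w, q_x, q_y, q_z], ||q|| = 1.
def quaternion_to_rotation(q):
    w, x, y, z = q
    return np.array([
        [1 - 2*y*y - 2*z*z, 2*x*y - 2*w*z,     2*x*z + 2*w*y],
        [2*x*y + 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z - 2*w*x],
        [2*x*z - 2*w*y,     2*y*z + 2*w*x,     1 - 2*x*x - 2*y*y]])

def rotation_to_quaternion(R):
    w = np.sqrt(1 + R[0, 0] + R[1, 1] + R[2, 2]) / 2
    return np.array([w,
                     (R[2, 1] - R[1, 2]) / (4*w),
                     (R[0, 2] - R[2, 0]) / (4*w),
                     (R[1, 0] - R[0, 1]) / (4*w)])

# Synthetic example: rotation by 0.3 rad about the Y axis.
t = 0.3
q = np.array([np.cos(t/2), 0.0, np.sin(t/2), 0.0])
R = quaternion_to_rotation(q)
```

The round trip rotation_to_quaternion(quaternion_to_rotation(q)) recovers q (for qw > 0), which is a useful sanity check before differentiating Equation (4).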
The normal equation, JᵀJ ∆x = Jᵀ(b − f(x)), can be decomposed into

[ A   B ] [ ∆p̂ ]   [ ep ]
[ Bᵀ  D ] [ ∆X̂ ] = [ eX ],   (6)

where

A = JpᵀJp + λI,  B = JpᵀJX,  D = JXᵀJX + λI,
ep = Jpᵀ(b − f(x)),  eX = JXᵀ(b − f(x)),

and p̂ = [p1ᵀ · · · pIᵀ]ᵀ and X̂ = [X1ᵀ · · · XMᵀ]ᵀ, where I and M are the number of images and points, respectively.

The decomposed normal equation in Equation (6) allows us to efficiently compute the inverse of JᵀJ using the Schur complement of D:

∆p̂ = (A − BD⁻¹Bᵀ)⁻¹ (ep − BD⁻¹eX),   ∆X̂ = D⁻¹(eX − Bᵀ∆p̂),

where D is a block-diagonal matrix whose inverse can be computed efficiently by inverting its small blocks:

D = diag(d1, ..., dM),   D⁻¹ = diag(d1⁻¹, ..., dM⁻¹).   (7)

The bundle adjustment algorithm is summarized in Algorithm 5. Note that not all points are visible from all cameras. You need to reason about visibility: if a point is not visible from a camera, the corresponding rows of the Jacobian J and the measurement vector b are omitted.

Algorithm 5 Bundle Adjustment
1: p̂ = [p1ᵀ · · · pIᵀ]ᵀ and X̂ = [X1ᵀ · · · XMᵀ]ᵀ
2: for iter = 1 : nIters do
3:   Empty Jp, JX, b, f, Dinv.
4:   for i = 1 : M do
5:     d = 0(3×3)
6:     for j = 1 : I do
7:       if the i-th point is visible from the j-th image then
8:         J1 = 0(2×7I) and J2 = 0(2×3M)
9:         J1(:, 7(j − 1) + 1 : 7j) = ∂f(pj, Xi)/∂pj
10:        J2(:, 3(i − 1) + 1 : 3i) = ∂f(pj, Xi)/∂Xi
11:        Jp = [Jpᵀ J1ᵀ]ᵀ and JX = [JXᵀ J2ᵀ]ᵀ
12:        d = d + (∂f(pj, Xi)/∂Xi)ᵀ (∂f(pj, Xi)/∂Xi)
13:        b = [bᵀ uijᵀ]ᵀ
14:        f = [fᵀ xijᵀ]ᵀ where µ[xij; 1] = Rj [I3 −Cj] [Xi; 1]
15:      end if
16:    end for
17:    d = d + λI
18:    Dinv = blkdiag(Dinv, d⁻¹)
19:  end for
20:  ep = Jpᵀ(b − f)
21:  eX = JXᵀ(b − f)
22:  A = JpᵀJp + λI, B = JpᵀJX, D⁻¹ = Dinv
23:  ∆p̂ = (A − BD⁻¹Bᵀ)⁻¹ (ep − BD⁻¹eX)
24:  Normalize the quaternions.
25:  ∆X̂ = D⁻¹(eX − Bᵀ∆p̂)
26: end for

Write-up: You will first start with two images and 10 3D points to test your bundle adjustment program. (1) Derive Jp and JX. (2) Run Algorithm 5 and visualize the reprojection error similar to Figure 5.
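The Schur-complement solve of Equation (6) can be sketched in Python. The blocks below come from a random synthetic Jacobian rather than real camera/point Jacobians, and D is treated as dense for brevity (in the real system it is block-diagonal and inverted block by block):

```python
import numpy as np

# Sketch: solve [[A, B], [B^T, D]] [dp; dX] = [e_p; e_X] via the Schur
# complement of D, as in Equation (6).
def schur_solve(A, B, D, ep, eX):
    Dinv = np.linalg.inv(D)             # block-diagonal inverse in practice
    S = A - B @ Dinv @ B.T              # Schur complement of D
    dp = np.linalg.solve(S, ep - B @ Dinv @ eX)
    dX = Dinv @ (eX - B.T @ dp)
    return dp, dX

# Synthetic damped normal matrix H = J^T J + lambda I, partitioned into blocks.
rng = np.random.default_rng(0)
n_p, n_x = 7, 6
J = rng.standard_normal((20, n_p + n_x))
H = J.T @ J + 0.1 * np.eye(n_p + n_x)
A, B, D = H[:n_p, :n_p], H[:n_p, n_p:], H[n_p:, n_p:]
e = rng.standard_normal(n_p + n_x)
dp, dX = schur_solve(A, B, D, e[:n_p], e[n_p:])
```

The result matches a direct solve of the full system; the payoff in real bundle adjustment is that inverting the block-diagonal D is far cheaper than inverting JᵀJ.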
9 Putting All Things Together

Write-up: You will run the full pipeline with all images and 3D points based on Algorithm 1. (1) Visualize the 3D camera poses and points as shown in Figure 6. (2) Visualize the reprojection for all images.

Figure 6: You will reconstruct all camera poses and 3D points using structure from motion (top view and oblique view).
Figure 1: Histogram of oriented gradients. The HOG feature is extracted and visualized for (a) the entire image and (b) a zoomed-in region. The orientation and magnitude of the red lines represent the gradient components in a local cell.

In this assignment, you will implement a variant of HOG (Histogram of Oriented Gradients) in MATLAB, proposed by Dalal and Triggs [1] (2015 Longuet-Higgins Prize winner). It was the long-standing top representation (until deep learning) for the object detection task, combined with a deformable part model and an SVM classifier [2].

Given an input image, your algorithm will compute the HOG feature and visualize it as shown in Figure 1 (the line directions are perpendicular to the gradient to show edge alignment).

function [hog] = HOG(im)
Input: input gray-scale image with uint8 format.
Output: HOG descriptor.

You will compute the HOG descriptor of the input image im. The pseudocode can be found below:

Algorithm 1 HOG
1: Convert the gray-scale image to double format.
2: Get the differential images using GetDifferentialFilter and FilterImage.
3: Compute the gradients using GetGradient.
4: Build the histogram of oriented gradients for all cells using BuildHistogram.
5: Build the descriptor of all blocks with normalization using GetBlockDescriptor.
6: Return a long vector (hog) by concatenating all block descriptors.

2.1 Image filtering

Figure 2: (a) Input image dimension (m × n). (b-c) Differential images along the x and y directions.

function [filter_x, filter_y] = GetDifferentialFilter()
Input: none.
Output: filter_x and filter_y are 3×3 filters that differentiate along the x and y directions, respectively.

You will compute the gradient by differentiating the image along the x and y directions. This code will output the differential filters.
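A Python sketch of this part of the pipeline, assuming central-difference filters for GetDifferentialFilter (one reasonable choice; the handout does not fix the filters) and zero-padded correlation for the filtering step, followed by the gradient magnitude/angle computation:

```python
import numpy as np

# Sketch of FilterImage: zero-padded correlation of an image with a k x k
# filter, returning an output of the same size as the input.
def filter_image(im, filt):
    m, n = im.shape
    k = filt.shape[0] // 2
    padded = np.pad(im, k)                       # zero padding on the boundary
    out = np.zeros_like(im, dtype=float)
    for i in range(m):
        for j in range(n):
            out[i, j] = np.sum(padded[i:i + 2*k + 1, j:j + 2*k + 1] * filt)
    return out

# Sketch of GetGradient: magnitude and unsigned angle in [0, pi).
def get_gradient(im_dx, im_dy):
    mag = np.sqrt(im_dx**2 + im_dy**2)
    ang = np.mod(np.arctan2(im_dy, im_dx), np.pi)
    return mag, ang

# Assumed central-difference filters (x filter responds to horizontal change).
filter_x = np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]], dtype=float)
filter_y = filter_x.T

im = np.tile(np.arange(5.0), (5, 1))             # intensity ramp along x
dx = filter_image(im, filter_x)                  # constant response inside
```

On the ramp image the x response at interior pixels is the constant step (im[i, j+1] − im[i, j−1] = 2), and the y response is zero, so the gradient angle is 0 everywhere in the interior.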
function [im_filtered] = FilterImage(im, filter)
Input: im is the gray-scale m × n image (Figure 2(a)) converted to double (refer to the im2double built-in function); filter is a filter (k × k matrix).
Output: im_filtered is the m × n filtered image. You may need to pad zeros on the boundary of the input image to get a filtered image of the same size.
Description: Given an image and a filter, you will compute the filtered image. Given the two functions above, you can generate the differential images by visualizing the magnitude of the filter response, as shown in Figures 2(b) and 2(c).

Figure 3: Visualization of the (a) magnitude and (b) orientation (0-180 degrees) of the image gradients. (c-e) Visualization of gradients at every 3rd pixel for the full image, a zoomed eye, and a zoomed neck (the magnitudes are re-scaled for illustrative purposes).

function [grad_mag, grad_angle] = GetGradient(im_dx, im_dy)
Input: im_dx and im_dy are the x and y differential images (size: m × n).
Output: grad_mag and grad_angle are the magnitude and orientation of the gradient images (size: m × n). Note that the range of the angle should be [0, π), i.e., unsigned angle (θ == θ + π).
Description: Given the differential images, you will compute the magnitude and angle of the gradient. Using the gradients, you can get some sense of the image, i.e., the magnitude of the gradient is proportional to the contrast (edge) of the local patch, and the orientation is perpendicular to the edge direction, as shown in Figure 3.

2.3 Orientation Binning

Figure 4: (a) The histogram of oriented gradients is built per cell of size cell_size (the shaded boundary area that does not fit into a full cell is ignored) by (b) binning the gradient magnitudes into the corresponding orientation bins (bin centers at 165, 15, 45, 75, 105, and 135 degrees; the bar height is the sum of magnitudes per bin).
function ori_histo = BuildHistogram(grad_mag, grad_angle, cell_size)
Input: grad_mag and grad_angle are the magnitude and orientation of the gradient images (size: m × n); cell_size is the size of each cell, which is a positive integer.
Output: ori_histo is a 3D tensor with size M × N × 6, where M and N are the number of cells along the y and x axes, respectively, i.e., M = ⌊m/cell_size⌋ and N = ⌊n/cell_size⌋ where ⌊·⌋ is the floor operation, as shown in Figure 4(a).
Description: Given the magnitude and orientation of the gradients per pixel, you can build the histogram of oriented gradients for each cell:

ori_histo(i, j, k) = Σ_{(u,v) ∈ C_{i,j}} grad_mag(u, v)  if grad_angle(u, v) ∈ θ_k    (1)

where C_{i,j} is the set of x and y coordinates within the (i, j) cell, and θ_k is the angle range of each bin, e.g., θ1 = [165°, 180°) ∪ [0°, 15°), θ2 = [15°, 45°), θ3 = [45°, 75°), θ4 = [75°, 105°), θ5 = [105°, 135°), and θ6 = [135°, 165°). Therefore, ori_histo(i,j,:) returns the histogram of the oriented gradients at the (i, j) cell, as shown in Figure 4(b). Using ori_histo, you can visualize the HOG per cell, where the magnitude of each line is proportional to the histogram, as shown in Figure 1. A typical cell_size is 8.

Figure 5: HOG is normalized to account for illumination and contrast, forming a descriptor for each block. (a) The HOG within the (1,1) block (2×2 cells) is concatenated and normalized to form a long vector of size 24. (b) This applies to the rest of the blocks with overlap and stride 1, producing an (M−1) × (N−1) grid of normalized HOG descriptors.

function ori_histo_normalized = GetBlockDescriptor(ori_histo, block_size)
Input: ori_histo is the histogram of oriented gradients without normalization; block_size is the size of each block (e.g., the number of cells in each row/column), which is a positive integer.
Output: ori_histo_normalized is the normalized histogram (size: (M−(block_size−1)) × (N−(block_size−1)) × (6 × block_size²)).
Description: To account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells into larger, spatially connected blocks (adjacent cells). Given the histogram of oriented gradients, you apply L2 normalization as follows:
1. Build a descriptor of the first block by concatenating the HOG within the block. You can use block_size=2, i.e., a 2 × 2 block will contain 2 × 2 × 6 entries that are concatenated to form one long vector, as shown in Figure 5(a).
2. Normalize the descriptor as follows:

ĥ_i = h_i / √(Σ_i h_i² + e²)    (2)

where h_i is the i-th element of the histogram, ĥ_i is the normalized histogram, and e is a normalization constant to prevent division by zero (e.g., e = 0.001).
3. Assign the normalized histogram to ori_histo_normalized(1,1) (the white dot location in Figure 5(a)).
4. Move to the next block ori_histo_normalized(1,2) with stride 1 and iterate steps 1-3 above. The resulting ori_histo_normalized will have the size of (M − 1) × (N − 1) × 24.

References
[1] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR, 2005.
[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part based models," TPAMI, 2010.

Figure 1: Given an image (a), you will extract SIFT features (b) using VLFeat.

One of the key skills to learn in computer vision is the ability to use other open-source code, which allows you not to reinvent the wheel. We will use VLFeat by A. Vedaldi and B. Fulkerson (2008) for SIFT extraction from your images. Install VLFeat from the following: https://www.vlfeat.org/install-matlab.html
Run vl_demo_sift_basic to double-check that the installation is complete.
(Note) You will use this library only for SIFT feature extraction and its visualization.
All following visualizations and algorithms must be done by your own code. Using VLFeat, you can extract keypoints and associated descriptors as shown in Figure 1.
(SIFT visualization) Use VLFeat to visualize SIFT features with scale and orientation as shown in Figure 1. You may want to follow this tutorial: https://www.vlfeat.org/overview/sift.html

Figure 2: You will match points between the template (a) and the target image (b) using SIFT features; (c) shows the SIFT matches after the ratio test.

A SIFT feature is composed of a scale, an orientation, and a 128-dimensional local feature descriptor (integer), f ∈ Z^128. You will use the SIFT features to match between two images, I1 and I2. Using the two sets of descriptors from the template and target, find the matches using nearest-neighbor search with the ratio test. You may use the knnsearch built-in function in MATLAB.

function [x1, x2] = FindMatch(I1, I2)
Input: two input gray-scale images with uint8 format.
Output: x1 and x2 are n × 2 matrices that specify the correspondences.
Description: Each row of x1 and x2 contains the (x, y) coordinate of the point correspondence in I1 and I2, respectively, i.e., x1(i,:) ↔ x2(i,:).
(Note) You can only use VLFeat for the SIFT descriptor extraction. Matching with the ratio test needs to be implemented by yourself.

4 Feature-based Image Alignment
Figure 3: You will compute an affine transform using SIFT matches filtered by RANSAC. Blue: outliers; orange: inliers; red: the boundary of the transformed template.
(Note) From this point, you cannot use any function provided by VLFeat. The noisy SIFT matches can be filtered by RANSAC with an affine transformation as shown in Figure 3.

function [A] = AlignImageUsingFeature(x1, x2, ransac_thr, ransac_iter)
Input: x1 and x2 are the correspondence sets (n × 2 matrices); ransac_thr and ransac_iter are the error threshold and the number of iterations for RANSAC.
Output: 3 × 3 affine transformation.
Description: The affine transform will transform x1 to x2, i.e., x2 = Ax1. You may visualize the inliers and the boundary of the transformed template to validate your implementation.

Figure 4: You will use the affine transform to warp the target image to the template using the inverse mapping: (a) target image; (b) warped image; (c) template; (d) error map.

Given an affine transform A, you will write code to warp an image I(x) → I(Ax).

function [I_warped] = WarpImage(I, A, output_size)
Input: I is an image to warp; A is the affine transformation from the original coordinate to the warped coordinate; output_size=[h,w] is the size of the warped image, where w and h are its width and height.
Output: I_warped is the warped image with the size of output_size.
Description: The inverse mapping method needs to be applied to make sure the warped image does not contain empty pixels. You are allowed to use the interp2 built-in function in MATLAB for bilinear interpolation.
(Validation) Using the warped image, the error map |I_tpl − I_wrp| can be computed to validate the correctness of the transformation, where I_tpl and I_wrp are the template and warped images.

Figure 5: You will use the initial estimate of the affine transform to align (i.e., track) the next image. (a) Template image from the first frame. (b) The second frame with the initialization of the affine transform. (c) The second frame with the affine transform optimized using the inverse compositional image alignment.

Given the initial estimate of the affine transform A from the feature-based image alignment (Section 4), as shown in Figure 5(b), you will track the next frame using the inverse compositional method (Figure 5(c)).
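A minimal sketch of the inverse mapping (Python/NumPy for illustration; the hand-written bilinear interpolation plays the role of MATLAB's interp2, and A is assumed here to map output coordinates to input coordinates):

```python
import numpy as np

def warp_image(I, A, output_size):
    """Inverse-mapping warp: every output pixel (u, v) samples I at A @ [u, v, 1].

    Because the loop runs over OUTPUT pixels, no output pixel is left empty.
    Bilinear interpolation is written out by hand; pixels whose source falls
    outside the image are left at zero.
    """
    h, w = output_size
    I = I.astype(float)
    I_warped = np.zeros((h, w))
    for v in range(h):              # y (row)
        for u in range(w):          # x (column)
            x, y, _ = A @ np.array([u, v, 1.0])
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            if 0 <= x0 < I.shape[1] - 1 and 0 <= y0 < I.shape[0] - 1:
                a, b = x - x0, y - y0          # fractional offsets
                I_warped[v, u] = ((1 - a) * (1 - b) * I[y0, x0]
                                  + a * (1 - b) * I[y0, x0 + 1]
                                  + (1 - a) * b * I[y0 + 1, x0]
                                  + a * b * I[y0 + 1, x0 + 1])
    return I_warped
```

With A set to the identity, the interior of the output reproduces the input, which is a quick sanity check before plugging in a real affine transform.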
You will parametrize the affine transform with 6 parameters p = (p1, p2, p3, p4, p5, p6), i.e.,

W(x; p) = [ 1+p1   p2    p3 ] [ u ]
          [ p4    1+p5   p6 ] [ v ]  = A(p) x    (1)
          [ 0      0     1  ] [ 1 ]

where W(x; p) is the warping function from the template patch to the target image, x = [u, v, 1]ᵀ is the coordinate of the point before warping, and A(p) is the affine transform parametrized by p.

function [A_refined] = AlignImage(template, target, A)
Input: gray-scale template image template and target image target; the initialization of the 3×3 affine transform A, i.e., x_tgt = A x_tpl, where x_tgt and x_tpl are points in the target and template images, respectively.
Output: A_refined is the refined affine transform based on the inverse compositional image alignment.
Description: You will refine the affine transform using the inverse compositional image alignment, i.e., A → A_refined. The pseudo-code can be found in Algorithm 1.
Tip: You can validate your algorithm by visualizing the error map as shown in Figures 6(d) and 6(h). You can also visualize the error plot over iterations; the error must decrease as shown in Figure 6(i).

Figure 6: (a,e) Template images of the first frame. (b) Warped image based on the initialization of the affine parameters. (c) Template image overlaid by the initialization. (d) Error map of the initialization. (f) Optimized warped image using the inverse compositional image alignment. (g) Template image overlaid by the optimized warped image. (h) Error map of the optimization. (i) The error ‖I_tpl − I_tgt‖ plotted over iterations.

Algorithm 1 Inverse Compositional Image Alignment
1: Initialize p = p0 from the input A.
2: Compute the gradient of the template image, ∇I_tpl.
3: Compute the Jacobian ∂W/∂p at (x; 0).
4: Compute the steepest descent images ∇I_tpl (∂W/∂p).
5: Compute the 6 × 6 Hessian H = Σ_x [∇I_tpl (∂W/∂p)]ᵀ [∇I_tpl (∂W/∂p)].
6: while ‖Δp‖ > ε do
7:   Warp the target to the template domain, I_tgt(W(x; p)).
8:   Compute the error image I_err = I_tgt(W(x; p)) − I_tpl.
9:   Compute F = Σ_x [∇I_tpl (∂W/∂p)]ᵀ I_err.
10:  Compute Δp = H⁻¹ F.
11:  Update W(x; p) ← W(x; p) ◦ W(x; Δp) = W(W(x; Δp); p).
12: end while
13: Return A_refined made of p.

7 Putting Things Together: Multiframe Tracking
Figure 7: You will use the inverse compositional image alignment to track 4 frames of images: (a) Frame 1, (b) Frame 2, (c) Frame 3, (d) Frame 4.

Given a template and a set of consecutive images, you will (1) initialize the affine transform using the feature-based alignment and then (2) track over frames using the inverse compositional image alignment.

function [A_cell] = TrackMultiFrames(template, image_cell)
Input: template is the gray-scale template; image_cell is a cell structure that contains a set of consecutive image frames, i.e., image_cell{i} is the i-th frame.
Output: A_cell is the set of affine transforms from the template to each frame, i.e., A_cell{i} is the affine transform from the template to the i-th image.
Description: You will apply the inverse compositional image alignment sequentially to track over frames as shown in Figure 7. Note that the template image needs to be updated at every frame, i.e., template ← WarpImage(I, inv(A), size(template)).

• Individual assignment
• 2-page summary write-up with resulting visualizations (an assignment of more than 2 pages will be automatically returned.)
• Submission through Canvas.
• List of submission codes:
– GetTinyImage.m
– PredictKNN.m
– ClassifyKNN_Tiny.m
– BuildVisualDictionary.m
– ComputeBoW.m
– ClassifyKNN_BoW.m
– PredictSVM.m
– ClassifySVM_BoW.m
• DO NOT SUBMIT THE PROVIDED IMAGE DATA
• A function that does not comply with its specification will not be graded.
• You are allowed to use MATLAB built-in functions except for the ones in the Computer Vision Toolbox. Please consult with the TA if you are not sure about the list of allowed functions.

Figure 1: You will design a visual recognition system to classify the scene categories.

The goal of this assignment is to build a set of visual recognition systems that classify scene categories. The scene classification dataset consists of 15 scene categories including office, kitchen, and forest, as shown in Figure 1 [1]. The system will compute a set of image representations (tiny image and bag-of-words visual vocabulary) and predict the category of each testing image using classifiers (k-nearest neighbor and SVM) built on the training data. A simple pseudo-code of the recognition system can be found below:

Algorithm 1 Scene Recognition
1: Load training and testing images.
2: Build the image representation.
3: Train a classifier using the representations of the training images.
4: Classify the testing data.
5: Compute the accuracy of the testing data classification.
For the kNN classifier, steps 3 and 4 can be combined.

You can download the training and testing data from here:
https://www.cs.umn.edu/~hspark/csci5561/scene_classification_data.zip
The data folder includes two text files (train.txt and test.txt) and two folders (train and test). Each row in the text files specifies an image and its label, i.e., (label) (image path). The text files can be used to load images.
Each folder includes 15 classes (Kitchen, Store, Bedroom, LivingRoom, Office, Industrial, Suburb, InsideCity, TallBuilding, Street, Highway, OpenCountry, Coast, Mountain, Forest) of scene images.
Similar to HW #2, you will use VLFeat (https://www.vlfeat.org/install-matlab.html). You are allowed to use the following two functions: vl_dsift and vl_svmtrain.

5 Tiny Image KNN Classification
Figure 2: You will use the tiny image representation to get an image feature: (a) image; (b) tiny image.

function [feature] = GetTinyImage(I, output_size)
Input: I is a gray-scale image; output_size=[w,h] is the size of the tiny image.
Output: feature is the tiny image representation obtained by vectorizing the pixel intensities in column-major order. The resulting size will be w×h.
Description: You will simply resize each image to a small, fixed resolution (e.g., 16×16). You need to normalize the image to have zero mean and unit length. This is not a particularly good representation, because it discards all of the high-frequency image content and is not especially invariant to spatial or brightness shifts.

function [label_test_pred] = PredictKNN(feature_train, label_train, feature_test, k)
Input: feature_train is an n_tr × d matrix, where n_tr is the number of training samples and d is the dimension of the image feature, e.g., 256 for the 16×16 tiny image representation. Each row is an image feature. label_train ∈ [1, 15] is an n_tr vector that specifies the label of the training data. feature_test is an n_te × d matrix that contains the testing features, where n_te is the number of testing samples. k is the number of neighbors for label prediction.
Output: label_test_pred is an n_te vector that specifies the predicted label for the testing data.
Description: You will use a k-nearest neighbor classifier to predict the label of the testing data.
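The k-nearest-neighbor prediction could be sketched as follows (a Python/NumPy stand-in for the MATLAB spec; the brute-force distance matrix replaces MATLAB's knnsearch):

```python
import numpy as np

def predict_knn(feature_train, label_train, feature_test, k):
    """k-nearest-neighbor prediction by majority vote (Euclidean distance).

    feature_train: n_tr x d, feature_test: n_te x d, label_train: n_tr ints.
    """
    # (n_te, n_tr) pairwise squared distances via broadcasting
    d2 = ((feature_test[:, None, :] - feature_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]       # indices of the k nearest neighbors
    votes = label_train[nn]                  # (n_te, k) neighbor labels
    # majority vote per test sample
    return np.array([np.bincount(v).argmax() for v in votes])
```

For the real dataset, the broadcasted distance matrix may be memory-heavy; chunking the test set is a standard workaround.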
Figure 3: Confusion matrix for Tiny+KNN over the 15 classes (accuracy: 0.205333).

function [confusion, accuracy] = ClassifyKNN_Tiny
Output: confusion is a 15 × 15 confusion matrix and accuracy is the accuracy of the testing data prediction.
Description: You will combine GetTinyImage and PredictKNN for scene classification. Your goal is to achieve an accuracy >18%.

Figure 4: Each row represents a distinctive cluster from the bag-of-words representation.

function [vocab] = BuildVisualDictionary(training_image_cell, dic_size)
Input: training_image_cell is a set of training images and dic_size is the size of the dictionary (the number of visual words).
Output: vocab lists the quantized visual words whose size is dic_size×128.
Description: Given a set of training images, you will build a visual dictionary made of quantized SIFT features. You may start with dic_size=50. You can use the following built-in functions:
• vl_dsift from VLFeat.
• kmeans from the MATLAB toolbox.
You may visualize the image patches to make sense of the clustering, as shown in Figure 4.

Algorithm 2 Visual Dictionary Building
1: For each image, compute dense SIFT over a regular grid.
2: Build a pool of SIFT features from all training images.
3: Find cluster centers from the SIFT pool using the kmeans algorithm.
4: Return the cluster centers.

Figure 5: Confusion matrix for BoW+KNN over the 15 classes (accuracy: 0.512667).

function [bow_feature] = ComputeBoW(feature, vocab)
Input: feature is a set of SIFT features for one image, and vocab is the visual dictionary.
Output: bow_feature is the bag-of-words feature vector whose size is dic_size.
Description: Given a set of SIFT features from an image, you will compute the bag-of-words feature. The BoW feature is constructed by counting the SIFT features that fall into each cluster of the vocabulary. Nearest neighbor can be used to find the closest cluster center. The histogram needs to be normalized such that the BoW feature has unit length.

function [confusion, accuracy] = ClassifyKNN_BoW
Output: confusion is a 15 × 15 confusion matrix and accuracy is the accuracy of the testing data prediction.
Description: Given BoW features, you will combine BuildVisualDictionary, ComputeBoW, and PredictKNN for scene classification. Your goal is to achieve an accuracy >50%.

function [label_test_pred] = PredictSVM(feature_train, label_train, feature_test)
Input: feature_train is an n_tr × d matrix, where n_tr is the number of training samples and d is the dimension of the image feature. Each row is an image feature. label_train ∈ [1, 15] is an n_tr vector that specifies the label of the training data. feature_test is an n_te × d matrix that contains the testing features, where n_te is the number of testing samples.
Output: label_test_pred is an n_te vector that specifies the predicted label for the testing data.
Description: You will use an SVM classifier to predict the label of the testing data. You don't have to implement the SVM classifier itself; instead, you can use VLFeat's vl_svmtrain. Linear classifiers are inherently binary, and we have a 15-way classification problem. To decide which of the 15 categories a test case belongs to, you will train 15 binary 1-vs-all SVMs. 1-vs-all means that each classifier is trained to recognize 'forest' vs. 'non-forest', 'kitchen' vs. 'non-kitchen', etc. All 15 classifiers are evaluated on each test case, and the classifier that is most confidently positive "wins".
For instance, if the 'kitchen' classifier returns a score of -0.2 (where 0 is on the decision boundary), the 'forest' classifier returns a score of -0.3, and all of the other classifiers are even more negative, the test case would be classified as a kitchen, even though none of the classifiers put the test case on the positive side of the decision boundary. When learning an SVM, you have a free parameter, lambda, which controls how strongly regularized the model is. Your accuracy will be very sensitive to lambda, so be sure to test many values.

Figure 6: Confusion matrix for BoW+SVM over the 15 classes (accuracy: 0.629333).

function [confusion, accuracy] = ClassifySVM_BoW
Output: confusion is a 15 × 15 confusion matrix and accuracy is the accuracy of the testing data prediction.
Description: Given BoW features, you will combine BuildVisualDictionary, ComputeBoW, and PredictSVM for scene classification. Your goal is to achieve an accuracy >60%.

References
[1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," CVPR, 2006.

• Assignment due: Apr 19 (11:55pm)
• Individual assignment
• Up to 2-page summary write-up with resulting visualizations (an assignment of more than 2 pages will be automatically returned.)
• Submission through Canvas.
• Skeletal codes can be downloaded from:
https://www-users.cs.umn.edu/~hspark/csci5561/HW4_code.zip
It contains the following four codes:
– main_slp_linear.m
– main_slp.m
– main_mlp.m
– main_cnn.m
• List of submission codes:
– GetMiniBatch.m
– FC.m
– FC_backward.m
– Loss_euclidean.m
– TrainSLP_linear.m
– Loss_cross_entropy_softmax.m
– TrainSLP.m
– ReLu.m
– ReLu_backward.m
– TrainMLP.m
– Conv.m
– Conv_backward.m
– Pool2x2.m
– Pool2x2_backward.m
– Flattening.m
– Flattening_backward.m
– TrainCNN.m
• A list of MAT files that contain the following trained weights:
– slp_linear.mat: w, b
– slp.mat: w, b
– mlp.mat: w1, b1, w2, b2
– cnn.mat: w_conv, b_conv, w_fc, b_fc
• DO NOT SUBMIT THE PROVIDED IMAGE DATA
• A function that does not comply with its specification will not be graded.
• You are allowed to use MATLAB built-in functions except for the ones in the Computer Vision Toolbox and the Deep Learning Toolbox. Please consult with the TA if you are not sure about the list of allowed functions.

Figure 1: You will implement (1) a multi-layer perceptron (neural network) and (2) a convolutional neural network to recognize hand-written digits using the MNIST dataset.

The goal of this assignment is to implement neural networks to recognize hand-written digits in the MNIST data.
MNIST Data: You will use the MNIST hand-written digit dataset to perform the first task (neural network). We reduce the image size (28 × 28 → 14 × 14) and subsample the data. You can download the training and testing data from here:
https://www.cs.umn.edu/~hspark/csci5561/ReducedMNIST.zip
The zip file includes two MAT files (mnist_train.mat and mnist_test.mat). Each file includes im_* and label_* variables:
• im_* is a matrix (196 × n) storing the vectorized image data (196 = 14 × 14).
• label_* is an n × 1 vector storing the label of each image.
n is the number of images. You can visualize the i-th image, e.g., imshow(uint8(reshape(im_train(:,i), [14,14]))).
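Note that MATLAB's reshape is column-major; if you prototype in Python/NumPy instead, order="F" reproduces the same 14×14 image (the array below is a hypothetical stand-in for im_train, not the real data):

```python
import numpy as np

# hypothetical stand-in for im_train: a 196 x n matrix, each column one
# vectorized 14x14 digit (the real data comes from mnist_train.mat)
im_train = np.arange(196 * 5).reshape(196, 5)

# MATLAB's reshape(im_train(:,i), [14,14]) is column-major;
# NumPy needs order="F" to produce the same 14x14 image
i = 0
digit = im_train[:, i].reshape((14, 14), order="F")
```

Forgetting the column-major order is a common source of scrambled digit images when moving between the two languages.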
Figure 2: You will implement a single linear perceptron that produces an accuracy near 30% on the testing data; random chance is 10%. (a) Single linear perceptron (196 inputs, 10 outputs). (b) Training and testing loss over iterations. (c) Confusion matrix (accuracy: 0.297905).

You will implement a single-layer linear perceptron (Figure 2(a)) with the stochastic gradient descent method. We provide main_slp_linear, where you will implement GetMiniBatch and TrainSLP_linear.

function [mini_batch_x, mini_batch_y] = GetMiniBatch(im_train, label_train, batch_size)
Input: im_train and label_train are a set of images and labels, and batch_size is the size of the mini-batch for stochastic gradient descent.
Output: mini_batch_x and mini_batch_y are cells that contain a set of batches (images and labels, respectively). Each batch of images is a matrix with size 196×batch_size, and each batch of labels is a matrix with size 10×batch_size (one-hot encoding). Note that the number of images in the last batch may be smaller than batch_size.
Description: You may randomly permute the order of images when building the batches, and the whole set of mini_batch_* must span all training data.

function y = FC(x, w, b)
Input: x ∈ R^m is the input to the fully connected layer, and w ∈ R^{n×m} and b ∈ R^n are the weights and bias.
Output: y ∈ R^n is the output of the linear transform (fully connected layer).
Description: FC is a linear transform of x, i.e., y = wx + b.

function [dLdx, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y)
Input: dLdy ∈ R^{1×n} is the loss derivative with respect to the output y.
Output: dLdx ∈ R^{1×m} is the loss derivative with respect to the input x, dLdw ∈ R^{1×(n×m)} is the loss derivative with respect to the weights, and dLdb ∈ R^{1×n} is the loss derivative with respect to the bias.
Description: The partial derivatives w.r.t. the input, weights, and bias will be computed.
dLdx will be back-propagated, and dLdw and dLdb will be used to update the weights and bias.

function [L, dLdy] = Loss_euclidean(y_tilde, y)
Input: y_tilde ∈ R^m is the prediction, and y ∈ {0, 1}^m is the ground-truth label.
Output: L ∈ R is the loss, and dLdy is the loss derivative with respect to the prediction.
Description: Loss_euclidean measures the squared Euclidean distance L = ‖y − ỹ‖².

function [w, b] = TrainSLP_linear(mini_batch_x, mini_batch_y)
Input: mini_batch_x and mini_batch_y are cells where each cell is a batch of images and labels.
Output: w ∈ R^{10×196} and b ∈ R^{10×1} are the trained weights and bias of the single-layer perceptron.
Description: You will use FC, FC_backward, and Loss_euclidean to train a single-layer perceptron using the stochastic gradient descent method; the pseudo-code can be found below. Through training, you are expected to see a reduction of the loss as shown in Figure 2(b). As a result of training, the network should produce more than 25% accuracy on the testing data (Figure 2(c)).

Algorithm 1 Stochastic Gradient Descent based Training
1: Set the learning rate γ.
2: Set the decay rate λ ∈ (0, 1].
3: Initialize the weights with Gaussian noise, w ∼ N(0, 1).
4: k = 1
5: for iIter = 1 : nIters do
6:   At every 1000th iteration, γ ← λγ.
7:   ∂L/∂w ← 0 and ∂L/∂b ← 0
8:   for each image x_i in the k-th mini-batch do
9:     Label prediction of x_i
10:    Loss computation l
11:    Gradient back-propagation of x_i, ∂l/∂w, using back-propagation.
12:    ∂L/∂w ← ∂L/∂w + ∂l/∂w and ∂L/∂b ← ∂L/∂b + ∂l/∂b
13:  end for
14:  k++ (set k = 1 if k is greater than the number of mini-batches.)
15:  Update the weights, w ← w − γ ∂L/∂w, and the bias, b ← b − γ ∂L/∂b.
16: end for

Figure 3: You will implement a single-layer perceptron that produces an accuracy near 90% on the testing data. (a) Single-layer perceptron (196 inputs, 10 outputs, soft-max). (b) Training and testing loss over iterations. (c) Confusion matrix (accuracy: 0.898720).

You will implement a single-layer perceptron with soft-max cross-entropy using the stochastic gradient descent method. We provide main_slp, where you will implement TrainSLP. Unlike the single-layer linear perceptron, it has a soft-max layer that approximates a max function by squashing the outputs to the [0, 1] range, as shown in Figure 3(a).

function [L, dLdy] = Loss_cross_entropy_softmax(x, y)
Input: x ∈ R^m is the input to the soft-max, and y ∈ {0, 1}^m is the ground-truth label.
Output: L ∈ R is the loss, and dLdy is the loss derivative with respect to x.
Description: Loss_cross_entropy_softmax measures the cross-entropy between the two distributions, L = −Σ_{i=1}^{m} y_i log ỹ_i, where ỹ_i is the soft-max output:

ỹ_i = e^{x_i} / Σ_i e^{x_i},

and x_i is the i-th element of x.

function [w, b] = TrainSLP(mini_batch_x, mini_batch_y)
Output: w ∈ R^{10×196} and b ∈ R^{10×1} are the trained weights and bias of the single-layer perceptron.
Description: You will use the following functions to train a single-layer perceptron using the stochastic gradient descent method: FC, FC_backward, Loss_cross_entropy_softmax. Through training, you are expected to see a reduction of the loss as shown in Figure 3(b). As a result of training, the network should produce more than 85% accuracy on the testing data (Figure 3(c)).
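A sketch of this loss (Python/NumPy for illustration; the max-shift is a standard numerical-stability trick, not part of the spec, and the returned derivative uses the fact that soft-max followed by cross-entropy simplifies to ỹ − y):

```python
import numpy as np

def loss_cross_entropy_softmax(x, y):
    """Cross-entropy of a soft-max output against a one-hot label y.

    Returns the loss L and dLdy, the derivative w.r.t. the soft-max input x,
    which for this combined loss simplifies to (y_tilde - y).
    """
    x = x - x.max()                          # shift for numerical stability
    y_tilde = np.exp(x) / np.exp(x).sum()    # soft-max output
    L = -np.sum(y * np.log(y_tilde + 1e-12)) # small epsilon guards log(0)
    dLdy = y_tilde - y
    return L, dLdy
```

The simple (ỹ − y) gradient is exactly why soft-max and cross-entropy are implemented as one combined function here.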
5 Multi-layer Perceptron
Figure 4: You will implement a multi-layer perceptron that produces an accuracy of more than 90% on the testing data. (a) Multi-layer perceptron (196 inputs → 30 hidden units → 10 outputs with soft-max). (b) Confusion matrix (accuracy: 0.914553).

You will implement a multi-layer perceptron with a single hidden layer using the stochastic gradient descent method. We provide main_mlp. The hidden layer is composed of 30 units, as shown in Figure 4(a).

function [y] = ReLu(x)
Input: x is a general tensor, matrix, or vector.
Output: y is the output of the Rectified Linear Unit (ReLu) with the same size as the input.
Description: ReLu is an activation unit (y_i = max(0, x_i)). In some cases, it is possible to use a leaky ReLu (y_i = max(εx_i, x_i) where ε = 0.01).

function [dLdx] = ReLu_backward(dLdy, x, y)
Input: dLdy ∈ R^{1×z} is the loss derivative with respect to the output y ∈ R^z, where z is the size of the input (it can be a tensor, matrix, or vector).
Output: dLdx ∈ R^{1×z} is the loss derivative with respect to the input x.

function [w1, b1, w2, b2] = TrainMLP(mini_batch_x, mini_batch_y)
Output: w1 ∈ R^{30×196}, b1 ∈ R^{30×1}, w2 ∈ R^{10×30}, b2 ∈ R^{10×1} are the trained weights and biases of the multi-layer perceptron.
Description: You will use the following functions to train a multi-layer perceptron using the stochastic gradient descent method: FC, FC_backward, ReLu, ReLu_backward, Loss_cross_entropy_softmax. As a result of training, the network should produce more than 90% accuracy on the testing data (Figure 4(b)).

Figure 5: You will implement a convolutional neural network that produces an accuracy of more than 92% on the testing data. (a) CNN architecture: Input → Conv (3) → ReLu → Pool (2×2) → Flatten → FC → Soft-max. (b) Confusion matrix (accuracy: 0.947251).

You will implement a convolutional neural network (CNN) using the stochastic gradient descent method. We provide main_cnn.
As shown in Figure 5(a), the network is composed of: a single-channel input (14×14×1) → Conv layer (3×3 convolution with 3 output channels and stride 1) → ReLu layer → Max-pooling layer (2 × 2 with stride 2) → Flattening layer (147 units) → FC layer (10 units) → Soft-max.

function [y] = Conv(x, w_conv, b_conv)
Input: x ∈ R^{H×W×C1} is the input to the convolutional operation; w_conv ∈ R^{H×W×C1×C2} and b_conv ∈ R^{C2} are the weights and bias of the convolutional operation.
Output: y ∈ R^{H×W×C2} is the output of the convolutional operation. Note that to get the same size as the input, you may pad zeros at the boundary of the input image.
Description: This convolutional operation can be simplified using the MATLAB built-in function im2col.

function [dLdw, dLdb] = Conv_backward(dLdy, x, w_conv, b_conv, y)
Input: dLdy is the loss derivative with respect to y.
Output: dLdw and dLdb are the loss derivatives with respect to the convolutional weights and bias, w and b, respectively.
Description: This convolutional operation can be simplified using the MATLAB built-in function im2col. Note that for the single convolutional layer, ∂L/∂x is not needed.

function [y] = Pool2x2(x)
Input: x ∈ R^{H×W×C} is a general tensor or matrix.
Output: y ∈ R^{(H/2)×(W/2)×C} is the output of the 2 × 2 max-pooling operation with stride 2.

function [dLdx] = Pool2x2_backward(dLdy, x, y)
Input: dLdy is the loss derivative with respect to the output y.
Output: dLdx is the loss derivative with respect to the input x.

function [y] = Flattening(x)
Input: x ∈ R^{H×W×C} is a tensor.
Output: y ∈ R^{HWC} is the vectorized tensor (column-major).

function [dLdx] = Flattening_backward(dLdy, x, y)
Input: dLdy is the loss derivative with respect to the output y.
Output: dLdx is the loss derivative with respect to the input x.
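The convolution step could be sketched as below (Python/NumPy for illustration; the explicit loops compute what MATLAB's im2col trick expresses as a single matrix product, and as is conventional for CNNs the "convolution" is implemented as correlation):

```python
import numpy as np

def conv2d_same(x, w_conv, b_conv):
    """Naive zero-padded 'same' convolution.

    x: H x W x C1 input, w_conv: kh x kw x C1 x C2 weights (3x3 kernels in
    the handout's network), b_conv: C2 biases. Output: H x W x C2.
    """
    H, W, C1 = x.shape
    kh, kw, _, C2 = w_conv.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))    # zero-pad spatial dims only
    y = np.zeros((H, W, C2))
    for c2 in range(C2):
        for i in range(H):
            for j in range(W):
                patch = xp[i:i + kh, j:j + kw, :]
                y[i, j, c2] = np.sum(patch * w_conv[:, :, :, c2]) + b_conv[c2]
    return y
```

With im2col, each kh·kw·C1 patch becomes one column of a matrix, so the whole layer reduces to one matrix multiplication plus the bias; the loops above are the slow but transparent equivalent.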
function [w_conv, b_conv, w_fc, b_fc] = TrainCNN(mini_batch_x, mini_batch_y)
Output: w_conv ∈ R^{3×3×1×3}, b_conv ∈ R^3, w_fc ∈ R^{10×147}, b_fc ∈ R^{10} are the trained weights and biases of the CNN.
Description: You will use the following functions to train a convolutional neural network using the stochastic gradient descent method: Conv, Conv_backward, Pool2x2, Pool2x2_backward, Flattening, Flattening_backward, FC, FC_backward, ReLu, ReLu_backward, Loss_cross_entropy_softmax. As a result of training, the network should produce more than 92% accuracy on the testing data (Figure 5(b)).

2 Overview
In this assignment, you will implement a stereo reconstruction algorithm given two view images.
Figure 1: In this assignment, you will implement a stereo reconstruction algorithm given two images: (a) left image; (b) right image.
You can download the skeletal code and data (left.bmp and right.bmp) from here:
https://www-users.cs.umn.edu/~hspark/csci5561/HW5.zip
You will fill in main_stereo.m, which takes the input images and intrinsic parameters K, and produces a stereo disparity map.

Figure 2: You will match points between I1 and I2 using SIFT features: (a) matching from I1 to I2; (b) matching from I2 to I1; (c) matching from I1 to I2 after the ratio test; (d) matching from I2 to I1 after the ratio test; (e) bidirectional matching between I1 and I2.

You will use VLFeat SIFT to extract keypoints and match between the two views using k-nearest-neighbor search. The matches will be filtered using the ratio test and a bidirectional consistency check.

function [x1, x2] = FindMatch(I1, I2)
Input: two input gray-scale images with uint8 format.
Output: x1 and x2 are n × 2 matrices that specify the correspondences.
Description: Each row of x1 and x2 contains the (x, y) coordinate of the point correspondence in I1 and I2, respectively, i.e., x1(i,:) ↔ x2(i,:). This matching function is similar to HW #2 except that the bidirectional consistency check is mandatory.
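The ratio test with the bidirectional consistency check might be sketched as follows (Python/NumPy for illustration; the 0.7 ratio threshold is a common choice, not mandated by the handout):

```python
import numpy as np

def match_bidirectional(desc1, desc2, ratio=0.7):
    """Ratio-test matching in both directions, keeping only mutual matches.

    desc1, desc2: n1 x d and n2 x d descriptor arrays. Returns index pairs
    (i, j) such that j is i's ratio-test match 1->2 AND i is j's match 2->1.
    """
    def ratio_match(a, b):
        out = {}
        for i, f in enumerate(a):
            d = np.linalg.norm(b - f, axis=1)       # distances to all of b
            nn = np.argsort(d)[:2]                  # two nearest neighbors
            if d[nn[0]] < ratio * d[nn[1]]:         # best clearly beats 2nd best
                out[i] = nn[0]
        return out

    fwd = ratio_match(desc1, desc2)
    bwd = ratio_match(desc2, desc1)
    return [(i, j) for i, j in fwd.items() if bwd.get(j) == i]
```

The mutual check discards matches that survive the ratio test in one direction only, which is what panels (c)-(e) of Figure 2 illustrate.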
(Note) Except for SIFT extraction, you are not allowed to use VLFeat functions.

Figure 3: Given matches, you will compute a fundamental matrix to draw epipolar lines.

function [F] = ComputeF(x1, x2)
Input: x1 and x2 are n × 2 matrices that specify the correspondences.
Output: F ∈ R^(3×3) is the fundamental matrix.
F is robustly computed by the 8-point algorithm within RANSAC. Note that the rank of the fundamental matrix needs to be 2 (an SVD clean-up should be applied). You can verify the validity of the fundamental matrix by visualizing the epipolar lines as shown in Figure 3.

(Note) Given the fundamental matrix, you can run the provided function:
[R1 C1 R2 C2 R3 C3 R4 C4] = ComputeCameraPose(F, K)
This function computes the four sets of camera poses given the fundamental matrix, where R1 C1 · · · R4 C4 are the rotations and camera centers (represented in the world coordinate system) and K is the intrinsic parameter. These four configurations can be visualized in 3D as shown in Figure 4.

Figure 4: Four configurations of camera pose from a fundamental matrix.

5 Triangulation
Given camera poses and correspondences, you will triangulate to reconstruct 3D points.

function [X] = Triangulation(P1, P2, x1, x2)
Input: P1 and P2 are the two camera projection matrices (∈ R^(3×4)). x1 and x2 are n × 2 matrices that specify the correspondences.
Output: X is n × 3, where each row specifies a reconstructed 3D point.
Use the linear triangulation method, i.e., solve

    [ [u]_× P1 ]  [ X ]
    [ [v]_× P2 ]  [ 1 ]  =  0,

where u and v are the homogeneous coordinates of a correspondence in the first and second image, and [·]_× denotes the skew-symmetric cross-product matrix.

(Note) Use plot3 to visualize the reconstructed points as shown in Figure 5.
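Building the cross-product constraint for one camera can be sketched as follows (a C illustration; the assignment is MATLAB, and the SVD solve of the stacked system is omitted here). For a pixel u = (x, y, 1) and projection matrix P, the rows of [u]_× P are the rows that get stacked for both cameras:

```c
/* Cross-product (skew-symmetric) matrix of the homogeneous pixel (x, y, 1). */
void skew(double x, double y, double S[3][3]) {
    S[0][0] = 0;  S[0][1] = -1; S[0][2] = y;
    S[1][0] = 1;  S[1][1] = 0;  S[1][2] = -x;
    S[2][0] = -y; S[2][1] = x;  S[2][2] = 0;
}

/* A = [u]_x * P, the 3x4 block of triangulation constraints for one view.
 * Stacking A for both cameras and solving A_stacked [X; 1] = 0 by SVD
 * gives the reconstructed point. */
void constraint_rows(double x, double y, double P[3][4], double A[3][4]) {
    double S[3][3];
    skew(x, y, S);
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++) {
            A[i][j] = 0;
            for (int k = 0; k < 3; k++)
                A[i][j] += S[i][k] * P[k][j];
        }
}
```

A quick sanity check: if the 3D point projects exactly to u, then A·[X; 1] = 0, since a vector crossed with itself vanishes.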
Figure 5: You can visualize the four camera pose configurations with the point cloud: (a) nValid = 10, (b) nValid = 488, (c) nValid = 0, (d) nValid = 0.

6 Pose Disambiguation
Given the four configurations of relative camera pose and reconstructed points, you will find the best camera pose by verifying through 3D point triangulation.

function [R,C,X] = DisambiguatePose(R1,C1,X1,R2,C2,X2,R3,C3,X3,R4,C4,X4)
Input: R1, C1, X1 · · · R4, C4, X4 are the four sets of camera rotation, center, and 3D reconstructed points.
Output: R, C, X are the best camera rotation, center, and 3D reconstructed points.
A 3D point must lie in front of both cameras, which can be tested by:

    r3^T (X − C) > 0    (1)

where r3 is the 3rd row of the rotation matrix. In Figure 5, nValid is the number of points that lie in front of both cameras. Configuration (b) produces the maximum number of valid points, and is therefore the best configuration.

Figure 6: Stereo rectification ((a) rectified image 1; (b) rectified image 2).

Given the disambiguated camera pose, you will implement dense stereo matching between the two views based on the dense SIFT of VLFeat.

function [H1, H2] = ComputeRectification(K, R, C)
Input: the relative camera pose (R and C) and intrinsic parameter K.
Output: H1 and H2 are homographies that rectify the left and right images such that the epipoles are at infinity.
Given the disambiguated camera pose, you can find the rectification rotation matrix R_rect such that the x-axis of the images aligns with the baseline. Find the rectification homography H = K R_rect R^T K^(−1), where R is the rotation matrix of the camera. The rectified images are shown in Figure 6. This rectification sends the epipoles to infinity, where the epipolar lines become horizontal.

(Note) You can use the provided image warping function im_w = WarpImage(im, H) to check your result.
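The cheirality test of Equation (1) can be sketched as follows (a C illustration; the assignment is MATLAB). For one camera (R, C), it counts how many reconstructed points lie in front of the camera; DisambiguatePose keeps the configuration maximizing this count over both cameras:

```c
/* Count points X (n rows of 3) satisfying r3^T (X - C) > 0,
 * where r3 is the third row of the rotation matrix R. */
int count_in_front(double R[3][3], double C[3], double (*X)[3], int n) {
    int valid = 0;
    for (int i = 0; i < n; i++) {
        double d = 0;
        for (int k = 0; k < 3; k++)
            d += R[2][k] * (X[i][k] - C[k]);  /* depth along the optical axis */
        if (d > 0) valid++;
    }
    return valid;
}
```

With R = I and C = 0 this reduces to counting points with positive Z, which is a useful sanity check.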
Figure 7: Visualization of the stereo match.

function [disparity] = DenseMatch(im1, im2)
Input: two gray-scale rectified images in uint8 format.
Output: disparity map disparity ∈ R^(H×W), where H and W are the image height and width.
Compute the dense matches across all pixels. Given a pixel u in the left image, sweep along its epipolar line, l_u, and find the disparity, d, that produces the best match, i.e.,

    d = argmin_i ||d1_u − d2_{u+(i,0)}||^2,    i = 0, 1, · · · , N,

where d1_u is the dense SIFT descriptor at u in the left image and d2_{u+(i,0)} is the dense SIFT descriptor at u + (i, 0) (displaced i pixels along the x-axis) in the right image. Visualize the disparity map as shown in Figure 7.
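The per-pixel disparity search can be sketched as follows (a C illustration; the assignment is MATLAB). `d1` is the descriptor at pixel u in the left image and `d2` holds the N+1 candidate descriptors along the epipolar line of the right image, packed contiguously with `DIM` entries each (128 for SIFT):

```c
/* Return the i in [0, N] minimizing ||d1 - d2_i||^2. */
int best_disparity(const double *d1, const double *d2, int N, int DIM) {
    int best = 0;
    double best_cost = -1;
    for (int i = 0; i <= N; i++) {
        double cost = 0;
        for (int k = 0; k < DIM; k++) {
            double diff = d1[k] - d2[i * DIM + k];
            cost += diff * diff;   /* squared descriptor distance */
        }
        if (best_cost < 0 || cost < best_cost) { best_cost = cost; best = i; }
    }
    return best;
}
```

In MATLAB the same computation vectorizes over all candidates at once, which is considerably faster than an explicit loop.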
1 Project Description
A makefile is often used to simplify the compilation and build process of large software tools, via the GNU make [2] command. The software installation targets are primarily UNIX-like operating systems. A typical makefile has the following structure [1]:

target: dependencies
	recipe(s)

Given a makefile filename and an optional target as command line arguments, your task is to generate an executable 'mymake' from a C program. It must parse the makefile, build a dependency graph for the (optionally) provided target, and execute the recipes for that target by spawning a new process per recipe, via fork and exec, in the order determined by the graph traversal. If no target is specified as a command line argument, the first target in the makefile is chosen. Alternatively, if your first argument is a print flag (-p) followed by the makefile, you must only print the targets, dependencies, and recipes in the makefile. A second type of flag (-r) followed by the makefile must print the recipe order of the target. Do not fork/exec the recipes when running either of the flags. More details are described in the following three phases:

1.1 Parsing the input makefile
The makefile consists of multiple targets, and each target describes the dependencies that must be satisfied before executing the recipes within that target. Recipes, rules, and commands mean the same thing. In this project's makefiles, each target can have multiple dependencies, with one recipe on a new line. Targets and dependencies are separated by a colon (:). Multiple dependencies for a target are separated by spaces. Each recipe for a target must be indented via a single tab (\t, not 2/3/4/8 spaces). Recipes within the same target must be executed in the order they appear. Blank lines are optional and improve readability.
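Parsing one rule line of this form can be sketched as follows. This is only an illustration under the stated format (target, colon, space-separated dependencies); `MAX_DEP` mirrors the limit given later in the assumptions, and the tokenizing uses plain `strtok`:

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEP 8

/* Split a "target : dep1 dep2" line in place. Writes the target name into
 * target, pointers to the dependency tokens into deps, and returns the
 * dependency count, or -1 if the line has no colon (not a rule line). */
int parse_rule(char *line, char *target, char *deps[MAX_DEP]) {
    char *colon = strchr(line, ':');
    if (!colon) return -1;
    *colon = '\0';
    strcpy(target, strtok(line, " \t"));   /* first token before the colon */
    int n = 0;
    for (char *tok = strtok(colon + 1, " \t\n");
         tok && n < MAX_DEP; tok = strtok(NULL, " \t\n"))
        deps[n++] = tok;                   /* dependency tokens after the colon */
    return n;
}
```

Recipe lines are distinguished separately by their leading tab, so only non-tab lines should be fed to a rule parser like this.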
More assumptions about the file are presented later in Section 5.

To simplify your job, we will provide a helper function that reads the contents of the makefile into a two-dimensional char array. The first step is to parse these lines from the char array to accurately determine targets, dependencies, and recipes. We suggest adopting the technique described in makeargv.c from the textbook https://usp.cs.utsa.edu/usp/programs/chapter02/. One sample makefile is shown below:

all : dep1 dep2
	ls -l
dep1: dep3
	gcc -o file1.o -c file4.c file1.c
dep2:
	gcc -o file3.o -c file3.c
dep3: dep4
	gcc -o file2.o -c file2.c
dep4 :
	mkdir myproject

The above makefile, with no command line target, runs as ./mymake -p makefile_in. It must print the target name, the dependency count, and the recipe count for each of the targets in the makefile. The output for one target can look like:

target 'all' has 2 dependencies and 1 recipe
Dependency 0 is dep1
Dependency 1 is dep2
Recipe 0 is ls -l

Similarly, you must print the remaining 4 targets and their respective dependencies and recipes to get full credit for this phase.

1.2 Determining the order of recipes
The makefile of Section 1.1, when run with the recipe flag option (./mymake -r makefile_in), must print the order of recipes. It must pick all and look for recipes in the order dep4, dep3, dep1, dep2. Finally, it prints the following lines:

mkdir myproject
gcc -o file2.o -c file2.c
gcc -o file1.o -c file4.c file1.c
gcc -o file3.o -c file3.c
ls -l

Running mymake -r makefilename will print all the recipes in order of execution; no targets must be passed on the command line.

1.3 Fork and exec the recipes
Using the data structures that model the graph, execute the recipes by forking a new process per recipe. Your program should determine which recipes are eligible to run, execute those recipes, wait for any recipe to finish, then repeat this process until all recipes have finished executing.
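Running a single recipe via fork and exec can be sketched as follows. This is only an outline: the word-splitting here is a naive space split (the provided makeargv.c technique is more robust), and the parent reaps the child immediately rather than overlapping recipes:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, exec the recipe command in it, and wait for it so no
 * zombie is left. Returns the child's exit status, or -1 on failure. */
int run_recipe(const char *recipe) {
    char buf[256];
    char *argv[32];
    int argc = 0;
    strncpy(buf, recipe, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *t = strtok(buf, " "); t && argc < 31; t = strtok(NULL, " "))
        argv[argc++] = t;
    argv[argc] = NULL;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                /* child: become the recipe command */
        execvp(argv[0], argv);
        perror("execvp");          /* only reached if exec failed */
        exit(127);
    }
    int status;
    waitpid(pid, &status, 0);      /* parent: reap the child */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the eligible-set loop described above, the same fork/exec pattern applies, but the waits move outside the per-recipe call so independent recipes can run concurrently.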
Instead, if the project is run with a specific target, say dep2 (./mymake makefile_in dep2), your program must only print and then fork/exec one recipe:

gcc -o file3.o -c file3.c

2 Implementation
We will provide an example framework to get started and examples to test the code. However, you are free to use your own implementation, without following the given structure. In the folder pa1, the src folder contains the code and test contains the test cases. The data structure used to store information about a target in the desired 'container' data structure looks like:

typedef struct target_block {
    char *name;                      // Target name
    char *depend[MAX_DEP];           // dependencies of target
    char *recipe[MAX_RECIPES_PT];    // recipes per target
    unsigned char dep_count;         // number of dependencies in target
    unsigned char recipe_count;      // number of recipes in target
    unsigned char visited;           // used in DFS
} target;

The dependency graph uses the result of a topological sort [3], an application of the Depth First Search algorithm. Given a set of targets, dependencies, and recipes, one must construct a directed graph by determining the nodes and edges as targets and dependencies, respectively. Alternatively, a less modular design involves inserting the targets into a stack based on the DFS traversal, avoiding building a graph.

3 Execution
There are three ways to execute the project:

$ ./mymake filename [target]
$ ./mymake [-p] filename
$ ./mymake [-r] filename

Your executable can take up to two command line arguments. The filename argument is mandatory. If the optional arguments are provided, they must be in the fixed positions shown. Any other variations must error out.

4 Testing
As described in Section 2, to test your code against the provided test cases within the framework, run $ make -i test-main. The -i flag ignores the error return value and lets you test the whole suite. We will use more test cases than the ones provided.
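The DFS-based recipe ordering described in Section 2 can be sketched as follows. This uses a simplified variant of the provided target_block struct, and `find_target` is a hypothetical name-to-index lookup; dependencies that are not themselves rules (plain files) are simply skipped:

```c
#include <string.h>

#define MAX_DEP 8
typedef struct {
    const char *name;
    const char *depend[MAX_DEP];
    int dep_count;
    int visited;               /* used in DFS, as in the provided struct */
} target;

static int find_target(target *t, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0) return i;
    return -1;
}

/* Post-order DFS: a target's index is appended to order[] only after all
 * of its dependencies, which is exactly the recipe execution order.
 * Returns the updated count of entries in order[]. */
int dfs_order(target *t, int n, int cur, int *order, int count) {
    if (t[cur].visited) return count;
    t[cur].visited = 1;
    for (int d = 0; d < t[cur].dep_count; d++) {
        int idx = find_target(t, n, t[cur].depend[d]);
        if (idx >= 0)
            count = dfs_order(t, n, idx, order, count);
    }
    order[count++] = cur;
    return count;
}
```

On the sample makefile this yields the documented order dep4, dep3, dep1, dep2 before the root target all.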
If you have not attempted the extra credit, discussed later, you must modify the makefiles to suit your code.

5 Assumptions
To simplify the project, you can assume the following about the makefile:
• The makefile is well formed and has no circular dependencies.
• It doesn't use make variables (denoted via $), macros, or wildcards.
• It can have a maximum of 128 lines, and each line is up to 128 characters long.
• Each target can have up to 8 dependencies and one recipe. This assumption is relaxed if you are attempting the extra credit.
• There can be no more than 128 targets in the makefile. Targets having 0 dependencies and 0 recipes simultaneously do not exist. However, targets with 0 recipes are possible.
• Recipes do not spread across lines.
• We limit the executables to those found in the default PATH environment of the CSELAB machines. Sample executables are Linux commands such as ls, cat, pwd, mkdir, or building an executable via gcc.
• Each makefile has a single dependency tree.
• You are free to make any reasonable assumptions not stated in this document, but do mention them early in your README file.

6 Extra credit
Unlike the makefile described earlier, one can specify multiple recipes for a target. Multiple recipes on different lines within the same target are executed in the order they are read. Recipes within the same line are ordered from left to right. Also, multiple recipes can be run in parallel by separating them with a comma (,). For example, assume the following recipes in your makefile named makefile_ec:

all : dep1 dep2
	ls -l
dep1: dep3
	gcc -o file1.o -c file4.c file1.c
dep2:
	gcc -o file3.o -c file3.c
dep3: dep4
	gcc -o file2.o -c file2.c
	gcc -o file4.o -c file4.c
dep4 :
	mkdir myproject, pwd
	cat file1.c

Execute this makefile with no targets as $ ./mymake makefile_ec. The program must pick all and look for dependencies in the order dep4, dep3, dep1, dep2.
The recipes are executed as:

mkdir myproject
pwd
cat file1.c
gcc -o file2.o -c file2.c
gcc -o file4.o -c file4.c
gcc -o file1.o -c file4.c file1.c
gcc -o file3.o -c file3.c
ls -l

For dep4, do not execute cat file1.c until both of the previous commands have finished executing. The parent must use the wait system call on both commands to avoid zombie processes. If your group chooses to attempt the extra credit, for an additional 10% of the project grade, you must correctly handle both parallel execution of recipes and multiple recipes per target within the same program.

7 Deliverables
One student from each group should upload to Canvas a zip file containing their C source code files, a makefile, and a README that includes the following details:
• The purpose of your program
• How to compile the program
• What exactly your program does
• Any assumptions outside this document
• Team names and x500
• Your and your partner's individual contributions
• Whether you have attempted the extra credit

The README file does not have to be long, but must properly describe the above points. Proper in this case means that a first-time user can answer the above questions without any confusion. Within your code you should use one or two comments to describe each function that you write. You do not need to comment every line of your code. However, you might want to comment portions of your code to answer why, rather than how, you implement the said code. At the top of your README file and main C source file please include the following comment:

/* test machine: CSELAB_machine_name
 * date: mm/dd/yy
 * name: full_name1, [full_name2]
 * x500: id_for_first_name, [id_for_second_name]
 */

8 Grading Rubric
1. 10% Correct README contents
2. 10% Code quality, such as using descriptive variable names, modularity, comments, and indentation. You must stick to one style throughout the project. If you do not already have a style for C programming, we suggest adopting the K&R style [4].
3. 15% Meaningful data structures and appropriate error checking and handling
   (a) 10% No program crashes or segfaults
   (b) 5% Uses appropriate data structures
4. 20% Parse the makefile accurately (determined via the -p flag)
5. 15% Implement the dependency graph algorithm for identifying the recipe order (determined via the -r flag)
6. 30% Correctly call the fork, exec, and wait system calls, not leading to fork bombs or zombie processes
7. 10% Extra credit

If your code passes the test cases, you will get 75% of the credit. The remaining 25% depends on the implementation quality and documentation, as described in points 1, 2 & 3b.

9 References
1. Makefile rules: https://en.wikipedia.org/wiki/Makefile#Rules
2. GNU Make: https://www.gnu.org/software/make/manual/make.html
3. Topological sorting: https://en.wikipedia.org/wiki/Topological_sorting
4. K&R style guide: https://en.wikipedia.org/wiki/Indentation_style#K&R_style
5. AND operator in make: https://www.gnu.org/software/make/manual/make.html#Execution

Jeffrey Dean and Sanjay Ghemawat from Google published a research paper [1] in 2004 on a new programming model that can process very large datasets in an efficient, distributed manner. This programming model is called "MapReduce". MapReduce consists of two main phases, Map and Reduce. In the Map phase, a user-written map function processes input pairs into intermediate pairs. Then, in the Reduce phase, a reduce function combines the intermediate pairs based on their keys to give the final output. Since the input dataset can be very large, there will be multiple map and reduce jobs, and it is essential to maintain a synchronized system. Let us consider the example of counting the number of occurrences of each word in a corpus of documents. The map function emits each word (key) it sees with a count '1' (value). The reduce function sums together all the counts emitted for a particular word (key).
The pseudocode for the same looks as below:

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

The above example is taken from [1]. In this project, we will mimic certain core functionalities of the MapReduce programming model using OS system calls.

Concepts covered in this project:
• Process spawning
• Directory traversal
• File I/O operations
• Pipes
• Redirection

Given a folder with a multi-level hierarchy, i.e., folders containing multiple levels of folders and text files, where each text file has one word per line, your task is to count the number of words per letter of the alphabet, i.e., compare the first letter of each word to increment the count corresponding to that letter, over the entire folder hierarchy. The result should be reported in alphabetical order. Key parties involved in the project:
• Master process: the parent process of all the spawned processes.
• Mapper processes: execute the map function on the partitioned data. The number of mapper processes is specified at execution time as an input argument.
• Reducer process: executes the reduce function over the results from all the mapper processes. For simplicity, we have only a single reducer process in this project.

There are four phases in this project. First is the data partitioning phase, where the master will traverse the folder hierarchy, identify all the files, and split them equally among the mapper processes. During the second phase, the mappers will process the files allotted to them by the master, and each of them will come up with a list of per-letter word counts. For the third phase, each mapper will send its list to the reducer process, which will combine them all to give a single list.
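The per-letter counting that each mapper performs can be sketched as follows (a minimal illustration; the helper name and array layout are this sketch's own choices):

```c
#include <ctype.h>

/* Bump the histogram bucket for the first letter of each word,
 * case-insensitively ('a' and 'A' share a bucket). words is an array of
 * n strings; hist must hold 26 ints, one per letter. */
void count_first_letters(const char **words, int n, int hist[26]) {
    for (int i = 0; i < 26; i++) hist[i] = 0;
    for (int i = 0; i < n; i++) {
        unsigned char c = (unsigned char)words[i][0];
        if (isalpha(c))
            hist[tolower(c) - 'a']++;
    }
}
```

Each mapper would run this over the words read from its allotted files, then ship the 26 counts to the reducer.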
In the last phase, the final list is taken by the master and reported. Now let us have a look at the project implementation details. An example is provided for your understanding after the explanation of each phase.

Notice: You may use any algorithm that helps you reach the final goals. For each phase there is an expected output, unless otherwise specified. The outputs should be in the format specified, i.e., if the results are to be stored as a text file with a specific name format in a folder with a specific name, you should follow it.

This phase aims at traversing the folder hierarchy and partitioning the files equally among the mapper processes. The master process creates a folder "MapperInput" in the current directory. It will then recursively traverse the 'Sample' folder hierarchy (assuming 'Sample' is the relative or absolute path of the folder that you passed to the executable) and divide the list of filepaths of text files with words among 'm' files with the name format 'Mapper_i.txt', where 'm' is the number of mappers and i belongs to [1, 2, . . . , m]. The 'Mapper_i.txt' files should be created inside 'MapperInput'. You may use any partitioning logic that gives almost the same number of filepaths in each text file. For example, let there be 6 text files in the 'Sample' hierarchy and 3 mapper processes. Then each of the three 'Mapper_i.txt' files created will have two file paths. In case the count of files is not a multiple of the number of mappers, add the extra files to the 'Mapper_i.txt' files in a round-robin fashion.

Notice: Assume that the number of files is always greater than or equal to the number of mappers, except for the case of an empty folder.

The expected outputs from this phase are the "Mapper_i.txt" files inside the "MapperInput" folder. In Figure 1, 'Sample' is the folder passed as input to your executable, F1, F2, . . . are the folders, and Tfile*.txt are the text files with words.
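One round-robin partitioning scheme that satisfies the "differ by at most one" requirement is to send file f (0-based) to mapper f mod m. A minimal sketch, only illustrating the count balance:

```c
/* Assign nfiles files to m mappers round robin: file f goes to mapper
 * f % m (0-based; Mapper_(f % m + 1).txt in the required naming).
 * counts[] receives how many files each mapper gets. */
void partition(int nfiles, int m, int *counts) {
    for (int i = 0; i < m; i++) counts[i] = 0;
    for (int f = 0; f < nfiles; f++)
        counts[f % m]++;
}
```

With 6 files and 3 mappers every mapper gets 2; with 7 files the extra one lands on mapper 1, exactly the round-robin behavior described above.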
Figure 1: Data Partitioning

The master process creates 'm' pipes, one per mapper, to communicate with the reducer process. The master then spawns the mapper processes. Each mapper process will pick one file from "MapperInput", open each filepath in the file, and find the number of words corresponding to each letter of the alphabet. This information is then sent to the reducer process via a pipe by each mapper. Note: process the words case-insensitively, i.e., take 'a' and 'A' as 'A'. No output is expected from this phase. The grading will be carried out based on the per-letter word count algorithm and the pipe setup and usage. In Figure 2, assume there are two file paths per 'Mapper_i.txt'. The master process forks the mappers. Mapper1 processes 'Mapper_1.txt' and Mapper2 processes 'Mapper_2.txt'. Each mapper has a list that keeps track of the per-letter word count.

Notice: Please do not assume that the process ids of the mappers are [1, 2, . . . ]. It is up to the OS to decide the process id, so there won't be a one-to-one mapping between the names of the text files and the mapper process ids.

Figure 2: Map

The reducer process will receive the lists from each of the mapper processes via pipes and combine them to create a single list. The list is then written into a text file "ReducerResult.txt" in the current folder. Each line in "ReducerResult.txt" has the format given below, with only one space between 'letter_of_alphabet' and 'wordcount':

letter_of_alphabet wordcount

The expected output from this phase is "ReducerResult.txt". In Figure 3, the reducer receives lists from Mapper1 and Mapper2 via two pipes. This list is then combined and written to ReducerResult.txt in the current folder.

Figure 3: Reduce

4.4 Phase 4 - Final Result
The mapper processes will have exited by now. The master process will wait for the reducer to finish. It will then read the results from 'ReducerResult.txt' and report them to standard output.
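One mapper-to-reducer pipe can be sketched as follows. This is only an outline of the plumbing: the child stands in for a mapper writing its 26-entry histogram as raw ints, and the parent side stands in for the reducer folding it into the combined histogram:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork one mapper, send hist[26] through a pipe, and add it into
 * combined[26] on the reducer side. Returns 0 on success, -1 on error. */
int send_histogram(const int hist[26], int combined[26]) {
    int fd[2];
    if (pipe(fd) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                         /* mapper side */
        close(fd[0]);                       /* not reading */
        if (write(fd[1], hist, 26 * sizeof(int)) < 0) _exit(1);
        close(fd[1]);
        _exit(0);
    }
    close(fd[1]);                           /* reducer side: not writing */
    int buf[26];
    if (read(fd[0], buf, 26 * sizeof(int)) != (ssize_t)(26 * sizeof(int)))
        return -1;
    close(fd[0]);
    for (int i = 0; i < 26; i++) combined[i] += buf[i];
    waitpid(pid, NULL, 0);                  /* reap the mapper */
    return 0;
}
```

In the real project the master creates all 'm' pipes before forking, and the reducer loops over the m read ends; closing the unused ends in each process, as above, is what lets reads see EOF correctly.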
The catch here is that standard output is redirected to a file "FinalResult.txt" in the current folder. In MapReduce, the master process exits towards the end, after all the other processes have completed; we are emulating that in this phase. The expected output from this phase is FinalResult.txt, which has the same format as ReducerResult.txt. In Figure 4, the master redirects the results from standard output to 'FinalResult.txt'.

Figure 4: Final Result

5 Execution
The executable 'mapreduce' for the project will accept 2 parameters:

$ ./mapreduce folderName #mappers

folderName is the name of the root folder to be traversed; #mappers is the number of mapper processes.

6 Expected output
• The Sample folder is empty or there are no files in the Sample hierarchy:

$ ./mapreduce Sample 4
The Sample folder is empty

• The Sample folder has files in its hierarchy:

$ ./mapreduce Sample 4
The result is present in FinalResult.txt (below format) in the current folder

A 5
B 10
C 3
. . .
Z 12

7 Testing
The executable 'mapreduce' generated from your program will be tested on folders of multiple levels. The number of files present in the folders will range from 'm' to 500, where 'm' is the number of mappers. Note that in the case of an empty folder, the file count will be zero.

Notice: An exact pattern matching algorithm will be used for grading the results, so please be sure to adhere to the format.

To test your code with the sample test folders, run:

$ make test-mapreduce

Note that if you want to run explicitly on a test case, you will have to do it without the test-mapreduce target. Also, there is no auto-checking enabled for the correctness of the result. The Expected_Output folder in Testcases has the expected output for each of the given test folders.

8 Assumptions
• Number of masters = 1; 0 < number of mappers ≤ number of files, except for the case of an empty folder.
• Only consider the file count for splitting data equally among the mapper processes. Do not look at the size of the files.
• Use C library file operations to read and write (FILE *, fgets, fputs, fscanf, fprintf, and so on) to and from a file in Phase 1, Phase 2, and Phase 3.
• In Phase 4, use OS system calls to do the redirection from standard output to the file.
• Use pipes to communicate between Phase 2 and Phase 3.
• The executable name should be 'mapreduce', as specified in the makefile.
• The template code consists of 4 phase*.c files. Add the code of each phase as functions to the corresponding files and call them from main.c.
• Ensure you add the function prototypes to the corresponding phase*.h files in the 'include' folder.
• You are expected to provide proper guard mechanisms for header files.
• Ensure proper error handling mechanisms for file operations and pipe handling.
• Expect empty and multilevel folder hierarchies.

9 Extra credit
A symbolic link, or soft link, is a kind of file that points to another file name. This can be extended to directories as well. Refer to [4] for more details. Consider the following example:

Folder1/file1.txt ——> Folder5/file6.txt ====> physical data

Here file1.txt in Folder1 is a symbolic link to file6.txt in Folder5. One can interchangeably use file1.txt and file6.txt to read and write the same physical data. The same can apply to directories. In Phase 1, there can be folders or files that are symbolic links. Your task is to identify such folders/files and avoid reprocessing them. You may assume that all the symbolic links are valid.
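Detecting a symbolic link during the traversal can be sketched as follows. The key point is that lstat, unlike stat, does not follow the link, so it reports on the link itself:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if path is itself a symbolic link, 0 if not, -1 on error. */
int is_symlink(const char *path) {
    struct stat st;
    if (lstat(path, &st) < 0) return -1;   /* lstat does not follow links */
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

In Phase 1 this check lets the traversal skip link entries so the files they point at are not processed twice.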
To test this scenario with your code, run:

$ make test-extracredits

If you are attempting the extra credit, mention it in the README file.

10 Deliverables
One student from each group should upload to Canvas a zip file containing their C source code files, a makefile, and a README that includes the following details:
• The purpose of your program
• How to compile the program
• What exactly your program does
• Any assumptions outside this document
• Team names and x500
• Your and your partner's individual contributions
• Whether you have attempted the extra credit

The README file does not have to be long, but must properly describe the above points. Proper in this case means that a first-time user can answer the above questions without any confusion. Within your code you should use one or two comments to describe each function that you write. You do not need to comment every line of your code. However, you might want to comment portions of your code to answer 'why', rather than 'how', you implement the said code. At the top of your README file and main C source file please include the following comment:

/* test machine: CSELAB_machine_name
 * date: mm/dd/yy
 * name: full_name1, [full_name2]
 * x500: id_for_first_name, [id_for_second_name]
 */

11 Grading Rubric
1. 10% Correct README contents
2. 10% Code quality, such as using descriptive variable names, modularity, comments, and indentation. You must stick to one style throughout the project. If you do not already have a style for C programming, we suggest adopting the K&R style [5].
3. 15% Proper error handling and data structure usage
4. 15% Data partitioning and file operations
5. 20% Process handling
6. 30% Pipe communication and redirection
7. 10% Extra credit

If your code passes the test cases, you will get 75% of the credit. The remaining 25% will be based on points 1 and 2 and proper data structure usage.

References
[1] Dean, J. and Ghemawat, S., 2008. MapReduce: simplified data processing on large clusters.
Communications of the ACM, 51(1), pp. 107-113.
[2] Robbins, K. A. and Robbins, S., 2004. Chapter 6, Unix Systems Programming: Communication, Concurrency and Threads, 2/E. Pearson Education India.
[3] Robbins, K. A. and Robbins, S., 2004. Chapter 4, Unix Systems Programming: Communication, Concurrency and Threads, 2/E. Pearson Education India.
[4] Symbolic links: https://www.gnu.org/software/libc/manual/html_node/Symbolic-Links.html
[5] K&R style: https://en.wikipedia.org/wiki/Indentation_style#K&R_style

In multithreaded programming, threads can perform the same or different roles. In some multithreading scenarios, like producer-consumer, the producer and consumer threads have different functionality. In others, like many parallel algorithms, the threads have almost the same functionality. In this programming assignment, we want to work on both sides of the problem "word count statistics", which is the multithreaded version of the MapReduce application from Programming Assignment 2. An example problem is shown in Fig. 1 below. A text file contains several lines of English characters, and this application will count the histogram of the starting letter of each word.

Fig. 1, an example of "word count statistics"

One idea to solve this problem is to cut the file into smaller pieces and hand them over to many workers. In this programming assignment, we will use multithreading to create one producer to read the file and multiple consumers to process the smaller pieces of data. We will have two forms of synchronization: a shared queue synchronized between the producer and consumers, and a global result histogram synchronized among consumers. In the multithreaded "word count statistics" application, the program contains a master thread, one producer thread, and many consumer threads (shown in Fig. 2 below). The producer and consumers will share a queue for data transfer.
For program execution flow, the entire program contains 4 parts: 1) the master thread initializes the shared queue, the result histogram, the producer thread, and the consumer threads; 2) the producer thread reads the input file, cuts the data into smaller pieces, and feeds them into the shared queue; 3) the consumers read from the shared queue, compute the "word count statistics" for their data pieces, and synchronize the result with a global histogram; 4) after the producer and all consumers complete their work, the master thread writes the final result into the output file.

Fig. 2, 4 parts of multithreading "word count statistics"

In the next section, we will go through the implementation details of this project. The implementation requirements will be provided. Before we go into each part, the application differences between PA2 (Programming Assignment 2) and PA3 (Programming Assignment 3) need to be clarified. Even though we use the same application, "word count statistics", as in PA2, there are several key differences in the input file and processing flows: 1) PA2 works on different files located in different directories, but PA3 takes only one file as your input file. 2) In PA2, each file contains multiple lines and each line contains only one word, but PA3's input file is more "natural": it contains multiple words per line, and the file contains multiple lines. 3) PA2 uses multi-processing, which means the parent forks several children and separates the workload, but in PA3 you have only one process with multiple threads. PA3 is mainly focused on the usage of threads and thread-safe data structures. The core of this multithreaded "word count statistics" application is a thread-safe shared queue. This shared queue should be implemented as a linked-list unbounded buffer. The producer inserts data at the tail of the linked list and the consumers extract data from the head.
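The shared queue can be sketched as follows, assuming the mutex + condition variable route (the semaphore route is equally valid). The package here is reduced to just a line pointer; the real package may carry the line number and the EOF flag as well:

```c
#include <pthread.h>
#include <stdlib.h>

/* Unbounded linked-list queue guarded by a mutex; consumers block on the
 * condition variable (no busy waiting) until a package arrives. */
typedef struct node { char *line; struct node *next; } node;

typedef struct {
    node *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} queue;

void q_init(queue *q) {
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

void q_push(queue *q, char *line) {           /* producer: insert at tail */
    node *n = malloc(sizeof(node));
    n->line = line;
    n->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);        /* wake one waiting consumer */
    pthread_mutex_unlock(&q->lock);
}

char *q_pop(queue *q) {                       /* consumer: remove from head */
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)                   /* sleep instead of spinning */
        pthread_cond_wait(&q->nonempty, &q->lock);
    node *n = q->head;
    q->head = n->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    char *line = n->line;
    free(n);
    return line;
}
```

The while loop around pthread_cond_wait matters: a woken consumer must re-check the predicate, since another consumer may have emptied the queue first. The bounded-buffer extra credit adds a second condition variable for "not full" on the push side.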
Also, it should be implemented in a non-busy-waiting way (use “mutex lock + conditional variable” or “semaphore”).In this stage, the program needs to check the input arguments and print error messages if argument mismatch (see Section 6 execution for arguments). Then it should perform the initialization on data structure and launch the producer/consumers thread. Then the master will wait for all threads to join back (4.4 Master Finalize).The main functionality of the producer thread is to read the input file and pass the data into the shared queue (by a package). The file is required to be read by line granularity (one line at a time), thus each consumer will work on one line at a time. Since there are multiple consumers, if the EOF is reached, the producer should send notifications to consumers specifying there will be no more data. The producer terminates after sending those notifications. Note that the package is the information transferred between producer/consumers via shared queue. It should contain the data for consumers and other information if needed.The consumer will repeatedly check the queue for a new package, work on it for word count statistics , and then go back to get the next package. This will continue until receiving the EOF notification, then it will terminate. Besides package handling, it’s the consumer’s responsibility to synchronize the result with the global final histogram. Then after all consumers finish their work, the master thread could summarize it and generate the output.The word count statistics will check the beginning character of a word. The definition of a word is a continuous a-zA-Z character (shown in Fig. 3 below). Note that, all other characters are treated as space. The characters like “-”, “_” are not connecting words. Same as PA2, we don’t differentiate between uppercase and lowercase letters. Fig. 
4.4 Master Finalize
After the producer and consumers have joined back, the master thread writes the final result (the global histogram) into the output file named "result.txt". The output format should be "%c: %d\n", looping through the global histogram from 'a' to 'z' (shown in Fig. 4 below).

Fig. 4: the output file format

4.5 Log Printout
The program will also print a log file if the argument option is specified. The log file should be named "log.txt". The producer and consumers should print their execution information into the log:

Producer:
1. Print "producer\n" when the producer is launched.
2. Print "producer: %d\n" for the current line number (starting from 0).

Consumer:
1. Print "consumer %d\n" when launched, with the consumer id (0 to number of consumers minus 1).
2. Print "consumer %d: %d\n" for the consumer id and the line number it is currently working on.

Some other notes on writing the log:
– The print library functions are usually thread-safe, so you don't need to use a lock or worry about interleaved printing (unless you actually run into it).
– Since the execution order of threads is nondeterministic, you usually will not get a stable log printout.
– The EOF notification should be printed with line number "-1".

5. Extra Credits
Extra credit will be granted if a bounded-buffer shared queue is implemented. The application should choose the unbounded/bounded buffer via a command-line argument option, and the bounded buffer should have a configurable size specified by a command-line argument option.

6. Execution Syntax, Options
The "word count statistics" program takes arguments with the syntax:
$ ./wcs #consumer filename [option] [#queue_size]
– The second argument "#consumer" is the number of consumers the program will create.
– The third argument "filename" is the input file name.
– Options have only three possibilities: "-p", "-b", "-bp". 1) "-p" means printing: the program will generate the log in this case.
2) "-b" means bounded buffer (extra credit): the program will use it instead of the unbounded buffer. 3) "-bp" means both bounded buffer and log printing.
– The last argument is the queue size, used only with the bounded buffer (extra credit).

7. Testing
There are 4 test files in the "testing" folder; each covers a different scenario, specified in the first line of the test file. The solutions for the 4 test files are in the "testing/sol" folder.

8. Assumptions and Hints
– One line of the input file has at most 1024 characters.
– You are allowed to use the library file I/O functions (instead of the open/read/write syscalls).
– Use the keyword "extern" in the header file for global-variable consistency.
– Check whether a library function is thread-safe before using it in a concurrent-threads scenario.

9. Submission *(this section is modified from the Programming Assignment 2 document)
One student from each group should upload to Canvas a zip file containing your C source files, a makefile, and a README that includes the following details:
– Team names and x500s
– How to compile the program
– Your and your partner's individual contributions
– Any assumptions outside this document
– What exactly your program does
– Whether you have attempted the extra credit
The README file does not have to be long, but it must properly cover the above points. Your source code should provide appropriate comments for functions and critical code sections (such as their purpose and logical execution flow). At the top of your README file and each C source file, please include the following comment:
/*test machine: CSELAB_machine_name
 * date: mm/dd/yy
 * name: full_name1 , [full_name2]
 * x500: id_for_first_name , [id_for_second_name]
 */

10. Grading Policy
1. (5%) correct README content
2. (5%) appropriate code style and comments
3. (40%) passed all tests (there may be additional tests when grading)
4. (15%) correct shared queue data structure and functions (insert, extract, non-busy-waiting, etc.)
5. (5%) correct thread create and join usage
6. (10%) correct global histogram usage (e.g., synchronization)
7. (10%) correct producer functionality
8. (10%) correct consumer functionality
9. (10%) extra credit: bounded buffer

Socket programming is a way of connecting two nodes (sockets) on a network so they can communicate with each other. One node (the server socket) listens on a particular IP address and port number, while other nodes (client sockets) reach out to the server socket to build a connection.

In programming assignment 4 (PA4), you will extend the word counting application of programming assignment 2 (PA2) using socket programming. As in PA2, there will be a master process, multiple mapper processes, and a single reducer process. However, for communication you will use socket-based TCP connections instead of pipes, as shown in Figure 1 and Figure 2.

3. In PA4, you will make two executables (two separate, independent programs): a server program and a client program. The master process is the main process of the client program. The master process spawns multiple mapper processes, as in PA2. The reducer process is the main process of the server program. Note that the reducer process is no longer spawned by the master process. The reducer process spawns a thread whenever it establishes a new TCP connection with a client; this is called a multi-threaded server. "Server", "Reducer", and "Reducer process" are used interchangeably in this document. The client program has two types of clients: master clients and mapper clients. The master process and the mapper processes can each be a client. The master client implementation is extra credit; its details can be found in Section 5, Extra Credits. For now, let's first focus on the mapper clients. Each mapper process initiates a TCP connection to the server.
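On the other side of that connection, the multi-threaded reducer described above accepts each client and hands it to a new thread. A minimal sketch follows; the helper names (make_listen_socket, handle_client, serve) are illustrative assumptions, not part of the assignment's API:

```c
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <assert.h>

#define MAX_CONN 50   /* max concurrent connections, per the assumptions */

/* Per-connection thread: read requests, compute, send responses, close. */
static void *handle_client(void *arg) {
    int conn = *(int *)arg;
    free(arg);
    /* ... request/response loop per section 4 would go here ... */
    close(conn);
    return NULL;
}

/* Create a TCP socket bound to `port` and ready to accept clients.
 * Returns the listening descriptor, or -1 on failure. */
int make_listen_socket(unsigned short port) {
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(sd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(sd, MAX_CONN) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* Accept loop: one detached thread per connection. */
void serve(int sd) {
    printf("server is listening\n");
    for (;;) {
        int *conn = malloc(sizeof *conn);
        *conn = accept(sd, NULL, NULL);
        if (*conn < 0) { free(conn); continue; }
        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, conn);
        pthread_detach(tid);   /* thread exits after closing the client */
    }
}
```

Detaching each thread keeps the accept loop from having to join them; the thread simply exits after it closes its client connection, as section 3.1 requires.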
Once the connection is established, the mapper client sends deterministic requests (a fixed set of requests) to the server. "Mappers", "Mapper clients", and "Master's child processes" all denote the same thing. The relationship of client processes and server threads is shown in Figure 2.

< Figure 1: Process relationship of programming assignment 2 >
< Figure 2: Process and thread relationships of programming assignment 4 >

3.1 The server program
The server program is a multi-threaded server. The server is responsible for listening for and accepting connections from clients. The server waits for connections on the specified port. When it receives a connection, it should spawn a new thread to handle that connection. A thread is responsible for handling requests from a client: it reads the client's messages from the socket set up between them, does the necessary computation (refer to Section 4, Communication Protocol), and sends responses back to the client through the socket. The thread should exit after it closes the client connection.

< Figure 4: Information and data structures the server maintains >

The server maintains two important lists: azList and updateStatus. azList is a 1-D integer list and updateStatus is a 2-D integer list, as shown in Figure 4. The server saves the sum of all word count results from the mapper clients in the azList. The updateStatus table has 3 essential attributes: MapperID, Number of updates, and a check-in/out flag. A new entry in the updateStatus table is created when the server receives a CHECKIN request from a new mapper client (refer to the details of request commands in section 4.1 Request). Feel free to add more columns to the table if needed.

Table 1: Attributes of the updateStatus Table
– MapperID: unique mapper ID, which is greater than 0.
– Number of updates: incremented by 1 when receiving an UPDATE_AZLIST request from a mapper client. The value should equal the number of files the mapper client processes.
– Check-in/out flag: 1 when a mapper client is checked in; 0 when it is checked out.

3.2 The client program
The client program is responsible for initiating a connection with the server on the IP address and port number given to it. It consists of 3 phases.

[ Phase 1 ] File Path Partitioning – By Master Client
This is exactly the same as phase 1 of PA2. You are given the code (phase1.c and phase1.h) for phase 1. This code generates a folder "MapperInput" containing so-called mapper files named "Mapper_.txt". The mapper files contain a list of file paths. MapperID starts from 1 and increments by 1. Please refer to the PA2 write-up for further details of phase 1.

[ Phase 2 ] Deterministic Request Handling – By Mapper Clients
The master process assigns a unique mapperID to each mapper process as it spawns them. The mapper ID starts from 1 and increments by 1. Mapper processes run in parallel. Each mapper client sets up a TCP connection to the server and sends the following fixed set of requests sequentially, in a deterministic manner. A single TCP connection is used to send all of the following requests. Mapper clients can access their own mapper file in the MapperInput folder.
1. CHECKIN
2. UPDATE_AZLIST (if the mapper client has multiple file paths in its mapper file, this request is sent to the server multiple times; for example, if Mapper_1.txt contains 10 lines of file paths, the mapper client sends 10 UPDATE_AZLIST requests to the server, with the word count result of each file)
3. GET_AZLIST
4. GET_MAPPER_UPDATES
5. GET_ALL_UPDATES
6. CHECKOUT
For mapper clients to send any requests to the server, they must be checked in first. After check-in, they can send the other types of requests to the server. After exchanging messages with the server, they should check out, close the TCP connection, and exit. The master waits until all mapper processes have terminated. You can find details of the requests in Section 4.
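A mapper client's connection setup and request packing might be sketched as follows. REQ_LEN and the command constants below stand in for whatever the provided protocol.h actually defines; treat every name here as an assumption:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request layout from section 4.1: 1 command + 1 mapperID + 26 data ints. */
#define REQ_LEN 28
enum { CHECKIN = 1, UPDATE_AZLIST, GET_AZLIST, GET_MAPPER_UPDATES,
       GET_ALL_UPDATES, CHECKOUT };

/* Open a TCP connection to ip:port. Returns the socket, or -1 on failure. */
int connect_to_server(const char *ip, unsigned short port) {
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (connect(sd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* Pack one request in the integer-array wire format:
 * req[0] = command, req[1] = mapperID, req[2..27] = data (zeros if none). */
void build_request(int req[REQ_LEN], int cmd, int mapperID, const int data[26]) {
    memset(req, 0, REQ_LEN * sizeof(int));
    req[0] = cmd;
    req[1] = mapperID;
    if (data)
        memcpy(&req[2], data, 26 * sizeof(int));
}
```

The mapper would then call build_request once per step of the fixed sequence above (with the per-file histogram as data for each UPDATE_AZLIST) and write each array over the single connection.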
[ Phase 3 ] Dynamic Request Handling – By Master Clients
Phase 3 is extra credit. You will add the master client functionality. After the master process makes sure all mapper processes have terminated, it sends requests dynamically by reading commands from a file named "commands.txt". It uses separate TCP connections to send the requests. You can find the details in Section 5, Extra Credits.

4. Communication Protocol
The communication protocol is an application-layer protocol formed of requests and responses. Both requests and responses are integer arrays. Requests are sent by the client and received by the server. After the server does the necessary computation, it responds to the client. You can find the details of the requests and responses in this section.

4.1 Request
The request structure is as follows. You can find the relevant definitions in the given protocol.h and use them in your implementation.

Table 2: Request Structure
– Request Command (1 integer): specifies the request command.
– MapperID (1 integer): specifies the mapperID (-1 for the master client).
– Data (26 integers): relevant data for the command; if there is no data, fill with zeros.

Table 3: Request Command Codes
1. CHECKIN – data: zeros – sender: mapper clients
2. UPDATE_AZLIST – data: word count result of a file – sender: mapper clients
3. GET_AZLIST – data: zeros – sender: mapper clients, master clients (extra credit)
4. GET_MAPPER_UPDATES – data: zeros – sender: mapper clients
5. GET_ALL_UPDATES – data: zeros – sender: mapper clients, master clients (extra credit)
6. CHECKOUT – data: zeros – sender: mapper clients

Details of each request are as follows.
● 1. CHECKIN
  o [Client] Mapper clients should send this request before sending any other type of request.
  o [Server] The server creates a new entry in the updateStatus table for a new mapper client if a corresponding entry does not exist in the table. If there is an existing entry, the server simply changes the check-in/out field to checked-in (1).
● 2. UPDATE_AZLIST
  o [Client] Mapper clients send this to the server with PER-FILE word count results. If there are multiple file paths in the mapper file, this request should be sent once per file path. If there are no files in the mapper file, mapper clients SHOULD NOT send this message.
  o [Server] The server adds the word count results into the azList and increments the "number of updates" field of the updateStatus table by 1 for the corresponding mapper client. The number of updates for a particular mapper client should equal the number of file paths in that mapper's mapper file.
● 3. GET_AZLIST
  o [Client] This request can be sent by both mapper clients and master clients (extra credit). If a mapper client wants to send this request to the server, it must already be checked in. Master clients, on the other hand, can send it without checking in (extra credit).
  o [Server] The server returns the current values of the azList.
● 4. GET_MAPPER_UPDATES
  o [Client] Only mapper clients can send this request, and they must already be checked in.
  o [Server] The server returns the current value of the "number of updates" field of the updateStatus table for the corresponding mapper ID.
● 5. GET_ALL_UPDATES
  o [Client] This request can be sent by both mapper clients and master clients (extra credit). If a mapper client wants to send this request to the server, it must already be checked in. Master clients, on the other hand, can send it without checking in (extra credit).
  o [Server] The server returns the sum of all values of the "number of updates" field in the updateStatus table.
● 6. CHECKOUT
  o [Client] This is the last request sent by a mapper client. After getting a response, the mapper client closes its TCP connection and terminates its own process.
  o [Server] The server updates the check-in/out field of the updateStatus table to checked-out (0).

4.2 Response
The response structure is as follows. You can find the relevant definitions in the given protocol.h.
You can use them in your implementation.

Table 4: Response Structure
– Request code (1 integer): specifies the type of request.
– Response code (1 integer): specifies the response code.
– Data (1 or 26 integers): relevant data from the server.

Table 5: Response Codes
– 0 RSP_OK (Success): for successful requests.
– 1 RSP_NOK (Error): for unsuccessful requests.

Table 6: Data Returned on Success
1. CHECKIN: 1 value (mapperID)
2. UPDATE_AZLIST: 1 value (mapperID)
3. GET_AZLIST: 26 word count values (the azList values at the moment the request is received, regardless of mapperID)
4. GET_MAPPER_UPDATES: 1 value (the "number of updates" field of the updateStatus table for the corresponding mapperID)
5. GET_ALL_UPDATES: 1 value (the sum of all entries of the "number of updates" field in the updateStatus table)
6. CHECKOUT: 1 value (mapperID)

The server should handle various error cases, such as:
– receiving a request with an unknown request code
– a mapper ID that is not greater than zero
– no corresponding entry in the updateStatus table
– a mapper client sending a CHECKIN request while already checked in
– a mapper client sending a CHECKOUT request while not checked in
– request permission violations, etc.

4.3 Log Printout
The server program prints the following logs in the terminal. The client program prints the following logs in a log file "log_client.txt" in the "log" folder (refer to Section 6, Folder Structure). Please stick closely to this output format. The following items are the minimal requirements to print; you can print additional logs or messages for failure cases. In addition, you can use the function createLogFile() to initialize a client log file in client.c. You can find example logs in the PA4_Appendix document, and other expected outputs and logs in the "Testcases/ExpectedResult" folder.

● When the server is ready to listen:
  ○ [Client] None
  ○ [Server] Print "server is listening\n".
● Establishing a TCP connection between a client and the server:
  ○ [Client] Print "[%d] open connection\n", mapperID.
  ○ [Server] Print "open connection from %s:%d\n", client_ip, client_port.
● Closing the TCP connection between a client and the server:
  ○ [Client] Print "[%d] close connection\n", mapperID.
  ○ [Server] Print "close connection from %s:%d\n", client_ip, client_port.
● 1. CHECKIN
  ○ [Client] Print "[%d] CHECKIN: %d %d\n", mapperID, Response Code (response[1]), Data (response[2]).
  ○ [Server] Print "[%d] CHECKIN\n", mapperID (request[1]).
● 2. UPDATE_AZLIST
  ○ [Client] Print "[%d] UPDATE_AZLIST: %d\n", mapperID, total number of messages sent to the server. Print this log after the mapper client has sent all of its UPDATE_AZLIST requests.
  ○ [Server] None
● 3. GET_AZLIST
  ○ [Client] Print "[%d] GET_AZLIST: %d\n" with mapperID, Response Code (response[1]), and all data received from the server (26 numbers) on the same line (1 space between numbers).
  ○ [Server] Print "[%d] GET_AZLIST\n", mapperID (request[1]).
● 4. GET_MAPPER_UPDATES
  ○ [Client] Print "[%d] GET_MAPPER_UPDATES: %d %d\n", mapperID, Response Code (response[1]), Data (response[2]).
  ○ [Server] Print "[%d] GET_MAPPER_UPDATES\n", mapperID (request[1]).
● 5. GET_ALL_UPDATES
  ○ [Client] Print "[%d] GET_ALL_UPDATES: %d %d\n", mapperID, Response Code (response[1]), Data (response[2]).
  ○ [Server] Print "[%d] GET_ALL_UPDATES\n", mapperID (request[1]).
● 6. CHECKOUT
  ○ [Client] Print "[%d] CHECKOUT: %d %d\n", mapperID, Response Code (response[1]), Data (response[2]).
  ○ [Server] Print "[%d] CHECKOUT\n", mapperID (request[1]).

5. Extra Credits
You can earn extra credit by implementing the master client's dynamic request handling (Phase 3 of the client program). Unlike the deterministic request handling by the mapper clients, the master client dynamically sends requests by reading request commands from a command file named "commands.txt". The master client initiates a new TCP connection to the server after it makes sure all mapper processes have terminated.
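That dynamic handling might be sketched as follows: read one command per line of commands.txt and send each over its own short-lived connection with mapperID = -1. The helper names and the 28-int request layout below are illustrative assumptions (the real definitions belong in your code and the provided protocol.h):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define REQ_LEN 28

/* Read up to `max` integer commands from a commands.txt-style stream. */
int read_commands(FILE *f, int *cmds, int max) {
    int n = 0;
    while (n < max && fscanf(f, "%d", &cmds[n]) == 1)
        n++;
    return n;
}

/* One short-lived TCP connection per request, as the extra credit requires. */
void master_phase3(const char *ip, unsigned short port) {
    FILE *f = fopen("commands.txt", "r");
    if (!f) return;
    int cmds[64];
    int n = read_commands(f, cmds, 64);
    fclose(f);
    for (int i = 0; i < n; i++) {
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);
        int sd = socket(AF_INET, SOCK_STREAM, 0);
        if (sd < 0 || connect(sd, (struct sockaddr *)&addr, sizeof addr) < 0)
            break;
        int req[REQ_LEN] = {0}, rsp[REQ_LEN];
        req[0] = cmds[i];
        req[1] = -1;                               /* master client ID */
        if (write(sd, req, sizeof req) == (ssize_t)sizeof req) {
            ssize_t got = read(sd, rsp, sizeof rsp);  /* server replies even to bad requests */
            (void)got;
        }
        close(sd);                                 /* fresh connection next time */
    }
}
```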
When the master client sends a request, the mapperID field of the request should be set to -1. Requests from the master client do not require check-in and check-out, unlike mapper clients; instead, the master client should set up a new TCP connection for each request and close it after getting the response. For example, with the following commands.txt, the client will get successful responses only for request commands 3 (GET_AZLIST) and 5 (GET_ALL_UPDATES). For the other requests, it will get unsuccessful responses from the server. You can assume that 2 (UPDATE_AZLIST) does not appear in commands.txt.

1
3
4
5
6
7
< example commands.txt file >

< Figure 5: Extra Credit – Master Client >

6. Folder Structure
Please strictly conform to the folder structure; conformance will be graded. You should have two project folders named "PA4_Client" and "PA4_Server". "PA4_Client" should contain the "include", "src", "log", "Testcases", and "MapperInput" folders. "PA4_Server" should contain "include" and "src". Feel free to modify the provided Makefiles.
● "include" folder: all .h header files
● "src" folder: all .c source files
● "log" folder: log_client.txt
● "Testcases" folder: 5 TestCase folders and an ExpectedResult folder are provided for your testing. Your program will be tested with additional hidden TestCases. The TestCases are located in this folder. Please DO NOT include this folder in the final deliverable.
● "MapperInput" folder: this folder is created as part of phase 1 of the client program. Please do not include this folder in the final deliverable.
● Each top folder ("PA4_Client" and "PA4_Server") should contain the following files:
  ○ Makefile
  ○ executable
  ○ commands.txt (only in PA4_Client, if you attempt the extra credit)

7. Execution Syntax
The usage of your server program is as follows. The executable name should be "server".
./server <port>
● <port> is any unsigned integer to be used as the port number.
The usage of your client program is as follows.
The client executable name should be "client".
./client <folderName> <#mappers> <serverIP> <serverPort>
● <folderName>: the name of the root folder to be traversed.
● <#mappers>: the number of mapper processes spawned by the master.
● <serverIP>: the IP address of the server to connect to.
● <serverPort>: the port number of the server to connect to.

8. Assumptions and Hints
– You should use TCP sockets.
– The maximum number of mapper clients per master client is 32.
– The maximum number of concurrent connections at the server side is 50.
– Mapper IDs should be greater than zero and unique within a client program.
– A client connects to a single server at any time.
– The server handles multiple client requests at a time.
– The server program does not terminate unless it is killed by a user.
– Request and response messages are integer arrays.
– The given phase 1 code handles symbolic folders and files.
– Your server will get all types of requests, both successful and unsuccessful, so you need to handle various errors at the server side.
– The commands.txt file may contain any integers (>0) except for 2.
– Start by working on a local host for both client and server programs, then try it on multiple machines later.

9. Submission
One student from each group should upload to Canvas a zip file containing the two project folders ("PA4_Client" and "PA4_Server") and a README.md that includes the following details:
– Team names and x500s
– How to compile the client and server programs
– How to run the client and server programs
– Your and your partner's individual contributions
– Any assumptions outside this document
– Whether you have attempted the extra credit
The README.md file does not have to be long, but it must properly cover the above points. Your source code should provide appropriate comments for functions. At the top of your README.md file and each C source file, please include the following comments:
/*test machine: CSELAB_machine_name
 * date: mm/dd/yy
 * name: full_name1 , [full_name2]
 * x500: id_for_first_name , [id_for_second_name]
 */

10. Grading Policy
1. (5%) Correct README content
2. (5%) Appropriate code style and comments
3. (10%) Conformance check
   a. (5%) Folder structure and executable names
   b. (5%) Log format for client logs and server logs
4. (30%) TCP connection setup between clients and the server
   a. (10%) Set up a TCP connection between a mapper client and the server
   b. (10%) Use threads at the server side
   c. (10%) Set up multiple TCP connections between mappers and the server
5. (30%) Deterministic request handling between mapper clients and the server
   a. (15%) Mapper client side
   b. (15%) Server side
6. (10%) Error handling
7. (10%) Successful executions of the client and server programs on different machines
8. (10%) Extra credit – master client's dynamic request handling
   a. (5%) Set up a TCP connection between the master client and the server
   b. (5%) Read the request commands, then send and receive them
1. Install Standard ML of New Jersey on your computer.
2. Write/run SML functions in the Standard ML of New Jersey interactive system.
3. Write the SML functions in a text editor, then run the code in the Standard ML of New Jersey interactive system.

1. Write a function cube of type int -> int that returns the cube of its parameter.
2. Write a function sqsum of type int -> int that takes a non-negative integer n and returns the sum of the squares of all the integers 0 through n. Your function need not behave well on negative input.
3. Write a function max of type int list -> int that returns the largest element of a list of integers. Your function need not behave well if the list is empty. (Refer to exercise 13 on page 85 of the textbook.)

Step One: Install Standard ML of New Jersey on your computer. To install on Windows, please watch the video https://www.youtube.com/watch?v=hDkG2aH2iqU To install on Mac, you may watch the video at https://youtu.be/qdUnCK85CoI

Step Two: Write all functions in a text editor and save the file as YourNameProjOne.sml.

Step Three: Run your code and test its correctness with data. Open a terminal window. Use the command ls to see the folders/files in the current folder. Use the command cd .. to go up one level. Use the command cd folder_name to go into a subfolder. Navigate to the folder where you saved the YourNameProjOne.sml file. Now type the command: sml
Then type the command: use "YourNameProjOne.sml"
The file name must be in quotation marks. If everything is correct, your functions are ready to use.

Sample Run on My Computer: my file was saved as CSCI461ProjOne.sml in the folder Documents/CSCI461VA/Project. The file name is CSCI461ProjOne.sml. It contains the following code:
(* this is problem one *)
fun cube x = x*x*x;
(* this is problem two *)
fun cube2 x:real = x*x*x;
Here (* ... *) are comments. Here is the screen shot of what I have done:

The due date will be announced on Blackboard.

Write a quick sort function of type int list -> int list. Here is a review of the quick sort algorithm.
First, pick an element and call it the pivot. (The head of the list is an easy choice for the pivot.) Partition the rest of the list into two sublists: one with all the elements less than the pivot, and another with all the elements not less than the pivot. Recursively sort the sublists. Combine the two sublists and the pivot into the final sorted list.

The solution must follow the style of the merge sort function on page 114. The helper functions must be defined inside let.

Sample Run: please see the screen shot below for the sample run.
File to turn in: save your function in a file named YourNameProjTwo.sml and submit it via the Blackboard link.
Due Date: will be announced on Blackboard.

1. Apply the concepts of folding
2. Apply the concepts of map
3. Implement mymap, which works like the built-in map function in SML
4. Apply foldr, foldl, and mymap to solve some problems in one line of code

The student shall turn in five functions. You should use mymap1 and mymap2 in problems 3, 4, and 5 where needed. However, the best approach is to use the built-in map to finish problems 3, 4, and 5 first. After they work, replace map with mymap1 or mymap2. If they still work, you can say that you implemented them all correctly. If they don't work, then you know that your mymap1 or mymap2 may be incorrect.

1. Define a function mymap1 with the same type and behavior as the built-in map function, but without using map. This should be one line of code using foldr or foldl. (Refer to Exercise 24 on page 147.)
2. Define a function mymap2 with the same type and behavior as map. No map, foldr, or foldl can be used. (Refer to Exercise 26 on page 147.)
3. Write a function ordlist of type char list -> int list that takes a list of characters and returns the list of integer codes of those characters. For example, if you evaluate ordlist [#"A", #"b", #"C"] you should get [65, 98, 67]. (Refer to Exercise 2 on page 144.)
4. Write a function mylength of type 'a list -> int that returns the length of a list.
You cannot use the built-in length function. (Refer to Exercise 11 on page 145.)
5. Write a function max of type int list -> int that returns the largest element of a list of integers. Your function need not behave well if the list is empty.

Sample Run: please see the screen shot on the next page.
Due Date: the file needs to be submitted through the Blackboard link on time. The file name should be YourNameProjThree.sml. The due date will be announced on Blackboard.

• Write data constructors with parameters
• Write type constructors with parameters
• Write recursively defined type constructors

The student shall turn in the definition of mylist and three functions that handle mylist. In the YourNameProjFour.sml file, the student needs to write the following type and functions:
1. Define the mylist type that is the same as the one on page 172 of the textbook.
2. Define a function prod that returns the product of all integers in a mylist of integers. (Refer to Exercise 6 on page 178.)
3. Write a function append of type 'a mylist -> 'a mylist -> 'a mylist that takes two mylist values, a and b, and returns a mylist containing all the values in a followed by all the values in b. (Refer to Exercise 8 on page 178.)
4. Write a function reverse of type 'a mylist -> 'a mylist that takes a mylist x and returns a mylist of all elements of x, in reverse order. (Refer to Exercise 7 on page 178.)

Please see the screen shot on the next page. Please notice how infixr 5 is used (refer to the examples on page 174). Also note that the result list may not be fully displayed. For instance, a list with elements 1, 2, 3, 4, 5 may only display as 1 CONS 2 CONS 3 CONS # CONS #.
Due Date: the file needs to be submitted through the Blackboard link on time. The file name should be YourNameProjFour.sml. The due date will be announced on Blackboard.

Please note that the students need to answer the following two questions:
1. How many solutions does it print?
2.
How many of them are distinct?

Then the student needs to modify the program so that only the distinct solutions are printed.

Instructions on how to write and run the SWI-Prolog program:
Step One: Write your program using any text editor. Save the program as YourNameProjFive.swipl
Step Two: Open a terminal window. Use the cd command to navigate to the folder where you saved your program.
Step Three: Type swipl. The SWI-Prolog system will run.
Step Four: Type consult('YourNameProjFive.swipl'). (must have a period at the end)
Step Five: Type length(X, 7), solution([w, w, w, w], X). (end with a period) Type a semicolon after each solution to make it print them all.

Add the following features to the adventure game from chapter 20 of the textbook.
1. There is a gate between the fork in the path and the mountaintop. The gate is a separate location; that is, the player must move from at(you, fork) to at(you, gate) and then to at(you, mountaintop).
2. To move forward through the gate, the player must first unlock it with a key.
3. The key is somewhere in the maze. The player must find it and explicitly pick it up.
4. If the player tries to pass through the gate while still holding the key, he or she is killed by lightning. (To get the treasure, the player must first open the gate, then put down the key, and then pass through.)

Start from the code in this chapter, which is copied below (it is also on the book's web site; a link is provided in this module's folder). Part of your implementation should be a general way for the player to pick things up, carry them, and put them down. Design your solution so that it would be easy to add additional objects for the player to pick up and put down.

/* This is a little adventure game. There are three entities:
   you, a treasure, and an ogre. There are six places: a valley,
   a path, a cliff, a fork, a maze, and a mountaintop.
   Your goal is to get the treasure without being killed first. */

/* First, text descriptions of all the places in the game. */
description(valley, 'You are in a pleasant valley, with a trail ahead.').
description(path, 'You are on a path, with ravines on both sides.').
description(cliff, 'You are teetering on the edge of a cliff.').
description(fork, 'You are at a fork in the path.').
description(maze(_), 'You are in a maze of twisty trails, all alike.').
description(mountaintop, 'You are on the mountaintop.').

/* report prints the description of your current location. */
report :-
  at(you, X),
  description(X, Y),
  write(Y), nl.

/* These connect predicates establish the map. The meaning of
   connect(X, Dir, Y) is that if you are at X and you move in
   direction Dir, you get to Y. Recognized directions are
   forward, right, and left. */
connect(valley, forward, path).
connect(path, right, cliff).
connect(path, left, cliff).
connect(path, forward, fork).
connect(fork, left, maze(0)).
connect(fork, right, mountaintop).
connect(maze(0), left, maze(1)).
connect(maze(0), right, maze(3)).
connect(maze(1), left, maze(0)).
connect(maze(1), right, maze(2)).
connect(maze(2), left, fork).
connect(maze(2), right, maze(0)).
connect(maze(3), left, maze(0)).
connect(maze(3), right, maze(3)).

/* move(Dir) moves you in direction Dir, then prints the
   description of your new location. */
move(Dir) :-
  at(you, Loc),
  connect(Loc, Dir, Next),
  retract(at(you, Loc)),
  assert(at(you, Next)),
  report,
  !.

/* But if the argument was not a legal direction, print an
   error message and don't move. */
move(_) :-
  write('That is not a legal move.\n'),
  report.

/* Shorthand for moves. */
forward :- move(forward).
left :- move(left).
right :- move(right).

/* If you and the ogre are at the same place, it kills you. */
ogre :-
  at(ogre, Loc),
  at(you, Loc),
  write('An ogre sucks your brain out through '),
  write('your eye sockets, and you die.\n'),
  retract(at(you, Loc)),
  assert(at(you, done)),
  !.

/* But if you and the ogre are not in the same place, nothing happens.
*/ ogre. /* If you and the treasure are at the same place, you win. */ treasure :- at(treasure,Loc), at(you,Loc), write(‘There is a treasure here. ’), write(‘Congratulations, you win! ’), retract(at(you,Loc)), assert(at(you,done)), !. /*But if you and the treasure are not in the same place, nothing happens. */ treasure. /* If you are at the cliff, you fall off and die. */ cliff :- at(you,cliff), write(‘You fall off and die. ’), retract(at(you,cliff)), assert(at(you,done)), !. /* But if you are not at the cliff nothing happens. */ cliff. /* Main loop. Stop if player won or lost. */ main :- at(you,done), write(‘Thanks for playing. ’), !. /* Main loop. Not done, so get a move from the user and make it. Then run all our special behaviors. Then repeat. */ main :- write(‘ Next move — ‘), read(Move), call(Move), ogre, treasure, cliff, main. /* This is the starting point for the game. We assert the initial conditions, print an initial report, then start the main loop. */ go :- retractall(at(_,_)), % clean up from previous runs assert(at(you,valley)), assert(at(ogre,maze(3))), assert(at(treasure,mountaintop)), write(‘This is an adventure game. ’), write(‘Legal moves are left, right, or forward. ’), write(‘End each move with a period. ’), report, main.
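One possible shape for the general pick-up/put-down machinery the assignment asks for is sketched below. This is only a hedged starting point, not the required solution: the predicate names take, drop, and holding are my own choices, and the gate, key, and lightning logic are left entirely to you.

```prolog
/* Hypothetical sketch: holding(Thing) records what the player carries. */
:- dynamic(holding/1).

/* take(Thing): pick up a Thing that is at the player's location. */
take(Thing) :-
    at(you, Loc),
    at(Thing, Loc),
    retract(at(Thing, Loc)),
    assert(holding(Thing)),
    write('You are now holding the '), write(Thing), write('. '), !.
take(_) :-
    write('There is nothing like that here. ').

/* drop(Thing): put down a Thing the player is holding. */
drop(Thing) :-
    holding(Thing),
    at(you, Loc),
    retract(holding(Thing)),
    assert(at(Thing, Loc)),
    write('You put down the '), write(Thing), write('. '), !.
drop(_) :-
    write('You are not holding that. ').
```

With this shape, adding a new portable object only requires asserting its initial location in go, which is the extensibility the assignment asks you to design for.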
For this assignment, you will use bash to create a simple course catalog that lets administrators update the details of each course offered by their department. The system will store basic information about each course, allowing the user to create, read, update, and delete courses. This assignment requires only the utilities used so far in the lecture notes. Do not use sed, awk, or any programming languages or utilities not yet covered in the lecture notes.

Tip: Filenames in the native Linux filesystem are case-sensitive. Thus, without proper consideration, a course whose associated filename was created as cs3423.crs (based on lowercase input given for the course's department code at the time of its creation) will not also be associated with the file CS3423.crs. This is not desirable behavior, as users may unexpectedly enter either uppercase or lowercase department codes when using your script. Therefore, you may utilize a bash feature (known as "case modification" parameter expansion) that enables you to easily convert the contents of a variable to its uppercase equivalent. For example, this feature can be used to produce the capitalized contents of the variable containing a department code by simply writing ${dept_code^^} instead of $dept_code when building the filename you will use for a particular course.

Tip: Note that when prompted to enter a department code and course number, you should enter these values as two separate tokens separated by a space. Otherwise, it would be necessary to perform complex token-splitting to separate these values, which is outside the scope of this assignment. For example, when prompting the user:

Enter a department code and course number:

you should enter "cs 3423", not "cs3423"!

Storing Course Information

Course information will be stored in text files.

1. Files will be stored inside a directory called data within the same directory as your script.
2.
Each file will be named based on the combination of a department code and a course number, which consists of two or three letters followed by an integer with exactly four digits, followed by the extension .crs (notice that the file does not end in .txt; does that need to be accounted for?).
3. A course file consists of exactly five lines:
   • dept_code (two or three letter abbreviation) dept_name (string with possible whitespace)
   • course_name (string with possible whitespace)
   • course_sched (string consisting precisely of either "MWF" or "TH") course_start (string with no whitespace) course_end (string with no whitespace)
   • course_hours (credit hours, unsigned integer)
   • course_size (enrolled students, unsigned integer)

   * Department names may contain whitespace. You should account for names with multiple tokens (e.g., "ESL English as a Second Language" =⇒ dept_code = "ESL" and dept_name = "English as a Second Language").

4. Example file named esl3053.crs:

ESL English as a Second Language
Literacy in a Second Language
MWF 8/26/19 12/13/19
3
52

Assignment 1: Shell Scripting Page 1 of 6

Script Execution

When the script is run, the following should occur. All script output should appear exactly as it appears below.

1. Upon running your script, the user should be presented with the following menu:

Enter one of the following actions or press CTRL-D to exit.
  C – create a new course record
  R – read an existing course record
  U – update an existing course record
  D – delete an existing course record
  E – update enrolled student count of existing course
  T – show total course count

2. The user then enters a one-character action (upper or lowercase), leading to one of the following.

• C: a course is created
  (a) From the terminal, read the following one at a time:
      i. Department code (two-to-three character string)
      ii. Department name (string possibly containing whitespace)
      iii. Course number (integer)
      iv. Course name (string possibly containing whitespace)
      v.
Course schedule (string ∈ {MWF, TH})
      vi. Course start date (string with slashes)
      vii. Course end date (string with slashes)
      viii. Course credit hours (unsigned integer)
      ix. Initial course enrollment (unsigned integer)
  (b) Using the values entered by the user, create a new file in the data folder based on the instructions above.
  (c) Update data/queries.log by adding the following line:
      [date] CREATED: dept_code course_num course_name
      where date is the output from the date command and dept_code, course_num, and course_name are the corresponding values.
  (d) If the course already exists, print the following error and continue with the script. The script should accept all nine inputs before checking if the record exists.
      ERROR: course already exists

• R: read an existing course's information
  (a) Prompt the user for a course department and course number (e.g., "cs 3423"):
      Enter a department code and course number:
  (b) Search for the specified course using the provided department and number (e.g., "cs 3423").
  (c) Print the course information in the following format:
      Course department: dept_code dept_name
      Course number: course_num
      Course name: course_name
      Scheduled days: course_sched
      Course start: course_start
      Course end: course_end
      Credit hours: course_hours
      Enrolled Students: course_size
  (d) If the course is not found, print the following error instead and continue with the script.
      ERROR: course not found

• U: update an existing course record
  (a) Prompt the user for the following one at a time:
      i. Department code (two-to-three character string)
      ii. Department name (string possibly containing whitespace)
      iii. Course number (integer)
      iv. Course name (string possibly containing whitespace)
      v. Course meeting days (string ∈ {MWF, TH})
      vi. Course start date (string with slashes)
      vii. Course end date (string with slashes)
      viii. Course credit hours (unsigned integer)
      ix.
Course enrollment (unsigned integer)
  (b) Search for the specified course using the course department and course number (e.g., "cs 3423").
  (c) Update each of the corresponding fields based on the user input. If the user input is blank for a particular field, keep the original value from the file.
  (d) Update data/queries.log by adding the following line:
      [date] UPDATED: dept_code course_num course_name
      where date is the output from the date command and dept_code, course_num, and course_name are the corresponding values.
  (e) If the course is not found, print the following error and continue with the script. The script should accept all nine inputs before checking if the record exists.
      ERROR: course not found

• D: delete an existing course
  (a) Prompt the user for a string representing the course department and number (e.g., "cs 3423"):
      Enter a course department code and number:
  (b) Delete the specified course's file.
  (c) Update data/queries.log by adding the following line:
      [date] DELETED: dept_code course_num course_name
      where date is the output from the date command and dept_code, course_num, and course_name are the corresponding values.
  (d) Print the following message to stdout with the course's number:
      course number was successfully deleted.
  (e) If the course is not found, print the following error instead and continue with the script.
      ERROR: course not found

• E: update the number of enrolled students for an existing course
  (a) Prompt the user for a string representing the course department and number (e.g., "cs 3423"):
      Enter a course department code and number:
  (b) Prompt the user for an enrollment change amount (e.g., entering a value of 3 would represent enrolling three new students in the class, while -2 would reflect two students having dropped the course):
      Enter an enrollment change amount:
  (c) Search for the specified course using the course department and number.
  (d) Update the course record by adding the new enrollment count to the course's current enrollment count.
Negative values are allowed (i.e., students could drop the class).
  (e) Update data/queries.log by adding the following line:
      [date] ENROLLMENT: dept_code course_num course_name changed by change_amt
      where date is the output from the date command and dept_code, course_num, course_name, and change_amt are the corresponding values.
  (f) If the course record is not found, print the following error and continue with the script. The script should accept the enrollment change amount before checking if the record exists.
      ERROR: course not found

• T: print the total number of course records
  (a) Print the total number of .crs files within the data directory:
      Total course records: total
      where total is the total number of .crs files.

• If an invalid character is entered, print the following error and continue with the script.
      ERROR: invalid option

3. After an action is completed, display the menu again. This should go on indefinitely until CTRL-D is pressed or the end of a file is reached.

Assignment Data

An initial data set can be found in /usr/local/courses/ssilvestro/cs3423/Fall19/assign1. Copy this to your own assignment's directory.

Script Files

Your program should consist of seven bash files with the following names (case-sensitive):
• assign1.bash – the main file which is initially invoked
• create.bash – logic for the create option
• read.bash – logic for the read option
• update.bash – logic for the update option
• delete.bash – logic for the delete option
• enroll.bash – logic for the update enrollment option
• total.bash – logic for the total option

Verifying Your Program

Your program must work with the input provided in a1Input.txt. To test it:
1. Verify that your assignment folder has a data directory with the initial data set.
2. Execute your script and redirect a1Input.txt into it. You should not be copying or typing the contents of a1Input.txt into your terminal. Redirection must work.
3.
Verify that the output and files are as expected.

Submission

Turn your assignment in via Blackboard. Your zip file, named abc123.zip (with your personal abc123), should contain only your seven bash files.

For this assignment, you will use sed, bash, and the other utilities you have used in class to create a program for use by a municipality, both for redacting sensitive information from internal communications and for simplifying and standardizing the format of these documents prior to their release and circulation. Your program should take the names of one or more files that are to be redacted as command-line arguments. This assignment requires only sed, bash, and the other utilities used so far in class. Do not use awk, Python, or any other languages/utilities.

Redaction and Substitution Rules

For all files specified, the following changes should be made in place. No other changes should be made to the file.

• Driver's License numbers begin with xxDL, where xx is a two-letter state code identifying the issuing state. Following this code is a space character, then a license number composed of at least six digits. For example, the following are all valid driver's license numbers:
  – TXDL 12345678
  – VADL 123456
  – WADL 1234567890
These numbers should be redacted by simply replacing the license number with a sequence of six X characters.

• Credit Card Numbers must be redacted, but it is desirable that the censored text still retain information suitable for identifying both the type of card (i.e., the issuer) and its last four digits. Each card network specifies a unique first digit for the cards it issues, and these cards typically contain 16 digits in total. American Express is an exception, in which the second digit can only be a 4 or a 7, and the total number of digits is only 15.
In summary:
  – Visa cards: begin with a 4 and have 16 digits
  – MasterCard cards: begin with a 5 and have 16 digits
  – American Express cards: begin with a 3, followed by a 4 or a 7, and have 15 digits
  – Discover cards: begin with a 6 and have 16 digits

Card number data should be redacted as in the following examples. Note that 16-digit numbers may or may not be separated into groups of four using hyphens – this is optional. Hyphenation of American Express cards into sections of 4, 6, and 5 digits is similarly common, and also optional. Examples of the expected substitutions for each card type are as follows:

Assignment 2: sed Page 1 of 4

  – 5441-4839-9284-3129 → MC-3129
  – 3770-123456-78900 → AMEX-8900
  – 6093-2033-0662-5389 → DISC-5389
  – 4291723799801302 → VISA-1302

• Texas Vehicle License Plate numbers should be similarly obliterated. Texas vehicle plates appear in one of two formats, but both will be written with the letters TX and an optional space preceding them. The first type is six alphanumeric characters, optionally separated by a hyphen in the middle. The second type begins with three alphabetic characters, followed by four digits, again optionally separated by a hyphen. Examples of valid type-one Texas license plate numbers are:
  – TX 32P9ZP
  – TX 32P-9ZP
  – TX32P9ZP
  – TX32P-9ZP
Examples of valid type-two Texas license plate numbers are:
  – TX JTK8791
  – TX JTK-8791
  – TXJTK8791
  – TXJTK-8791
These numbers should be redacted by simply replacing the license plate number with a sequence of six X characters.

• Current Date Placeholder: document authors may use a shorthand symbol in order to insert the current date (i.e., today's date). Regardless of the date on which your script is run, this placeholder should be updated with the correct current date.

• Municipality Name Placeholder: the authors of these documents may use a shorthand symbol in order to designate the full name of their municipality.
Any such references should be replaced with the full name: City of Gainsville, Florida.

Example

Original redactme.txt:

Date:
Title: Memorandum #139

Dear staff:

Memorandum #139 has been amended as follows, in accordance with the
updated employee operations and purchasing policy *:

The only employees authorized to operate vehicle #102 (license plate
TX JTK8791), vehicle #162 (license plate TX 32P-9ZP), and vehicle #262
(license plate TX AJC-6244) are those employees who possess the following
driver's licenses:

TXDL 02851332
TXDL 00748892
VADL 590401
FLDL 104281332

Further, usage of city credit cards will be strictly limited to the
following departmental cards, until further notice:

5441-4839-9284-3129
3770-123456-78900
6093-2033-0662-5389
4291723799801302

Thank you,
Mgmt

* Policy revision date 2/1/13 (originally passed 7/13/92).
Redacted Version of redactme.txt:

City of Gainsville, Florida
Date: 09/19/2019
Title: Memorandum #139

Dear staff:

Memorandum #139 has been amended as follows, in accordance with the
updated employee operations and purchasing policy *:

The only employees authorized to operate vehicle #102 (license plate
TX XXXXXX), vehicle #162 (license plate TX XXXXXX), and vehicle #262
(license plate TX XXXXXX) are those employees who possess the following
driver's licenses:

TXDL XXXXXX
TXDL XXXXXX
VADL XXXXXX
FLDL XXXXXX

Further, usage of city credit cards will be strictly limited to the
following departmental cards, until further notice:

MC-3129
AMEX-8900
DISC-5389
VISA-1302

Thank you,
Mgmt

* Policy revision date 2/1/13 (originally passed 7/13/92).

Script Execution

Your program should be invoked through a single bash file (see below) with the filename(s) containing the sensitive data as argument(s). Example:

$ assign2.bash redactme.txt

Assignment Data

A sample input file can be found in /usr/local/courses/ssilvestro/cs3423/Fall19/assign2. When using this data, remember that your changes will be made in place, overwriting the files. Be sure to make a backup of the files and restore them every time you run the script.

Script Files

Your program should consist of exactly two files:
• assign2.bash – the main file which is initially invoked
• exactly one .sed file which is used for a sed invocation run in assign2.bash

Verifying Your Program

Your program must work for arbitrary files by applying the rules above.
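For a concrete sense of what one of these rules looks like as a sed substitution, the driver's-license rule alone can be sketched as follows. This regex is my own sketch, not the official solution; the real script must implement all the rules above and edit the files in place rather than piping sample text.

```shell
# Hedged sketch: redact "two uppercase letters + DL", a space, then 6+ digits.
printf 'TXDL 12345678 and VADL 590401\n' |
  sed -E 's/([A-Z]{2}DL) [0-9]{6,}/\1 XXXXXX/g'
# prints: TXDL XXXXXX and VADL XXXXXX
```

The captured group keeps the state code and the "DL" marker while the digits are replaced, matching the redacted example output above.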
You can test your program with the input provided in redactme.txt and compare the output with redacted.txt using diff (check the man pages on how to use it). You should create your own test cases to test for the recursion feature.

Submission

Turn your assignment in via Blackboard. Your zip file, named abc123.zip (with your abc123), should contain only your bash and sed files.

Part A

For this part of the assignment, you will create a single command which will take the contents of a passwd file (usually found in /etc/passwd) and print it in sorted order by the user's last name (that is, their surname, not their username). Normally, you could solve this with the following options on sort:

$ sort -t: -k6 /path/to/passwd

You, however, must solve this problem with the utilities covered in class so far. You may (and should) use sort, but you may not use any of its options (e.g., -k, -t, etc.).

Example Input:

lkj293:x:1539:1543:Albert Einstein:/home/einstein:/bin/bash
kkr590:x:1540:1544:Elvis Presley:/home/presley:/bin/bash
nwk409:x:1541:1545:George Washington:/home/washington:/bin/bash
yaa265:x:1542:1546:Bruce Banner:/home/banner:/bin/bash
yhn211:x:1543:1547:George Harrison:/home/harrison:/bin/bash
lfa806:x:1544:1548:Jane Austen:/home/austen:/bin/bash
ilo709:x:1545:1549:Walt Disney:/home/disney:/bin/bash
rfd576:x:1546:1550:Buzz Aldrin:/home/aldrin:/bin/bash
lko889:x:1547:1551:Marie Curie:/home/curie:/bin/bash
cfq219:x:1548:1552:J. R. R. Tolkien:/home/tolkien:/bin/bash
ncz856:x:1549:1553:Christopher Columbus:/home/columbus:/bin/bash
pql747:x:1550:1554:Julius Caesar:/home/caesar:/bin/bash

Output:

rfd576:x:1546:1550:Buzz Aldrin:/home/aldrin:/bin/bash
lfa806:x:1544:1548:Jane Austen:/home/austen:/bin/bash
yaa265:x:1542:1546:Bruce Banner:/home/banner:/bin/bash
pql747:x:1550:1554:Julius Caesar:/home/caesar:/bin/bash
ncz856:x:1549:1553:Christopher Columbus:/home/columbus:/bin/bash
lko889:x:1547:1551:Marie Curie:/home/curie:/bin/bash
ilo709:x:1545:1549:Walt Disney:/home/disney:/bin/bash
lkj293:x:1539:1543:Albert Einstein:/home/einstein:/bin/bash
yhn211:x:1543:1547:George Harrison:/home/harrison:/bin/bash
kkr590:x:1540:1544:Elvis Presley:/home/presley:/bin/bash
cfq219:x:1548:1552:J. R. R. Tolkien:/home/tolkien:/bin/bash
nwk409:x:1541:1545:George Washington:/home/washington:/bin/bash

Assignment 3: awk Page 1 of 4

Script Execution (Part A)

Since the fox machines do not have useful /etc/passwd files (no first and last names), you will use the one provided with this assignment. Your submission will include a bash file (assign3A.sh) with exactly one line in it (you do not need a shebang), and it should take the path to the passwd file as its first argument. Do not include an awk file or any other files besides assign3A.sh.

$ assign3A.sh /path/to/passwd

Part B

For this part of the assignment, you will only use the utilities covered in class so far (primarily awk) to create a program for printing user process information. Do not use Python or any programs/utilities not covered in class.
Your program should take the output from ps -ef and print the following for each user having a username matching the abc123 format:
• Username
• List of commands

After listing statistics for each user, the program should print the following information for all users having a username matching the abc123 format:
• Line with earliest start time
• Line with latest start time

Do not use sed, Python, or any other languages/utilities not covered in class.

Example

The example below is an excerpt from the ps -ef command which your program should be able to take as input. Note that if a process began execution on a previous calendar day, its STIME value will not be in the usual "hours and minutes" format, but rather in "month and day" format. This should be accounted for properly, and thus a simple text/numerical comparison will not suffice.

Input:

UID       PID  PPID  C STIME TTY     TIME     CMD
adz110   5344  5334  0 08:47 pts/2   00:00:00 bash
dmq292   6908  6854  0 Jun04 pts/1   00:00:00 bash
adz110   7227  7150  0 Jul11 pts/9   00:00:00 who
erg474   7466  7461  0 08:54 pts/10  00:00:00 ls
dmq292   7966  7960  0 Jun04 pts/13  00:00:00 assign1.sh if of
xle135   8983  8636  0 08:59 pts/15  00:00:00 ssh ctf.cs.utsarr.net
zeh458   9057  1980  0 08:59 pts/7   00:00:00 vim prog.c
rslavin  9150  9139  0 08:59 pts/16  00:00:00 ps -af
xle135   8636  8628  0 08:58 pts/15  00:00:00 bash

Output:

User: adz110
bash
who
User: dmq292
bash
assign1.sh if of
User: erg474
ls
User: xle135
bash
ssh ctf.cs.utsarr.net
User: zeh458
vim prog.c

Earliest Start Time:
dmq292 6908 6854 0 Jun04 pts/1 00:00:00 bash

Latest Start Time:
xle135 8983 8636 0 08:59 pts/15 00:00:00 ssh ctf.cs.utsarr.net

Also, if there is a tie for earliest or latest start times, take the one with the UID that comes first alphabetically.

Hint: Consider using sort to help with grouping.
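As a starting point, the per-user grouping can be sketched in awk. This is a hedged simplification, not the full solution: the username pattern and field choices are my own, $8 captures only the first word of CMD (the real program must keep the whole command), the output order of `for (u in cmds)` is unspecified, and the start-time logic is left entirely to you.

```shell
# Hedged sketch: collect one command per line for each abc123-style user.
ps -ef | awk '$1 ~ /^[a-z][a-z][a-z][0-9][0-9][0-9]$/ {
                  cmds[$1] = cmds[$1] "\n" $8
              }
              END { for (u in cmds) print "User: " u cmds[u] }'
```

The idea is simply to key an associative array on the UID field and accumulate text per key, which is the same grouping pattern the hint about sort is nudging you toward.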
Script Execution (Part B)

Your program should be invoked through a single bash file (see below) with input taken from stdin. The resulting output should be printed directly to stdout.

$ assign3B.sh < ps.in

or

$ ps -ef | assign3B.sh

Assignment Data

Sample input files can be found in /usr/local/courses/ssilvestro/cs3423/Fall19/assign3.

Script Files

Your submission should consist of multiple files:
• assign3A.sh – a bash script with a single line of code (i.e., one command) for part A
• assign3B.sh – a bash script to invoke for part B
• assign3B.awk – the awk program used by assign3B.sh

Verifying Your Programs

Part A can be tested with the sample input provided in passwd.in. Part B can be tested with the sample input provided in ps.in. Your program should also work with arbitrary input from the ps -ef command.

Submission

Turn your assignment in via Blackboard. Your zip file, named abc123.zip (with your personal abc123), should contain only your bash and awk files.

For this assignment, you will use awk, sed, and bash to create a simple templating engine. Your program should take as input a generic template with placeholders for generic data, a set of input files containing data which should be applied to the template, and a date. Instantiated templates using the input data will be output to a subdirectory. This assignment requires only sed, awk, and bash. You may use any combination of these programs and are not required to use all of them. Do not use Python or any other languages/utilities.

Hint: Since you will need to produce many files from many inputs, it may be useful to use awk to generate sed and/or bash scripts.

Data Format

Data will be stored in the same format as the data in Assignment 1. For each course with an enrollment greater than 50, your program will use the provided template to generate an advisory report specifically for it.

1.
Data files (i.e., of the same format as those generated in Assignment 1) will be stored inside a directory specified by the user.
2. Each file within that directory will be named based on the department code (two or three uppercase alphabetic characters), a course number (exactly four digits), and ending with an extension of .crs.
3. A course file consists of exactly five lines:
   • dept_code (two or three character string) dept_name (string with possible whitespace)
   • course_name (string with possible whitespace)
   • course_sched (string: either TH or MWF) course_start (string with no whitespace) course_end (string with no whitespace)
   • credit_hours (integer)
   • num_students (integer representing the number of enrolled students)
4. Example file named MAT1311.crs:

MAT Mathematics
Calculus I
TH 8/26/19 12/11/19
3
62

Assignment 4: More Scripting Page 1 of 4

Templating

Templates will include variable names to be filled in with data using double square brackets. For any data file of the format described above, each of the variables (including the square brackets) should be substituted with the data's actual value. Your program should work for arbitrary templates using the same variables listed below, corresponding to the item values described. More than one variable may appear per line.

• [[dept_code]]
• [[dept_name]]
• [[course_name]]
• [[course_start]]
• [[course_end]]
• [[credit_hours]]
• [[num_students]]
• [[course_num]] (the course number as specified in the filename of the .crs file)
• [[date]] (see below)

Example Template:*

<html>
<body>
NOTICE OF OVERENROLLMENT - COURSE [[dept_code]] [[course_num]] - [[date]]

Dear Instructor,

Your course, [[course_name]], scheduled from [[course_start]] to [[course_end]], has exceeded normal capacity with an enrollment of [[num_students]] students. This must be corrected within thirty days or this issue will be referred to the [[dept_name]] department for further review.

- Administration

* This is only a single example of a single template! You must assume that multiple different templates, with various combinations of variables, will be run against your program. Experiment with exercising your program using your own custom scripts, as none will be provided to you. This is to emphasize the notion that no single template should be used as a test instrument; your script will be tested against several unknown-to-you templates.

Date Argument

The third command-line argument should be a date manually entered by the user in the format MM/DD/YYYY. This value should be substituted anywhere [[date]] appears.

Output

All output files should be written to the directory defined by the last argument. This directory may or may not already exist. Each file should be named by the course's department code and number, with the extension .warn.

Script Execution

Your program should be invoked through a single bash file (see below) with four arguments: data directory, template file, date, and output directory. Assuming the program executes correctly, no output should be printed to the screen.

$ assign4.sh ./data assign4.template 12/16/2021 ./output

Assignment Data

Sample input files can be found in /usr/local/courses/ssilvestro/cs3423/Fall19/assign4.

Script Files

Your program may consist of multiple files:
• assign4.sh – the main file which is initially invoked (required)
• .awk files
• .sed files

Note: If your program generates any intermediate awk or sed files during execution, name them beginning with the letter 'g'. Moreover, delete them when your program has completed.

Extra Credit (5 points)

Allow your program to take optional fifth and sixth arguments describing the character(s) surrounding the variables instead of double square brackets. This feature should work for the following characters as either the opening or closing symbol: "/", "|", "}", and "{".
Note that these can be in any combination (e.g., starting with "{" and ending with "|"). If no fifth and sixth arguments are passed, the program should behave as normal. You may assume that if a fifth argument is passed, a sixth will be present, too. Example:

$ assign4.sh ./data assign4.template 12/16/2021 ./output '{' '|'

The above invocation should replace variables in the template such as {date| instead of [[date]]. Extra credit is not given to late assignments. All requirements must be met to qualify for extra credit.

Submission

Turn your assignment in via Blackboard. Your zip file, named abc123.zip, should contain only your bash, awk, and sed files. If you attempt the extra credit, name your file abc123_EC.zip. Without the _EC, your submission will be graded as normal.

For this assignment, you will use Python 3 to create a simple templating engine. Your program should take as input a generic template with placeholders for generic data, a set of input files containing data which should be applied to the template, and a date. Instantiated templates using the input data will be output to a subdirectory. This assignment requires only Python. Do not use the command-line utilities used in the previous assignment. You are encouraged to explore the Python standard library (the shutil module may be particularly useful).

Data Format

Data will be stored in the same format as the data in Assignment 1. For each course with an enrollment greater than 50, your program will use the provided template to generate an advisory report specifically for it.

1. Data files (i.e., of the same format as those generated in Assignment 1) will be stored inside a directory specified by the user.
2. Each file within that directory will be named based on the department code (two or three uppercase alphabetic characters), a course number (exactly four digits), and ending with an extension of .crs.
3.
A course file consists of exactly five lines:
   • dept_code (two or three character string) dept_name (string with possible whitespace)
   • course_name (string with possible whitespace)
   • course_sched (string: either TH or MWF) course_start (string with no whitespace) course_end (string with no whitespace)
   • credit_hours (integer)
   • num_students (integer representing the number of enrolled students)
4. Example file named MAT1311.crs:

MAT Mathematics
Calculus I
TH 8/26/19 12/11/19
3
62

Assignment 5: More Scripting – Python Edition Page 1 of 4

Templating

Templates will include variable names to be filled in with data using double square brackets. For any data file of the format described above, each of the variables (including the square brackets) should be substituted with the data's actual value. Your program should work for arbitrary templates using the same variables listed below, corresponding to the item values described. More than one variable may appear per line.

• [[dept_code]]
• [[dept_name]]
• [[course_name]]
• [[course_start]]
• [[course_end]]
• [[credit_hours]]
• [[num_students]]
• [[course_num]] (the course number as specified in the filename of the .crs file)
• [[date]] (see below)

Example Template:*

<html>
<body>
NOTICE OF OVERENROLLMENT - COURSE [[dept_code]] [[course_num]] - [[date]]

Dear Instructor,

Your course, [[course_name]], scheduled from [[course_start]] to [[course_end]], has exceeded normal capacity with an enrollment of [[num_students]] students. This must be corrected within thirty days or this issue will be referred to the [[dept_name]] department for further review.

- Administration

* This is only a single example of a single template! You must assume that multiple different templates, with various combinations of variables, will be run against your program. Experiment with exercising your program using your own custom scripts, as none will be provided to you.
This is to emphasize the notion that no single template should be used as a test instrument; your script will be tested against several templates unknown to you.

Date Argument
The third command-line argument should be a date entered manually by the user in the format MM/DD/YYYY. This value should be substituted anywhere [[date]] appears.

Output
All output files should be written to the directory named by the last argument. This directory may or may not already exist. Each file should be named by the course’s department code and number, with the extension .warn.

Script Execution
Your program should be invoked as a single Python file (see below) with four arguments: data directory, template file, date, and output directory. Assuming the program executes correctly, no output should be printed to the screen.

$ assign5.py ./data assign5.template 12/16/2021 ./output

Assignment Data
Sample input files can be found in: /usr/local/courses/ssilvestro/cs3423/Fall19/assign5/data.

Script Files
Your program should consist of exactly one file:
• assign5.py – the main file which is initially invoked

Extra Credit (5 points)
Allow your program to take optional fifth and sixth arguments describing the character(s) surrounding the variables instead of double square brackets. This feature should work for the following characters as either the opening or closing symbol: “/”, “|”, “}”, and “{”. Note that these can be in any combination (e.g., starting with “{” and ending with “|”). If no fifth and sixth arguments are passed, the program should behave as normal. You may assume that if a fifth argument is passed, a sixth will be present, too. Example:

$ assign5.py ./data assign5.template 12/16/2021 ./output '{' '|'

The above invocation should replace variables in the template such as {date| instead of [[date]]. Extra credit is not given to late assignments.
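The core substitution step, with double square brackets by default and optional custom delimiters for the extra credit, can be sketched as follows (the function name and the choice to leave unknown variables untouched are illustrative, not part of the assignment spec):

```python
import re

def render(template: str, values: dict, open_sym: str = "[[", close_sym: str = "]]") -> str:
    """Replace every open_sym + name + close_sym with values[name].

    Defaults to the double-square-bracket style; pass e.g. "{" and "|"
    for the extra-credit delimiters. Unknown variables are left as-is.
    """
    # re.escape makes the delimiters safe even when they are regex
    # metacharacters such as "{", "|", or "[[".
    pattern = re.compile(re.escape(open_sym) + r"(\w+)" + re.escape(close_sym))
    return pattern.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)
```

Because the delimiters are escaped before compilation, the same function covers both the normal `[[date]]` case and the `{date|` case from the example invocation.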
All requirements must be met to qualify for extra credit.

Submission
Turn your assignment in via Blackboard. Your zip file, named abc123.zip, should contain only your Python file. If you attempt the extra credit, name your file abc123_EC.zip. Without the _EC, your submission will be graded as normal.

Assignment 6: Python Scripting

For this assignment, you will use Python 3 to create an alternative to the rm command. This program, rm.py, will take an arbitrary number of paths as arguments and, for each argument, move them to ~/rm_trash. If ~/rm_trash does not already exist, it should be created. Hint: The shutil module may prove useful.

Duplicate Files
If a file with the same name already exists in ~/rm_trash, you should append -1 to the end of the filename, before any extension(s) if they exist, so that both files are preserved. If a file with the same name in ~/rm_trash already has -1, the new file should be appended with -2, incrementing automatically for subsequent copies. For example, if a file named file.txt is removed 4 times, ~/rm_trash should contain the following four files: file.txt, file-1.txt, file-2.txt, and file-3.txt. Note: It is possible that a file being removed will already have a dash and a number at the end of its name. If so, you should be sure not to modify the original name and only append the counter. For example, if a file named newfile-1.txt is deleted twice, ~/rm_trash should contain newfile-1.txt and newfile-1-1.txt. You may assume that there will be no conflicts in file names (e.g., rm.py file-1.txt; rm.py file.txt).

Recursion
Your program should recursively delete files if any argument is -r. If -r is not set, directories should be ignored and your program should print rm.py: cannot remove ’DIR’: Is a directory to STDERR once for each directory encountered, where DIR is the name of the directory. When deleting recursively, simply move the directory to ~/rm_trash using the duplicate naming rules described above and leave the contents alone.
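The duplicate-naming rule can be sketched in isolation (the helper name is illustrative, and the spec's "before any extension(s)" wording is read here as splitting at the first dot, which is an assumption):

```python
import os

def trash_name(trash_dir: str, filename: str) -> str:
    """Pick the name a removed file gets inside the trash directory:
    file.txt, then file-1.txt, file-2.txt, ... The counter goes before
    the first extension, and an existing trailing -N in the original
    name is left alone (newfile-1.txt becomes newfile-1-1.txt)."""
    if "." in filename:
        stem, ext = filename.split(".", 1)   # split at the FIRST dot
        ext = "." + ext
    else:
        stem, ext = filename, ""
    candidate, n = filename, 0
    # Keep incrementing until we find a name not already in the trash.
    while os.path.exists(os.path.join(trash_dir, candidate)):
        n += 1
        candidate = f"{stem}-{n}{ext}"
    return candidate
```

The actual rm.py would then call something like shutil.move(path, os.path.join(trash_dir, trash_name(trash_dir, basename))).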
You do not need to modify the names of the files within the removed directory.

Paths
If an argument does not point to a file, your program should print rm.py: cannot remove ’FILE’: No such file or directory to STDERR once for each invalid argument, where FILE is the path given. When moving a file into ~/rm_trash, you should always place it at the root of that directory. For example, if /path/to/a/file.txt is passed as an argument, file.txt should be moved to ~/rm_trash/file.txt and not ~/rm_trash/path/to/a/file.txt.

Program Execution
Your program should be invoked as a single Python file (see below) with an arbitrary number of arguments representing paths to files (including directories). The -r option may be passed as any argument. Some examples are listed below.

$ rm.py /path/to/some/file ./somefile someotherfile
$ rm.py /path/to/some/file ./somefile someotherfile -r
$ rm.py -r /path/to/some/file ./somefile someotherfile
$ rm.py *.java

Extra Credit (10 points)
For extra credit, create a separate program called restore.py which, given a path to where a file used to be before it was removed with rm.py, will restore the most recent version. Hint: Consider keeping track of the original and new locations of removed files to handle situations where removed files from different paths have the same name. Note that if you use this strategy, the information will need to persist between executions of rm.py and restore.py. Attempts at extra credit will not be considered if rm.py does not work correctly.

Assignment Data
No sample data is included with this assignment, as your program should work for all files. As always, you should test your program with your own data to be sure that it works correctly.

Program Files
Your submission will consist of one or two files, depending on whether you attempt the extra credit:
• rm.py – implementation of rm
• restore.py – extra credit

Submission
Turn your assignment in via Blackboard.
Your zip file, named LastnameFirstname.zip, should contain only your rm.py and, optionally, restore.py. If you attempt the extra credit, name your file LastnameFirstname_EC.zip. Without the _EC, your submission will be graded as normal.

Assignment 7: File I/O

For this assignment, you will use C’s I/O functions to create a simple course catalog database for administrators to update the details of each course offered by the CS department. The system will store basic information about each course, allowing the user to create, read, update, and delete them. All information for all courses will be stored as binary records in a single file. This assignment requires only the utilities used so far in the I/O lecture notes. Do not use bash, sed, awk, find, grep, Python, or any programming languages or utilities besides the C binary functions used in class. Only binary I/O functions should be used to store data to the filesystem (it is OK to use other functions when prompting the user). Note: While this assignment is similar to Assignment 1, it is not exactly the same. You should thoroughly read all of these instructions.

Storing Course Information
All course information will be stored in a single binary file as records of the following structure, where courseName is the name of the course (which may contain spaces), courseSched represents the course schedule (either string MWF or TR), courseHours is the number of credit hours for the course, and courseSize is the number of students currently enrolled in the course.

typedef struct
{
    char courseName[64];
    char courseSched[4];
    unsigned int courseHours;
    unsigned int courseSize;
} COURSE;

The program will store all courses using the above struct in a single file called courses.dat, located within the same directory as the program (this file is provided to you). All courses will be referenced using their course number as their index (e.g., course i will be located in the data file at relative byte offset i * sizeof(COURSE)).
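The record-offset arithmetic can be illustrated with Python's struct module (the assignment itself must be written in C; whether the "=64s4s2I" format matches a particular C compiler's padding of COURSE is an assumption):

```python
import struct

# Mirror of the COURSE struct. "=" selects standard sizes with no padding,
# so the record is 64 + 4 + 4 + 4 = 76 bytes under this format string.
RECORD_FMT = "=64s4s2I"
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def write_course(f, number, name, sched, hours, size):
    """Store course `number` at byte offset number * RECORD_SIZE."""
    rec = struct.pack(RECORD_FMT, name.encode(), sched.encode(), hours, size)
    f.seek(number * RECORD_SIZE)
    f.write(rec)

def read_course(f, number):
    """Fetch course `number` directly: one seek plus one read, no scanning."""
    f.seek(number * RECORD_SIZE)
    name, sched, hours, size = struct.unpack(RECORD_FMT, f.read(RECORD_SIZE))
    return name.rstrip(b"\0").decode(), sched.rstrip(b"\0").decode(), hours, size
```

In C, the equivalent is fseek(fp, i * sizeof(COURSE), SEEK_SET) followed by fread/fwrite of one COURSE.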
Course records will be stored in courses.dat in course-number order. Note that the course numbers will be specified by the user when a course is entered and will not necessarily be sequential. If courses.dat does not exist, it should be created by the program in the current directory. You must use the files located at /usr/local/courses/ssilvestro/cs3423/Fall19/assign7; specifically, data/courses.dat shall be used as the starting point for your database (i.e., you must use this file for all operations, not a blank/empty/zero-byte data file, nor any other file).

Program Execution
When the program is executed, the following actions should occur. All program output should appear exactly as it appears below.
1. Upon running your program, the user should be presented with the following menu:

Enter one of the following actions or press CTRL-D to exit.
C – create a new course record
R – read an existing course record
U – update an existing course record
D – delete an existing course record

2. The user then enters a one-character action (either upper- or lowercase), leading to one of the following.
• C: a course is created
(a) From the terminal, read the following one line at a time:
    i. Course number (zero-indexed integer)
    ii. Course name (string possibly containing whitespace)
    iii. Course schedule (string ∈ {MWF, TR})
    iv. Course credit hours (unsigned integer)
    v. Course enrollment (unsigned integer)
(b) Using the values entered by the user, create a new entry in the data file based on the instructions above.
(c) If the course already exists, print the following error and continue with the program. The program should detect this and respond immediately after reading the course number.
ERROR: course already exists
• R: read an existing course’s information
(a) Prompt the user for a course number (e.g., “3423”):
Enter a CS course number:
(b) Search for the specified course using the provided course number (e.g., “3423”).
(c) Print the course information in the following format:
Course number: course number
Course name: courseName
Scheduled days: courseSched
Credit hours: courseHours
Enrolled Students: courseSize
(d) If the course is not found, print the following error instead and continue with the program.
ERROR: course not found
• U: update an existing course record
(a) Prompt the user for the following, one at a time:
    i. Course number (zero-indexed integer)
    ii. Course name (string possibly containing whitespace)
    iii. Course schedule (string ∈ {MWF, TR})
    iv. Course credit hours (unsigned integer)
    v. Course enrollment (unsigned integer)
(b) Update each of the corresponding fields for the course based on the user’s input. If the user input is blank for a particular field (except course number), maintain the original value from the file.
(c) If the course record is not found, print the following error and continue with the program. You should detect this and respond immediately after reading the course number.
ERROR: course not found
• D: delete an existing course
(a) Prompt the user for a course number (e.g., “3423”):
Enter a course number:
(b) Delete the specified course’s record. Hint: You may assume the creditHours field will never be zero for a valid course.
(c) Print the following message to standard output with the course’s number:
course number was successfully deleted.
(d) If the course is not found, print the following error instead and continue with the program.
ERROR: course not found
• If an invalid character is entered, print the following error and continue with the program.
ERROR: invalid option
3. After an action is completed, display the menu again. This should proceed indefinitely until CTRL-D is read, or the end of the file is reached.

Locating Data
For the above functionality, courses.dat should not be read sequentially to search for courses.
The location of the course in courses.dat should be calculated immediately and accessed directly, without performing a search (i.e., no looping!).

Assignment Data
Input files for testing can be found in /usr/local/courses/ssilvestro/cs3423/Fall19/assign7, including an existing .dat file and input for stdin. Copy these to your own assignment’s directory. Important: The input file assumes you are using the provided .dat file. Furthermore, each time you use the input file, you should refresh the .dat file to its original state.

Compiling Your Program
Your submission will include a makefile so the following command can be used to compile your code.

$ make assign7

Program Files
Your program should consist of up to three files:
• assign7.c – the main file which is compiled (required)
• assign7.h – an optional header file, if necessary
• makefile – the makefile to make the assign7 executable (required)

Verifying Your Program
Your program must work with the input provided in a7Input.txt and courses.dat. To test it:
1. Place courses.dat in the same directory as your compiled program.
2. Execute your compiled program and redirect a7Input.txt into it. You should not be copying or typing the contents of a7Input.txt into your terminal. Redirection must work.
3. Verify that the output is correct based on the input commands.
4. Execute your program again and use your program’s menu to test that the information was correctly written to courses.dat.

Submission
Turn in your assignment via Blackboard. Your zip file, named abc123.zip, should contain only your makefile, assign7.c, and possibly assign7.h. Do not include a .dat file.

Assignment 8: Process Control

For this assignment, you will use C’s process control functions to exercise basic process creation.
Your program should have the following functionality:

Concurrency
Given up to six commands separated by commas (a comma will be its own token, with whitespace around it) and any number of arguments each, execute all commands concurrently (i.e., in parallel and not one after the other). Each process (excluding the parent) should print its PID, followed by its PPID, followed by its command without arguments, to stderr. Any normal behavior of the command issued should remain the same. You do not need to account for redirection or pipelining. Your program should not produce orphans. Toward verifying this, if a process’s PPID is 1, it is an orphan that has been adopted by the init process. In solving this problem, be sure that your child processes still run concurrently. Do not wait for one process to complete before starting a new one.

Example

$ assign8 ls -a , pwd , cat hello.txt
PID: 35003, PPID: 35002, CMD: ls
PID: 35004, PPID: 35002, CMD: pwd
PID: 35005, PPID: 35002, CMD: cat
/home/ssilvestro
. .. courses Desktop Downloads
this is the contents of hello.txt

Note that since the processes run in parallel, the order of the output is not guaranteed.

Compiling Your Program
Your submission will include a makefile so the following command can be used to compile your code.

$ make assign8

This should produce an executable file named assign8. For more information about the make utility, check the related document on Blackboard. If you attempt the extra credit,

$ make assign8-2

should compile the extra credit portion to assign8-2.

Extra Credit (10 points)
Create a second program, assign8-2, which takes exactly two commands, with any number of arguments each, separated by a comma. The program should pipe the output of the first command to the second using dup2(). Example:

$ assign8-2 ls -l , sort -k5 -n

should behave as

$ ls -l | sort -k5 -n

Extra credit is not given to late assignments.
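The fork-all-first, reap-all-second pattern the assignment asks for (in C) looks like this when mirrored in Python on a POSIX system; the function name is illustrative:

```python
import os
import sys

def run_concurrently(commands):
    """Fork one child per command, start them all before waiting on any,
    then reap every child so none is left as an orphan."""
    pids = []
    for cmd in commands:
        pid = os.fork()
        if pid == 0:                       # child
            print(f"PID: {os.getpid()}, PPID: {os.getppid()}, CMD: {cmd[0]}",
                  file=sys.stderr)
            try:
                os.execvp(cmd[0], cmd)     # replaces the child's image
            except OSError:
                os._exit(127)              # exec failed: bail out of the child
        pids.append(pid)                   # parent: do NOT wait here
    # All children are already running concurrently; only now collect them.
    return [os.waitpid(p, 0)[1] for p in pids]
```

Waiting only after the whole fork loop is what keeps the children concurrent; waiting inside the loop would serialize them.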
All requirements must be met for both parts of the assignment to qualify for extra credit.

Assignment Data
This assignment does not include sample input files, since it only takes arguments.

Program Files
Your submission should consist of up to five files:
• assign8.c – the main file which is compiled (required)
• assign8.h – an optional header file, if necessary
• assign8-2.c – the main file which is compiled for extra credit (required for extra credit)
• assign8-2.h – an optional header file, if necessary
• Makefile – the makefile to make the assign8 executable (required)

Submission
Turn your assignment in via Blackboard. Your zip file, named abc123.zip, should contain only the files described above. If you attempt the extra credit, name your file abc123_EC.zip. Without the _EC, your submission will be graded as normal.
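For the extra credit, the dup2() plumbing can likewise be mirrored in Python (POSIX only; here the second command's stdout is redirected to a file so the result is easy to inspect, which is an addition for illustration):

```python
import os

def pipeline(cmd1, cmd2, out_path):
    """Run cmd1 | cmd2 using only fork/exec/dup2, with cmd2's stdout
    sent to out_path."""
    r, w = os.pipe()
    if os.fork() == 0:                 # first child: stdout -> pipe
        os.dup2(w, 1)
        os.close(r); os.close(w)
        try:
            os.execvp(cmd1[0], cmd1)
        except OSError:
            os._exit(127)
    if os.fork() == 0:                 # second child: stdin <- pipe
        os.dup2(r, 0)
        os.close(r); os.close(w)
        fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        os.dup2(fd, 1)
        os.close(fd)
        try:
            os.execvp(cmd2[0], cmd2)
        except OSError:
            os._exit(127)
    os.close(r); os.close(w)           # parent closes both ends, or else
    os.wait(); os.wait()               # cmd2 never sees EOF on the pipe
```

The parent closing its copies of the pipe descriptors before waiting is the detail that most often trips up the C version too.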
In this project, the students are required to implement the following algorithms: Insert Sort, Select Sort, Quick Sort, and Merge Sort.

One file is given: CSCI251ProjOne.java. Please study this file to understand what is going on. This file cannot be modified. One file is to be implemented: MySort.java. In this file, the students must provide the following public interface:

public static void insertSort(int[] arr);
public static void selectSort(int[] arr);
public static void quickSort(int[] arr);
public static void mergeSort(int[] arr);

Quick sort and merge sort must be implemented recursively, so the students may provide the following private static methods:

// merge method
// merge two sorted portions of the given array arr, namely from start to middle
// and from middle + 1 to end, into one sorted portion, namely from start to end
private static void merge(int[] arr, int start, int middle, int end);

// merge sort, recursive version
// sort the portion of the given array arr from begin to end
private static void mergeSortRecursive(int[] arr, int begin, int end);

// find pivot point for quick sort
private static int pivot(int[] arr, int begin, int end);

// quick sort, recursive version
// sort the portion of the given array arr from begin to end
private static void quickSortRecursive(int[] arr, int begin, int end);

The students' implementation of the MySort class must have the above public and private static methods, and only those.

Run Demo: ProjectOne.jar is available for students to download and run on their computers. The students may run the demo in a terminal window to see how the program should run. The students may watch the YouTube video “How to Run Jar file in the Terminal Window” or refer to the file “How to Run Jar File in the Terminal Window.pdf” in the Blackboard Projects folder.

Due Date: To be announced on Blackboard.

The purpose of this project is to reinforce the knowledge from Chapter Two of the textbook.
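The required implementation language is Java, but the recursive decomposition behind mergeSort / mergeSortRecursive / merge can be sketched in Python with the same inclusive-bounds interface (a sketch for understanding the design, not a substitute for the Java submission):

```python
def merge(arr, start, middle, end):
    """Merge the sorted slices arr[start..middle] and arr[middle+1..end]
    (inclusive bounds, matching the Java interface) into one sorted run."""
    merged = []
    i, j = start, middle + 1
    while i <= middle and j <= end:
        if arr[i] <= arr[j]:
            merged.append(arr[i]); i += 1
        else:
            merged.append(arr[j]); j += 1
    merged.extend(arr[i:middle + 1])      # leftover left half, if any
    merged.extend(arr[j:end + 1])         # leftover right half, if any
    arr[start:end + 1] = merged           # write back in place

def merge_sort_recursive(arr, begin, end):
    """Sort arr[begin..end] in place: split, recurse on halves, merge."""
    if begin >= end:                      # 0 or 1 elements: already sorted
        return
    middle = (begin + end) // 2
    merge_sort_recursive(arr, begin, middle)
    merge_sort_recursive(arr, middle + 1, end)
    merge(arr, begin, middle, end)

def merge_sort(arr):
    """Public driver, analogous to mergeSort(int[] arr)."""
    merge_sort_recursive(arr, 0, len(arr) - 1)
```

Note how the public method is just a driver that supplies the initial bounds, exactly the relationship the project requires between the public and private static methods.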
The students will practice implementing the stack and queue data structures using a list structure, for instance ArrayList. The students will also apply the stack and queue in a real project, for instance to check whether a string is a palindrome. The graded components are:
• a structure that has the Last In, First Out property (35%)
• a structure that has the First In, First Out property (35%)
• a palindrome check (30%). This function returns true if the sentence is a palindrome; false otherwise.

Sample files: Three sample files are given. The students should add necessary comments to all files and implement all functionalities in the given files. No extra functions can be added. No function name can be changed. The students should run the executable jar file to see how the project should run. The students should zip the whole project and submit it via the Blackboard link.

Due Date: Will be announced on Blackboard.

This project is designed to help students understand the HashTable and algorithms related to it. The students need to implement two given classes: MyHashEntry and MyHashTable. Three classes are given. The students shall not modify the ProjThree.java file; however, the students should understand this file. The other two files, MyHashEntry.java and MyHashTable.java, are the files that the students need to finish. These two files must be implemented within the given design. The students cannot change the names of public methods. Please note that the table must be implemented as a generic data structure. The students may run the given executable jar file to see how the project should work. When the students run the executable jar file, they may enter ID and Name pairs. Please note that IDs and Names are all Strings with no whitespace in them.

Due date: will be announced on Blackboard.

This project is designed to help students understand the Binary Search Tree and algorithms related to it. The students need to implement two given classes: TreeNode and BinarySearchTree. Three classes are given.
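The project must be completed in the given Java files, but the two ideas, a list-backed LIFO structure and using it for a palindrome check, can be sketched in Python (class and method names are illustrative, not the project's required interface):

```python
class MyStack:
    """Last In, First Out, backed by a plain list (cf. ArrayList in Java)."""
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()     # removes from the same end: LIFO
    def is_empty(self):
        return not self._items

def is_palindrome(sentence: str) -> bool:
    """True if the letters of `sentence` read the same backwards,
    ignoring case, spaces, and punctuation."""
    letters = [c.lower() for c in sentence if c.isalnum()]
    stack = MyStack()
    for c in letters:
        stack.push(c)
    # Popping everything yields the letters in reverse order; a palindrome
    # matches its own reversal position by position.
    return all(stack.pop() == c for c in letters)
```

A queue-based variant would compare front-of-queue characters against popped stack characters; the contrast between the two removal orders is exactly the LIFO/FIFO distinction being graded.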
The students shall not modify the CSCI251ProjFour.java file; however, the students should understand this file. The other two files, TreeNode.java and BinarySearchTree.java, are the files that the students need to finish. These two files must be implemented within the given design. The students cannot change the names of public methods. The students cannot add extra public methods; however, the students may add private methods to make the implementation clearer and easier to read. For instance, the toString method will return a String representation of a Binary Search Tree. The best way to do so is recursively, so the toString method is actually a driver method for the private method nodeTravesal (see below). The implementation could be as follows:

public String toString() {
    return "(" + nodeTravesal(root) + ")";
}

private String nodeTravesal(TreeNode treeNode) {
    if (treeNode == null)
        return "–‐";
    return treeNode.getData().toString() + "(" + nodeTravesal(treeNode.getLeft())
         + ", " + nodeTravesal(treeNode.getRight()) + ")";
}

The students may run the given executable jar file to see how the project should work.

Due date: will be announced on Blackboard.

The purpose of this project is to reinforce the students’ knowledge of graphs. The students will implement Breadth First Search and Depth First Search for a graph. The graph will be a directed weighted graph represented in adjacency matrix form. We assume that all weights are positive integers. The graph will be initialized from the keyboard. When the students input a graph of n vertices, the students actually input an n by n matrix. The matrix represents a simple directed weighted graph: there are no self-loops, and there is no more than one edge from one vertex to another. The vertices are numbered 1, 2, …, n. The graph with n vertices is represented by an (n+1) by (n+1) matrix with row 0 and column 0 unused.
The entry on row i and column j is the weight of the edge from vertex i to vertex j. For instance, if there is an edge of weight 8 from vertex i to vertex j (i != j), then the entry on row i, column j of the matrix will be 8. We assume that all weights are positive integers. If there is no edge from vertex i to vertex j (i != j), then the user will input 0 for the entry on row i, column j; however, the given code will automatically change that entry to Integer.MAX_VALUE.

Two files are given. The one named CSCI251ProjFive.java cannot be modified; however, the students should understand it. The one named MyGraph.java needs to be implemented. In fact, only two methods in this file need to be implemented. The students shall read the description of each method and implement it accordingly. No method declarations can be changed. No other public interface can be added. The students may run the executable jar file to see how the finished project should run. In this implementation, when there is more than one choice of vertex, choose the one marked with the lower number first.

Example Run: Suppose the graph is the one shown in the figure. The student can view it as a directed weighted graph: each edge has two directions, and in each direction the weight is 1. So the array input will be:

0 1 0 0 1 0
1 0 1 0 1 0
0 1 0 1 0 0
0 0 1 0 1 1
1 1 0 1 0 0
0 0 0 1 0 0

If we run the BFS algorithm with startVertex 6, then the output should be 6, 4, 3, 5, 2, 1. If we run the DFS algorithm with startVertex 6, then the output should be 6, 4, 3, 2, 1, 5.

Due date will be announced on Blackboard.
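A Python sketch of BFS and DFS over the 1-indexed adjacency matrix, with ties broken toward the lower-numbered vertex; run on the example matrix it reproduces the stated visit orders (the project itself must be written in the given Java files, and the function names here are illustrative):

```python
from collections import deque

INF = float("inf")   # stands in for Integer.MAX_VALUE (no edge)

def neighbors(matrix, v):
    """Vertices adjacent to v, in increasing order (lower number first)."""
    n = len(matrix) - 1                      # row/column 0 are unused
    return [u for u in range(1, n + 1) if matrix[v][u] not in (0, INF)]

def bfs(matrix, start):
    """Breadth First Search: a FIFO queue yields level-by-level order."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in neighbors(matrix, v):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order

def dfs(matrix, start):
    """Depth First Search: recursion dives along each edge before backtracking."""
    order, seen = [], set()
    def visit(v):
        seen.add(v)
        order.append(v)
        for u in neighbors(matrix, v):
            if u not in seen:
                visit(u)
    visit(start)
    return order
```

The only structural difference between the two traversals is the frontier discipline: a FIFO queue for BFS versus the recursion stack for DFS.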
1. The list below comprises a number of distributions, giving in each case the support, parameter space, and density or probability mass function. Determine whether each of the distributions belongs to the exponential dispersion family, and similarly for the two-parameter exponential family of distributions. In both cases, justify your answers.

(a) Double exponential (or Laplace) distribution:
$f(y \mid \theta, \sigma) = \frac{1}{2\sigma} \exp\left\{ -\frac{|y-\theta|}{\sigma} \right\}$, $y \in \mathbb{R}$, $\theta \in \mathbb{R}$, $\sigma > 0$.

(b) Uniform distribution:
$f(y \mid \theta, \sigma) = \frac{1}{2\sigma}$, $\theta - \sigma < y < \theta + \sigma$, $\theta \in \mathbb{R}$, $\sigma > 0$.

(c) Logistic distribution:
$f(y \mid \theta, \sigma) = \frac{\exp\{(y-\theta)/\sigma\}}{\sigma \{1 + \exp((y-\theta)/\sigma)\}^2}$, $y \in \mathbb{R}$, $\theta \in \mathbb{R}$, $\sigma > 0$.

(d) Cauchy distribution:
$f(y \mid \theta, \sigma) = \frac{1}{\pi\sigma \{1 + ((y-\theta)/\sigma)^2\}}$, $y \in \mathbb{R}$, $\theta \in \mathbb{R}$, $\sigma > 0$.

(e) Pareto distribution:
$f(y \mid \alpha, \beta) = \frac{\beta \alpha^{\beta}}{y^{\beta+1}}$, $y \ge \alpha$, $\alpha > 0$, $\beta > 0$.

(f) Beta distribution:
$f(y \mid \alpha, \beta) = \frac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha, \beta)}$, $0 \le y \le 1$, $\alpha > 0$, $\beta > 0$,
where $B(\alpha, \beta) = \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du$ is the beta function.

(g) Negative binomial distribution:
$f(y \mid \alpha, p) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha)\, y!}\, p^{\alpha} (1-p)^{y}$, $y \in \{0, 1, 2, \dots\}$, $\alpha > 0$, $0 < p < 1$,
where $\Gamma(c) = \int_0^\infty u^{c-1} \exp(-u)\,du$ is the gamma function.

2. Consider the linear regression setting where the responses $Y_i$, $i = 1, \dots, n$, are assumed independent with means $\mu_i = E(Y_i) = x_i^T \beta = \sum_{j=1}^p x_{ij}\beta_j$ for (known) covariates $x_{ij}$ and (unknown) regression coefficients $\beta = (\beta_1, \dots, \beta_p)^T$.

(i) Show that if the response distribution is normal,
$Y_i \stackrel{ind.}{\sim} f(y_i \mid \mu_i, \sigma) = (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{(y_i-\mu_i)^2}{2\sigma^2} \right\}$, $i = 1, \dots, n$,
then the maximum likelihood estimate (MLE) of $\beta$ is obtained by minimizing the $L_2$-norm, $S_2(\beta) = \sum_{i=1}^n (y_i - x_i^T\beta)^2$.

(ii) Show that if the response distribution is double exponential,
$Y_i \stackrel{ind.}{\sim} f(y_i \mid \mu_i, \sigma) = (2\sigma)^{-1} \exp\left\{ -\frac{|y_i-\mu_i|}{\sigma} \right\}$, $i = 1, \dots, n$,
then the MLE of $\beta$ is obtained by minimizing the $L_1$-norm, $S_1(\beta) = \sum_{i=1}^n |y_i - x_i^T\beta|$.

(iii) Show that if the response distribution is uniform over the range $[\mu_i - \sigma, \mu_i + \sigma]$,
$Y_i \stackrel{ind.}{\sim} f(y_i \mid \mu_i, \sigma) = (2\sigma)^{-1}$, for $\mu_i - \sigma \le y_i \le \mu_i + \sigma$, $i = 1, \dots, n$,
then the MLE of $\beta$ is obtained by minimizing the $L_\infty$-norm, $S_\infty(\beta) = \max_i |y_i - x_i^T\beta|$.

(iv) Obtain the MLE of $\sigma$ under each one of the response distributions in (i) – (iii) and show that, in all cases, it is a function of the minimized norm.

3. Consider the special case of the Cauchy distribution, $C(\theta, 1)$, with scale parameter $\sigma = 1$ and density function
$f(y \mid \theta) = \frac{1}{\pi\{1 + (y-\theta)^2\}}$, $y \in \mathbb{R}$, $\theta \in \mathbb{R}$,
where $\theta$ is the median of the distribution.

(a) Let $y = (y_1, \dots, y_n)$ be a random sample from the $C(\theta, 1)$ distribution. Develop the Newton-Raphson method and the method of scoring to approximate the maximum likelihood estimate of $\theta$ based on the sample $y$. (For the method of scoring, you can use the result $\int_0^\infty (1-x^2)/\{(1+x^2)^3\}\,dx = \pi/8$.)

(b) Consider a sample, assumed to arise from the $C(\theta, 1)$ distribution, with $n = 9$ and $y = (-0.774, 0.597, 7.575, 0.397, -0.865, -0.318, -0.125, 0.961, 1.039)$. Apply both methods from (a) to estimate $\theta$. To check your results, try a few different starting values and also plot the likelihood function for $\theta$.

(c) Now consider a sample (again, assumed to arise from the $C(\theta, 1)$ distribution) with $n = 3$ and $y = (0, 5, 9)$. Apply again the methods from (a) to estimate $\theta$, using three different starting values, $\theta^0 = -1$, $\theta^0 = 4.67$, $\theta^0 = 10$. Comment on the results.

4. The data in the table below show the number of cases of AIDS in Australia by date of diagnosis for successive 3-month periods from 1984 to 1988.

          Quarter
Year      1    2    3    4
1984      1    6   16   23
1985     27   39   31   30
1986     43   51   63   70
1987     88   97   91  104
1988    110  113  149  159

Let $x_i = \log(i)$, where $i$ denotes the time period, for $i = 1, \dots, 20$. Consider a GLM for this data set based on a Poisson response distribution with mean $\mu$, systematic component $\beta_1 + \beta_2 x_i$, and logarithmic link function $g(\mu) = \log(\mu)$.
(a) Fit this GLM to the data working from first principles; that is, derive the expressions needed for the scoring method, and implement the algorithm to obtain the maximum likelihood estimates of $\beta_1$ and $\beta_2$.

(b) Use the function "glm" in R to verify your results from part (a).

1. Let $y_i$ be realizations of independent random variables $Y_i$ with Poisson($\mu_i$) distributions, where $E(Y_i) = \mu_i$, for $i = 1, \dots, n$.

(a) Obtain the expression for the deviance for comparison of the full model, which assumes a different $\mu_i$ for each $y_i$, with a reduced model defined by a Poisson GLM with link function $g(\cdot)$. That is, under the reduced model, $g(\mu_i) = \eta_i = x_i^T\beta$, where $\beta = (\beta_1, \dots, \beta_p)^T$ (with $p < n$) is the vector of regression coefficients corresponding to covariates $x_i = (x_{i1}, \dots, x_{ip})^T$.

(b) Show that the expression for the deviance simplifies to $2\sum_{i=1}^n y_i \log(y_i/\hat{\mu}_i)$ for the special case of the reduced model in part (a) with $g(\mu_i) = \log(\mu_i)$ and a linear predictor that includes an intercept, that is, $\eta_i = \beta_1 + \sum_{j=2}^p x_{ij}\beta_j$, for $i = 1, \dots, n$.

2. Let $y_i$, $i = 1, \dots, n$, be realizations of independent random variables $Y_i$ following gamma($\mu_i$, $\nu$) distributions, with densities given by
$f(y_i \mid \mu_i, \nu) = \frac{(\nu/\mu_i)^{\nu}\, y_i^{\nu-1} \exp(-\nu y_i/\mu_i)}{\Gamma(\nu)}$, $y_i > 0$; $\nu > 0$, $\mu_i > 0$,
where $\Gamma(\nu) = \int_0^\infty t^{\nu-1} \exp(-t)\,dt$ is the gamma function.

(a) Express the gamma distribution as a member of the exponential dispersion family.

(b) Obtain the scaled deviance and the deviance for the comparison of the full model, which includes a different $\mu_i$ for each $y_i$, with a gamma GLM based on link function $g(\mu_i) = x_i^T\beta$, where $\beta = (\beta_1, \dots, \beta_p)$ ($p < n$) is the vector of regression coefficients corresponding to a set of $p$ covariates.

3. Consider the data set from https://www.stat.columbia.edu/~gelman/book/data/fabric.asc on the incidence of faults in the manufacturing of rolls of fabric.
The first column contains the length of each roll (the covariate with values $x_i$), and the second contains the number of faults (the response with means $\mu_i$).

(a) Use R to fit a Poisson GLM with logarithmic link,
$\log(\mu_i) = \beta_1 + \beta_2 x_i$   (1)
to explain the number of faults in terms of the length of the roll.

(b) Fit the regression model for the response means in (1) using the quasi-likelihood estimation method, which allows for a dispersion parameter in the response variance function. (Use the quasipoisson "family" in R.) Discuss the results.

(c) Derive point estimates and asymptotic interval estimates for the linear predictor, $\eta_0 = \beta_1 + \beta_2 x_0$, at a new value $x_0$ for the length of a roll, under the standard (likelihood) estimation method from part (a) and the quasi-likelihood estimation method from part (b). Evaluate the point and interval estimates at $x_0 = 500$ and $x_0 = 995$. (In both cases, use the asymptotic bivariate normality of $(\hat{\beta}_1, \hat{\beta}_2)$ to obtain the asymptotic distribution of $\hat{\eta}_0 = \hat{\beta}_1 + \hat{\beta}_2 x_0$.)

4. This problem deals with data collected as the number of Ceriodaphnia organisms counted in a controlled environment in which reproduction is occurring among the organisms. The experimenter places into the containers a varying concentration of a particular component of jet fuel that impairs reproduction. It is anticipated that as the concentration of jet fuel grows, the number of organisms should decrease. The problem also includes a categorical covariate introduced through the use of two different strains of the organism. The data set is available from the course website, https://ams274-fall16-01.courses.soe.ucsc.edu/node/4, where the first column gives the number of organisms, the second the concentration of jet fuel (in grams per liter), and the third the strain of the organism (with covariate values 0 and 1). Build a Poisson GLM to study the effect of the covariates (jet fuel concentration and organism strain) on the number of Ceriodaphnia organisms.
Use graphical exploratory data analysis to motivate possible choices for the link function and the linear predictor. Use classical measures of goodness-of-fit and model comparison (deviance, AIC, and BIC), as well as Pearson and deviance residuals, to assess model fit and to compare different model formulations. Provide a plot of the estimated regression functions under your proposed model.

1. The table below reports results from a toxicological experiment, including the number of beetles killed ($y_i$) after 5 hours of exposure to gaseous carbon disulphide at various concentrations. Concentration (log dose, $x_i$) is given on the $\log_{10}$ scale.

Log Dose, $x_i$   Number of beetles, $m_i$   Number killed, $y_i$
1.6907            59                          6
1.7242            60                         13
1.7552            62                         18
1.7842            56                         28
1.8113            63                         52
1.8369            59                         53
1.8610            62                         61
1.8839            60                         60

Consider a binomial response distribution, and assume that the $y_i$ are independent realizations from Bin($m_i$, $\pi_i$), $i = 1, \dots, n$. The objective is to study the effect of the choice of link function $g(\cdot)$, where $\pi_i = g^{-1}(\eta_i) = g^{-1}(\beta_1 + \beta_2 x_i)$.

(a) Using R, fit 3 binomial GLMs for these data corresponding to 3 link functions: logit, probit, and complementary log-log. Perform residual analysis for each model, using the deviance residuals. Obtain fitted values, $\hat{\pi}_i$, under each model and compare with the observed proportions, $y_i/m_i$. Obtain the estimated dose-response curve under each model by evaluating $\hat{\pi}(x) = g^{-1}(\hat{\beta}_1 + \hat{\beta}_2 x)$ over a grid of values $x$ for log dose in the interval (1.65, 1.9). Plot these curves and compare with the scatter plot of the observed $x_i$ plotted against the observed proportions. Based on all the results above, discuss the fit of the different models.

(b) One of the more general (parametric) link functions for binomial GLMs that has been suggested in the literature is defined through
$g_\alpha^{-1}(\eta_i) = \frac{\exp(\alpha\eta_i)}{\{1 + \exp(\eta_i)\}^{\alpha}}$, for $\alpha > 0$.   (1.1)
Note that the logit link arises as a special case of (1.1), when $\alpha = 1$.
Discuss the effect of the additional model parameter α, in particular, for values 0 < α < 1 and α > 1. Provide the expression for the log-likelihood for β1, β2, and α under the link in (1.1), and discuss the complications that arise for maximum likelihood estimation under this more general model compared with the logit GLM. (You do not need to fit the model; estimates are given below.)

(c) The MLEs under the model with link given in (1.1) are β̂1 = −113.625, β̂2 = 62.5, and α̂ = 0.279. (The MLEs can be obtained using the Newton-Raphson method.) Using these estimates, obtain the fitted values π̂i and the estimated dose-response curve under the link (1.1). Compare with the corresponding results under the 3 models in (a). Obtain the deviance residuals from the model with link (1.1) and analyze them graphically.

(d) Compute the AIC and BIC for the 4 models considered above to compare them.

2. This problem involves Bayesian analysis of the beetle mortality data from the previous problem.

(a) Consider a Bayesian binomial GLM with a complementary log-log link, i.e., assume that, given β1 and β2, the yi are independent from Bin(mi, π(xi)), i = 1, ..., 8, where

π(x) ≡ π(x; β1, β2) = 1 − exp{−exp(β1 + β2x)}.

Design and implement an MCMC method to sample from the posterior distribution of (β1, β2). Study the effect of the prior for (β1, β2); for example, consider a flat prior as well as (independent) normal priors. Under the flat prior, obtain the posterior distribution for the median lethal dose, LD50, that is, the dose level at which the probability of response is 0.5. Finally, plot point and interval estimates for the dose-response curve π(x) (over a grid of values x for log dose).

(b) Next, consider a binomial GLM with a logit link, i.e., now the yi are assumed independent, given β1 and β2, from Bin(mi, π(xi)), i = 1, ..., 8, where

π(x) ≡ π(x; β1, β2) = exp(β1 + β2x)/{1 + exp(β1 + β2x)}.
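For part 2(a), a bare-bones random-walk Metropolis sketch under the flat prior, using the beetle data from problem 1. The starting values and step sizes here are illustrative guesses; in practice you would tune the proposal (e.g. using the asymptotic covariance from the ML fit, since β1 and β2 are strongly correlated a posteriori) and check mixing with trace plots. Note that under the cloglog link, π = 0.5 is equivalent to β1 + β2x = log(log 2), which gives the LD50:

```python
import math, random

# Beetle mortality data from the table in problem 1.
x = [1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839]
m = [59, 60, 62, 56, 63, 59, 62, 60]
y = [6, 13, 18, 28, 52, 53, 61, 60]

def log_post(b1, b2):
    """Log posterior under the flat prior: the binomial log-likelihood with
    complementary log-log link, pi(x) = 1 - exp(-exp(b1 + b2*x))."""
    lp = 0.0
    for xi, mi, yi in zip(x, m, y):
        pi = 1.0 - math.exp(-math.exp(b1 + b2 * xi))
        pi = min(max(pi, 1e-12), 1.0 - 1e-12)  # numerical guard
        lp += yi * math.log(pi) + (mi - yi) * math.log(1.0 - pi)
    return lp

def metropolis(n_iter=5000, start=(-40.0, 22.0), steps=(2.0, 1.1), seed=1):
    """Joint random-walk Metropolis; a minimal sketch, not a tuned sampler."""
    random.seed(seed)
    cur_state, cur_lp = start, log_post(*start)
    chain = []
    for _ in range(n_iter):
        prop = (cur_state[0] + random.gauss(0.0, steps[0]),
                cur_state[1] + random.gauss(0.0, steps[1]))
        prop_lp = log_post(*prop)
        if math.log(random.random()) < prop_lp - cur_lp:  # accept
            cur_state, cur_lp = prop, prop_lp
        chain.append(cur_state)
    return chain

post = metropolis()[2500:]  # crude burn-in
# LD50 under the cloglog link: pi = 0.5  <=>  b1 + b2*x = log(log 2).
ld50 = [(math.log(math.log(2.0)) - b1) / b2 for b1, b2 in post]
print(sum(ld50) / len(ld50))
```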
Working with a flat prior for (β1, β2), obtain MCMC samples from the posterior distributions for β1, β2, and for LD50, along with point and interval estimates for the dose-response curve π(x).

(c) As a third model, consider the binomial GLM with the parametric link given in (1.1). Develop an MCMC method to sample from the posterior distribution of (β1, β2, α), and obtain the posterior distribution for LD50, and point and interval estimates for π(x).

(d) Use the results from parts (a), (b) and (c) for an empirical comparison of the three Bayesian binomial GLMs for the beetle mortality data. Moreover, perform a residual analysis for each model using the Bayesian residuals: (yi/mi) − π(xi; β1, β2) for the first two models, and (yi/mi) − π(xi; β1, β2, α) for the third. Finally, use the quadratic loss L measure for formal comparison of the three models.

1. The table below reports results from a developmental toxicity study involving ordinal categorical outcomes. This study administered diethylene glycol dimethyl ether (an industrial solvent used in the manufacture of protective coatings) to pregnant mice. Each mouse was exposed to one of five concentration levels for ten days early in the pregnancy (with concentration 0 corresponding to controls). Two days later, the uterine contents of the pregnant mice were examined for defects. One of three (ordered) outcomes ("Dead", "Malformation", "Normal") was recorded for each fetus.

Concentration          Dead    Malformation   Normal   Total number
(mg/kg per day) (xi)   (yi1)   (yi2)          (yi3)    of subjects (mi)
0                      15      1              281      297
62.5                   17      0              225      242
125                    22      7              283      312
250                    38      59             202      299
500                    144     132            9        285

Build a multinomial regression model for these data using continuation-ratio logits for the response probabilities πj(x), j = 1, 2, 3, as a function of concentration level, x.
Specifically, consider the following model

L1^(cr) = log{π1/(π2 + π3)} = α1 + β1x;   L2^(cr) = log(π2/π3) = α2 + β2x

for the multinomial response probabilities πj ≡ πj(x), j = 1, 2, 3.

(a) Show that the model, involving the multinomial likelihood for the data {(yi1, yi2, yi3, xi) : i = 1, ..., 5}, can be fitted by fitting separately two binomial GLMs. Provide details for your argument, including the specific form of the binomial GLMs.

(b) Use the result from part (a) to obtain the MLEs and corresponding standard errors for the parameters (α1, α2, β1, β2). Plot the estimated response curves π̂j(x), for j = 1, 2, 3, and discuss the results.

(c) Develop and implement a Bayesian version of the model above. Discuss your prior choice, and provide details for the posterior simulation method. Provide point and interval estimates for the response curves πj(x), for j = 1, 2, 3.

2. Consider the "alligator food choice" data example, the full version of which is discussed in Section 7.1 of Agresti (2002), Categorical Data Analysis, Second Edition. Here, consider the subset of the data reported in Table 7.16 (page 304) of that book. This data set involves observations on the primary food choice for n = 63 alligators caught in Lake George, Florida. The nominal response variable is the primary food type (in volume) found in each alligator's stomach, with three categories: "fish", "invertebrate", and "other". The invertebrates were mainly apple snails, aquatic insects, and crayfish. The "other" category included amphibian, mammal, bird, reptile, and plant material. Also available for each alligator is covariate information on its length (in meters) and gender.

(a) Focus first on length as the single covariate to explain the response probabilities for the "fish", "invertebrate", and "other" food choice categories.
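Returning to part 1(a) of the toxicity problem: the multinomial likelihood factors into a binomial for "dead vs. alive" and, conditionally on survival, a binomial for "malformation vs. normal", which is why the two continuation-ratio logits can be fitted separately. The two binomial data sets can be assembled directly from the table (a sketch; variable names are illustrative):

```python
# Data from the developmental toxicity table above.
conc = [0, 62.5, 125, 250, 500]
dead = [15, 17, 22, 38, 144]
malf = [1, 0, 7, 59, 132]
norm = [281, 225, 283, 202, 9]

# GLM 1 ("dead" vs. "alive"): successes y_i1 out of m_i = y_i1 + y_i2 + y_i3
# trials, with logit alpha1 + beta1 * x.
glm1 = [(x, y1, y1 + y2 + y3) for x, y1, y2, y3 in zip(conc, dead, malf, norm)]

# GLM 2 ("malformation" vs. "normal", among survivors only): successes y_i2
# out of y_i2 + y_i3 trials, with logit alpha2 + beta2 * x.
glm2 = [(x, y2, y2 + y3) for x, y2, y3 in zip(conc, malf, norm)]

print(glm1)
print(glm2)
```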
Develop a Bayesian multinomial regression model, using the baseline-category logits formulation with "fish" as the baseline category, to estimate (with point and interval estimates) the response probabilities as a function of length. (Note that in this data example, we have mi = 1, for i = 1, ..., n.) Discuss your prior choice and approach to MCMC posterior simulation.

(b) Extend the model from part (a) to describe the effects of both length and gender on food choice. Based on your proposed model, provide point and interval estimates for the length-dependent response probabilities for male and female alligators.

3. Consider the inverse Gaussian distribution with density function

f(y | µ, φ) = (2πφy³)^(−1/2) exp{−(y − µ)² / (2φµ²y)},   y > 0; µ > 0, φ > 0.

Denote the inverse Gaussian distribution with parameters µ and φ by IG(µ, φ).

(a) Show that the inverse Gaussian distribution is a member of the exponential dispersion family. Show that µ is the mean of the distribution and obtain the variance function.

(b) Consider a GLM with random component defined by the inverse Gaussian distribution. That is, assume that the yi are realizations of independent random variables Yi with IG(µi, φ) distributions, for i = 1, ..., n. Here, g(µi) = xi^T β, where β = (β1, ..., βp) (p < n) is the vector of regression coefficients, and xi = (xi1, ..., xip)^T is the covariate vector for the ith response, i = 1, ..., n. Define the full model so that the yi are realizations of independent IG(µi, φ) distributed random variables Yi, with a distinct µi for each yi.
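For part (a) of the inverse Gaussian problem, one way the algebra can be organized (a sketch of the exponential dispersion family form; check it against your course's parameterization):

```latex
f(y \mid \mu, \phi)
  = \exp\!\left\{ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right\},
\qquad
\theta = -\frac{1}{2\mu^{2}}, \quad
b(\theta) = -\sqrt{-2\theta} = -\frac{1}{\mu}, \quad
c(y, \phi) = -\frac{1}{2\phi y} - \frac{1}{2}\log\!\left(2\pi\phi y^{3}\right).
```

From here, E(Y) = b′(θ) = (−2θ)^(−1/2) = µ and Var(Y) = φ b″(θ) = φ(−2θ)^(−3/2) = φµ³, so the variance function is V(µ) = µ³.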
Obtain the scaled deviance for the comparison of the full model with the inverse Gaussian GLM.

Consider the data set from homework 2, problem 3 on the incidence of faults in the manufacturing of rolls of fabric: https://www.stat.columbia.edu/~gelman/book/data/fabric.asc where the first column contains the length of each roll (the covariate with values xi), and the second column contains the number of faults (the response with values yi and means µi).

(a) Fit a Bayesian Poisson GLM to these data, using a logarithmic link, log(µi) = β1 + β2xi. Obtain the posterior distributions for β1 and β2 (under a flat prior for (β1, β2)), as well as point and interval estimates for the response mean as a function of the covariate (over a grid of covariate values). Obtain the distributions of the posterior predictive residuals, and use them for model checking.

(b) Develop a hierarchical extension of the Poisson GLM from part (a), using a gamma distribution for the response means across roll lengths. Specifically, for the second stage of the hierarchical model, assume that

µi | γi, λ  ~ind.  (1/Γ(λ)) (λ/γi)^λ µi^(λ−1) exp(−λµi/γi),   µi > 0; λ > 0, γi > 0,

where log(γi) = β1 + β2xi. (Here, Γ(u) = ∫₀^∞ t^(u−1) exp(−t) dt is the gamma function.) Derive the expressions for E(Yi | β1, β2, λ) and Var(Yi | β1, β2, λ), and compare them with the corresponding expressions under the non-hierarchical model from part (a). Develop an MCMC method for posterior simulation, providing details for all its steps. Derive the expression for the posterior predictive distribution of a new (unobserved) response y0 corresponding to a specified covariate value x0 which is not included in the observed xi. Implement the MCMC algorithm to obtain the posterior distributions for β1, β2, and λ, as well as point and interval estimates for the response mean as a function of the covariate (over a grid of covariate values). Discuss model checking results based on posterior predictive residuals.
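For the moment calculations in part (b), iterated expectations give a compact route (a sketch; note that the gamma second stage above is parameterized to have mean γi and shape λ, hence variance γi²/λ):

```latex
E(Y_i \mid \beta_1, \beta_2, \lambda) = E\{E(Y_i \mid \mu_i)\} = E(\mu_i) = \gamma_i,
\qquad
\operatorname{Var}(Y_i \mid \beta_1, \beta_2, \lambda)
  = E\{\operatorname{Var}(Y_i \mid \mu_i)\} + \operatorname{Var}\{E(Y_i \mid \mu_i)\}
  = \gamma_i + \frac{\gamma_i^{2}}{\lambda},
```

with γi = exp(β1 + β2xi). The extra γi²/λ term is the overdispersion that the non-hierarchical model of part (a), where Var(Yi) = µi, cannot capture.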
Regarding the priors, you can again use the flat prior for (β1, β2), but perform a prior sensitivity analysis for λ, considering different proper priors, including p(λ) = (λ + 1)^(−2).

(c) Based on your results from parts (a) and (b), provide discussion on the empirical comparison between the two models. Moreover, use the quadratic loss L measure for a formal comparison of the two models, in particular, to check whether the hierarchical Poisson GLM offers an improvement in fit over the non-hierarchical GLM. (Provide details on the required expressions for computing the value of the model comparison criterion.)
Problem 1 (4 points) Chapter 2, Exercise 2 (p. 52).
Problem 2 (4 points) Chapter 2, Exercise 3 (p. 52).
Problem 3 (4 points) Chapter 2, Exercise 7 (p. 53).
Problem 4 (4 points) Chapter 10, Exercise 1 (p. 413).
Problem 5 (4 points) Chapter 10, Exercise 2 (p. 413).
Problem 6 (4 points) Chapter 10, Exercise 4 (p. 414).
Problem 7 (4 points) Chapter 10, Exercise 9 (p. 416).
Problem 8 (4 points) Chapter 3, Exercise 4 (p. 120).
Problem 9 (4 points) Chapter 3, Exercise 9 (p. 122). In parts (e) and (f), you need only try a few interactions and transformations.
Problem 10 (4 points) Chapter 3, Exercise 14 (p. 125).

Problem 11 (5 points) Let x1, ..., xn be a fixed set of input points and yi = f(xi) + εi, where the εi are iid from P with E(εi) = 0 and Var(εi) < ∞. Prove that the MSE of a regression estimate f̂ fit to (x1, y1), ..., (xn, yn), evaluated at a random test point x0, i.e. E[(y0 − f̂(x0))²], decomposes into variance, squared bias, and irreducible error components. Hint: You can apply the bias-variance decomposition proved in class.

Problem 12 (5 points) Consider the regression-through-the-origin model (i.e., with no intercept):

yi = βxi + εi   (1)

(a) (1 point) Find the least squares estimate for β.
(b) (2 points) Assume the εi are iid from P such that E(εi) = 0 and Var(εi) = σ² < ∞. Find the standard error of the estimate.
(c) (2 points) Find conditions that guarantee that the estimator is consistent. N.B. An estimator β̂n of a parameter β is consistent if β̂n →p β, i.e., if the estimator converges to the parameter value in probability.

Introduction Homework problems are selected from the course textbook: An Introduction to Statistical Learning.

Problem 1 (5 points) Chapter 4, Exercise 1 (p. 168).
Problem 2 (5 points) Chapter 4, Exercise 4 (p. 168).
Problem 3 (5 points) Chapter 4, Exercise 6 (p. 170).
Problem 4 (5 points) Chapter 4, Exercise 8 (p. 170).
Problem 5 (5 points) Chapter 4, Exercise 10, parts a–h (p. 171).
Problem 6 (5 points) Chapter 5, Exercise 2 (p. 197).
Problem 7 (5 points) Chapter 5, Exercise 5 (p. 198).
Problem 8 (5 points) Chapter 5, Exercise 6 (p. 199).
Problem 9 (5 points) Chapter 5, Exercise 8 (p. 200).
Problem 10 (5 points) Chapter 5, Exercise 9 (p. 201).

Introduction Homework problems are selected from the course textbook: An Introduction to Statistical Learning.

Problem 1 (7 points) Chapter 6, Exercise 3 (p. 260).
Problem 2 (7 points) Chapter 6, Exercise 4 (p. 260).
Problem 3 (7 points) Chapter 6, Exercise 9 (p. 263). Don't do parts (e), (f), and (g).
Problem 4 (7 points) Chapter 7, Exercise 1 (p. 297).
Problem 5 (7 points) Chapter 7, Exercise 8 (p. 299). Find at least one non-linear estimate which does better than linear regression, and justify this using a t-test or by showing an improvement in the cross-validation error with respect to a linear model. You must also produce a plot of the predictor X vs. the non-linear estimate f̂(X).
Problem 6 (7 points) Chapter 9, Exercise 1 (p. 368).
Problem 7 (8 points) Chapter 9, Exercise 8 (p. 371).

Introduction Homework problems are selected from the course textbook: An Introduction to Statistical Learning.

Problem 1 (10 points) Chapter 8, Exercise 4 (p. 332).
Problem 2 (10 points) Chapter 8, Exercise 8 (p. 333).
Problem 3 (10 points) Chapter 8, Exercise 10 (p. 334).
Problem 4 (10 points) Chapter 8, Exercise 11 (p. 335).

Problem 5 (10 points) Let xi, i = 1, ..., p, be the input predictor values and a_k^(2s), k = 1, ..., K, be the K-dimensional output from a 2-layer, M-hidden-unit neural network with sigmoid activation σ(a) = {1 + e^(−a)}^(−1), such that

a_j^(1s) = w_j0^(1s) + Σ_{i=1}^p w_ji^(1s) xi,   j = 1, ..., M
a_k^(2s) = w_k0^(2s) + Σ_{j=1}^M w_kj^(2s) σ(a_j^(1s))

Show that there exists an equivalent network that computes exactly the same output values, but with hidden-unit activation functions given by tanh(a) = (e^a − e^(−a))/(e^a + e^(−a)), i.e.
a_j^(1t) = w_j0^(1t) + Σ_{i=1}^p w_ji^(1t) xi,   j = 1, ..., M
a_k^(2t) = w_k0^(2t) + Σ_{j=1}^M w_kj^(2t) tanh(a_j^(1t))

Hint: First derive the relation between σ(a) and tanh(a). Then show that the parameters of the two networks differ by linear transformations.
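For Problem 5, the key identity is σ(a) = (1 + tanh(a/2))/2, which suggests the weight mapping: halve the first-layer weights and biases (so the tanh sees a/2), halve the second-layer weights, and absorb the resulting +1/2 shifts into the output bias. A numerical sketch of the claim on a tiny network (the weights are random placeholders):

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Key identity: sigma(a) = (1 + tanh(a/2)) / 2.
for a in (-3.0, 0.0, 0.4, 2.5):
    assert abs(sigmoid(a) - (1.0 + math.tanh(a / 2.0)) / 2.0) < 1e-12

# Tiny network check: p = 2 inputs, M = 2 hidden units, K = 1 output.
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # [w_j0, w_j1, w_j2]
W2 = [random.uniform(-1, 1) for _ in range(3)]                      # [w_k0, w_k1, w_k2]

def out_sigmoid(x):
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in W1]
    return W2[0] + W2[1] * h[0] + W2[2] * h[1]

# Equivalent tanh network: halve first-layer weights, halve second-layer
# weights, and fold the +1/2 shifts from the identity into the output bias.
W1t = [[w / 2.0 for w in row] for row in W1]
W2t = [W2[0] + (W2[1] + W2[2]) / 2.0, W2[1] / 2.0, W2[2] / 2.0]

def out_tanh(x):
    h = [math.tanh(w[0] + w[1] * x[0] + w[2] * x[1]) for w in W1t]
    return W2t[0] + W2t[1] * h[0] + W2t[2] * h[1]

for x in ((0.3, -1.2), (-0.7, 0.9)):
    assert abs(out_sigmoid(x) - out_tanh(x)) < 1e-12
```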
In this homework, you need to classify text paragraphs into three categories: Fyodor Dostoyevsky, Arthur Conan Doyle, and Jane Austen, by building your own classifiers. The data provided is from Project Gutenberg. Please follow the steps below:

• (5pts) Preprocess data: Remove punctuation, irrelevant symbols, URLs, and numbers. You can remove the unrelated text at the beginning of each file.
• (5pts) Construct examples: Divide each document into multiple paragraphs. Each paragraph will be one example. Text that is not part of a paragraph can be discarded or preprocessed. Report the total number of examples for each category.
• (5pts) Data split: Sample these paragraphs into training and testing data.
• (5pts) Feature extraction: Build a vocabulary to represent each paragraph using only the training data. Consider TF-IDF features for each input example.

1. Implement a Logistic Regression (LR) model with L2 regularization from scratch:

J = −(1/N) Σ_{i=1}^N Σ_{k=1}^K y_ik log( exp(f_k) / Σ_{c=1}^K exp(f_c) ) + λ Σ_{j=1}^d w_kj²

– (10 pts) Given this formula, show the steps to derive the gradient of J with respect to wk.
– (20 pts) Write a function for mini-batch gradient descent.
– (20 pts) Write a function for stochastic gradient descent.
Report the results and plots for both mini-batch and stochastic gradient descent.

2. (10 pts) Build and train a Multilayer Perceptron (MLP) model (i.e., a two-layer neural network) using backpropagation. Please specify the settings of the model, such as the network structure, the optimizer, the initial learning rate, and the loss function.

• (5pts) Plot training loss and validation loss every 100 epochs.
• (5pts) Use cross-validation on the training data, report the recall and precision for each category on the test and validation sets, and choose the best λ (in LR) and the number of neurons in the hidden layer (in MLP) using the validation set.
• (10pts) Compare both classifiers and provide an analysis of the results.

Please follow the instructions below when you submit the assignment.

1. You are allowed to use packages for reading text, TF-IDF, plotting, and the MLP, but you are not allowed to use packages for mini-batch gradient descent or stochastic gradient descent.

2. Your submission should consist of a zip file named Assignment1 LastName FirstName.zip which contains:
• a Jupyter notebook file (.ipynb). The file should contain the code and the output after execution. You should also include detailed comments and analysis (plots, result tables, etc.).
• a pdf (or jpg) file to show the derivation steps of the gradient of J with respect to wk.

(a) Softmax (5pts) Prove that softmax is invariant to constant offset in the input; that is, for any input vector x and any constant c,

softmax(x) = softmax(x + c)

where x + c means adding the constant c to every dimension of x. Remember that

softmax(x)_i = e^(x_i) / Σ_j e^(x_j)

Note: In practice, we make use of this property and choose c = −max_i x_i when computing softmax probabilities for numerical stability (i.e., subtracting its maximum element from all elements of x).

(b) Sigmoid (5pts) Derive the gradient of the sigmoid function and show that it can be rewritten as a function of the function value (i.e., in some expression where only σ(x), but not x, is present). Assume that the input x is a scalar for this question. Recall, the sigmoid function is:

σ(x) = 1 / (1 + e^(−x))

(c) Word2vec
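The identities in parts (a) and (b) can be sanity-checked numerically before writing the proofs:

```python
import math

def softmax(x):
    exps = [math.exp(v) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# (a) shift invariance: multiplying numerator and denominator by e^c cancels.
x = [1.0, 2.0, 3.0]
c = -max(x)  # the stability shift used in practice
shifted = softmax([v + c for v in x])
assert all(abs(a - b) < 1e-12 for a, b in zip(softmax(x), shifted))

# (b) sigma'(x) = sigma(x) * (1 - sigma(x)); compare with a central difference.
h = 1e-6
for v in (-2.0, 0.5, 3.0):
    fd = (sigmoid(v + h) - sigmoid(v - h)) / (2 * h)
    assert abs(fd - sigmoid(v) * (1 - sigmoid(v))) < 1e-8
```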
i. (5pts) Assume you are given a predicted word vector vc corresponding to the center word c for skip-gram, and the word prediction is made with the softmax function

ŷ_o = p(o | c) = exp(u_o^T vc) / Σ_{w=1}^W exp(u_w^T vc)

where o is the expected word, w denotes the w-th word, and u_w (w = 1, ..., W) are the "output" word vectors for all words in the vocabulary. The cross-entropy function is defined as:

J_CE(o, vc, U) = CE(y, ŷ) = −Σ_i y_i log(ŷ_i)

where the gold vector y is a one-hot vector, the softmax prediction vector ŷ is a probability distribution over the output space, and U = [u_1, u_2, ..., u_W] is the matrix of all the output vectors. Assume the cross-entropy cost is applied to this prediction; derive the gradients with respect to vc.

ii. (5pts) As in the previous part, derive the gradients for the "output" word vectors u_w (including u_o).

iii. (5pts) Repeat parts i and ii assuming we are using the negative sampling loss for the predicted vector vc. Assume that K negative samples (words) are drawn, indexed 1, ..., K. For simplicity of notation, assume o ∉ {1, ..., K}. Again, for a given word o, use u_o to denote its output vector. The negative sampling loss function in this case is:

J_neg-sample(o, vc, U) = −log(σ(u_o^T vc)) − Σ_{k=1}^K log(σ(−u_k^T vc))
iv. (5pts) Derive gradients for all of the word vectors for skip-gram, given the previous parts and given a set of context words [word_{c−m}, ..., word_c, ..., word_{c+m}], where m is the context size. Denote the "input" and "output" word vectors for word k as v_k and u_k respectively.

Hint: Feel free to use F(o, vc) (where o is the expected word) as a placeholder for the J_CE(o, vc, ...) or J_neg-sample(o, vc, ...) cost functions in this part; you'll see that this is a useful abstraction for the coding part. That is, your solution may contain terms of the form ∂F(o, vc)/∂... . Recall that for skip-gram, the cost for a context centered around c is:

Σ_{−m ≤ j ≤ m, j ≠ 0} F(w_{c+j}, vc)

(a) (5pts) Given an input matrix of N rows and D columns, compute the softmax prediction for each row using the optimization in 1(a). Write your implementation in utils.py.

(b) (5pts) Implement the sigmoid function in word2vec.py and test your code.

(c) (45pts) Implement the word2vec models with stochastic gradient descent (SGD).

i. (15pts) Write a helper function normalizeRows in utils.py to normalize the rows of a matrix, for use in word2vec.py. In the same file, fill in the implementation for the softmax and negative sampling cost and gradient functions. Then, fill in the implementation of the cost and gradient functions for the skip-gram model. When you are done, test your implementation by running python word2vec.py.

ii. (15pts) Complete the implementation of your SGD optimizer in sgd.py. Test your implementation by running python sgd.py.

iii. (15pts) Implement the k-nearest neighbors algorithm, which will be used for analysis. The algorithm receives a vector, a matrix, and an integer k, and returns the k indices of the matrix's rows that are closest to the vector. Use the cosine similarity as a distance metric (https://en.wikipedia.org/wiki/Cosine_similarity). Implement the algorithm in knn.py. Print out 10 examples: each example is one word and its neighbors.

(d) (15pts) Load some real data and train your own word vectors.
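As a check on the part i derivation above: the gradient works out to ∂J_CE/∂vc = Σ_w ŷ_w u_w − u_o, i.e. U^T(ŷ − y). A finite-difference check on a toy vocabulary (the numbers are arbitrary placeholders):

```python
import math

def softmax(scores):
    mx = max(scores)
    e = [math.exp(s - mx) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def ce_loss(vc, U, o):
    """J_CE(o, vc, U) = -log softmax(U vc)_o."""
    scores = [sum(uj * vj for uj, vj in zip(u, vc)) for u in U]
    return -math.log(softmax(scores)[o])

def grad_vc(vc, U, o):
    """Analytic gradient: dJ/dvc = sum_w yhat_w u_w - u_o."""
    scores = [sum(uj * vj for uj, vj in zip(u, vc)) for u in U]
    yhat = softmax(scores)
    d = len(vc)
    g = [sum(yhat[w] * U[w][j] for w in range(len(U))) for j in range(d)]
    return [g[j] - U[o][j] for j in range(d)]

# Finite-difference check of the analytic gradient.
U = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]  # 3-word vocab, 2-dim vectors
vc, o, h = [0.1, -0.2], 1, 1e-6
for j in range(2):
    vp = vc[:]; vp[j] += h
    vm = vc[:]; vm[j] -= h
    fd = (ce_loss(vp, U, o) - ce_loss(vm, U, o)) / (2 * h)
    assert abs(fd - grad_vc(vc, U, o)[j]) < 1e-6
```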
Use the StanfordSentimentTreeBank data to train word vectors. Process the dataset and use the sgd function and word2vec to generate word vectors. Visualize a few word examples. There is no additional code to write for this part; just run python run.py.

Note: The training process may take a long time depending on the efficiency of your implementation. Plan accordingly! When the script finishes, a visualization of your word vectors will appear. It will also be saved as word vectors.png in your project directory. In addition, the script should print the nearest neighbors of a few words (using the knn function you implemented in 2(g)). Include the plot and the nearest-neighbor lists in your homework write-up, and briefly explain those results.

3. Submission Instructions You shall submit a zip file named Assignment2 LastName FirstName.zip which contains:
• a pdf (or jpg) file containing all your solutions for the Written part
• a pdf (or jpg) file containing the word vector plot (vector.png) and a brief report of your knn results
• python files (sgd.py, word2vec.py, run.py, knn.py, utils.py)

You will need to use the Penn Treebank corpus for this assignment. Four data files are provided: train.txt, train.5k.txt, valid.txt, and input.txt. You can use train.txt to train your models and use valid.txt for testing. The file input.txt can be used as a sanity check on whether the model produces coherent sequences of words for unseen data with no next word.

(a) (10 pts) Preprocess the train and validation data: build the vocabulary, tokenize, etc.

(b) (10 pts) Implement an N-gram model (bigram or trigram) for language modeling.

(c) (10 pts) Implement Good-Turing smoothing.

(d) (10 pts) Implement Kneser-Ney smoothing using:

P_KN(wi | wi−1) = max(c(wi−1, wi) − d, 0) / c(wi−1) + λ(wi−1) P_CONTINUATION(wi)

where

λ(wi−1) = (d / c(wi−1)) |{w : c(wi−1, w) > 0}|

P_CONTINUATION(w) = |{wi−1 : c(wi−1, w) > 0}| / Σ_{w′} |{w′i−1 : c(w′i−1, w′) > 0}|

(e) (5 pts) Predict the next word in the valid set using a sliding window.
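The Kneser-Ney formulas in part (d) can be exercised on a toy corpus before wiring them into the full model. A minimal bigram sketch (a proper implementation would also handle unseen histories and the recursive higher-order case):

```python
from collections import Counter

def kneser_ney_bigram(tokens, d=0.75):
    """Bigram Kneser-Ney probabilities from a token list; a minimal sketch."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    history = Counter(tokens[:-1])                      # c(w_{i-1})
    followers = Counter(w1 for (w1, _) in bigrams)      # |{w : c(w_{i-1}, w) > 0}|
    continuations = Counter(w2 for (_, w2) in bigrams)  # |{w' : c(w', w) > 0}|
    total_types = len(bigrams)                          # normalizer for P_CONTINUATION

    def p(w, prev):
        discounted = max(bigrams[(prev, w)] - d, 0.0) / history[prev]
        lam = (d / history[prev]) * followers[prev]
        return discounted + lam * continuations[w] / total_types

    return p

p = kneser_ney_bigram("the cat sat on the mat the cat ran".split())
print(p("cat", "the"), p("ran", "the"))
```

On this corpus the probabilities given history "the" sum to 1 over the vocabulary, which is a useful unit test for your own implementation.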
Report the perplexity scores of the N-gram, Good-Turing, and Kneser-Ney models on the test set.

(f) (10 pts) There are 3124 examples in input.txt. Choose the first 30 lines and print the predictions of the next words using your N-gram model.

(a) (5 pts) Initialize parameters for the model.

(b) (10 pts) Implement the forward pass for the model. Use an embedding layer as the first layer of your network (e.g. tf.nn.embedding_lookup). Use a recurrent neural network cell (GRU or LSTM) as the next layer. Given a sequence of words, predict the next word.

(c) (5 pts) Calculate the loss of the model (sequence cross-entropy loss is suggested), e.g. tf.contrib.seq2seq.sequence_loss.

(d) (5 pts) Set up the training step: use a learning rate of 1e−3 and an Adam optimizer. Set the window size to 20 and the batch size to about 50.

(e) (10 pts) Train your RNN model. Calculate the model's perplexity on the test set. Prove that the perplexity is

exp(total loss / number of predictions)

(f) (10 pts) Print the predictions of the next words for the same 30 lines of input.txt as in the N-gram part.

You shall submit a zip file named Assignment3 LastName FirstName.zip which contains:
• python files (.ipynb or .py) including all the code, plots, and results. You need to provide detailed comments in English.

This assignment focuses on convolutional neural networks. You will need to implement convolutional neural network models for two tasks: document classification and sentiment analysis. Use the same datasets as Assignment 1. Classify text paragraphs into three categories: Fyodor Dostoyevsky, Arthur Conan Doyle, and Jane Austen, by building your own classifiers. The data provided is from Project Gutenberg.

(a) (10 pts) Preprocess the data: build the vocabulary, tokenize, etc. Divide the data into train, validation, and test sets.

(b) (10 pts) Initialize parameters for the model. Implement the forward pass for the model. Use an embedding layer as the first layer of your network (e.g. tf.nn.embedding_lookup).
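On the perplexity formula in part (e) of the RNN task above: since the sequence cross-entropy loss is the average negative log-probability of the predicted words, perplexity is exp(total loss / number of predictions). A quick sanity check: a model that is uniform over a vocabulary of V words assigns each next word probability 1/V, so its perplexity is exactly V:

```python
import math

def perplexity(neg_log_probs):
    """perplexity = exp(total cross-entropy loss / number of predictions)."""
    return math.exp(sum(neg_log_probs) / len(neg_log_probs))

V = 10
losses = [-math.log(1.0 / V)] * 7   # 7 predictions, each with probability 1/V
assert abs(perplexity(losses) - V) < 1e-9
```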
Set zero padding for the input matrix. Use at least two convolutional layers (each layer includes convolution, activation, and max-pooling).

(c) (10 pts) Choose and report the number of filters and the filter size for your CNN.

(d) (10 pts) Calculate the loss of the model (cross-entropy loss is suggested). Set up the training step: use a learning rate of 1e−3 and an Adam optimizer.

(e) (10 pts) Train your model and report the recall and precision of each class on the test data. Tune the parameters to achieve the best performance you can.

This is a multi-domain sentiment dataset with positive or negative sentiment annotations. We only use the book reviews for this assignment. There are 1000 positive book reviews and 1000 negative book reviews.

(a) (10 pts) Preprocess the data: extract the review text, build the vocabulary, tokenize, etc. Divide the data into train, validation, and test sets.

(b) (10 pts) Initialize parameters for the model. Implement the forward pass for the model. Use an embedding layer as the first layer of your network (e.g. tf.nn.embedding_lookup). Set zero padding for the input matrix. Use at least two convolutional layers (each layer includes convolution, activation, and max-pooling).

(c) (10 pts) Choose and report the number of filters and the filter size for your CNN.

(d) (10 pts) Calculate the loss of the model (binary cross-entropy loss is suggested). Choose an appropriate output function. Set up the training step, including the learning rate and optimizer.

(e) (10 pts) Train your model and report the accuracy for each class and the total accuracy on the test data. Tune the parameters to achieve the best performance you can.

You shall submit a zip file named Assignment4 LastName FirstName.zip which contains (those who do not follow this naming policy will receive penalty points):
• python files (.py) including all the code, comments, and results. You need to provide detailed comments in English.
• a report (.pdf) for each task: Describe your model: size of the training set and validation set, parameters for your model, number of filters and filter size for your CNN model, loss function, learning rate, optimizer, etc. Plot the training and validation loss. Report recall and precision for task 1, and the accuracy score for task 2, on the test data.

Further Reading:
• Yoon Kim. Convolutional Neural Networks for Sentence Classification. ACL 2014. arXiv:1408.5882
• Ye Zhang, Byron Wallace. A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification. arXiv:1510.03820
Introduction In this first lab, you will learn how to work with an ARM processor and the basics of ARM assembly by programming some common routines. After you complete the tasks, you should demonstrate your work to a TA.

1 Working with the DE1-SoC Computer System

For this course, we will be working with the DE1-SoC Computer System, which is composed of an ARM Cortex-A9 processor and peripheral components located on the FPGA on your DE1-SoC board. The IDE we will be using is the Intel FPGA Monitor Program 16.1. In this part of the lab, you will learn how to program the Computer System in ARM assembly.

1.1 Learn about the tools

Before you move on, you should read the Introduction to the ARM Processor Using Altera Toolchain and acquaint yourself with the Altera Monitor Program Tutorial for ARM. You will also find it useful to refer to the ARM Architecture Reference Manual. These documents can be found on myCourses. It may help to keep these manuals open as you work.

1.2 Your first assembly program

1. Open the 'Intel FPGA Monitor Program 16.1' from the desktop icon and select File->New.

Figure 1: Your first assembly program – Step 1

2. In the new editor window, type out the code as shown in Figure 2 and save this file as 'part1.s' within a new folder 'GXX Lab1' on your network drive. Here, GXX stands for your group number; e.g., Group 1 would be G01 Lab1. The code is a simple program to find the maximum number in a list of 'NUMBERS' with length 'N'. Notice the extensive use of comments! This practice should be used throughout this course, especially with assembly programming! NOTE: The indentation is important. The code will not compile if not indented as shown.

Figure 2: Your first assembly program – Step 2

3. Open the 'Intel FPGA Monitor Program 16.1' from the desktop icon and select File->New Project.

Figure 3: Your first assembly program – Step 3

4. Set the project directory to GXX Lab1 and set the project name to GXX Lab1.
Select the 'ARM Cortex-A9' processor architecture, and click 'Next'.

Figure 4: Your first assembly program – Step 4

5. In the next window, under 'Select a system', select the 'DE1-SoC Computer' and click 'Next'.

Figure 5: Your first assembly program – Step 5

6. In the next window, under 'Program type', select 'Assembly Program' and click 'Next'.

Figure 6: Your first assembly program – Step 6

7. In the next window, 'Specify program details', click on 'Add…', select the file 'part1.s' created in step 2, and click 'Next'.

Figure 7: Your first assembly program – Step 7

8. In the next window, 'Specify system parameters', ensure that the board is detected in the 'Host connection' box, and click 'Next'. Note that the board has to be plugged in via USB and powered on to be detected.

Figure 8: Your first assembly program – Step 8

9. In the next window, 'Specify program memory settings', simply click 'Finish'.

Figure 9: Your first assembly program – Step 9

10. A dialogue box should now pop up, asking whether you would like to download the system onto the board. If you were able to flash your JIC file successfully in Lab 0, click 'No'; otherwise, click 'Yes'.

Figure 10: Your first assembly program – Step 10

1.3 Using the IDE

Now that we have created our first assembly project, let's take a look at some of the features of the IDE and use them to debug this program and verify that it works as desired. NOTE: This section only provides a very brief introduction to the IDE. More detailed information can and should be obtained from the documentation and by experience!

1. Figure 11 shows the useful features of the IDE when a project is opened. We can say that we are now in 'development mode' – where the code is not loaded onto the board and we are in the process of writing code and compiling it to check for errors.
The green box highlights the different IDE window tabs; since we are in development mode, the only useful window is the 'Editor' window, where code can be created/modified. You can add/remove windows using the 'Windows' menu at the top. The red box highlights three useful buttons in development mode – 'Compile', 'Load', and 'Compile & Load'. Their functions are self-explanatory. Strictly speaking, 'compiling' refers to converting higher-level computer code (such as C code) into assembly instructions. What we are doing here is 'assembling', which refers to the conversion of assembly instructions into machine code. However, since Altera has decided to call it the 'Compile' button, we will stick with that name for the sake of clarity.

Figure 11: Using the IDE – Development mode

2. When the code is loaded onto the board (by clicking either 'Load' or 'Compile & Load'), we can say that we are now in 'debug mode'. The IDE is now connected to the board via a debug server, and we can send execution instructions to the board and receive data (such as register and memory values) back from the board. The green box highlights the two important windows in this mode. In the Disassembly window, we can see the code that is being executed, as well as the current instruction when the code is paused. We also have the ability to set/remove breakpoints by clicking on the grey area to the left of an instruction. The Disassembly window is the most important window in debug mode. In the Memory window, we can see the contents of a desired memory location, but only when the program is paused! The red box highlights the useful buttons in debug mode. Using them, we can 'Continue', 'Pause', and 'Restart' the program execution. We can also step by a single instruction, or step over multiple instructions. Finally, we can also disconnect from the board.

Figure 12: Using the IDE – Debug mode

3. Now let's run the code and verify the result.
Before you do this, make sure you have read the code and understand how it works; otherwise you won’t know what it is that you’re checking! Ensure that we are in debug mode and looking at the Disassembly window. Click the ‘Continue’ button, and then click the ‘Pause’ button. The code should stop at the B END instruction. Notice how the contents of the registers have changed, and R0 contains the expected value! Experiment with the IDE features by restarting the program from the first instruction and arriving at the end via steps and breakpoints. Finally, note the address 0x00000038 of RESULT, as it will be used in the next part.

Figure 13: Using the IDE – The Disassembly window

4. Now move over to the Memory window, and search for the value at the address of RESULT. Once again, we can see that the expected value has appeared in that memory location.

Figure 14: Using the IDE – The Memory window

2 Some programming challenges

Now that you have gone through a simple example in which we gave you the program to be executed, you should complete the following tasks, which require you to write your own programs.

NOTE: You will have to add the new files you create to your current project GXX_Lab1. Since the same label ‘_start’ cannot be used in multiple files, and subroutines are beyond the scope of this lab, the workaround you should use in this lab is to have only one file added to the project at any given time!

2.1 Fast standard deviation computation

Suppose that you would like to use the ARM processor to compute the standard deviation of a signal X = {x1, x2, . . . , xN}. The formula for the standard deviation is:

σ̂ = sqrt( ( Σ_{i=1}^{N} (x_i − μ̂)² ) / (N − 1) )    (1)

where μ̂ is the average value of the signal. Unfortunately, implementing this formula requires multiplication, division, and square root operations, which are not available as instructions on all processors and are slow to emulate using other instructions.
The standard deviation can be approximately computed in a more hardware-friendly way using the so-called “range rule”:

σ̂ ≈ (x_max − x_min) / 4    (2)

where x_max and x_min are the maximum and minimum values of the signal, respectively. Write an ARM assembly program which computes the standard deviation of a signal using the range rule. The program should accept input values – more specifically, the number of samples in the signal and their values – using a similar approach to the one shown in Part 1. Save your code in a file named ‘stddev.s’. (Hint: you can reuse your code from Part 1 to compute the maximum value, then make a simple modification to that code to compute the minimum. Also, remember that dividing by a power of 2 can be implemented using shift instructions.)

2.2 Centering an array

It is often necessary to ensure that a signal is “centered” (that is, its average is 0). For example, DC signals can damage a loudspeaker, so it is important to center an audio signal to remove DC components before sending the signal to the speaker. You can center a signal by calculating its average value and subtracting that average from every sample. Write an ARM assembly program to center a signal. In this exercise, store the resulting centered signal ‘in place’ – i.e., in the same memory location in which the input signal is passed. The program should accept the signal length as an input parameter. To simplify calculations, you may assume that only signal lengths that are powers of two will be passed to the program. Save your code in a file named ‘center.s’.

2.3 Sorting

Write an ARM assembly program which sorts an array in ascending order.
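The range rule (2) and the centering operation both reduce to operations the processor has natively: compares, subtraction, and shifts. A C sketch of both, as hypothetical reference helpers to check your assembly against (for centering, a power-of-two length is assumed, and passing log2 of the length is just one convenient convention):

```c
/* Range-rule standard deviation, formula (2): (max - min) / 4.
 * Dividing by 4 is a right shift by 2 -- no divide instruction needed. */
int stddev_range_rule(const int *x, int n)
{
    int max = x[0], min = x[0];
    for (int i = 1; i < n; i++) {
        if (x[i] > max) max = x[i];
        if (x[i] < min) min = x[i];
    }
    return (max - min) >> 2;
}

/* Center a signal in place: subtract the mean from every sample.
 * log2n is log2 of the length, so the divide is again just a shift. */
void center(int *x, int n, int log2n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += x[i];
    int mean = sum >> log2n;        /* valid because n is a power of two */
    for (int i = 0; i < n; i++)
        x[i] -= mean;
}
```

Note how the max-finding loop is the same code shape as the Part 1 program, as the hint suggests.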
You could use the simple bubble sort algorithm:

// Given an array A of length N
sorted = false
while not sorted:
    sorted = true
    for i = 2 to N:
        if A[i] < A[i-1], swap A[i] with A[i-1] and set sorted = false

You could also implement a more sophisticated sorting algorithm. Store the resulting sorted array ‘in place’. The program should accept the array length as an input parameter. Save your code in a file named ‘sort.s’.

3 Grading and report

The TA will ask to see the following deliverables during the demo (the corresponding portion of your grade for each is indicated in brackets):

• Largest integer program (10%)
• Standard deviation program (15%)
• Centering program (25%)
• Sorting program (30%)

A portion of the grade is reserved for answering questions about the code, which is awarded individually to group members. All members of your group should be able to answer any questions the TA has about any part of the deliverables, whether or not you wrote the particular part of the code the TA asks about. Full marks are awarded for a deliverable only if the program functions correctly and the TA’s questions are answered satisfactorily.

Finally, the remaining 20% of the grade for this Lab will go towards a report. Write up a short (2–3 page) report that gives a brief description of each part completed, the approach taken, and the challenges faced, if any. Please don’t include the entire code in the body of the report; save the space for elaborating on possible improvements you made or could have made to the program, such as a feature to detect empty (length 0) arrays. Your final submission should be a single compressed folder that contains your report and the four assembly files – ‘part1.s’, ‘stddev.s’, ‘center.s’, and ‘sort.s’. This Lab will run for two weeks, from January 22nd to February 2nd, during which you should demo your code within your assigned lab period.
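For reference, the bubble sort pseudocode from Section 2.3 translates to C as follows – a hypothetical reference implementation you can use to validate your assembly output, not a deliverable itself:

```c
/* In-place ascending bubble sort, mirroring the pseudocode:
 * keep sweeping until a full pass makes no swaps. */
void bubble_sort(int *a, int n)
{
    int sorted = 0;
    while (!sorted) {
        sorted = 1;
        for (int i = 1; i < n; i++) {  /* pseudocode's i = 2..N is 1-based */
            if (a[i] < a[i - 1]) {
                int tmp = a[i];
                a[i] = a[i - 1];
                a[i - 1] = tmp;
                sorted = 0;
            }
        }
    }
}
```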
The report for Lab 1 is due by 11:59 pm, February 9th.

In this lab, you will learn how to use subroutines and the stack, program in C, and call code written in assembly from code written in C.

1 Subroutines

1.1 The stack

The stack is a data structure which is helpful when there are not enough registers for a program to keep all of its data in registers alone. You will also need the stack when calling subroutines, to save the state of the code outside of the subroutine. Review how the PUSH and POP instructions work. Note that pushing and popping can be implemented without the PUSH and POP instructions, using other ARM instructions. Rewrite the following PUSH and POP instructions using only other instructions:

• PUSH {R0}
• POP {R0 – R2}

Write a test program in assembly to show that your rewritten versions correctly implement PUSH and POP. You should be able to show the TA the contents of main memory changing as registers are pushed onto the stack.

1.2 The subroutine calling convention

The convention which we will use for calling a subroutine in ARM assembly is as follows. The caller must:

• Move arguments into R0 through R3. (If more than four arguments are required, the caller should push the extra arguments onto the stack.)
• Call the subroutine using BL.

The callee must:

• Move the return value into R0.
• Ensure that the state of the processor is restored to what it was before the subroutine call.
• Use BX LR to return to the calling code.

(The state can be saved and restored by pushing R4 through LR onto the stack at the beginning of the subroutine and popping R4 through LR off the stack at the end of the subroutine.)

Convert your program from Lab 1 for finding the max of an array into a program which uses a subroutine. The subroutine should return the max in R0.

1.3 Fibonacci calculation using recursive subroutine calls

A recursive subroutine is a subroutine which calls itself.
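Recursion is easiest to see in C first. Using the lab’s Fibonacci convention (F0 = F1 = 1, defined just below), a direct C version – each recursive call here corresponds to one BL in your assembly, with the stack saving the caller’s state:

```c
/* Recursive Fibonacci with F(0) = F(1) = 1, matching the lab's
 * convention. Every recursive call pushes a new stack frame,
 * which is exactly what your assembly must do with PUSH/POP. */
int fib(int n)
{
    if (n >= 2)
        return fib(n - 1) + fib(n - 2);
    return 1;               /* n < 2 */
}
```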
You can calculate the nth Fibonacci number, Fn (where F0 = 1, F1 = 1, F2 = 2, F3 = 3, F4 = 5, . . . ), using a recursive subroutine as follows:

Fib(n):
    if n >= 2: return Fib(n-1) + Fib(n-2)
    if n < 2: return 1

For example, F4 is computed as follows:

Fib(4) = Fib(3) + Fib(2)
       = (Fib(2) + Fib(1)) + (Fib(1) + Fib(0))
       = ((Fib(1) + Fib(0)) + 1) + (1 + 1)
       = (1 + 1 + 1) + (1 + 1)
       = 5

Write an assembly program which computes the nth Fibonacci number in this way. Your program should have a main section which calls the Fibonacci subroutine recursively, following the above pseudocode.

Figure 1: C code for computing the max.

2 C Programming

Assembly language is useful for writing fast, low-level code, but it can be tedious to work with. Often, high-level languages like C are used instead.

2.1 Pure C

We will first go through an example of programming in straight C.

• Create a new project, performing the same steps as for an assembly project. However, when the New Project Wizard asks what program type you would like, select “C Program”. Check the box next to “Include a sample program with the project”, and select the “Getting Started” program.
• Delete all the code in “getting_started.c” and replace it with the incomplete C program shown in Figure 1.
• Fill in the code with a for-loop which iterates through the array to find the maximum.
• Compile and run the C program the same way you compile and run assembly programs. Notice that the disassembly viewer shows how the compiler has translated the C into assembly.

2.2 Calling an assembly subroutine from C

It is also possible to mix C and assembly. You will need to do this from Lab 3 onward. Perform the following steps to write a C program which calls an assembly subroutine.

• Create a new C project, as you did in Section 2.1.
• Add a file to your project called “subroutine.s”. (The filename does not matter, but it should have the .s extension.)
• Copy the code from Figure 2 into “subroutine.s”.
This code computes the maximum of two numbers and returns the result. Notice that the subroutine does not bother to save and restore the caller’s state; this is sometimes acceptable, in subroutines which do not change the state.

• Next, edit your C program so that it contains the code in Figure 3. This code uses the assembly subroutine to compute the max of two numbers.
• Compile and run the program. Find the main section and the MAX_2 section, and put breakpoints there to watch the processor run those sections.
• Finally, rewrite your C program to find the max of a list using the MAX_2 subroutine.

Figure 2: Assembly code with MAX_2 subroutine.
Figure 3: C code which calls MAX_2 subroutine.

3 Grading

The TA will ask to see the following deliverables during the demo (the corresponding portion of your grade for each is indicated in brackets):

• Test program with rewritten PUSH and POP instructions (15%)
• Assembly code which computes the max of an array using an assembly subroutine (15%)
• Fibonacci program with recursive subroutine (20%)
• C code which computes the max of an array using C (15%)
• C code which computes the max of an array using an assembly subroutine (15%)

A portion of the grade is reserved for answering questions about the code, which is awarded individually to group members. All members of your group should be able to answer any questions the TA has about any part of the deliverables, whether or not you wrote the particular part of the code the TA asks about. Full marks are awarded for a deliverable only if the program functions correctly and the TA’s questions are answered satisfactorily.

Finally, the remaining 20% of the grade for this Lab will go towards a report. Write up a short (3–4 page) report that gives a brief description of each part completed, the approach taken, and the challenges faced, if any.
Please don’t include the entire code in the body of the report; save the space for elaborating on possible improvements you made or could have made to the program. Your final submission should be a single compressed folder that contains your report and all the code files (.c and .s). This Lab will run for two weeks, from February 5th to February 16th. You should demo your code during those dates, within your assigned lab period. The report for Lab 2 is due by 11:59 pm, February 23rd.

Introduction

This lab introduces the basic I/O capabilities of the DE1-SoC computer – the slider switches, pushbuttons, LEDs and 7-segment displays. After writing assembly drivers that interface with the I/O components, timers and interrupts are used to demonstrate polling-based and interrupt-based applications written in C.

1 Creating the Project in the Altera Monitor Program

IMPORTANT: The project is structured as outlined below to introduce concepts that are used in writing well-organized code. Furthermore, drivers for configuring the Generic Interrupt Controller (GIC) will be provided in the latter part of this lab, and the driver code relies on this project structure. The code will not compile if the project is not organized as described!

First, create a new folder named GXX_Lab3, where GXX is the corresponding group number. Within this folder, create a new folder named drivers. Finally, within the drivers folder, create three folders: asm, src and inc. The final folder structure is shown in Figure 1.

Figure 1: The project folder structure (GXX_Lab3 containing drivers, which contains asm, inc and src)

Create a new file main.c and save it in the GXX_Lab3 folder. Next, open the Altera Monitor Program and create a new project. Select the created folder GXX_Lab3 as the project directory, name the project GXX_Lab3, set the architecture to ARM Cortex-A9 and click ‘Next’. When asked to select a system, select DE1-SoC Computer from the drop-down menu and click ‘Next’. Set the program type as C Program and click ‘Next’.
In the next menu, add main.c to the source files. In the System Parameters menu, ensure that the board is detected in the ‘Host connection’ dialogue box and click ‘Next’. Finally, in the memory settings menu, change the Linker Section Presets from ‘Basic’ to ‘Exceptions’ and click ‘Finish’.

2 Basic I/O

For this part, it is necessary to refer to sections 2.5.6 – 2.5.10 (pp. 8 – 10) and 3.4 (pp. 20 – 21) in the DE1-SoC Computer Manual.

Brief overview

The hardware setup of the I/O components is fairly simple to understand. The ARM cores have designated addresses in memory that are connected to hardware circuits on the FPGA, and these hardware circuits in turn interface with the physical I/O components. For most of the basic I/O, the FPGA hardware can be as simple as a direct mapping from the I/O terminals to the designated memory address. For instance, the state of the slider switches is available to the FPGA on a bus of 10 wires, each carrying a logical ‘0’ or ‘1’. This bus can be connected directly to the memory address reserved for the slider switches (0xFF200040 in this case).

It can also be useful to have slightly more sophisticated FPGA hardware. For instance, in the case of the pushbuttons, in addition to knowing the state of each button it is also helpful to know whether a falling edge has been detected, signalling a keypress. This can be achieved by a simple edge-detection circuit in the FPGA. The FPGA hardware that interfaces with the I/O is part of the DE1-SoC computer, and is loaded when the .sof file is flashed onto the board. This section will deal with writing assembly code that interacts with the I/O by reading from and writing to memory.

Getting started: Drivers for slider switches and LEDs

• Slider switches: Create a new assembly file called slider_switches.s in the GXX_Lab3/drivers/asm directory.
Create a new subroutine labelled read_slider_switches_ASM, which reads the value at the memory location designated for the slider switches into the R0 register, and then branches to the link register. Make the subroutine visible to other files in the project by using the .global assembler directive. Remember to use the ARM function calling convention, and save the context if needed! Next, create a new header file called slider_switches.h in the GXX_Lab3/drivers/inc directory. The header file provides the C function declaration for the slider switches assembly driver. Declare the function as extern int read_slider_switches_ASM(), and make use of preprocessor directives to avoid recursive inclusion of the header file. To help you get started, code for the slider switches driver has been provided in Figure 2. Use this as a template for writing future driver code.

• LEDs: Create a new assembly file called LEDs.s in the GXX_Lab3/drivers/asm directory. Create two subroutines – read_LEDs_ASM and write_LEDs_ASM – and again export both using the .global assembler directive. Similar to the slider switches driver, the read_LEDs_ASM subroutine loads the value at the LEDs memory location into R0 and then branches to LR. The write_LEDs_ASM subroutine stores the value in R0 at the LEDs memory location, and then branches to LR. Create a new header file called LEDs.h in the GXX_Lab3/drivers/inc directory, and provide function declarations for both subroutines. The declarations will not be exactly the same as for the slider switches; one of these functions has to accept an argument!

(a) Assembly file (b) Header file
Figure 2: Code for the slider switches driver

• Putting it together: Fill in the main.c file in the GXX_Lab3 directory. The main function includes the header files for both drivers, and sends the switch state to the LEDs in an infinite while loop. The code for this file is shown in Figure 3.
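The switches-to-LEDs loop boils down to memory-mapped I/O through volatile pointers. The sketch below shows the pattern; the two addresses are the ones the manual designates for the switches and LEDs, but since this cannot run off the board, the helper is written to take its “registers” as pointers so it can be exercised against stand-in variables:

```c
#include <stdint.h>

#define SW_BASE  0xFF200040u   /* slider switches (DE1-SoC Computer Manual) */
#define LED_BASE 0xFF200000u   /* red LEDs        (DE1-SoC Computer Manual) */

/* On the board the two drivers effectively perform:
 *   value = *(volatile uint32_t *)SW_BASE;
 *   *(volatile uint32_t *)LED_BASE = value;
 * 'volatile' tells the compiler the locations can change outside
 * the program, so every access really hits memory. */
static uint32_t copy_switches_to_leds(volatile uint32_t *sw,
                                      volatile uint32_t *led)
{
    uint32_t state = *sw;   /* what read_slider_switches_ASM does */
    *led = state;           /* what write_LEDs_ASM does */
    return state;
}
```

In main.c this helper body would simply sit inside the infinite while loop, with the two casts to the real addresses.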
Figure 3: Code for the main.c file

Next, open the project settings and add all the driver files to the project. Compile and load the project onto the DE1-SoC computer, and run the code. The LED lights should now turn on and off when the corresponding slider switches are toggled.

Slightly more advanced: Drivers for HEX displays and push-buttons

Now that the basic structure of the drivers has been introduced, custom data types in C will be used to write drivers that are more readable and easier to implement. In particular, the following two drivers will focus on using enumerations in C.

• HEX displays: As in the previous parts, create two files HEX_displays.s and HEX_displays.h and place them in the correct folders. The code for the header file is provided in Figure 4. Notice the new datatype HEX_t, defined as an enumeration where each display is given a unique value based on a one-hot encoding scheme. This will be useful when writing to multiple displays in the same function call. Write the assembly code to implement the three functions listed in the header file. The HEX_t argument can be treated as an integer in the assembly code. Each subroutine should check the argument for all the displays HEX0–HEX5, and write to whichever ones have been asserted – a loop may be useful here! HEX_clear_ASM turns off all the segments of all the HEX displays passed in the argument. Similarly, HEX_flood_ASM turns on all the segments.

Figure 4: Code for the HEX_displays.h file

The final function, HEX_write_ASM, takes a second argument val, which is a number between 0 and 15. Based on this number, the subroutine displays the corresponding hexadecimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F) on the display(s). A sample program is shown in Figure 5 to demonstrate how multiple displays can be controlled in the same function call.
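The one-hot idea behind HEX_t can be sketched as follows; the exact names and values live in the provided HEX_displays.h, so treat this as an illustrative assumption of the scheme rather than the real header:

```c
/* One-hot display selectors: each display owns one bit, so a
 * bitwise OR selects several displays in a single argument. */
typedef enum {
    HEX0 = 1 << 0,
    HEX1 = 1 << 1,
    HEX2 = 1 << 2,
    HEX3 = 1 << 3,
    HEX4 = 1 << 4,
    HEX5 = 1 << 5
} HEX_t;

/* The assembly driver loops over the six bits and acts on each
 * display whose bit is set -- modelled here by counting them. */
static int count_selected(HEX_t hex)
{
    int count = 0;
    for (int i = 0; i < 6; i++)
        if (hex & (1 << i))
            count++;
    return count;
}
```

A call like HEX_flood_ASM(HEX0 | HEX1 | HEX5) would therefore reach three displays at once.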
Since the value for each display is based on a one-hot encoding scheme, the logical OR of several displays asserts the bits for all of them.

Figure 5: Sample program that uses the HEX displays driver

• Pushbuttons: Create two files pushbuttons.s and pushbuttons.h and place them in the correct folders. Write the assembly code to implement the functionality described in the header file, as shown in Figure 6.

• Putting it together: Modify the main.c file to create an application that uses all of the drivers created so far. As before, the state of the slider switches is mapped directly to the LEDs. Additionally, the state of the last four slider switches, SW3–SW0, is used to set the value of a number from 0 to 15. This number is displayed on a HEX display when the corresponding pushbutton is pressed; for example, pressing KEY0 results in the number being displayed on HEX0. Since there are no pushbuttons corresponding to HEX4 and HEX5, switch on all the segments of these two displays. Finally, asserting slider switch SW9 should clear all the HEX displays.

Figure 6: Code for the pushbuttons.h file

3 Timers

For this part, it is necessary to refer to sections 2.4.2 (p. 4) and 3.2 (p. 20) in the DE1-SoC Computer Manual.

Brief introduction

Timers are simply hardware counters that are used to measure time and/or synchronize events. They run on a known clock frequency that is programmable in some cases (by using a phase-locked loop). Timers are usually (but not always) down-counters; by programming the start value, the time-out event (when the counter reaches zero) occurs at fixed time intervals.

HPS timer drivers

For this section, the drivers have been provided. Download the two files HPS_TIM.s and HPS_TIM.h from MyCourses and place them in the correct folders. The code for the header file is shown in Figure 7.

Figure 7: Code for the HPS_TIM.h file

This driver uses a new concept in C – structures.
A structure is a composite datatype that allows several variables to be grouped together and accessed through a single pointer. Structures are similar to arrays, except that the individual elements of a structure can be of different datatypes! This driver demonstrates how structures make it easy to modify multiple parameters at once.

Notice how the first subroutine, HPS_TIM_config_ASM, takes a struct pointer as an argument. The reason is that if a struct is passed to a function by value, the compiler unpacks the struct elements at compile time and passes them as individual arguments. Since in most cases the number of arguments will be greater than the number of argument registers, the compiler places the extra arguments on the stack. This is perfectly fine when all the code is generated by the compiler, but since this lab requires handwritten assembly drivers, it would cause the programmer a lot of extra overhead when retrieving the arguments in the assembly subroutine. By passing a struct pointer instead, the individual elements can be accessed at the corresponding offsets from the base address passed in the pointer. For instance, the timeout element can be accessed in the assembly subroutine via a load instruction from the address in R0 offset by 0x4.

Notice how the timeout struct element is given in microseconds. This hides the hardware-specific details of the timer from the C programmer: since the HPS timers do not all run on the same clock frequency, the subroutine calculates the correct load value for the corresponding timer in order to achieve the desired timeout. Similarly, the last three struct elements should be set to 1 to be enabled, and 0 to be disabled.

The second subroutine, HPS_TIM_read_INT_ASM, supports multiple timer instances passed in the argument. The return value is an integer that should be interpreted as a 4-bit one-hot encoded number, with bit 0 corresponding to the S-bit of TIM0, and so on.
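To make the struct-pointer convention concrete, here is a hedged sketch: the field names, their order, and the 100 MHz clock are assumptions for illustration (the real layout is in the provided HPS_TIM.h and the manual). The point is the offset arithmetic – the assembly reads timeout at R0 + 0x4 precisely because it is the second 32-bit field:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative layout only -- the real definition is in HPS_TIM.h.
 * With 32-bit fields, 'timeout' sits at offset 0x4 from the struct's
 * base address, which is exactly what the assembly driver loads. */
typedef struct {
    uint32_t tim;        /* which timer instance      (offset 0x0)  */
    uint32_t timeout;    /* period in microseconds    (offset 0x4)  */
    uint32_t LD_en;      /* load-enable flag          (offset 0x8)  */
    uint32_t INT_en;     /* interrupt-enable flag     (offset 0xC)  */
    uint32_t enable;     /* timer-enable flag         (offset 0x10) */
} HPS_TIM_config_t;

/* The driver converts microseconds to a counter load value using the
 * timer's clock; 100 MHz (100 ticks per microsecond) is assumed here. */
static uint32_t load_value_us(const HPS_TIM_config_t *cfg)
{
    return cfg->timeout * 100u;
}
```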
Thus this function can be used to read the S-bit of just one timer, or of all four timers, with a single function call in either case. The other two subroutines likewise support multiple timer instances passed in the argument. A sample program that uses the HPS timer driver is shown in Figure 8. Notice how all four HPS timers are configured to a 1-second timeout in the same function call. The program counts from 0 to 15 on all four HEX displays at the same 1-second rate. It is important to remember that the configuration values in the struct are implemented at a level of abstraction above the hardware, with the aim of providing a better hardware interface to the C programmer. How these values are then used in the assembly driver is governed by the hardware documentation (DE1-SoC Computer Manual). Recreating this sample program will prove to be a useful exercise before attempting the application in the next section.

Creating an application: Stopwatch!

Create a simple stopwatch using the HPS timers, pushbuttons, and HEX displays. The stopwatch should count in increments of 10 milliseconds. Use a single HPS timer to count time. Display milliseconds on HEX1–0, seconds on HEX3–2, and minutes on HEX5–4. PB0, PB1, and PB2 will be used to start, stop and reset the stopwatch, respectively. Use another HPS timer set at a faster timeout value (5 milliseconds or less) to poll the pushbutton edgecapture register.

Figure 8: Sample program that uses the HPS timer driver

4 Interrupts

For this part, it is necessary to refer to section 3 (pp. 19 – 32) in the DE1-SoC Computer Manual. Additional information about the interrupt drivers that are provided can be found in ‘Using the ARM Generic Interrupt Controller’, which is available on myCourses.

Brief introduction

Interrupts are hardware or software signals that are sent to the processor to indicate that an event has occurred that needs immediate attention.
When the processor receives an interrupt, it pauses the current code execution, handles the interrupt by executing code defined in an Interrupt Service Routine (ISR), and then resumes normal execution. Apart from ensuring that high-priority events are given immediate attention, interrupts also help the processor use its resources more efficiently. Consider the polling application from the previous section, where the processor periodically checked the pushbuttons for a keypress event. Asynchronous events such as this, if assigned an interrupt, free up the processor’s time and claim it only when required.

Using the interrupt drivers

Download the following files from myCourses:

• int_setup.c
• int_setup.h
• ISRs.s
• ISRs.h
• address_map_arm.h

Within the GXX_Lab3/drivers/ directory, place the C files in src, the header files in inc, and the assembly files in asm. Only the ISRs.s and ISRs.h files will need to be modified in applications; do not modify the other files. Before attempting this section, get familiar with the relevant documentation sections listed in the introduction.

To demonstrate how to use the drivers, a simple interrupt-based application using HPS_TIM0 is shown. Note: Ensure that in the memory settings menu in the project settings, the Linker Section Presets has been changed from ‘Basic’ to ‘Exceptions’!

To begin, the code for the main.c file is shown in Figure 9. The int_setup() function is the only thing needed to configure the interrupt controller and enable the desired interrupt IDs. It takes two arguments: an integer whose value denotes the number of interrupt IDs to enable, and an integer array containing those IDs. In this example, the only interrupt ID enabled is 199, corresponding to HPS_TIM0. After enabling interrupts for the desired IDs, the hardware devices themselves have to be programmed to generate interrupts. This is done in the code above via the HPS timer driver.
Instructions for enabling interrupts from the different hardware devices can be found in the documentation.

Figure 9: Interrupts example – The main.c file

Now that HPS_TIM0 is able to send interrupts, ISR code is needed to handle the interrupt events. Notice how in the while loop of the main program, the value of hps_tim0_int_flag is checked to see if an interrupt has occurred. The ISR code is responsible for writing to this flag, and also for clearing the interrupt status in HPS_TIM0. When interrupts from a device are enabled and an interrupt is received, the processor halts code execution and branches to the appropriate subroutine in the ISRs.s file. This is where the ISR code should be written. Figure 10 shows the ISR code for HPS_TIM0: in the ISR, the interrupt status of the timer is cleared, and the interrupt flag is asserted. Finally, in order for the main program to use the interrupt flag, the flag is declared in the ISRs.h file as shown in Figure 11.

IMPORTANT: While ISR code is being executed, the processor has halted normal execution. Lengthy ISR code will cause the application to freeze; ISR code should be as lightweight as possible!

Interrupt-based stopwatch!

Modify the stopwatch application from the previous section to use interrupts. In particular, enable interrupts for the HPS timer used to count time for the stopwatch. Also enable interrupts for the pushbuttons, and determine which key was pressed when a pushbutton interrupt is received. There is no need for the second HPS timer that was used to poll the pushbuttons in the previous section.
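The flag handshake between the ISR and the main loop can be sketched in plain C. On the board the ISR is entered by hardware; here a direct function call stands in for the interrupt so the pattern can be exercised. The names mirror the lab’s hps_tim0_int_flag, but the bodies are illustrative:

```c
/* Shared flag: written by the ISR, read by the main loop.
 * 'volatile' stops the compiler from caching it in a register. */
static volatile int hps_tim0_int_flag = 0;

/* Keep ISRs lightweight: clear the device's interrupt status,
 * set the flag, return. All real work happens in the main loop. */
static void hps_tim0_isr(void)
{
    /* (on the board: also clear HPS_TIM0's interrupt status register) */
    hps_tim0_int_flag = 1;
}

/* One iteration of the main loop's check of the flag.
 * Returns 1 if a timer event was consumed, 0 otherwise. */
static int service_tick(void)
{
    if (!hps_tim0_int_flag)
        return 0;
    hps_tim0_int_flag = 0;   /* consume the event */
    /* (update the stopwatch display here) */
    return 1;
}
```

Note that the main loop, not the ISR, does the display update – this is what keeps the ISR lightweight.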
Figure 10: Interrupts example – The ISR assembly code
Figure 11: Interrupts example – Flag declaration in ISRs.h

5 Grading

The TA will ask to see the following deliverables during the demo (the corresponding portion of your grade for each is indicated in brackets):

• Slider switches and LEDs program (10%)
• Entire basic I/O program (15%)
• Polling-based stopwatch (30%)
• Interrupt-based stopwatch (25%)

A portion of the grade is reserved for answering questions about the code, which is awarded individually to group members. All members of your group should be able to answer any questions the TA has about any part of the deliverables, whether or not you wrote the particular part of the code the TA asks about. Full marks are awarded for a deliverable only if the program functions correctly and the TA’s questions are answered satisfactorily.

Finally, the remaining 20% of the grade for this Lab will go towards a report. Write up a short (3–4 page) report that gives a brief description of each part completed, the approach taken, and the challenges faced, if any. Please don’t include the entire code in the body of the report; save the space for elaborating on possible improvements you made or could have made to the program. Your final submission should be a single compressed folder that contains your report and all the code files, correctly organized (.c, .h and .s). This Lab will run for five weeks, from February 19th to March 23rd, along with Lab 4. There will be no lab sessions during reading week, March 5th to March 9th; you should demo your code within the other active weeks, within your assigned lab period. The report for Lab 3 is due by 11:59 pm, March 30th.

In this lab, you will use the high-level I/O capabilities of the DE1-SoC computer. In particular, the tasks will:

• Use the VGA controller to display pixels and characters.
• Use the PS/2 port to accept input from a keyboard.
• Use the audio controller to play generated tones.

1 VGA

For this part, it is necessary to refer to section 4.2 (pp. 40 – 43) of the DE1-SoC Computer Manual.

Brief overview of the DE1-SoC computer VGA interface

The VGA controller hardware has already been introduced in the ECSE 222 labs. The DE1-SoC computer has a built-in VGA controller, and the data displayed on the screen is acquired from two sections of the FPGA on-chip memory – the pixel buffer and the character buffer – which are described in sufficient detail in sections 4.2.1 and 4.2.3 of the DE1-SoC Computer Manual. For this lab, it is not required to make use of the double-buffering feature described in the manual.

VGA driver

Create two files VGA.s and VGA.h and place them in the correct folders. The code for the header file is shown in Figure 1.

Figure 1: Code for the VGA.h file

The subroutines VGA_clear_charbuff_ASM and VGA_clear_pixelbuff_ASM should clear (set to 0) all the valid memory locations in the character buffer and pixel buffer, respectively. VGA_write_char_ASM should write the ASCII code passed in the third argument to the screen at the (x, y) coordinates given in the first two arguments. Essentially, the subroutine stores the value of the third argument at the address calculated from the first two arguments. The subroutine should check that the coordinates supplied are valid (i.e. x = [0, 79] and y = [0, 59]). VGA_write_byte_ASM should write the hexadecimal representation of the value passed in the third argument to the screen; this means the subroutine prints two characters! (For example, passing a value of 0xFF in byte should result in the characters ‘FF’ being displayed on the screen, starting at the (x, y) coordinates passed in the first two arguments.) Again, check that the x and y coordinates are valid, taking into account that two characters will be displayed.
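The bounds check and the character-buffer addressing can be sketched in C. The geometry (80 × 60 characters, a 128-byte row stride so that offset = (y << 7) | x, base address 0xC9000000 on the board) is taken from the DE1-SoC Computer Manual; here the write goes into a local array so the logic can be tested off-board:

```c
#include <stdint.h>

/* Character buffer geometry per the DE1-SoC Computer Manual:
 * 80 x 60 visible characters, one byte each, 128-byte row stride.
 * On the board, charbuf would be (uint8_t *)0xC9000000. */
#define CHAR_COLS 80
#define CHAR_ROWS 60

/* Mirrors VGA_write_char_ASM: validate (x, y), then store the
 * character at base + (y << 7 | x). Returns 1 if written. */
static int vga_write_char(uint8_t *charbuf, int x, int y, char c)
{
    if (x < 0 || x > 79 || y < 0 || y > 59)
        return 0;                     /* invalid coordinates: do nothing */
    charbuf[(y << 7) | x] = (uint8_t)c;
    return 1;
}
```

VGA_write_byte_ASM is then two such stores (one per hex digit), with the bound on x tightened by one.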
Both of the above subroutines should only access the character buffer memory. Finally, the VGA_draw_point_ASM subroutine will draw a point on the screen with the colour indicated in the third argument, accessing only the pixel buffer memory. This subroutine is very similar to the VGA_write_char_ASM subroutine.

NOTE: Use the suffixes 'B' and 'H' with the assembly memory access instructions in order to read/modify bytes/half-words.

Simple VGA application
Build a C-based application to test the functionality of the VGA driver. Write three functions as shown in Figure 2.

Figure 2: C functions used to test the VGA driver

Use the pushbuttons and slider switches as follows:
• PB0 is pressed: if any of the slider switches is on, call the test_byte() function; otherwise, call the test_char() function.
• PB1 is pressed: call the test_pixel() function.
• PB3 is pressed: clear the character buffer.
• PB4 is pressed: clear the pixel buffer.

2 Keyboard

For this part, it is necessary to refer to section 4.5 (pp. 45-46) of the DE1-SoC Computer Manual.

Brief overview of the PS/2 keyboard protocol
For the purpose of this lab, a very high-level description of the PS/2 keyboard protocol is given. A more detailed description can be found at this link. The PS/2 bus provides data about keystroke events by sending hexadecimal numbers called scan codes, which for this lab will vary from 1 to 3 bytes in length. When a key on the PS/2 keyboard is pressed, a unique scan code called the make code is sent, and when the key is released, another scan code called the break code is sent. The scan code set used in this lab can be found here. Two other important parameters are the typematic delay and the typematic rate. When a key is pressed, the corresponding make code is sent, and if the key is held down, the same make code is repeatedly sent at a constant rate after an initial delay. The make code will stop being sent only if the key is released or another key is pressed.
The initial delay between the first and second make code is called the typematic delay, and the rate at which the make code is sent after this is called the typematic rate. The typematic delay can range from 0.25 seconds to 1.00 second, and the typematic rate can range from 2.0 cps (characters per second) to 30.0 cps, with default values of 500 ms and 10.9 cps respectively.

Figure 3: Example of data received on the PS/2 bus. (a) Key 'a' is pressed and released. (b) Key 'a' is pressed, held down, and then released.

PS/2 keyboard driver
Create two files, ps2_keyboard.s and ps2_keyboard.h, and place them in the correct folders. For this lab, simply implement a subroutine with the following specifications:
• Name: read_PS2_data_ASM
• Argument: a char pointer variable data, in which the data that is read will be stored
• Return type: an integer that denotes whether the data read is valid or not
• Description: the subroutine will check the RVALID bit in the PS/2 Data register. If it is set, then the data from the same register should be stored at the address in the char pointer argument, and the subroutine should return 1 to denote valid data. If the RVALID bit is not set, then the subroutine should simply return 0.

Simple keyboard application
Create a simple application that uses the PS/2 keyboard and the VGA monitor. The application should read raw data from the keyboard and display it on the screen if it is valid. Only the VGA_write_byte_ASM subroutine is needed from the VGA driver, and the input byte is simply the data read from the keyboard. Note: in the program, keep track of the (x, y) coordinates where the byte is being written. For example, write the first byte at (0,0) and the second byte at (3,0), and so on until the first line on the screen is full, then start writing bytes at (0,1), (3,1), (6,1), etc. A gap of 3 x-coordinates is used since each byte displays two characters, plus one more for a space between bytes.
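The cursor bookkeeping described in that note can be sketched as follows (JavaScript pseudocode for illustration; the lab itself is written in C and assembly). The 80x60 dimensions come from the character buffer described in the VGA section:

```javascript
// Sketch of the (x, y) cursor tracking for the keyboard application.
// Each displayed byte occupies 3 columns: two hex characters plus a space.
// The character buffer is 80 columns (x in [0,79]) by 60 rows (y in [0,59]).
const COLS = 80, ROWS = 60, STEP = 3;

function nextPosition(x, y) {
  x += STEP;
  // A byte needs columns x and x+1, so wrap when x+1 would pass column 79.
  if (x + 1 > COLS - 1) {
    x = 0;
    y = (y + 1) % ROWS; // wrap back to the top once the screen is full
  }
  return [x, y];
}
```

Starting from (0,0) this yields (3,0), (6,0), … up to (78,0), then wraps to (0,1), matching the positions given in the note.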
3 Audio

For this part, it is necessary to refer to section 4.1 (pp. 39-40) of the DE1-SoC Computer Manual.

Write a driver for the audio port following the same procedure introduced so far. The driver should have only one subroutine. The subroutine should take one integer argument and write it to both the left and the right FIFO, but only if there is space in both FIFOs (hint: use the values of WSLC and WSRC in the subroutine). The subroutine should return an integer value of 1 if the data was written to the FIFOs, and 0 otherwise.
Use the driver in an application that plays a 100 Hz square wave on the audio out port. The frequency can be achieved by knowing the sampling rate of the audio DAC. For example, if the sampling rate is 100 samples per second and a 2 Hz square wave is to be played, there are two complete cycles of the wave contained in 100 samples, so for 25 samples a '1' should be written to the FIFOs, and for the next 25 samples a '0' should be written. For this lab, find the sampling rate in the manual and calculate the number of samples for each half cycle of the square wave. Finally, write 0x00FFFFFF and 0x00000000 to the FIFOs instead of '1' and '0'.

4 Grading

The TA will ask to see the following deliverables during the demo (the corresponding portion of your grade for each is indicated in brackets):
• VGA (30%)
• PS/2 Keyboard (20%)
• Audio (20%)
Full marks are awarded for a deliverable only if the program functions correctly and the TA's questions are answered satisfactorily. A portion of the grade is reserved for answering questions about the code, which is awarded individually to group members. All members of your group should be able to answer any questions the TA has about any part of the deliverables, whether or not you wrote the particular part of the code the TA asks about.
Finally, the remaining 20% of the grade for this lab will go towards a report. Write up a short (2-3 page) report that gives a brief description of each part completed, the approach taken, and the challenges faced, if any. Please don't include the entire code in the body of the report; save the space for elaborating on improvements you made, or could have made, to the program. Your final submission should be a single compressed folder that contains your report and all the code files, correctly organized (.c, .h and .s).
This lab will run for two weeks, from March 12th to March 23rd. Demos for Lab 4 will begin in the week of March 19th to March 23rd, and demos will also be accepted in the week of March 26th to March 30th, within your assigned lab section. The report for Lab 4 is due by 11:59 pm, March 30th.

In this lab, you will combine the low-level ARM programming techniques you have acquired in the course by implementing a musical synthesizer (or "synth").

1 Make Waves

We are providing you with drivers for writing to the audio codec, as well as a ".s" file with a wavetable containing one period of a 1 Hz sine wave at a sampling frequency of 48000 Hz. To play a note with frequency f using the wavetable, generate a signal using the following equations at every sampling instant t:

index = (f * t) mod 48000
signal[t] = amplitude * table[index]

If more than one note is played at the same time, compute the sample for each note and add them together. If the index is not an integer, you can calculate table[index] by linear interpolation between the two nearest samples; for example, table[10.73] := (1 - 0.73)*table[10] + 0.73*table[11]. You should write a function which takes f and t as input and returns signal[t]. Use a timer to feed the generated samples to the audio codec periodically.

2 Control Waves

It should be possible for the user to play the synth using a PS/2 keyboard.
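The wavetable lookup with linear interpolation described in Make Waves can be sketched as follows (JavaScript for illustration; the lab implementation is ARM assembly, and the sine table here is generated in place of the one provided in the .s file):

```javascript
// Sketch of the "Make Waves" wavetable lookup.
// One period of a 1 Hz sine sampled at 48000 Hz; in the lab this table
// comes from the provided .s file.
const RATE = 48000;
const table = Array.from({ length: RATE },
  (_, i) => Math.sin(2 * Math.PI * i / RATE));

// Returns signal[t] for a note of frequency f (Hz) at sample index t.
function sample(f, t, amplitude) {
  const index = (f * t) % RATE;   // may be fractional for fractional f*t
  const i0 = Math.floor(index);
  const i1 = (i0 + 1) % RATE;     // wrap around at the end of the table
  const frac = index - i0;
  // Linear interpolation: table[10.73] = 0.27*table[10] + 0.73*table[11]
  return amplitude * ((1 - frac) * table[i0] + frac * table[i1]);
}
```

For multiple simultaneous notes, sum sample(f, t, a) over the active notes before handing the result to the codec.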
Table 1 shows the notes your synth should play, the key of the PS/2 keyboard each note should map to, and the frequency of each note in Hz. You should also implement a volume control with the keyboard so that the user can turn the volume (amplitude) up or down. This is the minimum functionality you must implement; you can also implement other notes and features, such as different voices.

3 Display Waves

In addition to playing the waveform, you should display the waveform on the lab computer monitor, as in Figure 1.

Table 1: Note Mapping

Note  Key  Frequency
C     A    130.813 Hz
D     S    146.832 Hz
E     D    164.814 Hz
F     F    174.614 Hz
G     J    195.998 Hz
A     K    220.000 Hz
B     L    246.942 Hz
C     ;    261.626 Hz

Figure 1: Waveform display for a single note

4 Presentation

Prepare a 10-minute presentation for the TAs and course instructor describing your synth, its software architecture, the problems you encountered during the design process, and the solutions you developed for those problems.

5 Grading

You will be evaluated according to the following criteria:
• Audio sounds correct (12.5%)
• Note control works (12.5%)
• Volume control works (12.5%)
• Display works (12.5%)
• Does the code follow good software development principles (style, efficiency, clear commenting, appropriate use of source files, code reuse, etc.)? (25%)
• Presentation quality (25%)
The final date for demoing and submitting the final report is the last day of classes, 16th April 2018. A demo timetable and guidelines for submitting the report will be posted on MyCourses.
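As an aside, the frequencies in Table 1 follow twelve-tone equal temperament: each semitone multiplies the frequency by 2^(1/12). A quick sketch reproducing the table from its A = 220 Hz entry (an observation about the table, not a required part of the lab):

```javascript
// Table 1 frequencies are equal temperament: f(n) = 220 * 2^((n-9)/12),
// where n is the semitone offset from the low C in the table
// (C=0, D=2, E=4, F=5, G=7, A=9, B=11, C'=12).
const semitone = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11, "C'": 12 };

function noteFrequency(name) {
  return 220 * Math.pow(2, (semitone[name] - 9) / 12);
}
```

This is handy if you add the optional extra notes: any semitone offset plugged into the same formula gives a frequency consistent with Table 1.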
Requirements:
• Write a WebGL program that creates and visualizes a random triangle. Name your source code hw1.html and hw1.js. The program should meet the following requirements:
– Set the title of the program to "hw1" (it must appear on the title bar).
– Set the background color to black.
– Create triangle vertices at 3 random locations. Create random coordinates so that each vertex may be positioned anywhere within the canvas, but never outside it.
– Create and assign 3 random colors to the 3 vertices. These 3 colors must be linearly interpolated within the triangle.
– Use the drawArrays function to visualize the triangle on the canvas.
– Thus, every time you hit the F5 (refresh) key, the canvas must display a new random triangle at a random position, with random size and random color gradient (see Figs. 1-2). The triangle vertices must never go out of bounds.

Figure 1: Random triangle 1
Figure 2: Random triangle 2

What to submit:
• Submit all your source files (.html, .js) that are needed for compilation, including library files/folders. Missing library files/folders will lead to point deduction.

How to submit:
• Use the Canvas Assignment Submission system to submit your source files.
• Make sure to zip all your files/folders into hw1.zip, then submit hw1.zip as a single file.

Policy
• Do all the assignments on Chrome Development Tools using HTML, JavaScript, and GLSL ES.
• At the top of each source file, provide comments specifying the author, date, and a brief description of the file.
• Source code must contain enough comments throughout to make it easy to follow. Insufficient comments could lead to point deduction.
• An incomplete program will get almost no credit (e.g., the program does not run due to compile errors, or terminates prematurely due to run-time errors).
• Thou shall not covet thy neighbor's code. If identical (or nearly identical) submissions are found among students, every student involved will get an automatic zero for the assignment.
The same goes for copying existing code from online sources.
• If a student makes multiple submissions, only the last submission will be considered valid.

Requirements:
• Write a WebGL program that creates and visualizes random polygons (n-gons). Name your source code hw2.html and hw2.js. The program should meet the following requirements:
– Set the title of the program to "hw2" (it must appear as such on the title bar).
– The program generates 20 random polygons at random positions with random colors (see Fig. 1).
– Each polygon must have a random number of vertices (between 3 and 9).
– Each polygon must continuously rotate and scale (see the accompanying video on Canvas). Note that the rotation must be done in place about the polygon's own center, not about the origin. Hint: first translate the polygon to the origin, then rotate, then translate it back to its original position.
– The scaling must go back and forth between scaling factors of 1 and 0 as upper bound and lower bound, respectively.
– Each time you hit F5 (refresh), the canvas must display a new set of 20 random polygons rotating and scaling in place (see Figs. 1-2).

Figure 1: Rotating and scaling polygons 1
Figure 2: Rotating and scaling polygons 2

What to submit:
• Submit all your source files (.html, .js) that are needed for compilation, including library files/folders. Missing library files/folders will lead to point deduction.

How to submit:
• Use the Canvas Assignment Submission system to submit your source files.
• Make sure to zip all your files/folders into hw2.zip, then submit hw2.zip as a single file.

Policy
• Do all the assignments on Chrome Development Tools using HTML, JavaScript, and GLSL ES.
• At the top of each source file, provide comments specifying the author, date, and a brief description of the file.
• Source code must contain enough comments throughout to make it easy to follow. Insufficient comments could lead to point deduction.
• An incomplete program will get almost no credit (e.g., the program does not run due to compile errors, or terminates prematurely due to run-time errors).
• Thou shall not covet thy neighbor's code. If identical (or nearly identical) submissions are found among students, every student involved will get an automatic zero for the assignment. The same goes for copying existing code from online sources.
• If a student makes multiple submissions, only the last submission will be considered valid.

Requirements:
• Write a WebGL program that creates a symmetric sine wave animation. Name your source code hw3.html and hw3.js. The program should meet the following requirements:
– Set the title of the program to "hw3" (it must appear as such on the title bar).
– Modify wave.js (m3.zip) so that the sine wave is symmetric across the y-axis (the left half and right half look like twins but move in opposite directions), as shown in Fig. 1.
– Also see the accompanying video. Your mission is to reproduce the program shown in the video.

Figure 1: Symmetric sine wave

What to submit:
• Submit all your source files (.html, .js) that are needed for compilation, including library files/folders. Missing library files/folders will lead to point deduction.
• Make sure your library folder/files are in the right location relative to your main program (.html), so that when your main program (.html) is clicked as is, it runs without problems.

How to submit:
• Use the Canvas Assignment Submission system to submit your source files.
• Make sure to zip all your files/folders into hw3.zip, then submit hw3.zip as a single file.

Policy
• Do all the assignments on Chrome Development Tools using HTML, JavaScript, and GLSL ES.
• At the top of each source file, provide comments specifying the author, date, and a brief description of the file.
• Source code must contain enough comments throughout to make it easy to follow. Insufficient comments could lead to point deduction.
• An incomplete program will get almost no credit (e.g., the program does not run due to compile errors, or terminates prematurely due to run-time errors).
• Thou shall not covet thy neighbor's code. If identical (or nearly identical) submissions are found among students, every student involved will get an automatic zero for the assignment. The same goes for copying existing code from online sources.
• If a student makes multiple submissions, only the last submission will be considered valid.

Requirements:
• Write a WebGL program that does a simple image processing task. Name your source code hw4.html and hw4.js. The program should meet the following requirements:
– The program displays an image and lets the user adjust its hue. (Hint: you may first convert the RGB color to HSV color, then adjust the hue component.)
– The image should be accessed through its URL address. Do not include an actual image file in your submission.
– Implement a dat.gui slider widget so the user can adjust the hue interactively (see Fig. 1).
– Also see the accompanying demo video. Your mission is to reproduce the program in the video.
– The slider value should range from -180 to 180. This is because hue is typically expressed as an angle on the color wheel, and the user-selected value represents how far the original pixel color shifts along the color wheel, in degrees. Note that since a 180-degree angle is the same as a -180-degree angle, they should produce the same outputs.

What to submit:
• Submit all your source files (.html, .js) that are needed for compilation, including library files/folders. Missing library files/folders will lead to point deduction.
• Make sure your library folder/files are in the right location relative to your main program (.html), so that when your main program (.html) is clicked as is, it runs without problems.

How to submit:
• Use the Canvas Assignment Submission system to submit your source files.
• Make sure to zip all your files/folders into hw4.zip, then submit hw4.zip as a single file.

Figure 1: Interactive hue adjustment

Policy
• Do all the assignments on Chrome Development Tools using HTML, JavaScript, and GLSL ES.
• At the top of each source file, provide comments specifying the author, date, and a brief description of the file.
• Source code must contain enough comments throughout to make it easy to follow. Insufficient comments could lead to point deduction.
• An incomplete program will get almost no credit (e.g., the program does not run due to compile errors, or terminates prematurely due to run-time errors).
• Thou shall not covet thy neighbor's code. If identical (or nearly identical) submissions are found among students, every student involved will get an automatic zero for the assignment. The same goes for copying existing code from online sources.
• If a student makes multiple submissions, only the last submission will be considered valid.

Requirements:
• Write a WebGL program that meets the following requirements. Name your source code hw5.html and hw5.js.
– The program models a 3D cylinder and displays it using wireframe rendering (see Fig. 1).
– The model must consist of a top face (an n-gon), a bottom face (an n-gon), and side faces (rectangles).
– Each n-gon (top/bottom face) must consist of n triangles in the form of a triangle fan (see Fig. 1).
– Each of the side rectangles must consist of 2 triangles (see Fig. 1).
– The cylinder must rotate constantly to show the model from various angles (see the video).

Figure 1: WireCylinder

What to submit:
• Submit all your source files (.html, .js) that are needed for compilation, including library files/folders. Missing library files/folders will lead to point deduction.
• Make sure your library folder/files are in the right location relative to your main program (.html), so that when your main program (.html) is clicked as is, it runs without problems.
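The cylinder tessellation described in the hw5 requirements (an n-triangle fan per cap, two triangles per side rectangle) can be sketched as follows; the function name and parameters are illustrative, not required by the assignment:

```javascript
// Sketch of the hw5 cylinder triangulation:
// n fan triangles per cap, plus 2 triangles per side rectangle.
function cylinderTriangles(n, radius = 1, height = 1) {
  const tris = [];
  const top = height / 2, bot = -height / 2;
  // One ring of n vertices around the y axis at height y.
  const ring = (y) => Array.from({ length: n }, (_, i) => {
    const a = 2 * Math.PI * i / n;
    return [radius * Math.cos(a), y, radius * Math.sin(a)];
  });
  const T = ring(top), B = ring(bot);
  for (let i = 0; i < n; i++) {
    const j = (i + 1) % n;
    tris.push([[0, top, 0], T[i], T[j]]); // top fan triangle
    tris.push([[0, bot, 0], B[j], B[i]]); // bottom fan triangle
    tris.push([T[i], B[i], B[j]]);        // side rectangle, half 1
    tris.push([T[i], B[j], T[j]]);        // side rectangle, half 2
  }
  return tris; // 4n triangles total: n + n + 2n
}
```

For the wireframe rendering itself, each triangle's three edges would then be drawn as lines (or the triangles drawn in a line-based mode).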
How to submit:
• Use the Canvas Assignment Submission system to submit your source files.
• Make sure to zip all your files/folders into hw5.zip, then submit hw5.zip as a single file.

Policy
• Do all the assignments on Chrome Development Tools using HTML, JavaScript, and GLSL ES.
• At the top of each source file, provide comments specifying the author, date, and a brief description of the file.
• Source code must contain enough comments throughout to make it easy to follow. Insufficient comments could lead to point deduction.
• An incomplete program will get almost no credit (e.g., the program does not run due to compile errors, or terminates prematurely due to run-time errors).
• Thou shall not covet thy neighbor's code. If identical (or nearly identical) submissions are found among students, every student involved will get an automatic zero for the assignment. The same goes for copying existing code from online sources.
• If a student makes multiple submissions, only the last submission will be considered valid.

Requirements:
• Write a WebGL program that meets the following requirements. Name your source code hw6.html and hw6.js.
– The program builds on Polyhedron GUI.js (see Fig. 1).
– See the accompanying video. Your program should basically look and work like the one in the video.
– Add a checkbox "light?" which, when checked, smoothly rotates the light on top of the currently selected object using a y-roll (see the video).
– When unchecked, the light stops rotating and stays right where it is until the checkbox is checked again (see the video).
– All other GUI menu items should work the same as before (see the video).

Figure 1: Rotating Light

What to submit:
• Submit all your source files (.html, .js) that are needed for compilation, including library files/folders. Missing library files/folders will lead to point deduction.
• Make sure your library folder/files are in the right location relative to your main program (.html), so that when your main program (.html) is clicked as is, it runs without problems.

How to submit:
• Use the Canvas Assignment Submission system to submit your source files.
• Make sure to zip all your files/folders into hw6.zip, then submit hw6.zip as a single file.

Policy
• Do all the assignments on Chrome Development Tools using HTML, JavaScript, and GLSL ES.
• At the top of each source file, provide comments specifying the author, date, and a brief description of the file.
• Source code must contain enough comments throughout to make it easy to follow. Insufficient comments could lead to point deduction.
• An incomplete program will get almost no credit (e.g., the program does not run due to compile errors, or terminates prematurely due to run-time errors).
• Thou shall not covet thy neighbor's code. If identical (or nearly identical) submissions are found among students, every student involved will get an automatic zero for the assignment. The same goes for copying existing code from online sources.
• If a student makes multiple submissions, only the last submission will be considered valid.
Problem 0: We use a "let" statement to… let x = 100;

Create a zip archive containing this file and upload it to CMS before the deadline. Please check to make sure that the data uploaded are what you intend to submit.

1. For each of the following visualizations, identify the marks used and the visual channels employed, from the list provided. Be as exhaustive as possible. You do not need to supply the precise data attribute; you only need to write the marks and visual channels used.
Possible visual channels: Aligned Position (x or y); Unaligned Position (x or y); Aligned Length; Unaligned Length; Area; Volume; Color Hue; Color Luminosity.
EXAMPLE: Marks: Blue rectangles. Channels: Varying the vertical aligned length and horizontal aligned position of rectangles.
A) B) C)
(Note: The row of numbered items at the top (1.SS, 2.CoS, etc.) are individual books in the series. Word position continues through the entire series of books from start to end.)

2. For each of the following items in the list, please:
A: Identify whether it is a canonical Javascript type.
B: If the type is a canonical Javascript type, please provide an example in your section using console.log(typeof( ... )).
C: If applicable, identify any unusual or unexpected typeof() behavior for the type.
a) number; b) int; c) unicode; d) array; e) char; f) class; g) regex; h) string; i) boolean; j) object; k) null; l) function

3. First, in 1-2 sentences, please define type coercion. Then, in your tag please provide two examples where type coercion produces an answer that is unusual or unexpected (i.e. different from Python output, or a case where Python would throw an exception). For each example, briefly explain why it is unexpected. Then, in the section, put both of your code examples into a console.log() statement so we can see the result (e.g. console.log("apple" + "orange")).

4.
Describe, in your own words, the course policy on late and unreadable work outlined in the syllabus, which is linked from the course website.

Goals: Get practice using SVG elements.

Your work should be in the form of an HTML file called index.html. At the top of your , please put your name and netID in a element per problem. Wrap any SVG code for each problem in a element nested within the element. For this homework we will not be using Javascript. Since we want you to get more experience working with SVG, we would like you to complete this assignment using handwritten SVG elements. Some problems might require numbers like pixel positions that require calculation. Feel free to use whatever tool you prefer to compute these, but they should appear in your turned-in homework as literal numbers in the code. You can use a color picker tool to find good colors, but once again they must appear as literal values in the file. You may not use an SVG authoring tool like Adobe Illustrator. We can tell if you use one, and the time it would take to obfuscate its output is much greater than the time it would take to just do the assignment yourself. Create a zip archive containing your HTML file and upload it to CMS before the deadline.

1. Create a 360x360 pixel SVG element. Reproduce this plot using SVG elements. The main plot region, excluding axis labels, should be a square 320x320 pixels in size, running from (20,20) to (340,340). Reserve the remaining pixels on the outside as padding for the axis labels. Data for the points are provided at the bottom of this homework file. The chart should start with data value [0,0] in the lower left corner (pixel location (20,340)) and end with data value [10,10] in the upper right corner (pixel location (340,20)). Create a circle mark for each data point. Circle marks should be reasonably sized and in a dark color of your choosing. You should calculate the proper pixel positions for the [x,y] coordinates from the data as necessary.
Remember to account for the "padding" pixels when determining positions for points. While we recommend that you hardcode point positions using the 20-340 pixel range, we will also accept submissions that use SVG group translations to relocate the plot region. Add horizontal and vertical gridlines for each integer from 1 to 10 in a light grey. Include <text> elements for axis labels at 0, 5, and 10 as depicted in the image, in the Arial typeface, for both the X and Y axes. They should be *centered* to the side of / underneath the gridlines using CSS properties (hint: text-anchor and dominant-baseline). Your result may not look exactly like the figure; we will be grading based on your accuracy in positioning points.

2. Examine your finished canvas (or the example image). In a tag, please identify:
a) the marks used for each row of data
b) the two visual channels used in this visualization
(When answering, please ignore the annotations you added, such as <text> elements.)

3. Use <line> and <rect> elements to create your own version of a Piet Mondrian painting in a 300x300 SVG canvas. Piet Mondrian was an early 20th century artist who, as a member of the De Stijl movement, reduced his art to three primary colors and black lines in a series of famous neoplasticist works. Here are some examples: one, two, three, four. In your own neoplastic work, you must include at least 6 lines and 3 rectangles. <line> elements must use a black stroke, and <rect> elements must use a white, red, yellow, or blue fill. <rect> elements cannot have a stroke; only use a fill for them. If you want black borders, you must use <line> elements. No other colors will be permitted. You may use any additional features you feel would add aesthetic value. If you use a tool to generate coordinates for shapes, please cite that tool. Faithfulness to art history will not be evaluated. We are not grading based on creativity, but obviously poor or incomplete submissions will be penalized.
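The pixel calculation in problem 1 is a linear map from the [0,10] data domain to the [20,340] pixel range, with the y axis flipped because SVG y grows downward. A sketch of the arithmetic (the helper names are illustrative; the assignment itself wants the resulting literal numbers, not this code):

```javascript
// Linear data-to-pixel mapping for the 360x360 canvas in problem 1:
// data x in [0,10] -> pixel x in [20,340];
// data y in [0,10] -> pixel y in [340,20] (flipped).
const toPixelX = (x) => 20 + (x / 10) * 320;
const toPixelY = (y) => 340 - (y / 10) * 320;

// e.g. point p1 = [1.0, 9.0] from the data table:
const p1 = [toPixelX(1.0), toPixelY(9.0)];
```

Run each data point through this mapping, then copy the resulting numbers into your SVG as literals.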
(20pts) Data for scatterplot

     X    Y
p1   1.0  9.0
p2   1.5  6.0
p3   2.5  4.0
p4   4.0  2.0
p5   5.0  1.6
p6   6.0  2.4
p7   7.0  3.0
p8   8.0  3.4
p9   9.0  3.6

Bonus problem (no extra credit offered; just for fun)
Make a Python program that automatically generates the canvas, lines, labels, and circle marks for problem #2. Your program should take as input a list of dictionary objects, each with "x" and "y" values. Ideally, you should be able to configure the minimum and maximum values you want to display on the x and y axes. You can use min, max, and modulo functions to identify how many gridlines to create and where your major axis labels (0, 5, 10, …) should go. Since you know the canvas area is 320x320 and you have defined your axis min/max values, you can use some math to figure out where your circles should go. Afterwards, use string composition to output some valid HTML. Use this to check your answer to #2. …this might also help:
data = [{"x":1.0 ,"y":9.0}, {"x":1.5 ,"y":6.0}, {"x":2.5 ,"y":4.0}, {"x":4.0 ,"y":2.0}, {"x":5.0 ,"y":1.6}, {"x":6.0 ,"y":2.4}, {"x":7.0 ,"y":3.0}, {"x":8.0 ,"y":3.4}, {"x":9.0 ,"y":3.6}]

Goals: Practice using d3 to create SVG elements and set their aesthetic properties. Recognize the effect of data transformations, through direct data changes and through scale functions. Practice working with color scales.

Your work should be in the form of an HTML file called index.html, with one element per problem. Wrap any SVG code for each problem in a element following the element. For this homework we will be using d3.js. In the section of your file, please import d3 using this tag:

Create a zip archive containing your HTML file and all associated data files (such as diamonds.json) and upload it to CMS before the deadline. Submissions that do not include data files may be penalized. Your submission will be graded using a Python web server run in a parent directory containing your zip file contents (e.g.
server started in ~/student_hw, with your homework at ~/student_hw/your_netid/hw3/index.htm); be sure that it works.

1. In your HTML, please create a 300x300 pixel SVG element. Then, select it using d3.select() in the section of your code. Unlike in HW2, where you drew things by hand, in this problem you are going to use the .append() and .style() functions to build and decorate this canvas. Please use d3 functions to create the following elements in your canvas:
- A <text> element with the word "INFO3300" centered in the exact middle of the SVG canvas. Use .attr() to position it. You are welcome to use text-anchor or pixel offsets to center it. The element should be styled to use a black 20px Verdana typeface.
- A <circle> element at (150,150) with a 4px radius, so that we can verify that the text is correctly centered. Please give it a light pink fill color and no stroke. It should appear *behind* the <text> element.
- Three (3) <rect> elements located in the white space around the text. They should be no larger than 50px x 50px. Give each of them a different stroke color and fill color. No two rectangles should overlap or be the same size. Make sure that the colors you choose make them stand out to the grader.

2. In HW2 you reproduced a plot from scratch using SVG. Now create the same plot again, but this time use d3 functions to create it programmatically in a tag. While it should resemble the example image to the right, you don't need to recreate it exactly, so long as your point and line positions are correct. Create a 360x360 pixel SVG element in HTML. Use a CSS style to give the canvas a 1px light grey solid border. The main plot region, excluding labels, should be a square 320x320 pixels in size, running from (20,20) to (340,340). Reserve the remaining pixels as padding for the labels. On the last page of this assignment, we have included a code version of the dataset. Go ahead and copy it into your tag.
First create x and y scale functions that map from data coordinates to SVG coordinates, using the same minimum and maximum values as the chart domain (0 to 10 for both axes). Remember to account for the "padding" pixels when determining the range of point positions. You can choose whether to implement the margins by adjusting the range of your scales or by adding a <g> element and a translate().
Next, create the grid of lines for your chart. While there is a way to make gridlines using d3 axes, please manually create the gridlines using a for loop. You should create one horizontal line and one vertical line for each number between 0 and 10 (inclusive), in a grey color.
Afterwards, add the text labels programmatically. There are a few ways to do this. You could create an array containing the values that need labels (e.g. [0,5,10]) and then loop through it with forEach. You could also do some clever modulo arithmetic while looping to make lines. In either case, make a new <text> element for each label, with Arial font. Use your scales to help place the text, and adjust the dominant-baseline and text-anchor attributes as you did in HW2. You may need to add or subtract a few pixels to position the text nicely.
Now, add circles for each point, with positions determined by your scales. You don't need to use a data join; it's fine if you just create the circles one by one in a forEach loop. Circles should have a radius of 10px and a dark color.

3. Instead of a element, for this question please create a element. For each of the following scales, create a sub-element and answer the following questions. (If you have a color vision deficiency and cannot perceive hues in a color scale in order to answer a subitem, please instead describe what you see.)
A: Is this a sequential or a divergent color scale? Do you think this is an effective color scale? Justify your answer in 1-2 sentences.
B: This scale is being used to color scatterplot points based on a numeric data attribute that captures the positive or negative sentiment of tweets. Values range from -1 to 1, with negative values moving towards yellow (the left side) and positive values moving towards red (the right side). Middle values remain blue. Is this an effective color scale for this task? Justify your answer in 1-2 sentences.

C: To a majority of individuals, this color scale will appear to vary in both hue and luminosity (greyish blue on the left, lime green on the right). However, the hue channel of this scale is not visible to individuals with certain color vision deficiencies. This poses usability issues. Use an online color blindness image testing tool to identify and list which kind(s) of anomalous trichromatic and/or dichromatic color vision deficiencies (e.g. deuteranomaly) would cause a loss of perceivable hue variation (file included in ZIP). [If you have color vision deficiencies that make the scale’s hue hard to interpret, you have two choices: you can either a) self-disclose your color vision deficiency and describe what the image looks like to you, or b) ask a trusted friend to describe what they see to you.]

D: A data scientist is designing a choropleth map for a new continuous, numeric county-by-county “average life expectancy” data attribute they developed. Would you recommend that they use this rainbow scale to color the counties in their map? Justify your answer in 1-2 sentences.

Data for scatterplot in #2:

      X    Y
p1   1.0  9.0
p2   1.5  6.0
p3   2.5  4.0
p4   4.0  2.0
p5   5.0  1.6
p6   6.0  2.4
p7   7.0  3.0
p8   8.0  3.4
p9   9.0  3.6

For easier import into your code:
data = [{"x":1.0,"y":9.0}, {"x":1.5,"y":6.0}, {"x":2.5,"y":4.0}, {"x":4.0,"y":2.0}, {"x":5.0,"y":1.6}, {"x":6.0,"y":2.4}, {"x":7.0,"y":3.0}, {"x":8.0,"y":3.4}, {"x":9.0,"y":3.6}]

Goals: Practice using d3 to create some simple charts.
Get more experience importing data and working with loops. Your work should be in the form of an HTML file called index.html with one div element per problem. If you must add any SVG canvases programmatically, we suggest that you add a
Changing Reference Frames (30 Points)

Nearly every robot task requires operating in multiple reference frames. Download and open lidarScan.mat, containing a sample lidar scan (as returned by LidarSensorCreate.m). The 681 data points correspond to the measured ranges (in meters), equally spaced over the angles [−120°, 120°].

1. Edit the file lidar_range2xy.m to convert the raw lidar scan into 2D local robot coordinates. Assume the lidar sensor is located at the front of the robot pointing forward (i.e. the sensor pose in local coordinates is [xl, yl, θl] = [r, 0, 0], where r is the robot radius). The robot’s x-axis points forward and y-axis points left. Assume the robot radius is 0.2 meters. Submit this function in the autograded assignment Homework 1 lidar_range2xy on Canvas.
2. Edit the file robot2global.m to transform any 2D point from local robot coordinates into global coordinates, given the robot’s pose. Submit this function in the autograded assignment Homework 1 robot2global on Canvas.
3. Edit the file global2robot.m to transform any 2D point from global coordinates into local robot coordinates, given the robot’s pose. Submit this function in the autograded assignment Homework 1 global2robot on Canvas.
4. In the same figure, plot using lidar_range2xy.m the lidar scan (points in 2D) in the global reference frame assuming the robot pose is
• [3, −4, 2π/3]
• [0, 3, π/2]
Plot each set of points (corresponding to the scan as viewed by the robot) in a different color. Make sure to label your axes and include a legend.

Getting to Know the Simulator and Running a Simple Control Program (20 Points)

Download the MATLAB CreateSim toolbox (as a zip) from https://github.com/autonomousmobilerobots/iRobotCreateSimulatorToolbox, download the latest version of the MATLAB Toolbox for the Create from https://github.coecis.cornell.edu/AMR/MatlabiRobotCreateRaspPi, and add both to your MATLAB path.
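The two frame transformations above are a 2D rotation plus a translation, and its inverse; the lidar conversion just places each range/bearing pair relative to the sensor pose [r, 0, 0]. A sketch of the math (Python for illustration only; the assignment functions themselves are MATLAB):

```python
import math

def robot2global(pose, p_local):
    """Local robot frame -> global frame: rotate by theta, then translate."""
    x, y, th = pose
    lx, ly = p_local
    return (x + lx * math.cos(th) - ly * math.sin(th),
            y + lx * math.sin(th) + ly * math.cos(th))

def global2robot(pose, p_global):
    """Global frame -> local robot frame: undo the translation, rotate by -theta."""
    x, y, th = pose
    dx, dy = p_global[0] - x, p_global[1] - y
    return (dx * math.cos(th) + dy * math.sin(th),
            -dx * math.sin(th) + dy * math.cos(th))

def lidar_range2xy(ranges, angles, r):
    """Ranges/bearings from a sensor mounted at [r, 0, 0] -> local robot xy."""
    return [(r + rho * math.cos(a), rho * math.sin(a))
            for rho, a in zip(ranges, angles)]
```

Since global2robot is exactly the inverse of robot2global, transforming a point one way and back must recover it; that round trip makes a good unit test for the MATLAB versions as well.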
Review the iRobot Create Simulator User Guide, pages 4–10, and the MATLAB Toolbox for the iRobot Create (available on Canvas), pages 11–12. The file turnInPlace.m is a simple control program that reads the robot’s sensor data and makes the robot turn in place. You can use this function as a template for later functions you will write.

Make sure the CreateSim toolbox is in your MATLAB path and run SimulatorGUI.m. This will open the simulator interface. Click the Load Map button and select box map from the files provided for this homework. Place the simulated robot anywhere on the map by clicking the Set Position button (click once to set the center location, click again to set the heading direction). Run the control program by clicking the Start button in the Autonomous group (bottom right side of the interface) and selecting turnInPlace.m.

1. What does the command SetFwdVelAngVelCreate (line 73 in turnInPlace.m) do? What are the inputs to this function?
2. The left and right wheel velocities are limited to ±0.5 m/s (observe what happens when you increase cmdV and cmdW on lines 66–67 in turnInPlace.m). Edit the program limitCmds.m to appropriately scale the forward and angular velocity commands so the wheel motors don’t saturate (the shape of the trajectory should be maintained, i.e. do not simply use a max function without making sure the trajectory shape remains the same). Submit this function in the autograded assignment Homework 1 limitCmds on Canvas.
3. Edit turnInPlace.m to use limitCmds.m in the appropriate place, with wheel2Center=0.13 and maxV a value that is less than 0.5. From now on, you should always call limitCmds before sending commands to the robot via SetFwdVelAngVelCreate.m.
4. Write a function backupBump.m to have the robot drive forward at constant velocity until it bumps into something. If a bump sensor is triggered, command the robot to back up 0.25 m and turn clockwise 30°, before continuing to drive forward again.
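For a differential drive, the wheel speeds are v_left = V − ω·wheel2Center and v_right = V + ω·wheel2Center, so scaling V and ω by the same factor keeps their ratio (and hence the trajectory shape) unchanged while capping the faster wheel. A sketch of that idea behind limitCmds (Python for illustration; your limitCmds.m should follow the course's MATLAB signature):

```python
def limit_cmds(fwd_v, ang_w, max_v, wheel2center):
    """Scale (V, w) uniformly so neither wheel exceeds max_v.
    Scaling both by the same factor preserves the trajectory shape."""
    v_left = fwd_v - ang_w * wheel2center
    v_right = fwd_v + ang_w * wheel2center
    fastest = max(abs(v_left), abs(v_right))
    if fastest <= max_v or fastest == 0:
        return fwd_v, ang_w           # already within limits
    scale = max_v / fastest
    return fwd_v * scale, ang_w * scale

# Example: a 1 m/s straight-line command gets halved so both wheels stay at 0.5 m/s
v, w = limit_cmds(1.0, 0.0, 0.5, 0.13)
```

This is exactly why a plain max/clamp is wrong: clipping V and ω independently changes their ratio and bends the commanded arc.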
Run the simulator and plot the robot trajectory. Note: To access the sensor data collected during simulation, type global dataStore in the MATLAB Command Window. The ground truth pose of the robot is available in dataStore.truthPose. The sensor data is collected by readStoreSensorData.m; we recommend you take a look at the function to see what sensor information is stored.

5. Write a control function, properly named, to make the robot do something new. What does the function do? Run the simulator using your new control function and plot the robot trajectory.

Feedback Linearization (30 Points)

The iRobot Create is a non-holonomic vehicle, so we can’t send arbitrary Vx and Vy commands to the robot. Instead we must send V and ω commands, as with SetFwdVelAngVelCreate.m. Feedback linearization approximately converts [Vx, Vy] → [V, ω] via a linear transformation.

1. Write a function feedbackLin.m to transform Vx and Vy commands into corresponding V and ω commands using the feedback linearization technique presented in class for a differential drive robot (in other words, do not use a turn-then-move control strategy). Assume the Vx and Vy commands are given with respect to the inertial frame. Submit this function in the autograded assignment Homework 1 feedbackLin on Canvas.
2. In the SimulatorGUI, place the robot near [0, 0] by clicking the Set Position button. For a desired local (body-fixed) control input [Vx, Vy]B = [1, 1], plot the robot trajectory for different values of ε.
3. Write a function visitWaypoints.m to calculate the desired [V, ω] controls to drive the robot along a series of waypoints (using feedbackLin.m appropriately). The waypoints are given as an n×2 matrix where each row is the (x,y) coordinate of a waypoint. Keep track of which waypoint is being driven toward using the index gotopt, and consider the waypoint reached if the robot is within closeEnough of the waypoint location.
4.
Using visitWaypoints.m as the control function with ε = r (the robot radius, 0.2 m) and closeEnough = 0.1, plot the robot trajectories for the following sets of waypoints: [-3 0; 0 -3; 3 0; 0 3] and [-1 0; 1 0].

Dead Reckoning (45 Points)

The iRobot Create encoder information can be read from the distance and angle packets (using the functions DistanceSensorRoomba and AngleSensorRoomba, respectively). These packets provide the distance the robot traveled and the angle it rotated since the sensor was last read.

1. Given a known initial configuration (x, y, θ) of the robot within a global frame, the distance traveled d and the angle the robot turned φ, compute the new configuration of the robot. Explain your calculation. Note that you cannot assume a “turn then move” scheme. (Hint: Assume that the robot’s wheels turn at a constant rate between sensor readings.)
2. Edit the function integrateOdom.m to integrate the robot odometry as calculated in part 1 (this is known as “dead reckoning”). Submit this function in the autograded assignment Homework 2 integrateOdom.m on Canvas.
3. Generate a trajectory and gather data (no need to submit the control code or the simulator map and config files):
(a) Define a map for the simulator. You may use the map from Homework 1 or define a new one.
(b) Write a control program that will drive the robot in the simulator. The robot motion should include simultaneous non-zero forward and angular velocity (meaning the robot motion should include arcs). The program should read the odometry, either by periodically calling DistanceSensorRoomba and AngleSensorRoomba and storing the data, or by periodically calling the function readStoreSensorData. Make sure the robot runs into a wall at least once. The program should be deterministic, that is, the trajectory should be the same in repeated runs.
(c) Using ConfigMakerGUI, create a config file that defines errors on the odometry.
(d) Run the simulation without a config file (no noise is added to the sensors). Save the data.
(e) From the same initial pose as the previous run, run the simulation with a config file. Save the data.
4. Plot in the same figure the true trajectory as captured by the overhead localization, the trajectory for the integrated noiseless odometry (using the function integrateOdom.m with data from the first run), and the trajectory for the integrated odometry which contains errors (using the function integrateOdom.m with data from the second run). Specify what error parameters were used.
5. Did the integrated trajectory from the first run match the overhead localization? Did you expect it to? Explain.
6. How does sensor noise affect the quality of the localization?
7. What might cause errors in odometry measurements?

Expected depth measurement (30 points)

In this section you will write a function that, given the robot pose and the map, predicts the depth measurements. Note that the realsense sensor provides a depth image that corresponds to the depth of obstacles along the sensor-fixed x-axis, as shown in Fig. 1.

Figure 1: Depth information (range vs. depth of an object in the camera plane)

1. Edit the file depthPredict.m to calculate the expected depth measurements (based on the pose) for a robot operating in a known map. These depth measurements correspond to the distance between the sensor and the closest obstacle. Assume the sensor reference frame is not rotated with respect to the robot-fixed reference frame. You may want to use the provided function, intersectPoint.m, which calculates the intersection point of two line segments. Hint: why do we need to provide the sensor-fixed frame origin? Note that the Field of View (FOV) of the depth image is centered around the sensor-fixed x-axis; the first element in the vector is the leftmost point and the last element in the vector is the rightmost point.
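Under the constant-wheel-rate hint in the dead-reckoning section above, the robot moves along a circular arc of length d subtending angle φ between sensor readings. One integration step of that motion model can be sketched as follows (Python for illustration; integrateOdom.m would express the same math in MATLAB):

```python
import math

def integrate_odom(x, y, th, d, phi):
    """One dead-reckoning step under the constant-wheel-rate assumption:
    the robot travels along a circular arc of length d subtending angle phi."""
    if abs(phi) < 1e-9:            # straight-line limit (avoid division by zero)
        return x + d * math.cos(th), y + d * math.sin(th), th
    R = d / phi                    # arc radius
    x_new = x + R * (math.sin(th + phi) - math.sin(th))
    y_new = y - R * (math.cos(th + phi) - math.cos(th))
    return x_new, y_new, th + phi
```

In the φ → 0 limit the arc degenerates to a straight line, which is why the explicit straight-line branch is needed.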
We recommend looking at the documentation of the RealSenseDist function regarding the format of the depth measurements. Submit this function in the autograded assignment Homework 2 depthPredict.m on Canvas.
2. You may continue to work with the same control program and config file from the previous section, or create new ones. Your config file should have non-zero values for the realsense noise (in the “St. Dev.” column). Make sure you are collecting and saving the following data (in dataStore):
• truthPose (measurements from overhead localization)
• rsdepth
3. In the simulator, load a map (you may use box map), load the config file and run your control function. Save the data.
4. Based on the map and dataStore.truthPose, and given that the sensor (in the simulator) is positioned 0.16 m along the x-axis of the robot-fixed frame, calculate the expected depth measurements. Plot the actual (simulated) depth measurements and the expected depth measurements on the same graph. Are they identical? Do you expect them to be? Explain.

Localizing a stationary robot (80 Points)

A point robot is standing in place in the map described in HW3map.mat, which contains the coordinates of all walls in the environment: [x1, y1, x2, y2]. The file stationary.mat contains the sensor measurements for the robot. Each line contains the range measurements in the global North, East, South and West directions. The maximum range of the sensor is less than 3 meters (a measurement of NaN indicates no obstacle within the sensing range). Assume that there is no uncertainty in the angle, and that the North and South ranges have noise distributed N(0, 0.1) while the East and West ranges have noise distributed N(0, 0.3). All range measurements are independent of each other.

Grid localization (30 points)

1. Write the function gridLocalizationStationary.m that returns the pdf representing the location of the robot.
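The heart of such a grid localizer is a discrete Bayes measurement update: each cell's prior is multiplied by the likelihood of the observed range given the range the map predicts from that cell, then the grid is renormalized. A minimal sketch of one such update (Python for illustration; handling NaN readings and combining the four independent N/E/S/W directions is left to the assignment):

```python
import math

def gauss(x, mu, sigma):
    """Gaussian likelihood of observing x when the map predicts mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def grid_update(belief, expected, measured, sigma):
    """One discrete Bayes measurement update: posterior ~ prior * likelihood.
    belief[i]   : prior probability of cell i
    expected[i] : range the map predicts from cell i
    measured    : the actual range reading
    """
    post = [b * gauss(measured, e, sigma) for b, e in zip(belief, expected)]
    total = sum(post)
    return [p / total for p in post]

# Three candidate cells whose map-predicted ranges are 1 m, 2 m and 3 m;
# a 2 m reading concentrates the belief on the middle cell.
belief = grid_update([1/3, 1/3, 1/3], [1.0, 2.0, 3.0], 2.0, 0.1)
```

Each of the four directional measurements would apply one such update in turn, since the ranges are independent.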
This function should accept as input the size of the grid (n × m), where n is the number of cells along the X dimension and m is the number of cells along the Y dimension. Explain how you addressed the measurement issue discussed in class (with respect to: where is the measurement?).
2. Assume a 10×10 grid. What is your initial distribution?
3. Plot the pdf – the initial one and the final one (after incorporating all measurements). You may find the function plotGridBelief.m useful.
4. Calculate and plot a pdf for a grid size of 40×22. Plot the final pdf. Do you get the same possible robot location(s) as in part 3? Explain.

Kalman filter (30 points)

1. Write the function KFStationary.m that estimates the position of the robot using a Kalman Filter.
2. What is your prediction step? Do you need one?
3. How is the update performed? (write the equation)
4. What are you choosing as your initial distribution? Why those values?
5. Plot the initial and final position and covariance estimates (1σ) on the map. (Hint: for plotting the ellipse, do not forget the functions posted at the beginning of the semester on Canvas.)
6. Try a different initial distribution. How does that affect the position estimate?
7. Would an EKF be more appropriate for this problem? Explain.

Test function (autograded) (10 points)

1. Edit the test function TestFunHW3.m on Canvas; look for comments that begin with ‘STUDENTS’ and follow the directions. Course staff will use this function to test your code with different inputs, and we are using the autograder as a way to ensure the test function runs. We are NOT testing for correctness of the algorithm. Make sure the zip file you submit on Canvas contains this function and all of the functions necessary to run it.

Location estimate (10 points)

1. For both filters, would you be able to estimate the location exactly given perfect range information? Explain. (5 points)
2. You are told that the robot’s y coordinate is less than 8.
Based on the various pdfs found in the previous sections, can you estimate the true position of the robot? Explain (no need to run the filters). (5 points)

Dynamics and measurement functions (40 points)

In lab 2, you will be localizing the robot using different filters and different measurements. In this section you will set up the required dynamics and measurement functions.

1. Given that the control action/inputs are the odometry information (distance traveled, angle turned), what function would you use as g(x_{t−1}, u_t)? Explain. (Hint: you already wrote the function in Homework 2.)
2. Given that the measurement is a depth measurement, what function would you use as h(x_t) to predict the measurement? Explain. (Hint: you already wrote the function in Homework 2.)
3. Given that the measurement is “GPS”-like information (given by the overhead localization/true pose, i.e. [x, y, θ]), explain and write a function hGPS.m that predicts the measurement (this should be very simple). Submit this function in the autograded assignment Homework4 hGPS.m on Canvas.
4. In order to use an EKF, one needs to calculate the Jacobians of the update and measurement functions at a given state. Write out the Jacobian G_t = ∂g/∂x_{t−1}. Write the function GjacDiffDrive.m to output G_t for a particular pose x. Submit this function in the autograded assignment Homework4 GjacDiffDrive.m on Canvas.
5. Write the file HjacDepth.m to output H_t^depth at a particular x_t. Explain how you calculate this. (Hint: differentiating the function is difficult; think about using finite differences instead.) Submit this function in the autograded assignment Homework4 HjacDepth.m on Canvas.
6. Write out the Jacobian H_t^GPS = ∂h/∂x. Write the function HjacGPS.m to output H_t^GPS at a particular x_t (again, this should be VERY simple). Submit this function in the autograded assignment Homework4 HjacGPS.m on Canvas.

Extended Kalman Filter (15 points)

In the lab, you will be using the EKF with different measurements.
To make switching between measurement types easy, in this section you will write a generic EKF that takes the measurement and update functions as inputs and, as such, can be reused. Write the file EKF.m to perform one prediction & update step of the EKF. To make this file very generic, the functions from the previous section should be inputs to this function and should be passed as anonymous functions (details at the end of the problem set). What are the inputs to the function EKF? What are the outputs? This function is evaluated in the autograded assignment Homework4 testEKF.m on Canvas, where you will have to call your filter to produce the estimated location at time t given all the information at time t − 1 (i.e. one time step).

Particle Filter (15 points)

As with the EKF, it is best to write a generic particle filter that can be reused later on. Write a function PF.m that takes in an initial particle set, odometry and depth measurements, and prediction and update functions, and outputs a new particle set. Assume there is an infinite wall at y = 1, the robot moved one meter forward (d = 1, φ = 0), and all 9 depth measurements are 2 meters. Create an initial particle set of size 30 with uniform distribution in x ∈ [−1, 1], y ∈ [−3, 1] and θ = 0. Plot the wall, the initial particle set (x,y position only for each particle), and the particle set after one iteration of the particle filter on the same plot. Use one marker shape for the initial particles and another marker shape for the final particles.

Running the filters in the simulator (115 points)

1. Write a function motionControl.m. This function should drive the robot in different directions while reacting to the bump sensor (so that the robot does not try to drive through a wall). The control should be independent of the location and deterministic.
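One generic EKF prediction-and-update step, with the dynamics g, measurement h, and their Jacobians passed in as callables, might be sketched as follows (Python/NumPy for illustration; the assignment's EKF.m would take MATLAB anonymous functions instead):

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, g, Gjac, h, Hjac, R, Q):
    """One EKF predict + update step. g, h and their Jacobians are passed
    in as callables so the same filter works for GPS or depth measurements."""
    # Predict: push the mean through g and grow the covariance
    mu_bar = g(mu, u)
    G = Gjac(mu, u)
    Sigma_bar = G @ Sigma @ G.T + R
    # Update: weigh the innovation z - h(mu_bar) by the Kalman gain
    H = Hjac(mu_bar)
    S = H @ Sigma_bar @ H.T + Q
    K = Sigma_bar @ H.T @ np.linalg.inv(S)
    mu_new = mu_bar + K @ (z - h(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_bar
    return mu_new, Sigma_new
```

With hGPS (the identity measurement) the callables are trivial, which makes a convenient sanity check before plugging in depthPredict and its finite-difference Jacobian.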
Furthermore, this function should:
(a) Call the sensor information (using the function readStoreSensorData.m provided in HW 1).
(b) Call integrateOdom.m and store the dead reckoning information in dataStore.deadReck.
(c) Have a clearly identifiable and well commented section describing how to switch between the three methods described below (EKF with GPS data, EKF with depth data, PF with depth data). You will need to be able to switch between these three methods for lab 2. See Questions 4, 5, and 6 for more details.
2. In motionControl.m, create new fields dataStore.ekfMu and dataStore.ekfSigma (robot pose mean and covariance, respectively).
3. Initialize dataStore.deadReck and dataStore.ekfMu to the true initial pose (from dataStore.truthPose) and dataStore.ekfSigma to [2, 0, 0; 0, 2, 0; 0, 0, 0.1]. Define a new variable R (process noise covariance matrix) and set it to 0.01I. Define a new variable Q (measurement noise covariance matrix) and set it to 0.001I. What are the dimensions of R and Q?
4. EKF with GPS data:
(a) Create a noisy GPS measurement by adding Gaussian noise to the true pose. What should the noise distribution be? Record this data in dataStore.GPS.
(b) In the simulator, load cornerMap and place the robot somewhere near [−4; 2; 0]. Create and load a config file with noise on the depth measurements and the odometry. What values did you choose for the noise? Make sure the values make sense with respect to R and Q.
(c) In motionControl.m, call the EKF with the noisy GPS data and the appropriate functions (dynamics and measurement).
(d) Run motionControl.m in the simulator. On the same figure, plot the map and three final trajectories (x,y): dataStore.truthPose (the data from overhead localization), dataStore.deadReck (from integrating the odometry only), and dataStore.ekfMu (the estimate from the EKF). Also plot the 1σ ellipses from dataStore.ekfSigma.
Note: you only need to plot the x,y position and not the orientation. (You may use the provided function plotCovEllipse.m or any other function that plots ellipses. If you do, make sure to cite the source.)
(e) Place the robot somewhere near [−4; 2; 0] again. Repeat 4(b–d) using a different initialization for the filter (we want to see what happens to the estimate if the initial belief is different): µ0 = [−2; −1.5; π/2] and Σ0 = [4, 0, 0; 0, 4, 0; 0, 0, 0.02]. (Initialize dataStore.deadReck to the same initial pose estimate µ0 here in (e).)
(f) Compare the results of 4(d) and 4(e) – how does the initial estimate affect the robot’s estimate?
5. EKF with depth data: In motionControl.m, call the EKF with the depth data and the appropriate functions (dynamics and measurement). Make sure to write comments in motionControl.m on how to switch between using the GPS and depth measurements.
(a) Repeat 3 and 4(b,d). Make sure the dimensions of R and Q are correct.
(b) Repeat 5(a) using R = 0.001I and Q = 0.01I.
(c) Compare the results from 5(a) and 5(b). How do the process and measurement noise covariances affect the robot’s estimate?
(d) Does the filter behave as you’d expect? Why or why not?
(e) What are some of the problems with using an EKF with depth measurements?
(f) What errors do you expect to see when you run the EKF on the physical robot in the lab?
6. PF with depth data: In motionControl.m, create a new field dataStore.particles. Generate an initial particle set of size 20, with x-location sampled from a uniform distribution between [−5, 0], y-location sampled from a uniform distribution between [−5, 5], and θ sampled from a uniform distribution between [−0.2, 0.2]. Initialize the particle weights to be uniform. Call the PF with the depth data and the appropriate functions (dynamics and measurement). Make sure to write comments in motionControl.m on how to switch between using the EKF and using the PF.
(a) In the simulator, load cornerMap and place the robot somewhere near [−4; 2; 0]. Load your noisy config file and run motionControl.m.
(b) Plot the initial particles on the map (plot the x,y positions as well as headings).
(c) Plot the final particles on the map (x,y position only).
(d) For each time step, choose the particle with the highest weight and plot it. You will get the “best” trajectory. Do this for 5 additional particles and plot the 5 trajectories on the same map. Also plot the true trajectories (from dataStore.truthPose).
(e) Repeat 6(a–d) using a particle set of size 500.
(f) Does the filter behave as you’d expect? Why or why not?
(g) How does the size of the particle set affect the robot’s estimate?
(h) How else could you represent the robot’s estimate (other than just looking at the highest weight particle)? Plot the trajectory of that estimate. Is it closer to the truth?

Comparing Filters (30 points)

Fill out the following table:

Filter                 | Assumptions about the system | This filter is appropriate when | Advantages | Disadvantages
Kalman filter          |                              |                                 |            |
Extended Kalman filter |                              |                                 |            |
Particle filter        |                              |                                 |            |

Using functions as inputs

As previously mentioned, for re-usability we can send the dynamics, measurement and Jacobian functions as parameters to our filter. There are two ways to do this, but only one is accepted by the Autograder on Canvas. Below is a simple example and a link for extra information (assume robot2global is defined elsewhere). Anonymous functions:

r2g = @(pose, p_xy) robot2global(pose, p_xy);
y = MyComplicatedFilter(r2g);

function y = MyComplicatedFilter(converter)
    % some calculations to get my_pose, my_xyR
    y = converter(my_pose, my_xyR);
end

More information: https://www.mathworks.com/help/matlab/matlab_prog/anonymous-functions.html

Introduction and Setup

In the Simulator, load loopMap and run a control program that will drive the robot into walls (you may use the function backupBump.m or write a new function).
Make sure you are collecting the following sensor data in dataStore: truthPose, bump and depth. Allow the robot to move around the environment and bump into most (if not all) walls. Access the sensor data in the MATLAB workspace by typing global dataStore, and make sure the data was collected properly. You will use this data in the two mapping sections. Remember that in the simulator, the robot radius is 0.16 and the sensor origin is [0.13 0]. As you have seen, each bump sensor of the Create returns a value of 1 if the sensor is triggered, and 0 otherwise (with no false alarms).

Initialize a 25 × 25 occupancy grid of the environment with p0(occ) = 0.5 as the prior occupancy probability for all cells (i.e. ℓ0 = 0). The boundaries of the environment in the x and y directions are [−2.5, 2.5]. You may find the MATLAB function meshgrid.m helpful.

Mapping – Occupancy Grid with Bump Sensor (45 points)

1. Plot the occupancy grid boundaries and the robot’s trajectory. Indicate with a marker (e.g. star, triangle) the locations where a bump sensor was triggered.
2. Write a function logOddsBump.m to calculate the log-odds ℓt(occ) of cells in the grid using only bump sensor measurements. Make sure you allow non-square grids (n × m, where n is the number of cells along the X dimension and m the number of cells along the Y dimension). What are the inputs? What are the outputs?
3. Explain the sensor model you used, assumptions you made in logOddsBump.m, and why they make sense.
4. Write a function plotOccupancyGrid.m that, given the log odds and the grid, plots unoccupied cells in white, occupied cells in black, and uncertain cells in shades of gray. You may use colormap(flipud(gray)), pcolor or image to help plot the grid cells. Plot the occupancy grid at three different time steps using the log odds obtained from dataStore.truthPose and dataStore.bump.
5.
If you had to choose for each cell whether it is occupied or free (for example, to use the map in a planning algorithm), what would you do? Apply this method to the grid at the last timestep and plot the result (each cell is assigned either 0 or 1).
6. Repeat (4,5) with a 50 × 50 grid. What do you observe?
7. What differences might you expect to observe between the map generated using simulated bump sensors and one using real bump sensor data? Why?

Mapping – Occupancy Grid with Depth Information (45 points)

1. Write a function logOddsDepth.m to calculate the log-odds ℓt(occ) of cells in the grid using only depth sensor measurements. What are the inputs? What are the outputs?
2. Explain the sensor model you used, assumptions you made in logOddsDepth.m, and why they make sense.
3. Plot the occupancy grid at three different timesteps using the log odds obtained from dataStore.truthPose and dataStore.rsdepth.
4. Repeat (2,3) with a 50 × 50 grid. What do you observe?
5. What are the pros and cons of using a finer grid resolution?
6. Compare the maps generated using depth with those using the bump sensor. Is one map “better” than the other? Why or why not?
7. What differences might you expect to observe between the map generated using simulated depth sensors and one using real depth sensor data? Why?
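Both logOddsBump.m and logOddsDepth.m reduce to the standard per-cell log-odds recursion ℓt = ℓt−1 + log(p/(1−p)) − ℓ0, where p comes from your inverse sensor model. A minimal sketch (Python for illustration; the p = 0.8 value below is an assumed inverse-sensor-model number for a bumped cell, not one given in the assignment):

```python
import math

def logodds_update(l_prev, p_occ_given_z, l0=0.0):
    """Standard log-odds occupancy update for one cell:
    l_t = l_{t-1} + log(p/(1-p)) - l0."""
    return l_prev + math.log(p_occ_given_z / (1 - p_occ_given_z)) - l0

def prob_from_logodds(l):
    """Recover the occupancy probability from the log-odds value."""
    return 1 - 1 / (1 + math.exp(l))

# Example: the cell at a bump location, with an assumed p(occ | bump) = 0.8
l = 0.0                      # prior l0 = 0, i.e. p0(occ) = 0.5
l = logodds_update(l, 0.8)
l = logodds_update(l, 0.8)   # a second bump at the same cell
```

Because updates are additive in log-odds space, repeated bumps at the same cell push its occupancy probability monotonically toward 1, while cells the robot drives through (observed free) accumulate negative log-odds.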
Test function (10 points)

Edit the function TestOccupancyGrid.m that takes as input:
• dataStore
• ℓ0 (scalar value for the prior occupancy probability of grid cells)
• NumCellsX (number of cells in the X axis)
• NumCellsY (number of cells in the Y axis)
• boundaryX (1×2 vector representing the interval of possible values for X, for example [-2.5 2.5])
• boundaryY (1×2 vector representing the interval of possible values for Y, for example [-2.5 2.5])
and outputs:
• lFinalBump (final occupancy grid using the bump sensor)
• lFinalDepth (final occupancy grid using the depth sensor)
and plots the final occupancy grids, one for the bump data (as in part 4 of the mapping with bump sensor section) and one for the depth information (as in part 3 of the mapping with depth information section). Plots generated should have meaningful titles and labels. Submit this function in the autograded assignment Homework 5 TestOccupancyGrid.m on Canvas. Course staff will use this function to test your code with different inputs, and we are using the autograder as a way to ensure that the test function runs. We are NOT testing for correctness of the algorithm. Make sure the zip file you submit on Canvas contains this function and all of the functions necessary to run it.
1. (35 points) Regression based methods
(a) Split the data into a training set (months 5 to 84) and a test set (months 85 to 108). Fit a least squares regression to the training data with the following predictors:
yt = β0 + β1t + β2t² + β3t³ + β4x1t + β5x2t + … + β14x11t + β15d1t + β16d2t + β17d3t + εt
where:
xit is an indicator (dummy) for month i (i = 1, 2, …, 11). Note that we only need to use 11 of the 12 monthly indicators in the regression, and we skip month 12.
dit is a difference at lag i, dit = yt−i − yt−i−1.
(b) (5 points) Comment on the significance of the predictors and the R² value. Compute the MSE and RMSE of the fit on the training data.
(c) (5 points) Compute the MSE and RMSE of the fit on the test data.
(d) (25 points) Now fit a lasso regression to shrink the full model. Experiment with different penalty parameters (the parameter alpha in the sklearn library) and compute the test RMSE for a few cases. Report the most significant predictors based on the lasso optimization.
2. (50 points) Tree based methods
(a) (10 points) Fit a regression tree using all predictors. Experiment with a few different values for the depth of the tree and report the train and test MSE and RMSE.
(b) (10 points) Fit a bagged regression tree using bagging and all predictors. Report the train and test MSE and RMSE.
(c) (15 points) Fit a random forest. Experiment with different numbers of features and report the train and test MSE and RMSE.
(d) Plot the importance of the predictors for the best random forest.
(e) (15 points) Fit a boosted tree. Experiment with different depths and learning rates and report the train and test MSE and RMSE.
(f) Plot the importance of the predictors for the best boosted tree.
3. (15 points) Interpreting the results.
(a) (10 points) Complete the results summary table below. For the full regression, you can note the statistically significant predictors at 5%.
For lasso, you can list the predictors that remain after shrinkage, and for the random forest and boosted tree you can list the predictors based on the importance measure order. Also list the specifications (maximum depth, maximum number of features, etc.) wherever they apply.
(b) (5 points) Using the summary table, please explain which three or four predictors are the most important ones to predict the sales of Audis. Which of the methods would you recommend for prediction for this data?

Table 1: Comparison Table

Method          | Train RMSE | Test RMSE | Predictors | Spec.
Full Regression |            |           |            |
Lasso           |            |           |            |
Regression Tree |            |           |            |
Bagged Tree     |            |           |            |
Random Forest   |            |           |            |
Boosted Tree    |            |           |            |
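Building the full-model design matrix (trend terms, 11 month dummies, three lagged differences) and computing RMSE can be sketched as follows (Python/NumPy for illustration on synthetic data, not the actual Audi series; in practice pandas/statsmodels/sklearn would handle the fit, and sklearn's Lasso takes its penalty as the alpha parameter):

```python
import numpy as np

def design_row(y, t, n_lags=3):
    """One row of the full-model design matrix: intercept, trend terms,
    11 month dummies (month 12 skipped), and lagged differences
    d_{i,t} = y_{t-i} - y_{t-i-1}. Extend the trend terms as the model requires."""
    month = t % 12                          # assumes t indexes months so t % 12 == 0 is month 12
    dummies = [1.0 if month == m else 0.0 for m in range(1, 12)]
    diffs = [y[t - i] - y[t - i - 1] for i in range(1, n_lags + 1)]
    return [1.0, t, t ** 2] + dummies + diffs

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Illustrative fit on a synthetic linear-trend series (not the Audi data)
y = [5.0 + 2.0 * t for t in range(120)]
train_t = range(5, 85)                      # months 5..84, as in part 1(a)
X = np.array([design_row(y, t) for t in train_t])
y_train = np.array([y[t] for t in train_t])
beta, *_ = np.linalg.lstsq(X, y_train, rcond=None)   # least-squares coefficients
train_rmse = rmse(y_train, X @ beta)
```

The test-set RMSE in parts 1(c) and 2 is computed the same way, just with rows built from the held-out months and the coefficients from the training fit.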
Part 1: Regression and Model Reduction for Sales Prediction

The data can be found under 'ToyotaSales.csv'. We start by separating the data into a training set (first 96 months) and a test set (months 97 to 168).

1. (35 points) Implementing a full model with all seasonal predictors and additional polynomial terms, and comparing it with a reduced model.

(a) (20 points) Fit a least squares regression to the training data with the following predictors:

yt = β0 + β1t + β2t^2 + β3t^3 + β4x1t + β5x2t + … + β15x12t + ϵt

where xit is an indicator (dummy) for month i (i = 1, 2, …, 12). Note that we only need to use 11 of the 12 monthly indicators in the regression. We typically skip the first or the last month of the year, but in this case it might be better for interpretation to skip one of the months in the middle of the year (May or June), since December is a special peak month and January is an off-peak month.

(b) Comment on the significance of the predictors and the R^2 value. Compute the MSE, RMSE and MAPE of the fit on the training data.

(c) Use the previous model on the test data and compute the MSE, RMSE and MAPE. Comment on the issue of overfitting.

(d) (15 points) Now fit a reduced regression model that uses only one of the trend terms and only the indicators for the months of January and December:

yt = β0 + β1t + β2x1t + β3x12t + ϵt

(e) Comment on the significance of the predictors and the R^2 value. Compute the MSE, RMSE and MAPE of the fit on the training data.

(f) Compare and discuss the test error performances of the full model and the reduced model.

Part 2: Logistic Regression and Classification for Market Directions

The data can be found under 'ToyotaSalesUp.csv'. We'll try to predict whether the market for Toyota vehicles moves up (1) or down (0) in month t (the market moves up if yt > yt−1, and moves down otherwise). We'll use the lagged differences and dummies for the months of December and January.
We start by separating the data into a training set (first 96 months) and a test set (months 97 to 168).

2. (65 points) Logistic Regression and Classification for the Toyota Vehicles Market

(a) (50 points) Using the training data, fit a logistic regression to the market direction data (up or down) in month t, where the predictors are Lag1 = yt−1 − yt−2, Lag2 = yt−2 − yt−3, Lag3 = yt−3 − yt−4, and Jan (indicator for January) and Dec (indicator for December).

(b) Comment on the significance of the predictors.

(c) Convert the probability predictions from the model to a class prediction (1 or 0) by using a probability threshold of 1/2.

(d) Find the confusion matrix for the classification rule and comment on the error performance.

(e) Apply the logistic regression model obtained on the training set to the test set. Find the confusion matrix for the classification rule and comment on the error performance on the test set. Report the AUC measure.

(f) (15 points) Repeat the above for a reduced logistic regression model that uses only the predictors Lag1, Lag2 and Dec.

(g) Compare the AUC on the test set for the reduced model with the AUC under the full model. Please note that the data is tricky, since there are some significant ups and downs in the sales that cannot be explained with our predictors.

Part 3 (Bonus Exercise): KNN Classifier for Market Directions

We use the data from Part 2.

3. (10 points) Let's check the performance of a KNN classifier.

(a) Implement a KNN classifier that uses the first 96 months as the input set and months 97 to 168 as the target set, using the predictors Lag1, Lag2 and Lag3 and taking the number of neighbours K = 5. Find the confusion matrix and comment on the classification error rates.

(b) Experiment with the number of neighbours (K = 1, 7, 11) and comment on the classification error performance.
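The fit/threshold/evaluate pipeline of exercise 2, parts (a) through (e), can be sketched as below. The series here is synthetic (the 96/72 train-test split mirrors the assignment, but the data and signal strengths are made up), so the numbers are only illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score

# synthetic stand-in for the market-direction data
rng = np.random.default_rng(1)
n = 168
lag1 = rng.normal(size=n)
lag2 = rng.normal(size=n)
dec = (np.arange(n) % 12 == 11).astype(float)   # December indicator
# hypothetical truth: "up" is more likely when lag1 is positive or in December
p = 1.0 / (1.0 + np.exp(-(1.5 * lag1 + 1.0 * dec)))
up = (rng.uniform(size=n) < p).astype(int)

X = np.column_stack([lag1, lag2, dec])
Xtr, Xte, ytr, yte = X[:96], X[96:], up[:96], up[96:]

clf = LogisticRegression().fit(Xtr, ytr)
prob = clf.predict_proba(Xte)[:, 1]
pred = (prob >= 0.5).astype(int)                # threshold at 1/2
cm = confusion_matrix(yte, pred)                # rows: true class, cols: predicted
auc = roc_auc_score(yte, prob)
```

The reduced model of part (f) is the same pipeline with fewer columns in `X`, and sklearn's `KNeighborsClassifier` drops in for the bonus exercise.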
1. (40 points) Forecasting Coffee Prices (the monthly average price index of coffee in commodity markets, as reported by the IMF commodity price data portal). This continues from the first HW.

(a) Plot the data and visually assess whether there is significant trend and seasonality.

(b) Check the ACF and PACF plots after detrending and deseasonalizing (if necessary). Check whether there is significant autocorrelation visible.

(c) Fit an ARIMA model to the whole data set based on the autocorrelation structure and the patterns (trend, seasonality, etc.). Explore the significance of the fitted coefficients and the residual diagnostics, and assess its performance by calculating the MAE, MAPE and RMSE.

(d) Experiment with a different ARIMA model for comparison. Explore the significance of the fitted coefficients and the residual diagnostics, and assess its performance by calculating the MAE, MAPE and RMSE. Compare with the previous model.

(e) Separate the data into a training set and a test set (the first 70 to 80% of the data should be the training set and the remaining data the test set). Fit the two models above on the training set and compute the forecast errors on the test set.

(f) Your report for the exercise should include a table similar to the one below.

2. (40 points) We had looked at the monthly Australian Beer Production data. This time we investigate the quarterly production data, from the first quarter of 1956 to the second quarter of 1994.

(a) Plot the observations.

(b) Fit a SARIMA(0,1,0)(0,1,0,4) model to the whole data (i.e. seasonal differencing of 4 quarters and first order differencing). Explore the significance of the fitted coefficients, the residual diagnostics and the error performance. This could be a benchmark model for other comparisons.

(c) Detrend and deseasonalize the data. Plot the Auto-Correlation Function and the Partial Auto-Correlation Function of the detrended and deseasonalized data.
(d) Based on the ACF and PACF, fit a SARIMA model to the whole data that has some AR and MA coefficients. Explore the significance of the fitted coefficients and the residual diagnostics, and assess its performance by calculating the MAE, MAPE and RMSE. Compare with the previous benchmark model.

(e) For the best model you found, find a one-quarter-ahead forecast for quarter 155 (using all data available until the end of quarter 154) and report the 95% prediction interval for your forecast.

(f) Separate the data into a training set and a test set (the first 70 to 80% of the data should be the training set and the remaining data the test set). Fit the two models above on the training set and compute the forecast errors on the test set.

(g) Your report for the exercise should include a table similar to the one below.

Table 1: Summary of Results for Exercises 1 and 2

Method               | Spec.            | RMSE (Train) | RMSE (Test)
---------------------|------------------|--------------|------------
Benchmark (from HW1) | –                |              |
Model 1              | ϕ1 =, θ1 =, etc. |              |
Model 2              |                  |              |
Model 3 (if any)     |                  |              |

The model specification includes the ARIMA specification (i.e. ARIMA(1,0,2) or SARIMA(1,0,2)(0,1,0,4)) and the fitted coefficients.

3. (20 points) Fit ordinary linear regression models to the quarterly Australian Beer Production data.

(a) Fit a linear regression model that uses seasonal dummies as predictors (note that there are four quarters, so three corresponding seasonal dummies are needed). Check the R^2 and the significance of the coefficients, and compute the MSE.

(b) Fit a linear regression model that uses seasonal dummies and a linear trend term as predictors. Check the R^2, the adjusted R^2 and the significance of the coefficients, and compute the MSE.

(c) For both of the models in parts a and b, find a one-quarter-ahead forecast for quarter 155 (using all data available until the end of quarter 154) and report the 95% prediction intervals for your forecasts.

(d) Your report for the exercise should include a table similar to the one below.

Table 2: Summary of Results for Exercise 3

Method               | Spec.            | R^2 | RMSE
---------------------|------------------|-----|-----
Benchmark (from HW1) | –                |     |
Model 1              | β0 =, β1 =, etc. |     |
Model 2              |                  |     |
Model 3 (if any)     |                  |     |
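The one-quarter-ahead forecast of exercise 3(c) can be sketched as follows. The series here is synthetic (an exact trend-plus-seasonal signal rather than the beer production data), so the regression recovers it exactly and the forecast for quarter 155 equals the true value:

```python
import numpy as np

def design(t, q):
    """Intercept, linear trend, and three quarterly dummies (Q4 is the baseline)."""
    return np.column_stack([np.ones_like(t, dtype=float), t.astype(float),
                            (q == 1).astype(float),
                            (q == 2).astype(float),
                            (q == 3).astype(float)])

# synthetic quarterly series: level 100, trend 0.5/quarter, Q1 +10, Q3 -5
t = np.arange(1, 155)              # quarters 1..154 available for fitting
q = (t - 1) % 4 + 1
y = 100 + 0.5 * t + np.where(q == 1, 10.0, 0.0) + np.where(q == 3, -5.0, 0.0)

beta, *_ = np.linalg.lstsq(design(t, q), y, rcond=None)

# one-quarter-ahead forecast for quarter 155 (which is a Q3 quarter)
t155 = np.array([155])
q155 = (t155 - 1) % 4 + 1
yhat = design(t155, q155) @ beta
```

With real data the fit is not exact, and the 95% prediction interval would be taken from the regression's residual variance (e.g. via statsmodels' `get_prediction`).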
Consider the product mix problem summarized in the table below, with 3 products (A, B and C) and 5 types of resources (1, 2, 3, 4, 5). There may be multiple workers working in parallel at each resource. Assume that all workers have 8 hours per day. The number of workers at each resource and the times required for each activity (product) at each resource are given in Table 1.

Table 1: Process times for the sub-products

Work station | Number of workers | Activity time for A (min/unit) | Activity time for B (min/unit) | Activity time for C (min/unit)
-------------|-------------------|--------------------------------|--------------------------------|-------------------------------
1            | 5                 | 5                              | 5                              | 5
2            | 4                 | 3                              | 4                              | 5
3            | 4                 | 15                             | 0                              | 0
4            | 6                 | 0                              | 4                              | 3
5            | 8                 | 10                             | 10                             | 10

1) For the given setting, answer the following questions:
a. What is the maximum flow rate if demand must be served in the mix determined by the demand rates (7 units of A, 5 units of B and 4 units of C)?
b. What is the bottleneck workstation?
c. What is the utilization of the workstations?

2) Assume that the activity times given in Table 1 are for experienced workers and that the time requirements are 50% higher for inexperienced workers. Also assume that our workforce is composed of 12 experienced and 15 inexperienced workers who can be assigned to any workstation. Formulate and solve a linear program (with your favorite mathematical solver) to find the optimal workforce assignment to workstations that maximizes the flow rate.
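Both parts can be sketched as follows. Part 1 is a direct capacity calculation; part 2 is solved here as an LP relaxation with `scipy.optimize.linprog` (worker assignments are allowed to be fractional; add integrality constraints if whole workers are required). The flow rate r is measured in "mix units" per day, where one mix unit is 7 A + 5 B + 4 C; this is one way to read "the mix determined by the demand rates":

```python
import numpy as np
from scipy.optimize import linprog

# Table 1: per-unit activity times (minutes) for products A, B, C
times = np.array([[5, 5, 5],      # station 1
                  [3, 4, 5],      # station 2
                  [15, 0, 0],     # station 3
                  [0, 4, 3],      # station 4
                  [10, 10, 10]])  # station 5
workers = np.array([5, 4, 4, 6, 8])
mix = np.array([7, 5, 4])                 # demand mix: 7 A, 5 B, 4 C
load = times @ mix                        # minutes needed per mix unit, per station

# Part 1: capacity with the given staffing (8 h = 480 min per worker)
cap = workers * 480.0
rate = cap / load                         # mix units/day each station can process
bottleneck = int(np.argmin(rate)) + 1     # stations are numbered from 1
max_flow = rate.min()
util = (max_flow * load) / cap            # utilization at the maximum flow rate

# Part 2 (LP relaxation): variables r, e1..e5, i1..i5, where e/i are
# experienced/inexperienced workers per station; an inexperienced worker
# needs 50% more time, so supplies 480/1.5 = 320 effective minutes
c = np.zeros(11); c[0] = -1.0             # linprog minimizes, so maximize r via -r
A_ub, b_ub = [], []
for s in range(5):                        # load_s * r <= 480*e_s + 320*i_s
    row = np.zeros(11)
    row[0] = load[s]; row[1 + s] = -480.0; row[6 + s] = -320.0
    A_ub.append(row); b_ub.append(0.0)
row = np.zeros(11); row[1:6] = 1.0        # at most 12 experienced workers in total
A_ub.append(row); b_ub.append(12.0)
row = np.zeros(11); row[6:11] = 1.0       # at most 15 inexperienced workers in total
A_ub.append(row); b_ub.append(15.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * 11)
best_rate = -res.fun                      # optimal flow rate in mix units/day
```

In the relaxation, capacity is fully fungible across stations, so the optimum equals total effective minutes divided by total load per mix unit: (480·12 + 320·15)/438 = 10560/438 ≈ 24.1 mix units/day.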
1. Forecasting Coffee Prices (the monthly average price index of coffee in commodity markets, as reported by the IMF commodity price data portal).

(a) Plot the data and visually assess whether there is significant trend and seasonality.

(b) To obtain a benchmark for errors, implement the naive one-month-ahead forecast ŷt = yt−1. Report the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) of the naive forecast for the years 1991 through 2020. These error measures constitute a simple benchmark for all other approaches (i.e. hopefully you will obtain lower errors with more sophisticated methods). There is plenty of data in this case and we could start forecasting from 1990, but different methods require different initializations in the first year, so it is best to compare the errors from 1991 to the end of 2020; we use 2021 for some basic testing.

(c) Use a 5-period moving average to forecast the one-month-ahead price. Report the MAE, MAPE and RMSE of the forecast for the years 1991 through 2020. Report 95 percent prediction intervals (using the RMSE estimated over the years 1991 to 2020) for the one-month-ahead forecasts for the year 2021. How do your prediction intervals perform for 2021?

(d) Use exponential smoothing to forecast the one-month-ahead price. Perform an exhaustive search for the best smoothing constant α (the one that leads to the minimum MSE). Report the MAE, MAPE and RMSE of the forecast for the years 1991 through 2020. Report 95 percent prediction intervals (using the RMSE estimated over the years 1991 through 2020) for the one-month-ahead forecasts for 2021. How do these compare with the benchmark and the MA-5 forecast?

(e) The data seems to have a trend. Implement a naive forecast that includes trend, ŷt = yt−1 + (yt−2 − yt−3), in order to forecast the one-month-ahead price. Report the MAE, MAPE and RMSE of the forecast for the years 1991 through 2020.
Report 95 percent prediction intervals (using the RMSE estimated over the years 1991 through 2020) for the one-month-ahead forecasts for 2021. How do these compare with the previous forecasts?

(f) Implement an exponentially smoothed version of the previous forecast (you can take α = 0.7 and β = 0.2):

ŷt = αyt−1 + (1 − α)ŷt−1 + βzt−1 + (1 − β)ẑt−1, where zt = yt − yt−1 and ẑt = ŷt − ŷt−1,

for one-month-ahead prices. Report the MAE, MAPE and RMSE of the forecast for the years 1991 through 2020. Report 95 percent prediction intervals (using the RMSE estimated over the years 1991 through 2020) for the one-month-ahead forecasts for 2021. How do these compare with the previous forecasts?

(g) Find the values of α and β that minimize the RMSE for six-month-ahead forecasts for the years 1991 through 2020. For the optimal values of the parameters, predict the mean coffee prices for the next six months, from January 2021 to June 2021. Note that ŷt+k = αyt−1 + (1 − α)ŷt−1 + (k + 1)(βzt−1 + (1 − β)ẑt−1). Report the MAE, MAPE and RMSE of the six-month-ahead forecast for the years 1991 through 2020.

(h) Your report for the exercise should include a table similar to the one below.

Method         | Spec. | RMSE | MAPE
---------------|-------|------|-----
Benchmark      | –     |      |
MA-5           | –     |      |
ES             | –     |      |
Trend          | –     |      |
Smoothed Trend |       |      |
6-month ahead  |       |      |

Note that the model specification for exponential smoothing is: α* = …, β* = … .

2. Generate 500 realizations from an AR(1) process Yt = c + ϕ1Yt−1 + ϵt, where c = 50, ϕ1 = 0.6 and the ϵt are normally distributed with mean zero and standard deviation 20.

(a) Plot the generated observations.

(b) Plot the Auto-Correlation Function (start from observation 100 to eliminate the effects of initialization).

(c) Implement a naive forecast on the data, ŷt = yt−1. Compute the RMSE starting from period 100.

3. Investigating the autocorrelation structure of the Coffee Price Index data.

(a) Plot the ACF and PACF of the coffee price index time series. State if there are interesting observations.

(b) Difference the data: ut = yt − yt−1.
Plot the ACF and PACF of the differences ut. State if there are interesting observations.

(c) Based on the previous autocorrelation analysis, can you propose a model for the original data series?
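Exercise 2 can be sketched as follows (a fixed seed is used; with c = 50 and ϕ1 = 0.6 the process mean is c/(1 − ϕ1) = 125, the lag-1 autocorrelation should estimate near 0.6, and the naive-forecast RMSE should come out near sqrt(2·Var(Y)·(1 − ϕ1)) ≈ 22.4):

```python
import numpy as np

rng = np.random.default_rng(42)
c, phi, sigma, n = 50.0, 0.6, 20.0, 500

# simulate the AR(1) process Y_t = c + phi * Y_{t-1} + eps_t
y = np.empty(n)
y[0] = c / (1.0 - phi)                  # start at the stationary mean
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + rng.normal(scale=sigma)

# lag-1 sample autocorrelation, computed from observation 100 onward
z = y[100:] - y[100:].mean()
acf1 = (z[:-1] * z[1:]).sum() / (z ** 2).sum()

# naive forecast yhat_t = y_{t-1}; RMSE from period 100
rmse = np.sqrt(np.mean((y[100:] - y[99:-1]) ** 2))
```

For the full ACF/PACF plots of exercises 2(b) and 3, statsmodels' `plot_acf` and `plot_pacf` can be applied to `y` (or to the differenced coffee series) in the same way.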