Computer Vision Chapter 25

3D vision (introduction)

While standard images are 2D projections of the 3D world, many applications need depth, 3D shape, or camera pose: robotics, AR, autonomous driving, metrology. This chapter ties together the pinhole model, stereo and disparity, triangulation, and point clouds, and points to calibration and SLAM topics that follow in this series.

Pinhole projection

A 3D point X = (X, Y, Z) in the camera frame projects to the image plane via focal length f and principal point (cx, cy):

x = f · X/Z + cx, y = f · Y/Z + cy

In homogeneous coordinates this becomes a 3×4 projection matrix P = K [R | t], combining intrinsics K with extrinsics R, t. Lens distortion (radial and tangential) must be corrected before accurate geometric reasoning—see the calibration chapter.
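As a concrete sketch (illustrative numbers, identity extrinsics, no distortion), the projection above can be written in a few lines of NumPy:

```python
import numpy as np

# Illustrative intrinsics: f = 800 px, principal point (320, 240)
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

# Identity extrinsics: world frame coincides with the camera frame
R = np.eye(3)
t = np.zeros((3, 1))
P = K @ np.hstack([R, t])              # 3x4 projection matrix K [R | t]

X = np.array([0.5, -0.25, 2.0, 1.0])   # homogeneous 3D point, Z = 2
x_h = P @ X                            # homogeneous pixel coordinates
u, v = x_h[:2] / x_h[2]                # perspective divide
# Matches the explicit formulas: u = f*X/Z + cx, v = f*Y/Z + cy
```

Note that the perspective divide by x_h[2] is exactly the division by Z in the explicit formulas.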

Depth from stereo

Two calibrated cameras with known relative pose view the same scene; the distance between their optical centers is the baseline B. After rectification, a 3D point appears at horizontal image positions xL and xR, and the disparity d = xL − xR relates to depth by Z = f · B / d, with f in pixels and B and Z in the same metric units. Dense stereo estimates disparity at every pixel (StereoSGBM, StereoBM in OpenCV); quality depends on texture, exposure, and calibration accuracy.
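A minimal sketch of the depth formula applied to a disparity map, with illustrative values for f and B (pixels where matching failed are commonly marked with disparity 0 and must be masked out):

```python
import numpy as np

f = 800.0   # focal length in pixels (illustrative)
B = 0.12    # baseline in metres (illustrative)

# Tiny disparity map in pixels; 0 marks a failed match
disp = np.array([[64.0, 32.0],
                 [0.0, 16.0]])

# Z = f * B / d, valid only where disparity is positive
Z = np.full_like(disp, np.nan)
valid = disp > 0
Z[valid] = f * B / disp[valid]
# e.g. d = 64 px  ->  Z = 800 * 0.12 / 64 = 1.5 m
```

The inverse relationship means depth resolution degrades quadratically with distance: a one-pixel disparity error costs far more accuracy at d = 4 than at d = 64.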

Rectification

Rectification warps both images so that epipolar lines become horizontal and row-aligned, reducing stereo matching to a 1D search along image rows.

Monocular depth

Single-image CNNs (MiDaS, DPT) predict relative depth without stereo—useful but scale-ambiguous without constraints.

OpenCV stereo (workflow sketch)

import cv2
import numpy as np

# After calibration: cameraMatrix1, dist1, cameraMatrix2, dist2, R, T
# stereoRectify → R1, R2, P1, P2, Q, roi1, roi2
# initUndistortRectifyMap + remap for left/right rectified images

# Example disparity (tune numDisparities, blockSize)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp = stereo.compute(left, right).astype(np.float32) / 16.0  # SGBM returns fixed-point disparity scaled by 16

# Q from stereoRectify — reproject to XYZ in homogeneous coords
# points_3d = cv2.reprojectImageTo3D(disp, Q)

The reprojection step requires Q from cv2.stereoRectify run with calibrated intrinsics and stereo extrinsics; the placeholder comments above mark those missing steps.

Point clouds and meshes

A point cloud is a set of (x, y, z) samples, often with color (RGB-D) or normals. Formats: PLY, PCD, LAS. Downstream: ICP for alignment, RANSAC for plane fitting, Poisson / Delaunay for meshing. Libraries: Open3D, PCL, CloudCompare.
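To make the file formats concrete, here is a minimal sketch of writing an N×3 array as an ASCII PLY point cloud with plain NumPy (a tiny hand-rolled writer for illustration; in practice Open3D or PCL handle I/O, colors, and normals):

```python
import numpy as np

def write_ascii_ply(path, points):
    """Write an Nx3 float array as a minimal ASCII PLY point cloud."""
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ])
    body = "\n".join(f"{x} {y} {z}" for x, y, z in points)
    with open(path, "w") as fh:
        fh.write(header + "\n" + body + "\n")

# Three illustrative samples, e.g. from reprojectImageTo3D
pts = np.array([[0.0, 0.0, 1.0],
                [0.1, 0.0, 1.2],
                [0.0, 0.1, 1.1]])
write_ascii_ply("cloud.ply", pts)
```

The resulting file opens directly in CloudCompare or MeshLab, which is a quick way to sanity-check stereo output.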

Two-view triangulation

Given matched points in two images and known projection matrices, cv2.triangulatePoints recovers homogeneous 3D coordinates. Reprojection error measures calibration and correspondence quality.

import cv2
import numpy as np

# P1, P2: 3x4 projection matrices; x1, x2: 2xN arrays of pixel coordinates
X_h = cv2.triangulatePoints(P1, P2, x1, x2)
X = (X_h[:3] / X_h[3]).T
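Under the hood, linear triangulation solves a small homogeneous system by SVD. A pure-NumPy DLT sketch for a single correspondence (an illustration of the idea, not OpenCV's exact implementation), checked on a synthetic two-camera setup:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    x1, x2: (u, v) pixel coordinates in each image."""
    # Each view contributes two rows: u*P[2] - P[0] and v*P[2] - P[1]
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                 # null vector of A, homogeneous 3D point
    return X_h[:3] / X_h[3]

# Synthetic check: identity intrinsics, second camera shifted along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_rec = triangulate_dlt(P1, P2, x1, x2)
```

With noise-free correspondences the recovered point matches X_true; with real detections, the residual reprojection error is the quality signal mentioned above.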

What comes next

The following tutorials in this hub cover camera calibration (intrinsics, distortion), stereo vision in depth, and SLAM for simultaneous localization and mapping—closing the loop from single images to moving sensors.

Takeaways

  • 3D reasoning starts from the pinhole model and calibrated K, R, t.
  • Stereo disparity gives metric depth given baseline and rectification.
  • Point clouds unify depth outputs for robotics and 3D analytics.

Quick FAQ

Why does stereo fail on reflective or textureless surfaces? Such surfaces break stereo matching; combine multiple views, structured light, or temporal filtering, and check calibration and exposure sync between cameras.

How does RGB-D compare to stereo? RGB-D sensors (Kinect-class) project a pattern or use time-of-flight for dense depth at short range; stereo depth range scales with baseline and works outdoors given good calibration.