Pinhole projection
A 3D point P = (X, Y, Z) in the camera frame projects onto the image plane via the focal length f and principal point (cx, cy):
x = f · X/Z + cx, y = f · Y/Z + cy
In homogeneous coordinates this is a 3×4 projection matrix P = K [R | t], combining the intrinsics K with the extrinsics R, t. Lens distortion (radial and tangential) must be corrected before accurate geometric reasoning; see the calibration chapter.
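The projection formula can be checked in a few lines of NumPy; the focal length and principal point below are illustrative values, not taken from any particular camera.

```python
import numpy as np

# Illustrative intrinsics: f in pixels, principal point near image center
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0, 1]])

P = np.array([0.5, -0.25, 2.0])   # 3D point in the camera frame (Z > 0)
x_h = K @ P                       # homogeneous image coordinates
u, v = x_h[:2] / x_h[2]           # perspective divide by Z
# u = 800 * 0.5 / 2.0 + 320 = 520.0
# v = 800 * -0.25 / 2.0 + 240 = 140.0
```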
Depth from stereo
Two calibrated cameras with known relative pose (baseline B) view the same scene. After rectification, a 3D point appears at horizontal positions xL and xR in the left and right images. Disparity d = xL − xR relates to depth by Z = f · B / d, with f in pixels and B in whatever unit you want Z in. Dense stereo estimates disparity at every pixel (SGBM, BM in OpenCV); quality depends on texture and calibration.
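With a rectified pair, the depth formula is a one-liner; f, B, and d below are made-up example values chosen only to show the unit bookkeeping.

```python
# Z = f * B / d: f in pixels, B in meters -> Z in meters
f = 700.0       # focal length in pixels (example value)
B = 0.12        # baseline in meters (example value)
d = 35.0        # disparity in pixels at some matched pixel
Z = f * B / d   # 700 * 0.12 / 35 = 2.4 meters
```

Note that halving the disparity doubles the depth, which is why depth precision degrades quadratically with distance.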
Rectification
Warp both images so epipolar lines align horizontally—simplifies matching to 1D search along rows.
Monocular depth
Single-image CNNs (MiDaS, DPT) predict relative depth without stereo—useful but scale-ambiguous without constraints.
OpenCV stereo (workflow sketch)
import cv2
import numpy as np
# After calibration: cameraMatrix1, dist1, cameraMatrix2, dist2, R, T
# stereoRectify → R1, R2, P1, P2, Q, roi1, roi2
# initUndistortRectifyMap + remap for left/right rectified images
# Example disparity (tune numDisparities, blockSize)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp = stereo.compute(left, right).astype(np.float32) / 16.0
# Q from stereoRectify — reproject to XYZ in homogeneous coords
# points_3d = cv2.reprojectImageTo3D(disp, Q)
You must obtain Q from a proper stereoRectify with calibrated intrinsics and stereo extrinsics—placeholder comments mark the missing steps.
Point clouds and meshes
A point cloud is a set of (x, y, z) samples, often with color (RGB-D) or normals. Formats: PLY, PCD, LAS. Downstream: ICP for alignment, RANSAC for plane fitting, Poisson / Delaunay for meshing. Libraries: Open3D, PCL, CloudCompare.
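To show how simple these formats are at the base level, the writer below emits a minimal ASCII PLY file in pure Python; it is a sketch for illustration, not a replacement for Open3D or PCL I/O.

```python
import numpy as np

def write_ascii_ply(path, points):
    """Write an Nx3 float array as a minimal ASCII PLY point cloud."""
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ])
    body = "\n".join(" ".join(f"{c:.6f}" for c in p) for p in points)
    with open(path, "w") as fh:
        fh.write(header + "\n" + body + "\n")

pts = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.2], [0.0, 0.1, 1.1]])
write_ascii_ply("cloud.ply", pts)
```

Real clouds also carry color, normals, or binary encoding, which is where library I/O earns its keep.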
Two-view triangulation
Given matched points in two images and known projection matrices, cv2.triangulatePoints recovers homogeneous 3D coordinates. Reprojection error measures calibration and correspondence quality.
import cv2
import numpy as np
# P1, P2: 3x4 projection matrices; x1, x2: 2xN pixel coordinates (not homogeneous)
X_h = cv2.triangulatePoints(P1, P2, x1, x2)
X = (X_h[:3] / X_h[3]).T
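The reprojection-error check mentioned above is a short NumPy computation; the intrinsics and points below are synthetic, constructed so the projections are exact and the error is zero.

```python
import numpy as np

def reprojection_error(P, X, x):
    """RMS distance between observed pixels x (2xN) and the
    projections of 3D points X (3xN) under the 3x4 matrix P."""
    X_h = np.vstack([X, np.ones((1, X.shape[1]))])  # homogeneous 4xN
    proj = P @ X_h
    proj = proj[:2] / proj[2]                       # perspective divide
    return np.sqrt(np.mean(np.sum((proj - x) ** 2, axis=0)))

# Synthetic check: pixels generated by P itself reproject with zero error
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])    # camera at the origin
X = np.array([[0.5, -0.3], [0.2, 0.1], [2.0, 3.0]]) # 3xN points, Z > 0
x = P @ np.vstack([X, np.ones((1, 2))])
x = x[:2] / x[2]
err = reprojection_error(P, X, x)                   # ~0 for exact data
```

With real matches, an error of a fraction of a pixel indicates good calibration; several pixels usually means bad correspondences or stale calibration.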
What comes next
The following tutorials in this hub cover camera calibration (intrinsics, distortion), stereo vision in depth, and SLAM for simultaneous localization and mapping—closing the loop from single images to moving sensors.
Takeaways
- 3D reasoning starts from the pinhole model and calibrated K, R, t.
- Stereo disparity gives metric depth given baseline and rectification.
- Point clouds unify depth outputs for robotics and 3D analytics.