Computer Vision Chapter 25

3D vision (introduction)

While standard images are 2D projections of the 3D world, many applications need depth, 3D shape, or camera pose: robotics, AR, autonomous driving, metrology. This chapter ties together the pinhole model, stereo and disparity, triangulation, and point clouds, and points to calibration and SLAM topics that follow in this series.

Pinhole projection

A 3D point X = (X, Y, Z) in the camera frame projects to the image plane via focal length f and principal point (cx, cy):

x = f · X/Z + cx, y = f · Y/Z + cy

In homogeneous coordinates this becomes a 3×4 projection matrix P = K [R | t], combining intrinsics K with extrinsics R, t. Lens distortion (radial and tangential) must be corrected before accurate geometric reasoning—see the calibration chapter.
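As a concrete sketch (illustrative numbers, identity extrinsics, no distortion), the projection above can be written in a few lines of NumPy:

```python
import numpy as np

# Illustrative intrinsics: f = 800 px, principal point (320, 240)
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

# Identity extrinsics: world frame coincides with the camera frame
R = np.eye(3)
t = np.zeros((3, 1))
P = K @ np.hstack([R, t])              # 3x4 projection matrix K [R | t]

X = np.array([0.5, -0.25, 2.0, 1.0])   # homogeneous 3D point, Z = 2
x_h = P @ X                            # homogeneous pixel coordinates
u, v = x_h[:2] / x_h[2]                # perspective divide
# Matches the explicit formulas: u = f*X/Z + cx, v = f*Y/Z + cy
```

Note that the perspective divide by x_h[2] is exactly the division by Z in the explicit formulas.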

Depth from stereo

Two calibrated cameras with known relative pose view the same scene; the distance between their optical centers is the baseline B. After rectification, a 3D point appears at horizontal image positions xL and xR, and the disparity d = xL − xR relates to depth by Z = f · B / d, with f in pixels and B and Z in the same metric units. Dense stereo estimates disparity at every pixel (StereoSGBM, StereoBM in OpenCV); quality depends on texture, exposure, and calibration accuracy.
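A minimal sketch of the depth formula applied to a disparity map, with illustrative values for f and B (pixels where matching failed are commonly marked with disparity 0 and must be masked out):

```python
import numpy as np

f = 800.0   # focal length in pixels (illustrative)
B = 0.12    # baseline in metres (illustrative)

# Tiny disparity map in pixels; 0 marks a failed match
disp = np.array([[64.0, 32.0],
                 [0.0, 16.0]])

# Z = f * B / d, valid only where disparity is positive
Z = np.full_like(disp, np.nan)
valid = disp > 0
Z[valid] = f * B / disp[valid]
# e.g. d = 64 px  ->  Z = 800 * 0.12 / 64 = 1.5 m
```

The inverse relationship means depth resolution degrades quadratically with distance: a one-pixel disparity error costs far more accuracy at d = 4 than at d = 64.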

Rectification

Rectification warps both images so that epipolar lines become horizontal and row-aligned, reducing stereo matching to a 1D search along image rows.

Monocular depth

Single-image CNNs (MiDaS, DPT) predict relative depth without stereo—useful but scale-ambiguous without constraints.

OpenCV stereo (workflow sketch)

import cv2
import numpy as np

# After calibration: cameraMatrix1, dist1, cameraMatrix2, dist2, R, T
# stereoRectify → R1, R2, P1, P2, Q, roi1, roi2
# initUndistortRectifyMap + remap for left/right rectified images

# Example disparity (tune numDisparities, blockSize)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp = stereo.compute(left, right).astype(np.float32) / 16.0  # SGBM returns fixed-point disparity scaled by 16

# Q from stereoRectify — reproject to XYZ in homogeneous coords
# points_3d = cv2.reprojectImageTo3D(disp, Q)

The reprojection step requires Q from cv2.stereoRectify run with calibrated intrinsics and stereo extrinsics; the placeholder comments above mark those missing steps.

Point clouds and meshes

A point cloud is a set of (x, y, z) samples, often with color (RGB-D) or normals. Formats: PLY, PCD, LAS. Downstream: ICP for alignment, RANSAC for plane fitting, Poisson / Delaunay for meshing. Libraries: Open3D, PCL, CloudCompare.
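To make the file formats concrete, here is a minimal sketch of writing an N×3 array as an ASCII PLY point cloud with plain NumPy (a tiny hand-rolled writer for illustration; in practice Open3D or PCL handle I/O, colors, and normals):

```python
import numpy as np

def write_ascii_ply(path, points):
    """Write an Nx3 float array as a minimal ASCII PLY point cloud."""
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ])
    body = "\n".join(f"{x} {y} {z}" for x, y, z in points)
    with open(path, "w") as fh:
        fh.write(header + "\n" + body + "\n")

# Three illustrative samples, e.g. from reprojectImageTo3D
pts = np.array([[0.0, 0.0, 1.0],
                [0.1, 0.0, 1.2],
                [0.0, 0.1, 1.1]])
write_ascii_ply("cloud.ply", pts)
```

The resulting file opens directly in CloudCompare or MeshLab, which is a quick way to sanity-check stereo output.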

Two-view triangulation

Given matched points in two images and known projection matrices, cv2.triangulatePoints recovers homogeneous 3D coordinates. Reprojection error measures calibration and correspondence quality.

import cv2
import numpy as np

# P1, P2: 3x4 projection matrices; x1, x2: 2xN arrays of pixel coordinates
X_h = cv2.triangulatePoints(P1, P2, x1, x2)
X = (X_h[:3] / X_h[3]).T
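Under the hood, linear triangulation solves a small homogeneous system by SVD. A pure-NumPy DLT sketch for a single correspondence (an illustration of the idea, not OpenCV's exact implementation), checked on a synthetic two-camera setup:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    x1, x2: (u, v) pixel coordinates in each image."""
    # Each view contributes two rows: u*P[2] - P[0] and v*P[2] - P[1]
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                 # null vector of A, homogeneous 3D point
    return X_h[:3] / X_h[3]

# Synthetic check: identity intrinsics, second camera shifted along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_rec = triangulate_dlt(P1, P2, x1, x2)
```

With noise-free correspondences the recovered point matches X_true; with real detections, the residual reprojection error is the quality signal mentioned above.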

What comes next

The following tutorials in this hub cover camera calibration (intrinsics, distortion), stereo vision in depth, and SLAM for simultaneous localization and mapping—closing the loop from single images to moving sensors.

Takeaways

  • 3D reasoning starts from the pinhole model and calibrated K, R, t.
  • Stereo disparity gives metric depth given baseline and rectification.
  • Point clouds unify depth outputs for robotics and 3D analytics.

Quick FAQ

Why does stereo fail on reflective or textureless surfaces? Such surfaces break stereo matching; combine multiple views, structured light, or temporal filtering, and check calibration and exposure sync between cameras.

How does RGB-D compare to stereo? RGB-D sensors (Kinect-class) project a pattern or use time-of-flight for dense depth at short range; stereo depth range scales with baseline and works outdoors given good calibration.