What is a geometric transform?
Each output pixel asks “which input coordinate should I sample?” Working backwards from the output like this guarantees that every output pixel receives a value. That mapping can be a simple scale and shift, a rotation around a point, or a more general homography (perspective). Because the source location is usually fractional, the implementation samples the source image with interpolation (nearest neighbor, bilinear, or bicubic), trading speed against smoothness.
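The inverse-mapping idea fits in a few lines of NumPy. The sketch below is a hypothetical illustration, not how OpenCV is implemented internally: `warp_inverse` and `bilinear_sample` are made-up helper names, the image is assumed to be a 2-D grayscale array, and out-of-range coordinates are simply clamped to the border.

```python
import numpy as np

# Hypothetical helpers for illustration only (not OpenCV functions).
def bilinear_sample(img, x, y):
    """Sample a 2-D grayscale image at fractional (x, y) coordinate arrays.

    Coordinates outside the image are clamped to the nearest edge,
    which behaves like a replicated border.
    """
    h, w = img.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0                  # fractional offsets in [0, 1)
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom


def warp_inverse(img, inv_map, out_shape):
    """Build the output by asking, for every output pixel, where to sample the input."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out].astype(float)
    src_x, src_y = inv_map(xs, ys)
    return bilinear_sample(img.astype(float), src_x, src_y)


# A 3x4 horizontal ramp; sampling half a pixel to the right blends neighbours.
img = np.array([[0, 10, 20, 30]] * 3, dtype=float)
shifted = warp_inverse(img, lambda x, y: (x + 0.5, y), img.shape)
print(shifted[0])  # values 5, 15, 25, 30 (the last one is clamped at the edge)
```

Replacing the bilinear blend with rounding to the nearest integer index would give nearest-neighbor sampling: faster, but blockier.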
Affine
Parallel lines stay parallel. Covers translation, rotation, scaling, and shear. Represented by a 2×3 matrix and applied with warpAffine.
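To see why parallel lines survive, the sketch below applies a 2×3 matrix to points in NumPy by appending a 1 to each (x, y). The matrix values and the `apply_affine` helper are hypothetical, chosen only to mix scaling, shear, and translation.

```python
import numpy as np

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix M to an (N, 2) array of (x, y) points (hypothetical helper)."""
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    return np.hstack([pts, ones]) @ M.T      # each row [x, y, 1] -> [x', y']

# Hypothetical matrix: x' = 2x + 0.5y + 5 (scale + shear + shift), y' = y - 3.
M = np.array([[2.0, 0.5, 5.0],
              [0.0, 1.0, -3.0]])

# Two parallel segments, both with direction (1, 1).
a = apply_affine(M, [[0, 0], [1, 1]])
b = apply_affine(M, [[0, 3], [1, 4]])
da, db = a[1] - a[0], b[1] - b[0]

# A zero 2-D cross product means the mapped directions are still parallel.
cross = da[0] * db[1] - da[1] * db[0]
print(cross)  # 0.0
```

The same holds for any 2×3 matrix: the translation column cancels when you subtract two mapped points, so every direction vector is multiplied by the same 2×2 part.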
Perspective
Parallel lines may converge (e.g. road edges receding toward the horizon). A 3×3 homography maps one plane onto another; apply it with warpPerspective for document “deskew” or bird’s-eye views.
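What lets a homography bend parallel lines is the divide by the third homogeneous coordinate. The NumPy sketch below uses a hypothetical matrix whose bottom row makes that divisor grow with y, and the `apply_homography` helper is likewise made up for illustration.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) points through a 3x3 homography, including the divide by w."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T   # rows are [x', y', w]
    return homog[:, :2] / homog[:, 2:3]                      # perspective divide

# Hypothetical homography: the bottom row makes w = 1 + 0.01 * y,
# so points lower in the image are pulled toward the origin.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.01, 1.0]])

# Two parallel vertical lines, x = 0 and x = 100, from y = 0 to y = 100.
left = apply_homography(H, [[0, 0], [0, 100]])
right = apply_homography(H, [[100, 0], [100, 100]])
gap_top = right[0, 0] - left[0, 0]       # 100.0
gap_bottom = right[1, 0] - left[1, 0]    # 50.0: the lines now converge
```

With a bottom row of [0, 0, 1], w is always 1 and the homography reduces to an affine transform, which is why affine maps keep parallel lines parallel.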
Resize and scaling
cv2.resize changes width and height. For downscaling, INTER_AREA often reduces aliasing by averaging the source pixels that fall into each output pixel; for upscaling or modest scale changes, INTER_LINEAR (the default in cv2.resize) is a common choice. INTER_CUBIC can look smoother but costs more.
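import cv2
img = cv2.imread("photo.jpg")  # returns None if the file cannot be read
assert img is not None, "could not read photo.jpg"
h, w = img.shape[:2]
# dsize is (width, height), not (height, width)
half = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
# Pass dsize=None and use the fx/fy scale factors instead
double = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)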
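Why does averaging help when shrinking? The NumPy-only sketch below downsizes a one-pixel checkerboard by a factor of 2 in two ways: keeping every second pixel (roughly what nearest-neighbor sampling does for an exact factor of 2) and averaging each 2×2 block (the idea behind area interpolation). It illustrates the principle and is not a reproduction of OpenCV's exact INTER_NEAREST or INTER_AREA code.

```python
import numpy as np

# 8x8 checkerboard of single pixels: 0, 255, 0, 255, ...
board = ((np.indices((8, 8)).sum(axis=0) % 2) * 255).astype(float)

# Subsampling: keep every second row and column.
nearest = board[::2, ::2]

# Box averaging: split into 2x2 blocks and take each block's mean.
h, w = board.shape
area = board.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(np.unique(nearest))  # only 0.0: the pattern collapses to solid black
print(np.unique(area))     # only 127.5: average brightness is preserved
```

Subsampling turns the fine pattern into a flat black image (aliasing), while averaging produces the mid-gray you would perceive from a distance.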
Translation and rotation
Translation shifts the image by (tx, ty) pixels. Rotation is usually done around a center point (often the image center), optionally combined with a uniform scale. In both cases you build a 2×3 affine matrix (by hand, or with getRotationMatrix2D) and apply it with warpAffine.
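import cv2
import numpy as np
img = cv2.imread("photo.jpg")
assert img is not None, "could not read photo.jpg"
h, w = img.shape[:2]
# Translation only: [[1, 0, tx], [0, 1, ty]] moves content right by tx and down by ty
M_t = np.float32([[1, 0, 50], [0, 1, 30]])
shifted = cv2.warpAffine(img, M_t, (w, h), borderMode=cv2.BORDER_CONSTANT, borderValue=(0, 0, 0))
# Rotate about the image center; positive angles are counter-clockwise (origin at top-left)
cx, cy = w / 2, h / 2
angle, scale = 15, 1.0
M_r = cv2.getRotationMatrix2D((cx, cy), angle, scale)
# dsize=(w, h) keeps the original canvas, so rotated corners are clipped
rotated = cv2.warpAffine(img, M_r, (w, h), borderMode=cv2.BORDER_REPLICATE)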
BORDER_CONSTANT fills areas that map outside the source with a fixed color (borderValue, black by default); BORDER_REPLICATE repeats the nearest edge pixels. Pick whichever fits your pipeline.
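The OpenCV documentation gives the matrix returned by getRotationMatrix2D in closed form: with α = scale·cos(angle) and β = scale·sin(angle), it is [[α, β, (1−α)·cx − β·cy], [−β, α, β·cx + (1−α)·cy]]. The NumPy sketch below rebuilds it under a hypothetical name, `rotation_matrix_2d`, to make the geometry visible: the center maps to itself, and because the y-axis points down, a positive angle turns content counter-clockwise on screen.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """Rebuild the documented getRotationMatrix2D matrix (hypothetical helper)."""
    cx, cy = center
    a = scale * np.cos(np.deg2rad(angle_deg))   # alpha
    b = scale * np.sin(np.deg2rad(angle_deg))   # beta
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

M = rotation_matrix_2d((50, 40), 90, 1.0)
center = M @ np.array([50, 40, 1.0])   # the center stays at (50, 40)
corner = M @ np.array([60, 40, 1.0])   # 10 px right of center moves to 10 px above: (50, 30)
```

If OpenCV is installed, comparing this result with `cv2.getRotationMatrix2D((50, 40), 90, 1.0)` via `np.allclose` should serve as a quick sanity check.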
Flip and crop
cv2.flip mirrors horizontally (flip code 1), vertically (0), or both (-1), which makes cheap data augmentation for classification. Cropping is plain NumPy slicing, img[y0:y1, x0:x1] (rows first, then columns); combine it with resize to reach the fixed input size a network expects.
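import cv2
img = cv2.imread("photo.jpg")
assert img is not None, "could not read photo.jpg"
aug_hflip = cv2.flip(img, 1)  # 1: horizontal, 0: vertical, -1: both
# Rows (y) first, then columns (x); basic slicing returns a view, so use .copy() before editing it
patch = img[100:300, 50:250]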
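Flips and crops move pixels, so any keypoints attached to the image have to move too. The NumPy sketch below assumes keypoints are (x, y) pixel indices with pixel centers at integer coordinates; `hflip_points` and `crop_points` are hypothetical helpers. Under that convention a horizontal flip maps x to width - 1 - x, and a crop starting at (x0, y0) subtracts that offset.

```python
import numpy as np

# Hypothetical helpers; keypoints are an (N, 2) array of (x, y) pixel indices.
def hflip_points(pts, width):
    """Mirror keypoints to match a horizontal flip (cv2.flip code 1)."""
    pts = np.array(pts, dtype=float)          # copy so the input is untouched
    pts[:, 0] = (width - 1) - pts[:, 0]
    return pts


def crop_points(pts, x0, y0):
    """Re-express keypoints in the frame of a crop img[y0:y1, x0:x1]."""
    return np.array(pts, dtype=float) - [x0, y0]


img = np.arange(12).reshape(3, 4)   # tiny stand-in image: 3 rows, width 4
flipped = img[:, ::-1]              # same pixel layout as cv2.flip(img, 1)
kp = hflip_points([[0, 1]], width=img.shape[1])   # x = 0 becomes x = 3
same_pixel = flipped[1, int(kp[0, 0])] == img[1, 0]   # keypoint still on the same pixel

# The crop img[100:300, 50:250] shifts keypoints by (-50, -100).
cropped_kp = crop_points([[60, 120]], x0=50, y0=100)   # (10, 20)
```

If boxes are stored as continuous edge coordinates rather than pixel indices, the flip becomes width - x instead, and the left and right edges swap roles.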
Perspective warp (homography)
Given four source corners and four destination corners (e.g. a tilted document → rectangle), getPerspectiveTransform computes the 3×3 homography and warpPerspective applies it to rectify the plane. Quality depends on accurate corner detection.
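import cv2
import numpy as np
img = cv2.imread("doc.jpg")
assert img is not None, "could not read doc.jpg"
h, w = img.shape[:2]
# Example corners of the tilted page, ordered top-left, top-right, bottom-right, bottom-left;
# in practice these come from corner detection
src = np.float32([[100, 200], [400, 180], [420, 600], [80, 620]])
# Matching corners of the output rectangle, in the same order
dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
M = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography
flat = cv2.warpPerspective(img, M, (w, h))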
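getPerspectiveTransform needs exactly four correspondences because a homography has eight unknowns once its bottom-right entry is fixed to 1, and each point pair supplies two equations. The NumPy sketch below sets up and solves that 8×8 linear system directly; `homography_from_4_points` is a hypothetical name, and the 500×700 output size is an assumed stand-in for the image's `w` and `h`. On an OpenCV install its result should agree closely with `cv2.getPerspectiveTransform(src, dst)`.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for a 3x3 H with H[2, 2] = 1 that maps each src point onto dst.

    For a pair (x, y) -> (u, v), multiplying out
        u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1)
        v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
    gives two equations that are linear in h0..h7.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1.0).reshape(3, 3)


w, h = 500, 700   # assumed output size
src = np.float32([[100, 200], [400, 180], [420, 600], [80, 620]])
dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
H = homography_from_4_points(src, dst)

# Check: push the source corners through H, including the perspective divide.
p = np.hstack([src, np.ones((4, 1))]) @ H.T
mapped = p[:, :2] / p[:, 2:3]
print(np.allclose(mapped, dst, atol=1e-6))   # True
```

The system is solvable as long as no three of the four points are collinear; a degenerate corner detection makes `np.linalg.solve` raise an error.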
Practical notes
- Bounding boxes and keypoints must be transformed with the same geometry as the image—otherwise labels no longer align.
- Augmentation stacks (rotate + crop + flip) should respect label semantics: a horizontal flip turns a left-turn traffic sign into a right-turn sign, so either swap the label or skip the flip for such classes.
- For video, temporal consistency matters: random heavy warps every frame can hurt trackers.
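For axis-aligned bounding boxes, one common approach is to transform all four corners with the same matrix used on the image and take the enclosing box. The sketch below assumes boxes stored as (x0, y0, x1, y1) and a 2×3 affine matrix; `transform_box` is a hypothetical helper. Under rotation the resulting box is looser than the object, a known limitation of this approach.

```python
import numpy as np

def transform_box(M, box):
    """Map an axis-aligned box (x0, y0, x1, y1) through a 2x3 affine matrix M.

    All four corners are transformed, and the result is the smallest
    axis-aligned box containing them.
    """
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0, 1], [x1, y0, 1], [x1, y1, 1], [x0, y1, 1]], dtype=float)
    mapped = corners @ np.asarray(M, dtype=float).T        # (4, 2) array of (x, y)
    (nx0, ny0), (nx1, ny1) = mapped.min(axis=0), mapped.max(axis=0)
    return tuple(float(v) for v in (nx0, ny0, nx1, ny1))


# Same translation as M_t in the translation example: 50 px right, 30 px down.
M_t = np.array([[1, 0, 50], [0, 1, 30]], dtype=float)
moved = transform_box(M_t, (10, 20, 110, 70))        # (60, 50, 160, 100)

# A 90-degree rotation about the origin: (x, y) -> (-y, x).
M_rot = np.array([[0, -1, 0], [1, 0, 0]], dtype=float)
turned = transform_box(M_rot, (10, 20, 110, 70))     # (-70, 10, -20, 110)
```

As the rotated example shows, transformed boxes can land partly or entirely outside the frame, so they usually need clipping to the image bounds (and dropping if nothing is left).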
Takeaways
- Use resize with INTER_AREA when shrinking.
- getRotationMatrix2D + warpAffine covers most in-plane 2D rigid and similarity transforms.
- Use warpPerspective for plane-to-plane rectification when perspective is strong.