Computer Vision Chapter 4

Image transformations

Geometric transforms move pixel coordinates: pan, zoom, rotate, mirror, or warp with affine and perspective maps. You choose output size, interpolation, and how to fill regions with no source pixels. Below: the ideas and compact OpenCV patterns.

What is a geometric transform?

Each output pixel asks “which input coordinate should I sample?” That mapping can be a simple scale and shift, a rotation around a point, or a more general homography (perspective). The implementation samples the source image at (possibly fractional) locations using interpolation—nearest neighbor, bilinear, bicubic—trading speed against smoothness.

Affine

Parallel lines stay parallel: translation, rotation, scale, and shear. Represented by a 2×3 matrix and applied with warpAffine.

Perspective

Converging lines (e.g. road edges) are allowed. A 3×3 homography maps planes; warpPerspective for document “deskew” or bird’s-eye views.

Resize and scaling

cv2.resize changes width and height. For downscaling, INTER_AREA often reduces aliasing by averaging; for upscaling or small shifts, INTER_LINEAR is a common default. INTER_CUBIC can look smoother but costs more.

import cv2

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
half = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)  # dsize is (width, height)
double = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)

Translation and rotation

Translation shifts the image by (tx, ty) pixels. Rotation is usually done around a center (often the image center) with optional scale. OpenCV builds a 2×3 affine matrix and applies warpAffine.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]

# Translation only: [[1,0,tx],[0,1,ty]]
M_t = np.float32([[1, 0, 50], [0, 1, 30]])
shifted = cv2.warpAffine(img, M_t, (w, h), borderMode=cv2.BORDER_CONSTANT, borderValue=(0,0,0))

cx, cy = w / 2, h / 2
angle, scale = 15, 1.0
M_r = cv2.getRotationMatrix2D((cx, cy), angle, scale)
rotated = cv2.warpAffine(img, M_r, (w, h), borderMode=cv2.BORDER_REPLICATE)

BORDER_CONSTANT fills unknown areas with a color; BORDER_REPLICATE extends edge pixels—pick what fits your pipeline.

Flip and crop

cv2.flip mirrors horizontally (1), vertically (0), or both (-1)—cheap data augmentation for classification. Crop is just NumPy slicing img[y0:y1, x0:x1]; combine with resize to standard input sizes for networks.

import cv2

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
aug_hflip = cv2.flip(img, 1)      # 1 = horizontal mirror
patch = img[100:300, 50:250]      # rows y0:y1, then columns x0:x1

Perspective warp (homography)

Given four source corners and four destination corners (e.g. a tilted document → rectangle), getPerspectiveTransform + warpPerspective rectifies the plane. Quality depends on accurate corner detection.

import cv2
import numpy as np

img = cv2.imread("doc.jpg")
h, w = img.shape[:2]
src = np.float32([[100, 200], [400, 180], [420, 600], [80, 620]])  # TL, TR, BR, BL
dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])  # same corner order
M = cv2.getPerspectiveTransform(src, dst)
flat = cv2.warpPerspective(img, M, (w, h))

Practical notes

  • Bounding boxes and keypoints must be transformed with the same geometry as the image—otherwise labels no longer align.
  • Augmentation stacks (rotate + crop + flip) should respect label semantics (e.g. left/right asymmetry in traffic signs).
  • For video, temporal consistency matters: random heavy warps every frame can hurt trackers.

Takeaways

  • Use resize with INTER_AREA when shrinking.
  • getRotationMatrix2D + warpAffine covers most in-plane rigid and similarity transforms.
  • warpPerspective for plane-to-plane rectification when perspective is strong.

Quick FAQ

Why are there empty corners after rotation? The output canvas is rectangular; corners of the original image may fall outside after rotation. Those areas have no source pixels unless you enlarge the canvas or use a border mode.

When do I need a homography instead of an affine? Affine preserves parallelism; homography models perspective between two planes (e.g. camera tilt). Use homography when parallel lines in the world appear to meet in the image.