Image Transformations: 20 Essential Q&A

1 What is a geometric image transformation? ⚡ easy

Answer: A mapping of pixel coordinates (translation, rotation, scaling, affine, or perspective) together with resampling of intensities onto the output grid. It changes the spatial layout but not the semantic label, provided the transform is label-preserving and the annotations are transformed consistently (e.g. bounding-box corners mapped by the same transform).

2 Define translation of an image. ⚡ easy

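Answer: Shifting every pixel by offsets (tx, ty), so x' = x + tx and y' = y + ty. It is implemented either by shifting the sampling grid or with an affine matrix whose 2×2 linear part is the identity and whose translation column is (tx, ty). Content shifted past the border is lost, and the uncovered strip needs padding (or the result is cropped).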
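
A minimal pure-Python sketch of translation by integer offsets, assuming a grayscale image stored as a list of rows and zero padding for uncovered pixels. It uses inverse mapping (each output pixel reads the source at x - tx, y - ty); the function names are illustrative, not a library API.

```python
def translation_matrix(tx, ty):
    """2x3 affine matrix: identity linear part plus a translation column."""
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty]]


def translate(img, tx, ty, fill=0):
    """Shift img by integer offsets (tx, ty); uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse map: output (x, y) comes from source (x - tx, y - ty).
            sx, sy = x - tx, y - ty
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


img = [[1, 2, 3],
       [4, 5, 6]]
print(translate(img, 1, 0))  # [[0, 1, 2], [0, 4, 5]]
```

The right-most column (3, 6) falls off the canvas and the left column is padded with 0, which is the crop/pad decision the answer mentions.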

3 What is isotropic vs anisotropic scaling? ⚡ easy

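Answer: Isotropic scaling uses the same factor on both axes (sx = sy), so shapes and angles are preserved. Anisotropic scaling (sx ≠ sy) stretches content along one axis, turning circles into ellipses and changing aspect ratios. For detection labels, box widths scale by sx and heights by sy, so anisotropic resizing changes box aspect ratios too.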
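
A small sketch of what scaling does to a detection label, assuming boxes in (x1, y1, x2, y2) form, positive scale factors, and scaling about the origin. The helper names are hypothetical.

```python
def scale_box(box, sx, sy):
    """Scale an axis-aligned box (x1, y1, x2, y2) about the origin."""
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def aspect_ratio(box):
    """Width divided by height."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / (y2 - y1)


box = (10, 20, 50, 40)                      # 40 wide, 20 tall: aspect 2.0
print(aspect_ratio(scale_box(box, 2, 2)))   # 2.0 (isotropic keeps it)
print(aspect_ratio(scale_box(box, 2, 1)))   # 4.0 (anisotropic changes it)
```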

4 How is rotation about the origin represented in 2D? 📊 medium

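Answer: The linear part is R(θ) = [[cos θ, -sin θ], [sin θ, cos θ]], i.e. x' = x cos θ - y sin θ and y' = x sin θ + y cos θ. In practice you rotate about a chosen center (usually the image center) by composing translate(-center), rotate, translate(+center). With the y axis pointing down, as in image arrays, a positive θ in this matrix appears clockwise on screen. Large rotations push corners outside the original frame, so they need a bigger canvas or cropping.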
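
The translate-rotate-translate composition can be sketched with 3×3 homogeneous matrices in pure Python (column-vector convention, angle in radians). The helper names are illustrative only.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translate3(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def rotate3(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def rotation_about(cx, cy, theta):
    # Read right to left: move center to origin, rotate, move back.
    return matmul3(translate3(cx, cy),
                   matmul3(rotate3(theta), translate3(-cx, -cy)))


def apply(m, x, y):
    """Apply a 3x3 affine matrix to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


m = rotation_about(50, 50, math.pi / 2)
# One pixel right of the center moves one pixel *down* (y grows downward).
print(tuple(round(v, 6) for v in apply(m, 51, 50)))  # (50.0, 51.0)
```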

5 What does flipping do for ML? ⚡ easy

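Answer: Horizontal flip is a common label-preserving augmentation for many object classes, but labels with handedness need care: text becomes mirrored, and left/right keypoint labels must be swapped. Vertical flip often breaks semantics (people, text, traffic scenes). Always validate flips against the dataset's semantics, and flip box coordinates along with the pixels.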
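
A sketch of a horizontal flip applied consistently to pixels and to a box label. It assumes a list-of-rows image and boxes in continuous (x1, y1, x2, y2) coordinates where pixel column i spans [i, i + 1). Under that convention, column x maps to W - 1 - x and a box edge at x maps to W - x. Names are illustrative.

```python
def hflip_image(img):
    """Mirror each row: column x moves to column W - 1 - x."""
    return [row[::-1] for row in img]


def hflip_box(box, width):
    """Mirror an (x1, y1, x2, y2) box on an image of the given width."""
    x1, y1, x2, y2 = box
    # Edges swap roles: the old right edge becomes the new left edge.
    return (width - x2, y1, width - x1, y2)


print(hflip_image([[1, 2, 3]]))            # [[3, 2, 1]]
print(hflip_box((10, 5, 30, 25), 100))     # (70, 5, 90, 25)
```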

6 Homogeneous coordinates for 2D transforms? 📊 medium

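Answer: Represent the point (x, y) as (x, y, 1); any (X, Y, W) with W ≠ 0 maps back to (X/W, Y/W). Translation is not linear in 2D, but in homogeneous form it becomes a matrix product. As a result, translations, linear maps, and full projective maps are all 3×3 matrices and compose by matrix multiplication.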
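
A minimal sketch of applying a 3×3 matrix to a point in homogeneous coordinates and dividing by the last component to get back to Cartesian form. The example expresses translation as a matrix product and shows that scaling the whole matrix leaves the mapped point unchanged. The function name is hypothetical.

```python
def apply_h(m, x, y):
    """Map (x, y) through a 3x3 matrix using homogeneous coordinates."""
    p = (x, y, 1.0)                                  # lift to (x, y, 1)
    X, Y, W = [sum(m[i][k] * p[k] for k in range(3)) for i in range(3)]
    if W == 0:
        raise ValueError("point maps to infinity")
    return (X / W, Y / W)                            # back to Cartesian


T = [[1, 0, 5],
     [0, 1, -2],
     [0, 0, 1]]                                      # translation by (5, -2)
T2 = [[2 * v for v in row] for row in T]             # same map, scaled by 2

print(apply_h(T, 1, 1))   # (6.0, -1.0)
print(apply_h(T2, 1, 1))  # (6.0, -1.0)
```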

7 What is an affine transformation? 📊 medium

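Answer: A linear transform followed by a translation, x' = A x + t, combining rotation, scale, and shear. It maps straight lines to straight lines and parallel lines to parallel lines, and it preserves ratios of lengths along a line (midpoints stay midpoints). It does not preserve lengths or angles in general unless it is constrained to a similarity or Euclidean transform.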
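
A quick numerical check that an affine map x' = A x + t keeps parallel segments parallel (their 2D cross product stays zero) and sends midpoints to midpoints. The values of A and t are arbitrary illustrative choices, not from any dataset.

```python
def affine(A, t, p):
    """x' = A x + t for a 2x2 matrix A and translation vector t."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + t[0],
            A[1][0] * p[0] + A[1][1] * p[1] + t[1])


def direction(p, q):
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """2D cross product; zero means u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


A = [[2.0, 1.0], [0.5, 3.0]]   # scale + shear, chosen arbitrarily
t = (4.0, -1.0)

# Two parallel segments, both with direction (1, 1).
p1, q1 = (0.0, 0.0), (1.0, 1.0)
p2, q2 = (3.0, 0.0), (4.0, 1.0)
d1 = direction(affine(A, t, p1), affine(A, t, q1))
d2 = direction(affine(A, t, p2), affine(A, t, q2))
print(cross(d1, d2))  # 0.0: the images are still parallel

# The midpoint of p1q1 maps to the midpoint of the mapped segment.
mid = affine(A, t, (0.5, 0.5))
a, b = affine(A, t, p1), affine(A, t, q1)
print(mid == ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))  # True
```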

8 How many degrees of freedom does a 2D affine map have? 📊 medium

Answer: Six: four in the 2×2 linear part plus two for translation. Each point correspondence gives two equations, so 3 non-collinear correspondences determine it exactly; with more matches it is fitted by least squares.
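
A sketch of the minimal exact solve. Each output coordinate is a linear function a·x + b·y + c of the input, so three non-collinear correspondences give two independent 3×3 systems, solved here with Cramer's rule. OpenCV's getAffineTransform covers the same three-point case. With more (noisy) matches you would fit by least squares instead, for example with estimateAffine2D and RANSAC. The helper names below are hypothetical.

```python
def solve3(M, b):
    """Solve a 3x3 linear system with Cramer's rule (fine for tiny systems)."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(M)
    if abs(d) < 1e-12:
        raise ValueError("degenerate: source points are collinear")
    sol = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = b[r]          # replace one column by the right-hand side
        sol.append(det(Mc) / d)
    return sol


def affine_from_3_points(src, dst):
    """Return the 2x3 matrix [[a, b, c], [d, e, f]] mapping src[i] -> dst[i]."""
    M = [[x, y, 1.0] for x, y in src]
    row_x = solve3(M, [u for u, _ in dst])
    row_y = solve3(M, [v for _, v in dst])
    return [row_x, row_y]


print(affine_from_3_points([(0, 0), (1, 0), (0, 1)],
                           [(2, 3), (4, 3), (2, 6)]))
# [[2.0, 0.0, 2.0], [0.0, 3.0, 3.0]]
```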

9 How does perspective differ from affine? 🔥 hard

Answer: Projective maps preserve collinearity but not parallelism: parallel lines in the world can converge in the image at vanishing points. An affine map is the special case whose 3×3 matrix has bottom row (0, 0, 1). Perspective maps are needed for planes viewed at an angle, document scanning, and bird's-eye views from ground-level cameras.
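
A numerical illustration using an arbitrary homography whose bottom row is not (0, 0, 1). After the map, two parallel vertical segments are no longer parallel, while three points on one line still lie on a line. The matrix values are illustrative assumptions.

```python
def apply_h(H, x, y):
    """Projective map in homogeneous coordinates, then divide by w."""
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    W = H[2][0] * x + H[2][1] * y + H[2][2]
    return (X / W, Y / W)


def direction(p, q):
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """2D cross product; zero means parallel (or collinear) directions."""
    return u[0] * v[1] - u[1] * v[0]


H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.001, 1.0]]   # nonzero perspective term: not an affine map

# Parallel vertical segments x = 0 and x = 100 (y from 0 to 100).
d1 = direction(apply_h(H, 0, 0), apply_h(H, 0, 100))
d2 = direction(apply_h(H, 100, 0), apply_h(H, 100, 100))
print(abs(cross(d1, d2)) > 1e-6)   # True: no longer parallel

# Three collinear points on y = x remain collinear.
a, b, c = (apply_h(H, t, t) for t in (0, 50, 100))
print(abs(cross(direction(a, b), direction(a, c))) < 1e-9)  # True
```

Extended far enough, both mapped segments meet at (0, 1000). That point is the image of the point at infinity in the vertical direction, since H·(0, 1, 0) = (0, 1, 0.001).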

10 What is a homography? 🔥 hard

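Answer: A 3×3 projective matrix H, defined only up to scale (8 DOF), that maps homogeneous points x to Hx. Under the pinhole model it relates two views of the same planar surface. It can be estimated with the Direct Linear Transform (DLT) from 4 point correspondences, no three of them collinear. In practice more matches are used, with coordinate normalization and RANSAC to reject outliers.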
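
A minimal DLT-style sketch for exactly four correspondences. It assumes h33 = 1, so the eight remaining entries come from an 8×8 linear system with two equations per match, obtained by clearing the denominator of u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1). Fixing h33 = 1 fails when the true h33 is 0. The usual DLT instead takes the SVD null vector of the 2n×9 system, normally after coordinate normalization and inside RANSAC; cv2.findHomography provides a robust version. Helper names here are hypothetical.

```python
def apply_h(H, x, y):
    """Map (x, y) through H and divide by the homogeneous coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: degenerate correspondences")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                         # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4_points(src, dst):
    """Solve for H (h33 fixed to 1) from exactly four (x, y) -> (u, v) matches."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


# Usage: generate matches from a known H, then recover it.
H_true = [[1.0, 0.2, 3.0], [0.1, 1.0, 2.0], [0.001, 0.002, 1.0]]
src = [(0, 0), (100, 0), (100, 100), (0, 100)]
dst = [apply_h(H_true, x, y) for x, y in src]
H = homography_from_4_points(src, dst)
print(max(abs(H[i][j] - H_true[i][j]) for i in range(3) for j in range(3)))
```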

11 Forward vs inverse warping? 📊 medium

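Answer: Forward warping pushes each source pixel to its mapped destination, which can leave holes (no source pixel lands there) and overlaps (several land on the same pixel). Inverse warping visits each destination pixel and samples the source at the inverse-mapped location with a chosen interpolator, so there are no gaps. This is how OpenCV's warpAffine and warpPerspective work.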
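
A pure-Python sketch contrasting the two directions for a 2× upscale. It assumes an affine matrix that maps source to destination coordinates, nearest-neighbour sampling with round-half-up, and -1 as a visible hole marker. The function names are illustrative; real libraries add proper interpolation and border handling.

```python
import math


def invert_affine(m):
    """Invert a 2x3 affine matrix [[a, b, c], [d, e, f]]."""
    (a, b, c), (d, e, f) = m
    det = a * e - b * d
    if det == 0:
        raise ValueError("affine map is not invertible")
    ia, ib, id_, ie = e / det, -b / det, -d / det, a / det
    return [[ia, ib, -(ia * c + ib * f)],
            [id_, ie, -(id_ * c + ie * f)]]


def nearest(v):
    """Round half up to an integer pixel index."""
    return math.floor(v + 0.5)


def forward_warp(img, m, out_h, out_w, fill=-1):
    """Push every source pixel to its rounded destination; may leave holes."""
    out = [[fill] * out_w for _ in range(out_h)]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            dx = nearest(m[0][0] * x + m[0][1] * y + m[0][2])
            dy = nearest(m[1][0] * x + m[1][1] * y + m[1][2])
            if 0 <= dx < out_w and 0 <= dy < out_h:
                out[dy][dx] = v
    return out


def inverse_warp(img, m, out_h, out_w, fill=-1):
    """Pull a value for every destination pixel from the inverse-mapped source."""
    inv = invert_affine(m)
    h, w = len(img), len(img[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sx = nearest(inv[0][0] * x + inv[0][1] * y + inv[0][2])
            sy = nearest(inv[1][0] * x + inv[1][1] * y + inv[1][2])
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


img = [[1, 2], [3, 4]]
m = [[2, 0, 0], [0, 2, 0]]           # 2x upscale, source -> destination
print(forward_warp(img, m, 3, 3))    # [[1, -1, 2], [-1, -1, -1], [3, -1, 4]]
print(inverse_warp(img, m, 3, 3))    # [[1, 2, 2], [3, 4, 4], [3, 4, 4]]
```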

12 Why does warping need interpolation? 📊 medium

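Answer: Mapped coordinates usually land between pixel centers, so values must be interpolated. Nearest-neighbor, bilinear, and bicubic interpolation use 1, 2×2, and 4×4 neighborhoods, trading speed against jagged edges and blur. When downscaling, low-pass filter first (or use area averaging) to avoid aliasing.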
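
A minimal bilinear sampler, assuming a list-of-rows grayscale image and a query point inside the image (no border handling). It blends the four surrounding pixels by their fractional distances. The function name is hypothetical.

```python
import math


def bilinear(img, x, y):
    """Sample img at real-valued (x, y) from the 2x2 surrounding pixels.

    Assumes 0 <= x <= W - 1 and 0 <= y <= H - 1.
    """
    h, w = len(img), len(img[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0                  # fractional offsets in [0, 1)
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


img = [[0, 10],
       [20, 30]]
print(bilinear(img, 0.5, 0.5))   # 15.0, where nearest-neighbor would return a corner value
```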

13 Crop vs pad after transform? ⚡ easy

Answer: Rotation/scale can push content outside the original canvas—either expand canvas with padding (constant, reflect) or crop to a fixed size. Detection boxes must be clipped or transformed consistently.
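
If you choose to expand the canvas for a rotation, its size follows from projecting the rotated width and height onto the axes. A sketch, assuming rotation about the image center and an angle in radians (function name hypothetical):

```python
import math


def rotated_canvas_size(w, h, theta):
    """Width and height of the axis-aligned canvas that exactly contains a
    w x h image rotated by theta radians (no cropping)."""
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return (w * c + h * s, w * s + h * c)


print(rotated_canvas_size(100, 100, math.radians(45)))  # both values about 141.42
```

To get an integer canvas, round up after discarding floating-point noise. Then add half the size increase to the rotation matrix's translation so the content stays centered.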

14 Augmentation: random affine on segmentation masks? 📊 medium

Answer: Apply the same spatial map to image and mask (nearest-neighbor interpolation for label masks to avoid fractional classes). For instance segmentation, warp polygons or rasterize after transform.
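
A tiny illustration of why label masks use nearest-neighbor sampling. It takes a one-dimensional slice across a boundary between class 0 and class 2 (class ids assumed for illustration). Linear blending halfway between them invents class 1, while nearest sampling always returns an existing id.

```python
labels = [0, 2]   # background (0) next to an object of class 2


def linear_sample(row, x):
    """Linear interpolation between neighbouring entries."""
    i = min(int(x), len(row) - 2)
    a = x - i
    return (1 - a) * row[i] + a * row[i + 1]


def nearest_sample(row, x):
    """Round half up to the nearest entry."""
    return row[min(int(x + 0.5), len(row) - 1)]


print(linear_sample(labels, 0.5))   # 1.0: a class that does not exist here
print(nearest_sample(labels, 0.5))  # 2: a valid class id
```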

15 What is image registration? 🔥 hard

Answer: Aligning two images of the same scene into a common coordinate frame—via feature matching + homography/affine, optical flow, or optimization. Used in medical imaging, panorama stitching, and super-resolution.

16 What is a similarity transform? 📊 medium

Answer: Rotation + uniform scale + translation (4 DOF in 2D). Preserves angles and ratios of lengths—good model when perspective effects are weak.
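
Because a 2D similarity has 4 DOF, two distinct point correspondences pin it down. Treating points as complex numbers, the map is z' = a·z + b with a = s·exp(iθ), as in this sketch. Reflections are excluded, and the helper name is hypothetical.

```python
import cmath


def similarity_from_2_points(p1, p2, q1, q2):
    """Fit z' = a*z + b (points as complex numbers) so p1 -> q1 and p2 -> q2.

    a = s * exp(i*theta) carries the uniform scale s and rotation theta;
    b is the translation. Reflections are not modeled.
    """
    if p1 == p2:
        raise ValueError("need two distinct source points")
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1
    return abs(a), cmath.phase(a), b


s, theta, b = similarity_from_2_points(0, 1, 1 + 1j, 1 + 3j)
print(s, theta, b)  # 2.0 1.5707963267948966 (1+1j)
```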

17 What is a rigid (Euclidean) transform? ⚡ easy

Answer: Rotation + translation only—preserves distances and angles (3 DOF in 2D). Models camera motion parallel to the plane or object pose without scale change.

18 How do you compose transforms? 📊 medium

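Answer: Multiply their 3×3 homogeneous matrices. With column vectors (p' = M p), the rightmost matrix is applied first, so "apply A, then B" is M = B·A; libraries that use row vectors reverse the order. Matrix products do not commute, so the order changes the result.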
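
A short sketch showing that composition order matters, assuming the column-vector convention and plain nested lists for 3×3 matrices (helper names illustrative).

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, p):
    """Apply a 3x3 affine matrix to a 2D point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


T = [[1, 0, 10], [0, 1, 0], [0, 0, 1]]   # translate by (10, 0)
S = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]    # scale by 2

# Column-vector convention: in T @ S, S acts first.
print(apply(matmul(T, S), (1, 1)))  # (12, 2): scale, then translate
print(apply(matmul(S, T), (1, 1)))  # (22, 2): translate, then scale
```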

19 OpenCV: warpAffine vs warpPerspective? ⚡ easy

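Answer: warpAffine takes a 2×3 affine matrix; warpPerspective takes a full 3×3 homography. Use warpAffine when the transform keeps parallel lines parallel, and warpPerspective when full perspective correction is needed (an affine matrix is simply a homography whose bottom row is (0, 0, 1)).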

20 Are lens distortion and homography the same? 📊 medium

Answer: No—radial/tangential distortion is nonlinear and modeled separately (Brown-Conrady) before or jointly with pinhole projection. Undistort first, then apply homography for many planar AR/document pipelines.
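
A sketch of the Brown-Conrady radial plus tangential model on normalized camera coordinates, written with the coefficient layout (k1, k2, p1, p2, k3) that OpenCV's calibration uses. The example coefficient k1 = -0.2 is an arbitrary assumption. The output shows that three collinear points stop being collinear, which no homography can produce, since homographies map lines to lines.

```python
def distort(x, y, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to
    normalized image coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd


def cross(u, v):
    """2D cross product; zero means the two directions are collinear."""
    return u[0] * v[1] - u[1] * v[0]


# Three points on the straight line y = 0.5, distorted with k1 = -0.2.
pts = [distort(x, 0.5, k1=-0.2) for x in (0.0, 0.25, 0.5)]
u = (pts[1][0] - pts[0][0], pts[1][1] - pts[0][1])
v = (pts[2][0] - pts[0][0], pts[2][1] - pts[0][1])
print(cross(u, v))  # nonzero: the straight line has bent
```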
