SIFT: 20 Essential Q&A

Difference-of-Gaussians, keypoint refinement, and why SIFT dominated matching for years.

~12 min read · 20 questions · Advanced

DoG · octaves · 128-D · ratio test

1 What is SIFT? ⚡ easy

Answer: Scale-Invariant Feature Transform—detects blob-like keypoints in scale-space and builds a 128-D gradient-orientation histogram descriptor; robust to scale, rotation, moderate viewpoint/lighting.

2 What is Difference of Gaussians (DoG)? 📊 medium

Answer: DoG = G(σ1)−G(σ2) approximates scale-normalized LoG—cheap way to find blob-like structures across scales.
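
As a rough illustration, the NumPy sketch below builds a DoG response by subtracting two Gaussian-blurred copies of an image. The helper names (gaussian_kernel1d, gaussian_blur, dog), the 4σ kernel truncation, and the scale ratio k = 2^(1/3) are assumptions chosen for this example, not values the answer prescribes.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Sampled, normalized 1-D Gaussian truncated at about 4 sigma."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding; output keeps the input shape."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog(img, sigma, k=2 ** (1 / 3)):
    """Difference of Gaussians: blur at k*sigma minus blur at sigma."""
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)

# Synthetic bright blob (std 4 px) centred in a 65x65 image.
yy, xx = np.mgrid[:65, :65]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 4.0 ** 2))
response = dog(blob, sigma=4.0)
peak = np.unravel_index(np.argmin(response), response.shape)
```

With this sign convention a bright blob on a dark background gives a negative response at its centre, so the strongest (most negative) value sits at the blob centre.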

3 What is an octave? 📊 medium

Answer: A group of images at one resolution, blurred at several σ levels that together span a doubling of σ; the next octave starts from the image downsampled by 2. This covers a large scale range efficiently.
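
A small sketch of how the σ levels could be laid out across octaves. The function name scale_schedule and the defaults (base σ of 1.6, 3 intervals per octave, intervals + 3 blurred images per octave) are assumptions taken from common SIFT descriptions, not from this page.

```python
def scale_schedule(num_octaves=4, intervals=3, sigma0=1.6):
    """Return (octave, level, absolute_sigma, downsample_factor) tuples."""
    k = 2 ** (1 / intervals)          # sigma ratio between adjacent levels
    rows = []
    for o in range(num_octaves):
        # intervals + 3 blurred images give intervals + 2 DoG images,
        # enough to search extrema over one full octave of scale.
        for s in range(intervals + 3):
            rows.append((o, s, sigma0 * 2 ** o * k ** s, 2 ** o))
    return rows

schedule = scale_schedule()
```

Note that level `intervals` of one octave has the same absolute σ as level 0 of the next, which is why the next octave can be seeded by downsampling that image.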

4 How are keypoints detected? 🔥 hard

Answer: 3×3×3 neighborhood search for scale-space extrema (max/min) in DoG volume—candidate keypoints.
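
The 26-neighbour comparison can be sketched as follows, assuming the DoG images of one octave are stacked into a (scales, height, width) NumPy array. The function names are hypothetical, and the brute-force loop is written for clarity rather than speed.

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly above or strictly below all 26 neighbours."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = cube[1, 1, 1]
    neighbours = np.delete(cube.ravel(), 13)   # index 13 is the centre voxel
    return bool(center > neighbours.max() or center < neighbours.min())

def find_extrema(dog):
    """Scan all interior voxels of a (scales, H, W) DoG stack."""
    S, H, W = dog.shape
    return [(s, y, x)
            for s in range(1, S - 1)
            for y in range(1, H - 1)
            for x in range(1, W - 1)
            if is_extremum(dog, s, y, x)]
```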

5 Refinement and edge rejection? 🔥 hard

Answer: Taylor expansion fit for subpixel location and scale; reject low contrast; use Hessian of DoG to reject edge-like unstable peaks (ratio of principal curvatures).
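
A minimal sketch of the two numerical steps, assuming the gradient and Hessian of the DoG have already been estimated by finite differences around the candidate. The function names and the curvature ratio r = 10 are assumptions (r = 10 is the value commonly quoted for SIFT, not stated here).

```python
import numpy as np

def refine_offset(grad, hessian):
    """Offset of the fitted extremum from the sample point: -H^{-1} g.

    grad is the 3-vector (dD/dx, dD/dy, dD/dsigma); hessian is the 3x3 matrix
    of second derivatives from the quadratic Taylor model of the DoG.
    """
    return -np.linalg.solve(hessian, grad)

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint when tr(H)^2 / det(H) < (r + 1)^2 / r for the 2x2 spatial Hessian."""
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:              # principal curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

An isotropic blob has two similar curvatures and passes; an edge has one large and one small curvature, which inflates tr²/det and fails the test.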

6 Orientation histogram? 📊 medium

Answer: Weighted gradient orientations in neighborhood; peak(s) define canonical rotation—descriptor becomes rotation invariant.
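
The histogram step might look like the sketch below. The 36 bins, the 80% peak rule, and the Gaussian window width are assumptions borrowed from typical SIFT descriptions; the function name is hypothetical, and peak interpolation is left out.

```python
import numpy as np

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Return bin-centre angles (degrees) of histogram peaks >= peak_ratio * max."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    # Gaussian window centred on the patch down-weights distant pixels.
    h, w = patch.shape
    yy, xx = np.mgrid[:h, :w]
    sigma = 0.25 * max(h, w)          # assumed window width for this sketch
    weight = np.exp(-((yy - (h - 1) / 2) ** 2 + (xx - (w - 1) / 2) ** 2)
                    / (2 * sigma ** 2))
    hist, _ = np.histogram(ang, bins=num_bins, range=(0.0, 360.0),
                           weights=mag * weight)
    # A peak is a circular local maximum that is close enough to the global max.
    left, right = np.roll(hist, 1), np.roll(hist, -1)
    peaks = (hist >= peak_ratio * hist.max()) & (hist > left) & (hist > right)
    return (np.flatnonzero(peaks) + 0.5) * (360.0 / num_bins)
```

Each returned angle would spawn its own keypoint with that canonical orientation, so a patch with two strong gradient directions yields two descriptors.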

7 How is the descriptor built? 📊 medium

Answer: 16×16 window into 4×4 cells; each cell has 8-bin orientation histogram of gradients; 4×4×8 = 128 values, normalized.
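
A deliberately simplified sketch of the 4×4×8 layout, assuming the 16×16 patch has already been resampled at the keypoint scale and rotated to its canonical orientation. It omits the Gaussian weighting and trilinear interpolation that full implementations use, and the function name is hypothetical.

```python
import numpy as np

def simple_sift_descriptor(patch):
    """128-D vector: 4x4 cells, each an 8-bin magnitude-weighted orientation histogram."""
    patch = np.asarray(patch, dtype=float)
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Hard-assign each pixel to one of 8 orientation bins.
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            desc[cy, cx] = np.bincount(bins[cell].ravel(),
                                       weights=mag[cell].ravel(), minlength=8)
    v = desc.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```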

8 Why 4×4 grid? ⚡ easy

Answer: Balances spatial layout (localization) vs distinctiveness; finer grid more sensitive to deformation.

9 Why normalize twice? 📊 medium

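Answer: L2-normalize to cancel contrast (gain) changes, clip large values so a few strong gradients cannot dominate, then renormalize. The clipping step adds robustness to non-linear illumination effects such as saturation, beyond the affine lighting changes that gradients plus normalization already handle.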
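
The normalize, clip, renormalize sequence is short enough to sketch directly. The clip value 0.2 is an assumption (it is the threshold usually quoted for SIFT), and the small eps guarding against division by zero is an addition for this example.

```python
import numpy as np

def normalize_sift(v, clip=0.2, eps=1e-12):
    """L2-normalize, clip large entries, then L2-normalize again."""
    v = np.asarray(v, dtype=float)
    v = v / (np.linalg.norm(v) + eps)   # removes overall contrast scaling
    v = np.minimum(v, clip)             # limits the influence of a few large gradients
    return v / (np.linalg.norm(v) + eps)
```

Scaling the input by any positive constant leaves the output unchanged, while one dominant bin is pulled back toward the rest of the histogram.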

10 What is RootSIFT? 📊 medium

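Answer: L1-normalize the SIFT vector, then take the element-wise square root. The result already has unit L2 norm, so a final L2 normalization is optional. Euclidean distance on RootSIFT corresponds to the Hellinger kernel on the original descriptors; often improves retrieval.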
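
A minimal RootSIFT sketch, assuming the descriptors arrive as an (N, 128) non-negative array; the function name and the eps guard are choices made for this example.

```python
import numpy as np

def root_sift(desc, eps=1e-12):
    """L1-normalize each row, then take the element-wise square root."""
    desc = np.asarray(desc, dtype=float)
    desc = desc / (np.abs(desc).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(desc)
```

Because each L1-normalized row sums to 1, the squared entries of its square root also sum to 1, so the output rows have unit L2 norm without a further normalization step.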

11 SIFT invariances? 📊 medium

Answer: Invariant to scale and in-plane rotation by construction; robust to moderate affine distortion and illumination change, but not fully viewpoint invariant under strong 3D perspective.

12 SIFT vs ORB speed? ⚡ easy

Answer: SIFT heavier (float descriptor, pyramid DoG); ORB binary + FAST—ORB much faster on embedded/CPU.

13 SIFT patents? ⚡ easy

Answer: Patent-encumbered in the US until the patent expired in 2020; until then OpenCV shipped SIFT only in the contrib modules behind a nonfree build flag. It is now widely usable.

14 Typical matching? 📊 medium

Answer: L2 or cosine on float vectors; ratio test + RANSAC for geometry.
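
The ratio test can be sketched with brute-force L2 distances in NumPy. The function name and the 0.75 threshold are assumptions for illustration (values around 0.7 to 0.8 are common); desc_b needs at least two rows, and the RANSAC geometry check would run afterwards on the surviving pairs.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbour of desc_a[i]
    and is clearly closer than the second-nearest neighbour."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    keep = d[rows, best] < ratio * d[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]
```

A query that is equally close to two candidates, as happens on repetitive texture, fails the test and is dropped rather than matched arbitrarily.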

15 Contrast threshold? ⚡ easy

Answer: Filters weak DoG extrema—reduces unstable keypoints on flat noise.

16 Why DoG approximates LoG? 📊 medium

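Answer: Not an exact identity but a finite-difference approximation. The heat equation gives ∂G/∂σ = σ∇²G, so for a fixed ratio k between adjacent scales (for example √2), G(kσ) − G(σ) ≈ (k − 1)σ²∇²G. The factor (k − 1) is the same at every scale, so DoG acts as a cheap scale-normalized LoG blob detector.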
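
A quick numerical check of the relation G(kσ) − G(σ) ≈ (k − 1)σ²∇²G, using closed-form 2-D kernels. The grid size, σ = 2, and k = 2^(1/3) are arbitrary choices for this sketch.

```python
import numpy as np

def gaussian2d(x, y, sigma):
    """Unit-mass 2-D Gaussian."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def scale_normalized_log(x, y, sigma):
    """sigma^2 times the Laplacian of the 2-D Gaussian, in closed form."""
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / sigma**2 * gaussian2d(x, y, sigma)

sigma, k = 2.0, 2 ** (1 / 3)
y, x = np.mgrid[-10:11, -10:11].astype(float)
dog = gaussian2d(x, y, k * sigma) - gaussian2d(x, y, sigma)
approx = (k - 1) * scale_normalized_log(x, y, sigma)

# Correlation near 1 means the two kernels have almost the same shape.
corr = np.corrcoef(dog.ravel(), approx.ravel())[0, 1]
```

The two kernels agree closely in shape; the amplitudes differ somewhat because k is not infinitesimally close to 1, and the match tightens as k approaches 1.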

17 Color SIFT? 🔥 hard

Answer: Compute SIFT on color channels or opponent color spaces for extra discriminability—more dimensions or fused descriptors.

18 PCA-SIFT? 🔥 hard

Answer: Project gradient patch to lower-dim PCA basis—smaller descriptor; less common now than vanilla SIFT or learned features.

19 OpenCV? ⚡ easy

Answer: cv2.SIFT_create() in the main module (since OpenCV 4.4, after patent expiry); its detectAndCompute() method returns keypoints and descriptors.

20 Limitations? 📊 medium

Answer: Computation cost, repetitive texture ambiguities, limited with strong motion blur or specular highlights—deep features may win with data.

SIFT Cheat Sheet

Detect

DoG extrema
Subpixel + reject

Describe

4×4 × 8 orient
Normalize ×2

Match

L2 + ratio
RANSAC

💡 Pro tip: DoG finds scale; orientation hist fixes rotation; 128-D is spatial pooling of gradients.
