Computer Vision Chapter 12

SIFT

Scale-Invariant Feature Transform (SIFT) finds keypoints across a pyramid of blurred images (difference-of-Gaussian extrema), assigns each a scale and orientation, and encodes each patch as a 128-dimensional floating-point descriptor. It is robust under moderate scale, rotation, and illumination change, which makes it a good fit for wide-baseline matching and structure-from-motion. Current opencv-python releases ship SIFT in the main module; if cv2.SIFT_create raises AttributeError, upgrade.

Pipeline in brief

  1. Build a scale space with Gaussian blur at multiple scales per octave.
  2. Take Difference of Gaussians (DoG); find 3D extrema (x, y, scale).
  3. Refine location, discard low-contrast and edge-like points.
  4. Assign dominant orientation from gradient histograms.
  5. Sample a canonical 16×16 neighborhood into a 4×4 grid of 8-bin orientation histograms → 128 floats per keypoint.

Descriptor distance

Use Euclidean (L2) or L1; BFMatcher with NORM_L2 is the usual baseline.
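Concretely, the distance BFMatcher computes is just the vector norm between two 128-float descriptors (toy random vectors stand in for real SIFT output here):

```python
import numpy as np

# Two stand-in "descriptors"; real SIFT output is one 128-float row per keypoint.
d1 = np.random.default_rng(1).random(128).astype(np.float32)
d2 = np.random.default_rng(2).random(128).astype(np.float32)

l2 = float(np.linalg.norm(d1 - d2))   # what NORM_L2 computes
l1 = float(np.abs(d1 - d2).sum())     # what NORM_L1 computes
print(l2, l1)
```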

When to prefer SIFT

Texture-rich scenes, moderate viewpoint change, when ORB struggles with repeatability.

SIFT_create and detectAndCompute

import cv2

gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)
if gray is None:  # imread fails silently, returning None
    raise FileNotFoundError("building.jpg did not load")

sift = cv2.SIFT_create(nfeatures=500, nOctaveLayers=3, contrastThreshold=0.04,
                       edgeThreshold=10, sigma=1.6)
kp, des = sift.detectAndCompute(gray, None)  # des is None when nothing is found

print(len(kp), None if des is None else des.shape)

Raising contrastThreshold filters out more low-contrast (weak) keypoints; raising edgeThreshold retains more points along elongated, edge-like structures.

Detect only, then compute

kp = sift.detect(gray, None)
kp, des = sift.compute(gray, kp)

Brute-force L2 matching + ratio test

import cv2

im1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
k1, d1 = sift.detectAndCompute(im1, None)
k2, d2 = sift.detectAndCompute(im2, None)

bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
pairs = bf.knnMatch(d1, d2, k=2)

good = []
for pair in pairs:
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < 0.7 * n.distance:
        good.append(m)

vis = cv2.drawMatches(im1, k1, im2, k2, good[:80], None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

FLANN for larger descriptor sets

For thousands of keypoints, FLANN can be faster than exhaustive BF matching. Use KD-tree or k-means index parameters tuned to float descriptors.

import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

# SIFT descriptors are already float32; the cast is a cheap safeguard, since
# the KD-tree index only accepts CV_32F data.
d1f = d1.astype(np.float32)
d2f = d2.astype(np.float32)
pairs = flann.knnMatch(d1f, d2f, k=2)

Environment notes

If cv2.SIFT_create is missing, install a recent opencv-python: SIFT moved back into the main module in OpenCV 4.4.0, after the patent expired in many jurisdictions. Some older wheels required opencv-contrib-python. Check your version with print(cv2.__version__).

Takeaways

  • SIFT descriptors are float vectors—match with NORM_L2 (or L1).
  • Use kNN + ratio or geometry (findHomography + RANSAC) to drop outliers.
  • Heavier than ORB; use FLANN when matching large batches.

Quick FAQ

ORB or SIFT? ORB is faster and uses compact binary descriptors; SIFT is often stronger on difficult pairs but costs more CPU and memory. Profile on target hardware.

Are SIFT descriptors already unit-normalized? No: OpenCV normalizes them during construction but then rescales them, so the stored float values lie in [0, 255] and the vectors are not unit length. L2 matching still works because every descriptor shares the same scale; if you need unit vectors (e.g. for cosine similarity), normalize them yourself.
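If your pipeline does need unit-length vectors, a normalization pass is one line of NumPy. Toy random descriptors stand in for real SIFT output here:

```python
import numpy as np

# Stand-in descriptor matrix: 10 keypoints x 128 dims, values in [0, 255].
des = np.random.default_rng(0).uniform(0, 255, (10, 128)).astype(np.float32)

norms = np.linalg.norm(des, axis=1, keepdims=True)
unit = des / np.maximum(norms, 1e-12)  # guard against all-zero rows
print(np.allclose(np.linalg.norm(unit, axis=1), 1.0))
```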