Computer Vision Chapter 14

HOG descriptor

The histogram of oriented gradients (HOG) describes an image window by pooling local gradient directions into spatial cells, then normalizing over larger blocks to resist lighting change. It powered classic pedestrian detectors (Dalal–Triggs) before deep learning dominated detection. OpenCV bundles a pretrained linear SVM on a default people model—great for demos—and lets you compute raw HOG vectors for your own classifiers.

Geometry of the descriptor

Typical person settings use a 64×128 detection window (width × height in pixels). The window is tiled into cells (e.g. 8×8); each cell accumulates gradient energy into orientation bins (often 9 unsigned directions over 0°–180°). Blocks (e.g. 16×16) group 2×2 cells and are normalized (L2-Hys) so contrast changes affect cells within a block similarly. The final vector length depends on window, cell, block, and stride—OpenCV computes dimensions when you construct HOGDescriptor.
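The length arithmetic is easy to check by hand. A small sketch (plain Python, no OpenCV needed; the helper name is illustrative) for the default geometry:

```python
def hog_descriptor_len(win=(64, 128), block=(16, 16), stride=(8, 8),
                       cell=(8, 8), nbins=9):
    # How many block positions fit in the window at the given stride
    bx = (win[0] - block[0]) // stride[0] + 1
    by = (win[1] - block[1]) // stride[1] + 1
    # Cells per block times orientation bins gives values per block
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return bx * by * cells_per_block * nbins

print(hog_descriptor_len())  # 3780 — the classic Dalal–Triggs length
```

For the defaults: 7 × 15 block positions × 4 cells × 9 bins = 3780 values.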

Why do blocks overlap?

Overlapping blocks mean each cell contributes to multiple normalized histograms—smoother, more robust features at the cost of dimensionality.
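For reference, the L2-Hys normalization applied to each block histogram can be sketched in NumPy (the 0.2 clip and small epsilon follow the Dalal–Triggs scheme; exact constants in OpenCV may differ):

```python
import numpy as np

def l2_hys(block_hist, eps=1e-5, clip=0.2):
    # L2 normalize, clip large entries, then renormalize (L2-Hys)
    v = block_hist / np.sqrt(np.sum(block_hist ** 2) + eps ** 2)
    v = np.minimum(v, clip)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

h = np.array([10.0, 0.5, 0.5, 0.5])  # one dominant gradient direction
print(l2_hys(h))
```

Clipping caps the influence of a single strong edge before the second normalization, which is why L2-Hys tolerates local contrast spikes better than plain L2.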

vs deep features

HOG is fixed and interpretable; CNNs learn richer hierarchies but need data and compute. HOG + linear SVM remains a teaching baseline.

Default people detector

import cv2

img = cv2.imread("street.jpg")
if img is None:
    raise FileNotFoundError("street.jpg")
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

rects, weights = hog.detectMultiScale(
    img,
    winStride=(8, 8),
    padding=(16, 16),
    scale=1.05,
    hitThreshold=0.0,
    finalThreshold=2.0,
)

for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

A scale just above 1 shrinks the image at each pyramid level; a smaller factor means more levels and a slower scan. finalThreshold sets how many overlapping raw hits must agree before a detection is kept (OpenCV 4 API; check your version for exact parameter names).
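A back-of-envelope way to see the cost of a smaller scale step (plain Python; pyramid_levels is a hypothetical helper that counts how many shrink steps keep the 64×128 window inside the image):

```python
def pyramid_levels(img_size, win=(64, 128), scale=1.05):
    # Count levels until the shrunken image no longer fits the window.
    # Assumes scale > 1, as detectMultiScale requires.
    w, h = img_size
    n = 0
    while w / (scale ** n) >= win[0] and h / (scale ** n) >= win[1]:
        n += 1
    return n

print(pyramid_levels((640, 480), scale=1.05))  # 28
print(pyramid_levels((640, 480), scale=1.03))  # 45
```

Dropping scale from 1.05 to 1.03 roughly 1.6×'s the number of levels for a 640×480 frame, which is why the stricter settings below are slower.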

Tighter stride, stricter threshold

rects2, wt2 = hog.detectMultiScale(
    img, winStride=(4, 4), padding=(8, 8), scale=1.03, hitThreshold=0.3)
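To see why halving winStride matters, count the window positions scanned at the base scale alone (illustrative helper, plain Python):

```python
def positions(img_size, win=(64, 128), stride=8):
    # Number of detection-window placements at one pyramid level
    w, h = img_size
    return ((w - win[0]) // stride + 1) * ((h - win[1]) // stride + 1)

print(positions((640, 480), stride=8))  # 3285
print(positions((640, 480), stride=4))  # 12905
```

Stride 4 scans roughly 4× as many windows per level as stride 8, multiplied again across every pyramid level.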

Custom HOGDescriptor and compute

When you train your own SVM (scikit-learn, etc.), extract HOG vectors with matching window and cell geometry.

import cv2

win = (64, 128)
block = (16, 16)
block_stride = (8, 8)
cell = (8, 8)
nbins = 9

hog = cv2.HOGDescriptor(win, block, block_stride, cell, nbins)
gray = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)
if gray is None:
    raise FileNotFoundError("patch.png")
# Resize crop to exactly win (width, height) before compute
crop = cv2.resize(gray, win)
vec = hog.compute(crop)
print(vec.shape)  # (3780, 1) or (3780,) depending on OpenCV version

Sliding-window mental model

detectMultiScale resizes the image, runs a dense scan at each scale, and applies the SVM score at each window position. False positives are common on cluttered scenes—non-maximum suppression (custom or from other libraries) and context (road geometry) help in production systems.
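A minimal greedy NMS sketch in NumPy (boxes in the x, y, w, h format detectMultiScale returns; the 0.5 IoU threshold is a typical choice, not a canonical value):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy non-maximum suppression: keep the best-scoring box,
    # drop every remaining box that overlaps it too much, repeat.
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    order = np.argsort(scores)[::-1]  # indices by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 10, 10]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] — the near-duplicate box is suppressed
```

Apply it to the rects and weights from detectMultiScale before drawing.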

Takeaways

  • Default people model expects upright pedestrians whose size roughly matches the 64×128 window at some pyramid level.
  • Tune winStride, scale, and hitThreshold for speed vs recall.
  • Use hog.compute on fixed crops when pairing HOG with your own classifier.

Quick FAQ

How do I cut false positives? Raise hitThreshold, add NMS on overlapping boxes, or fuse with motion/depth cues. The bundled detector is a baseline, not a production tracker.

Does HOG need color? Classic HOG uses luminance gradients. OpenCV's compute and people detector accept BGR input, but how the channels are reduced to a single gradient is an implementation detail; convert explicitly to grayscale for reproducibility.
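For completeness, cv2.cvtColor's BGR→GRAY conversion uses the BT.601 luminance weights; a NumPy equivalent for a single pixel:

```python
import numpy as np

bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)  # one pure-blue pixel (BGR order)
# BT.601 weights, matching cv2.cvtColor(..., cv2.COLOR_BGR2GRAY)
gray = (0.114 * bgr[..., 0] + 0.587 * bgr[..., 1]
        + 0.299 * bgr[..., 2]).round().astype(np.uint8)
print(gray)  # [[29]] — blue contributes least to luminance
```

Doing the conversion yourself pins down exactly which gray image the descriptor sees.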