Geometry of the descriptor
Typical person settings use a 64×128 detection window (width × height in pixels). The window is tiled into cells (e.g. 8×8); each cell accumulates gradient energy into orientation bins (often 9 unsigned directions over 0°–180°). Blocks (e.g. 16×16) group 2×2 cells and are normalized (L2-Hys) so contrast changes affect cells within a block similarly. The final vector length depends on window, cell, block, and stride—OpenCV computes dimensions when you construct HOGDescriptor.
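The arithmetic can be checked directly. A small sketch for the default geometry (64×128 window, 8×8 cells, 16×16 blocks, 8-px block stride, 9 bins) in pure Python — no OpenCV needed; `cv2.HOGDescriptor().getDescriptorSize()` reports the same number for these defaults:

```python
# Descriptor length for the standard pedestrian geometry, computed by hand.
win_w, win_h = 64, 128
cell = 8
block = 16          # a block spans 2x2 cells
block_stride = 8
nbins = 9

# Number of block positions along each axis (16x16 block sliding by 8 px).
blocks_x = (win_w - block) // block_stride + 1   # 7
blocks_y = (win_h - block) // block_stride + 1   # 15

cells_per_block = (block // cell) ** 2           # 4
length = blocks_x * blocks_y * cells_per_block * nbins
print(length)  # 3780
```

The overlap is why the count is 7×15 block positions rather than 4×8 disjoint blocks.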
Why do blocks overlap?
Overlapping blocks mean each cell contributes to multiple normalized histograms—smoother, more robust features at the cost of dimensionality.
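To see how much sharing the 8-px block stride buys, here is a small count (pure Python, same cell/block geometry as above) of how many blocks touch a given cell:

```python
# The 64x128 window holds an 8x16 grid of cells; blocks are 2x2 cells
# sliding by one cell, so there are 7x15 block positions.
cells_x, cells_y = 8, 16
blocks_x, blocks_y = cells_x - 1, cells_y - 1

def blocks_covering(cx, cy):
    """Count block positions whose 2x2 cell footprint includes cell (cx, cy)."""
    return sum(
        1
        for bx in range(blocks_x)
        for by in range(blocks_y)
        if bx <= cx <= bx + 1 and by <= cy <= by + 1
    )

print(blocks_covering(0, 0))   # corner cell: appears in 1 block
print(blocks_covering(3, 0))   # edge cell: appears in 2 blocks
print(blocks_covering(3, 7))   # interior cell: appears in 4 blocks
```

Each interior cell is normalized four times under four different local contrasts, which is where the robustness (and the extra dimensionality) comes from.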
vs deep features
HOG is fixed and interpretable; CNNs learn richer hierarchies but need data and compute. HOG + linear SVM remains a teaching baseline.
Default people detector
import cv2
img = cv2.imread("street.jpg")
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
rects, weights = hog.detectMultiScale(
    img,
    winStride=(8, 8),
    padding=(16, 16),
    scale=1.05,
    hitThreshold=0.0,
    finalThreshold=2.0,
)
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
A scale factor > 1 shrinks the image at each pyramid level; a smaller factor means more levels and a slower scan. finalThreshold sets how many overlapping hits must group together before a detection is reported (OpenCV 4 API; check your version for exact parameter names).
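To get a feel for the speed/coverage trade, a rough sketch counting how many pyramid levels still fit a 64×128 window (the 640×480 image size is hypothetical, and the real detectMultiScale count may differ slightly because of padding and rounding):

```python
def pyramid_levels(img_w, img_h, scale, win_w=64, win_h=128):
    """Count pyramid levels until the downscaled image can't hold the window."""
    levels, s = 0, 1.0
    while img_w / s >= win_w and img_h / s >= win_h:
        levels += 1
        s *= scale
    return levels

print(pyramid_levels(640, 480, 1.05))  # fine pyramid: many levels, slow
print(pyramid_levels(640, 480, 1.30))  # coarse pyramid: few levels, fast
```

Each extra level is another dense scan, so the cost grows quickly as scale approaches 1.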
Tighter stride, stricter threshold
rects2, wt2 = hog.detectMultiScale(
    img, winStride=(4, 4), padding=(8, 8), scale=1.03, hitThreshold=0.3)
Custom HOGDescriptor and compute
When you train your own SVM (scikit-learn, etc.), extract HOG vectors with matching window and cell geometry.
import cv2
win = (64, 128)
block = (16, 16)
block_stride = (8, 8)
cell = (8, 8)
nbins = 9
hog = cv2.HOGDescriptor(win, block, block_stride, cell, nbins)
gray = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)
# Resize crop to exactly win (width, height) before compute
crop = cv2.resize(gray, win)
vec = hog.compute(crop)
print(vec.shape)
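From there, stack such vectors into a matrix and fit a linear model. A minimal sketch with synthetic stand-in data (the Gaussian vectors and the least-squares fit are illustrative only; a real pipeline feeds hog.compute outputs from labeled crops into something like sklearn.svm.LinearSVC):

```python
import numpy as np

# Synthetic 3780-dim vectors, the same shape hog.compute would return
# for the default 64x128 geometry. Entirely made-up data.
rng = np.random.default_rng(0)
dim = 3780
pos = rng.normal(0.6, 0.1, size=(20, dim))   # pretend "person" vectors
neg = rng.normal(0.4, 0.1, size=(20, dim))   # pretend "background" vectors

X = np.vstack([pos, neg])
y = np.array([1.0] * 20 + [-1.0] * 20)

# Least-squares fit of a linear score w.x + b (bias via an extra ones column);
# a crude substitute for an SVM, just to show the plumbing.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

scores = Xb @ w
print((np.sign(scores) == y).mean())  # training accuracy of the toy model
```

The only contract that matters is geometric: every training crop and every test window must be resized to the same (win, block, cell) layout before compute, or the vector lengths won't match.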
Sliding-window mental model
detectMultiScale resizes the image, runs a dense scan at each scale, and applies the SVM score at each window position. False positives are common in cluttered scenes; non-maximum suppression (custom or from other libraries) and scene context (road geometry) help in production systems.
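A minimal greedy NMS, as one sketch of the "custom" option mentioned above (the 0.5 IoU threshold is an arbitrary choice):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the best-scoring box, drop heavy overlaps with it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

boxes = [(10, 10, 64, 128), (14, 12, 64, 128), (200, 40, 64, 128)]
scores = [0.9, 0.7, 0.8]
print(nms(boxes, scores))  # the two near-duplicate boxes collapse to one
```

The weights returned by detectMultiScale can serve as the scores here.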
Takeaways
- Default people model expects upright pedestrians near the 64×128 aspect in the pyramid.
- Tune winStride, scale, and hitThreshold for speed vs recall.
- Use hog.compute on fixed crops when pairing HOG with your own classifier.
Quick FAQ
Too many false positives? Raise hitThreshold, add NMS on overlapping boxes, or fuse with motion/depth. The bundled detector is a baseline, not a production tracker.
Color or grayscale input? compute and the people detector operate on single-channel images derived from your BGR input internally, per implementation; convert explicitly to gray for reproducibility.