Geometry of the descriptor
Typical person settings use a 64×128 detection window (width × height in pixels). The window is tiled into cells (e.g. 8×8); each cell accumulates gradient energy into orientation bins (often 9 unsigned directions over 0°–180°). Blocks (e.g. 16×16) group 2×2 cells and are normalized (L2-Hys) so contrast changes affect cells within a block similarly. The final vector length depends on window, cell, block, and stride—OpenCV computes dimensions when you construct HOGDescriptor.
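The arithmetic can be checked directly. A small sketch for the default geometry (64×128 window, 8×8 cells, 16×16 blocks, 8-px block stride, 9 bins) in pure Python — no OpenCV needed; `cv2.HOGDescriptor().getDescriptorSize()` reports the same number for these defaults:

```python
# Descriptor length for the standard pedestrian geometry, computed by hand.
win_w, win_h = 64, 128
cell = 8
block = 16          # a block spans 2x2 cells
block_stride = 8
nbins = 9

# Number of block positions along each axis (16x16 block sliding by 8 px).
blocks_x = (win_w - block) // block_stride + 1   # 7
blocks_y = (win_h - block) // block_stride + 1   # 15

cells_per_block = (block // cell) ** 2           # 4
length = blocks_x * blocks_y * cells_per_block * nbins
print(length)  # 3780
```

The overlap is why the count is 7×15 block positions rather than 4×8 disjoint blocks.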
Why do blocks overlap?
Overlapping blocks mean each cell contributes to multiple normalized histograms—smoother, more robust features at the cost of dimensionality.
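To see how much sharing the 8-px block stride buys, here is a small count (pure Python, same cell/block geometry as above) of how many blocks touch a given cell:

```python
# The 64x128 window holds an 8x16 grid of cells; blocks are 2x2 cells
# sliding by one cell, so there are 7x15 block positions.
cells_x, cells_y = 8, 16
blocks_x, blocks_y = cells_x - 1, cells_y - 1

def blocks_covering(cx, cy):
    """Count block positions whose 2x2 cell footprint includes cell (cx, cy)."""
    return sum(
        1
        for bx in range(blocks_x)
        for by in range(blocks_y)
        if bx <= cx <= bx + 1 and by <= cy <= by + 1
    )

print(blocks_covering(0, 0))   # corner cell: appears in 1 block
print(blocks_covering(3, 0))   # edge cell: appears in 2 blocks
print(blocks_covering(3, 7))   # interior cell: appears in 4 blocks
```

Each interior cell is normalized four times under four different local contrasts, which is where the robustness (and the extra dimensionality) comes from.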
vs deep features
HOG is fixed and interpretable; CNNs learn richer hierarchies but need data and compute. HOG + linear SVM remains a teaching baseline.
Default people detector
import cv2
img = cv2.imread("street.jpg")
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
rects, weights = hog.detectMultiScale(
    img,
    winStride=(8, 8),
    padding=(16, 16),
    scale=1.05,
    hitThreshold=0.0,
    finalThreshold=2.0,
)
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
A scale factor > 1 shrinks the image at each pyramid level; a smaller factor means more levels and a slower scan. finalThreshold sets how many overlapping hits must group together before a detection is reported (OpenCV 4 API; check your version for exact parameter names).
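To get a feel for the speed/coverage trade, a rough sketch counting how many pyramid levels still fit a 64×128 window (the 640×480 image size is hypothetical, and the real detectMultiScale count may differ slightly because of padding and rounding):

```python
def pyramid_levels(img_w, img_h, scale, win_w=64, win_h=128):
    """Count pyramid levels until the downscaled image can't hold the window."""
    levels, s = 0, 1.0
    while img_w / s >= win_w and img_h / s >= win_h:
        levels += 1
        s *= scale
    return levels

print(pyramid_levels(640, 480, 1.05))  # fine pyramid: many levels, slow
print(pyramid_levels(640, 480, 1.30))  # coarse pyramid: few levels, fast
```

Each extra level is another dense scan, so the cost grows quickly as scale approaches 1.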
Tighter stride, stricter threshold
rects2, wt2 = hog.detectMultiScale(
    img, winStride=(4, 4), padding=(8, 8), scale=1.03, hitThreshold=0.3)
Custom HOGDescriptor and compute
When you train your own SVM (scikit-learn, etc.), extract HOG vectors with matching window and cell geometry.
import cv2
win = (64, 128)
block = (16, 16)
block_stride = (8, 8)
cell = (8, 8)
nbins = 9
hog = cv2.HOGDescriptor(win, block, block_stride, cell, nbins)
gray = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)
# Resize crop to exactly win (width, height) before compute
crop = cv2.resize(gray, win)
vec = hog.compute(crop)
print(vec.shape)
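From there, stack such vectors into a matrix and fit a linear model. A minimal sketch with synthetic stand-in data (the Gaussian vectors and the least-squares fit are illustrative only; a real pipeline feeds hog.compute outputs from labeled crops into something like sklearn.svm.LinearSVC):

```python
import numpy as np

# Synthetic 3780-dim vectors, the same shape hog.compute would return
# for the default 64x128 geometry. Entirely made-up data.
rng = np.random.default_rng(0)
dim = 3780
pos = rng.normal(0.6, 0.1, size=(20, dim))   # pretend "person" vectors
neg = rng.normal(0.4, 0.1, size=(20, dim))   # pretend "background" vectors

X = np.vstack([pos, neg])
y = np.array([1.0] * 20 + [-1.0] * 20)

# Least-squares fit of a linear score w.x + b (bias via an extra ones column);
# a crude substitute for an SVM, just to show the plumbing.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

scores = Xb @ w
print((np.sign(scores) == y).mean())  # training accuracy of the toy model
```

The only contract that matters is geometric: every training crop and every test window must be resized to the same (win, block, cell) layout before compute, or the vector lengths won't match.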
Sliding-window mental model
detectMultiScale resizes the image, runs a dense scan at each scale, and applies the SVM score at each window position. False positives are common in cluttered scenes; non-maximum suppression (custom or from other libraries) and scene context (road geometry) help in production systems.
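A minimal greedy NMS, as one sketch of the "custom" option mentioned above (the 0.5 IoU threshold is an arbitrary choice):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the best-scoring box, drop heavy overlaps with it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

boxes = [(10, 10, 64, 128), (14, 12, 64, 128), (200, 40, 64, 128)]
scores = [0.9, 0.7, 0.8]
print(nms(boxes, scores))  # the two near-duplicate boxes collapse to one
```

The weights returned by detectMultiScale can serve as the scores here.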
Takeaways
- Default people model expects upright pedestrians near the 64×128 aspect in the pyramid.
- Tune winStride, scale, and hitThreshold for speed vs recall.
- Use hog.compute on fixed crops when pairing HOG with your own classifier.
Quick FAQ
Too many false positives? Raise hitThreshold, add NMS on overlapping boxes, or fuse with motion/depth. The bundled detector is a baseline, not a production tracker.
Color or grayscale input? compute and the people detector operate on single-channel images derived from your BGR input internally, per implementation; convert explicitly to gray for reproducibility.