HOG Descriptor: 20 Essential Q&A

Question 1

1 What is HOG? ⚡ easy

Answer: Histogram of Oriented Gradients, a handcrafted descriptor that histograms gradient orientations within local cells and groups the cells into contrast-normalized blocks. Its classic application is object detection, most famously pedestrians.

Question 2

2 What is a cell? 📊 medium

Answer: Small spatial tile (e.g. 8×8 px) in which every pixel casts a vote, weighted by its gradient magnitude, into a fixed set of orientation bins (unsigned orientation in the classic setup).
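The voting step can be sketched with NumPy. This is a simplified illustration, not the reference implementation: the function name `cell_histogram` is made up for this example, border pixels get a zero gradient, and each pixel votes into a single bin (Dalal-Triggs additionally interpolates votes between neighboring bins and cells).

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = np.asarray(cell, dtype=np.float64)

    # Centered differences [-1, 0, 1]; border pixels keep a zero gradient (simplification).
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]

    magnitude = np.hypot(gx, gy)
    # Unsigned orientation: fold angles into [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0

    bin_width = 180.0 / n_bins
    # The final modulo guards against an angle that rounds up to exactly 180.
    bin_index = np.floor(angle / bin_width).astype(int) % n_bins

    # Hard assignment: each pixel adds its magnitude to exactly one bin.
    return np.bincount(bin_index.ravel(), weights=magnitude.ravel(), minlength=n_bins)

# A vertical step edge produces purely horizontal gradients, so all votes land in bin 0.
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
print(cell_histogram(cell))
```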

Question 3

3 How many orientation bins? ⚡ easy

Answer: Typically 9 unsigned bins over 0–180° (Dalal-Triggs); signed 0–360° variants exist.

Question 4

4 What is a block? 📊 medium

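Answer: Group of adjacent cells (e.g. 2×2 cells) whose histograms are concatenated into one vector. The block is the unit over which contrast normalization is applied, so each cell is normalized relative to its local neighborhood.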

Question 5

5 Why block normalization? 🔥 hard

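Answer: Gradient magnitudes scale with local contrast, so raw histograms vary widely with lighting. Dividing each block by its L2 norm (with clipping, as in L2-Hys) makes the descriptor robust to local illumination changes and shadows, though not strictly invariant. This matters for outdoor pedestrian scenes.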

Question 6

6 Block stride? 📊 medium

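Answer: Step, in pixels, between successive block positions inside the window. A stride smaller than the block size makes blocks overlap; a smaller stride means more overlap, a denser descriptor, and a longer feature vector.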

Question 7

7 Typical detection window? 📊 medium

Answer: 64×128 px pedestrian crop in the Dalal-Triggs paper. The window has a fixed size and aspect ratio, so objects of other sizes are found by scanning the image at multiple scales.
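How cell size, block size, block stride, and window size combine into the descriptor length can be checked with a few lines of arithmetic. The helper name `hog_descriptor_length` is made up for this sketch, and it assumes square cells and blocks with the same stride in both directions.

```python
def hog_descriptor_length(win=(64, 128), cell=8, block_cells=2, stride=8, n_bins=9):
    """Number of values in the HOG descriptor of one window (win = width, height in px)."""
    block_px = block_cells * cell                     # block side length in pixels
    blocks_x = (win[0] - block_px) // stride + 1      # block positions across the window
    blocks_y = (win[1] - block_px) // stride + 1      # block positions down the window
    values_per_block = block_cells * block_cells * n_bins
    return blocks_x * blocks_y * values_per_block

print(hog_descriptor_length())           # 7 * 15 blocks * 36 values = 3780
print(hog_descriptor_length(stride=16))  # no overlap: 4 * 8 blocks * 36 values = 1152
```

With an 8 px stride every interior cell appears in four blocks, which is why the overlapping descriptor is more than three times longer than the non-overlapping one.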

Question 8

8 Classifier with HOG? ⚡ easy

Answer: Linear SVM on HOG feature vector—fast sliding-window scoring; extensions used kernels but linear was standard.

Question 9

9 Relation to CNNs? 🔥 hard

Answer: Early conv layers learn local edge and orientation filters similar to the ones HOG hard-codes. HOG can be seen as a fixed, handcrafted analog of a shallow network: gradient filtering, then spatial pooling, then local normalization.

Question 10

10 Unsigned vs signed gradients? 📊 medium

Answer: Unsigned merges opposite contrast edges into same bin—more stable for object shape; signed preserves direction when needed.

Question 11

11 Use color? 📊 medium

Answer: Compute gradients per color channel and keep, at each pixel, the channel with the largest gradient magnitude; alternatively, concatenate per-channel descriptors. RGB helps slightly over grayscale for some classes.

Question 12

12 Multi-scale detection? 📊 medium

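Answer: Build an image pyramid by repeatedly resizing the image and run the fixed-size sliding window at each level. Faster variants approximate features at intermediate scales instead of recomputing them (fast feature pyramids).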
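The pyramid schedule itself is simple to sketch: shrink the image by a constant factor until the detection window no longer fits. The function name `pyramid_sizes` and the default `scale_step=1.2` are assumptions for illustration; the actual resizing and window scoring are left out.

```python
def pyramid_sizes(image_size, window=(64, 128), scale_step=1.2):
    """Return the (width, height) of every pyramid level, largest first."""
    width, height = image_size
    sizes = []
    scale = 1.0
    # Stop as soon as the fixed-size window no longer fits inside the shrunken image.
    while width / scale >= window[0] and height / scale >= window[1]:
        sizes.append((int(width / scale), int(height / scale)))
        scale *= scale_step
    return sizes

for level, size in enumerate(pyramid_sizes((640, 480))):
    print(level, size)
```

A window that matches at a level with scale s corresponds to an object roughly s times larger than the window in the original image.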

Question 13

13 NMS after scanning? ⚡ easy

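Answer: Non-maximum suppression collapses clusters of overlapping high-score windows into a single detection per object, typically by keeping the best-scoring window and discarding or merging its overlapping neighbors. It is a standard object detection post-process.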
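One common way to do this is greedy NMS with an intersection-over-union (IoU) threshold, sketched below. This is a generic variant rather than the procedure of any particular HOG paper or library; the names `iou` and `greedy_nms` and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 64, 128), (4, 4, 68, 132), (200, 0, 264, 128)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]: the second box overlaps the first too much
```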

Question 14

14 Why gradients not raw pixels? 📊 medium

Answer: Gradients emphasize edges and shape while reducing sensitivity to absolute brightness.

Question 15

15 What is L2-Hys? 🔥 hard

Answer: L2-normalize the block vector, clip each component at a maximum value (0.2 in Dalal-Triggs), then renormalize. This reduces the influence of very large gradients.
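The three steps translate directly into NumPy. A minimal sketch, assuming a clipping threshold of 0.2 (the commonly cited Dalal-Triggs value) and a small `eps` to avoid division by zero; the function name `l2_hys` is chosen for this example.

```python
import numpy as np

def l2_hys(block, clip=0.2, eps=1e-6):
    """L2-Hys: L2-normalize, clip each component, then L2-normalize again."""
    v = np.asarray(block, dtype=np.float64)
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)     # first L2 normalization
    v = np.minimum(v, clip)                        # limit the largest components
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)  # renormalize after clipping

block = np.array([100.0, 1.0, 1.0, 1.0])  # one dominant gradient bin
print(l2_hys(block))
print(l2_hys(2.0 * block))  # doubling the contrast gives (almost exactly) the same vector
```

After clipping, the weak components carry a noticeably larger share of the normalized vector than they would under plain L2 normalization.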

Question 16

16 Visualize HOG? ⚡ easy

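Answer: Draw the orientation histogram of each cell as short lines, one per bin with brightness proportional to the bin weight (or only the dominant orientation). The resulting “HOG picture” is useful for debugging.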

Question 17

17 Speed tricks? 🔥 hard

Answer: Integral histograms (one integral image per orientation bin), a coarser window stride, and cascaded classifiers that reject easy negatives early. Such tricks were needed for near real-time detection before deep detectors.

Question 18

18 Link to DPM? 🔥 hard

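Answer: Deformable Part Models extend HOG with a root filter plus part filters and deformation costs that penalize parts for moving away from their expected positions. DPMs were among the leading detectors of the PASCAL VOC era before R-CNN.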

Question 19

19 OpenCV? ⚡ easy

Answer: Create a cv2.HOGDescriptor, load the built-in pedestrian model with setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()), then call detectMultiScale on the image.

Question 20

20 Limitations? 📊 medium

Answer: Rigid window, hand-tuned cells/blocks, weaker than CNNs on cluttered scenes and fine-grained classes—still useful baseline and interpretable.
