YOLO: 20 Essential Q&A
You Only Look Once—grids, anchors, and the push for real-time detection.
~12 min read
20 questions
Advanced
one-stage · grid · anchors · latency
1
What does YOLO mean?
📊 medium
Answer: You Only Look Once: single forward pass predicts boxes and classes—treats detection as regression from a grid of cells.
2
YOLOv1 grid idea?
📊 medium
Answer: Image split into S×S cells; cell responsible for object whose center falls in it—predicts B boxes + class distribution per cell.
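A quick way to remember the v1 head shape, using the paper's values (S=7, B=2, C=20 on Pascal VOC):

```python
# YOLOv1 head output: S x S x (B*5 + C); with S=7, B=2, C=20 this is 7x7x30
S, B, C = 7, 2, 20
print(S * S, "cells,", B * 5 + C, "values per cell")  # each box: x, y, w, h, confidence
```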
3
YOLOv1 loss components?
🔥 hard
Answer: Coordinate regression (with the sqrt w,h trick), confidence regressed toward IoU, and per-cell classification, all sum-squared error in v1 (later versions switch to BCE/CE); λ_coord and λ_noobj weights balance localization vs no-object cells.
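For reference, the full YOLOv1 loss from the original paper (the paper sets λ_coord = 5 and λ_noobj = 0.5):

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
      \left[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2
          + (\sqrt{w_i}-\sqrt{\hat{w}_i})^2 + (\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} (C_i-\hat{C}_i)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} (C_i-\hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} (p_i(c)-\hat{p}_i(c))^2
\end{aligned}
```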
4
When did anchors appear?
📊 medium
Answer: YOLOv2+ uses k-means anchor priors on dataset boxes—predict offsets instead of raw sizes for stability.
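A minimal NumPy sketch of YOLOv2-style anchor clustering with a 1 − IoU distance; the toy boxes and k value are made up for illustration:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, as if all boxes shared the same top-left corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """Cluster dataset (w, h) boxes using 1 - IoU as the distance (YOLOv2-style priors)."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)          # nearest anchor per box
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else anchors[j] for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors

# toy (w, h) boxes in pixels; real usage clusters every ground-truth box in the dataset
boxes = np.array([[30, 40], [32, 45], [120, 80], [110, 90], [300, 200], [280, 220]], float)
print(kmeans_anchors(boxes, k=3))
```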
5
IoU in training?
📊 medium
Answer: Assign anchors/cells to GT by best IoU; some versions ignore preds below IoU threshold for classification to reduce conflict.
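A minimal corner-format IoU and best-IoU assignment sketch in NumPy (the boxes are made-up examples):

```python
import numpy as np

def box_iou(a, b):
    """IoU between boxes in (x1, y1, x2, y2) format; a: (N,4), b: (M,4) -> (N,M)."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])   # top-left of intersection
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])   # bottom-right of intersection
    wh = np.clip(br - tl, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

# assignment sketch: each ground-truth box goes to its best-IoU anchor/prediction
gts   = np.array([[10, 10, 50, 60]], float)
preds = np.array([[12, 8, 48, 58], [100, 100, 150, 160]], float)
print(box_iou(gts, preds).argmax(axis=1))   # -> [0]
```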
6
Post-processing?
⚡ easy
Answer: Like other detectors: NMS on decoded boxes with class-wise scores—some variants use DIoU-NMS or soft-NMS.
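A minimal single-class greedy NMS sketch in NumPy (threshold and boxes are illustrative; class-wise NMS just runs this per class):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS on (x1, y1, x2, y2) boxes for one class; returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]           # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the top box against all remaining boxes
        iw = np.clip(np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou < iou_thr]           # drop boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # -> [0, 2]
```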
7
Objectness vs class?
⚡ easy
Answer: Objectness = is there an object in this anchor; class = which class—decoupled in many heads (obj * class prob = final score).
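Toy illustration of the decoupled score (logit values are made up; v3+-style heads use an independent sigmoid per class):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

obj_logit = 1.2                            # "is there an object in this anchor?"
cls_logits = np.array([-0.5, 2.0, 0.1])    # "which class?"
scores = sigmoid(obj_logit) * sigmoid(cls_logits)   # obj * class prob = final per-class score
print(scores.argmax(), scores.max())
```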
8
Multi-scale YOLO?
📊 medium
Answer: Later versions predict at multiple feature map scales (e.g. large/small stride) to catch objects of different sizes—similar spirit to FPN.
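Typical three-head stride layout (strides 8/16/32 are the common YOLOv3/v5-style choice; exact values vary per model):

```python
input_size = 640
for stride in (8, 16, 32):                     # small / medium / large objects
    s = input_size // stride
    print(f"stride {stride}: {s}x{s} grid")    # 80x80, 40x40, 20x20 prediction cells
```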
9
Path aggregation?
📊 medium
Answer: Models like YOLOv4 use PANet-style bottom-up path after top-down FPN for richer multi-scale features.
10
YOLOv5/v8 / Ultralytics?
⚡ easy
Answer: Popular PyTorch implementations with training zoo, export, and deployment tooling—interview “practical YOLO” often means this ecosystem.
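Minimal usage sketch, assuming the ultralytics pip package and a local image file named bus.jpg:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # pretrained nano model (downloaded if missing)
results = model("bus.jpg")      # inference; returns a list of Results objects
for r in results:
    print(r.boxes.xyxy, r.boxes.conf, r.boxes.cls)   # boxes, scores, class ids
```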
11
Deploy on edge?
📊 medium
Answer: Export to ONNX, TensorRT, CoreML—quantize INT8 for speed; validate mAP drop after conversion.
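Export sketch, again assuming the ultralytics package; quantization options depend on the target format:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="onnx")   # "engine" (TensorRT) and "coreml" are other supported targets
# INT8 flags depend on the chosen format; always re-measure mAP on a validation set afterwards
```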
12
Small objects?
📊 medium
Answer: Higher-res input, smaller stride heads, copy-paste aug, or tiling—same fundamentals as other detectors.
13
Crowded objects?
🔥 hard
Answer: Grid responsibility and NMS can struggle—improved assignment (e.g. ATSS-style ideas in some detectors) and better NMS help.
14
Common augmentations?
📊 medium
Answer: Mosaic, mixup, HSV jitter, random scale—strong aug standard in modern YOLO training recipes.
15
mAP vs FPS tradeoff?
⚡ easy
Answer: Larger model and image size ↑ mAP, ↓ FPS—choose for product SLA (latency vs accuracy).
16
YOLO vs SSD?
📊 medium
Answer: Both one-stage; SSD uses multi-scale default boxes on VGG features; YOLO family evolved different heads and assignment—both real-time capable.
17
YOLO vs RetinaNet?
📊 medium
Answer: RetinaNet introduced focal loss for dense classification imbalance; YOLO uses different obj loss weighting—both dense predictors.
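For contrast, a minimal binary focal loss in NumPy (α = 0.25, γ = 2 are the RetinaNet paper defaults; inputs are made-up probabilities):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples via the (1 - p_t)^gamma factor."""
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -(alpha_t * (1 - p_t) ** gamma * np.log(p_t + 1e-9)).mean()

p = np.array([0.9, 0.1, 0.6])   # predicted foreground probabilities
y = np.array([1, 0, 1])         # labels
print(focal_loss(p, y))
```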
18
Tiling satellite / huge images?
📊 medium
Answer: Split image, run YOLO per tile with overlap, merge + NMS—handle boundary duplicates.
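A minimal tiling sketch; tile size and overlap are illustrative. Run the detector on each window, shift boxes back by the window offset, concatenate, then apply NMS globally:

```python
def tile_coords(img_w, img_h, tile=640, overlap=128):
    """Yield (x1, y1, x2, y2) windows covering the image with the given overlap."""
    step = tile - overlap
    xs = list(range(0, max(img_w - tile, 0) + 1, step)) or [0]
    ys = list(range(0, max(img_h - tile, 0) + 1, step)) or [0]
    # make sure the right/bottom edges are covered
    if xs[-1] + tile < img_w: xs.append(img_w - tile)
    if ys[-1] + tile < img_h: ys.append(img_h - tile)
    for y in ys:
        for x in xs:
            yield x, y, min(x + tile, img_w), min(y + tile, img_h)

print(list(tile_coords(1500, 1000)))
```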
19
Rotated boxes?
🔥 hard
Answer: Variants predict angle θ or use rotated IoU—needed for aerial/text detection.
20
Real-time on CPU?
📊 medium
Answer: Choose nano/tiny backbones, reduce input size, INT8—expect large accuracy gap vs GPU server models.
YOLO Cheat Sheet
Idea
- Single forward
- Dense preds
Train
- Anchors (v2+)
- Multi-scale heads
Ship
- NMS
- TensorRT / ONNX
💡 Pro tip: One-stage = dense predictions + clever assignment + NMS.