RetinaNet: 20 Essential Q&A
Focal loss and feature pyramids for dense classification without drowning in easy negatives.
~11 min read
20 questions
Advanced
focal loss · FPN · imbalance · one-stage
1
What is RetinaNet?
📊 medium
Answer: One-stage detector with FPN backbone and focal loss on dense classification—closes accuracy gap to two-stage without proposals.
2
Focal loss intuition?
🔥 hard
Answer: Down-weights easy negatives (well-classified background) so training focuses on hard examples; this keeps the summed CE loss of the many easy backgrounds from overwhelming the gradient.
# FL(p_t) = -α * (1 - p_t)**γ * log(p_t)    # p_t = predicted probability of the ground-truth class
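A minimal NumPy sketch of binary focal loss, following the paper's α_t convention (illustrative, not the reference implementation):

import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    # p: predicted probability of the positive class, y: 0/1 ground-truth label
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)              # prob assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # paper's alpha-balancing
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

print(focal_loss(np.array([0.05]), np.array([0])))  # easy negative: ~1e-4, nearly ignored
print(focal_loss(np.array([0.05]), np.array([1])))  # hard positive: large loss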
3
Role of γ (gamma)?
🔥 hard
Answer: Focusing parameter: (1 − p_t)^γ shrinks the loss for high-confidence correct predictions; γ = 0 recovers plain CE, and γ = 2 is the typical setting.
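A quick check of the modulating factor for a well-classified example (p_t = 0.9), assuming nothing beyond the formula above:

for gamma in [0, 0.5, 1, 2, 5]:
    print(gamma, (1 - 0.9) ** gamma)  # gamma=2 scales the CE loss down ~100x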
4
Why imbalance in one-stage?
📊 medium
Answer: ~100k anchors per image with few positives—vanilla CE is dominated by easy background classifications.
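A back-of-the-envelope illustration of that dominance; the counts and probabilities here are made up for the example:

import numpy as np

n_neg, n_pos = 100_000, 100                     # anchors: overwhelmingly easy background
ce_easy_neg = -np.log(0.99)                     # CE of one confident negative
ce_hard_pos = -np.log(0.3)                      # CE of one hard positive
print(n_neg * ce_easy_neg, n_pos * ce_hard_pos) # ~1005 vs ~120: negatives dominate
print(n_neg * (1 - 0.99) ** 2 * ce_easy_neg)    # focal (gamma=2): easy-negative total ~0.1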
5
How does FPN help RetinaNet?
📊 medium
Answer: Predicts at multiple pyramid levels P3–P7 with shared heads—each level responsible for objects in a scale range.
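Strides and base anchor sizes per level as configured in the paper (anchor areas 32² on P3 up to 512² on P7):

fpn_levels = {
    "P3": {"stride": 8,   "anchor_base": 32},
    "P4": {"stride": 16,  "anchor_base": 64},
    "P5": {"stride": 32,  "anchor_base": 128},
    "P6": {"stride": 64,  "anchor_base": 256},
    "P7": {"stride": 128, "anchor_base": 512},
}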
6
Subnet design?
📊 medium
Answer: Two separate small conv subnets per level, one for classification and one for box regression; four 3×3 conv layers each in the original paper, with weights shared across pyramid levels.
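A hedged PyTorch sketch of one subnet (classification shown; the box head is identical except it ends in num_anchors * 4 channels):

import torch.nn as nn

def retinanet_subnet(in_ch=256, num_anchors=9, num_classes=80):
    layers = []
    for _ in range(4):                            # 4 conv blocks, per the paper
        layers += [nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU()]
    layers.append(nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1))
    return nn.Sequential(*layers)                 # applied at every level P3-P7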
7
Anchors?
⚡ easy
Answer: Similar to the RPN: multiple scales/ratios per location (9 in the paper); the classification head predicts per-class sigmoid scores and the regression head predicts box deltas.
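A sketch of per-location anchor shapes, 3 aspect ratios × 3 scale octaves = 9, matching the paper's configuration:

import numpy as np

def anchor_shapes(base_size=32):
    shapes = []
    for ratio in (0.5, 1.0, 2.0):                 # h/w aspect ratios
        for octave in (0, 1, 2):                  # scales 2**0, 2**(1/3), 2**(2/3)
            area = (base_size * 2 ** (octave / 3)) ** 2
            w = np.sqrt(area / ratio)
            shapes.append((w, w * ratio))         # (width, height), w*h = area
    return shapes

print(len(anchor_shapes()))  # 9 anchors per spatial location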
8
Box regression loss?
⚡ easy
Answer: Smooth L1 on positive anchors only—standard in Faster R-CNN lineage.
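A minimal sketch of smooth L1 on regression residuals (beta marks the quadratic-to-linear transition, commonly 1.0 in this lineage):

import numpy as np

def smooth_l1(x, beta=1.0):
    ax = np.abs(x)                                # residual magnitude
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

print(smooth_l1(np.array([0.2, 3.0])))  # quadratic near 0, linear for outliers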
9
vs SSD?
📊 medium
Answer: Both are multi-scale one-stage detectors; RetinaNet's focal loss directly addresses the training imbalance that SSD tackled only partly with hard negative mining.
10
vs two-stage?
📊 medium
Answer: No separate proposal stage—simpler pipeline; historically competitive mAP on COCO with proper FPN + focal loss.
11
Training tips?
📊 medium
Answer: Longer schedules help; careful anchor matching; synchronized BN on multi-GPU for large batch stability.
12
Inference cost?
⚡ easy
Answer: Single backbone forward + per-level heads + NMS—faster than two-stage but still heavier than tiny YOLO variants.
13
Anchor-free successors?
🔥 hard
Answer: FCOS, CenterNet, and DETR remove or simplify hand-designed anchors; focal-loss-style weighting still influences the classification heads of several of them.
14
Why sigmoid per class?
📊 medium
Answer: Treats classification as K independent binary classifiers, which handles rare multi-label cases and avoids softmax's mutual-exclusivity assumption; background is simply all scores low.
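A small contrast of the two parameterizations on toy logits:

import numpy as np

logits = np.array([2.0, -1.0, 0.5])
sigmoid = 1 / (1 + np.exp(-logits))               # K independent probabilities
softmax = np.exp(logits) / np.exp(logits).sum()   # forced to sum to 1
print(sigmoid, sigmoid.sum())   # need not sum to 1; background = all scores low
print(softmax, softmax.sum())   # mutually exclusive classes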
15
Unified loss?
⚡ easy
Answer: Sum of the focal classification loss (over all anchors) and smooth L1 regression (on positive anchors only), normalized by the number of assigned anchors.
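Putting the pieces together, reusing the focal_loss and smooth_l1 sketches above; normalization by positive-anchor count follows the paper, and the names are otherwise illustrative:

def detection_loss(cls_p, cls_y, box_pred, box_tgt, pos_mask):
    n_pos = max(pos_mask.sum(), 1)                 # avoid divide-by-zero
    cls = focal_loss(cls_p, cls_y).sum() / n_pos   # classification: all anchors
    reg = smooth_l1(box_pred[pos_mask] - box_tgt[pos_mask]).sum() / n_pos
    return cls + reg                               # regression: positives only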
16
Variants of focal loss?
🔥 hard
Answer: Quality focal loss, balanced variants, and GHM: each re-weights hard vs. easy examples with a different scheme.
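As one example, a sketch of quality focal loss from the Generalized Focal Loss paper, where the target y is a soft IoU-based quality score in [0, 1] rather than a hard label:

import numpy as np

def quality_focal_loss(sigma, y, beta=2.0, eps=1e-7):
    sigma = np.clip(sigma, eps, 1 - eps)
    ce = -((1 - y) * np.log(1 - sigma) + y * np.log(sigma))  # soft-target CE
    return np.abs(y - sigma) ** beta * ce                    # focal-style modulation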
17
IoU-aware classification?
🔥 hard
Answer: Some heads predict joint IoU quality with class to better rank detections—post-RetinaNet refinement.
18
Historical COCO note?
⚡ easy
Answer: RetinaNet showed one-stage could match two-stage mAP around 2017—important milestone before transformer detectors.
19
Limitations?
📊 medium
Answer: Many hyperparameters (α, γ, anchor design); dense preds still need NMS; superseded in some tracks by newer architectures.
20
When reuse focal loss?
⚡ easy
Answer: Any extreme class imbalance in dense prediction—segmentation, keypoint heatmaps, or custom detectors.
RetinaNet Cheat Sheet
Loss
- FL = −α(1−p_t)^γ log p_t
- Focus on hard examples.
Backbone
- FPN P3–P7
Type
- One-stage dense
💡 Pro tip: Focal loss fights easy-negative dominance in dense classification.