
ResNet: 20 Essential Q&A

Layers learn residual functions F(x), recombined as y = F(x) + x via skip connections, so networks can go very deep without the degradation problem dominating training.

Tags: residual, skip, bottleneck, identity
1 What is ResNet? ⚡ easy
Answer: Deep CNN where layers learn residual functions F(x) with skip connections adding input x—enables training very deep networks (50–1000+ layers).
2 What is the degradation problem? 🔥 hard
Answer: As depth increases, training error can get worse even without overfitting—not vanishing gradients alone; harder optimization for plain deep stacks.
3 Basic residual block? 📊 medium
Answer: y = F(x, {W_i}) + x where F is usually two 3×3 convs + BN + ReLU—output same spatial size as x for identity add.
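A minimal PyTorch sketch of the basic block (the class name BasicBlock mirrors torchvision's convention; shapes are illustrative):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: two 3x3 convs; identity add needs matching shapes."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # y = F(x) + x, ReLU after the add (v1)

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```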
4 Identity shortcut? ⚡ easy
Answer: Skip connection adds x directly when dimensions match—if channels/stride differ, use 1×1 conv projection on shortcut to match shape.
5 Why learn residual F? 🔥 hard
Answer: If optimal mapping is close to identity, easier to learn small perturbation F than full mapping; empirically eases optimization of deep nets.
6 When projection shortcut? 📊 medium
Answer: When block changes spatial size (stride 2) or channel count—1×1 conv on x with stride aligns dimensions for addition.
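A sketch of a downsampling block with a 1×1 projection shortcut (class name and channel counts are illustrative):

```python
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    """Block that halves spatial size and changes channels; the 1x1
    projection on the shortcut matches shape for the addition."""
    def __init__(self, c_in, c_out, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        # Projection shortcut: 1x1 conv with the same stride as the main path.
        self.proj = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.proj(x))

x = torch.randn(1, 64, 56, 56)
print(DownsampleBlock(64, 128)(x).shape)  # torch.Size([1, 128, 28, 28])
```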
7 Bottleneck block? 📊 medium
Answer: 1×1 reduce channels → 3×3 spatial conv → 1×1 expand—cuts FLOPs for deep models (ResNet-50+).
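Bottleneck sketch with the standard 4× expansion (here 256 → 64 → 256; the mid width is illustrative):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck: 1x1 reduce -> 3x3 spatial -> 1x1 expand (expansion 4)."""
    def __init__(self, channels, mid):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + x)

x = torch.randn(1, 256, 56, 56)
print(Bottleneck(256, 64)(x).shape)  # torch.Size([1, 256, 56, 56])
```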
8 Common depths? ⚡ easy
Answer: ResNet-18/34 use basic blocks; 50/101/152 use bottleneck—standard backbones for detection/segmentation.
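The stage configurations below follow the original paper; the depth arithmetic (convs per block × blocks, plus stem conv and final FC) recovers the model names:

```python
# Blocks per stage (conv2_x..conv5_x) from the original ResNet paper.
RESNET_STAGES = {
    "resnet18":  ("basic",      [2, 2, 2, 2]),
    "resnet34":  ("basic",      [3, 4, 6, 3]),
    "resnet50":  ("bottleneck", [3, 4, 6, 3]),
    "resnet101": ("bottleneck", [3, 4, 23, 3]),
    "resnet152": ("bottleneck", [3, 8, 36, 3]),
}

def depth(name):
    kind, stages = RESNET_STAGES[name]
    per_block = 2 if kind == "basic" else 3  # convs per block
    return sum(stages) * per_block + 2       # + stem conv + final FC

for name in RESNET_STAGES:
    print(name, depth(name))  # 18, 34, 50, 101, 152
```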
9 Relation to vanishing gradients? 📊 medium
Answer: Shortcuts provide gradient highways—identity path carries gradients deeper; complements BN and good init.
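A toy autograd check of the "gradient highway" claim: with y = f(x) + x, the identity term contributes a gradient of exactly 1 no matter how weak f is:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
f = lambda t: 1e-6 * t   # a "weak" residual branch
y = f(x) + x             # dy/dx = f'(x) + 1
y.backward()
print(x.grad)            # ~1.000001: identity path keeps the gradient alive
```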
10 BN ordering? 📊 medium
Answer: Original (v1): conv → BN → ReLU inside F, with ReLU applied after the addition (post-activation); ResNet v2 switches to pre-activation (BN → ReLU → conv) with a clean identity path. Interviews often accept the conv-BN-ReLU block.
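Side-by-side sketch of the two orderings (residual branches only; shortcut and add omitted):

```python
import torch
import torch.nn as nn

def post_act_branch(c):  # v1: conv -> BN -> ReLU (ReLU also after the add)
    return nn.Sequential(
        nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c),
        nn.ReLU(inplace=True),
        nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c),
    )

def pre_act_branch(c):   # v2: BN -> ReLU -> conv (nothing after the add)
    return nn.Sequential(
        nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        nn.Conv2d(c, c, 3, padding=1, bias=False),
        nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        nn.Conv2d(c, c, 3, padding=1, bias=False),
    )

x = torch.randn(1, 64, 56, 56)
print(post_act_branch(64)(x).shape, pre_act_branch(64)(x).shape)
```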
11 Initialization? ⚡ easy
Answer: He (Kaiming) initialization for conv layers, designed for ReLU nonlinearities; standard in ResNet training recipes.
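A sketch of the usual init in PyTorch; mode="fan_out" matches torchvision's ResNet implementation:

```python
import torch.nn as nn

def init_weights(m):
    # He (Kaiming) init for convs, matched to ReLU; BN starts at identity.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

stem = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
)
stem.apply(init_weights)
```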
12 ResNeXt? 🔥 hard
Answer: Splits channels into cardinality groups inside block—trade width vs depth; improves accuracy with similar FLOPs.
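Sketch of a ResNeXt-style branch, assuming the 32×4d configuration (width 128, cardinality 32); the groups argument does the channel split:

```python
import torch
import torch.nn as nn

# The 3x3 conv is grouped (cardinality = 32): 32 parallel paths over
# channel slices, aggregated by the 1x1 expand.
def resnext_branch(c_in=256, width=128, cardinality=32):
    return nn.Sequential(
        nn.Conv2d(c_in, width, 1, bias=False),
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, 3, padding=1, groups=cardinality, bias=False),
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, c_in, 1, bias=False), nn.BatchNorm2d(c_in),
    )

x = torch.randn(1, 256, 56, 56)
print(resnext_branch()(x).shape)  # torch.Size([1, 256, 56, 56])
```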
13 ResNet in detection? 📊 medium
Answer: Common backbone in Faster R-CNN and RetinaNet, usually with an FPN; multi-scale stage features (C2–C5, or just C4/C5 in older heads) feed the detection heads.
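One way to grab those stage features with torchvision, assuming a recent version that has the create_feature_extractor and weights APIs:

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Pull C2-C5 stage outputs from a ResNet-50 backbone, the inputs an
# FPN would consume in a detector.
backbone = resnet50(weights=None)
extractor = create_feature_extractor(
    backbone,
    return_nodes={"layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"},
)
feats = extractor(torch.randn(1, 3, 224, 224))
for name, t in feats.items():
    print(name, tuple(t.shape))  # c2 (1, 256, 56, 56) ... c5 (1, 2048, 7, 7)
```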
14 In segmentation? 📊 medium
Answer: Encoder backbone (e.g. ResNet-50) + decoder (U-Net style, ASPP)—pretrained ImageNet weights standard.
15 vs VGG? 📊 medium
Answer: ResNet-50 beats much larger VGG-16/19 on ImageNet with far fewer parameters (~25.6M vs ~138M) and fewer FLOPs; bottleneck blocks and global average pooling (instead of huge FC layers) drive the efficiency.
16 Training recipe? 📊 medium
Answer: SGD with momentum 0.9, step LR decay (÷10 every ~30 epochs), weight decay 1e-4, ~90 epochs on ImageNet; augmentation (random crop, horizontal flip) similar to prior CNNs.
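A sketch of that recipe in PyTorch; the 90-epoch, step-every-30 schedule is the common reproduction, not the paper's exact iteration count:

```python
import torch
from torchvision.models import resnet50

model = resnet50(weights=None)
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)
# LR /10 every 30 epochs: 0.1 -> 0.01 -> 0.001.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one pass over ImageNet with random-crop + flip augmentation ...
    scheduler.step()
```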
17 vs DenseNet? 🔥 hard
Answer: DenseNet concatenates all previous features—different memory/compute tradeoff; ResNet adds single skip per block.
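The tradeoff in two lines: addition keeps the channel count fixed, concatenation grows it:

```python
import torch

x = torch.randn(1, 64, 56, 56)
f = torch.randn(1, 64, 56, 56)    # stand-in for a block's output F(x)

res = f + x                       # ResNet: add, channels stay at 64
dense = torch.cat([x, f], dim=1)  # DenseNet: concat, channels grow to 128
print(res.shape, dense.shape)
```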
18 Write the equation. ⚡ easy
Answer: y = F(x, {W_i}) + x, usually with ReLU after the add, y = σ(F(x) + x), in v1; pre-activation variants drop the post-add ReLU. The core idea is the additive skip.
19 Zero-init last BN? 🔥 hard
Answer: Some training refinements initialize last BN in residual branch to zero so block starts near identity—stabilizes early training.
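A sketch of the trick: zero the scale (gamma) of each residual branch's final BN so F(x) ≈ 0 at the start; the attribute name bn2 is illustrative and depends on your block definition:

```python
import torch.nn as nn

def zero_init_last_bn(model, last_bn_attr="bn2"):
    # Apply after the standard init: the block then starts near identity.
    for m in model.modules():
        bn = getattr(m, last_bn_attr, None)
        if isinstance(bn, nn.BatchNorm2d):
            nn.init.zeros_(bn.weight)

class Block(nn.Module):  # dummy block just to demo the helper
    def __init__(self):
        super().__init__()
        self.bn2 = nn.BatchNorm2d(64)

m = Block()
zero_init_last_bn(m)
print(m.bn2.weight.sum())  # tensor(0., grad_fn=...)
```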
20 Still used? ⚡ easy
Answer: Yes—strong baseline; ConvNeXt, ViT compete on benchmarks but ResNet remains default for robustness and tooling.

ResNet Cheat Sheet

Idea
  • y = F(x)+x
Deep
  • Bottleneck
  • 50/101/152
Shape
  • 1×1 proj skip

💡 Pro tip: Residuals fix optimization of depth, not just gradients.
