
ResNet: 20 Essential Q&A

Layers learn residual functions F(x), recombined as y = F(x) + x via skip connections, so networks can go very deep without the degradation problem dominating training.

Tags: residual, skip, bottleneck, identity
1 What is ResNet? ⚡ easy
Answer: Deep CNN where layers learn residual functions F(x) with skip connections adding input x—enables training very deep networks (50–1000+ layers).
2 What is the degradation problem? 🔥 hard
Answer: As depth increases, training error can get worse even without overfitting—not vanishing gradients alone; harder optimization for plain deep stacks.
3 Basic residual block? 📊 medium
Answer: y = F(x, {W_i}) + x where F is usually two 3×3 convs + BN + ReLU—output same spatial size as x for identity add.
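A minimal PyTorch sketch of the basic block (the class name BasicBlock mirrors torchvision's convention; shapes are illustrative):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: two 3x3 convs; identity add needs matching shapes."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # y = F(x) + x, ReLU after the add (v1)

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```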
4 Identity shortcut? ⚡ easy
Answer: Skip connection adds x directly when dimensions match—if channels/stride differ, use 1×1 conv projection on shortcut to match shape.
5 Why learn residual F? 🔥 hard
Answer: If optimal mapping is close to identity, easier to learn small perturbation F than full mapping; empirically eases optimization of deep nets.
6 When projection shortcut? 📊 medium
Answer: When block changes spatial size (stride 2) or channel count—1×1 conv on x with stride aligns dimensions for addition.
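A sketch of a downsampling block with a 1×1 projection shortcut (class name and channel counts are illustrative):

```python
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    """Block that halves spatial size and changes channels; the 1x1
    projection on the shortcut matches shape for the addition."""
    def __init__(self, c_in, c_out, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        # Projection shortcut: 1x1 conv with the same stride as the main path.
        self.proj = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.proj(x))

x = torch.randn(1, 64, 56, 56)
print(DownsampleBlock(64, 128)(x).shape)  # torch.Size([1, 128, 28, 28])
```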
7 Bottleneck block? 📊 medium
Answer: 1×1 reduce channels → 3×3 spatial conv → 1×1 expand—cuts FLOPs for deep models (ResNet-50+).
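Bottleneck sketch with the standard 4× expansion (here 256 → 64 → 256; the mid width is illustrative):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck: 1x1 reduce -> 3x3 spatial -> 1x1 expand (expansion 4)."""
    def __init__(self, channels, mid):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + x)

x = torch.randn(1, 256, 56, 56)
print(Bottleneck(256, 64)(x).shape)  # torch.Size([1, 256, 56, 56])
```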
8 Common depths? ⚡ easy
Answer: ResNet-18/34 use basic blocks; 50/101/152 use bottleneck—standard backbones for detection/segmentation.
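The stage configurations below follow the original paper; the depth arithmetic (convs per block × blocks, plus stem conv and final FC) recovers the model names:

```python
# Blocks per stage (conv2_x..conv5_x) from the original ResNet paper.
RESNET_STAGES = {
    "resnet18":  ("basic",      [2, 2, 2, 2]),
    "resnet34":  ("basic",      [3, 4, 6, 3]),
    "resnet50":  ("bottleneck", [3, 4, 6, 3]),
    "resnet101": ("bottleneck", [3, 4, 23, 3]),
    "resnet152": ("bottleneck", [3, 8, 36, 3]),
}

def depth(name):
    kind, stages = RESNET_STAGES[name]
    per_block = 2 if kind == "basic" else 3  # convs per block
    return sum(stages) * per_block + 2       # + stem conv + final FC

for name in RESNET_STAGES:
    print(name, depth(name))  # 18, 34, 50, 101, 152
```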
9 Relation to vanishing gradients? 📊 medium
Answer: Shortcuts provide gradient highways—identity path carries gradients deeper; complements BN and good init.
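A toy autograd check of the "gradient highway" claim: with y = f(x) + x, the identity term contributes a gradient of exactly 1 no matter how weak f is:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
f = lambda t: 1e-6 * t   # a "weak" residual branch
y = f(x) + x             # dy/dx = f'(x) + 1
y.backward()
print(x.grad)            # ~1.000001: identity path keeps the gradient alive
```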
10 BN ordering? 📊 medium
Answer: Original (v1): conv → BN → ReLU inside F, with ReLU applied after the addition (post-activation); ResNet v2 switches to pre-activation (BN → ReLU → conv) with a clean identity path. Interviews often accept the conv-BN-ReLU block.
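Side-by-side sketch of the two orderings (residual branches only; shortcut and add omitted):

```python
import torch
import torch.nn as nn

def post_act_branch(c):  # v1: conv -> BN -> ReLU (ReLU also after the add)
    return nn.Sequential(
        nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c),
        nn.ReLU(inplace=True),
        nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c),
    )

def pre_act_branch(c):   # v2: BN -> ReLU -> conv (nothing after the add)
    return nn.Sequential(
        nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        nn.Conv2d(c, c, 3, padding=1, bias=False),
        nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        nn.Conv2d(c, c, 3, padding=1, bias=False),
    )

x = torch.randn(1, 64, 56, 56)
print(post_act_branch(64)(x).shape, pre_act_branch(64)(x).shape)
```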
11 Initialization? ⚡ easy
Answer: He (Kaiming) initialization for conv layers, designed for ReLU nonlinearities; standard in ResNet training recipes.
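A sketch of the usual init in PyTorch; mode="fan_out" matches torchvision's ResNet implementation:

```python
import torch.nn as nn

def init_weights(m):
    # He (Kaiming) init for convs, matched to ReLU; BN starts at identity.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

stem = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
)
stem.apply(init_weights)
```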
12 ResNeXt? 🔥 hard
Answer: Splits channels into cardinality groups inside block—trade width vs depth; improves accuracy with similar FLOPs.
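Sketch of a ResNeXt-style branch, assuming the 32×4d configuration (width 128, cardinality 32); the groups argument does the channel split:

```python
import torch
import torch.nn as nn

# The 3x3 conv is grouped (cardinality = 32): 32 parallel paths over
# channel slices, aggregated by the 1x1 expand.
def resnext_branch(c_in=256, width=128, cardinality=32):
    return nn.Sequential(
        nn.Conv2d(c_in, width, 1, bias=False),
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, 3, padding=1, groups=cardinality, bias=False),
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, c_in, 1, bias=False), nn.BatchNorm2d(c_in),
    )

x = torch.randn(1, 256, 56, 56)
print(resnext_branch()(x).shape)  # torch.Size([1, 256, 56, 56])
```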
13 ResNet in detection? 📊 medium
Answer: Common backbone in Faster R-CNN and RetinaNet, usually with an FPN; multi-scale stage features (C2–C5, or just C4/C5 in older heads) feed the detection heads.
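One way to grab those stage features with torchvision, assuming a recent version that has the create_feature_extractor and weights APIs:

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Pull C2-C5 stage outputs from a ResNet-50 backbone, the inputs an
# FPN would consume in a detector.
backbone = resnet50(weights=None)
extractor = create_feature_extractor(
    backbone,
    return_nodes={"layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"},
)
feats = extractor(torch.randn(1, 3, 224, 224))
for name, t in feats.items():
    print(name, tuple(t.shape))  # c2 (1, 256, 56, 56) ... c5 (1, 2048, 7, 7)
```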
14 In segmentation? 📊 medium
Answer: Encoder backbone (e.g. ResNet-50) + decoder (U-Net style, ASPP)—pretrained ImageNet weights standard.
15 vs VGG? 📊 medium
Answer: ResNet-50 beats much larger VGG-16/19 on ImageNet with far fewer parameters (~25.6M vs ~138M) and fewer FLOPs; bottleneck blocks and global average pooling (instead of huge FC layers) drive the efficiency.
16 Training recipe? 📊 medium
Answer: SGD with momentum 0.9, step LR decay (÷10 every ~30 epochs), weight decay 1e-4, ~90 epochs on ImageNet; augmentation (random crop, horizontal flip) similar to prior CNNs.
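A sketch of that recipe in PyTorch; the 90-epoch, step-every-30 schedule is the common reproduction, not the paper's exact iteration count:

```python
import torch
from torchvision.models import resnet50

model = resnet50(weights=None)
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)
# LR /10 every 30 epochs: 0.1 -> 0.01 -> 0.001.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one pass over ImageNet with random-crop + flip augmentation ...
    scheduler.step()
```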
17 vs DenseNet? 🔥 hard
Answer: DenseNet concatenates all previous features—different memory/compute tradeoff; ResNet adds single skip per block.
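The tradeoff in two lines: addition keeps the channel count fixed, concatenation grows it:

```python
import torch

x = torch.randn(1, 64, 56, 56)
f = torch.randn(1, 64, 56, 56)    # stand-in for a block's output F(x)

res = f + x                       # ResNet: add, channels stay at 64
dense = torch.cat([x, f], dim=1)  # DenseNet: concat, channels grow to 128
print(res.shape, dense.shape)
```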
18 Write the equation. ⚡ easy
Answer: y = F(x, {W_i}) + x, usually with ReLU after the add, y = σ(F(x) + x), in v1; pre-activation variants drop the post-add ReLU. The core idea is the additive skip.
19 Zero-init last BN? 🔥 hard
Answer: Some training refinements initialize last BN in residual branch to zero so block starts near identity—stabilizes early training.
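A sketch of the trick: zero the scale (gamma) of each residual branch's final BN so F(x) ≈ 0 at the start; the attribute name bn2 is illustrative and depends on your block definition:

```python
import torch.nn as nn

def zero_init_last_bn(model, last_bn_attr="bn2"):
    # Apply after the standard init: the block then starts near identity.
    for m in model.modules():
        bn = getattr(m, last_bn_attr, None)
        if isinstance(bn, nn.BatchNorm2d):
            nn.init.zeros_(bn.weight)

class Block(nn.Module):  # dummy block just to demo the helper
    def __init__(self):
        super().__init__()
        self.bn2 = nn.BatchNorm2d(64)

m = Block()
zero_init_last_bn(m)
print(m.bn2.weight.sum())  # tensor(0., grad_fn=...)
```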
20 Still used? ⚡ easy
Answer: Yes—strong baseline; ConvNeXt, ViT compete on benchmarks but ResNet remains default for robustness and tooling.

ResNet Cheat Sheet

Idea
  • y = F(x)+x
Deep
  • Bottleneck
  • 50/101/152
Shape
  • 1×1 proj skip

💡 Pro tip: Residuals fix optimization of depth, not just gradients.
