Computer Vision Interview · 20 Essential Q&A · Updated 2026

MobileNet: 20 Essential Q&A

Depthwise + pointwise convolutions—building accurate vision models under tight FLOP and latency budgets.

~11 min read · 20 questions · Advanced
depthwise · pointwise · α width · inverted
1 What is MobileNet? 📊 medium
Answer: An efficient CNN family for mobile/edge devices, built on depthwise separable convolutions to cut FLOPs and parameters vs. standard convolutions.
2 What is depthwise convolution? 📊 medium
Answer: Each input channel has its own spatial filter—no mixing across channels; drastically fewer params than full conv per output channel.
3 What is pointwise convolution? 📊 medium
Answer: 1×1 conv after depthwise—mixes channels at each spatial location, like a per-pixel linear layer across depth.
# Depthwise: groups=in_channels; Pointwise: 1x1 conv
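To make the depthwise/pointwise split concrete, a minimal NumPy sketch (helper names are illustrative, not a library API): depthwise filters each channel independently with its own k×k kernel, then pointwise mixes channels with a 1×1 weight matrix.

```python
import numpy as np

def depthwise_conv(x, filters):
    """Depthwise conv: one k×k filter per input channel, no channel mixing.
    x: (C, H, W), filters: (C, k, k). 'Valid' padding, stride 1."""
    C, H, W = x.shape
    _, k, _ = filters.shape
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):                      # each channel filtered independently
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * filters[c])
    return out

def pointwise_conv(x, weights):
    """Pointwise (1×1) conv: mixes channels at every spatial location.
    x: (C_in, H, W), weights: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', weights, x)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
dw = depthwise_conv(x, rng.standard_normal((8, 3, 3)))   # shape (8, 4, 4)
pw = pointwise_conv(dw, rng.standard_normal((16, 8)))    # shape (16, 4, 4)
```

In a framework like PyTorch, the depthwise step corresponds to a conv with `groups=in_channels`, as the comment above notes.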
4 Complexity vs standard conv? 🔥 hard
Answer: Cost ratio ≈ 1/C_out + 1/k² relative to a standard k×k conv—for k=3 and large C_out that is roughly 8–9× fewer MACs; savings grow with kernel size and channel count.
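The ratio falls out of counting multiply–accumulates per layer (a sketch; function names are illustrative):

```python
def standard_macs(k, cin, cout, h, w):
    # standard k×k conv: every output channel sees every input channel
    return k * k * cin * cout * h * w

def dw_separable_macs(k, cin, cout, h, w):
    # depthwise term + pointwise (1×1) term
    return k * k * cin * h * w + cin * cout * h * w

k, cin, cout, h, w = 3, 64, 128, 56, 56
ratio = dw_separable_macs(k, cin, cout, h, w) / standard_macs(k, cin, cout, h, w)
# ratio == 1/cout + 1/k² ≈ 0.119, i.e. ~8.4× fewer MACs
```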
5 Width multiplier α? ⚡ easy
Answer: Uniformly thin every layer’s channels by α ∈ (0,1]—linear accuracy–latency tradeoff for deployment targets.
6 Resolution multiplier? 📊 medium
Answer: Train/infer on smaller input resolution ρ—quadratic FLOP savings with predictable accuracy drop.
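Both multipliers can be checked against the same MAC count (illustrative helper; α is applied naively here, whereas real implementations round channel counts to hardware-friendly multiples):

```python
def layer_macs(k, cin, cout, h, w):
    # MACs of one depthwise-separable layer: depthwise + pointwise terms
    return k * k * cin * h * w + cin * cout * h * w

base = layer_macs(3, 64, 128, 56, 56)

# width multiplier α thins channels -> roughly α² MAC scaling
alpha = 0.5
thin = layer_macs(3, int(alpha * 64), int(alpha * 128), 56, 56)

# resolution multiplier ρ shrinks the feature map -> exactly ρ² scaling
rho = 0.5
small = layer_macs(3, 64, 128, int(rho * 56), int(rho * 56))
```

With α = ρ = 0.5, `small/base` is exactly 0.25, while `thin/base` is slightly above 0.25 because the depthwise term scales only linearly in α.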
7 MobileNetV2 inverted residual? 🔥 hard
Answer: Expand low-dim bottleneck → depthwise → linear projection back down—the shortcut connects the thin bottlenecks (memory efficient), inverting the classical residual's wide→narrow→wide pattern.
8 Why expansion t? 📊 medium
Answer: Depthwise needs rich features—expand channels by factor t before DW conv, then linear 1×1 compress to avoid ReLU killing info in low-dim subspace.
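A parameter-count sketch of the V2 block (illustrative helper; BatchNorm and bias parameters omitted):

```python
def inverted_residual_params(cin, cout, t, k=3):
    """Weights in one MobileNetV2 inverted-residual block:
    1×1 expand -> k×k depthwise -> 1×1 linear project."""
    hidden = t * cin
    expand  = cin * hidden     # 1×1 expansion to t·cin channels
    dw      = k * k * hidden   # one k×k filter per hidden channel
    project = hidden * cout    # linear 1×1 projection (no ReLU)
    return expand + dw + project

# e.g. cin=24, cout=24, t=6 -> hidden=144
n = inverted_residual_params(24, 24, 6)  # 3456 + 1296 + 3456 = 8208
```

Note how the depthwise term stays tiny even at t=6—almost all parameters live in the two 1×1 convs.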
9 ReLU6? 📊 medium
Answer: Clip ReLU at 6, i.e. min(max(0, x), 6)—the bounded range was originally claimed helpful for fixed-point/quantized deployment; still seen in some mobile architectures.
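ReLU6 in one line (plain-Python scalar version):

```python
def relu6(x):
    # clip the usual ReLU at 6 so activations stay in a fixed [0, 6] range
    return min(max(0.0, x), 6.0)
```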
10 MobileNet + SSD? 📊 medium
Answer: Lightweight object detectors attach SSD heads to MobileNet stages—real-time on phones with acceptable mAP on constrained devices.
11 vs ShuffleNet? 🔥 hard
Answer: ShuffleNet uses channel shuffle after grouped convs—different structural trick; both target efficient inference.
12 vs EfficientNet? 📊 medium
Answer: EfficientNet scales depth/width/resolution together (compound scaling)—often better Pareto frontier; MobileNet simpler family widely supported in runtimes.
13 MobileNetV3? 🔥 hard
Answer: Uses NAS + NetAdapt for layer choices, h-swish activations in some layers, and squeeze-and-excitation blocks—improved accuracy per FLOP over V2.
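h-swish is built from ReLU6, making it cheap and quantization-friendly (scalar sketch):

```python
def relu6(x):
    return min(max(0.0, x), 6.0)

def h_swish(x):
    # piecewise-linear "hard" approximation of swish: x · sigmoid(x)
    return x * relu6(x + 3.0) / 6.0
```

For x ≥ 3 it reduces to the identity, and for x ≤ −3 it is exactly zero.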
14 Squeeze-and-excitation? 📊 medium
Answer: Global pool → small FC → channel gates—recalibrates channel importance; appears in MobileNetV3 and many efficient nets.
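An SE gate in NumPy (a sketch: weight shapes are illustrative, and it uses a plain sigmoid where MobileNetV3's variant uses a hard-sigmoid):

```python
import numpy as np

def se_gate(x, w1, w2):
    """Squeeze-and-excitation: global pool -> bottleneck MLP -> channel gates.
    x: (C, H, W), w1: (C//r, C), w2: (C, C//r) with reduction ratio r."""
    s = x.mean(axis=(1, 2))                # squeeze: global average pool -> (C,)
    z = np.maximum(0.0, w1 @ s)            # excite: reduce + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # per-channel gates in (0, 1)
    return x * g[:, None, None]            # recalibrate channel responses

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4, 4))
y = se_gate(x, rng.standard_normal((4, 16)), rng.standard_normal((16, 4)))
```

Because every gate lies in (0, 1), the block can only attenuate channels, never amplify them.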
15 Quantization? ⚡ easy
Answer: Depthwise-heavy nets often deployed as INT8—fewer MACs and memory; validate accuracy after PTQ/QAT.
16 Strided depthwise? 📊 medium
Answer: Depthwise conv with stride 2 downsamples spatially—paired with pointwise for channel mix; replaces pooling in many blocks.
17 Pointwise = ? ⚡ easy
Answer: Standard conv with 1×1 kernel—channel mixing only, no spatial context.
18 Transfer to tasks? 📊 medium
Answer: ImageNet-pretrained MobileNet backbones fine-tune for classification, detection, segmentation with small heads—standard on edge.
19 Accuracy ceiling? 📊 medium
Answer: Extreme width/resolution cuts hurt top-1 on hard datasets—need larger efficient families or distillation from big teacher.
20 Deployment? ⚡ easy
Answer: Use vendor runtimes (Core ML, NNAPI, TensorRT) with fused DW+PW kernels; profile latency, not just FLOPs.

MobileNet Cheat Sheet

Core
  • DW + PW
Scale
  • α width
  • ρ resolution
V2
  • Inverted residual
  • Linear bottleneck

💡 Pro tip: Depthwise per channel, pointwise mixes channels—know the cost win.

Full tutorial track

Go deeper with the matching tutorial chapter and code examples.