Computer Vision Interview · 20 Essential Q&A · Updated 2026

MobileNet: 20 Essential Q&A

Depthwise + pointwise convolutions—building accurate vision models under tight FLOP and latency budgets.

~11 min read · 20 questions · Advanced
depthwise · pointwise · α width · inverted
1 What is MobileNet? 📊 medium
Answer: An efficient CNN family for mobile/edge devices, built on depthwise separable convolutions to cut FLOPs and parameters vs. standard convolutions.
2 What is depthwise convolution? 📊 medium
Answer: Each input channel has its own spatial filter—no mixing across channels; drastically fewer params than full conv per output channel.
3 What is pointwise convolution? 📊 medium
Answer: 1×1 conv after depthwise—mixes channels at each spatial location, like a per-pixel linear layer across depth.
# Depthwise: groups=in_channels; Pointwise: 1x1 conv
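To make the depthwise/pointwise split concrete, a minimal NumPy sketch (helper names are illustrative, not a library API): depthwise filters each channel independently with its own k×k kernel, then pointwise mixes channels with a 1×1 weight matrix.

```python
import numpy as np

def depthwise_conv(x, filters):
    """Depthwise conv: one k×k filter per input channel, no channel mixing.
    x: (C, H, W), filters: (C, k, k). 'Valid' padding, stride 1."""
    C, H, W = x.shape
    _, k, _ = filters.shape
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):                      # each channel filtered independently
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * filters[c])
    return out

def pointwise_conv(x, weights):
    """Pointwise (1×1) conv: mixes channels at every spatial location.
    x: (C_in, H, W), weights: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', weights, x)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
dw = depthwise_conv(x, rng.standard_normal((8, 3, 3)))   # shape (8, 4, 4)
pw = pointwise_conv(dw, rng.standard_normal((16, 8)))    # shape (16, 4, 4)
```

In a framework like PyTorch, the depthwise step corresponds to a conv with `groups=in_channels`, as the comment above notes.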
4 Complexity vs standard conv? 🔥 hard
Answer: Cost ratio ≈ 1/C_out + 1/k² relative to a standard k×k conv—for k=3 and large C_out that is roughly 8–9× fewer MACs; savings grow with kernel size and channel count.
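The ratio falls out of counting multiply–accumulates per layer (a sketch; function names are illustrative):

```python
def standard_macs(k, cin, cout, h, w):
    # standard k×k conv: every output channel sees every input channel
    return k * k * cin * cout * h * w

def dw_separable_macs(k, cin, cout, h, w):
    # depthwise term + pointwise (1×1) term
    return k * k * cin * h * w + cin * cout * h * w

k, cin, cout, h, w = 3, 64, 128, 56, 56
ratio = dw_separable_macs(k, cin, cout, h, w) / standard_macs(k, cin, cout, h, w)
# ratio == 1/cout + 1/k² ≈ 0.119, i.e. ~8.4× fewer MACs
```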
5 Width multiplier α? ⚡ easy
Answer: Uniformly thin every layer’s channels by α ∈ (0,1]—linear accuracy–latency tradeoff for deployment targets.
6 Resolution multiplier? 📊 medium
Answer: Train/infer on smaller input resolution ρ—quadratic FLOP savings with predictable accuracy drop.
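Both multipliers can be checked against the same MAC count (illustrative helper; α is applied naively here, whereas real implementations round channel counts to hardware-friendly multiples):

```python
def layer_macs(k, cin, cout, h, w):
    # MACs of one depthwise-separable layer: depthwise + pointwise terms
    return k * k * cin * h * w + cin * cout * h * w

base = layer_macs(3, 64, 128, 56, 56)

# width multiplier α thins channels -> roughly α² MAC scaling
alpha = 0.5
thin = layer_macs(3, int(alpha * 64), int(alpha * 128), 56, 56)

# resolution multiplier ρ shrinks the feature map -> exactly ρ² scaling
rho = 0.5
small = layer_macs(3, 64, 128, int(rho * 56), int(rho * 56))
```

With α = ρ = 0.5, `small/base` is exactly 0.25, while `thin/base` is slightly above 0.25 because the depthwise term scales only linearly in α.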
7 MobileNetV2 inverted residual? 🔥 hard
Answer: Expand low-dim bottleneck → depthwise → linear projection back down—the shortcut connects the thin bottlenecks (memory efficient), inverting the classical residual's wide→narrow→wide pattern.
8 Why expansion t? 📊 medium
Answer: Depthwise needs rich features—expand channels by factor t before DW conv, then linear 1×1 compress to avoid ReLU killing info in low-dim subspace.
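A parameter-count sketch of the V2 block (illustrative helper; BatchNorm and bias parameters omitted):

```python
def inverted_residual_params(cin, cout, t, k=3):
    """Weights in one MobileNetV2 inverted-residual block:
    1×1 expand -> k×k depthwise -> 1×1 linear project."""
    hidden = t * cin
    expand  = cin * hidden     # 1×1 expansion to t·cin channels
    dw      = k * k * hidden   # one k×k filter per hidden channel
    project = hidden * cout    # linear 1×1 projection (no ReLU)
    return expand + dw + project

# e.g. cin=24, cout=24, t=6 -> hidden=144
n = inverted_residual_params(24, 24, 6)  # 3456 + 1296 + 3456 = 8208
```

Note how the depthwise term stays tiny even at t=6—almost all parameters live in the two 1×1 convs.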
9 ReLU6? 📊 medium
Answer: Clip ReLU at 6, i.e. min(max(0, x), 6)—the bounded range was originally claimed helpful for fixed-point/quantized deployment; still seen in some mobile architectures.
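ReLU6 in one line (plain-Python scalar version):

```python
def relu6(x):
    # clip the usual ReLU at 6 so activations stay in a fixed [0, 6] range
    return min(max(0.0, x), 6.0)
```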
10 MobileNet + SSD? 📊 medium
Answer: Lightweight object detectors attach SSD heads to MobileNet stages—real-time on phones with acceptable mAP on constrained devices.
11 vs ShuffleNet? 🔥 hard
Answer: ShuffleNet uses channel shuffle after grouped convs—different structural trick; both target efficient inference.
12 vs EfficientNet? 📊 medium
Answer: EfficientNet scales depth/width/resolution together (compound scaling)—often better Pareto frontier; MobileNet simpler family widely supported in runtimes.
13 MobileNetV3? 🔥 hard
Answer: Uses NAS + NetAdapt for layer choices, h-swish activations in some layers, and squeeze-and-excitation blocks—improved accuracy per FLOP over V2.
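h-swish is built from ReLU6, making it cheap and quantization-friendly (scalar sketch):

```python
def relu6(x):
    return min(max(0.0, x), 6.0)

def h_swish(x):
    # piecewise-linear "hard" approximation of swish: x · sigmoid(x)
    return x * relu6(x + 3.0) / 6.0
```

For x ≥ 3 it reduces to the identity, and for x ≤ −3 it is exactly zero.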
14 Squeeze-and-excitation? 📊 medium
Answer: Global pool → small FC → channel gates—recalibrates channel importance; appears in MobileNetV3 and many efficient nets.
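An SE gate in NumPy (a sketch: weight shapes are illustrative, and it uses a plain sigmoid where MobileNetV3's variant uses a hard-sigmoid):

```python
import numpy as np

def se_gate(x, w1, w2):
    """Squeeze-and-excitation: global pool -> bottleneck MLP -> channel gates.
    x: (C, H, W), w1: (C//r, C), w2: (C, C//r) with reduction ratio r."""
    s = x.mean(axis=(1, 2))                # squeeze: global average pool -> (C,)
    z = np.maximum(0.0, w1 @ s)            # excite: reduce + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # per-channel gates in (0, 1)
    return x * g[:, None, None]            # recalibrate channel responses

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4, 4))
y = se_gate(x, rng.standard_normal((4, 16)), rng.standard_normal((16, 4)))
```

Because every gate lies in (0, 1), the block can only attenuate channels, never amplify them.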
15 Quantization? ⚡ easy
Answer: Depthwise-heavy nets often deployed as INT8—fewer MACs and memory; validate accuracy after PTQ/QAT.
16 Strided depthwise? 📊 medium
Answer: Depthwise conv with stride 2 downsamples spatially—paired with pointwise for channel mix; replaces pooling in many blocks.
17 Pointwise = ? ⚡ easy
Answer: Standard conv with 1×1 kernel—channel mixing only, no spatial context.
18 Transfer to tasks? 📊 medium
Answer: ImageNet-pretrained MobileNet backbones fine-tune for classification, detection, segmentation with small heads—standard on edge.
19 Accuracy ceiling? 📊 medium
Answer: Extreme width/resolution cuts hurt top-1 on hard datasets—need larger efficient families or distillation from big teacher.
20 Deployment? ⚡ easy
Answer: Use vendor runtimes (Core ML, NNAPI, TensorRT) with fused DW+PW kernels; profile latency, not just FLOPs.

MobileNet Cheat Sheet

Core
  • DW + PW
Scale
  • α width
  • ρ resolution
V2
  • Inverted residual
  • Linear bottleneck

💡 Pro tip: Depthwise per channel, pointwise mixes channels—know the cost win.

Full tutorial track

Go deeper with the matching tutorial chapter and code examples.