
Convolutional Neural Networks — 15 Interview Questions

Local receptive fields, parameter sharing, 1×1 convs, depthwise separable convs, and why CNNs beat flat MLPs on images.


Topics: convolution, pooling, receptive fields, channels.
1. What is convolution in a CNN? [Easy]
Answer: Slide a small learned filter over the input (with optional padding/stride), computing dot products at each position—local connectivity + weight sharing across space.
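A minimal sketch of that operation, assuming PyTorch (no framework is named in the card); channel counts and image size are illustrative.

```python
import torch
import torch.nn as nn

# One learned 3x3 filter bank slides over every spatial position;
# the same weights are reused everywhere (weight sharing).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
y = conv(x)
print(y.shape)                  # torch.Size([1, 16, 32, 32])
```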
2. Stride and output spatial size (1D intuition). [Medium]
Answer: Stride s subsamples outputs. Common formula (1D): out = ⌊(n + 2p − k)/s⌋ + 1 for input length n, padding p, kernel size k; in 2D the same formula applies independently per spatial axis, regardless of layout convention (e.g. NCHW).
out = ⌊(n + 2p − k) / s⌋ + 1
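The same formula in plain Python, as a hedged sketch; the helper name conv_out_len is ours, not from any library.

```python
def conv_out_len(n: int, k: int, p: int = 0, s: int = 1) -> int:
    """Output length for input n, kernel k, padding p, stride s."""
    return (n + 2 * p - k) // s + 1

assert conv_out_len(n=32, k=3, p=1, s=1) == 32   # "same" padding keeps the size
assert conv_out_len(n=32, k=3, p=1, s=2) == 16   # stride 2 halves it
```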
3. Parameter count for a conv layer. [Medium]
Answer: Per filter: k_h × k_w × C_in weights + bias (if used). With C_out filters: multiply by C_out. Far fewer than fully connecting all pixels to all hidden units.
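A quick sketch of that count, cross-checked against a PyTorch layer (PyTorch is an assumed dependency here).

```python
import torch.nn as nn

def conv_params(k_h, k_w, c_in, c_out, bias=True):
    # Per filter: k_h * k_w * c_in weights (+1 bias), times c_out filters.
    return c_out * (k_h * k_w * c_in + (1 if bias else 0))

layer = nn.Conv2d(64, 128, kernel_size=3)
assert sum(p.numel() for p in layer.parameters()) == conv_params(3, 3, 64, 128)
print(conv_params(3, 3, 64, 128))   # 73856
```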
4. What does a 1×1 convolution do? [Medium]
Answer: Mixes channels at each spatial location without blending neighbors—changes depth (bottleneck/expansion) and, when followed by an activation, adds nonlinearity cheaply (Inception, ResNet bottlenecks).
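A sketch of a 1×1 bottleneck, assuming PyTorch; the 256→64 reduction echoes ResNet bottlenecks and is used here only as an example.

```python
import torch
import torch.nn as nn

bottleneck = nn.Conv2d(256, 64, kernel_size=1)   # mixes channels per pixel
x = torch.randn(1, 256, 14, 14)
print(bottleneck(x).shape)   # torch.Size([1, 64, 14, 14]); spatial dims untouched
```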
5. Max pooling vs average pooling. [Easy]
Answer: Max: strongest local activation—sharp features. Avg: smoother downsampling—used in some network heads (e.g. Global Average Pooling). Both reduce spatial size.
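Both poolings side by side, as a minimal PyTorch-assumed sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)
print(nn.MaxPool2d(2)(x).shape)   # torch.Size([1, 8, 16, 16]); keeps strongest activations
print(nn.AvgPool2d(2)(x).shape)   # torch.Size([1, 8, 16, 16]); smooth local averages
```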
6. Global average pooling (GAP). [Medium]
Answer: Average each channel over full H×W → vector length C—replaces large FC layers, reduces parameters and overfitting (ResNet-style classifiers).
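A GAP sketch, assuming PyTorch; AdaptiveAvgPool2d(1) is one standard way to express it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 512, 7, 7)            # typical ResNet final feature map
v = nn.AdaptiveAvgPool2d(1)(x).flatten(1)
print(v.shape)                            # torch.Size([1, 512]); no large FC layer needed
```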
7. Transposed convolution (deconv)—one line. [Hard]
Answer: Upsampling with a learned kernel—used in segmentation/decoders; can create checkerboard artifacts if kernel size and stride are chosen carelessly.
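A sketch of learned 2× upsampling, assuming PyTorch; kernel 4 / stride 2 / padding 1 is a common artifact-reducing choice.

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 64, 16, 16)
print(up(x).shape)   # torch.Size([1, 32, 32, 32]); spatial size doubled
```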
8. Dilated (atrous) convolution—why? [Hard]
Answer: Spaces kernel taps with holes to increase receptive field without more parameters or losing resolution—common in segmentation (DeepLab).
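Sketch, assuming PyTorch: dilation=2 makes a 3×3 kernel cover a 5×5 window with the same nine weights, and padding=2 keeps the resolution.

```python
import torch
import torch.nn as nn

atrous = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)
x = torch.randn(1, 64, 28, 28)
print(atrous(x).shape)   # torch.Size([1, 64, 28, 28]); bigger receptive field, same size
```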
9. Depthwise separable convolution. [Medium]
Answer: Depthwise conv per channel + pointwise 1×1 to mix channels—far fewer parameters and FLOPs than a standard conv; MobileNet family.
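A parameter-count comparison as a sketch, assuming PyTorch; groups=C_in is how a depthwise conv is expressed there.

```python
import torch.nn as nn

c_in, c_out = 64, 128
standard  = nn.Conv2d(c_in, c_out, 3, padding=1)
depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)  # one filter per channel
pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)             # 1x1 channel mixing

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(depthwise) + count(pointwise))   # 73856 vs 8960, ~8x fewer
```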
10. Translation equivariance—what does a CNN get? [Medium]
Answer: Shift input → feature maps shift correspondingly (before pooling). Equivariant to translation; pooling adds approximate invariance locally.
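A sketch that checks the property numerically, assuming PyTorch; torch.roll stands in for a translation, and borders are excluded because zero padding breaks exact equivariance there.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
x = torch.randn(1, 1, 16, 16)

y1 = conv(torch.roll(x, shifts=3, dims=-1))   # shift, then convolve
y2 = torch.roll(conv(x), shifts=3, dims=-1)   # convolve, then shift
print(torch.allclose(y1[..., 4:-4], y2[..., 4:-4]))   # True away from the borders
```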
11. CNN vs MLP on images—interview answer. [Easy]
Answer: CNN exploits locality and sharing—fewer parameters, better sample efficiency; MLP ignores spatial structure and scales poorly with resolution.
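A concrete parameter comparison as a sketch (PyTorch assumed; sizes illustrative):

```python
import torch.nn as nn

mlp_layer  = nn.Linear(224 * 224 * 3, 1024)              # dense on flattened pixels
conv_layer = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # shared local filters

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"{count(mlp_layer):,} vs {count(conv_layer):,}")  # 154,141,696 vs 1,792
```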
12. Receptive field—define. [Easy]
Answer: Region of input pixels that can affect one output activation—grows with depth, kernel size, and dilation; stride and pooling make it grow faster in subsequent layers.
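The standard receptive-field recurrence as a sketch; receptive_field is our helper name, not a library function.

```python
def receptive_field(layers):
    """layers: (kernel, stride, dilation) tuples, listed input to output."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += d * (k - 1) * jump   # each layer adds taps scaled by accumulated stride
        jump *= s
    return rf

print(receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)]))   # 7: three stacked 3x3 convs
print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)]))   # 8: pooling speeds up growth
```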
13. What do deeper layers often represent? [Medium]
Answer: Hierarchical features: edges/textures in early layers → parts → object-level abstractions (an interpretation; not guaranteed per filter).
14. Data augmentation for CNNs—examples. [Easy]
Answer: Random crop/flip, color jitter, Cutout/RandAugment—reduces overfitting by simulating label-preserving transforms.
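A typical pipeline sketch, assuming torchvision; all of these are label-preserving transforms.

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),       # random crop + rescale
    transforms.RandomHorizontalFlip(),       # mirror with p=0.5
    transforms.ColorJitter(0.4, 0.4, 0.4),   # brightness/contrast/saturation jitter
    transforms.ToTensor(),
])
```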
15. Name one classic and one modern CNN family. [Easy]
Answer: Classic: VGG / ResNet. Modern: EfficientNet, ConvNeXt, or a conv/ViT hybrid—shows you know the field has moved beyond plain conv stacks.
Be ready to sketch a conv → BN → ReLU → pool block.
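A minimal PyTorch-assumed sketch of that block, in case you are asked to write it rather than draw it:

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),                    # halves H and W
)
```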

Quick review checklist

  • Conv, stride, padding; params per layer; 1×1 and depthwise sep.
  • Pooling vs strided conv; GAP; receptive field.
  • Equivariance; CNN vs MLP; augmentation.