Neural Networks: 15 Essential Q&A
Interview Prep

Network Design & Depth — 15 Interview Questions

How deep vs how wide, bottlenecks, skip connections, receptive fields, and matching architecture to data—without hand-wavy “bigger is better.”


Topics: Depth · Width · Bottleneck · Inductive bias
1. What does “network design” mean in an interview? (Easy)
Answer: Choosing depth, width, connectivity patterns (residual, dense), input/output heads, and regularization hooks so the model has enough capacity but fits data and compute.
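A minimal sketch of those knobs gathered into one config object (the names here are illustrative, not a standard API):

```python
from dataclasses import dataclass

# Hypothetical design spec: the knobs "network design" usually covers.
@dataclass
class NetworkDesign:
    depth: int = 12                  # number of layers/blocks
    width: int = 256                 # units or channels per layer
    connectivity: str = "residual"   # plain | residual | dense
    head: str = "classification"     # task-specific output head
    dropout: float = 0.1             # regularization hook
```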
2. Depth vs width—trade-offs. (Medium)
Answer: Depth composes features hierarchically; can improve sample efficiency for structured tasks. Width increases representational power per layer. Very deep nets need care (residuals, normalization); very wide nets cost parameters and memory.
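A quick parameter-count comparison for intuition (layer sizes are made up; pure Python, no framework needed):

```python
# Parameter counts for two MLPs: deep-and-narrow vs shallow-and-wide.
def mlp_params(sizes):
    # Each Linear layer: in*out weights + out biases.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

deep_narrow = [784] + [128] * 8 + [10]   # eight 128-unit hidden layers
shallow_wide = [784, 1024, 10]           # one wide hidden layer

print(mlp_params(deep_narrow))   # 217354  (~0.2M)
print(mlp_params(shallow_wide))  # 815114  (~0.8M)
```

The wide net spends far more parameters on a single layer; the deep net reuses a small width but composes many more transformations.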
3. What is model capacity? (Easy)
Answer: Roughly the family of functions the architecture can represent (VC-style intuition or parameter count as proxy). High capacity can overfit small data; too low underfits.
4. What is a bottleneck layer? (Medium)
Answer: A layer with fewer units than neighbors, forcing compression of the representation—used in autoencoders, Inception modules, some efficient conv blocks (1×1 convs).
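A minimal PyTorch sketch of a ResNet-style bottleneck (PyTorch assumed; channel sizes illustrative):

```python
import torch
from torch import nn

# 1x1 conv compresses channels, 3x3 works in the compressed space,
# 1x1 expands back; cheaper than one wide 3x3 conv at the same width.
class Bottleneck(nn.Module):
    def __init__(self, channels=256, squeeze=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, squeeze, kernel_size=1), nn.ReLU(),
            nn.Conv2d(squeeze, squeeze, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(squeeze, channels, kernel_size=1),
        )

    def forward(self, x):
        return self.block(x)

x = torch.randn(1, 256, 32, 32)
print(Bottleneck()(x).shape)  # torch.Size([1, 256, 32, 32])
```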
5. Why do skip (residual) connections help very deep nets? (Medium)
Answer: They provide gradient highways and make it easier to learn near-identity refinements (“residual mapping”). Mitigates degradation and vanishing signal in deep stacks.
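A minimal residual block sketch (PyTorch assumed):

```python
import torch
from torch import nn

# The conv path learns a correction F(x); the skip carries x forward
# unchanged, so gradients flow through the addition even when F's
# gradients are small, and the identity map is trivially representable.
class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.f(x))  # output = x + F(x)
```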
6. What is inductive bias? (Medium)
Answer: Prior assumptions baked into the architecture—e.g. CNNs assume locality and translation equivariance; RNNs assume sequential dependence. Good bias improves data efficiency.
7. Receptive field—why does it matter for CNN design? (Medium)
Answer: The region of input affecting one output neuron. Must grow large enough to capture context (objects, text n-grams in 1D CNNs)—deeper stacks, dilated convs, or pooling increase effective RF.
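The growth is easy to compute with the standard recurrence; a small sketch for hypothetical conv stacks:

```python
# Receptive field of stacked convs: RF grows by (kernel-1)*jump per
# layer; stride multiplies the jump (output-to-input step size).
def receptive_field(layers):  # layers = [(kernel, stride), ...]
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7, like one 7x7 conv
print(receptive_field([(3, 2), (3, 1), (3, 1)]))  # 11, stride grows RF faster
```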
8. Parameters vs FLOPs—both needed? (Easy)
Answer: Parameters drive memory and overfitting risk; FLOPs drive latency and training cost. A layer can be compute-heavy but parameter-light (depthwise separable convs) or the opposite.
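A back-of-envelope comparison with illustrative sizes (counts multiply-accumulates only, ignoring biases and activations):

```python
# Standard 3x3 conv vs depthwise-separable (3x3 depthwise + 1x1 pointwise).
def conv_cost(c_in, c_out, k, h, w):
    params = c_in * c_out * k * k
    return params, params * h * w        # MACs = params per output pixel

def separable_cost(c_in, c_out, k, h, w):
    dw = c_in * k * k                    # depthwise: one filter per channel
    pw = c_in * c_out                    # pointwise 1x1 mixes channels
    return dw + pw, (dw + pw) * h * w

print(conv_cost(128, 128, 3, 56, 56))       # (147456, 462422016)
print(separable_cost(128, 128, 3, 56, 56))  # (17536, 54992896)
```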
9. Signs your network is too small (underfitting). (Easy)
Answer: Training loss stays high; both train and validation error are poor. Fix: more layers/units, better features, or longer training if optimization was the issue. (Diagnostic sketch after Q10.)
10. Signs your network is too large (overfitting). (Easy)
Answer: Training loss is low but validation is much worse. Fix: regularization, more data, a smaller model, or early stopping—the fix is rarely “more parameters.”
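Both signals reduce to reading the train/val gap; a hypothetical diagnostic sketch (the `tol` threshold is made up):

```python
# Rough rule of thumb from the loss curves; not a substitute for plots.
def diagnose(train_loss, val_loss, tol=0.05):
    if train_loss > tol and abs(val_loss - train_loss) < tol:
        return "underfit: raise capacity or train longer"
    if val_loss - train_loss > tol:
        return "overfit: regularize, add data, or shrink the model"
    return "reasonable fit"

print(diagnose(train_loss=0.90, val_loss=0.92))  # underfit
print(diagnose(train_loss=0.05, val_loss=0.60))  # overfit
```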
11. “Scaling laws” in one interview sentence. (Hard)
Answer: Empirically, loss often improves predictably as a power law in parameters, data, and compute along compute-optimal frontiers—guides large-model training but doesn’t replace task-specific design.
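For a follow-up, the power-law form is worth sketching (after Kaplan et al., 2020; N_c, D_c and the exponents are empirically fitted constants):

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}
```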
12. Multi-branch architectures (e.g. the Inception idea). (Hard)
Answer: Parallel paths with different kernel sizes or operations capture multi-scale features; concatenation or addition fuses them—richer than a single tower, at the cost of complexity.
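A simplified multi-branch sketch (PyTorch assumed; real Inception modules add 1×1 reductions and a pooling branch):

```python
import torch
from torch import nn

# Parallel kernel sizes see different scales; concat fuses on channels.
class MultiBranch(nn.Module):
    def __init__(self, c_in=64, c_branch=32):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c_branch, 1)
        self.b3 = nn.Conv2d(c_in, c_branch, 3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_branch, 5, padding=2)

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

x = torch.randn(1, 64, 28, 28)
print(MultiBranch()(x).shape)  # torch.Size([1, 96, 28, 28])
```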
13. How does input resolution affect design? (Medium)
Answer: Higher resolution increases spatial tokens and compute: attention cost grows quadratically in token count, while conv compute grows roughly linearly in pixel count. May need deeper nets (for receptive field) or early downsampling to control cost.
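The token-count arithmetic for a ViT-style model (patch size and resolutions are illustrative):

```python
# Tokens grow quadratically with side length; attention pairs
# grow quadratically again in token count.
patch = 16
for side in (224, 448):
    tokens = (side // patch) ** 2
    print(side, tokens, tokens ** 2)
# 224 -> 196 tokens; 448 -> 784 tokens (4x), so ~16x attention pairs.
```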
14. When should you start from a pretrained model? (Medium)
Answer: With small data or a domain similar to the pretraining corpus—reuse the backbone, replace the classifier head. Random init can win when data is huge or domain mismatch is extreme (with caveats).
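A common transfer-learning sketch, assuming a recent torchvision is available:

```python
import torch
from torch import nn
from torchvision import models

# Reuse a pretrained ResNet-18 backbone; swap in a new classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                     # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 10)  # new 10-class head (trainable)

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 10])
```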
15. Practical order for picking depth and width. (Medium)
Answer: Start from a known baseline (ResNet-18, small Transformer), match the parameter budget to your GPU and dataset size, measure train vs val curves, then adjust depth/width/regularization—don’t guess huge first.
Tip: mention the train/val gap and compute budget—it signals you design empirically, not only from theory.

Quick review checklist

  • Depth vs width; capacity; bottlenecks; residual paths.
  • Inductive bias; receptive field; params vs FLOPs.
  • Underfit vs overfit signals; scaling and pretrained backbones.