
Neural Network Practice Exercises — 15 Interview Questions

How to structure drills, attack MCQs, trace shapes, whiteboard backprop, and use spaced repetition before interviews.


1. Effective NN practice routine. (Easy)
Answer: Short daily blocks: concepts (cards), numeric toy examples, one coding micro-task—repeat weak tags weekly.
2. MCQ strategy under time pressure. (Easy)
Answer: Eliminate impossible options (wrong units, violates invariances); for “which is true” questions, check edge cases (zeros, imbalanced classes).
3. Conv output shape (no code). (Medium)
Answer: Spatial: floor((W−K+2P)/S)+1 per dimension (with correct dilation if asked). Channels change via filter count, not kernel size alone.
Out = ⌊(W − K + 2P) / S⌋ + 1  (per spatial dim)
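A quick way to verify this after a whiteboard drill is a throwaway helper (a minimal sketch; the conv_out name and example values are illustrative):

    import math

    def conv_out(w, k, s=1, p=0, d=1):
        # One spatial dim; with dilation the effective kernel is d*(k-1)+1,
        # which reduces to the floor((W-K+2P)/S)+1 formula when d=1.
        return math.floor((w - d * (k - 1) - 1 + 2 * p) / s) + 1

    print(conv_out(224, 3, s=2, p=1))  # 112 = floor((224 - 3 + 2) / 2) + 1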
4. Count parameters in a linear layer. (Easy)
Answer: in×out weights plus out bias terms (if bias is enabled), so in×out + out in total—a common quick sanity check in screens.
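To make the count concrete, check it against a real layer (a sketch assuming PyTorch; the linear_params helper is illustrative):

    import torch.nn as nn

    def linear_params(n_in, n_out, bias=True):
        # n_in * n_out weights, plus n_out bias terms if enabled
        return n_in * n_out + (n_out if bias else 0)

    layer = nn.Linear(512, 10)
    total = sum(p.numel() for p in layer.parameters())
    assert linear_params(512, 10) == total  # 5130 = 512*10 + 10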
5. Whiteboard: derivative of MSE w.r.t. prediction. (Medium)
Answer: For ½(ŷ−y)², dL/dŷ = (ŷ−y). Shows you remember scaling constants matter for manual derivations.
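A finite-difference check makes the scaling-constant point concrete (a minimal NumPy sketch; the values are arbitrary):

    import numpy as np

    def loss(y_hat, y):
        return 0.5 * (y_hat - y) ** 2   # the 1/2 cancels the 2 from the square

    y_hat, y, eps = 1.7, 1.0, 1e-6
    analytic = y_hat - y                # dL/dy_hat = y_hat - y
    numeric = (loss(y_hat + eps, y) - loss(y_hat - eps, y)) / (2 * eps)
    print(analytic, numeric)            # both ~0.7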
6. Sigmoid saturation: the interview angle. (Easy)
Answer: Gradients ≈0 at tails → slow learning; prefer ReLU family in hidden layers; sigmoid often for binary output probability.
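The saturation claim is easy to demonstrate numerically, since σ'(x) = σ(x)(1 − σ(x)) (a minimal NumPy sketch):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def dsigmoid(x):
        s = sigmoid(x)
        return s * (1.0 - s)            # peaks at 0.25, collapses in the tails

    for x in (0.0, 5.0, 10.0):
        print(x, dsigmoid(x))           # 0.25, ~6.6e-3, ~4.5e-5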
7. Softmax + CE gradient pattern. (Medium)
Answer: With CE on logits, gradient w.r.t. logits simplifies to p − y (one-hot y)—elegant result worth memorizing for speed.
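Worth verifying once so the p − y result sticks (a NumPy sketch with a finite-difference check; logits and target are arbitrary):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())          # shift for numerical stability
        return e / e.sum()

    def ce(z, y):
        return -np.sum(y * np.log(softmax(z)))

    z = np.array([2.0, -1.0, 0.5])
    y = np.array([1.0, 0.0, 0.0])        # one-hot target
    analytic = softmax(z) - y            # the p - y shortcut

    eps, numeric = 1e-6, np.zeros_like(z)
    for i in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[i] += eps
        zm[i] -= eps
        numeric[i] = (ce(zp, y) - ce(zm, y)) / (2 * eps)

    print(np.allclose(analytic, numeric, atol=1e-5))  # True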
8. BN during train vs eval (drill). (Medium)
Answer: Train mode uses batch statistics; eval mode uses the running mean/var accumulated during training—say why eval mode matters for fair metrics.
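The mode switch in a few lines (a sketch assuming PyTorch; shapes and constants are arbitrary):

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm1d(4)
    x = torch.randn(32, 4) * 3 + 1       # batch with nonzero mean/variance

    bn.train()
    _ = bn(x)                            # normalizes with batch stats, updates running stats
    bn.eval()
    _ = bn(x)                            # normalizes with the stored running stats
    print(bn.running_mean)               # has moved from 0 toward the batch mean (~1)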
9. Dropout train vs inference. (Easy)
Answer: Randomly zero activations during training; at inference, either scale activations by the keep probability or use inverted dropout (scale by 1/(1−p) at train time) so the expected scale matches.
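Inverted dropout in a few lines (a minimal NumPy sketch; p and the shapes are arbitrary):

    import numpy as np

    def inverted_dropout(x, p=0.5, train=True):
        # Zero units with probability p in training, scaling survivors by
        # 1/(1-p) so expected activations match inference (where this is a no-op).
        if not train:
            return x
        mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
        return x * mask

    x = np.ones(10_000)
    print(inverted_dropout(x).mean())    # ~1.0 despite half the units being zeroed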
10. SGD vs Adam: when to prefer SGD? (Medium)
Answer: With a careful LR schedule plus momentum, SGD can generalize slightly better on some vision tasks; Adam converges faster early—a classic trade-off question.
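The "careful LR schedule plus momentum" recipe in code (a sketch assuming PyTorch; the hyperparameters are typical vision defaults, not prescriptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Linear(32, 5)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

    for epoch in range(100):
        x, y = torch.randn(8, 32), torch.randint(0, 5, (8,))  # stand-in data
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()                     # anneal the LR once per epoch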
11. LR too high / too low symptoms. (Easy)
Answer: Too high: loss spikes or NaNs. Too low: loss barely moves and the model underfits slowly—mention an LR finder or grid search as the practical response.
12. Spot overfitting from curves. (Easy)
Answer: Train metric improves while validation worsens or plateaus—response: regularization, more data, early stopping, simpler model.
13. 10-minute coding drill example. (Medium)
Answer: Implement a numerically stable softmax, or run a single linear layer + CE forward pass on random tensors—tests API fluency without a full CNN.
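The second variant of that drill, end to end (a sketch assuming PyTorch; sizes are arbitrary):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(8, 32)                   # batch of 8, 32 features
    targets = torch.randint(0, 5, (8,))      # 5 classes
    layer = nn.Linear(32, 5)

    logits = layer(x)                        # (8, 5)
    loss = F.cross_entropy(logits, targets)  # stable log-softmax + NLL in one call
    loss.backward()
    print(loss.item(), layer.weight.grad.shape)  # scalar loss, grad (5, 32)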
14. Spaced repetition for ML theory. (Easy)
Answer: Revisit cards at increasing intervals; tag errors (“vanishing grad”, “AUC”) and drill those stacks before interviews.
15. Mock interview structure. (Easy)
Answer: 5 min warm-up concepts, 20 min mixed questions, 15 min coding—record yourself and review specifically for filler words and unclear explanations.
Keep a mistake log—interviewers love “here’s what I got wrong once.”

Quick review checklist

  • Shapes, param counts, MSE/softmax+CE gradients.
  • BN/Dropout modes, optimizer/LR trade-off stories, overfitting signals.
  • Timed micro-coding drills + spaced repetition + mocks.