Forward Propagation — 15 Interview Questions
From input to logits: layer order, tensor shapes, batching, inference vs training mode, and how interviewers test your mental model of the forward pass.
1 What is forward propagation? (Easy)
Answer: Computing the network's output from the input by applying each layer in order: affine transforms, biases, activations, pooling, etc., with no weight updates. During training it produces the predictions fed into the loss; at deploy time it is pure inference.
2 Forward vs backward pass in one sentence each. (Easy)
Answer: Forward: compute outputs and (usually) cache intermediates for loss. Backward: apply the chain rule to get gradients for learning. Forward does not change weights; backward supplies the update signal.
3 One step of an MLP layer in forward form. (Easy)
Answer: z = Wx + b, then a = f(z) for activation f. For a batch, X stacks the examples as rows and the same weights apply to every row: Z = XW + b, A = f(Z), using the shape conventions of Q4.
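A minimal NumPy sketch of this single-layer step; the batch size, dimensions, and the ReLU choice for f are illustrative assumptions.

```python
# Sketch of one MLP layer's forward step; sizes and the ReLU activation are assumptions.
import numpy as np

rng = np.random.default_rng(0)
B, d_in, d_out = 4, 3, 2                 # batch size, input dim, output dim

X = rng.standard_normal((B, d_in))       # batch of inputs, one example per row
W = rng.standard_normal((d_in, d_out))   # weight matrix
b = np.zeros(d_out)                      # bias, broadcast across the batch

Z = X @ W + b                            # affine transform applied to every row
A = np.maximum(Z, 0.0)                   # activation f = ReLU, elementwise

print(Z.shape, A.shape)                  # (4, 2) (4, 2)
```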
4 Shape of X, W, and output for a batched linear layer. (Medium)
Answer: X: B × d_in, W: d_in × d_out, bias b: d_out (broadcast). Output Y: B × d_out with Y = XW + b applied row-wise.
5 Rough FLOPs for matrix multiply A (m×k) · B (k×n)? (Medium)
Answer: The dominant term is O(m·k·n) multiply-adds (often quoted as ~2mkn FLOPs when counting the multiply and add separately). Used to reason about per-layer cost in the forward pass.
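A small NumPy sketch covering both answers: checking the batched-linear shapes and estimating the ~2·m·k·n FLOP count. The specific dimensions are made-up assumptions.

```python
# Batched linear layer: shape check plus a rough FLOP estimate (dims are illustrative).
import numpy as np

B, d_in, d_out = 32, 512, 256
X = np.random.randn(B, d_in)             # B x d_in
W = np.random.randn(d_in, d_out)         # d_in x d_out
b = np.zeros(d_out)                      # d_out, broadcast over rows

Y = X @ W + b                            # B x d_out
assert Y.shape == (B, d_out)

# (m x k) @ (k x n): roughly 2*m*k*n FLOPs (one multiply + one add per term)
flops = 2 * B * d_in * d_out
print(f"~{flops:,} FLOPs for this layer's matmul")   # ~8,388,608
```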
6 Why must layers be applied in a fixed order? (Easy)
Answer: Each layer's input is the previous layer's output. Reordering changes the composed function entirely unless the architecture is specially designed for it (e.g. parallel branches with a merge).
7 What activations are often cached during the forward pass in training? (Medium)
Answer: Pre-activations z and post-activations a (plus the inputs needed for BatchNorm statistics), so backprop can compute local gradients without recomputing everything. Frameworks handle this caching in autograd.
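A toy illustration (not how a framework's autograd actually stores things) of a forward pass that returns a cache of the values backprop would reuse.

```python
# Forward pass that caches pre-activations and post-activations for a later backward pass.
import numpy as np

def forward_with_cache(X, W, b):
    z = X @ W + b                        # pre-activation
    a = np.maximum(z, 0.0)               # post-activation (ReLU)
    cache = {"X": X, "z": z, "a": a}     # what the local backward step would need
    return a, cache

X = np.random.randn(8, 4)
W = np.random.randn(4, 3)
b = np.zeros(3)
a, cache = forward_with_cache(X, W, b)
print(sorted(cache))                     # ['X', 'a', 'z']
```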
8 How does eval() / inference mode change forward behavior? (Medium)
Answer: Dropout is disabled (or rescaled), BatchNorm uses running mean/var instead of batch statistics, and no gradient tracking is needed, which saves memory and compute.
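A PyTorch sketch of the eval-mode behavior described above; the tiny model is a placeholder, but model.eval() and torch.no_grad() are the standard calls.

```python
# Switching a model to inference behavior: eval() affects Dropout/BatchNorm,
# no_grad() turns off gradient tracking.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),
    nn.BatchNorm1d(20),   # uses running mean/var in eval mode
    nn.ReLU(),
    nn.Dropout(p=0.5),    # becomes a no-op in eval mode
    nn.Linear(20, 2),
)

model.eval()                       # flip Dropout/BatchNorm to inference behavior
with torch.no_grad():              # skip gradient tracking: less memory and compute
    logits = model(torch.randn(4, 10))

print(logits.shape)                # torch.Size([4, 2])
```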
9 Why subtract the max before softmax in practice? (Hard)
Answer: Logits can be large, so e^z can overflow. z' = z − max(z) shifts the logits without changing the softmax output (softmax is shift-invariant) but keeps the exponentials bounded, giving numerical stability.
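A NumPy sketch of the max-subtraction trick; the example logits are deliberately large to show why the naive version would overflow.

```python
# Numerically stable softmax: subtracting max(z) leaves the output unchanged
# but keeps every exponential <= 1.
import numpy as np

def softmax_stable(z):
    z_shift = z - np.max(z, axis=-1, keepdims=True)   # z' = z - max(z)
    e = np.exp(z_shift)
    return e / np.sum(e, axis=-1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])   # naive np.exp(1000) overflows to inf
print(softmax_stable(logits))                   # approx [[0.090 0.245 0.665]]
```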
10 Forward pass for batch size 1 vs large B—same code path? (Easy)
Answer: Usually yes—B=1 is a degenerate batch; matrix ops still work. Some ops (e.g. BN) behave differently with tiny batch size; that's a practical caveat.
11 What drives memory during the forward pass (training)? (Medium)
Answer: Storing activations for backprop, plus optimizer state if updating. Wider/deeper nets and larger batches increase activation memory—often the bottleneck before weights.
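A back-of-envelope sketch of why activations often dominate; the depth, width, batch size, and float32 assumption are all illustrative.

```python
# Rough forward-pass memory estimate for a deep MLP (all numbers are assumptions).
batch, width, depth = 1024, 512, 24
bytes_per_value = 4                                            # float32

activation_bytes = batch * width * depth * bytes_per_value     # one cached tensor per layer
weight_bytes = (depth - 1) * width * width * bytes_per_value   # square hidden layers

print(f"activations ~{activation_bytes / 1e6:.0f} MB, weights ~{weight_bytes / 1e6:.0f} MB")
# activations ~50 MB vs weights ~24 MB: large batches make activations the bottleneck
```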
12 "Functional" forward: what does it mean in frameworks? (Medium)
Answer: Applying ops with explicit weight tensors passed in (e.g. F.linear(x, W, b)) instead of nn.Module parameters—same math, useful for meta-learning or custom graphs.
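A PyTorch sketch of a functional forward where the weights are passed in explicitly; the two-layer MLP and its sizes are assumptions.

```python
# Functional forward: weights live outside any nn.Module and are passed to F.linear.
import torch
import torch.nn.functional as F

def functional_mlp(x, W1, b1, W2, b2):
    h = F.relu(F.linear(x, W1, b1))      # F.linear computes x @ W1.T + b1
    return F.linear(h, W2, b2)

x = torch.randn(4, 10)
W1, b1 = torch.randn(20, 10), torch.zeros(20)   # note: F.linear expects (out, in) weights
W2, b2 = torch.randn(3, 20), torch.zeros(3)
print(functional_mlp(x, W1, b1, W2, b2).shape)  # torch.Size([4, 3])
```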
13 Mixed precision forward—what changes? (Hard)
Answer: Many ops run in float16/bfloat16 for speed; sensitive reductions (loss, BN) may stay in float32. Loss scaling can help with small gradients in low precision.
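A PyTorch autocast sketch of a mixed-precision forward pass. The model is a placeholder; bfloat16 on CPU is used here only so the snippet runs without a GPU, while float16 on CUDA is the more common setup.

```python
# Mixed-precision forward under autocast: eligible ops run in lower precision.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.randn(32, 128)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(x)                 # linear layers execute in bfloat16

print(logits.dtype)                   # torch.bfloat16
```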
14 Exported model "inference graph"—relation to forward pass? (Medium)
Answer: It is a frozen forward computation graph (no backward), optimized for deployment—same layer order as the training forward, possibly with fused ops.
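A hedged TorchScript sketch: tracing records only the forward ops into a deployable graph. The tiny model and the file name are placeholders.

```python
# Exporting a frozen forward graph by tracing the model with an example input.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
model.eval()                                         # export inference-mode behavior

traced = torch.jit.trace(model, torch.randn(1, 8))   # records the forward pass only
traced.save("mlp_forward.pt")                        # deployable artifact, no backward graph

print(traced.graph)                                  # inspect the captured forward ops
```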
15 Walk through a 3-layer MLP forward from x to class probs. (Easy)
Answer: x → h1 = f(W1x+b1) → h2 = f(W2h1+b2) → logits = W3h2+b3 → probs = softmax(logits). Mention where the nonlinearities stop: the final affine layer feeds softmax directly, with no activation in between.
Draw arrows on a whiteboard—interviewers check you separate linear blocks from f and softmax.
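An end-to-end NumPy sketch of this walkthrough; the layer sizes, the ReLU choice for f, and the single-example batch are illustrative assumptions.

```python
# 3-layer MLP forward: two affine + nonlinearity blocks, final affine to logits, then softmax.
import numpy as np

rng = np.random.default_rng(0)
d_in, h, n_classes = 4, 8, 3

W1, b1 = rng.standard_normal((d_in, h)), np.zeros(h)
W2, b2 = rng.standard_normal((h, h)), np.zeros(h)
W3, b3 = rng.standard_normal((h, n_classes)), np.zeros(n_classes)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.standard_normal((1, d_in))
h1 = relu(x @ W1 + b1)           # hidden layer 1
h2 = relu(h1 @ W2 + b2)          # hidden layer 2
logits = h2 @ W3 + b3            # final affine: no nonlinearity here
probs = softmax(logits)          # class probabilities
print(probs, probs.sum())        # probabilities sum to 1
```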
Quick review checklist
- Define forward vs backward; one MLP layer: z, a, batch shapes.
- Training caches; eval mode: dropout off, BN running stats.
- Softmax stability; FLOPs order for matmul; memory = activations.