Neural Networks: 15 Essential Q&A
Interview Prep

Loss Functions — 15 Interview Questions

Empirical risk, MSE vs cross-entropy, softmax pairing, robust losses, and how regularizers enter the objective—what interviewers expect you to connect to gradients.


1. What is a loss function in supervised learning? (Easy)
Answer: A scalar that scores how far model outputs are from targets for one example (or batch). Training minimizes the average loss over the dataset—the empirical risk.
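A minimal sketch of the "average per-example loss" idea, using PyTorch (the framework is an assumption for illustration, not specified above):

```python
import torch

# Per-example squared-error scores, then the dataset average (empirical risk)
preds   = torch.tensor([1.0, 2.0, 0.0])
targets = torch.tensor([1.5, 1.0, 0.0])

per_example = (preds - targets) ** 2   # one scalar score per example
risk = per_example.mean()              # training minimizes this average
print(risk)  # tensor(0.4167)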
2. Mean squared error (MSE)—definition and typical use. (Easy)
Answer: Average of squared differences between prediction and target. Common for regression; penalizes large errors heavily. With Gaussian noise assumptions, MSE relates to maximum likelihood.
MSE = (1/n) Σ (ŷ_i − y_i)²
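A quick check of the formula against the built-in, again assuming PyTorch:

```python
import torch
import torch.nn.functional as F

y_hat = torch.tensor([2.5, 0.0, 2.0])
y     = torch.tensor([3.0, -0.5, 2.0])

manual  = ((y_hat - y) ** 2).mean()   # (1/n) * sum of squared residuals
builtin = F.mse_loss(y_hat, y)
print(manual.item(), builtin.item())  # both 0.1667
```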
3. Binary cross-entropy in one line. (Easy)
Answer: For label y ∈ {0,1} and predicted probability p̂, loss encourages p̂ → y. It is the negative log-likelihood of a Bernoulli model—strong gradients when the model is confidently wrong.
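The one-liner is −[y log p̂ + (1−y) log(1−p̂)]; a PyTorch sketch (framework assumed) confirming it matches the library call:

```python
import torch
import torch.nn.functional as F

y     = torch.tensor([1.0, 0.0, 1.0])
p_hat = torch.tensor([0.9, 0.2, 0.4])   # already probabilities

# Manual Bernoulli negative log-likelihood, averaged over examples
manual  = -(y * p_hat.log() + (1 - y) * (1 - p_hat).log()).mean()
builtin = F.binary_cross_entropy(p_hat, y)
print(manual.item(), builtin.item())    # identical values
```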
4. Multi-class cross-entropy with one-hot targets. (Medium)
Answer: −Σ_k y_k log p̂_k with one-hot y picks the log-probability of the true class. With softmax outputs, this is standard classification training.
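A sketch (PyTorch assumed) showing that with one-hot y the sum collapses to the true class's log-probability:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])              # index of the true class

# -sum_k y_k log p_hat_k reduces to -log p_hat_true for one-hot y
log_p   = F.log_softmax(logits, dim=1)
manual  = -log_p[0, target.item()]
builtin = F.cross_entropy(logits, target)
print(manual.item(), builtin.item())    # same number
```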
5. Why softmax + cross-entropy together? (Medium)
Answer: Softmax turns logits into a distribution; CE matches it to labels. For one-hot targets the combined gradient w.r.t. the logits is simply p̂ − y (prediction minus target), which is stable and efficient to implement (e.g. log-softmax + NLL).
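This is the gradient story interviewers want; it can be verified with autograd (PyTorch assumed):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([1.0, 2.0, 0.5], requires_grad=True)
target = torch.tensor(1)

loss = F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
loss.backward()

p_hat = F.softmax(logits.detach(), dim=0)
y = F.one_hot(target, num_classes=3).float()
print(logits.grad)   # matches p_hat - y exactly
print(p_hat - y)
```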
6. Hinge loss—when does it appear? (Medium)
Answer: Classic for SVMs: penalizes margin violations. Less common in standard deep classifiers than CE but shows up in contrastive / max-margin formulations.
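A binary-margin sketch with labels in {−1, +1} (PyTorch assumed; `hinge_loss` is an illustrative name, not a library function):

```python
import torch

def hinge_loss(scores, y):
    """Binary hinge: mean(max(0, 1 - y * f(x))), y in {-1, +1}."""
    return torch.clamp(1 - y * scores, min=0).mean()

scores = torch.tensor([0.8, -0.3, 2.0])
y = torch.tensor([1.0, -1.0, 1.0])
print(hinge_loss(scores, y))  # (0.2 + 0.7 + 0.0) / 3 = 0.3
```

Note the zero loss on the third example: points beyond the margin contribute no gradient, unlike CE.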
7. Huber loss vs MSE for regression. (Medium)
Answer: Quadratic like MSE near zero and linear like L1 beyond a threshold δ—less sensitive to outliers than pure MSE while remaining differentiable everywhere, including at the join point (the two pieces are matched to first order).
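A sketch of the piecewise definition (PyTorch assumed; δ = 1 is a common default):

```python
import torch

def huber(residual, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond (shifted to stay continuous)."""
    abs_r = residual.abs()
    quadratic = 0.5 * residual ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return torch.where(abs_r <= delta, quadratic, linear).mean()

r = torch.tensor([0.5, 3.0])
print(huber(r))  # (0.125 + 2.5) / 2 = 1.3125; torch.nn.HuberLoss matches
```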
8. Where does L2 regularization appear in the loss? (Easy)
Answer: Add λ||w||² (or similar) to the empirical loss so optimization shrinks weights, improving generalization. In the objective this is weight decay (AdamW instead decouples the decay from the gradient step, so the implementations differ).
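A sketch of the regularizer entering the objective directly (PyTorch assumed; in practice biases are often excluded from the penalty):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
lam = 1e-3  # regularization strength (illustrative value)

data_loss = F.mse_loss(model(x), y)
l2 = sum((w ** 2).sum() for w in model.parameters())
loss = data_loss + lam * l2   # penalty added to the empirical loss
loss.backward()
```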
9. Why not train directly on classification accuracy? (Medium)
Answer: Accuracy is piecewise constant in logits—gradient is zero almost everywhere. Differentiable surrogates (CE) provide learning signal.
10. Focal loss—purpose in one sentence. (Hard)
Answer: Down-weights easy examples so training focuses on hard ones—useful with extreme class imbalance in detection settings.
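A binary sketch (PyTorch assumed; omits the class-balancing α term from the original paper, and `focal_loss` is an illustrative name):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss: (1 - p_t)**gamma scales per-example BCE."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)   # prob of the true class
    return ((1 - p_t) ** gamma * ce).mean()

logits = torch.tensor([3.0, -3.0, 0.1])
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))  # easy examples contribute almost nothing
```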
11. Class imbalance—common loss-side fixes? (Medium)
Answer: Class weights in CE, resampling, focal loss, or changing the evaluation metric. Mention that rebalancing affects calibration.
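The class-weight fix in one line (PyTorch assumed; the 5:1 weight is an illustrative choice):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(6, 2)
targets = torch.tensor([0, 0, 0, 0, 0, 1])   # imbalanced batch

# Up-weight the rare class inside the CE objective
weights = torch.tensor([1.0, 5.0])
loss = F.cross_entropy(logits, targets, weight=weights)
```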
12. Label smoothing—what does it change? (Hard)
Answer: Replace hard one-hot with a mixture with a uniform (or other) distribution so the model is not pushed to infinite confidence. Often improves calibration and regularization.
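The target-side change, sketched in PyTorch (framework assumed; `smooth_targets` is an illustrative helper):

```python
import torch
import torch.nn.functional as F

def smooth_targets(labels, num_classes, eps=0.1):
    """Mix one-hot targets with the uniform distribution."""
    one_hot = F.one_hot(labels, num_classes).float()
    return (1 - eps) * one_hot + eps / num_classes

print(smooth_targets(torch.tensor([2]), num_classes=4))
# tensor([[0.0250, 0.0250, 0.9250, 0.0250]])
# F.cross_entropy(..., label_smoothing=0.1) applies the same idea directly
```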
13. KL divergence as a loss component—when? (Hard)
Answer: When matching two distributions—e.g. knowledge distillation (student vs teacher softmax), variational objectives, or probabilistic models. KL(p‖q) measures the extra bits needed to encode samples from p using a code optimized for q.
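A distillation-style sketch (PyTorch assumed; the T² factor is the common convention for keeping gradient scale comparable across temperatures):

```python
import torch
import torch.nn.functional as F

def distill_kl(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    log_q = F.log_softmax(student_logits / T, dim=1)  # student log-probs
    p = F.softmax(teacher_logits / T, dim=1)          # teacher probs
    return F.kl_div(log_q, p, reduction="batchmean") * T * T

s = torch.randn(4, 10, requires_grad=True)
t = torch.randn(4, 10)
distill_kl(s, t).backward()
```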
14. Multi-label classification—typical loss? (Medium)
Answer: Independent sigmoid + binary CE per label (not softmax), because multiple labels can be active at once.
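The with-logits form is the numerically stable way to write this (PyTorch assumed):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5)                     # 5 labels, any subset can be on
targets = torch.tensor([[1., 0., 1., 0., 0.],
                        [0., 1., 1., 1., 0.]])

# Independent sigmoid + BCE per label; sigmoid is fused into the loss
loss = F.binary_cross_entropy_with_logits(logits, targets)
```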
15. How do you pick a loss for a new task? (Medium)
Answer: Match the output head and probabilistic story: regression → MSE/Huber; exclusive classes → softmax+CE; multi-label → sigmoid+BCE; ranking → pairwise/ranking losses. Align with business metric when possible.
Tie every loss answer to gradients and what is being optimized.

Quick review checklist

  • Empirical risk; MSE vs CE; softmax+CE gradient story.
  • Why not accuracy; multi-label vs multi-class losses.
  • Regularization in objective; label smoothing / focal at high level.