Loss functions MCQ · test your deep learning knowledge
From MSE to cross‑entropy, hinge, Huber and KL divergence – 15 questions covering regression, classification & robust losses.
Loss functions: the compass of neural networks
Loss functions (or cost functions) quantify the difference between predicted and target values. They guide optimisation algorithms to update model parameters. This MCQ covers the most essential loss functions in deep learning, from classic regression losses to modern classification objectives.
Why loss functions matter
The choice of loss function directly shapes what the model learns. For regression, MSE penalises large errors more heavily than small ones; for classification, cross‑entropy measures the dissimilarity between the true and predicted probability distributions. Robust losses such as Huber reduce sensitivity to outliers.
Core concepts tested
MSE & MAE
Mean Squared Error (L2): the mean of (y − ŷ)² – sensitive to outliers. Mean Absolute Error (L1): the mean of |y − ŷ| – more robust to outliers.
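A minimal sketch of both losses in plain Python (the function names `mse` and `mae` and the toy data are illustrative, not from the original):

```python
def mse(y_true, y_pred):
    # Mean Squared Error (L2): average of squared residuals
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error (L1): average of absolute residuals
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# A single outlier (100 vs 4) inflates MSE far more than MAE
y_true = [1.0, 2.0, 3.0, 100.0]
y_pred = [1.0, 2.0, 3.0, 4.0]
print(mse(y_true, y_pred))  # 2304.0
print(mae(y_true, y_pred))  # 24.0
```

The squaring in MSE is exactly what makes it outlier-sensitive: a residual of 96 contributes 9216 to the sum, versus 96 for MAE.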
Cross‑entropy families
Binary cross‑entropy for binary classification with sigmoid; Categorical cross‑entropy for multi‑class with softmax.
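A sketch of the multi‑class case, also illustrating the categorical vs. sparse categorical distinction asked about below (function names are illustrative; assumes probabilities from a softmax):

```python
import math

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    # One-hot target: -sum_i t_i * log(p_i); eps guards against log(0)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

def sparse_categorical_cross_entropy(class_index, probs, eps=1e-12):
    # Same loss, but the target is an integer class index, not a one-hot vector
    return -math.log(max(probs[class_index], eps))

probs = [0.1, 0.8, 0.1]
print(categorical_cross_entropy([0, 1, 0], probs))   # -log(0.8)
print(sparse_categorical_cross_entropy(1, probs))    # identical value
```

The two variants compute the same quantity; "sparse" only changes how the target is encoded.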
Hinge loss
Used in SVMs and some neural networks for maximum‑margin classification: max(0, 1 − y·f(x)) for labels y ∈ {−1, +1}. A squared‑hinge variant, which penalises margin violations quadratically, is also common.
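A small sketch of the hinge loss and its squared variant (toy scores chosen for illustration):

```python
def hinge(y, score):
    # Hinge loss for labels y in {-1, +1}: max(0, 1 - y * f(x))
    return max(0.0, 1.0 - y * score)

def squared_hinge(y, score):
    # Squared variant: penalises margin violations quadratically
    return max(0.0, 1.0 - y * score) ** 2

print(hinge(1, 2.0))           # 0.0  -> correct and outside the margin
print(hinge(1, 0.5))           # 0.5  -> correct but inside the margin
print(hinge(-1, 0.5))          # 1.5  -> wrong side of the boundary
print(squared_hinge(-1, 0.5))  # 2.25
```

Note the loss is exactly zero once the margin is satisfied, which is what produces sparse support vectors in SVMs.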
Kullback‑Leibler divergence
Measures how one probability distribution diverges from another. It is asymmetric: D_KL(P‖Q) ≠ D_KL(Q‖P) in general. Often used in VAEs and generative models.
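A minimal implementation over discrete distributions (the function name and example distributions are illustrative):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); non-negative, asymmetric
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ~0.5108
print(kl_divergence(q, p))  # ~0.3681 -> different: KL is not symmetric
print(kl_divergence(p, p))  # 0.0     -> zero iff the distributions match
```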
Huber loss
Combines MSE and MAE: quadratic for small errors, linear for large ones. Less sensitive to outliers.
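The piecewise definition can be sketched as follows (function name and the default delta of 1.0 are illustrative):

```python
def huber(y_true, y_pred, delta=1.0):
    # Quadratic for |error| <= delta (like MSE), linear beyond it (like MAE)
    err = abs(y_true - y_pred)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)

print(huber(3.0, 2.5))   # 0.125 -> small error: quadratic regime
print(huber(3.0, 10.0))  # 6.5   -> large error: grows only linearly
```

The two branches agree in value and slope at |error| = delta, so the loss stays smooth while capping the gradient magnitude at delta for large errors.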
Loss + activation
The output layer activation must match the loss: e.g., sigmoid + binary cross‑entropy, softmax + categorical cross‑entropy.
# Binary cross‑entropy for a single example (Python)
from math import log  # y_pred should be clipped away from 0 and 1 to avoid log(0)
bce = -(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
Common interview questions on loss functions
- Why is MSE not ideal for binary classification with sigmoid?
- Explain the difference between categorical cross‑entropy and sparse categorical cross‑entropy.
- What problem does the Huber loss address?
- What is the derivative of the hinge loss?
- When would you use KL divergence over cross‑entropy?