Loss functions fundamentals 15 questions 25 min

Loss functions MCQ · test your deep learning knowledge

From MSE to cross‑entropy, hinge, Huber and KL divergence – 15 questions covering regression, classification & robust losses.

Easy: 5 Medium: 6 Hard: 4
MSE / MAE
Cross‑entropy
Hinge loss
KL divergence

Loss functions: the compass of neural networks

Loss functions (or cost functions) quantify the difference between predicted and target values. They guide optimisation algorithms to update model parameters. This MCQ covers the most essential loss functions in deep learning, from classic regression losses to modern classification objectives.

Why loss functions matter

The choice of loss function directly impacts what the model learns. For regression, MSE penalises large errors more; for classification, cross‑entropy measures the dissimilarity between true and predicted distributions. Robust losses like Huber reduce sensitivity to outliers.

Core concepts tested

MSE & MAE

Mean Squared Error (L2): the mean of (y − ŷ)² – penalises large errors heavily, so it is sensitive to outliers. Mean Absolute Error (L1): the mean of |y − ŷ| – more robust to outliers.
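Both losses reduce to a mean over the residuals. A minimal NumPy sketch (function names are illustrative, not from a specific library):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared residuals
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute residuals
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 5.0])
print(mse(y_true, y_pred))  # the single outlier (error of 2) dominates MSE
print(mae(y_true, y_pred))  # MAE grows only linearly with that outlier
```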

Cross‑entropy families

Binary cross‑entropy for binary classification with sigmoid; Categorical cross‑entropy for multi‑class with softmax.
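For the multi‑class case, the loss is the negative log‑probability the softmax assigns to the true class. A small sketch with one‑hot targets (function name is illustrative):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot target vector; y_pred: softmax probabilities
    # Clip to avoid log(0) for numerically zero probabilities
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])        # true class is index 1
y_pred = np.array([0.1, 0.7, 0.2])        # softmax output
print(categorical_cross_entropy(y_true, y_pred))  # -log(0.7) ≈ 0.357
```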

Hinge loss

Used in SVMs and some neural nets for maximum‑margin classification. The standard form is max(0, 1 − y·ŷ) with labels in {−1, +1}; the squared‑hinge variant squares this term to penalise margin violations more smoothly.
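Both variants can be written in a few lines of NumPy (a sketch; labels are assumed to be ±1 and `score` is the raw, unsquashed model output):

```python
import numpy as np

def hinge(y_true, score):
    # y_true in {-1, +1}; zero loss once the margin y * score >= 1
    return np.mean(np.maximum(0.0, 1.0 - y_true * score))

def squared_hinge(y_true, score):
    # Squaring the margin term gives a smoother, more outlier-sensitive penalty
    return np.mean(np.maximum(0.0, 1.0 - y_true * score) ** 2)

print(hinge(np.array([1.0, -1.0]), np.array([2.0, -3.0])))  # 0.0: both well inside the margin
print(hinge(np.array([1.0]), np.array([0.5])))              # 0.5: inside the margin, penalised
```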

Kullback‑Leibler divergence

Measures how one probability distribution diverges from another. Often used in VAEs and generative models.
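Concretely, KL(P ‖ Q) = Σ p·log(p/q); note it is asymmetric, so KL(P ‖ Q) ≠ KL(Q ‖ P) in general. A minimal sketch for discrete distributions (clipping is an assumption to avoid log(0)):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(P || Q) for discrete distributions p, q (arrays summing to 1)
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])
print(kl_divergence(p, p))  # 0: a distribution does not diverge from itself
print(kl_divergence(p, q))  # > 0, and != kl_divergence(q, p)
```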

Huber loss

Combines MSE and MAE: quadratic for small errors, linear for large ones. Less sensitive to outliers.
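The switch between the quadratic and linear regimes happens at a threshold δ. A sketch of the piecewise definition (δ = 1 is a common default, used here as an assumption):

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    err = y_true - y_pred
    small = np.abs(err) <= delta
    quadratic = 0.5 * err ** 2                      # MSE-like near zero
    linear = delta * (np.abs(err) - 0.5 * delta)    # MAE-like for large errors
    return np.mean(np.where(small, quadratic, linear))

print(huber(np.array([0.0]), np.array([0.5])))  # 0.125: inside delta, quadratic
print(huber(np.array([0.0]), np.array([2.0])))  # 1.5: outside delta, linear
```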

Loss + activation

The output layer activation must match the loss: e.g., sigmoid + binary cross‑entropy, softmax + categorical cross‑entropy.

# Binary cross‑entropy for one example (y_true ∈ {0, 1}, 0 < y_pred < 1)
from math import log
bce = -(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
Interview tip: Understand when to use MSE vs MAE, why cross‑entropy is preferred over MSE for classification, and what "label smoothing" means.

Common interview questions on loss functions

  • Why is MSE not ideal for binary classification with sigmoid?
  • Explain the difference between categorical cross‑entropy and sparse categorical cross‑entropy.
  • What problem does the Huber loss address?
  • What is the derivative of the hinge loss?
  • When would you use KL divergence over cross‑entropy?