Model Evaluation Q&A: 20 Core Questions
Interview Prep

Model Evaluation & Metrics: Interview Q&A

Short questions and answers on how to evaluate machine learning models, choose metrics and design validation strategies.

Topics: Accuracy · Precision/Recall · ROC‑AUC · Regression Metrics
1 What is model evaluation in ML? ⚡ Beginner
Answer: Model evaluation is the process of measuring how well a trained model performs on unseen data using appropriate metrics and validation schemes.
2 Why is a test set needed if we already use validation data? ⚡ Beginner
Answer: The validation set is used to tune models and hyperparameters; the final test set remains untouched so it can provide an unbiased estimate of performance.
3 What is accuracy and when can it be misleading? ⚡ Beginner
Answer: Accuracy is the fraction of correct predictions. It can be misleading on imbalanced datasets where predicting the majority class yields high accuracy but poor minority‑class performance.
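A minimal sketch (the labels below are made up for illustration): a "model" that always predicts the majority class gets high accuracy on an imbalanced set while finding none of the positives.

```python
from sklearn.metrics import accuracy_score, recall_score

# Toy imbalanced data: 95 negatives, 5 positives (illustrative only)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100          # always predict the majority class

print(accuracy_score(y_true, y_pred))  # 0.95 — looks great
print(recall_score(y_true, y_pred))    # 0.0  — misses every positive
```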
4 Define precision and recall briefly. ⚡ Beginner
Answer: Precision is “of the predicted positives, how many are correct?”, while recall is “of the actual positives, how many did we find?”.
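A quick sketch of both definitions from confusion counts (toy numbers, purely illustrative):

```python
# Toy confusion counts (illustrative): 40 true positives, 10 false positives, 20 false negatives
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)  # of the predicted positives, how many are correct -> 0.8
recall    = tp / (tp + fn)  # of the actual positives, how many did we find    -> ~0.67

print(precision, recall)
```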
5 What is F1-score and why is it useful? 📊 Intermediate
Answer: F1 is the harmonic mean of precision and recall, giving a single score that balances them, especially useful when classes are imbalanced.
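A small sketch of why the harmonic mean matters (values are made up): it is pulled toward the weaker of the two scores, unlike a plain average.

```python
# Toy precision/recall values (illustrative)
precision, recall = 0.8, 0.4

f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
arithmetic_mean = (precision + recall) / 2

print(f1)               # ~0.53 — dragged toward the poor recall
print(arithmetic_mean)  # 0.6  — hides the poor recall
```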
6 What is a confusion matrix in simple terms? ⚡ Beginner
Answer: A confusion matrix is a table that shows, for each true class, how the model's predictions were distributed across classes; in the binary case its four cells are the true positives, false positives, true negatives and false negatives (TP, FP, TN, FN).
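A minimal sketch with scikit-learn (toy labels, illustrative only):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, rows are true classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```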
7 What is ROC‑AUC? 📊 Intermediate
Answer: ROC‑AUC is the area under the ROC curve, which plots true positive rate vs false positive rate across thresholds; it measures how well the model ranks positives above negatives.
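A small sketch (scores are made up): AUC depends only on how well positive examples are ranked above negative ones, not on any particular threshold.

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]   # predicted probabilities or scores

# 1.0 = perfect ranking of positives above negatives, 0.5 = random ranking
print(roc_auc_score(y_true, y_score))
```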
8 Name three common regression metrics. ⚡ Beginner
Answer: Common regression metrics: MAE (mean absolute error), MSE/RMSE (mean squared / root mean squared error), and R² (coefficient of determination).
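A minimal sketch computing all three on toy targets (values are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mae  = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # RMSE is in the target's units
r2   = r2_score(y_true, y_pred)

print(mae, rmse, r2)
```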
9 What does R² tell you? 📊 Intermediate
Answer: R² indicates the proportion of variance in the target explained by the model compared to a simple baseline that always predicts the mean.
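The same idea written out by hand, with r2_score as a cross-check (toy numbers again):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)           # model's squared error
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # "always predict the mean" baseline

r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_pred))        # identical values
```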
10 What is cross‑validation and why use it? ⚡ Beginner
Answer: Cross‑validation splits data into multiple train/validation folds to get a more robust estimate of performance and reduce dependence on a single split.
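A minimal sketch with scikit-learn on synthetic data (the model and parameters below are just illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # synthetic toy data

# 5-fold CV: each fold is held out once while the rest is used for training
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1")
print(scores.mean(), scores.std())   # average performance and its spread across folds
```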
11 When would you prefer precision over recall? 📊 Intermediate
Answer: You prefer precision when false positives are very costly, e.g., marking legitimate transactions as fraud or legitimate emails as spam.
12 When would you focus more on recall? 📊 Intermediate
Answer: You focus on recall when missing a positive is very costly, e.g., failing to detect a disease or a critical fault.
13 What is the difference between micro, macro and weighted F1? 🔥 Advanced
Answer: Micro‑averaging pools the TP/FP/FN counts of all classes before computing F1, macro averages the per‑class F1 scores equally, and weighted averages the per‑class F1 scores weighted by support (class frequency).
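A short sketch on an imbalanced toy problem (labels are made up), showing how the three averages diverge:

```python
from sklearn.metrics import f1_score

# Toy 3-class labels with a dominant class 0 (illustrative only)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 0, 2]

for avg in ("micro", "macro", "weighted"):
    print(avg, f1_score(y_true, y_pred, average=avg))
```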
14 What is log‑loss (cross‑entropy) in classification? 🔥 Advanced
Answer: Log‑loss penalizes incorrect or over‑confident predicted probabilities; it is the negative log likelihood of the true class under the predicted probability distribution.
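A minimal sketch (probabilities are made up): the same wrong answer costs far more when predicted with high confidence.

```python
import numpy as np

def binary_log_loss(y_true, p_pred):
    """Negative log-likelihood of the true binary label under predicted probability p_pred."""
    return -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(binary_log_loss(1, 0.9))    # confident and correct -> ~0.11
print(binary_log_loss(1, 0.6))    # unsure but correct    -> ~0.51
print(binary_log_loss(1, 0.01))   # confident and wrong   -> ~4.6 (heavily penalized)
```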
15 Why should you compare models with the same evaluation protocol? 📊 Intermediate
Answer: Changing data splits, metrics or preprocessing while comparing models introduces confounders; using the same protocol ensures differences come from the models themselves.
16 What is overfitting to the validation set? 🔥 Advanced
Answer: If you iterate on model and hyperparameters many times based on validation performance, you may implicitly fit noise in the validation set and no longer have an unbiased estimate.
17 How do you evaluate models on time series data? 🔥 Advanced
Answer: You typically use time‑aware splits (train on past, validate on future) and rolling or expanding window validation instead of random shuffles.
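A minimal sketch with scikit-learn's TimeSeriesSplit (toy data): every validation fold comes strictly after its training fold, with no shuffling.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)    # 12 time-ordered observations (toy data)

# Expanding-window splits: the training window grows, validation always lies in the future
for train_idx, val_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "validate:", val_idx)
```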
18 What is calibration of predicted probabilities? 🔥 Advanced
Answer: Calibration measures how well predicted probabilities match observed frequencies; a calibrated model predicting 0.8 should be correct about 80% of the time.
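A short sketch using scikit-learn's calibration_curve (probabilities and outcomes below are made up): it bins predictions by predicted probability and compares each bin's mean prediction with the observed positive rate.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Toy predicted probabilities and outcomes (illustrative only)
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])

# For a well-calibrated model, observed frequency ≈ mean predicted probability per bin
frac_positive, mean_predicted = calibration_curve(y_true, y_prob, n_bins=5)
print(mean_predicted)
print(frac_positive)
```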
19 Why is business context important for choosing metrics? 📊 Intermediate
Answer: Different errors have different real‑world costs; metrics must align with business goals (e.g., revenue, risk, user experience), not just abstract scores.
20 Why should model performance be monitored after deployment? 📊 Intermediate
Answer: Data and user behavior change over time; continuous monitoring reveals drift, degradation and fairness issues so you can retrain or adjust the model.

Quick Recap: Model Evaluation

Good evaluation is about asking the right questions: which mistakes matter, which metric captures them, and how stable is performance across datasets and time.