Model Evaluation Q&A: 20 Core Questions
Interview Prep

Model Evaluation & Metrics: Interview Q&A

Short questions and answers on how to evaluate machine learning models, choose metrics and design validation strategies.

Topics: Accuracy · Precision/Recall · ROC‑AUC · Regression Metrics
1 What is model evaluation in ML? ⚡ Beginner
Answer: Model evaluation is the process of measuring how well a trained model performs on unseen data using appropriate metrics and validation schemes.
2 Why is a test set needed if we already use validation data? ⚡ Beginner
Answer: The validation set is used to tune models and hyperparameters; the final test set remains untouched so it can provide an unbiased estimate of performance.
3 What is accuracy and when can it be misleading? ⚡ Beginner
Answer: Accuracy is the fraction of correct predictions. It can be misleading on imbalanced datasets where predicting the majority class yields high accuracy but poor minority‑class performance.
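A minimal sketch (the labels below are made up for illustration): a "model" that always predicts the majority class gets high accuracy on an imbalanced set while finding none of the positives.

```python
from sklearn.metrics import accuracy_score, recall_score

# Toy imbalanced data: 95 negatives, 5 positives (illustrative only)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100          # always predict the majority class

print(accuracy_score(y_true, y_pred))  # 0.95 — looks great
print(recall_score(y_true, y_pred))    # 0.0  — misses every positive
```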
4 Define precision and recall briefly. ⚡ Beginner
Answer: Precision is “of the predicted positives, how many are correct?”, while recall is “of the actual positives, how many did we find?”.
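A quick sketch of both definitions from confusion counts (toy numbers, purely illustrative):

```python
# Toy confusion counts (illustrative): 40 true positives, 10 false positives, 20 false negatives
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)  # of the predicted positives, how many are correct -> 0.8
recall    = tp / (tp + fn)  # of the actual positives, how many did we find    -> ~0.67

print(precision, recall)
```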
5 What is F1-score and why is it useful? 📊 Intermediate
Answer: F1 is the harmonic mean of precision and recall, giving a single score that balances them, especially useful when classes are imbalanced.
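A small sketch of why the harmonic mean matters (values are made up): it is pulled toward the weaker of the two scores, unlike a plain average.

```python
# Toy precision/recall values (illustrative)
precision, recall = 0.8, 0.4

f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
arithmetic_mean = (precision + recall) / 2

print(f1)               # ~0.53 — dragged toward the poor recall
print(arithmetic_mean)  # 0.6  — hides the poor recall
```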
6 What is a confusion matrix in simple terms? ⚡ Beginner
Answer: A confusion matrix is a table that shows, for each true class, how the model's predictions were distributed across classes; in the binary case its four cells are the true positives, false positives, true negatives and false negatives (TP, FP, TN, FN).
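A minimal sketch with scikit-learn (toy labels, illustrative only):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, rows are true classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```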
7 What is ROC‑AUC? 📊 Intermediate
Answer: ROC‑AUC is the area under the ROC curve, which plots true positive rate vs false positive rate across thresholds; it measures how well the model ranks positives above negatives.
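A small sketch (scores are made up): AUC depends only on how well positive examples are ranked above negative ones, not on any particular threshold.

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]   # predicted probabilities or scores

# 1.0 = perfect ranking of positives above negatives, 0.5 = random ranking
print(roc_auc_score(y_true, y_score))
```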
8 Name three common regression metrics. ⚡ Beginner
Answer: Common regression metrics: MAE (mean absolute error), MSE/RMSE (mean squared / root mean squared error), and R² (coefficient of determination).
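A minimal sketch computing all three on toy targets (values are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mae  = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # RMSE is in the target's units
r2   = r2_score(y_true, y_pred)

print(mae, rmse, r2)
```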
9 What does R² tell you? 📊 Intermediate
Answer: R² indicates the proportion of variance in the target explained by the model compared to a simple baseline that always predicts the mean.
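The same idea written out by hand, with r2_score as a cross-check (toy numbers again):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)           # model's squared error
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # "always predict the mean" baseline

r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_pred))        # identical values
```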
10 What is cross‑validation and why use it? ⚡ Beginner
Answer: Cross‑validation splits data into multiple train/validation folds to get a more robust estimate of performance and reduce dependence on a single split.
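A minimal sketch with scikit-learn on synthetic data (the model and parameters below are just illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # synthetic toy data

# 5-fold CV: each fold is held out once while the rest is used for training
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1")
print(scores.mean(), scores.std())   # average performance and its spread across folds
```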
11 When would you prefer precision over recall? 📊 Intermediate
Answer: You prefer precision when false positives are very costly, e.g., marking legitimate transactions as fraud or legitimate emails as spam.
12 When would you focus more on recall? 📊 Intermediate
Answer: You focus on recall when missing a positive is very costly, e.g., failing to detect a disease or a critical fault.
13 What is the difference between micro, macro and weighted F1? 🔥 Advanced
Answer: Micro‑averaging pools the TP/FP/FN counts of all classes before computing F1, macro averages the per‑class F1 scores equally, and weighted averages the per‑class F1 scores weighted by support (class frequency).
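A short sketch on an imbalanced toy problem (labels are made up), showing how the three averages diverge:

```python
from sklearn.metrics import f1_score

# Toy 3-class labels with a dominant class 0 (illustrative only)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 0, 2]

for avg in ("micro", "macro", "weighted"):
    print(avg, f1_score(y_true, y_pred, average=avg))
```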
14 What is log‑loss (cross‑entropy) in classification? 🔥 Advanced
Answer: Log‑loss penalizes incorrect or over‑confident predicted probabilities; it is the negative log likelihood of the true class under the predicted probability distribution.
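A minimal sketch (probabilities are made up): the same wrong answer costs far more when predicted with high confidence.

```python
import numpy as np

def binary_log_loss(y_true, p_pred):
    """Negative log-likelihood of the true binary label under predicted probability p_pred."""
    return -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(binary_log_loss(1, 0.9))    # confident and correct -> ~0.11
print(binary_log_loss(1, 0.6))    # unsure but correct    -> ~0.51
print(binary_log_loss(1, 0.01))   # confident and wrong   -> ~4.6 (heavily penalized)
```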
15 Why should you compare models with the same evaluation protocol? 📊 Intermediate
Answer: Changing data splits, metrics or preprocessing while comparing models introduces confounders; using the same protocol ensures differences come from the models themselves.
16 What is overfitting to the validation set? 🔥 Advanced
Answer: If you iterate on model and hyperparameters many times based on validation performance, you may implicitly fit noise in the validation set and no longer have an unbiased estimate.
17 How do you evaluate models on time series data? 🔥 Advanced
Answer: You typically use time‑aware splits (train on past, validate on future) and rolling or expanding window validation instead of random shuffles.
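A minimal sketch with scikit-learn's TimeSeriesSplit (toy data): every validation fold comes strictly after its training fold, with no shuffling.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)    # 12 time-ordered observations (toy data)

# Expanding-window splits: the training window grows, validation always lies in the future
for train_idx, val_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "validate:", val_idx)
```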
18 What is calibration of predicted probabilities? 🔥 Advanced
Answer: Calibration measures how well predicted probabilities match observed frequencies; a calibrated model predicting 0.8 should be correct about 80% of the time.
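A short sketch using scikit-learn's calibration_curve (probabilities and outcomes below are made up): it bins predictions by predicted probability and compares each bin's mean prediction with the observed positive rate.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Toy predicted probabilities and outcomes (illustrative only)
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])

# For a well-calibrated model, observed frequency ≈ mean predicted probability per bin
frac_positive, mean_predicted = calibration_curve(y_true, y_prob, n_bins=5)
print(mean_predicted)
print(frac_positive)
```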
19 Why is business context important for choosing metrics? 📊 Intermediate
Answer: Different errors have different real‑world costs; metrics must align with business goals (e.g., revenue, risk, user experience), not just abstract scores.
20 Why should model performance be monitored after deployment? 📊 Intermediate
Answer: Data and user behavior change over time; continuous monitoring reveals drift, degradation and fairness issues so you can retrain or adjust the model.

Quick Recap: Model Evaluation

Good evaluation is about asking the right questions: which mistakes matter, which metric captures them, and how stable is performance across datasets and time.