
Random Forest: Interview Q&A

Short questions and answers on random forests: bagging, feature sampling, out‑of‑bag error and feature importance.

Topics: bagging · feature sampling · feature importance · robust performance
1 What is a random forest in simple terms? ⚡ Beginner
Answer: A random forest is an ensemble of many decision trees whose predictions are combined (e.g., majority vote or average) to improve accuracy and robustness.
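For illustration, a minimal sketch with scikit-learn (the library choice is an assumption; the data is synthetic):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # 100 decision trees; the forest combines them by majority vote
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(clf.predict(X[:5]))        # ensemble class predictions
    print(clf.predict_proba(X[:5]))  # averaged per-tree class probabilities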
2 What two main sources of randomness does a random forest use? 📊 Intermediate
Answer: It uses bootstrap sampling of data for each tree and random subsets of features at each split.
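A conceptual NumPy sketch of the two sources of randomness (sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_features = 200, 16

    # Source 1: bootstrap sampling -- each tree trains on rows drawn with replacement
    bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

    # Source 2: feature subsampling -- each split considers only a random subset
    # of features, commonly sqrt(n_features) candidates for classification
    max_features = int(np.sqrt(n_features))
    split_candidates = rng.choice(n_features, size=max_features, replace=False)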
3 What is bagging? 📊 Intermediate
Answer: Bagging (bootstrap aggregating) trains multiple models on bootstrapped samples of the data and aggregates their predictions to reduce variance.
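Bagging is available generically in scikit-learn (assumed here; settings are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    # 50 trees, each fit on a bootstrap sample; predictions are aggregated by vote
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, random_state=0).fit(X, y)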
4 Why does random feature selection at splits help? 🔥 Advanced
Answer: It decorrelates trees so they make different errors, making the ensemble more powerful than many identical trees.
5 What is out‑of‑bag (OOB) error in random forests? 🔥 Advanced
Answer: OOB error is estimated by predicting each training sample using only the trees whose bootstrap samples excluded it, then averaging the resulting error over all samples; it provides a built-in validation estimate without a separate hold-out set.
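In scikit-learn (assumed), OOB scoring is a single flag:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, random_state=0)

    # Each sample is scored only by trees whose bootstrap sample excluded it
    clf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                 random_state=0).fit(X, y)
    print(clf.oob_score_)  # validation-style accuracy without a held-out set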
6 Name key hyperparameters of a random forest. ⚡ Beginner
Answer: Important hyperparameters include n_estimators, max_depth, max_features, min_samples_split and min_samples_leaf.
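These names follow scikit-learn's API; the values below are purely illustrative, not recommendations:

    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(
        n_estimators=300,      # number of trees in the forest
        max_depth=None,        # grow each tree until leaves are pure
        max_features="sqrt",   # features considered at each split
        min_samples_split=2,   # minimum samples needed to split a node
        min_samples_leaf=1,    # minimum samples required in a leaf
        random_state=0,
    )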
7 Do random forests require heavy feature scaling? ⚡ Beginner
Answer: No, tree‑based methods are largely invariant to monotonic transformations of features, so scaling is usually not critical.
8 How do random forests handle missing values? 🔥 Advanced
Answer: Some implementations support surrogate splits or native missing-value handling, but in practice you often impute missing data beforehand.
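A common pattern is to impute first, e.g. in a scikit-learn pipeline (a sketch; the median strategy is just one choice):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import make_pipeline

    X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0], [5.0, 6.0]])
    y = np.array([0, 0, 1, 1])

    # Fill missing values with the column median before fitting the forest
    model = make_pipeline(SimpleImputer(strategy="median"),
                          RandomForestClassifier(random_state=0)).fit(X, y)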
9 What are the advantages of random forests over single trees? ⚡ Beginner
Answer: They reduce variance, improve accuracy and are more robust to noise and overfitting than a single deep tree.
10 What are some disadvantages of random forests? ⚡ Beginner
Answer: They are computationally heavier and less interpretable than a single tree, and a large forest can consume substantial memory.
11 How is feature importance computed in random forests? 📊 Intermediate
Answer: Common methods sum the decrease in impurity (e.g., Gini) brought by each feature across all splits and trees, or use permutation importance.
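Impurity-based importances come for free after fitting (scikit-learn assumed; data synthetic):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                               random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X, y)

    # Mean decrease in Gini impurity per feature, normalized to sum to 1
    for i, imp in enumerate(clf.feature_importances_):
        print(f"feature {i}: {imp:.3f}")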
12 What is permutation feature importance? 🔥 Advanced
Answer: It measures how much the model’s performance deteriorates when a feature’s values are randomly shuffled, breaking its relationship with the target.
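A sketch using scikit-learn's permutation_importance (evaluating on held-out data avoids optimistic estimates):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

    # Shuffle each feature 10 times and measure the average drop in score
    result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
    print(result.importances_mean)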
13 Can random forests be used for regression? ⚡ Beginner
Answer: Yes, random forest regression averages predictions from many regression trees instead of taking a majority vote.
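A regression sketch on synthetic sine data (scikit-learn assumed):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(300, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

    # Each tree predicts a value; the forest returns their average
    reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    print(reg.predict([[2.5]]))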
14 How does the number of trees affect performance? 📊 Intermediate
Answer: More trees usually reduce variance and improve stability up to a point, but also increase training and prediction time.
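One way to watch this effect is scikit-learn's warm_start, which grows an existing forest instead of refitting from scratch (a sketch; tree counts are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, random_state=0)

    clf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)
    for n in (25, 50, 100, 200, 400):
        clf.set_params(n_estimators=n)  # adds trees, keeping the existing ones
        clf.fit(X, y)
        print(n, round(1 - clf.oob_score_, 4))  # OOB error typically plateaus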
15 Do random forests extrapolate well outside the training data range? 🔥 Advanced
Answer: No; because predictions are averages of training targets stored in leaves, they stay within the range observed during training and cannot follow trends beyond it.
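A quick demonstration on a linear trend (synthetic data; scikit-learn assumed):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(300, 1))
    y = 2 * X[:, 0]  # targets range from about 0 to 20

    reg = RandomForestRegressor(random_state=0).fit(X, y)

    # Leaf averages cannot exceed the observed targets, so predictions outside
    # the training range flatten out near 20 instead of following y = 2x
    print(reg.predict([[5.0], [15.0], [100.0]]))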
16 How do random forests handle high‑dimensional data? 🔥 Advanced
Answer: They can handle many features, but feature sampling becomes crucial; performance may degrade when the signal is spread very sparsely across features, since random feature subsets then often miss the informative ones.
17 How does random forest compare to gradient boosting methods? 🔥 Advanced
Answer: Random forests build trees independently (in parallel) to reduce variance, while boosting builds trees sequentially to reduce bias; boosting often reaches higher accuracy but is more sensitive to tuning.
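A rough side-by-side on toy data (illustrative only; a fair comparison needs tuning):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)

    for model in (RandomForestClassifier(random_state=0),       # parallel trees, variance reduction
                  GradientBoostingClassifier(random_state=0)):  # sequential trees, bias reduction
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, round(scores.mean(), 3))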
18 When is random forest a good default choice? ⚡ Beginner
Answer: It’s a strong default when you have tabular data with mixed feature types and want a robust model without heavy tuning.
19 How can you speed up random forest training on large datasets? 📊 Intermediate
Answer: Strategies include using fewer trees, shallower depth, subsampling rows/features and parallelizing training across CPU cores.
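In scikit-learn terms (assumed), the main levers look like this (values illustrative):

    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(
        n_estimators=100,  # fewer trees than a generous default
        max_depth=12,      # cap tree depth
        max_samples=0.5,   # each tree sees only half the rows (bootstrap=True)
        n_jobs=-1,         # fit trees in parallel on all CPU cores
        random_state=0,
    )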
20 What is the key message to remember about random forests? ⚡ Beginner
Answer: Random forests are powerful, robust and easy‑to‑use ensemble models that often perform very well on structured data with minimal tuning.

Quick Recap: Random Forest

Think of a random forest as many slightly different trees voting together; understanding bagging and feature sampling explains most of its behavior and strengths.