Random Forest Ensemble
A Powerful Baseline in scikit-learn


Learn how Random Forest combines many decision trees to improve accuracy and robustness for both classification and regression tasks.

What is Random Forest?

Random Forest is an ensemble of decision trees. Each tree is trained on a bootstrap sample of the data and considers only a random subset of the features at each split; the trees' predictions are then combined:

  • For classification: majority vote of trees.
  • For regression: average of tree predictions.

Idea: Many weak, slightly different trees together form a strong model.
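
To make the "combine the trees" idea concrete, the sketch below (an illustration, not part of the original example) fits a small forest and reproduces its prediction by hand from the individual trees. Note that scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities (a soft vote), which usually agrees with a hard majority vote:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=25, random_state=0)
rf.fit(X, y)

sample = X[:5]

# The fitted trees are exposed as rf.estimators_.
# Average their class probabilities across trees, then take the argmax:
avg_proba = np.mean([tree.predict_proba(sample) for tree in rf.estimators_], axis=0)
manual_pred = avg_proba.argmax(axis=1)

print(manual_pred)         # combined by hand
print(rf.predict(sample))  # combined by the forest -- same result
```

The two outputs match because `rf.predict` performs exactly this probability averaging internally.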

Example: Classification with RandomForestClassifier

Random Forest on Iris Dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf_clf = RandomForestClassifier(
    n_estimators=100,  # number of trees
    max_depth=None,    # grow trees fully (can be tuned)
    random_state=42
)

rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReport:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
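
The same workflow carries over to regression with RandomForestRegressor, where the trees' outputs are averaged instead of voted on. A minimal sketch, using the diabetes dataset purely as an illustrative choice (it is not part of the original example):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same API as the classifier; predictions are the mean over trees
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X_train, y_train)

r2 = r2_score(y_test, rf_reg.predict(X_test))
print("R^2 on test set:", r2)
```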

Feature Importance

Checking Which Features Matter
import pandas as pd

feature_importances = pd.DataFrame({
    "feature": iris.feature_names,
    "importance": rf_clf.feature_importances_
}).sort_values("importance", ascending=False)

print(feature_importances)
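
A caveat worth knowing: the impurity-based importances above are computed on the training data and can be biased toward high-cardinality features. A common complement is permutation importance on held-out data, sketched below as a self-contained example (the model and split mirror the tutorial's setup but are refit here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Shuffle each feature on the test set and measure the drop in accuracy
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

for name, mean in sorted(zip(iris.feature_names, result.importances_mean),
                         key=lambda t: -t[1]):
    print(f"{name}: {mean:.3f}")
```

Features whose shuffling barely hurts accuracy contribute little to the model's predictions on unseen data.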