Random Forest
Learn how Random Forest combines many decision trees to improve accuracy and robustness for both classification and regression tasks.
What is Random Forest?
Random Forest is an ensemble of decision trees. Each tree is trained on a bootstrap sample of the rows, and at every split it considers only a random subset of the features. The trees' predictions are combined:
- For classification: majority vote of trees.
- For regression: average of tree predictions.
Idea: each individual tree may overfit, but because the trees are trained on different samples and features, their errors are largely uncorrelated and average out, yielding a strong, low-variance model.
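The combination rule above can be sketched by hand: train several plain decision trees on bootstrap samples and take a majority vote. This is a simplified sketch of bagging (scikit-learn's RandomForestClassifier additionally subsamples features at each split):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rng = np.random.default_rng(42)
trees = []
for _ in range(25):
    # Bootstrap: sample rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Majority vote: stack per-tree predictions, pick the most common class per sample
all_preds = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_samples)
y_vote = np.array([np.bincount(col).argmax() for col in all_preds.T])
print("Bagged accuracy:", (y_vote == y_test).mean())
```

Even this bare-bones ensemble usually beats a single tree on held-out data, which is the core idea Random Forest builds on.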
Example: Classification with RandomForestClassifier
Random Forest on Iris Dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
rf_clf = RandomForestClassifier(
    n_estimators=100,  # number of trees
    max_depth=None,    # grow trees fully (can tune)
    random_state=42
)
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReport:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
Feature Importance
Checking Which Features Matter
import pandas as pd
feature_importances = pd.DataFrame({
    "feature": iris.feature_names,
    "importance": rf_clf.feature_importances_
}).sort_values("importance", ascending=False)
print(feature_importances)
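The built-in feature_importances_ are impurity-based and can favor high-cardinality features. A common cross-check is permutation importance, which shuffles one feature at a time on held-out data and measures the drop in score. A self-contained sketch (refits the model so it runs on its own):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle each feature n_repeats times on the test set; a large score drop
# means the model relied on that feature.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
for name, imp in sorted(zip(iris.feature_names, result.importances_mean),
                        key=lambda pair: -pair[1]):
    print(f"{name}: {imp:.3f}")
```

Because it is computed on held-out data, permutation importance reflects what the model actually uses at prediction time rather than what it used during training.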