Decision Trees
Learn how decision trees split data into regions using questions, and how to use them for classification and regression in Python.
What is a Decision Tree?
A decision tree predicts a target by asking a sequence of questions about the features.
Each internal node checks a condition (e.g., feature < threshold) and sends the sample down the matching branch,
and each leaf node outputs a prediction.
- Classification trees: predict categories.
- Regression trees: predict numerical values.
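To make the node and leaf structure concrete, here is a minimal hand-written sketch of a tree stored as nested dictionaries. The feature names, thresholds, and labels are made up for illustration and are not learned from data; the `predict` helper is likewise hypothetical.

```python
# A hand-built tree: internal nodes hold a test, leaves hold a prediction.
# All feature names, thresholds, and labels below are illustrative assumptions.
toy_tree = {
    "feature": "petal_length",
    "threshold": 2.5,
    "left": {"leaf": "setosa"},            # petal_length < 2.5
    "right": {                             # petal_length >= 2.5
        "feature": "petal_width",
        "threshold": 1.7,
        "left": {"leaf": "versicolor"},    # petal_width < 1.7
        "right": {"leaf": "virginica"},    # petal_width >= 1.7
    },
}


def predict(node, sample):
    """Follow the tests from the root until a leaf is reached."""
    while "leaf" not in node:
        if sample[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]


print(predict(toy_tree, {"petal_length": 1.4, "petal_width": 0.2}))
print(predict(toy_tree, {"petal_length": 5.1, "petal_width": 2.0}))
```

A trained tree works the same way at prediction time; the difference is that the tests are chosen automatically from the training data.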
Example: Classification Tree
DecisionTreeClassifier on Iris Dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% for testing; stratify=y keeps class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Create decision tree
tree_clf = DecisionTreeClassifier(
    max_depth=3,  # limit depth to reduce overfitting
    random_state=42
)
tree_clf.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = tree_clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReport:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
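To see which questions the fitted tree actually asks, scikit-learn's `export_text` helper can print the learned rules. The sketch below refits the same model so that it runs on its own; the variable names are reused from the example above for convenience.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_clf.fit(X_train, y_train)

# Each indented line is one test on a feature;
# lines containing "class:" are leaves.
rules = export_text(tree_clf, feature_names=list(iris.feature_names))
print(rules)
print("Depth:", tree_clf.get_depth(), "Leaves:", tree_clf.get_n_leaves())
```

Note that scikit-learn writes its tests as feature <= threshold, so the left branch includes samples equal to the threshold.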
Example: Regression Tree
Predict House Price
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy as np
# Toy dataset: a single numeric feature (X) and the house price (y)
X = np.array([[500], [750], [1000], [1250], [1500], [1750], [2000]])
y = np.array([100, 150, 200, 250, 300, 320, 350])

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

tree_reg = DecisionTreeRegressor(
    max_depth=3,
    random_state=42
)
tree_reg.fit(X_train, y_train)

# With only 7 samples, the score is illustrative only
y_pred = tree_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)
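The idea behind a single regression split can be sketched in plain Python: try each candidate threshold and keep the one that leaves the smallest total squared error in the two resulting groups. This is a simplified illustration, not scikit-learn's actual implementation; the helper names `sse` and `best_split` are hypothetical, and the search runs on the full toy dataset rather than on the training split.

```python
# Same toy data as the regression example above
feature_values = [500, 750, 1000, 1250, 1500, 1750, 2000]
prices = [100, 150, 200, 250, 300, 320, 350]


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Try midpoints between consecutive sorted x values; keep the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for (x_lo, _), (x_hi, _) in zip(pairs, pairs[1:]):
        if x_lo == x_hi:
            continue  # no threshold can separate identical values
        threshold = (x_lo + x_hi) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


threshold, total = best_split(feature_values, prices)
print("Best threshold:", threshold)
print("SSE before split:", sse(prices))
print("SSE after split:", total)
```

A regression tree repeats this search inside each resulting region until a stopping rule such as max_depth is reached; with a squared-error criterion, each leaf then predicts the mean target of the training samples that fall into it.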