Cross-Validation
Learn how to use k-fold cross-validation to get a more reliable estimate of model performance than a single train/test split.
Why Cross-Validation?
- A single train/test split can be unlucky (too easy or too hard).
- Cross-validation uses multiple splits to average performance.
- Helps when data is limited.
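To see the first point concretely, here is a minimal sketch (not part of the tutorial's main example) that evaluates the same model on several different random train/test splits of the Iris dataset. The accuracy typically shifts from split to split, which is exactly the variability cross-validation averages out:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same model, different random splits: the test score depends on the split
split_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_tr, y_tr)
    split_scores.append(model.score(X_te, y_te))

print("Accuracies across splits:", split_scores)
```

Any single one of these numbers could be reported as "the" model accuracy, which is why averaging over folds is safer.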
K-Fold Cross-Validation
In k-fold CV, the data is split into k equal parts (folds). Each fold is used once as a test set, while the remaining k−1 folds act as training data.
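The fold mechanics can be made visible with a tiny sketch (a toy 10-sample array, not the tutorial's Iris example): `KFold.split` yields one (train indices, test indices) pair per fold, and every sample lands in the test set exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(10).reshape(-1, 1)  # 10 toy samples

kf = KFold(n_splits=5)
folds = list(kf.split(X_toy))

# Each fold: 8 samples train, 2 samples test; test sets never overlap
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"Fold {i}: train={train_idx}, test={test_idx}")
```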
K-Fold with cross_val_score
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, KFold
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Load the Iris dataset (150 samples, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Model to evaluate
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold CV; shuffle with a fixed seed for reproducible folds
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold
scores = cross_val_score(rf, X, y, cv=cv, scoring="accuracy")

print("CV scores:", scores)
print("Mean accuracy:", np.mean(scores))
print("Std dev:", np.std(scores))