Hierarchical Clustering
Learn how hierarchical clustering builds a tree of clusters and how to visualize it using dendrograms.
What is Hierarchical Clustering?
Hierarchical clustering builds a hierarchy of clusters instead of a single flat partition:
- Agglomerative: bottom-up; starts with every point in its own cluster and repeatedly merges the two closest clusters.
- Divisive: top-down; starts with all points in one cluster and recursively splits it.
A dendrogram visualizes the merge/split steps as a tree: the height of each join is the distance at which the two clusters were combined.
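The bottom-up merge loop can be sketched in a few lines of plain Python, here with single linkage on 1-D points. The helper names are illustrative, not a real API; production code would use SciPy or scikit-learn as shown in the examples that follow.

```python
def single_linkage_distance(a, b):
    # Distance between two clusters of 1-D points: the closest pair.
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, n_clusters):
    # Start with each point in its own cluster (the agglomerative idea).
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the two closest clusters under single linkage...
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        # ...and merge them, shrinking the cluster list by one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], 3))
```

Recording the order and distance of these merges is exactly the information a dendrogram draws.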
Dendrogram Example
Agglomerative Clustering Dendrogram
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
# Generate sample data
X, _ = make_blobs(
    n_samples=20,
    centers=3,
    random_state=42
)
Z = linkage(X, method="ward") # ward minimizes variance within clusters
plt.figure(figsize=(8, 4))
dendrogram(Z)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Sample index")
plt.ylabel("Distance")
plt.show()
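The linkage matrix also lets you cut the tree into flat cluster labels without re-fitting, using SciPy's fcluster; with criterion="maxclust", the cut produces at most t clusters:

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Same synthetic data and ward linkage as in the dendrogram example.
X, _ = make_blobs(n_samples=20, centers=3, random_state=42)
Z = linkage(X, method="ward")

# Cut the tree so that at most 3 flat clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # one cluster id per sample
```

Other criteria such as criterion="distance" cut the tree at a fixed merge height instead of a fixed cluster count.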
AgglomerativeClustering in scikit-learn
Clustering into 3 Groups
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
X, y_true = make_blobs(
    n_samples=150,
    centers=3,
    cluster_std=0.60,
    random_state=0
)
agg = AgglomerativeClustering(
    n_clusters=3,
    linkage="ward"
)
labels = agg.fit_predict(X)
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", alpha=0.7)
plt.title("Agglomerative Hierarchical Clustering")
plt.show()
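When the number of clusters is not known in advance, AgglomerativeClustering can instead stop merging at a distance cutoff by passing n_clusters=None with distance_threshold; the threshold value of 10 below is an assumption chosen for this synthetic data, not a general default:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.60, random_state=0)

# Stop merging once the next merge would join clusters farther apart
# than distance_threshold; the model then reports how many clusters remain.
agg = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=10,  # assumed cutoff for this synthetic data
    linkage="ward"
)
labels = agg.fit_predict(X)
print(agg.n_clusters_)
```

Reading the merge heights off the dendrogram is a practical way to pick a sensible threshold.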