Hierarchical Clustering

Learn how hierarchical clustering builds a tree of clusters and how to visualize it using dendrograms.

What is Hierarchical Clustering?

Hierarchical clustering builds a hierarchy of clusters instead of a single flat partition:

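  • Agglomerative: bottom-up. It starts with each point as its own cluster and repeatedly merges the two closest clusters until one cluster remains.
  • Divisive: top-down. It starts with all points in a single cluster and recursively splits it.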
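To make the agglomerative loop concrete, here is a minimal from-scratch sketch. It assumes single linkage (the distance between two clusters is the distance between their closest members) and a small NumPy array as input. `naive_agglomerative` is a hypothetical helper written for illustration only; it is not part of scikit-learn or SciPy, and its O(n³) search is far slower than the library implementations. The divisive variant is not shown.

```python
import numpy as np


def naive_agglomerative(X, n_clusters):
    """Hypothetical illustration: bottom-up clustering with single linkage."""
    # Start with every point in its own cluster (lists of row indices).
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    merges = []  # (members_a, members_b, distance) for each merge performed

    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dists[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Merge cluster b into cluster a; b > a, so index a stays valid after pop.
        clusters[a] = clusters[a] + clusters.pop(b)

    return clusters, merges


# Two tight pairs plus one isolated point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])
clusters, merges = naive_agglomerative(X, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Stopping at `n_clusters` is only one choice; letting the loop run until a single cluster remains produces the full merge history that a dendrogram draws.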

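A dendrogram draws these merge (or split) steps as a tree: the leaves are the individual samples, and the height at which two branches join is the distance between the clusters being merged.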
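SciPy stores this tree as a linkage matrix, and `dendrogram` plots its third column as the branch heights. The sketch below builds one for four hand-picked 1-D points (an assumed toy input) with single linkage so that each merge can be checked by hand.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four 1-D points: two tight pairs ({0, 1} and {5, 7}) that are far apart.
X = np.array([[0.0], [1.0], [5.0], [7.0]])

# Single linkage: cluster distance = distance between the closest members.
Z = linkage(X, method="single")

# Each row of Z is one merge: [cluster_a, cluster_b, merge_distance, new_cluster_size].
# Original points are clusters 0..3; the merge in row i creates cluster 4 + i.
for a, b, dist, size in Z:
    print(f"merge {int(a)} + {int(b)} at height {dist:.1f} -> size {int(size)}")
```

The merge heights here are 1.0, 2.0 and 4.0: the last merge joins the two pairs at the gap between points 1.0 and 5.0.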

Dendrogram Example

Agglomerative Clustering Dendrogram
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate sample data
X, _ = make_blobs(
    n_samples=20,
    centers=3,
    random_state=42
)

Z = linkage(X, method="ward")  # Ward: merge the pair of clusters that least increases total within-cluster variance

plt.figure(figsize=(8, 4))
dendrogram(Z)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Sample index")
plt.ylabel("Distance")
plt.show()
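A dendrogram alone does not assign labels; to get a flat clustering you "cut" the tree. One way to do this in SciPy is `fcluster`, sketched below on the same sample data. The height threshold of 10.0 is an arbitrary value chosen for illustration, not a recommendation; in practice you would read a suitable cut height off the plot.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Same sample data and Ward linkage as the dendrogram example.
X, _ = make_blobs(n_samples=20, centers=3, random_state=42)
Z = linkage(X, method="ward")

# Cut the tree so that at most 3 flat clusters remain.
labels_k = fcluster(Z, t=3, criterion="maxclust")

# Alternatively, cut at a fixed height on the dendrogram's distance axis.
# The threshold 10.0 is an illustrative assumption.
labels_h = fcluster(Z, t=10.0, criterion="distance")

print(labels_k)  # one cluster id per sample; fcluster ids start at 1
print(len(set(labels_h)))  # number of clusters below the chosen height
```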

AgglomerativeClustering in scikit-learn

Clustering into 3 Groups
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, y_true = make_blobs(
    n_samples=150,
    centers=3,
    cluster_std=0.60,
    random_state=0
)

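# Ward linkage merges the pair of clusters that least increases within-cluster variance
agg = AgglomerativeClustering(
    n_clusters=3,
    linkage="ward"
)

# Fit the model and return one cluster label (0, 1 or 2) per sample
labels = agg.fit_predict(X)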

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", alpha=0.7)
plt.title("Agglomerative Hierarchical Clustering")
plt.show()
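Because `make_blobs` also returns the true labels (`y_true`), the result can be scored against them. The sketch below assumes the adjusted Rand index (ARI) as the metric, which is one common choice among several. It also compares the four `linkage` options scikit-learn accepts. Scoring like this only works on synthetic or labeled data; real clustering problems usually have no ground truth.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y_true = make_blobs(n_samples=150, centers=3, cluster_std=0.60, random_state=0)

scores = {}
for method in ["ward", "complete", "average", "single"]:
    labels = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(X)
    # ARI is 1.0 for a perfect match (up to relabeling) and near 0.0 for random labels.
    scores[method] = adjusted_rand_score(y_true, labels)
    print(f"{method:>8}: ARI = {scores[method]:.3f}")
```

Single linkage tends to chain nearby clusters together, so on overlapping blobs it often scores lower than Ward. The exact numbers depend on the data.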