K-Means Clustering

Learn how K-Means groups similar data points into clusters and how to implement it in Python with a simple example.

What is K-Means?

K-Means is an unsupervised learning algorithm that partitions data into K clusters. Each cluster is represented by its centroid, the mean of the points assigned to that cluster.
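
Stated as an optimization problem, K-Means is usually described as minimizing the within-cluster sum of squared distances. The formula below is a sketch in standard notation. The symbols C_k (the set of points assigned to cluster k) and \mu_k (that cluster's centroid) are introduced here for illustration and are not part of the original text.

```latex
\min_{C_1, \dots, C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

This is the quantity that scikit-learn's KMeans exposes as inertia_ after fitting.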

The algorithm proceeds as follows:

1. Choose the number of clusters, K.
2. Initialize K centroids.
3. Assign each point to its nearest centroid.
4. Recompute each centroid as the mean of its assigned points.
5. Repeat steps 3–4 until the assignments stop changing.
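
The steps above can be sketched directly in NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name kmeans_sketch, its parameters, the random choice of data points as initial centroids, and the empty-cluster handling are all assumptions made for this example.

```python
import numpy as np


def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Minimal K-Means sketch (hypothetical helper, for illustration only)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Initialize: pick k distinct data points as starting centroids (an assumption;
    # other initialization schemes exist).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None

    for _ in range(n_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)

        # Stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)

    return labels, centroids


# Usage: two well-separated synthetic groups of 50 points each.
rng = np.random.default_rng(1)
X_demo = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(10.0, 0.5, size=(50, 2)),
])
labels, centroids = kmeans_sketch(X_demo, k=2)
print(centroids)
```

The result depends on the initial centroids, which is one reason library implementations typically run several initializations and keep the best one.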

Example: Clustering Synthetic Data

K-Means with the Elbow Method

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate sample data
X, y_true = make_blobs(
    n_samples=300,
    centers=4,
    cluster_std=0.60,
    random_state=0
)

# Elbow method to choose K
inertias = []
K_range = range(1, 10)
for k in K_range:
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)  # sum of squared distances to centroids

plt.figure(figsize=(8, 4))
plt.plot(K_range, inertias, "bo-")
plt.xlabel("Number of clusters (K)")
plt.ylabel("Inertia")
plt.title("Elbow Method for Optimal K")
plt.grid(True, alpha=0.3)
plt.show()

# Fit final K-Means with chosen K (e.g., 4)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
y_kmeans = kmeans.fit_predict(X)

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap="viridis", alpha=0.7)
plt.scatter(
    kmeans.cluster_centers_[:, 0],
    kmeans.cluster_centers_[:, 1],
    s=200,
    c="red",
    marker="X",
    label="Centroids"
)
plt.title("K-Means Clustering Results")
plt.legend()
plt.show()
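
To make the inertia values on the elbow plot more concrete, the sketch below recomputes inertia by hand for the K=4 model and compares it with the inertia_ attribute. It regenerates the same synthetic data so that it runs on its own. The variable names assigned_centroids and manual_inertia are chosen for this illustration, and n_init=10 is an explicit choice rather than something the example requires.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the example above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Look up, for every point, the centroid of the cluster it was assigned to
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]

# Inertia: sum of squared distances from each point to its assigned centroid
manual_inertia = np.sum((X - assigned_centroids) ** 2)

print(manual_inertia, kmeans.inertia_)
```

The two printed values should agree up to floating-point rounding. Inertia always decreases as K grows, which is why the elbow method looks for the point where the decrease levels off rather than for the minimum.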

