
DBSCAN Clustering

Learn how DBSCAN groups dense regions of points into clusters and marks outliers as noise, which makes it useful for finding arbitrarily shaped clusters.

What is DBSCAN?

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.

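  • Clusters are contiguous regions of high point density, separated by regions of low density.
  • Points in low-density regions that do not belong to any cluster are labeled as noise (label -1 in scikit-learn).
  • Parameters:
    • eps: the neighborhood radius, i.e. the maximum distance between two points for them to count as neighbors.
    • min_samples: the minimum number of points (including the point itself) that must lie within eps of a point for it to be a core point of a dense region.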
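To make the roles of eps and min_samples concrete, here is a minimal from-scratch sketch of the classic DBSCAN procedure in Python with NumPy. It rests on simplifying assumptions: Euclidean distance, a brute-force O(n²) pairwise distance matrix, and a hypothetical helper named dbscan_sketch. It is an illustration, not scikit-learn's implementation. Core points have at least min_samples neighbors within eps (counting themselves), clusters grow outward from core points, border points join a cluster without expanding it, and any point never reached from a core point is noise.

```python
import numpy as np

NOISE = -1
UNVISITED = -2


def dbscan_sketch(X, eps, min_samples):
    """Hypothetical brute-force DBSCAN for small illustrative datasets."""
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances (O(n^2) memory, fine for toy data)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # eps-neighborhood of each point, including the point itself
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]

    # Core point: at least min_samples points (itself included) within eps
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, UNVISITED)
    cluster_id = 0
    for i in range(n):
        # Only an unassigned core point can start a new cluster
        if labels[i] != UNVISITED or not is_core[i]:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == UNVISITED:
                    labels[q] = cluster_id
                    stack.append(q)
        cluster_id += 1

    # Points never reached from any core point are noise
    labels[labels == UNVISITED] = NOISE
    return labels, is_core


# Two tight groups of three points plus one isolated point
X_toy = np.array([[0, 0], [0, 0.1], [0.1, 0],
                  [5, 5], [5, 5.1], [5.1, 5],
                  [10, 10]])
labels, is_core = dbscan_sketch(X_toy, eps=0.5, min_samples=3)
print(labels.tolist())   # [0, 0, 0, 1, 1, 1, -1]
print(is_core.tolist())  # [True, True, True, True, True, True, False]
```

Because it builds a full distance matrix, this sketch only suits small inputs. As in DBSCAN generally, a border point within eps of core points from two different clusters is assigned to whichever cluster reaches it first, so its label can depend on processing order.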

Example: DBSCAN with Noisy Data

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

# Generate non-linearly separable data (two moons)
X, y_true = make_moons(
    n_samples=300,
    noise=0.05,
    random_state=42
)

dbscan = DBSCAN(
    eps=0.2,        # max distance for two points to count as neighbors
    min_samples=5   # points within eps (incl. the point itself) for a core point
)

labels = dbscan.fit_predict(X)

# Color each point by its cluster label; noise points have label -1
plt.figure(figsize=(8, 6))
scatter = plt.scatter(
    X[:, 0], X[:, 1],
    c=labels,
    cmap="viridis",
    alpha=0.8
)
plt.colorbar(scatter, label="Cluster label (-1 = noise)")
plt.title("DBSCAN Clustering (Two Moons)")
plt.show()
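The example above only plots the labels. As a follow-up sketch, the snippet below refits the same model on the same data and summarizes the result: it counts clusters and noise points from the fitted labels_ attribute and uses core_sample_indices_ to separate core points from border points. The variable names (n_clusters, n_noise, core_mask, border_mask) are illustrative, and the exact counts depend on eps, min_samples, and the generated data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
dbscan = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = dbscan.labels_

# Noise points carry the label -1; every other label is a cluster id
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

# core_sample_indices_ holds the indices of the core points
core_mask = np.zeros(len(X), dtype=bool)
core_mask[dbscan.core_sample_indices_] = True

# Border points: assigned to a cluster but not core points themselves
border_mask = (labels != -1) & ~core_mask

print(f"clusters: {n_clusters}, noise points: {n_noise}")
print(f"core points: {core_mask.sum()}, border points: {border_mask.sum()}")
```

Every point falls into exactly one of the three groups (core, border, or noise), which is a quick sanity check when tuning eps and min_samples.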