Union-find (disjoint set)

Union-find (also called disjoint set union, DSU) maintains a partition of vertices into connected components under two fast operations: find (which component?) and union (merge components). It is foundational for Kruskal's MST, offline connectivity queries, and cycle checks in incremental undirected graphs.

On this page
  • Parent-forest representation
  • find, union, and connected query
  • Path compression + union by rank
  • C implementation and trace

1) DSU model

Each set is represented by a rooted tree. Array parent[x] stores a node's parent; a root is its own parent. find(x) returns the root representative. union(a,b) links one root under the other if they differ.

Initially: 0  1  2  3  4  5   (each is its own set)
After union(0,1), union(1,2): 0<-1<-2   and  3  4  5
Representative(find(2)) = 0

2) Operations

Core DSU API
Operation Meaning Typical use
make_set(x)Create singleton set for x.Initialization for all vertices.
find(x)Return component representative.Check whether two vertices are already connected.
union(a,b)Merge sets containing a and b.Add new edge between components.
same(a,b)find(a)==find(b)?Cycle check in undirected edge insertion.

3) Why it is fast

  • Path compression: after find(x), every visited node points directly to the root.
  • Union by rank/size: attach the smaller (or shallower) root under the larger root.
  • Together they give amortized O(α(V)) per operation (inverse Ackermann, practically constant).

4) C — DSU with path compression + rank

#include <stdio.h>

#define N 6

static int parent[N];
static int rnk[N];

void dsu_init(void) {
    for (int i = 0; i < N; i++) {
        parent[i] = i;   /* singleton set: root is itself */
        rnk[i] = 0;
    }
}

int dsu_find(int x) {
    if (parent[x] != x)
        parent[x] = dsu_find(parent[x]);   /* path compression */
    return parent[x];
}

void dsu_union(int a, int b) {
    a = dsu_find(a);
    b = dsu_find(b);
    if (a == b) return;   /* already same component */

    /* union by rank */
    if (rnk[a] < rnk[b]) { int t = a; a = b; b = t; }
    parent[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

int same_set(int a, int b) {
    return dsu_find(a) == dsu_find(b);
}

int main(void) {
    dsu_init();
    dsu_union(0, 1);
    dsu_union(1, 2);
    dsu_union(3, 4);
    printf("same(0,2) = %d\n", same_set(0, 2));  /* 1 */
    printf("same(2,4) = %d\n", same_set(2, 4));  /* 0 */
    dsu_union(2, 4);
    printf("same(0,4) = %d\n", same_set(0, 4));  /* 1 */
    return 0;
}

5) Trace — component merges

Sets on N = 6 vertices
Step Operation Components (informal)
Initmake_set(0..5){0} {1} {2} {3} {4} {5}
1union(0,1){0,1} {2} {3} {4} {5}
2union(1,2){0,1,2} {3} {4} {5}
3union(3,4){0,1,2} {3,4} {5}
4same(2,4)false (different reps)
5union(2,4){0,1,2,3,4} {5}

6) Graph uses

  • Kruskal MST: include edge only if endpoints are in different sets.
  • Cycle detection (undirected): adding edge (u,v) forms a cycle if same(u,v) is already true.
  • Offline connectivity: process unions, answer whether two nodes are connected.

Key takeaway

Union-find is a tiny but powerful structure: two arrays plus two optimizations give near-constant component operations. It is the engine behind Kruskal's algorithm and many connectivity problems.

Related: Kruskal's algorithm · Minimum spanning tree · Cycle detection · Graph representation