name: Union Find
slug: union-find
difficulty_level: 3
pattern_type: data_structure
display_order: 18

description: >
  Track disjoint sets with efficient union and find operations. Union-Find
  (also called Disjoint Set Union) excels at dynamically grouping elements and
  answering "are these two elements in the same group?" queries.

when_to_use: |
  - Finding connected components
  - Detecting cycles in undirected graphs
  - Kruskal's minimum spanning tree
  - Dynamic connectivity queries
  - Grouping related items (accounts merge, friend circles)

metaphor: |
  Imagine a social network where you want to know if two people are connected
  (directly or through friends of friends). Instead of searching the entire
  network each time, everyone in a connected group points to a group leader.
  To check if two people are connected, just check if they have the same leader.
  When groups merge (someone bridges two groups), you just update one leader to
  point to the other.

  Another analogy: corporate acquisitions. Each company has a parent company
  (possibly itself). When companies merge, one becomes a subsidiary of the other.
  To find the ultimate parent, you follow the chain of ownership.

core_concept: |
  Union-Find maintains a forest of trees where each tree represents a set.
  Each element points to its parent, and the root of the tree is the set's
  representative.

  **Two key operations:**
  - **Find(x)**: Return the root (representative) of x's set
  - **Union(x, y)**: Merge the sets containing x and y

  **Two key optimizations:**
  - **Path compression**: During Find, make every node on the search path
    point directly to the root. This flattens the tree for future queries.
  - **Union by rank/size**: Always attach the root of the shorter (lower-rank)
    or smaller tree under the other root. This keeps trees shallow.

  With both optimizations, each operation runs in amortized O(α(n)) time,
  where α is the inverse Ackermann function. α(n) is ≤ 4 for any practical
  input size, so operations are effectively constant time.

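  The two optimizations above can be sketched as follows. This is a minimal
  illustration, assuming union by size and an iterative two-pass Find in place
  of the recursive, rank-based form. The class name `SizedUnionFind` and the
  `size` array are hypothetical names chosen for this example:

  ```python
  class SizedUnionFind:
      """Illustrative variant: union by size, iterative path compression."""

      def __init__(self, n: int):
          self.parent = list(range(n))
          self.size = [1] * n  # size[r] is meaningful only while r is a root

      def find(self, x: int) -> int:
          # Pass 1: walk up to the root.
          root = x
          while self.parent[root] != root:
              root = self.parent[root]
          # Pass 2: re-point every node on the path directly at the root.
          while self.parent[x] != root:
              next_node = self.parent[x]
              self.parent[x] = root
              x = next_node
          return root

      def union(self, x: int, y: int) -> bool:
          root_x, root_y = self.find(x), self.find(y)
          if root_x == root_y:
              return False  # Already in the same set
          # Attach the smaller tree under the larger one.
          if self.size[root_x] < self.size[root_y]:
              root_x, root_y = root_y, root_x
          self.parent[root_y] = root_x
          self.size[root_x] += self.size[root_y]
          return True
  ```

  One reason to prefer size over rank in a sketch like this: `size` doubles as
  a per-set element count, which some problems ask for directly.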
visualization: |
  **Initial state (each element is its own set):**

  ```
  parent: [0, 1, 2, 3, 4]  (each points to itself)

  Sets: {0}, {1}, {2}, {3}, {4}
  ```

  **Union(0, 1):**

  ```
  parent: [0, 0, 2, 3, 4]  (1 now points to 0)

    0
    |
    1

  Sets: {0, 1}, {2}, {3}, {4}
  ```

  **Union(2, 3) then Union(3, 4):**

  ```
  Find(3) = 2, Find(4) = 4
  Union by rank: root 2 has the higher rank, so 4 attaches directly under 2

  parent: [0, 0, 2, 2, 2]

    0       2
    |      / \
    1     3   4

  Sets: {0, 1}, {2, 3, 4}
  ```

  **Union(1, 4), which merges the two trees:**

  ```
  Find(1) = 0, Find(4) = 2
  Union by rank: both roots have rank 1, so the first root (0) wins the tie
  and its rank becomes 2

  parent: [0, 0, 0, 2, 2]

      0
     / \
    1   2
       / \
      3   4

  Sets: {0, 1, 2, 3, 4}
  ```

  **Path compression during Find(4):**

  ```
  Find(4): 4 → 2 → 0  (found root)
  Compress: make 4 point directly to 0 (2 already does)

  parent: [0, 0, 0, 2, 0]

      0
     /|\
    1 2 4
      |
      3

  Now Find(4) takes a single hop. 3 is re-pointed the next time Find(3) runs.
  ```

code_template: |
  class UnionFind:
      """Union-Find with path compression and union by rank."""

      def __init__(self, n: int):
          self.parent = list(range(n))
          self.rank = [0] * n
          self.count = n  # Number of disjoint sets

      def find(self, x: int) -> int:
          """Find root with path compression."""
          if self.parent[x] != x:
              self.parent[x] = self.find(self.parent[x])
          return self.parent[x]

      def union(self, x: int, y: int) -> bool:
          """Union by rank. Returns True if x and y were in different sets."""
          root_x, root_y = self.find(x), self.find(y)

          if root_x == root_y:
              return False  # Already in same set

          # Union by rank: make root_x the root with the higher rank
          if self.rank[root_x] < self.rank[root_y]:
              root_x, root_y = root_y, root_x

          self.parent[root_y] = root_x

          # Equal ranks: the merged tree is one level taller
          if self.rank[root_x] == self.rank[root_y]:
              self.rank[root_x] += 1

          self.count -= 1
          return True

      def connected(self, x: int, y: int) -> bool:
          """Check if x and y are in the same set."""
          return self.find(x) == self.find(y)


  def count_components(n: int, edges: list[list[int]]) -> int:
      """Count connected components in undirected graph."""
      uf = UnionFind(n)

      for u, v in edges:
          uf.union(u, v)

      return uf.count


  def has_cycle(n: int, edges: list[list[int]]) -> bool:
      """Detect cycle in undirected graph."""
      uf = UnionFind(n)

      for u, v in edges:
          if uf.connected(u, v):
              return True  # Adding edge creates cycle
          uf.union(u, v)

      return False


  def kruskal_mst(n: int, edges: list[tuple[int, int, int]]) -> int:
      """Kruskal's MST using Union-Find. Returns -1 if the graph is disconnected."""
      # edges are (weight, u, v); sorted() orders by weight without mutating the input
      uf = UnionFind(n)
      mst_weight = 0
      edges_used = 0

      for weight, u, v in sorted(edges):
          if uf.union(u, v):  # Edge joins two components, so it cannot form a cycle
              mst_weight += weight
              edges_used += 1
              if edges_used == n - 1:
                  break

      return mst_weight if edges_used == n - 1 else -1


  def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
      """Merge accounts with common emails."""
      from collections import defaultdict

      email_to_id = {}
      email_to_name = {}
      uf = UnionFind(len(accounts))

      # Map each email to the first account listing it; union later accounts with it
      for i, account in enumerate(accounts):
          name = account[0]
          for email in account[1:]:
              email_to_name[email] = name
              if email in email_to_id:
                  uf.union(i, email_to_id[email])
              else:
                  email_to_id[email] = i

      # Group emails by root account
      root_to_emails = defaultdict(set)
      for email, idx in email_to_id.items():
          root = uf.find(idx)
          root_to_emails[root].add(email)

      # Build result: [name, sorted emails...] per merged account
      return [[email_to_name[next(iter(emails))]] + sorted(emails)
              for emails in root_to_emails.values()]

recognition_signals:
  - "connected components"
  - "disjoint sets"
  - "union"
  - "groups"
  - "merge accounts"
  - "friend circles"
  - "detect cycle undirected"
  - "Kruskal"
  - "minimum spanning tree"
  - "redundant connection"
  - "equivalence"

common_mistakes:
  - title: Forgetting path compression
    description: |
      Without path compression, trees stay as deep as the unions built them:
      up to O(n) per Find with naive linking, or O(log n) with union by rank
      alone. Repeated Find calls pay that cost every time.
    fix: |
      Always compress paths during Find:
      ```python
      if self.parent[x] != x:
          self.parent[x] = self.find(self.parent[x])
      ```

  - title: Using Union-Find for directed graphs
    description: |
      Union-Find treats every edge as undirected. In a directed graph it would
      flag the edges 0→1, 0→2, 1→2 as a cycle even though no directed cycle
      exists. A directed cycle requires a back edge in DFS.
    fix: |
      Use DFS with coloring (WHITE/GRAY/BLACK) for cycle detection in directed
      graphs. Union-Find is for undirected connectivity.

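      The coloring approach named above can be sketched as follows. This is
      a minimal illustration, assuming nodes are numbered 0..n-1 and edges are
      (u, v) pairs meaning u → v. The function name `has_directed_cycle` is
      hypothetical:

      ```python
      WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished


      def has_directed_cycle(n: int, edges: list[tuple[int, int]]) -> bool:
          """Return True if the directed graph on nodes 0..n-1 has a cycle."""
          graph = [[] for _ in range(n)]
          for u, v in edges:
              graph[u].append(v)  # u -> v only; direction matters

          color = [WHITE] * n

          def visit(u: int) -> bool:
              color[u] = GRAY
              for v in graph[u]:
                  if color[v] == GRAY:  # Back edge to a node on the current path
                      return True
                  if color[v] == WHITE and visit(v):
                      return True
              color[u] = BLACK  # Fully explored; no cycle reachable from u
              return False

          return any(color[u] == WHITE and visit(u) for u in range(n))
      ```

      The recursive `visit` is fine for small graphs. Very deep graphs would
      need an explicit stack to stay within Python's recursion limit.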
  - title: Not tracking component count
    description: |
      For problems asking "how many components," counting distinct roots at
      the end requires an extra O(n) pass over all elements.
    fix: |
      Start count at n and decrement it in union whenever two different sets
      merge:
      ```python
      if root_x != root_y:
          self.count -= 1
      ```

  - title: Union returning wrong information
    description: |
      Cycle detection and Kruskal's algorithm need to know whether a union
      actually merged two sets or whether the elements were already connected.
      A union that returns nothing forces an extra pair of Find calls.
    fix: |
      Return boolean from union indicating if merge happened:
      ```python
      if root_x == root_y:
          return False  # Already same set
      # ... do union ...
      return True  # Merged
      ```

variations:
  - name: Basic connectivity
    description: |
      Track whether elements are in the same connected component.
    example: "Number of Connected Components, Friend Circles"

  - name: Cycle detection
    description: |
      If union is called on two already-connected elements, adding that edge
      would create a cycle.
    example: "Redundant Connection, Graph Valid Tree"

  - name: Kruskal's MST
    description: |
      Sort edges by weight, greedily add edges that don't create cycles
      (checked via Union-Find).
    example: "Min Cost to Connect All Points, Connecting Cities With Minimum Cost"

  - name: Dynamic connectivity
    description: |
      Handle streaming edge insertions while answering connectivity queries.
    example: "Evaluate Division, Accounts Merge"

  - name: Weighted Union-Find
    description: |
      Track relative weights/distances between elements and their roots.
      Used when elements are related by a ratio or difference, not only by
      set membership.
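
      A minimal sketch of the idea, assuming weights are additive offsets
      (each element stores its value minus its root's value). The class name
      `WeightedUnionFind`, the `offset` array, and the `diff` method are
      hypothetical. Ratio problems follow the same structure with
      multiplication in place of addition. Union by rank is omitted for
      brevity:

      ```python
      class WeightedUnionFind:
          """Illustrative sketch: offset[x] == value(x) - value(root of x)."""

          def __init__(self, n: int):
              self.parent = list(range(n))
              self.offset = [0] * n

          def find(self, x: int) -> int:
              if self.parent[x] != x:
                  old_parent = self.parent[x]
                  root = self.find(old_parent)  # offset[old_parent] is now root-relative
                  self.offset[x] += self.offset[old_parent]
                  self.parent[x] = root
              return self.parent[x]

          def union(self, x: int, y: int, delta: int) -> bool:
              """Record value(y) - value(x) == delta.

              Returns False only if this contradicts previously recorded facts.
              """
              root_x, root_y = self.find(x), self.find(y)
              if root_x == root_y:
                  return self.offset[y] - self.offset[x] == delta
              # Attach root_y under root_x so the recorded difference holds.
              self.parent[root_y] = root_x
              self.offset[root_y] = delta + self.offset[x] - self.offset[y]
              return True

          def diff(self, x: int, y: int):
              """Return value(y) - value(x), or None if x and y are unrelated."""
              if self.find(x) != self.find(y):
                  return None
              return self.offset[y] - self.offset[x]
      ```

      Note that `union` here reports consistency, not whether a merge
      happened, which differs from the boolean used in the main template.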
    example: "Evaluate Division (weighted paths)"

related_patterns:
  - dfs
  - bfs

prerequisite_patterns: []