name: Union Find
slug: union-find
difficulty_level: 3
pattern_type: data_structure
display_order: 18
description: >
Track disjoint sets with efficient union and find operations. Union-Find
(also called Disjoint Set Union) excels at dynamically grouping elements and
answering "are these two elements in the same group?" queries.
when_to_use: |
- Finding connected components
- Detecting cycles in undirected graphs
- Kruskal's minimum spanning tree
- Dynamic connectivity queries
- Grouping related items (accounts merge, friend circles)
metaphor: |
Imagine a social network where you want to know if two people are connected
(directly or through friends of friends). Instead of searching the entire
network each time, everyone in a connected group points to a group leader.
To check if two people are connected, just check if they have the same leader.
When groups merge (someone bridges two groups), you just update one leader to
point to the other.
Another analogy: corporate acquisitions. Each company has a parent company
(possibly itself). When companies merge, one becomes a subsidiary of the other.
To find the ultimate parent, you follow the chain of ownership.
core_concept: |
Union-Find maintains a forest of trees where each tree represents a set.
Each element points to its parent, and the root of the tree is the set's
representative.
**Two key operations:**
- **Find(x)**: Return the root (representative) of x's set
- **Union(x, y)**: Merge the sets containing x and y
**Two key optimizations:**
- **Path compression**: During Find, make each node on the search path point
directly to the root. This flattens the tree for future queries.
- **Union by rank/size**: Always attach the smaller (or lower-rank) tree under
the larger. This keeps trees shallow.
With both optimizations, operations run in nearly O(1) amortized time;
technically O(α(n)) amortized per operation, where α is the inverse Ackermann
function, which is ≤ 4 for any practical input size.
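As a rough illustration of the "attach the smaller tree under the larger" rule,
here is a minimal sketch of union by size. The names `SizedUnionFind`, `size`,
and `component_size` are assumptions made for this example; they are not part
of this pattern's template, which tracks rank instead.

```python
class SizedUnionFind:
    """Illustrative sketch: union by size plus path compression."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n  # size[r] is only meaningful while r is a root

    def find(self, x: int) -> int:
        # Recursive path compression: re-point x at the root on the way back.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False  # already in the same set
        # Make root_x the root of the larger tree, then hang root_y under it.
        if self.size[root_x] < self.size[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        self.size[root_x] += self.size[root_y]
        return True

    def component_size(self, x: int) -> int:
        """Number of elements in the set containing x."""
        return self.size[self.find(x)]
```

Tracking size rather than rank costs the same and gives the size of any
component as a side benefit, which some problems ask for directly.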
visualization: |
**Initial state (each element is its own set):**
```
parent: [0, 1, 2, 3, 4] (each points to itself)
Sets: {0}, {1}, {2}, {3}, {4}
```
**Union(0, 1):**
```
parent: [0, 0, 2, 3, 4] (1 now points to 0)
0
|
1
Sets: {0, 1}, {2}, {3}, {4}
```
**Union(2, 3) then Union(3, 4):**
```
Union(3, 4): Find(3) = 2, Find(4) = 4, so root 4 is attached under root 2
parent: [0, 0, 2, 2, 2]
  0      2
  |     / \
  1    3   4
Sets: {0, 1}, {2, 3, 4}
```
**Union(1, 4) (merges the two trees):**
```
Find(1) = 0, Find(4) = 2
Union by rank: both roots have rank 1, so attach 2 under 0 and raise rank[0] to 2
parent: [0, 0, 0, 2, 2]
    0
   / \
  1   2
     / \
    3   4
Sets: {0, 1, 2, 3, 4}
```
**Path compression during Find(4):**
```
Find(4): 4 → 2 → 0 (found root)
Compress: make 4 point directly to 0 (2 already does)
parent: [0, 0, 0, 2, 0]
     0
   / | \
  1  2  4
     |
     3
The next Find(4) takes a single step. Find(3) would re-point 3 to 0 the same way.
```
code_template: |
from collections import defaultdict


class UnionFind:
    """Union-Find with path compression and union by rank."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n  # Number of disjoint sets

    def find(self, x: int) -> int:
        """Find root with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """Union by rank. Returns True if x and y were in different sets."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False  # Already in same set
        # Union by rank: make root_x the root with the higher (or equal) rank
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        self.count -= 1
        return True

    def connected(self, x: int, y: int) -> bool:
        """Check if x and y are in the same set."""
        return self.find(x) == self.find(y)


def count_components(n: int, edges: list[list[int]]) -> int:
    """Count connected components in undirected graph."""
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.count


def has_cycle(n: int, edges: list[list[int]]) -> bool:
    """Detect cycle in undirected graph."""
    uf = UnionFind(n)
    for u, v in edges:
        # union returns False when u and v are already connected,
        # which means this edge would close a cycle
        if not uf.union(u, v):
            return True
    return False


def kruskal_mst(n: int, edges: list[tuple[int, int, int]]) -> int:
    """Kruskal's MST algorithm using Union-Find.

    Returns the MST weight, or -1 if the graph is not connected.
    """
    # edges are (weight, u, v); sorted() avoids mutating the caller's list
    uf = UnionFind(n)
    mst_weight = 0
    edges_used = 0
    for weight, u, v in sorted(edges):
        if uf.union(u, v):
            mst_weight += weight
            edges_used += 1
            if edges_used == n - 1:
                break
    return mst_weight if edges_used == n - 1 else -1


def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    """Merge accounts with common emails."""
    email_to_id = {}
    email_to_name = {}
    uf = UnionFind(len(accounts))
    # Map each email to the first account index that listed it
    for i, account in enumerate(accounts):
        name = account[0]
        for email in account[1:]:
            email_to_name[email] = name
            if email in email_to_id:
                uf.union(i, email_to_id[email])
            else:
                email_to_id[email] = i
    # Group emails by root account
    root_to_emails = defaultdict(set)
    for email, idx in email_to_id.items():
        root = uf.find(idx)
        root_to_emails[root].add(email)
    # Build result: [name, email1, email2, ...] with emails sorted
    return [[email_to_name[next(iter(emails))]] + sorted(emails)
            for emails in root_to_emails.values()]
recognition_signals:
- "connected components"
- "disjoint sets"
- "union"
- "groups"
- "merge accounts"
- "friend circles"
- "detect cycle undirected"
- "Kruskal"
- "minimum spanning tree"
- "redundant connection"
- "equivalence"
common_mistakes:
- title: Forgetting path compression
description: |
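Without path compression, trees can stay tall: each Find costs O(n) with
naive linking, or O(log n) if union by rank is still used. Repeated Finds
on the same deep node pay that full cost every time.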
fix: |
Always compress paths during Find:
```python
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
```
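The recursive form above makes one call per node on the path, so a very long
chain (possible when union by rank is skipped) can exceed Python's default
recursion limit. A minimal iterative sketch, written as a standalone function
over a plain `parent` list; the name `find_iterative` is hypothetical:

```python
def find_iterative(parent: list[int], x: int) -> int:
    """Return the root of x, compressing the path without recursion."""
    # First pass: walk up to the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: re-point every node on the path directly at the root.
    while parent[x] != root:
        next_x = parent[x]
        parent[x] = root
        x = next_x
    return root
```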
- title: Using Union-Find for directed graphs
description: |
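Union-Find ignores edge direction, so it only models undirected
connectivity. A directed cycle requires following edges in their direction
(a back edge in DFS). For example, the edges 0→1, 0→2, 1→2 contain no
directed cycle, yet Union-Find would report one.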
fix: |
Use DFS with coloring (WHITE/GRAY/BLACK) for cycle detection in directed
graphs. Union-Find is for undirected connectivity.
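A minimal sketch of the three-color DFS mentioned above, assuming nodes are
numbered 0 to n-1 and edges are (source, target) pairs. The function name
`has_directed_cycle` is chosen for this example only.

```python
def has_directed_cycle(n: int, edges: list[tuple[int, int]]) -> bool:
    """Three-color DFS: reaching a GRAY node again means a back edge."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    graph = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)
    color = [WHITE] * n

    def visit(u: int) -> bool:
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:
                return True  # v is an ancestor on the current DFS path
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    # Start a DFS from every node that has not been reached yet.
    return any(color[s] == WHITE and visit(s) for s in range(n))
```

This version is recursive, so very deep graphs may need an explicit stack
instead.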
- title: Not tracking component count
description: |
For problems asking "how many components," recounting distinct roots with
an O(n) scan is wasteful, especially when the count is needed after every
union.
fix: |
Decrement count in union when merging two different sets:
```python
if root_x != root_y:
self.count -= 1
```
- title: Union not reporting whether a merge happened
description: |
Some solutions need to know if a union actually merged two sets or if
they were already connected.
fix: |
Return boolean from union indicating if merge happened:
```python
if root_x == root_y:
return False # Already same set
# ... do union ...
return True # Merged
```
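One way the boolean gets used: in a "Redundant Connection" style problem, the
first edge whose union returns False is the edge that closes a cycle. The
sketch below assumes 0-indexed nodes and skips union by rank for brevity; the
name `find_redundant_edge` is hypothetical.

```python
from typing import Optional


def find_redundant_edge(
    n: int, edges: list[tuple[int, int]]
) -> Optional[tuple[int, int]]:
    """Return the first edge that joins two already-connected nodes."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving: skip one level
            x = parent[x]
        return x

    def union(x: int, y: int) -> bool:
        root_x, root_y = find(x), find(y)
        if root_x == root_y:
            return False  # already same set, nothing merged
        parent[root_y] = root_x
        return True  # merged

    for u, v in edges:
        if not union(u, v):
            return (u, v)
    return None
```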
variations:
- name: Basic connectivity
description: |
Track whether elements are in the same connected component.
example: "Number of Connected Components, Friend Circles"
- name: Cycle detection
description: |
If union is called on two already-connected elements, adding that edge
would create a cycle.
example: "Redundant Connection, Graph Valid Tree"
- name: Kruskal's MST
description: |
Sort edges by weight, greedily add edges that don't create cycles
(checked via Union-Find).
example: "Min Cost to Connect All Points, Connecting Cities With Minimum Cost"
- name: Dynamic connectivity
description: |
Handle streaming edge insertions while answering connectivity queries.
Plain Union-Find supports insertions only; it cannot undo a union when an
edge is removed.
example: "Evaluate Division, Accounts Merge"
- name: Weighted Union-Find
description: |
Track relative weights/distances between elements and their roots.
Used in problems with equivalence relationships.
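A minimal sketch of the idea, assuming additive offsets: each node stores the
difference between its value and its parent's value, and Find folds those
differences together while compressing. The class and method names
(`WeightedUnionFind`, `offset`, `difference`) are illustrative assumptions,
and union by rank is left out to keep the example short. Ratio-based problems
can use the same structure by multiplying instead of adding.

```python
from typing import Optional


class WeightedUnionFind:
    """Illustrative sketch: offset[x] = value(x) - value(parent[x])."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.offset = [0] * n

    def find(self, x: int) -> int:
        """Return the root of x; afterwards offset[x] is relative to the root."""
        if self.parent[x] != x:
            original_parent = self.parent[x]
            self.parent[x] = self.find(original_parent)
            # original_parent's offset is now measured from the root
            self.offset[x] += self.offset[original_parent]
        return self.parent[x]

    def union(self, x: int, y: int, diff: int) -> bool:
        """Record value(x) - value(y) = diff.

        Returns False if this contradicts what is already known.
        """
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return self.offset[x] - self.offset[y] == diff
        # Attach root_x under root_y with the offset implied by diff.
        self.parent[root_x] = root_y
        self.offset[root_x] = diff + self.offset[y] - self.offset[x]
        return True

    def difference(self, x: int, y: int) -> Optional[int]:
        """value(x) - value(y), or None if x and y are unrelated."""
        if self.find(x) != self.find(y):
            return None
        return self.offset[x] - self.offset[y]
```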
example: "Evaluate Division (weighted paths)"
related_patterns:
- dfs
- bfs
prerequisite_patterns: []