feat(patterns): data structure tutorials
313
backend/data/patterns/union-find.yaml
Normal file
@@ -0,0 +1,313 @@
name: Union Find
slug: union-find
difficulty_level: 3

description: >
  Track disjoint sets with efficient union and find operations. Union-Find
  (also called Disjoint Set Union) excels at dynamically grouping elements and
  answering "are these two elements in the same group?" queries.

when_to_use: |
  - Finding connected components
  - Detecting cycles in undirected graphs
  - Kruskal's minimum spanning tree
  - Dynamic connectivity queries
  - Grouping related items (accounts merge, friend circles)

metaphor: |
  Imagine a social network where you want to know if two people are connected
  (directly or through friends of friends). Instead of searching the entire
  network each time, everyone in a connected group points to a group leader.
  To check if two people are connected, just check if they have the same leader.
  When groups merge (someone bridges two groups), you just update one leader to
  point to the other.

  Another analogy: corporate acquisitions. Each company has a parent company
  (possibly itself). When companies merge, one becomes a subsidiary of the other.
  To find the ultimate parent, you follow the chain of ownership.

core_concept: |
  Union-Find maintains a forest of trees where each tree represents a set.
  Each element points to its parent, and the root of the tree is the set's
  representative.

  **Two key operations:**
  - **Find(x)**: Return the root (representative) of x's set
  - **Union(x, y)**: Merge the sets containing x and y

  **Two key optimizations:**
  - **Path compression**: During Find, make each node on the path point
    directly to the root. This flattens the tree for future queries.
  - **Union by rank/size**: Always attach the root of the smaller (or
    lower-rank) tree under the root of the larger. This keeps trees shallow.

  With both optimizations, operations run in nearly O(1) amortized time:
  technically O(α(n)), where α is the inverse Ackermann function, which is
  ≤ 4 for any practical input size.

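The "attach the smaller tree under the larger" rule can also be implemented with explicit set sizes instead of ranks. The following is a minimal sketch under that assumption; the class name `SizedUnionFind` and the `set_size` helper are illustrative and not part of this pattern file. It uses a two-pass iterative Find, which compresses the path without recursion.

```python
class SizedUnionFind:
    """Disjoint sets with union by size and iterative path compression."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n  # size[r] is only meaningful while r is a root

    def find(self, x: int) -> int:
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x: int, y: int) -> bool:
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False  # already in the same set
        # Attach the root of the smaller set under the root of the larger one.
        if self.size[root_x] < self.size[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        self.size[root_x] += self.size[root_y]
        return True

    def set_size(self, x: int) -> int:
        """Number of elements in the set containing x."""
        return self.size[self.find(x)]


uf = SizedUnionFind(5)
uf.union(0, 1)
uf.union(2, 3)
uf.union(3, 4)
uf.union(1, 4)  # the 2-element set {0, 1} goes under the 3-element set {2, 3, 4}
print(uf.find(0), uf.set_size(0))  # prints "2 5"
```

Tracking sizes costs the same as tracking ranks and gives the size of any set for free, which some grouping problems ask for directly.
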
visualization: |
  **Initial state (each element is its own set):**

  ```
  parent: [0, 1, 2, 3, 4]  (each points to itself)

  Sets: {0}, {1}, {2}, {3}, {4}
  ```

  **Union(0, 1):**

  ```
  parent: [0, 0, 2, 3, 4]  (1 now points to 0)

      0
      |
      1

  Sets: {0, 1}, {2}, {3}, {4}
  ```

  **Union(2, 3) then Union(3, 4):**

  ```
  Find(3) = 2, so Union(3, 4) attaches 4 directly under root 2

  parent: [0, 0, 2, 2, 2]

      0       2
      |      / \
      1     3   4

  Sets: {0, 1}, {2, 3, 4}
  ```

  **Union(1, 4), merging the two trees:**

  ```
  Find(1) = 0, Find(4) = 2
  Union by rank: both roots have rank 1, so root 2 is attached under
  root 0 and rank[0] becomes 2

  parent: [0, 0, 0, 2, 2]

        0
       / \
      1   2
         / \
        3   4

  Sets: {0, 1, 2, 3, 4}
  ```

  **Path compression during Find(4):**

  ```
  Find(4): 4 → 2 → 0  (found root)
  Compress: make 4 point directly to 0 (2 already does)

  parent: [0, 0, 0, 2, 0]

        0
       /|\
      1 2 4
        |
        3

  Now Find(4) is O(1)!
  ```

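Hand-drawn traces of the parent array are easy to get wrong, so it can help to replay a union sequence in code and inspect the array after each step. The sketch below assumes union by rank with recursive path compression; `replay` is a hypothetical helper name introduced here for illustration.

```python
def replay(n: int, unions: list[tuple[int, int]]) -> list[list[int]]:
    """Apply each union in order and record a copy of the parent array after it."""
    parent = list(range(n))
    rank = [0] * n

    def find(x: int) -> int:
        if parent[x] != x:
            parent[x] = find(parent[x])  # path compression
        return parent[x]

    snapshots = []
    for x, y in unions:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # Union by rank: the lower-rank root goes under the higher-rank root.
            if rank[root_x] < rank[root_y]:
                root_x, root_y = root_y, root_x
            parent[root_y] = root_x
            if rank[root_x] == rank[root_y]:
                rank[root_x] += 1
        snapshots.append(parent.copy())
    return snapshots


# Replay the sequence Union(0, 1), Union(2, 3), Union(3, 4), Union(1, 4).
for step in replay(5, [(0, 1), (2, 3), (3, 4), (1, 4)]):
    print(step)
```

Note that a union always links two roots, never two arbitrary nodes, which is why the array should be checked against `find` results rather than against the arguments of each union.
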
code_template: |
  class UnionFind:
      """Union-Find with path compression and union by rank."""

      def __init__(self, n: int):
          self.parent = list(range(n))
          self.rank = [0] * n
          self.count = n  # Number of disjoint sets

      def find(self, x: int) -> int:
          """Find root with path compression."""
          if self.parent[x] != x:
              self.parent[x] = self.find(self.parent[x])
          return self.parent[x]

      def union(self, x: int, y: int) -> bool:
          """Union by rank. Returns True if x and y were in different sets."""
          root_x, root_y = self.find(x), self.find(y)

          if root_x == root_y:
              return False  # Already in same set

          # Union by rank
          if self.rank[root_x] < self.rank[root_y]:
              root_x, root_y = root_y, root_x

          self.parent[root_y] = root_x

          if self.rank[root_x] == self.rank[root_y]:
              self.rank[root_x] += 1

          self.count -= 1
          return True

      def connected(self, x: int, y: int) -> bool:
          """Check if x and y are in the same set."""
          return self.find(x) == self.find(y)


  def count_components(n: int, edges: list[list[int]]) -> int:
      """Count connected components in undirected graph."""
      uf = UnionFind(n)

      for u, v in edges:
          uf.union(u, v)

      return uf.count


  def has_cycle(n: int, edges: list[list[int]]) -> bool:
      """Detect cycle in undirected graph."""
      uf = UnionFind(n)

      for u, v in edges:
          if uf.connected(u, v):
              return True  # Adding edge creates cycle
          uf.union(u, v)

      return False


  def kruskal_mst(n: int, edges: list[tuple[int, int, int]]) -> int:
      """Kruskal's MST algorithm using Union-Find."""
      # edges are (weight, u, v)
      edges.sort()  # Sort by weight
      uf = UnionFind(n)
      mst_weight = 0
      edges_used = 0

      for weight, u, v in edges:
          if uf.union(u, v):
              mst_weight += weight
              edges_used += 1
              if edges_used == n - 1:
                  break

      return mst_weight if edges_used == n - 1 else -1


  def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
      """Merge accounts with common emails."""
      email_to_id = {}
      email_to_name = {}
      uf = UnionFind(len(accounts))

      # Map emails to account indices
      for i, account in enumerate(accounts):
          name = account[0]
          for email in account[1:]:
              email_to_name[email] = name
              if email in email_to_id:
                  uf.union(i, email_to_id[email])
              else:
                  email_to_id[email] = i

      # Group emails by root account
      from collections import defaultdict
      root_to_emails = defaultdict(set)
      for email, idx in email_to_id.items():
          root = uf.find(idx)
          root_to_emails[root].add(email)

      # Build result
      return [[email_to_name[next(iter(emails))]] + sorted(emails)
              for emails in root_to_emails.values()]

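To see the Kruskal idea on concrete input, here is a condensed, self-contained sketch. It is not the template's exact code: it inlines a simplified Find with path halving, omits union by rank for brevity, and sorts a copy of the edges instead of sorting in place. The function name `mst_weight` and the sample graph are made up for illustration.

```python
def mst_weight(n: int, edges: list[tuple[int, int, int]]) -> int:
    """Total MST weight for (weight, u, v) edges, or -1 if the graph is disconnected."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving: skip to the grandparent
            x = parent[x]
        return x

    total = 0
    used = 0
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so it cannot close a cycle
            parent[root_v] = root_u
            total += weight
            used += 1
    return total if used == n - 1 else -1


# Hypothetical 4-node graph, edges given as (weight, u, v).
sample_edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
print(mst_weight(4, sample_edges))  # prints 6
```

The edges of weight 1, 2, and 3 are accepted; the edges of weight 4 and 7 are skipped because both endpoints already share a root when they are considered.
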
recognition_signals:
  - "connected components"
  - "disjoint sets"
  - "union"
  - "groups"
  - "merge accounts"
  - "friend circles"
  - "detect cycle undirected"
  - "Kruskal"
  - "minimum spanning tree"
  - "redundant connection"
  - "equivalence"

common_mistakes:
  - title: Forgetting path compression
    description: |
      Without path compression (and without union by rank), trees can
      degenerate into long chains, so repeated Find operations can be O(n)
      each, degrading overall performance.
    fix: |
      Always compress paths during Find:
      ```python
      if self.parent[x] != x:
          self.parent[x] = self.find(self.parent[x])
      ```

  - title: Using Union-Find for directed graphs
    description: |
      Union-Find assumes undirected connections. For directed graphs, cycles
      mean something different (back edges in DFS).
    fix: |
      Use DFS with coloring (WHITE/GRAY/BLACK) for cycle detection in directed
      graphs. Union-Find is for undirected connectivity.

  - title: Not tracking component count
    description: |
      For problems asking "how many components", counting distinct roots at
      the end costs an extra pass of Find calls over every element.
    fix: |
      Decrement count in union when merging two different sets:
      ```python
      if root_x != root_y:
          self.count -= 1
      ```

  - title: Union returning wrong information
    description: |
      Some solutions need to know if a union actually merged two sets or if
      they were already connected.
    fix: |
      Return boolean from union indicating if merge happened:
      ```python
      if root_x == root_y:
          return False  # Already same set
      # ... do union ...
      return True  # Merged
      ```

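For the directed-graph case mentioned in the mistakes above, a three-color DFS might look like the following sketch. The function name `has_directed_cycle` and the edge-list input format are assumptions made for illustration; the recursion is fine for small graphs but would need an explicit stack for very deep ones.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def has_directed_cycle(n: int, edges: list[tuple[int, int]]) -> bool:
    """Return True if the directed graph on nodes 0..n-1 contains a cycle."""
    graph: list[list[int]] = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)  # directed edge u -> v

    color = [WHITE] * n

    def visit(node: int) -> bool:
        color[node] = GRAY
        for neighbor in graph[node]:
            if color[neighbor] == GRAY:  # back edge to a node on the current path
                return True
            if color[neighbor] == WHITE and visit(neighbor):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in range(n))


print(has_directed_cycle(3, [(0, 1), (0, 2), (1, 2)]))  # False
print(has_directed_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True
```

The first graph is a useful contrast: read as undirected edges it forms a triangle, so a Union-Find cycle check would report a cycle, while as a directed graph it has none.
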
variations:
  - name: Basic connectivity
    description: |
      Track whether elements are in the same connected component.
    example: "Number of Connected Components, Friend Circles"

  - name: Cycle detection
    description: |
      If union is called on two already-connected elements, adding that edge
      would create a cycle.
    example: "Redundant Connection, Graph Valid Tree"

  - name: Kruskal's MST
    description: |
      Sort edges by weight, greedily add edges that don't create cycles
      (checked via Union-Find).
    example: "Min Cost to Connect All Points, Connecting Cities With Minimum Cost"

  - name: Dynamic connectivity
    description: |
      Handle streaming edge insertions while answering connectivity queries.
    example: "Evaluate Division, Accounts Merge"

  - name: Weighted Union-Find
    description: |
      Track relative weights/distances between elements and their roots.
      Used in problems with equivalence relationships.
    example: "Evaluate Division (weighted paths)"

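The weighted variation is the least obvious of these, so here is one possible sketch for ratio queries in the style of Evaluate Division. It assumes each node stores the ratio of its own value to its parent's value; the class name `RatioUnionFind` and its method names are hypothetical, and union by rank is left out to keep the weight bookkeeping visible.

```python
from typing import Optional


class RatioUnionFind:
    """Weighted Union-Find where weight[x] == value(x) / value(parent[x])."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.weight: dict[str, float] = {}

    def find(self, x: str) -> str:
        if x not in self.parent:  # lazily create a singleton set
            self.parent[x] = x
            self.weight[x] = 1.0
        if self.parent[x] != x:
            old_parent = self.parent[x]
            root = self.find(old_parent)
            # After the recursive call, weight[old_parent] is old_parent / root,
            # so multiplying turns weight[x] into x / root.
            self.weight[x] *= self.weight[old_parent]
            self.parent[x] = root
        return self.parent[x]

    def union(self, x: str, y: str, ratio: float) -> None:
        """Record that value(x) / value(y) == ratio."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return
        # weight[x] = x / root_x and weight[y] = y / root_y, hence
        # root_x / root_y = ratio * weight[y] / weight[x].
        self.parent[root_x] = root_y
        self.weight[root_x] = ratio * self.weight[y] / self.weight[x]

    def query(self, x: str, y: str) -> Optional[float]:
        """Return value(x) / value(y), or None if x and y are not connected."""
        if x not in self.parent or y not in self.parent:
            return None
        if self.find(x) != self.find(y):
            return None
        return self.weight[x] / self.weight[y]


ratios = RatioUnionFind()
ratios.union("a", "b", 2.0)  # a / b = 2
ratios.union("b", "c", 3.0)  # b / c = 3
print(ratios.query("a", "c"))  # prints 6.0
print(ratios.query("a", "x"))  # prints None
```

Path compression still applies here; the only extra work is multiplying the weights along the compressed path so each node's weight stays relative to its new parent.
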
related_patterns:
  - dfs
  - bfs

prerequisite_patterns: []