feat(patterns): data structure tutorials

2025-08-23 19:25:47 +01:00
parent f105ffa677
commit 7bf6d1f472
4 changed files with 1194 additions and 0 deletions


@@ -0,0 +1,307 @@
name: Heap / Priority Queue
slug: heap
difficulty_level: 3
description: >
A data structure that efficiently maintains the minimum or maximum element,
supporting O(log n) insertion and extraction. Heaps are essential when you
repeatedly need to access the smallest or largest element from a changing set.
when_to_use: |
- Finding K largest/smallest elements
- K-way merge of sorted lists
- Finding median from data stream
- Task scheduling by priority
- Dijkstra's shortest path algorithm
metaphor: |
Imagine a hospital emergency room where patients are treated by urgency, not
arrival time. A priority queue (heap) lets you always know who's next without
sorting everyone whenever someone new arrives. The most urgent patient "bubbles
up" to the front automatically.
Another analogy: a to-do list that always shows your most important task first.
When you add or complete tasks, the list reorganizes itself so the highest
priority is always accessible in O(1) time.
core_concept: |
A **heap** is a complete binary tree where each parent is less than or equal to
(min-heap) or greater than or equal to (max-heap) its children. This property
guarantees:
- **Peek min/max**: O(1) — it's always at the root
- **Insert**: O(log n) — bubble up to maintain heap property
- **Extract min/max**: O(log n) — remove root, bubble down to reheapify
Key insight: heaps don't fully sort the data. They only guarantee the root is
the min/max. This partial ordering is enough for many problems and is more
efficient than maintaining full sorted order.
**When to use heaps:**
- Need repeated access to min/max element
- Data changes frequently (insertions/deletions)
- Full sorting is overkill (only need top K, not all elements sorted)
visualization: |
**Min-Heap Structure:**
```
Array: [1, 3, 2, 7, 6, 4, 5]

As tree:
        1           (index 0)
      /   \
     3     2        (indices 1, 2)
    / \   / \
   7   6 4   5      (indices 3, 4, 5, 6)

Parent of index i: (i-1) // 2
Left child: 2*i + 1
Right child: 2*i + 2
```
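The index arithmetic above can be checked mechanically. This is a minimal sketch; `is_min_heap` is a hypothetical helper name and is not part of `heapq`.

```python
def is_min_heap(arr: list[int]) -> bool:
    """Return True if every non-root element is >= its parent."""
    return all(arr[(i - 1) // 2] <= arr[i] for i in range(1, len(arr)))


print(is_min_heap([1, 3, 2, 7, 6, 4, 5]))  # True: the array from the diagram
print(is_min_heap([3, 1, 2]))              # False: the root is larger than a child
```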
**Inserting 0 into heap:**
```
Add 0 at end:
          1
        /   \
       3     2
      / \   / \
     7   6 4   5
    /
   0

Bubble up (0 < 7, swap):
          1
        /   \
       3     2
      / \   / \
     0   6 4   5
    /
   7

Bubble up (0 < 3, swap):
          1
        /   \
       0     2
      / \   / \
     3   6 4   5
    /
   7

Bubble up (0 < 1, swap):
          0
        /   \
       1     2
      / \   / \
     3   6 4   5
    /
   7
```
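The bubble-up steps traced above can be sketched directly on the array form. `heap_insert` is an illustrative name; in practice `heapq.heappush` does this work.

```python
def heap_insert(heap: list[int], value: int) -> None:
    """Append value, then swap it upward while it is smaller than its parent."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break  # heap property holds, stop bubbling
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent


heap = [1, 3, 2, 7, 6, 4, 5]
heap_insert(heap, 0)
print(heap)  # [0, 1, 2, 3, 6, 4, 5, 7]
```

The printed array is the level-order reading of the final tree in the trace.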
**Top K Elements using Min-Heap:**
```
Find 3 largest from [3, 1, 4, 1, 5, 9, 2, 6]
Maintain min-heap of size 3:
Process 3: heap = [3]
Process 1: heap = [1, 3]
Process 4: heap = [1, 3, 4]
Process 1: 1 <= heap[0]=1, skip
Process 5: 5 > 1, remove 1, add 5 → heap = [3, 5, 4]
Process 9: 9 > 3, remove 3, add 9 → heap = [4, 5, 9]
Process 2: 2 <= 4, skip
Process 6: 6 > 4, remove 4, add 6 → heap = [5, 6, 9]
Result: [5, 6, 9] are the top 3
```
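For a one-off query on a list that already fits in memory, the standard library offers the same result without a hand-written loop. A short sketch using `heapq.nlargest` and `heapq.nsmallest`, which return their results in sorted order:

```python
import heapq

nums = [3, 1, 4, 1, 5, 9, 2, 6]
top3 = heapq.nlargest(3, nums)
print(top3)                       # [9, 6, 5]
print(heapq.nsmallest(3, nums))   # [1, 1, 2]
```

The manual size-k heap remains the better fit when elements arrive one at a time.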
code_template: |
import heapq

def find_k_largest(nums: list[int], k: int) -> list[int]:
    """Find k largest elements using min-heap."""
    if k <= 0:
        return []  # Avoid indexing an empty heap below
    # Min-heap of size k keeps k largest
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:
            heapq.heapreplace(heap, num)  # Pop min, push new
    return heap

def find_k_smallest(nums: list[int], k: int) -> list[int]:
    """Find k smallest elements using max-heap (negated values)."""
    if k <= 0:
        return []  # Avoid indexing an empty heap below
    # Max-heap (negated) of size k keeps k smallest
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, -num)
        elif num < -heap[0]:
            heapq.heapreplace(heap, -num)
    return [-x for x in heap]

def merge_k_sorted_lists(lists: list[list[int]]) -> list[int]:
    """Merge k sorted lists using min-heap."""
    heap = []
    result = []
    # Initialize heap with first element from each list
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        # Add next element from same list
        if elem_idx + 1 < len(lists[list_idx]):
            next_val = lists[list_idx][elem_idx + 1]
            heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
    return result

class MedianFinder:
    """Find median from data stream using two heaps."""

    def __init__(self):
        self.small = []  # Max-heap (negated) for smaller half
        self.large = []  # Min-heap for larger half

    def add_num(self, num: int) -> None:
        # Add to max-heap (smaller half)
        heapq.heappush(self.small, -num)
        # Balance: largest of small should be <= smallest of large
        if self.large and -self.small[0] > self.large[0]:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        # Size balance: small can have at most 1 more element
        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        elif len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def find_median(self) -> float:
        # Assumes at least one number has been added
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2

def kth_smallest_in_matrix(matrix: list[list[int]], k: int) -> int:
    """Find kth smallest in row-wise and column-wise sorted n x n matrix."""
    n = len(matrix)
    heap = [(matrix[0][0], 0, 0)]
    visited = {(0, 0)}
    for _ in range(k - 1):
        val, r, c = heapq.heappop(heap)
        # Add right neighbor
        if c + 1 < n and (r, c + 1) not in visited:
            visited.add((r, c + 1))
            heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
        # Add bottom neighbor
        if r + 1 < n and (r + 1, c) not in visited:
            visited.add((r + 1, c))
            heapq.heappush(heap, (matrix[r + 1][c], r + 1, c))
    return heap[0][0]
recognition_signals:
- "kth largest"
- "kth smallest"
- "top k"
- "merge sorted"
- "median"
- "priority"
- "schedule"
- "Dijkstra"
- "frequency"
- "closest points"
common_mistakes:
- title: Using max-heap when min-heap needed (or vice versa)
description: |
Python's heapq functions implement a min-heap. Pushing raw values and popping
k times returns the k smallest elements, not the k largest.
fix: |
For max-heap behavior, negate values:
```python
heapq.heappush(heap, -num) # Push negative
max_val = -heapq.heappop(heap) # Negate back
```
- title: Wrong heap size for "top K" problems
description: |
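For "k largest," heapifying all n elements and extracting k times costs
O(n + k log n) time but O(n) extra space. A min-heap of size k costs
O(n log k) time and only O(k) space, and it also works on streams whose
length is not known in advance.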
fix: |
For k largest: use min-heap of size k, remove smallest when full.
For k smallest: use max-heap of size k, remove largest when full.
- title: Forgetting tuple comparison order
description: |
When a heap contains tuples, Python compares them element by element: first
element, then second, and so on. If the first elements tie and the second
elements are not comparable (for example, custom objects), the comparison
raises TypeError.
fix: |
Put the comparison key first in the tuple:
```python
heapq.heappush(heap, (priority, item))
```
If items aren't comparable, use a counter as tiebreaker.
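
The counter tiebreaker can be sketched as follows. `Task` is a hypothetical payload class that defines no ordering; the sequence number from `itertools.count` guarantees that a comparison never reaches it.

```python
import heapq
from itertools import count


class Task:
    """Hypothetical payload type with no comparison methods."""

    def __init__(self, name: str) -> None:
        self.name = name


heap = []
tiebreak = count()  # increasing sequence number, unique per entry

for priority, name in [(2, "write"), (1, "plan"), (2, "review")]:
    heapq.heappush(heap, (priority, next(tiebreak), Task(name)))

order = [heapq.heappop(heap)[2].name for _ in range(3)]
print(order)  # ['plan', 'write', 'review']
```

Entries with equal priority come out in insertion order, which is often the behaviour a scheduler wants.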
- title: Modifying heap elements directly
description: |
Changing an element's value after it's in the heap breaks heap property.
fix: |
Heaps don't support "decrease key" directly. Either: (1) use lazy deletion
(mark as invalid, skip when popped), or (2) re-heapify the entire heap.
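
Option (1) might look like the sketch below. `LazyMinHeap` and its method names are illustrative, not a standard API; the assumption is that each key has one current priority, and any older heap entry for that key is stale.

```python
import heapq


class LazyMinHeap:
    """Min-heap of (priority, key) pairs with lazy invalidation."""

    def __init__(self) -> None:
        self._heap = []
        self._current = {}  # key -> latest priority

    def set_priority(self, key: str, priority: int) -> None:
        # Push a fresh entry; any older entry for this key becomes stale.
        self._current[key] = priority
        heapq.heappush(self._heap, (priority, key))

    def pop(self) -> tuple[int, str]:
        while self._heap:
            priority, key = heapq.heappop(self._heap)
            if self._current.get(key) == priority:
                del self._current[key]
                return priority, key
            # Stale entry: the key was updated or already popped, so skip it.
        raise IndexError("pop from empty heap")


pq = LazyMinHeap()
pq.set_priority("a", 5)
pq.set_priority("b", 3)
pq.set_priority("a", 1)  # "decrease key" for a
print(pq.pop())  # (1, 'a')
print(pq.pop())  # (3, 'b')
```

The stale `(5, 'a')` entry stays in the heap until it reaches the top, so the heap can grow larger than the number of live keys.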
variations:
- name: Top K elements
description: |
Keep k largest using min-heap of size k, or k smallest using max-heap
of size k.
example: "Kth Largest Element, Top K Frequent Elements"
- name: K-way merge
description: |
Merge k sorted lists efficiently by maintaining heap of current elements
from each list.
example: "Merge K Sorted Lists, Smallest Range Covering K Lists"
- name: Two heaps (median)
description: |
Maintain two heaps: max-heap for smaller half, min-heap for larger half.
Median is at the roots.
example: "Find Median from Data Stream, Sliding Window Median"
- name: Dijkstra's algorithm
description: |
Min-heap tracks vertices by shortest known distance. Extract minimum,
relax edges, update heap.
example: "Network Delay Time, Cheapest Flights Within K Stops"
- name: Task scheduling
description: |
Prioritize tasks by some criteria (deadline, duration). Process highest
priority first.
example: "Task Scheduler, Meeting Rooms III"
related_patterns:
- binary-search
- two-pointers
prerequisite_patterns: []


@@ -0,0 +1,269 @@
name: Monotonic Stack
slug: monotonic-stack
difficulty_level: 3
description: >
Maintain a stack where elements are always in sorted order (either increasing or
decreasing). This enables efficient solutions for "next greater element" problems
by leveraging the stack's ability to track candidates that might be the answer
for future elements.
when_to_use: |
- Next greater/smaller element
- Previous greater/smaller element
- Largest rectangle in histogram
- Daily temperatures
- Stock span problems
metaphor: |
Imagine standing in a line of people of varying heights, all facing forward.
You want to know who's the next taller person for each person in line. The
trick: as you walk backward through the line, keep track of "potentially
useful" tall people. When you encounter someone taller than people you're
tracking, those shorter people will never be the answer—remove them. The
remaining stack always contains candidates in decreasing height order.
Another analogy: a bouncer at a club with height requirements. As people line
up, anyone shorter than the person in front can be removed from consideration—
they'll never be visible from the front.
core_concept: |
A **monotonic stack** maintains elements in sorted order by popping elements
that violate the ordering when pushing new ones:
- **Monotonically decreasing**: Pop elements smaller than current before pushing
- **Monotonically increasing**: Pop elements larger than current before pushing
The key insight is that when we pop an element, we've found its "next
greater/smaller"—it's the current element we're about to push. The stack
efficiently tracks candidates that might be answers for future elements.
**Pattern recognition:**
- "Next greater" → decreasing stack (pop when current > top)
- "Next smaller" → increasing stack (pop when current < top)
- "Previous greater/smaller" → process elements and query stack before pushing
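
The "previous greater" case can be sketched as below: pop first, read the top of the stack, then push. `previous_greater_element` is an illustrative name, and this version stores values rather than indices because only the value is reported.

```python
def previous_greater_element(nums: list[int]) -> list[int]:
    """For each position, the nearest strictly greater value to its left, or -1."""
    result = []
    stack = []  # values, strictly decreasing from bottom to top
    for num in nums:
        # Values <= num can never be "previous greater" for num or anything after it.
        while stack and stack[-1] <= num:
            stack.pop()
        result.append(stack[-1] if stack else -1)  # query before pushing
        stack.append(num)
    return result


print(previous_greater_element([4, 5, 2, 10, 8]))  # [-1, -1, 5, -1, 10]
```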
visualization: |
**Next Greater Element:**
```
Array: [4, 5, 2, 10, 8]
Find next greater element for each
Process right to left (or left to right with index tracking):
Process 8: stack=[] → no greater, push 8
stack=[8] → answer[4] = -1
Process 10: stack=[8] → 10 > 8, pop 8
stack=[] → no greater, push 10
stack=[10] → answer[3] = -1
Process 2: stack=[10] → 2 < 10, don't pop
stack=[10,2] → answer[2] = 10
Process 5: stack=[10,2] → 5 > 2, pop 2
stack=[10] → 5 < 10, don't pop
stack=[10,5] → answer[1] = 10
Process 4: stack=[10,5] → 4 < 5, don't pop
stack=[10,5,4] → answer[0] = 5
Result: [5, 10, 10, -1, -1]
```
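The right-to-left trace above corresponds to a loop like this one. It is a sketch under the assumption that "greater" means strictly greater; the function name is illustrative.

```python
def next_greater_right_to_left(nums: list[int]) -> list[int]:
    """Next strictly greater value to the right of each position, or -1."""
    result = [-1] * len(nums)
    stack = []  # candidate values, decreasing from bottom to top
    for i in range(len(nums) - 1, -1, -1):
        # Candidates not larger than nums[i] are hidden behind it from now on.
        while stack and stack[-1] <= nums[i]:
            stack.pop()
        if stack:
            result[i] = stack[-1]
        stack.append(nums[i])
    return result


print(next_greater_right_to_left([4, 5, 2, 10, 8]))  # [5, 10, 10, -1, -1]
```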
**Largest Rectangle in Histogram:**
```
Heights: [2, 1, 5, 6, 2, 3]
Use increasing stack (pop when current < top)
When popping, calculate rectangle with popped height as the smallest bar.
Process each bar:
- 2: push (0,2)
- 1: 1 < 2, pop (0,2) → width=1, area=2×1=2
push (0,1) [take popped index]
- 5: push (2,5)
- 6: push (3,6)
- 2: 2 < 6, pop (3,6) → width=1, area=6×1=6
2 < 5, pop (2,5) → width=2, area=5×2=10
push (2,2)
- 3: push (5,3)
- end: pop remaining, calculate areas
Max area = 10
```
code_template: |
def next_greater_element(nums: list[int]) -> list[int]:
    """Find next greater element for each position."""
    n = len(nums)
    result = [-1] * n
    stack = []  # Stack of indices
    for i in range(n):
        # Pop elements smaller than current
        while stack and nums[stack[-1]] < nums[i]:
            idx = stack.pop()
            result[idx] = nums[i]
        stack.append(i)
    return result

def next_smaller_element(nums: list[int]) -> list[int]:
    """Find next smaller element for each position."""
    n = len(nums)
    result = [-1] * n
    stack = []
    for i in range(n):
        # Pop elements larger than current
        while stack and nums[stack[-1]] > nums[i]:
            idx = stack.pop()
            result[idx] = nums[i]
        stack.append(i)
    return result

def daily_temperatures(temperatures: list[int]) -> list[int]:
    """Days until warmer temperature."""
    n = len(temperatures)
    result = [0] * n
    stack = []  # Stack of indices
    for i in range(n):
        while stack and temperatures[stack[-1]] < temperatures[i]:
            idx = stack.pop()
            result[idx] = i - idx  # Days difference
        stack.append(i)
    return result

def largest_rectangle_histogram(heights: list[int]) -> int:
    """Largest rectangle area in histogram."""
    stack = []  # Stack of (index, height)
    max_area = 0
    for i, h in enumerate(heights):
        start = i
        while stack and stack[-1][1] > h:
            idx, height = stack.pop()
            max_area = max(max_area, height * (i - idx))
            start = idx  # This index can extend back
        stack.append((start, h))
    # Process remaining in stack
    for idx, height in stack:
        max_area = max(max_area, height * (len(heights) - idx))
    return max_area

def stock_span(prices: list[int]) -> list[int]:
    """Days since last higher price (inclusive of today)."""
    n = len(prices)
    result = [0] * n
    stack = []  # Stack of indices
    for i in range(n):
        while stack and prices[stack[-1]] <= prices[i]:
            stack.pop()
        # Span = distance to previous higher (or from start)
        result[i] = i - stack[-1] if stack else i + 1
        stack.append(i)
    return result
recognition_signals:
- "next greater element"
- "next smaller element"
- "previous greater"
- "daily temperatures"
- "stock span"
- "largest rectangle"
- "histogram"
- "trapping rain water"
- "132 pattern"
- "buildings with ocean view"
common_mistakes:
- title: Wrong comparison direction
description: |
Using `<` when you should use `>` (or vice versa) results in the wrong
type of monotonic stack.
fix: |
Remember: "next greater" needs decreasing stack, so pop when `nums[top] < current`.
"Next smaller" needs increasing stack, so pop when `nums[top] > current`.
- title: Storing values instead of indices
description: |
Storing just values makes it impossible to calculate distances (like
"how many days until...").
fix: |
Store indices in the stack. You can always access `nums[stack[-1]]` for
the value when needed.
- title: Not processing remaining stack elements
description: |
Elements left in the stack after processing all input have no "next
greater/smaller" in the array.
fix: |
After the main loop, process remaining elements. For histogram problems,
their rectangle extends to the end. For "next greater," their answer is -1.
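
For the histogram case, one common alternative to a separate clean-up loop is a sentinel bar of height 0 appended to the input, which forces every remaining entry to be popped inside the main loop. A sketch, with an illustrative function name:

```python
def largest_rectangle_with_sentinel(heights: list[int]) -> int:
    """Largest rectangle area, using a trailing zero-height bar as sentinel."""
    stack = []  # (start index, height), heights non-decreasing bottom to top
    max_area = 0
    for i, h in enumerate(heights + [0]):  # the extra 0 flushes the stack
        start = i
        while stack and stack[-1][1] > h:
            idx, height = stack.pop()
            max_area = max(max_area, height * (i - idx))
            start = idx  # the new bar can extend back to here
        stack.append((start, h))
    return max_area


print(largest_rectangle_with_sentinel([2, 1, 5, 6, 2, 3]))  # 10
```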
- title: Off-by-one with span calculations
description: |
Forgetting whether to include the current element in span calculations
gives wrong results.
fix: |
For span problems, if stack is empty, span = i + 1 (from beginning).
If stack has elements, span = i - stack[-1] (not +1 because previous
greater is exclusive).
variations:
- name: Next greater element
description: |
Find the first element to the right that is greater than current.
Decreasing monotonic stack.
example: "Next Greater Element I/II, Daily Temperatures"
- name: Next smaller element
description: |
Find the first element to the right that is smaller than current.
Increasing monotonic stack.
example: "Next Smaller Element"
- name: Previous greater/smaller
description: |
After popping the elements that violate the ordering, query the stack
before pushing: the top of the stack is the previous greater/smaller element.
example: "Stock Span, Buildings With Ocean View"
- name: Largest rectangle
description: |
Use increasing stack. When popping, calculate area using popped height
and width from popped index to current index.
example: "Largest Rectangle in Histogram, Maximal Rectangle"
- name: Trapping rain water
description: |
Can use monotonic stack to track left boundaries, calculating trapped
water when finding right boundary. (Alternative: two-pointer approach)
example: "Trapping Rain Water"
related_patterns:
- two-pointers
- sliding-window
prerequisite_patterns: []


@@ -0,0 +1,305 @@
name: Trie
slug: trie
difficulty_level: 3
description: >
A tree-like data structure for efficient string prefix operations. Each node
represents a character, and paths from root to nodes spell out prefixes. Tries
enable O(m) search, insert, and prefix queries where m is the word length.
when_to_use: |
- Autocomplete systems
- Spell checkers
- Word dictionary with prefix search
- Word break problems
- IP routing (longest prefix matching)
metaphor: |
Imagine a filing cabinet where files are organized by name, one letter per
drawer. To find "apple," you open drawer 'a', then find sub-drawer 'p', then
'p', then 'l', then 'e'. You don't search through all files—you navigate
directly to the right location. Finding "application" shares the same path
up to "appl" before diverging.
Another analogy: a phone book organized as a tree. Instead of a flat
alphabetical list, common prefixes are grouped, making it fast to find all
names starting with "Joh" or check if "Johnson" exists.
core_concept: |
A **Trie** (pronounced "try") stores strings character by character:
- **Root**: Empty node representing the empty prefix
- **Edges**: Labeled with characters
- **Nodes**: Represent prefixes; may be marked as "end of word"
Key insight: all words sharing a prefix share the same path from root.
This makes prefix operations extremely efficient:
- **Insert word**: O(m) — create path from root
- **Search word**: O(m) — follow path, check end marker
- **Starts with prefix**: O(m) — just follow path, no end check needed
**Trade-off**: Tries use more memory than hash sets (each character is a node),
but enable prefix queries that hash sets cannot support.
visualization: |
**Trie containing: ["app", "apple", "apply", "apt", "bat"]**
```
        (root)
        /    \
       a      b
       |      |
       p      a
      / \     |
     p*  t*   t*
     |
     l
    / \
   e*  y*
* = end of word marker
Paths:
- "app" → a-p-p*
- "apple" → a-p-p-l-e*
- "apply" → a-p-p-l-y*
- "apt" → a-p-t*
- "bat" → b-a-t*
```
**Search for "apple":**
```
Start at root
→ 'a': found, move to 'a' node
→ 'p': found, move to 'p' node
→ 'p': found, move to second 'p' node
→ 'l': found, move to 'l' node
→ 'e': found, move to 'e' node
→ end of word marker? Yes!
"apple" exists ✓
```
**Search for "app":**
```
Follow path a-p-p
→ end of word marker on second 'p'? Yes!
"app" exists ✓
```
**Starts with "ap":**
```
Follow path a-p
→ reached end of prefix successfully
Words with prefix "ap" exist ✓
```
code_template: |
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Insert a word into the trie."""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word: str) -> bool:
        """Check if word exists in trie."""
        node = self._traverse(word)
        return node is not None and node.is_end

    def starts_with(self, prefix: str) -> bool:
        """Check if any word starts with prefix."""
        return self._traverse(prefix) is not None

    def _traverse(self, s: str) -> "TrieNode | None":
        """Traverse trie following string s; return None if the path breaks."""
        node = self.root
        for char in s:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

class WordDictionary:
    """Trie with wildcard search support."""

    def __init__(self):
        self.root = TrieNode()

    def add_word(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word: str) -> bool:
        """Search with '.' as wildcard for any character."""
        def dfs(node: TrieNode, i: int) -> bool:
            if i == len(word):
                return node.is_end
            char = word[i]
            if char == '.':
                # Try all children
                return any(dfs(child, i + 1)
                           for child in node.children.values())
            else:
                if char not in node.children:
                    return False
                return dfs(node.children[char], i + 1)
        return dfs(self.root, 0)

def word_break(s: str, word_dict: list[str]) -> bool:
    """Check if string can be segmented into dictionary words."""
    trie = Trie()
    for word in word_dict:
        trie.insert(word)
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True  # Empty string can be segmented
    for i in range(n):
        if not dp[i]:
            continue
        node = trie.root
        for j in range(i, n):
            if s[j] not in node.children:
                break
            node = node.children[s[j]]
            if node.is_end:
                dp[j + 1] = True
    return dp[n]

def find_words_with_prefix(trie: Trie, prefix: str) -> list[str]:
    """Find all words starting with prefix."""
    node = trie._traverse(prefix)
    if node is None:
        return []
    results = []

    def dfs(node: TrieNode, path: str):
        if node.is_end:
            results.append(path)
        for char, child in node.children.items():
            dfs(child, path + char)

    dfs(node, prefix)
    return results
recognition_signals:
- "prefix"
- "autocomplete"
- "word dictionary"
- "spell check"
- "word search"
- "word break"
- "longest common prefix"
- "starts with"
- "implement trie"
- "wildcard"
common_mistakes:
- title: Confusing search vs starts_with
description: |
Search checks if the exact word exists (must have end marker).
Starts_with only checks if the prefix path exists.
fix: |
For search, always check `node.is_end` at the end:
```python
def search(self, word):
node = self._traverse(word)
return node is not None and node.is_end
```
- title: Not handling empty string
description: |
Empty string is a valid prefix (everything starts with it) but may not
be a valid word in the dictionary.
fix: |
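starts_with("") should return True if the trie has any words. Note that a
starts_with that only checks whether the path exists returns True even for
an empty trie, so add an explicit check if that distinction matters.
search("") should return True only if the empty string was explicitly inserted.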
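
One way to make the empty-prefix case explicit is sketched below. It assumes words are only ever inserted, never deleted, so a reachable node with no children and no end marker can only be the root of an empty trie. The classes are trimmed copies for illustration.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children = {}
        self.is_end = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.is_end = True

    def starts_with(self, prefix: str) -> bool:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        # True only if at least one stored word passes through this node.
        return node.is_end or bool(node.children)


trie = Trie()
print(trie.starts_with(""))  # False: nothing stored yet
trie.insert("app")
print(trie.starts_with(""))  # True
```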
- title: Using array instead of dict for children
description: |
Using `children = [None] * 26` assumes only lowercase letters. This fails
for other character sets.
fix: |
Use a dictionary for flexibility:
```python
self.children = {} # Works for any characters
```
Or use array only when character set is known and fixed.
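
When the alphabet really is fixed, an array-backed node might look like this sketch. It assumes lowercase ASCII letters only and rejects anything else instead of silently indexing out of range; `ArrayTrieNode` and `insert` are illustrative names.

```python
class ArrayTrieNode:
    """Trie node for lowercase ASCII letters only."""

    __slots__ = ("children", "is_end")

    def __init__(self) -> None:
        self.children = [None] * 26  # slot 0 is 'a', slot 25 is 'z'
        self.is_end = False


def insert(root: ArrayTrieNode, word: str) -> None:
    node = root
    for char in word:
        idx = ord(char) - ord("a")
        if not 0 <= idx < 26:
            raise ValueError(f"unsupported character: {char!r}")
        if node.children[idx] is None:
            node.children[idx] = ArrayTrieNode()
        node = node.children[idx]
    node.is_end = True


root = ArrayTrieNode()
insert(root, "abc")
print(root.children[0] is not None)  # True: there is an 'a' branch
```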
- title: Memory leaks when deleting words
description: |
Simply unmarking is_end doesn't free memory for nodes that are no longer
part of any word.
fix: |
For deletion, either: (1) accept memory isn't freed (common), or
(2) implement proper deletion that removes orphaned nodes bottom-up.
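
Option (2) can be sketched by recording the path during lookup and pruning on the way back. The `TrieNode`, `insert`, and `delete` definitions here are standalone illustrations, not the only way to structure deletion.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children = {}
        self.is_end = False


def insert(root: TrieNode, word: str) -> None:
    node = root
    for char in word:
        node = node.children.setdefault(char, TrieNode())
    node.is_end = True


def delete(root: TrieNode, word: str) -> bool:
    """Remove word and prune nodes that no longer lead to any word.

    Returns True if the word was present.
    """
    path = [root]  # path[i] is the node reached after word[:i]
    for char in word:
        nxt = path[-1].children.get(char)
        if nxt is None:
            return False
        path.append(nxt)
    if not path[-1].is_end:
        return False
    path[-1].is_end = False
    # Walk back toward the root, removing childless, unmarked nodes.
    for i in range(len(word), 0, -1):
        node = path[i]
        if node.is_end or node.children:
            break  # still needed by another word
        del path[i - 1].children[word[i - 1]]
    return True
```

Deleting "apple" from a trie that also holds "app" removes only the `l` and `e` nodes; deleting "app" afterwards leaves the root with no children.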
variations:
- name: Basic Trie
description: |
Standard insert, search, and prefix check operations.
example: "Implement Trie (Prefix Tree)"
- name: Wildcard search
description: |
Support '.' as wildcard matching any single character. Requires DFS
to explore all possibilities when encountering wildcard.
example: "Design Add and Search Words Data Structure"
- name: Word search in grid
description: |
Use Trie to efficiently search for multiple words in a 2D grid.
Prune branches that don't match any word prefix.
example: "Word Search II"
- name: Autocomplete
description: |
Find all words starting with a given prefix. DFS from the prefix
endpoint to collect all words.
example: "Design Search Autocomplete System"
- name: Compressed Trie (Radix Tree)
description: |
Merge chains of single-child nodes into one node with a string label.
Saves space for sparse tries.
example: "Longest Common Prefix optimizations"
related_patterns:
- dfs
- backtracking
- dynamic-programming
prerequisite_patterns: []


@@ -0,0 +1,313 @@
name: Union Find
slug: union-find
difficulty_level: 3
description: >
Track disjoint sets with efficient union and find operations. Union-Find
(also called Disjoint Set Union) excels at dynamically grouping elements and
answering "are these two elements in the same group?" queries.
when_to_use: |
- Finding connected components
- Detecting cycles in undirected graphs
- Kruskal's minimum spanning tree
- Dynamic connectivity queries
- Grouping related items (accounts merge, friend circles)
metaphor: |
Imagine a social network where you want to know if two people are connected
(directly or through friends of friends). Instead of searching the entire
network each time, everyone in a connected group points to a group leader.
To check if two people are connected, just check if they have the same leader.
When groups merge (someone bridges two groups), you just update one leader to
point to the other.
Another analogy: corporate acquisitions. Each company has a parent company
(possibly itself). When companies merge, one becomes a subsidiary of the other.
To find the ultimate parent, you follow the chain of ownership.
core_concept: |
Union-Find maintains a forest of trees where each tree represents a set.
Each element points to its parent, and the root of the tree is the set's
representative.
**Two key operations:**
- **Find(x)**: Return the root (representative) of x's set
- **Union(x, y)**: Merge the sets containing x and y
**Two key optimizations:**
- **Path compression**: During Find, make each node point directly to root.
This flattens the tree for future queries.
- **Union by rank/size**: Always attach the smaller tree under the larger.
This keeps trees shallow.
With both optimizations, operations run in nearly O(1) amortized time:
technically O(α(n)), where α is the inverse Ackermann function, which is ≤ 4
for any practical input size.
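
The two optimizations can also be written with union by size and an iterative, two-pass find, which avoids recursion on long chains. This is a sketch of that variant; the class name `SizedUnionFind` is illustrative.

```python
class SizedUnionFind:
    """Union-Find with union by size and iterative path compression."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n  # only meaningful at roots

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:  # first pass: locate the root
            root = self.parent[root]
        while self.parent[x] != root:     # second pass: repoint the path
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x: int, y: int) -> bool:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # smaller tree goes under the larger root
        self.size[rx] += self.size[ry]
        return True
```

Size and rank give the same asymptotic bound; size has the side benefit of reporting how many elements a set contains.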
visualization: |
**Initial state (each element is its own set):**
```
parent: [0, 1, 2, 3, 4] (each points to itself)
Sets: {0}, {1}, {2}, {3}, {4}
```
**Union(0, 1):**
```
parent: [0, 0, 2, 3, 4] (1 now points to 0)
0
|
1
Sets: {0, 1}, {2}, {3}, {4}
```
**Union(2, 3) then Union(3, 4):**
```
Union(3, 4): Find(3) = 2, Find(4) = 4, so root 4 is attached under root 2
parent: [0, 0, 2, 2, 2]
  0      2
  |     / \
  1    3   4
Sets: {0, 1}, {2, 3, 4}
```
**Union(1, 4), which merges the two trees:**
```
Find(1) = 0, Find(4) = 2
Union by rank: both roots have rank 1, so attach root 2 under root 0
and raise the rank of 0 to 2
parent: [0, 0, 0, 2, 2]
      0
     / \
    1   2
       / \
      3   4
Sets: {0, 1, 2, 3, 4}
```
**Path compression during Find(4):**
```
Find(4): 4 → 2 → 0 (found root)
Compress: make 4 point directly to 0 (2 already does)
parent: [0, 0, 0, 2, 0]
      0
     /|\
    1 2 4
      |
      3
The next Find(4) reaches the root in one step.
```
code_template: |
from collections import defaultdict

class UnionFind:
    """Union-Find with path compression and union by rank."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n  # Number of disjoint sets

    def find(self, x: int) -> int:
        """Find root with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """Union by rank. Returns True if x and y were in different sets."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False  # Already in same set
        # Union by rank: make root_x the root of higher (or equal) rank
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        self.count -= 1
        return True

    def connected(self, x: int, y: int) -> bool:
        """Check if x and y are in the same set."""
        return self.find(x) == self.find(y)

def count_components(n: int, edges: list[list[int]]) -> int:
    """Count connected components in undirected graph."""
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.count

def has_cycle(n: int, edges: list[list[int]]) -> bool:
    """Detect cycle in undirected graph."""
    uf = UnionFind(n)
    for u, v in edges:
        if uf.connected(u, v):
            return True  # Adding edge creates cycle
        uf.union(u, v)
    return False

def kruskal_mst(n: int, edges: list[tuple[int, int, int]]) -> int:
    """Kruskal's MST algorithm using Union-Find. Returns -1 if disconnected."""
    # edges are (weight, u, v); sorted() leaves the caller's list untouched
    uf = UnionFind(n)
    mst_weight = 0
    edges_used = 0
    for weight, u, v in sorted(edges):
        if uf.union(u, v):
            mst_weight += weight
            edges_used += 1
            if edges_used == n - 1:
                break
    return mst_weight if edges_used == n - 1 else -1

def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    """Merge accounts with common emails."""
    email_to_id = {}
    email_to_name = {}
    uf = UnionFind(len(accounts))
    # Map emails to account indices
    for i, account in enumerate(accounts):
        name = account[0]
        for email in account[1:]:
            email_to_name[email] = name
            if email in email_to_id:
                uf.union(i, email_to_id[email])
            else:
                email_to_id[email] = i
    # Group emails by root account
    root_to_emails = defaultdict(set)
    for email, idx in email_to_id.items():
        root = uf.find(idx)
        root_to_emails[root].add(email)
    # Build result
    return [[email_to_name[next(iter(emails))]] + sorted(emails)
            for emails in root_to_emails.values()]
recognition_signals:
- "connected components"
- "disjoint sets"
- "union"
- "groups"
- "merge accounts"
- "friend circles"
- "detect cycle undirected"
- "Kruskal"
- "minimum spanning tree"
- "redundant connection"
- "equivalence"
common_mistakes:
- title: Forgetting path compression
description: |
Without path compression, every Find walks the full path to the root:
O(log n) with union by rank, and up to O(n) if unions are also done naively.
fix: |
Always compress paths during Find:
```python
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
```
- title: Using Union-Find for directed graphs
description: |
Union-Find assumes undirected connections. For directed graphs, cycles
mean something different (back edges in DFS).
fix: |
Use DFS with coloring (WHITE/GRAY/BLACK) for cycle detection in directed
graphs. Union-Find is for undirected connectivity.
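
The coloring approach can be sketched as follows. It assumes vertices are numbered 0..n-1 and edges are `(u, v)` pairs meaning u → v; the recursive DFS is fine for small graphs but would need an explicit stack for very deep ones.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished


def has_directed_cycle(n: int, edges: list[tuple[int, int]]) -> bool:
    """Detect a cycle in a directed graph with vertices 0..n-1."""
    graph = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)
    color = [WHITE] * n

    def dfs(u: int) -> bool:
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:  # back edge to the current path: cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK  # u and everything reachable from it is done
        return False

    return any(color[u] == WHITE and dfs(u) for u in range(n))


print(has_directed_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True
print(has_directed_cycle(3, [(0, 1), (0, 2), (1, 2)]))  # False
```

The second graph would count as cyclic if its edges were undirected, which is exactly the case where Union-Find gives the wrong answer for a directed input.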
- title: Not tracking component count
description: |
For problems asking "how many components," counting distinct roots at the
end requires an extra pass over every element.
fix: |
Decrement count in union when merging two different sets:
```python
if root_x != root_y:
self.count -= 1
```
- title: Union returning wrong information
description: |
Some solutions need to know if a union actually merged two sets or if
they were already connected.
fix: |
Return boolean from union indicating if merge happened:
```python
if root_x == root_y:
return False # Already same set
# ... do union ...
return True # Merged
```
variations:
- name: Basic connectivity
description: |
Track whether elements are in the same connected component.
example: "Number of Connected Components, Friend Circles"
- name: Cycle detection
description: |
If union is called on two already-connected elements, adding that edge
would create a cycle.
example: "Redundant Connection, Graph Valid Tree"
- name: Kruskal's MST
description: |
Sort edges by weight, greedily add edges that don't create cycles
(checked via Union-Find).
example: "Min Cost to Connect All Points, Connecting Cities With Minimum Cost"
- name: Dynamic connectivity
description: |
Handle streaming edge insertions while answering connectivity queries.
example: "Evaluate Division, Accounts Merge"
- name: Weighted Union-Find
description: |
Track relative weights/distances between elements and their roots.
Used in problems with equivalence relationships.
example: "Evaluate Division (weighted paths)"
related_patterns:
- dfs
- bfs
prerequisite_patterns: []