feat(patterns): data structure tutorials
This commit is contained in:
307
backend/data/patterns/heap.yaml
Normal file
307
backend/data/patterns/heap.yaml
Normal file
@@ -0,0 +1,307 @@
|
||||
name: Heap / Priority Queue
|
||||
slug: heap
|
||||
difficulty_level: 3
|
||||
|
||||
description: >
|
||||
A data structure that efficiently maintains the minimum or maximum element,
|
||||
supporting O(log n) insertion and extraction. Heaps are essential when you
|
||||
repeatedly need to access the smallest or largest element from a changing set.
|
||||
|
||||
when_to_use: |
|
||||
- Finding K largest/smallest elements
|
||||
- K-way merge of sorted lists
|
||||
- Finding median from data stream
|
||||
- Task scheduling by priority
|
||||
- Dijkstra's shortest path algorithm
|
||||
|
||||
metaphor: |
|
||||
Imagine a hospital emergency room where patients are treated by urgency, not
|
||||
arrival time. A priority queue (heap) lets you always know who's next without
|
||||
sorting everyone whenever someone new arrives. The most urgent patient "bubbles
|
||||
up" to the front automatically.
|
||||
|
||||
Another analogy: a to-do list that always shows your most important task first.
|
||||
When you add or complete tasks, the list reorganizes itself so the highest
|
||||
priority is always accessible in O(1) time.
|
||||
|
||||
core_concept: |
|
||||
A **heap** is a complete binary tree where each parent is smaller (min-heap) or
|
||||
larger (max-heap) than its children. This property guarantees:
|
||||
|
||||
- **Peek min/max**: O(1) — it's always at the root
|
||||
- **Insert**: O(log n) — bubble up to maintain heap property
|
||||
- **Extract min/max**: O(log n) — remove root, bubble down to reheapify
|
||||
|
||||
Key insight: heaps don't fully sort the data. They only guarantee the root is
|
||||
the min/max. This partial ordering is enough for many problems and is more
|
||||
efficient than maintaining full sorted order.
|
||||
|
||||
**When to use heaps:**
|
||||
- Need repeated access to min/max element
|
||||
- Data changes frequently (insertions/deletions)
|
||||
- Full sorting is overkill (only need top K, not all elements sorted)
|
||||
|
||||
visualization: |
|
||||
**Min-Heap Structure:**
|
||||
|
||||
```
|
||||
Array: [1, 3, 2, 7, 6, 4, 5]
|
||||
|
||||
As tree:
|
||||
1 (index 0)
|
||||
/ \
|
||||
3 2 (indices 1, 2)
|
||||
/ \ / \
|
||||
7 6 4 5 (indices 3, 4, 5, 6)
|
||||
|
||||
Parent of index i: (i-1) // 2
|
||||
Left child: 2*i + 1
|
||||
Right child: 2*i + 2
|
||||
```
|
||||
|
||||
**Inserting 0 into heap:**
|
||||
|
||||
```
|
||||
Add 0 at end:
|
||||
1
|
||||
/ \
|
||||
3 2
|
||||
/ \ / \
|
||||
7 6 4 5
|
||||
/
|
||||
0
|
||||
|
||||
Bubble up (0 < 7, swap):
|
||||
1
|
||||
/ \
|
||||
3 2
|
||||
/ \ / \
|
||||
0 6 4 5
|
||||
/
|
||||
7
|
||||
|
||||
Bubble up (0 < 3, swap):
|
||||
1
|
||||
/ \
|
||||
0 2
|
||||
/ \ / \
|
||||
3 6 4 5
|
||||
/
|
||||
7
|
||||
|
||||
Bubble up (0 < 1, swap):
|
||||
0
|
||||
/ \
|
||||
1 2
|
||||
/ \ / \
|
||||
3 6 4 5
|
||||
/
|
||||
7
|
||||
```
|
||||
|
||||
**Top K Elements using Min-Heap:**
|
||||
|
||||
```
|
||||
Find 3 largest from [3, 1, 4, 1, 5, 9, 2, 6]
|
||||
|
||||
Maintain min-heap of size 3:
|
||||
|
||||
Process 3: heap = [3]
|
||||
Process 1: heap = [1, 3]
|
||||
Process 4: heap = [1, 3, 4]
|
||||
Process 1: 1 <= heap[0]=1, skip
|
||||
Process 5: 5 > 1, remove 1, add 5 → heap = [3, 5, 4]
|
||||
Process 9: 9 > 3, remove 3, add 9 → heap = [4, 5, 9]
|
||||
Process 2: 2 <= 4, skip
|
||||
Process 6: 6 > 4, remove 4, add 6 → heap = [5, 9, 6]
|
||||
|
||||
Result: [5, 6, 9] are the top 3
|
||||
```
|
||||
|
||||
code_template: |
|
||||
import heapq
|
||||
|
||||
def find_k_largest(nums: list[int], k: int) -> list[int]:
|
||||
"""Find k largest elements using min-heap."""
|
||||
# Min-heap of size k keeps k largest
|
||||
heap = []
|
||||
|
||||
for num in nums:
|
||||
if len(heap) < k:
|
||||
heapq.heappush(heap, num)
|
||||
elif num > heap[0]:
|
||||
heapq.heapreplace(heap, num) # Pop min, push new
|
||||
|
||||
return heap
|
||||
|
||||
|
||||
def find_k_smallest(nums: list[int], k: int) -> list[int]:
|
||||
"""Find k smallest elements using max-heap (negated values)."""
|
||||
# Max-heap (negated) of size k keeps k smallest
|
||||
heap = []
|
||||
|
||||
for num in nums:
|
||||
if len(heap) < k:
|
||||
heapq.heappush(heap, -num)
|
||||
elif num < -heap[0]:
|
||||
heapq.heapreplace(heap, -num)
|
||||
|
||||
return [-x for x in heap]
|
||||
|
||||
|
||||
def merge_k_sorted_lists(lists: list[list[int]]) -> list[int]:
|
||||
"""Merge k sorted lists using min-heap."""
|
||||
heap = []
|
||||
result = []
|
||||
|
||||
# Initialize heap with first element from each list
|
||||
for i, lst in enumerate(lists):
|
||||
if lst:
|
||||
heapq.heappush(heap, (lst[0], i, 0))
|
||||
|
||||
while heap:
|
||||
val, list_idx, elem_idx = heapq.heappop(heap)
|
||||
result.append(val)
|
||||
|
||||
# Add next element from same list
|
||||
if elem_idx + 1 < len(lists[list_idx]):
|
||||
next_val = lists[list_idx][elem_idx + 1]
|
||||
heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
|
||||
|
||||
return result
|
||||
|
||||
|
||||
class MedianFinder:
|
||||
"""Find median from data stream using two heaps."""
|
||||
|
||||
def __init__(self):
|
||||
self.small = [] # Max-heap (negated) for smaller half
|
||||
self.large = [] # Min-heap for larger half
|
||||
|
||||
def add_num(self, num: int) -> None:
|
||||
# Add to max-heap (smaller half)
|
||||
heapq.heappush(self.small, -num)
|
||||
|
||||
# Balance: largest of small should be <= smallest of large
|
||||
if self.large and -self.small[0] > self.large[0]:
|
||||
heapq.heappush(self.large, -heapq.heappop(self.small))
|
||||
|
||||
# Size balance: small can have at most 1 more element
|
||||
if len(self.small) > len(self.large) + 1:
|
||||
heapq.heappush(self.large, -heapq.heappop(self.small))
|
||||
elif len(self.large) > len(self.small):
|
||||
heapq.heappush(self.small, -heapq.heappop(self.large))
|
||||
|
||||
def find_median(self) -> float:
|
||||
if len(self.small) > len(self.large):
|
||||
return -self.small[0]
|
||||
return (-self.small[0] + self.large[0]) / 2
|
||||
|
||||
|
||||
def kth_smallest_in_matrix(matrix: list[list[int]], k: int) -> int:
|
||||
"""Find kth smallest in row-wise and column-wise sorted matrix."""
|
||||
n = len(matrix)
|
||||
heap = [(matrix[0][0], 0, 0)]
|
||||
visited = {(0, 0)}
|
||||
|
||||
for _ in range(k - 1):
|
||||
val, r, c = heapq.heappop(heap)
|
||||
|
||||
# Add right neighbor
|
||||
if c + 1 < n and (r, c + 1) not in visited:
|
||||
visited.add((r, c + 1))
|
||||
heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
|
||||
|
||||
# Add bottom neighbor
|
||||
if r + 1 < n and (r + 1, c) not in visited:
|
||||
visited.add((r + 1, c))
|
||||
heapq.heappush(heap, (matrix[r + 1][c], r + 1, c))
|
||||
|
||||
return heap[0][0]
|
||||
|
||||
recognition_signals:
|
||||
- "kth largest"
|
||||
- "kth smallest"
|
||||
- "top k"
|
||||
- "merge sorted"
|
||||
- "median"
|
||||
- "priority"
|
||||
- "schedule"
|
||||
- "Dijkstra"
|
||||
- "frequency"
|
||||
- "closest points"
|
||||
|
||||
common_mistakes:
|
||||
- title: Using max-heap when min-heap needed (or vice versa)
|
||||
description: |
|
||||
Python's heapq is a min-heap. Using it directly for "k largest" keeps
|
||||
k smallest instead.
|
||||
fix: |
|
||||
For max-heap behavior, negate values:
|
||||
```python
|
||||
heapq.heappush(heap, -num) # Push negative
|
||||
max_val = -heapq.heappop(heap) # Negate back
|
||||
```
|
||||
|
||||
- title: Wrong heap size for "top K" problems
|
||||
description: |
|
||||
For "k largest," keeping a max-heap of all elements and extracting k times
|
||||
is O(n + k log n). Using min-heap of size k is O(n log k).
|
||||
fix: |
|
||||
For k largest: use min-heap of size k, remove smallest when full.
|
||||
For k smallest: use max-heap of size k, remove largest when full.
|
||||
|
||||
- title: Forgetting tuple comparison order
|
||||
description: |
|
||||
When heap contains tuples, Python compares by first element, then second,
|
||||
etc. If first elements are equal, comparison moves to second element.
|
||||
fix: |
|
||||
Put the comparison key first in the tuple:
|
||||
```python
|
||||
heapq.heappush(heap, (priority, item))
|
||||
```
|
||||
If items aren't comparable, use a counter as tiebreaker.
|
||||
|
||||
- title: Modifying heap elements directly
|
||||
description: |
|
||||
Changing an element's value after it's in the heap breaks heap property.
|
||||
fix: |
|
||||
Heaps don't support "decrease key" directly. Either: (1) use lazy deletion
|
||||
(mark as invalid, skip when popped), or (2) re-heapify the entire heap.
|
||||
|
||||
variations:
|
||||
- name: Top K elements
|
||||
description: |
|
||||
Keep k largest using min-heap of size k, or k smallest using max-heap
|
||||
of size k.
|
||||
example: "Kth Largest Element, Top K Frequent Elements"
|
||||
|
||||
- name: K-way merge
|
||||
description: |
|
||||
Merge k sorted lists efficiently by maintaining heap of current elements
|
||||
from each list.
|
||||
example: "Merge K Sorted Lists, Smallest Range Covering K Lists"
|
||||
|
||||
- name: Two heaps (median)
|
||||
description: |
|
||||
Maintain two heaps: max-heap for smaller half, min-heap for larger half.
|
||||
Median is at the roots.
|
||||
example: "Find Median from Data Stream, Sliding Window Median"
|
||||
|
||||
- name: Dijkstra's algorithm
|
||||
description: |
|
||||
Min-heap tracks vertices by shortest known distance. Extract minimum,
|
||||
relax edges, update heap.
|
||||
example: "Network Delay Time, Cheapest Flights Within K Stops"
|
||||
|
||||
- name: Task scheduling
|
||||
description: |
|
||||
Prioritize tasks by some criteria (deadline, duration). Process highest
|
||||
priority first.
|
||||
example: "Task Scheduler, Meeting Rooms III"
|
||||
|
||||
related_patterns:
|
||||
- binary-search
|
||||
- two-pointers
|
||||
|
||||
prerequisite_patterns: []
|
||||
Reference in New Issue
Block a user