feat(patterns): data structure tutorials

This commit is contained in:
2025-08-23 19:25:47 +01:00
parent 8ddd48508a
commit cd09974c2a
4 changed files with 1194 additions and 0 deletions

View File

@@ -0,0 +1,307 @@
name: Heap / Priority Queue
slug: heap
difficulty_level: 3
description: >
A data structure that efficiently maintains the minimum or maximum element,
supporting O(log n) insertion and extraction. Heaps are essential when you
repeatedly need to access the smallest or largest element from a changing set.
when_to_use: |
- Finding K largest/smallest elements
- K-way merge of sorted lists
- Finding median from data stream
- Task scheduling by priority
- Dijkstra's shortest path algorithm
metaphor: |
Imagine a hospital emergency room where patients are treated by urgency, not
arrival time. A priority queue (heap) lets you always know who's next without
sorting everyone whenever someone new arrives. The most urgent patient "bubbles
up" to the front automatically.
Another analogy: a to-do list that always shows your most important task first.
When you add or complete tasks, the list reorganizes itself so the highest
priority is always accessible in O(1) time.
core_concept: |
A **heap** is a complete binary tree where each parent is smaller (min-heap) or
larger (max-heap) than its children. This property guarantees:
- **Peek min/max**: O(1) — it's always at the root
- **Insert**: O(log n) — bubble up to maintain heap property
- **Extract min/max**: O(log n) — remove root, bubble down to reheapify
Key insight: heaps don't fully sort the data. They only guarantee the root is
the min/max. This partial ordering is enough for many problems and is more
efficient than maintaining full sorted order.
**When to use heaps:**
- Need repeated access to min/max element
- Data changes frequently (insertions/deletions)
- Full sorting is overkill (only need top K, not all elements sorted)
visualization: |
**Min-Heap Structure:**
```
Array: [1, 3, 2, 7, 6, 4, 5]
As tree:
1 (index 0)
/ \
3 2 (indices 1, 2)
/ \ / \
7 6 4 5 (indices 3, 4, 5, 6)
Parent of index i: (i-1) // 2
Left child: 2*i + 1
Right child: 2*i + 2
```
**Inserting 0 into heap:**
```
Add 0 at end:
1
/ \
3 2
/ \ / \
7 6 4 5
/
0
Bubble up (0 < 7, swap):
1
/ \
3 2
/ \ / \
0 6 4 5
/
7
Bubble up (0 < 3, swap):
1
/ \
0 2
/ \ / \
3 6 4 5
/
7
Bubble up (0 < 1, swap):
0
/ \
1 2
/ \ / \
3 6 4 5
/
7
```
**Top K Elements using Min-Heap:**
```
Find 3 largest from [3, 1, 4, 1, 5, 9, 2, 6]
Maintain min-heap of size 3:
Process 3: heap = [3]
Process 1: heap = [1, 3]
Process 4: heap = [1, 3, 4]
Process 1: 1 <= heap[0]=1, skip
Process 5: 5 > 1, remove 1, add 5 → heap = [3, 5, 4]
Process 9: 9 > 3, remove 3, add 9 → heap = [4, 5, 9]
Process 2: 2 <= 4, skip
Process 6: 6 > 4, remove 4, add 6 → heap = [5, 9, 6]
Result: [5, 6, 9] are the top 3
```
code_template: |
import heapq
def find_k_largest(nums: list[int], k: int) -> list[int]:
"""Find k largest elements using min-heap."""
# Min-heap of size k keeps k largest
heap = []
for num in nums:
if len(heap) < k:
heapq.heappush(heap, num)
elif num > heap[0]:
heapq.heapreplace(heap, num) # Pop min, push new
return heap
def find_k_smallest(nums: list[int], k: int) -> list[int]:
"""Find k smallest elements using max-heap (negated values)."""
# Max-heap (negated) of size k keeps k smallest
heap = []
for num in nums:
if len(heap) < k:
heapq.heappush(heap, -num)
elif num < -heap[0]:
heapq.heapreplace(heap, -num)
return [-x for x in heap]
def merge_k_sorted_lists(lists: list[list[int]]) -> list[int]:
"""Merge k sorted lists using min-heap."""
heap = []
result = []
# Initialize heap with first element from each list
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst[0], i, 0))
while heap:
val, list_idx, elem_idx = heapq.heappop(heap)
result.append(val)
# Add next element from same list
if elem_idx + 1 < len(lists[list_idx]):
next_val = lists[list_idx][elem_idx + 1]
heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
return result
class MedianFinder:
"""Find median from data stream using two heaps."""
def __init__(self):
self.small = [] # Max-heap (negated) for smaller half
self.large = [] # Min-heap for larger half
def add_num(self, num: int) -> None:
# Add to max-heap (smaller half)
heapq.heappush(self.small, -num)
# Balance: largest of small should be <= smallest of large
if self.large and -self.small[0] > self.large[0]:
heapq.heappush(self.large, -heapq.heappop(self.small))
# Size balance: small can have at most 1 more element
if len(self.small) > len(self.large) + 1:
heapq.heappush(self.large, -heapq.heappop(self.small))
elif len(self.large) > len(self.small):
heapq.heappush(self.small, -heapq.heappop(self.large))
def find_median(self) -> float:
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
def kth_smallest_in_matrix(matrix: list[list[int]], k: int) -> int:
"""Find kth smallest in row-wise and column-wise sorted matrix."""
n = len(matrix)
heap = [(matrix[0][0], 0, 0)]
visited = {(0, 0)}
for _ in range(k - 1):
val, r, c = heapq.heappop(heap)
# Add right neighbor
if c + 1 < n and (r, c + 1) not in visited:
visited.add((r, c + 1))
heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
# Add bottom neighbor
if r + 1 < n and (r + 1, c) not in visited:
visited.add((r + 1, c))
heapq.heappush(heap, (matrix[r + 1][c], r + 1, c))
return heap[0][0]
recognition_signals:
- "kth largest"
- "kth smallest"
- "top k"
- "merge sorted"
- "median"
- "priority"
- "schedule"
- "Dijkstra"
- "frequency"
- "closest points"
common_mistakes:
- title: Using max-heap when min-heap needed (or vice versa)
description: |
Python's heapq is a min-heap. Using it directly for "k largest" keeps
k smallest instead.
fix: |
For max-heap behavior, negate values:
```python
heapq.heappush(heap, -num) # Push negative
max_val = -heapq.heappop(heap) # Negate back
```
- title: Wrong heap size for "top K" problems
description: |
For "k largest," keeping a max-heap of all elements and extracting k times
is O(n + k log n). Using min-heap of size k is O(n log k).
fix: |
For k largest: use min-heap of size k, remove smallest when full.
For k smallest: use max-heap of size k, remove largest when full.
- title: Forgetting tuple comparison order
description: |
When heap contains tuples, Python compares by first element, then second,
etc. If first elements are equal, comparison moves to second element.
fix: |
Put the comparison key first in the tuple:
```python
heapq.heappush(heap, (priority, item))
```
If items aren't comparable, use a counter as tiebreaker.
- title: Modifying heap elements directly
description: |
Changing an element's value after it's in the heap breaks heap property.
fix: |
Heaps don't support "decrease key" directly. Either: (1) use lazy deletion
(mark as invalid, skip when popped), or (2) re-heapify the entire heap.
variations:
- name: Top K elements
description: |
Keep k largest using min-heap of size k, or k smallest using max-heap
of size k.
example: "Kth Largest Element, Top K Frequent Elements"
- name: K-way merge
description: |
Merge k sorted lists efficiently by maintaining heap of current elements
from each list.
example: "Merge K Sorted Lists, Smallest Range Covering K Lists"
- name: Two heaps (median)
description: |
Maintain two heaps: max-heap for smaller half, min-heap for larger half.
Median is at the roots.
example: "Find Median from Data Stream, Sliding Window Median"
- name: Dijkstra's algorithm
description: |
Min-heap tracks vertices by shortest known distance. Extract minimum,
relax edges, update heap.
example: "Network Delay Time, Cheapest Flights Within K Stops"
- name: Task scheduling
description: |
Prioritize tasks by some criteria (deadline, duration). Process highest
priority first.
example: "Task Scheduler, Meeting Rooms III"
related_patterns:
- binary-search
- two-pointers
prerequisite_patterns: []