name: Counting / Bucket Sort
slug: counting-sort
difficulty_level: 2
pattern_type: technique
display_order: 20
description: >
Exploit bounded value ranges to sort or select in linear time by using
values (or their frequencies) as array indices.
when_to_use: |
- Finding the top K most frequent elements (no frequency can exceed n)
- Sorting when values are in a known, limited range
- Problems involving frequency counting with bounded inputs
- Sorting a small fixed set of categories, such as Sort Colors (Dutch National Flag)
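
The color-sorting case from the list above can be sketched as a two-pass count. This is a minimal illustration, assuming the input holds only the values 0, 1, and 2; the name `sort_colors` is hypothetical. It is the counting variant, not the one-pass Dutch National Flag partition.

```python
def sort_colors(nums: list[int]) -> None:
    """Sort a list containing only 0, 1, and 2 in place (hypothetical helper)."""
    counts = [0, 0, 0]  # one bucket per color; assumes values are 0, 1, or 2
    for color in nums:
        counts[color] += 1

    # Overwrite the list with each color repeated as often as it was counted.
    i = 0
    for color, freq in enumerate(counts):
        nums[i:i + freq] = [color] * freq
        i += freq


colors = [2, 0, 2, 1, 1, 0]
sort_colors(colors)
print(colors)  # [0, 0, 1, 1, 2, 2]
```

Counting needs two passes but no swaps, which makes it harder to get wrong than the pointer-based partition.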
metaphor: |
Imagine sorting mail into numbered PO boxes. Instead of comparing letters
to each other, you read the box number on each letter and drop the letter
straight into that box. With 100 boxes, sorting 1000 letters takes about
1000 drops plus one pass over the boxes, not roughly 1000 x log2(1000),
or about 10,000, comparisons.
core_concept: |
When values are bounded within a known range [0, k], you can use the value
itself as an index into an array of k + 1 "buckets." This replaces
comparison-based O(n log n) sorting with O(n + k) counting work, at the
cost of O(k) extra space, so it pays off when k is not much larger than n.
The key insight: **bounded values = direct addressing is possible**.
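
A minimal sketch of direct addressing, assuming non-negative integers no larger than `k`; the name `counting_sort` and the range check are illustrative choices, not part of the template:

```python
def counting_sort(nums: list[int], k: int) -> list[int]:
    """Sort integers known to lie in [0, k] in O(n + k) time (hypothetical helper)."""
    counts = [0] * (k + 1)  # one slot per possible value 0..k
    for num in nums:
        if not 0 <= num <= k:
            raise ValueError(f"value {num} is outside [0, {k}]")
        counts[num] += 1  # the value itself is the index

    # Walk the slots in index order and emit each value as often as it occurred.
    result = []
    for value, freq in enumerate(counts):
        result.extend([value] * freq)
    return result


print(counting_sort([3, 0, 2, 3, 1, 0], k=3))  # [0, 0, 1, 2, 3, 3]
```

No two elements are ever compared; the output order comes entirely from walking the indices.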
code_template: |
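def bucket_sort_approach(nums: list[int], k: int) -> list[int]:
    """Return up to k values from nums, most frequent first."""
    # Here k is the number of results wanted, not a bound on the values.
    if k <= 0:
        return []
    # Create buckets indexed by frequency
    n = len(nums)
    buckets = [[] for _ in range(n + 1)]  # n + 1 buckets cover frequencies 0..n
    # Count how often each value occurs
    count = {}
    for num in nums:
        count[num] = count.get(num, 0) + 1
    # Place each distinct value in the bucket for its frequency
    for num, freq in count.items():
        buckets[freq].append(num)
    # Collect from the highest frequency down, stopping at k values
    result = []
    for i in range(n, 0, -1):
        for num in buckets[i]:
            result.append(num)
            if len(result) == k:
                return result
    return result  # fewer than k distinct values were available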
recognition_signals:
- "top k frequent"
- "sort colors"
- "values in range [0, n]"
- "frequency bounded by array size"
- "O(n) time required"
- "counting occurrences"
common_mistakes:
- title: Using Heap When Bucket Sort is Optimal
description: |
A heap solves top-K in O(n log k), but frequency buckets solve it in O(n)
because no frequency can exceed n. Always check whether the values or
frequencies have a known upper bound.
fix: |
Ask: "What's the maximum possible value/frequency?" If it is bounded by n,
use bucket sort.
- title: Off-by-One in Bucket Array
description: |
Creating only `n` buckets covers indices 0 to n-1, so frequency `n`
(when all elements are identical) has no bucket and indexing it fails.
fix: |
Create `n + 1` buckets to handle frequencies from 0 to n inclusive.
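
A small check of the boundary case, assuming the same frequency-bucket layout as the template; `frequency_buckets` is a hypothetical helper written for this illustration:

```python
def frequency_buckets(nums: list[int]) -> list[list[int]]:
    """Group distinct values by how often they occur (hypothetical helper)."""
    n = len(nums)
    buckets = [[] for _ in range(n + 1)]  # indices 0..n inclusive
    counts = {}
    for num in nums:
        counts[num] = counts.get(num, 0) + 1
    for num, freq in counts.items():
        buckets[freq].append(num)  # freq == n is valid only with n + 1 buckets
    return buckets


buckets = frequency_buckets([7, 7, 7])
print(len(buckets))  # 4
print(buckets[3])    # [7]
```

With `range(n)` instead of `range(n + 1)`, the `buckets[freq]` lookup for this input would raise an `IndexError`.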
variations:
- name: Top K Frequent Elements
description: Use frequency as bucket index, collect from highest
example: "top-k-frequent-elements"
- name: Sort Colors (Dutch National Flag)
description: Count the 0s, 1s, and 2s in three buckets, then rewrite the array in order
example: "sort-colors"
- name: H-Index
description: Bucket papers by citation count (capped at n), then scan from the highest bucket down
example: "h-index"
related_patterns:
- heap
- two-pointers
prerequisite_patterns: []