feat(patterns): data structure tutorials

2025-08-23 19:25:47 +01:00
parent f2e4149e52
commit b1fc2f4e99
4 changed files with 1194 additions and 0 deletions

@@ -0,0 +1,305 @@
name: Trie
slug: trie
difficulty_level: 3
description: >
A tree-like data structure for efficient string prefix operations. Each node
represents a character, and paths from root to nodes spell out prefixes. Tries
enable O(m) search, insert, and prefix queries where m is the word length.
when_to_use: |
- Autocomplete systems
- Spell checkers
- Word dictionary with prefix search
- Word break problems
- IP routing (longest prefix matching)
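The longest-prefix-matching use case can be sketched as follows, assuming addresses and route prefixes are given as strings of "0"/"1" bits. The names `BitNode`, `add_route`, and `longest_prefix_match` are hypothetical and do not come from any routing library.

```python
class BitNode:
    def __init__(self):
        self.children = {}    # keys are "0" or "1"
        self.next_hop = None  # set only where a route's prefix ends


def add_route(root: BitNode, prefix_bits: str, next_hop: str) -> None:
    node = root
    for bit in prefix_bits:
        if bit not in node.children:
            node.children[bit] = BitNode()
        node = node.children[bit]
    node.next_hop = next_hop


def longest_prefix_match(root: BitNode, address_bits: str):
    """Return the next hop of the longest stored prefix of address_bits, or None."""
    node, best = root, root.next_hop
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break  # no longer prefix is stored
        if node.next_hop is not None:
            best = node.next_hop  # remember the deepest match so far
    return best


table = BitNode()
add_route(table, "10", "A")
add_route(table, "1011", "B")
print(longest_prefix_match(table, "10110000"))  # B
print(longest_prefix_match(table, "10010000"))  # A
print(longest_prefix_match(table, "01000000"))  # None
```

The lookup keeps walking after the first match because a more specific (longer) route may still exist further down the same path.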
metaphor: |
Imagine a filing cabinet where files are organized by name, one letter per
drawer. To find "apple," you open drawer 'a', then find sub-drawer 'p', then
'p', then 'l', then 'e'. You don't search through all files—you navigate
directly to the right location. Finding "application" shares the same path
up to "appl" before diverging.
Another analogy: a phone book organized as a tree. Instead of a flat
alphabetical list, common prefixes are grouped, making it fast to find all
names starting with "Joh" or check if "Johnson" exists.
core_concept: |
A **Trie** (pronounced "try") stores strings character by character:
- **Root**: Empty node representing the empty prefix
- **Edges**: Labeled with characters
- **Nodes**: Represent prefixes; may be marked as "end of word"
Key insight: all words sharing a prefix share the same path from root.
Each operation therefore costs time proportional to the word length m, regardless of how many words are stored:
- **Insert word**: O(m) — create path from root
- **Search word**: O(m) — follow path, check end marker
- **Starts with prefix**: O(m) — just follow path, no end check needed
**Trade-off**: Tries typically use more memory than hash sets, since each
character gets its own node and child map, but they support prefix queries
that a hash set cannot answer without scanning every key.
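To make the shared-prefix insight concrete, here is a minimal sketch. It relies on the fact that every distinct non-empty prefix corresponds to exactly one non-root node, so counting prefixes counts nodes. The helper name `count_trie_nodes` is illustrative only.

```python
def count_trie_nodes(words: list[str]) -> int:
    """Count non-root trie nodes: one per distinct non-empty prefix."""
    prefixes = {word[:i] for word in words for i in range(1, len(word) + 1)}
    return len(prefixes)


words = ["app", "apple", "apply", "apt", "bat"]
total_chars = sum(len(w) for w in words)  # characters if stored separately
node_count = count_trie_nodes(words)      # nodes once prefixes are shared
print(total_chars, node_count)            # 19 10
```

Fewer nodes does not automatically mean less memory, since each node carries its own child map and end marker.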
visualization: |
**Trie containing: ["app", "apple", "apply", "apt", "bat"]**
```
        (root)
        /    \
       a      b
       |      |
       p      a
      / \     |
     p*  t*   t*
     |
     l
    / \
   e*  y*

* = end of word marker

Paths:
- "app" → a-p-p*
- "apple" → a-p-p-l-e*
- "apply" → a-p-p-l-y*
- "apt" → a-p-t*
- "bat" → b-a-t*
```
**Search for "apple":**
```
Start at root
→ 'a': found, move to 'a' node
→ 'p': found, move to 'p' node
→ 'p': found, move to second 'p' node
→ 'l': found, move to 'l' node
→ 'e': found, move to 'e' node
→ end of word marker? Yes!
"apple" exists ✓
```
**Search for "app":**
```
Follow path a-p-p
→ end of word marker on second 'p'? Yes!
"app" exists ✓
```
**Starts with "ap":**
```
Follow path a-p
→ reached end of prefix successfully
Words with prefix "ap" exist ✓
```
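The three traces above can be reproduced with a compact nested-dict trie. This is a sketch for experimentation, not the template implementation: the `END` key and the function names are assumptions made for this example.

```python
END = "$"  # assumed end-of-word key; must not appear in the stored words


def build(words):
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[END] = True
    return root


def walk(root, s):
    """Return the node reached by following s, or None if the path breaks."""
    node = root
    for char in s:
        if char not in node:
            return None
        node = node[char]
    return node


def search(root, word):
    node = walk(root, word)
    return node is not None and END in node


def starts_with(root, prefix):
    return walk(root, prefix) is not None


trie = build(["app", "apple", "apply", "apt", "bat"])
print(search(trie, "apple"))    # True
print(search(trie, "app"))      # True
print(starts_with(trie, "ap"))  # True
print(search(trie, "ap"))       # False: the path exists, but no end marker
```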
code_template: |
from typing import Optional


class TrieNode:
    def __init__(self):
        self.children = {}   # maps a character to the child TrieNode
        self.is_end = False  # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Insert a word into the trie."""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word: str) -> bool:
        """Check if word exists in trie."""
        node = self._traverse(word)
        return node is not None and node.is_end

    def starts_with(self, prefix: str) -> bool:
        """Check if any word starts with prefix."""
        return self._traverse(prefix) is not None

    def _traverse(self, s: str) -> Optional[TrieNode]:
        """Follow s from the root; return the final node, or None if the path breaks."""
        node = self.root
        for char in s:
            if char not in node.children:
                return None
            node = node.children[char]
        return node


class WordDictionary:
    """Trie with wildcard search support."""

    def __init__(self):
        self.root = TrieNode()

    def add_word(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word: str) -> bool:
        """Search with '.' as wildcard for any character."""
        def dfs(node: TrieNode, i: int) -> bool:
            if i == len(word):
                return node.is_end
            char = word[i]
            if char == '.':
                # Try all children
                return any(dfs(child, i + 1)
                           for child in node.children.values())
            else:
                if char not in node.children:
                    return False
                return dfs(node.children[char], i + 1)

        return dfs(self.root, 0)


def word_break(s: str, word_dict: list[str]) -> bool:
    """Check if string can be segmented into dictionary words."""
    trie = Trie()
    for word in word_dict:
        trie.insert(word)

    n = len(s)
    dp = [False] * (n + 1)  # dp[i]: s[:i] can be segmented
    dp[0] = True  # Empty string can be segmented

    for i in range(n):
        if not dp[i]:
            continue
        # Walk the trie from position i, marking every word end reached
        node = trie.root
        for j in range(i, n):
            if s[j] not in node.children:
                break
            node = node.children[s[j]]
            if node.is_end:
                dp[j + 1] = True

    return dp[n]


def find_words_with_prefix(trie: Trie, prefix: str) -> list[str]:
    """Find all words starting with prefix."""
    node = trie._traverse(prefix)
    if node is None:
        return []

    results = []

    def dfs(node: TrieNode, path: str):
        if node.is_end:
            results.append(path)
        for char, child in node.children.items():
            dfs(child, path + char)

    dfs(node, prefix)
    return results
recognition_signals:
- "prefix"
- "autocomplete"
- "word dictionary"
- "spell check"
- "word search"
- "word break"
- "longest common prefix"
- "starts with"
- "implement trie"
- "wildcard"
common_mistakes:
- title: Confusing search vs starts_with
description: |
Search checks if the exact word exists (must have end marker).
Starts_with only checks if the prefix path exists.
fix: |
For search, always check `node.is_end` at the end:
```python
def search(self, word):
node = self._traverse(word)
return node is not None and node.is_end
```
- title: Not handling empty string
description: |
Empty string is a valid prefix (everything starts with it) but may not
be a valid word in the dictionary.
fix: |
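`starts_with("")` should return True if the trie has any words. Note that the
template's `starts_with` returns True even for an empty trie, because
traversing "" stops at the root, which always exists.
`search("")` should return True only if the empty string was explicitly inserted.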
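One way to get this behavior is sketched below. It is a minimal, self-contained variant of the template rather than a required design: `starts_with` additionally checks that some word ends at or below the node it reached, which only makes a difference at the root of an empty trie.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def _traverse(self, s: str):
        node = self.root
        for char in s:
            node = node.children.get(char)
            if node is None:
                return None
        return node

    def search(self, word: str) -> bool:
        node = self._traverse(word)
        return node is not None and node.is_end

    def starts_with(self, prefix: str) -> bool:
        node = self._traverse(prefix)
        if node is None:
            return False
        # A reachable node only counts if a word ends at or below it.
        return node.is_end or bool(node.children)


trie = Trie()
print(trie.starts_with(""))  # False: no words stored yet
trie.insert("cat")
print(trie.starts_with(""))  # True
print(trie.search(""))       # False: "" was never inserted
```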
- title: Using array instead of dict for children
description: |
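Using `children = [None] * 26` assumes only lowercase letters a-z. With the
usual `ord(char) - ord('a')` indexing, other characters either raise an
IndexError or, for small negative offsets, silently land in another letter's slot.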
fix: |
Use a dictionary for flexibility:
```python
self.children = {} # Works for any characters
```
Or use array only when character set is known and fixed.
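If the array layout is kept, a guard on the index avoids the silent failure mode. The sketch below assumes a lowercase a-z alphabet; `slot`, `ArrayTrieNode`, and `ArrayTrie` are names chosen for this example.

```python
ALPHABET_SIZE = 26


def slot(char: str) -> int:
    """Map 'a'..'z' to 0..25; reject anything else instead of misindexing."""
    index = ord(char) - ord("a")
    if not 0 <= index < ALPHABET_SIZE:
        raise ValueError(f"unsupported character: {char!r}")
    return index


class ArrayTrieNode:
    def __init__(self):
        self.children = [None] * ALPHABET_SIZE
        self.is_end = False


class ArrayTrie:
    def __init__(self):
        self.root = ArrayTrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            i = slot(char)
            if node.children[i] is None:
                node.children[i] = ArrayTrieNode()
            node = node.children[i]
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self.root
        for char in word:
            node = node.children[slot(char)]
            if node is None:
                return False
        return node.is_end
```

Without the range check, `'Z'` would map to index -7, which Python accepts as a negative index and resolves to the slot for `'t'`.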
- title: Memory leaks when deleting words
description: |
Simply unmarking is_end doesn't free memory for nodes that are no longer
part of any word.
fix: |
For deletion, either: (1) accept memory isn't freed (common), or
(2) implement proper deletion that removes orphaned nodes bottom-up.
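Option (2) can be sketched iteratively: record the path while walking down, unmark the end, then walk back up and drop nodes that no other word needs. The free functions `insert` and `delete` are illustrative and operate on a bare root node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False


def insert(root: TrieNode, word: str) -> None:
    node = root
    for char in word:
        if char not in node.children:
            node.children[char] = TrieNode()
        node = node.children[char]
    node.is_end = True


def delete(root: TrieNode, word: str) -> bool:
    """Remove word and prune nodes no other word needs. Returns False if absent."""
    path = [root]  # path[i] is the node reached after i characters
    for char in word:
        child = path[-1].children.get(char)
        if child is None:
            return False
        path.append(child)
    if not path[-1].is_end:
        return False
    path[-1].is_end = False
    # Walk back toward the root, dropping nodes that are now dead ends.
    for i in range(len(word), 0, -1):
        node = path[i]
        if node.is_end or node.children:
            break  # still needed by another word
        del path[i - 1].children[word[i - 1]]
    return True
```

Deleting "apple" from a trie that also holds "app" removes only the `l` and `e` nodes, because the second `p` still marks the end of "app".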
variations:
- name: Basic Trie
description: |
Standard insert, search, and prefix check operations.
example: "Implement Trie (Prefix Tree)"
- name: Wildcard search
description: |
Support '.' as wildcard matching any single character. Requires DFS
to explore all possibilities when encountering wildcard.
example: "Design Add and Search Words Data Structure"
- name: Word search in grid
description: |
Use Trie to efficiently search for multiple words in a 2D grid.
Prune branches that don't match any word prefix.
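A minimal sketch of this idea follows, assuming a rectangular board of single letters and 4-directional moves without reusing a cell. The function name `find_words` and the `"$"` and `"#"` markers are choices made for this example.

```python
def find_words(board: list[list[str]], words: list[str]) -> list[str]:
    """Return the words that can be traced through adjacent cells."""
    WORD = "$"  # assumed marker key; must not appear on the board
    root: dict = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[WORD] = word

    rows = len(board)
    cols = len(board[0]) if rows else 0
    found: list[str] = []

    def dfs(r: int, c: int, parent: dict) -> None:
        char = board[r][c]
        node = parent.get(char)
        if node is None:
            return  # no word continues with this letter: prune
        word = node.pop(WORD, None)
        if word is not None:
            found.append(word)  # popping the marker avoids duplicates
        board[r][c] = "#"  # mark the cell as used on the current path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                dfs(nr, nc, node)
        board[r][c] = char  # restore on backtrack
        if not node:
            del parent[char]  # drop branches with no words left

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, root)
    return found
```

Removing exhausted branches keeps later searches from re-exploring paths whose words have already been found.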
example: "Word Search II"
- name: Autocomplete
description: |
Find all words starting with a given prefix. DFS from the prefix
endpoint to collect all words.
example: "Design Search Autocomplete System"
- name: Compressed Trie (Radix Tree)
description: |
Merge chains of single-child nodes into one node with a string label.
Saves space for sparse tries.
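The merging step can be sketched on a nested-dict trie, as below. The representation, the `END` key, and the names `build` and `compress` are assumptions for this example; production radix trees usually also support insertion and lookup directly on the compressed form.

```python
END = "$"  # assumed end-of-word key; must not appear in the stored words


def build(words):
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[END] = True
    return root


def compress(node: dict) -> dict:
    """Return a copy in which each single-child, non-terminal chain is one edge."""
    out = {}
    for label, child in node.items():
        if label == END:
            out[END] = True
            continue
        # Extend the label while the child has exactly one outgoing edge
        # and is not itself the end of a word.
        while len(child) == 1 and END not in child:
            next_char, grandchild = next(iter(child.items()))
            label += next_char
            child = grandchild
        out[label] = compress(child)
    return out


radix = compress(build(["app", "apple", "apply", "apt", "bat"]))
print(sorted(radix))  # ['ap', 'bat']
```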
example: "Longest Common Prefix optimizations"
related_patterns:
- dfs
- backtracking
- dynamic-programming
prerequisite_patterns: []