feat(patterns): data structure tutorials
This commit is contained in:
305
backend/data/patterns/trie.yaml
Normal file
305
backend/data/patterns/trie.yaml
Normal file
@@ -0,0 +1,305 @@
|
||||
name: Trie
|
||||
slug: trie
|
||||
difficulty_level: 3
|
||||
|
||||
description: >
|
||||
A tree-like data structure for efficient string prefix operations. Each node
|
||||
represents a character, and paths from root to nodes spell out prefixes. Tries
|
||||
enable O(m) search, insert, and prefix queries where m is the word length.
|
||||
|
||||
when_to_use: |
|
||||
- Autocomplete systems
|
||||
- Spell checkers
|
||||
- Word dictionary with prefix search
|
||||
- Word break problems
|
||||
- IP routing (longest prefix matching)
|
||||
|
||||
metaphor: |
|
||||
Imagine a filing cabinet where files are organized by name, one letter per
|
||||
drawer. To find "apple," you open drawer 'a', then find sub-drawer 'p', then
|
||||
'p', then 'l', then 'e'. You don't search through all files—you navigate
|
||||
directly to the right location. Finding "application" shares the same path
|
||||
up to "appl" before diverging.
|
||||
|
||||
Another analogy: a phone book organized as a tree. Instead of a flat
|
||||
alphabetical list, common prefixes are grouped, making it fast to find all
|
||||
names starting with "Joh" or check if "Johnson" exists.
|
||||
|
||||
core_concept: |
|
||||
A **Trie** (pronounced "try") stores strings character by character:
|
||||
|
||||
- **Root**: Empty node representing the empty prefix
|
||||
- **Edges**: Labeled with characters
|
||||
- **Nodes**: Represent prefixes; may be marked as "end of word"
|
||||
|
||||
Key insight: all words sharing a prefix share the same path from root.
|
||||
This makes prefix operations extremely efficient:
|
||||
|
||||
- **Insert word**: O(m) — create path from root
|
||||
- **Search word**: O(m) — follow path, check end marker
|
||||
- **Starts with prefix**: O(m) — just follow path, no end check needed
|
||||
|
||||
**Trade-off**: Tries use more memory than hash sets (each character is a node),
|
||||
but enable prefix queries that hash sets cannot support.
|
||||
|
||||
visualization: |
|
||||
**Trie containing: ["app", "apple", "apply", "apt", "bat"]**
|
||||
|
||||
```
|
||||
(root)
|
||||
/ \
|
||||
a b
|
||||
| |
|
||||
p a
|
||||
/ \ |
|
||||
p t* t*
|
||||
|
|
||||
l
|
||||
/ \
|
||||
e* y*
|
||||
|
||||
* = end of word marker
|
||||
|
||||
Paths:
|
||||
- "app" → a-p-p*
|
||||
- "apple" → a-p-p-l-e*
|
||||
- "apply" → a-p-p-l-y*
|
||||
- "apt" → a-p-t*
|
||||
- "bat" → b-a-t*
|
||||
```
|
||||
|
||||
**Search for "apple":**
|
||||
|
||||
```
|
||||
Start at root
|
||||
→ 'a': found, move to 'a' node
|
||||
→ 'p': found, move to 'p' node
|
||||
→ 'p': found, move to second 'p' node
|
||||
→ 'l': found, move to 'l' node
|
||||
→ 'e': found, move to 'e' node
|
||||
→ end of word marker? Yes!
|
||||
|
||||
"apple" exists ✓
|
||||
```
|
||||
|
||||
**Search for "app":**
|
||||
|
||||
```
|
||||
Follow path a-p-p
|
||||
→ end of word marker on second 'p'? Yes!
|
||||
|
||||
"app" exists ✓
|
||||
```
|
||||
|
||||
**Starts with "ap":**
|
||||
|
||||
```
|
||||
Follow path a-p
|
||||
→ reached end of prefix successfully
|
||||
|
||||
Words with prefix "ap" exist ✓
|
||||
```
|
||||
|
||||
code_template: |
|
||||
class TrieNode:
|
||||
def __init__(self):
|
||||
self.children = {}
|
||||
self.is_end = False
|
||||
|
||||
|
||||
class Trie:
|
||||
def __init__(self):
|
||||
self.root = TrieNode()
|
||||
|
||||
def insert(self, word: str) -> None:
|
||||
"""Insert a word into the trie."""
|
||||
node = self.root
|
||||
for char in word:
|
||||
if char not in node.children:
|
||||
node.children[char] = TrieNode()
|
||||
node = node.children[char]
|
||||
node.is_end = True
|
||||
|
||||
def search(self, word: str) -> bool:
|
||||
"""Check if word exists in trie."""
|
||||
node = self._traverse(word)
|
||||
return node is not None and node.is_end
|
||||
|
||||
def starts_with(self, prefix: str) -> bool:
|
||||
"""Check if any word starts with prefix."""
|
||||
return self._traverse(prefix) is not None
|
||||
|
||||
def _traverse(self, s: str) -> TrieNode:
|
||||
"""Traverse trie following string s."""
|
||||
node = self.root
|
||||
for char in s:
|
||||
if char not in node.children:
|
||||
return None
|
||||
node = node.children[char]
|
||||
return node
|
||||
|
||||
|
||||
class WordDictionary:
|
||||
"""Trie with wildcard search support."""
|
||||
|
||||
def __init__(self):
|
||||
self.root = TrieNode()
|
||||
|
||||
def add_word(self, word: str) -> None:
|
||||
node = self.root
|
||||
for char in word:
|
||||
if char not in node.children:
|
||||
node.children[char] = TrieNode()
|
||||
node = node.children[char]
|
||||
node.is_end = True
|
||||
|
||||
def search(self, word: str) -> bool:
|
||||
"""Search with '.' as wildcard for any character."""
|
||||
def dfs(node: TrieNode, i: int) -> bool:
|
||||
if i == len(word):
|
||||
return node.is_end
|
||||
|
||||
char = word[i]
|
||||
|
||||
if char == '.':
|
||||
# Try all children
|
||||
return any(dfs(child, i + 1)
|
||||
for child in node.children.values())
|
||||
else:
|
||||
if char not in node.children:
|
||||
return False
|
||||
return dfs(node.children[char], i + 1)
|
||||
|
||||
return dfs(self.root, 0)
|
||||
|
||||
|
||||
def word_break(s: str, word_dict: list[str]) -> bool:
|
||||
"""Check if string can be segmented into dictionary words."""
|
||||
trie = Trie()
|
||||
for word in word_dict:
|
||||
trie.insert(word)
|
||||
|
||||
n = len(s)
|
||||
dp = [False] * (n + 1)
|
||||
dp[0] = True # Empty string can be segmented
|
||||
|
||||
for i in range(n):
|
||||
if not dp[i]:
|
||||
continue
|
||||
|
||||
node = trie.root
|
||||
for j in range(i, n):
|
||||
if s[j] not in node.children:
|
||||
break
|
||||
node = node.children[s[j]]
|
||||
if node.is_end:
|
||||
dp[j + 1] = True
|
||||
|
||||
return dp[n]
|
||||
|
||||
|
||||
def find_words_with_prefix(trie: Trie, prefix: str) -> list[str]:
|
||||
"""Find all words starting with prefix."""
|
||||
node = trie._traverse(prefix)
|
||||
if not node:
|
||||
return []
|
||||
|
||||
results = []
|
||||
|
||||
def dfs(node: TrieNode, path: str):
|
||||
if node.is_end:
|
||||
results.append(path)
|
||||
for char, child in node.children.items():
|
||||
dfs(child, path + char)
|
||||
|
||||
dfs(node, prefix)
|
||||
return results
|
||||
|
||||
recognition_signals:
|
||||
- "prefix"
|
||||
- "autocomplete"
|
||||
- "word dictionary"
|
||||
- "spell check"
|
||||
- "word search"
|
||||
- "word break"
|
||||
- "longest common prefix"
|
||||
- "starts with"
|
||||
- "implement trie"
|
||||
- "wildcard"
|
||||
|
||||
common_mistakes:
|
||||
- title: Confusing search vs starts_with
|
||||
description: |
|
||||
Search checks if the exact word exists (must have end marker).
|
||||
Starts_with only checks if the prefix path exists.
|
||||
fix: |
|
||||
For search, always check `node.is_end` at the end:
|
||||
```python
|
||||
def search(self, word):
|
||||
node = self._traverse(word)
|
||||
return node is not None and node.is_end
|
||||
```
|
||||
|
||||
- title: Not handling empty string
|
||||
description: |
|
||||
Empty string is a valid prefix (everything starts with it) but may not
|
||||
be a valid word in the dictionary.
|
||||
fix: |
|
||||
starts_with("") should return True if trie has any words.
|
||||
search("") should return True only if empty string was explicitly inserted.
|
||||
|
||||
- title: Using array instead of dict for children
|
||||
description: |
|
||||
Using `children = [None] * 26` assumes only lowercase letters. This fails
|
||||
for other character sets.
|
||||
fix: |
|
||||
Use a dictionary for flexibility:
|
||||
```python
|
||||
self.children = {} # Works for any characters
|
||||
```
|
||||
Or use array only when character set is known and fixed.
|
||||
|
||||
- title: Memory leaks when deleting words
|
||||
description: |
|
||||
Simply unmarking is_end doesn't free memory for nodes that are no longer
|
||||
part of any word.
|
||||
fix: |
|
||||
For deletion, either: (1) accept memory isn't freed (common), or
|
||||
(2) implement proper deletion that removes orphaned nodes bottom-up.
|
||||
|
||||
variations:
|
||||
- name: Basic Trie
|
||||
description: |
|
||||
Standard insert, search, and prefix check operations.
|
||||
example: "Implement Trie (Prefix Tree)"
|
||||
|
||||
- name: Wildcard search
|
||||
description: |
|
||||
Support '.' as wildcard matching any single character. Requires DFS
|
||||
to explore all possibilities when encountering wildcard.
|
||||
example: "Design Add and Search Words Data Structure"
|
||||
|
||||
- name: Word search in grid
|
||||
description: |
|
||||
Use Trie to efficiently search for multiple words in a 2D grid.
|
||||
Prune branches that don't match any word prefix.
|
||||
example: "Word Search II"
|
||||
|
||||
- name: Autocomplete
|
||||
description: |
|
||||
Find all words starting with a given prefix. DFS from the prefix
|
||||
endpoint to collect all words.
|
||||
example: "Design Search Autocomplete System"
|
||||
|
||||
- name: Compressed Trie (Radix Tree)
|
||||
description: |
|
||||
Merge chains of single-child nodes into one node with a string label.
|
||||
Saves space for sparse tries.
|
||||
example: "Longest Common Prefix optimizations"
|
||||
|
||||
related_patterns:
|
||||
- dfs
|
||||
- backtracking
|
||||
- dynamic-programming
|
||||
|
||||
prerequisite_patterns: []
|
||||
Reference in New Issue
Block a user