# codetutor/backend/data/questions/word-ladder.yaml
title: Word Ladder
slug: word-ladder
difficulty: hard
leetcode_id: 127
leetcode_url: https://leetcode.com/problems/word-ladder/
categories:
- strings
- graphs
- hash-tables
patterns:
- slug: bfs
is_optimal: true
function_signature: "def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:"
test_cases:
visible:
- input: { begin_word: "hit", end_word: "cog", word_list: ["hot", "dot", "dog", "lot", "log", "cog"] }
expected: 5
- input: { begin_word: "hit", end_word: "cog", word_list: ["hot", "dot", "dog", "lot", "log"] }
expected: 0
hidden:
- input: { begin_word: "a", end_word: "c", word_list: ["a", "b", "c"] }
expected: 2
- input: { begin_word: "hot", end_word: "dog", word_list: ["hot", "dog"] }
expected: 0
- input: { begin_word: "hot", end_word: "dog", word_list: ["hot", "dog", "dot"] }
expected: 3
- input: { begin_word: "leet", end_word: "code", word_list: ["lest", "leet", "lose", "code", "lode", "robe", "lost"] }
expected: 6
- input: { begin_word: "cat", end_word: "dog", word_list: ["cat", "bat", "bet", "bot", "dot", "dog"] }
expected: 5
- input: { begin_word: "red", end_word: "tax", word_list: ["ted", "tex", "red", "tax", "tad", "den", "rex", "pee"] }
expected: 4
description: |
A **transformation sequence** from word `beginWord` to word `endWord` using a dictionary `wordList` is a sequence of words `beginWord -> s1 -> s2 -> ... -> sk` such that:
- Every adjacent pair of words differs by a single letter.
- Every `si` for `1 <= i <= k` is in `wordList`. Note that `beginWord` does not need to be in `wordList`.
- `sk == endWord`
Given two words, `beginWord` and `endWord`, and a dictionary `wordList`, return *the **number of words** in the **shortest transformation sequence** from `beginWord` to `endWord`, or `0` if no such sequence exists*.
constraints: |
- `1 <= beginWord.length <= 10`
- `endWord.length == beginWord.length`
- `1 <= wordList.length <= 5000`
- `wordList[i].length == beginWord.length`
- `beginWord`, `endWord`, and `wordList[i]` consist of lowercase English letters
- `beginWord != endWord`
- All the words in `wordList` are **unique**
examples:
- input: 'beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]'
output: "5"
explanation: 'One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> "cog", which is 5 words long.'
- input: 'beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]'
output: "0"
explanation: 'The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.'
explanation:
intuition: |
Imagine each word as a node in a graph. Two nodes are connected by an edge if they differ by exactly one letter. The problem then becomes: **find the shortest path** from `beginWord` to `endWord` in this graph.
Why BFS? When searching for the shortest path in an unweighted graph (where every edge has the same "cost"), **Breadth-First Search** is the ideal algorithm. BFS explores all nodes at distance 1, then all nodes at distance 2, and so on. The first time we reach `endWord`, we are guaranteed to have found the shortest path.
Think of it like ripples spreading outward from a stone dropped in water. Starting from `beginWord`, we explore all words reachable by changing one letter. Then from each of those words, we explore their one-letter neighbours. The "ripple" that first touches `endWord` tells us the shortest transformation length.
The key insight is recognising this as a **graph shortest-path problem** disguised as a string manipulation problem. Once you see the graph structure, BFS becomes the natural choice.
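To make the graph view concrete, here is a minimal sketch that builds the word graph for the first example. The helper names `differs_by_one` and `build_graph` are illustrative and not part of the reference solutions. The pairwise comparison is used only because the example has seven words.

```python
from itertools import combinations


def differs_by_one(a: str, b: str) -> bool:
    """True when two equal-length words differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_graph(words: list[str]) -> dict[str, set[str]]:
    """Adjacency sets for the word graph (pairwise check, for illustration only)."""
    graph: dict[str, set[str]] = {word: set() for word in words}
    for a, b in combinations(words, 2):
        if differs_by_one(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


graph = build_graph(["hit", "hot", "dot", "dog", "lot", "log", "cog"])
print(sorted(graph["hot"]))  # ['dot', 'hit', 'lot']
print(sorted(graph["cog"]))  # ['dog', 'log']
```

Each key is a node and each set holds its edges, so the problem reduces to a shortest-path search from `hit` to `cog`.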
approach: |
We solve this using **Breadth-First Search (BFS)** with a word set for O(1) lookups:
**Step 1: Handle early termination**
- Convert `wordList` to a set for O(1) membership checks
- If `endWord` is not in that set, return `0` immediately since no valid transformation exists
&nbsp;
**Step 2: Initialise BFS data structures**
- `queue`: Contains tuples of `(current_word, transformation_length)`, starting with `(beginWord, 1)`
- `visited`: A set to track words we've already processed, preventing cycles
&nbsp;
**Step 3: Process the BFS queue**
- Dequeue the front word and its current transformation length
- If this word equals `endWord`, return the transformation length (shortest path found)
- Otherwise, generate all possible one-letter transformations
&nbsp;
**Step 4: Generate neighbour words efficiently**
- For each position in the word, try replacing it with every letter from `a` to `z`
- If the new word exists in `wordList` and hasn't been visited:
  - Mark it as visited
  - Add it to the queue with `length + 1`
&nbsp;
**Step 5: Return result**
- If the queue empties without finding `endWord`, return `0`
&nbsp;
This approach guarantees we find the shortest path because BFS explores all words at distance `d` before any word at distance `d+1`.
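The level-by-level guarantee can be seen with a small sketch that records each BFS level instead of returning a length. The function name `bfs_levels` is hypothetical and this is a variation on the steps above, not the reference solution.

```python
from string import ascii_lowercase


def bfs_levels(begin_word: str, word_list: list[str]) -> list[list[str]]:
    """Return the words discovered at each BFS distance from begin_word."""
    word_set = set(word_list)
    visited = {begin_word}
    level = [begin_word]
    levels = []
    while level:
        levels.append(sorted(level))
        next_level = []
        for word in level:
            for i in range(len(word)):
                for c in ascii_lowercase:
                    candidate = word[:i] + c + word[i + 1:]
                    if candidate in word_set and candidate not in visited:
                        visited.add(candidate)  # mark when discovered
                        next_level.append(candidate)
        level = next_level
    return levels


levels = bfs_levels("hit", ["hot", "dot", "dog", "lot", "log", "cog"])
print(levels)
# [['hit'], ['hot'], ['dot', 'lot'], ['dog', 'log'], ['cog']]
```

`cog` first appears in the fifth level, which matches the expected answer of 5 words.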
common_pitfalls:
- title: Using DFS Instead of BFS
description: |
DFS will find *a* path but not necessarily the *shortest* path. DFS explores one branch deeply before backtracking, so it might find a longer transformation sequence first.
For example, on the first sample input DFS might first find `hit -> hot -> dot -> lot -> log -> cog` (6 words) and stop, even though `hit -> hot -> dot -> dog -> cog` needs only 5. On other inputs the first path DFS finds can be far longer than the shortest one.
BFS guarantees shortest path in unweighted graphs because it explores level by level.
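As an illustration, here is a sketch of a DFS that returns the first path it finds. The function `first_dfs_path` and its neighbour ordering (positions left to right, letters `a` to `z`) are assumptions chosen for this example. A different ordering could find a different first path.

```python
from string import ascii_lowercase


def first_dfs_path(word, end_word, word_set, path=()):
    """Depth-first search that stops at the first path reaching end_word."""
    path = path + (word,)
    if word == end_word:
        return list(path)
    for i in range(len(word)):
        for c in ascii_lowercase:
            candidate = word[:i] + c + word[i + 1:]
            # Only step onto dictionary words not already on the current path
            if candidate in word_set and candidate not in path:
                found = first_dfs_path(candidate, end_word, word_set, path)
                if found:
                    return found
    return None


word_set = {"hot", "dot", "dog", "lot", "log", "cog"}
path = first_dfs_path("hit", "cog", word_set)
print(path)       # ['hit', 'hot', 'dot', 'lot', 'log', 'cog']
print(len(path))  # 6, one more than the shortest sequence of 5
```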
wrong_approach: "Use DFS with path tracking"
correct_approach: "Use BFS to guarantee shortest path"
- title: Comparing Every Word Pair (O(n^2) Neighbour Check)
description: |
A naive approach compares every word against every other word to find neighbours differing by one letter. With `n` words of length `m`, this is O(n^2 * m) just for building the graph.
Instead, for each word, generate all possible one-letter variations and check if they exist in the word set. This is O(n * m * 26) = O(n * m), which is much faster when `n` is large.
With `wordList.length <= 5000` and word length up to 10, the optimised approach does ~1.3M operations vs potentially 250M for the naive approach.
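The two neighbour strategies and the operation counts quoted above can be checked with a short sketch. The function names are illustrative only.

```python
from string import ascii_lowercase


def neighbours_by_comparison(word: str, words: list[str]) -> set[str]:
    """Naive: compare `word` against every word in the list, O(n * m) per word."""
    return {
        other for other in words
        if sum(a != b for a, b in zip(word, other)) == 1
    }


def neighbours_by_generation(word: str, word_set: set[str]) -> set[str]:
    """Optimised: try 26 letters at each of the m positions, then a set lookup."""
    found = set()
    for i in range(len(word)):
        for c in ascii_lowercase:
            candidate = word[:i] + c + word[i + 1:]
            if candidate != word and candidate in word_set:
                found.add(candidate)
    return found


words = ["hot", "dot", "dog", "lot", "log", "cog"]
same = neighbours_by_comparison("hot", words) == neighbours_by_generation("hot", set(words))
print(same)  # True

# Operation counts at the constraint limits
n, m = 5000, 10
print(n * m * len(ascii_lowercase))  # 1300000
print(n * n * m)                     # 250000000
```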
wrong_approach: "Compare every pair of words"
correct_approach: "Generate variations and check set membership"
- title: Forgetting to Check if endWord Exists
description: |
If `endWord` is not in `wordList`, no valid transformation can exist. Failing to check this upfront means BFS runs to exhaustion before returning `0`.
Always validate inputs first: `if endWord not in word_set: return 0`.
- title: Not Marking Words as Visited
description: |
Without tracking visited words, BFS can revisit the same word multiple times from different paths, leading to:
- Infinite loops in graphs with cycles
- Exponential time complexity as the same subgraphs are explored repeatedly
Mark words as visited **when adding to the queue**, not when dequeuing. This prevents adding duplicates to the queue.
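The effect of the marking point can be measured with a sketch that counts queue insertions under both policies. The `count_enqueues` helper and its `mark_on_enqueue` flag are hypothetical names introduced for this comparison.

```python
from collections import deque
from string import ascii_lowercase


def count_enqueues(begin_word: str, word_list: list[str], mark_on_enqueue: bool) -> int:
    """Run a full BFS and count how many times a word is pushed onto the queue."""
    word_set = set(word_list)
    queue = deque([begin_word])
    visited = {begin_word} if mark_on_enqueue else set()
    enqueues = 1
    while queue:
        word = queue.popleft()
        if not mark_on_enqueue:
            if word in visited:
                continue  # duplicate entry that slipped into the queue
            visited.add(word)
        for i in range(len(word)):
            for c in ascii_lowercase:
                candidate = word[:i] + c + word[i + 1:]
                if candidate in word_set and candidate not in visited:
                    if mark_on_enqueue:
                        visited.add(candidate)
                    queue.append(candidate)
                    enqueues += 1
    return enqueues


words = ["hot", "dot", "dog", "lot", "log", "cog"]
early = count_enqueues("hit", words, mark_on_enqueue=True)
late = count_enqueues("hit", words, mark_on_enqueue=False)
print(early, late)
```

With marking at enqueue time each of the seven words enters the queue exactly once. Marking at dequeue time lets words such as `lot` be queued again from a sibling, so `late` is larger than `early` even on this tiny input.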
wrong_approach: "Process words without tracking visited"
correct_approach: "Mark visited when enqueuing to prevent duplicates"
key_takeaways:
- "**Graph recognition**: Many string transformation problems are graph shortest-path problems in disguise. When you see 'minimum steps' or 'shortest sequence', think BFS"
- "**BFS for shortest path**: In unweighted graphs, BFS guarantees the shortest path. This is fundamental and appears in many problems"
- "**Optimise neighbour generation**: Instead of comparing all pairs, generate possible variations and check set membership. This changes O(n^2 * m) to O(n * m * alphabet_size)"
- "**Foundation for Word Ladder II**: Word Ladder II (LeetCode 126) asks for all shortest paths, which requires tracking parent pointers during BFS"
time_complexity: "O(n * m * 26) set lookups, where `n` is the number of words and `m` is the word length: for each word we generate `m * 26` variations. Building and hashing each variation costs a further O(m), so counting character operations the bound is O(n * m^2 * 26)."
space_complexity: "O(n * m). The visited set and queue can each hold up to `n` words of length `m`."
solutions:
- approach_name: BFS with Set Lookup
is_optimal: true
code: |
from collections import deque

def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
    # Convert to set for O(1) lookups
    word_set = set(word_list)
    # Early termination: end_word must be reachable
    if end_word not in word_set:
        return 0
    # BFS setup: (current_word, transformation_count)
    queue = deque([(begin_word, 1)])
    visited = {begin_word}
    while queue:
        current_word, length = queue.popleft()
        # Try changing each character position
        for i in range(len(current_word)):
            # Try all 26 letters
            for c in 'abcdefghijklmnopqrstuvwxyz':
                # Build the new word with one character changed
                next_word = current_word[:i] + c + current_word[i+1:]
                # Found the target!
                if next_word == end_word:
                    return length + 1
                # Valid unvisited word? Add to queue
                if next_word in word_set and next_word not in visited:
                    visited.add(next_word)
                    queue.append((next_word, length + 1))
    # No path found
    return 0
explanation: |
**Time Complexity:** O(n * m * 26) where n is the word list size and m is word length.
**Space Complexity:** O(n * m) for the visited set and queue.
BFS explores words level by level, guaranteeing the first path found to `endWord` is the shortest. We optimise neighbour finding by generating all single-character variations rather than comparing against all words.
- approach_name: Bidirectional BFS
is_optimal: true
code: |
def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
    word_set = set(word_list)
    if end_word not in word_set:
        return 0
    # Search from both ends simultaneously
    front = {begin_word}
    back = {end_word}
    visited = set()
    length = 1
    while front and back:
        # Always expand the smaller frontier for efficiency
        if len(front) > len(back):
            front, back = back, front
        next_front = set()
        for word in front:
            for i in range(len(word)):
                for c in 'abcdefghijklmnopqrstuvwxyz':
                    next_word = word[:i] + c + word[i+1:]
                    # Frontiers meet! Path found
                    if next_word in back:
                        return length + 1
                    if next_word in word_set and next_word not in visited:
                        visited.add(next_word)
                        next_front.add(next_word)
        front = next_front
        length += 1
    return 0
explanation: |
**Time Complexity:** O(n * m * 26), but often faster in practice due to smaller search space.
**Space Complexity:** O(n * m) for the visited set and frontiers.
Bidirectional BFS searches from both `beginWord` and `endWord` simultaneously. When the two search frontiers meet, we've found the shortest path. This reduces the search space from O(b^d) to O(b^(d/2)) where b is branching factor and d is depth, providing significant speedup on large graphs.
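A rough worked example of that reduction, assuming every word has the same number of neighbours. The branching factor `b = 10` and depth `d = 6` are hypothetical values chosen for the arithmetic, not measurements.

```python
def nodes_one_direction(b: int, d: int) -> int:
    """Rough node count for a single search that reaches depth d."""
    return b ** d


def nodes_two_directions(b: int, d: int) -> int:
    """Rough node count for two searches that meet in the middle."""
    half = d // 2
    return b ** half + b ** (d - half)


b, d = 10, 6  # hypothetical branching factor and depth
print(nodes_one_direction(b, d))   # 1000000
print(nodes_two_directions(b, d))  # 2000
```

Real word graphs are far less regular, so the actual saving depends on the input.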
- approach_name: BFS with Wildcard Preprocessing
is_optimal: false
code: |
from collections import deque, defaultdict

def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
    if end_word not in word_list:
        return 0
    # Preprocess: group words by wildcard patterns
    # "hot" -> ["*ot", "h*t", "ho*"]
    word_len = len(begin_word)
    patterns = defaultdict(list)
    for word in word_list:
        for i in range(word_len):
            pattern = word[:i] + '*' + word[i+1:]
            patterns[pattern].append(word)
    # BFS using pattern lookup
    queue = deque([(begin_word, 1)])
    visited = {begin_word}
    while queue:
        current_word, length = queue.popleft()
        # Find neighbours through shared patterns
        for i in range(word_len):
            pattern = current_word[:i] + '*' + current_word[i+1:]
            for neighbour in patterns[pattern]:
                if neighbour == end_word:
                    return length + 1
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append((neighbour, length + 1))
    return 0
explanation: |
**Time Complexity:** O(n * m^2) for preprocessing plus O(n * m) for BFS.
**Space Complexity:** O(n * m^2) for the pattern dictionary.
This approach preprocesses words into "wildcard buckets" (e.g., `h*t` contains both `hot` and `hat`). Finding neighbours becomes a dictionary lookup, trading extra memory for faster neighbour finding. Best when the word list is dense (many words share patterns).
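For a concrete view of the buckets, this sketch builds them for the first example's word list. `build_patterns` is an illustrative helper that mirrors the preprocessing loop in the solution above.

```python
from collections import defaultdict


def build_patterns(word_list: list[str]) -> dict[str, list[str]]:
    """Group words by every pattern formed by replacing one letter with '*'."""
    patterns = defaultdict(list)
    for word in word_list:
        for i in range(len(word)):
            patterns[word[:i] + "*" + word[i + 1:]].append(word)
    return patterns


patterns = build_patterns(["hot", "dot", "dog", "lot", "log", "cog"])
print(patterns["*ot"])  # ['hot', 'dot', 'lot']
print(patterns["*og"])  # ['dog', 'log', 'cog']
print(patterns["do*"])  # ['dot', 'dog']
```

Every word in a bucket is one letter away from every other word in it, so the neighbours of `hot` are the other members of its three buckets.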