questions S-W

2025-05-30 19:18:33 +01:00
parent 68699f35ec
commit f7e491f1e8
46 changed files with 9696 additions and 0 deletions
--- a/backend/data/questions/word-ladder.yaml
+++ b/backend/data/questions/word-ladder.yaml
@@ -0,0 +1,273 @@
+title: Word Ladder
+slug: word-ladder
+difficulty: hard
+leetcode_id: 127
+leetcode_url: https://leetcode.com/problems/word-ladder/
+categories:
+  - strings
+  - graphs
+  - hash-tables
+patterns:
+  - bfs
+
+description: |
+  A **transformation sequence** from word `beginWord` to word `endWord` using a dictionary `wordList` is a sequence of words `beginWord -> s1 -> s2 -> ... -> sk` such that:
+
+  - Every adjacent pair of words differs by a single letter.
+  - Every `si` for `1 <= i <= k` is in `wordList`. Note that `beginWord` does not need to be in `wordList`.
+  - `sk == endWord`
+
+  Given two words, `beginWord` and `endWord`, and a dictionary `wordList`, return *the **number of words** in the **shortest transformation sequence** from `beginWord` to `endWord`, or `0` if no such sequence exists*.
+
+constraints: |
+  - `1 <= beginWord.length <= 10`
+  - `endWord.length == beginWord.length`
+  - `1 <= wordList.length <= 5000`
+  - `wordList[i].length == beginWord.length`
+  - `beginWord`, `endWord`, and `wordList[i]` consist of lowercase English letters
+  - `beginWord != endWord`
+  - All the words in `wordList` are **unique**
+
+examples:
+  - input: 'beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]'
+    output: "5"
+    explanation: 'One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> "cog", which is 5 words long.'
+  - input: 'beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]'
+    output: "0"
+    explanation: 'The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.'
+
+explanation:
+  intuition: |
+    Imagine each word as a node in a graph. Two nodes are connected by an edge if they differ by exactly one letter. The problem then becomes: **find the shortest path** from `beginWord` to `endWord` in this graph.
+
+    Why BFS? When searching for the shortest path in an unweighted graph (where every edge has the same "cost"), **Breadth-First Search** is the ideal algorithm. BFS explores all nodes at distance 1, then all nodes at distance 2, and so on. The first time we reach `endWord`, we are guaranteed to have found the shortest path.
+
+    Think of it like ripples spreading outward from a stone dropped in water. Starting from `beginWord`, we explore all words reachable by changing one letter. Then from each of those words, we explore their one-letter neighbours. The "ripple" that first touches `endWord` tells us the shortest transformation length.
+
+    The key insight is recognising this as a **graph shortest-path problem** disguised as a string manipulation problem. Once you see the graph structure, BFS becomes the natural choice.
+
+  approach: |
+    We solve this using **Breadth-First Search (BFS)** with a word set for O(1) lookups:
+
+    **Step 1: Handle early termination**
+
+    - Convert `wordList` to a set for O(1) average-case membership checks
+    - If `endWord` is not in that set, return `0` immediately since no valid transformation exists
+
+    &nbsp;
+
+    **Step 2: Initialise BFS data structures**
+
+    - `queue`: Contains tuples of `(current_word, transformation_length)`, starting with `(beginWord, 1)`
+    - `visited`: A set to track words we've already processed, preventing cycles
+
+    &nbsp;
+
+    **Step 3: Process the BFS queue**
+
+    - Dequeue the front word and its current transformation length
+    - If this word equals `endWord`, return the transformation length (shortest path found). An equivalent variant returns `length + 1` as soon as a generated neighbour equals `endWord`
+    - Otherwise, generate all possible one-letter transformations
+
+    &nbsp;
+
+    **Step 4: Generate neighbour words efficiently**
+
+    - For each position in the word, try replacing it with every letter from `a` to `z`
+    - If the new word exists in `wordList` and hasn't been visited:
+      - Mark it as visited
+      - Add it to the queue with `length + 1`
+
+    &nbsp;
+
+    **Step 5: Return result**
+
+    - If the queue empties without finding `endWord`, return `0`
+
+    &nbsp;
+
+    This approach guarantees we find the shortest path because BFS explores all words at distance `d` before any word at distance `d+1`.
+
+  common_pitfalls:
+    - title: Using DFS Instead of BFS
+      description: |
+        DFS will find *a* path but not necessarily the *shortest* path. DFS explores one branch deeply before backtracking, so it might find a longer transformation sequence first.
+
+        For example, on the first sample input DFS might return `hit -> hot -> dot -> lot -> log -> dog -> cog` (7 words) even though `hit -> hot -> dot -> dog -> cog` (5 words) exists. On larger inputs the gap can be much wider.
+
+        BFS guarantees shortest path in unweighted graphs because it explores level by level.
+      wrong_approach: "Use DFS with path tracking"
+      correct_approach: "Use BFS to guarantee shortest path"
+
+    - title: Comparing Every Word Pair (O(n^2) Neighbour Check)
+      description: |
+        A naive approach compares every word against every other word to find neighbours differing by one letter. With `n` words of length `m`, this is O(n^2 * m) just for building the graph.
+
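+        Instead, for each word, generate all possible one-letter variations and check if they exist in the word set. This produces O(n * m * 26) candidate words. Each candidate costs O(m) to build and hash, giving O(n * m^2 * 26) character operations in total, which is still much faster than O(n^2 * m) when `n` is large relative to `m`.
+
+        With `wordList.length <= 5000` and word length up to 10, the optimised approach generates ~1.3M candidate words (about 13M character operations) vs up to 250M character comparisons for the naive approach.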
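+
+        A minimal sketch of the two strategies on the first sample word list. The helper names `differs_by_one`, `neighbours_naive` and `neighbours_generated` are illustrative assumptions, not part of the solutions in this file:
+
+        ```python
+        from string import ascii_lowercase
+
+        def differs_by_one(a: str, b: str) -> bool:
+            # Pairwise check: equal-length words that differ in exactly one position
+            return sum(x != y for x, y in zip(a, b)) == 1
+
+        def neighbours_naive(word: str, word_list: list[str]) -> list[str]:
+            # Scans the whole list for a single word: O(n * m) per call
+            return sorted(w for w in word_list if differs_by_one(word, w))
+
+        def neighbours_generated(word: str, word_set: set[str]) -> list[str]:
+            # Builds m * 26 candidates and keeps those present in the set
+            found = set()
+            for i in range(len(word)):
+                for c in ascii_lowercase:
+                    candidate = word[:i] + c + word[i + 1:]
+                    if candidate != word and candidate in word_set:
+                        found.add(candidate)
+            return sorted(found)
+
+        words = ["hot", "dot", "dog", "lot", "log", "cog"]
+        print(neighbours_naive("hot", words))
+        print(neighbours_generated("hot", set(words)))
+        ```
+
+        Both calls return the same neighbours. Only the amount of work per word differs, and the gap grows with the size of the word list.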
+      wrong_approach: "Compare every pair of words"
+      correct_approach: "Generate variations and check set membership"
+
+    - title: Forgetting to Check if endWord Exists
+      description: |
+        If `endWord` is not in `wordList`, no valid transformation can exist. Failing to check this upfront means BFS runs to exhaustion before returning `0`.
+
+        Always validate inputs first: `if endWord not in word_set: return 0`.
+
+    - title: Not Marking Words as Visited
+      description: |
+        Without tracking visited words, BFS can revisit the same word multiple times from different paths, leading to:
+        - Infinite loops in graphs with cycles
+        - Exponential time complexity as the same subgraphs are explored repeatedly
+
+        Mark words as visited **when adding to the queue**, not when dequeuing. This prevents adding duplicates to the queue.
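+
+        A small sketch of why the timing matters, assuming a hand-built adjacency dictionary for the sample words. The function `count_enqueues` and its `mark_on_enqueue` flag are hypothetical names used only for this illustration:
+
+        ```python
+        from collections import deque
+
+        def count_enqueues(graph: dict[str, list[str]], start: str, mark_on_enqueue: bool) -> int:
+            # Run BFS and count how many items ever enter the queue
+            queue = deque([start])
+            visited = {start} if mark_on_enqueue else set()
+            enqueued = 1
+            while queue:
+                node = queue.popleft()
+                if not mark_on_enqueue:
+                    if node in visited:
+                        continue  # a duplicate that slipped into the queue
+                    visited.add(node)
+                for nxt in graph[node]:
+                    if nxt not in visited:
+                        if mark_on_enqueue:
+                            visited.add(nxt)
+                        queue.append(nxt)
+                        enqueued += 1
+            return enqueued
+
+        graph = {
+            "hot": ["dot", "lot"],
+            "dot": ["hot", "lot", "dog"],
+            "lot": ["hot", "dot", "log"],
+            "dog": ["dot", "log", "cog"],
+            "log": ["lot", "dog", "cog"],
+            "cog": ["dog", "log"],
+        }
+        print(count_enqueues(graph, "hot", mark_on_enqueue=True))
+        print(count_enqueues(graph, "hot", mark_on_enqueue=False))
+        ```
+
+        Marking on enqueue puts each word in the queue exactly once. Marking on dequeue still terminates here, but words reachable from two parents on the same level are queued more than once.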
+      wrong_approach: "Process words without tracking visited"
+      correct_approach: "Mark visited when enqueuing to prevent duplicates"
+
+  key_takeaways:
+    - "**Graph recognition**: Many string transformation problems are graph shortest-path problems in disguise. When you see 'minimum steps' or 'shortest sequence', think BFS"
+    - "**BFS for shortest path**: In unweighted graphs, BFS guarantees the shortest path. This is fundamental and appears in many problems"
+    - "**Optimise neighbour generation**: Instead of comparing all pairs, generate possible variations and check set membership. This replaces O(n^2) pairwise comparisons with O(n * m * alphabet_size) set lookups"
+    - "**Foundation for Word Ladder II**: Word Ladder II (LeetCode 126) asks for all shortest paths, requiring you to track parent pointers during BFS"
+
+  time_complexity: "O(n * m^2 * 26) where `n` is the number of words and `m` is the word length. For each word, we generate `m * 26` variations, and building and hashing each variation costs O(m)."
+  space_complexity: "O(n * m). The visited set and queue can each hold up to `n` words of length `m`."
+
+solutions:
+  - approach_name: BFS with Set Lookup
+    is_optimal: true
+    code: |
+      from collections import deque
+
+      def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
+          # Convert to set for O(1) lookups
+          word_set = set(word_list)
+
+          # Early termination: end_word must be in the word list
+          if end_word not in word_set:
+              return 0
+
+          # BFS setup: (current_word, transformation_count)
+          queue = deque([(begin_word, 1)])
+          visited = {begin_word}
+
+          while queue:
+              current_word, length = queue.popleft()
+
+              # Try changing each character position
+              for i in range(len(current_word)):
+                  # Try all 26 letters
+                  for c in 'abcdefghijklmnopqrstuvwxyz':
+                      # Build the new word with one character changed
+                      next_word = current_word[:i] + c + current_word[i+1:]
+
+                      # Found the target!
+                      if next_word == end_word:
+                          return length + 1
+
+                      # Valid unvisited word? Add to queue
+                      if next_word in word_set and next_word not in visited:
+                          visited.add(next_word)
+                          queue.append((next_word, length + 1))
+
+          # No path found
+          return 0
+    explanation: |
+      **Time Complexity:** O(n * m^2 * 26) where n is the word list size and m is word length (each of the `m * 26` candidates per word costs O(m) to build and hash).
+
+      **Space Complexity:** O(n * m) for the visited set and queue.
+
+      BFS explores words level by level, guaranteeing the first path found to `endWord` is the shortest. We optimise neighbour finding by generating all single-character variations rather than comparing against all words.
+
+  - approach_name: Bidirectional BFS
+    is_optimal: true
+    code: |
+      def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
+          word_set = set(word_list)
+
+          if end_word not in word_set:
+              return 0
+
+          # Search from both ends simultaneously
+          front = {begin_word}
+          back = {end_word}
+          # Seed visited with both endpoints so neither is re-added to a frontier
+          visited = {begin_word, end_word}
+          length = 1
+
+          while front and back:
+              # Always expand the smaller frontier for efficiency
+              if len(front) > len(back):
+                  front, back = back, front
+
+              next_front = set()
+
+              for word in front:
+                  for i in range(len(word)):
+                      for c in 'abcdefghijklmnopqrstuvwxyz':
+                          next_word = word[:i] + c + word[i+1:]
+
+                          # Frontiers meet! Path found
+                          if next_word in back:
+                              return length + 1
+
+                          if next_word in word_set and next_word not in visited:
+                              visited.add(next_word)
+                              next_front.add(next_word)
+
+              front = next_front
+              length += 1
+
+          return 0
+    explanation: |
+      **Time Complexity:** O(n * m^2 * 26) in the worst case, but often faster in practice due to smaller search space.
+
+      **Space Complexity:** O(n * m) for the visited set and frontiers.
+
+      Bidirectional BFS searches from both `beginWord` and `endWord` simultaneously. When the two search frontiers meet, we've found the shortest path. This reduces the search space from O(b^d) to O(b^(d/2)) where b is branching factor and d is depth, providing significant speedup on large graphs.
+
+  - approach_name: BFS with Wildcard Preprocessing
+    is_optimal: false
+    code: |
+      from collections import deque, defaultdict
+
+      def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
+          if end_word not in word_list:
+              return 0
+
+          # Preprocess: group words by wildcard patterns
+          # "hot" -> ["*ot", "h*t", "ho*"]
+          word_len = len(begin_word)
+          patterns = defaultdict(list)
+
+          for word in word_list:
+              for i in range(word_len):
+                  pattern = word[:i] + '*' + word[i+1:]
+                  patterns[pattern].append(word)
+
+          # BFS using pattern lookup
+          queue = deque([(begin_word, 1)])
+          visited = {begin_word}
+
+          while queue:
+              current_word, length = queue.popleft()
+
+              # Find neighbours through shared patterns
+              for i in range(word_len):
+                  pattern = current_word[:i] + '*' + current_word[i+1:]
+
+                  for neighbour in patterns[pattern]:
+                      if neighbour == end_word:
+                          return length + 1
+
+                      if neighbour not in visited:
+                          visited.add(neighbour)
+                          queue.append((neighbour, length + 1))
+
+          return 0
+    explanation: |
+      **Time Complexity:** O(n * m^2) for preprocessing plus O(n * m^2) for BFS: each word builds `m` patterns at O(m) apiece, and each bucket holds at most 26 words.
+
+      **Space Complexity:** O(n * m^2) for the pattern dictionary.
+
+      This approach preprocesses words into "wildcard buckets" (e.g., `h*t` contains both `hot` and `hat`). Finding neighbours becomes a dictionary lookup, trading extra memory for faster neighbour finding. Best when the word list is dense (many words share patterns).
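+
+      To make the buckets concrete, here is a minimal sketch that builds them for the first sample word list. The helper name `build_patterns` is an assumption for illustration and mirrors the preprocessing loop in the code above:
+
+      ```python
+      from collections import defaultdict
+
+      def build_patterns(word_list: list[str]) -> dict[str, list[str]]:
+          # Group words by wildcard pattern: "hot" -> "*ot", "h*t", "ho*"
+          patterns = defaultdict(list)
+          for word in word_list:
+              for i in range(len(word)):
+                  patterns[word[:i] + "*" + word[i + 1:]].append(word)
+          return patterns
+
+      patterns = build_patterns(["hot", "dot", "dog", "lot", "log", "cog"])
+      print(patterns["*ot"])
+      print(patterns["*og"])
+      print(patterns["h*t"])
+      ```
+
+      Words that share a bucket are one-letter neighbours of each other, so `*ot` links `hot`, `dot` and `lot` without comparing any pair directly.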