title: Word Ladder
slug: word-ladder
difficulty: hard
leetcode_id: 127
leetcode_url: https://leetcode.com/problems/word-ladder/
categories:
  - strings
  - graphs
  - hash-tables
patterns:
  - bfs

description: |
  A **transformation sequence** from word `beginWord` to word `endWord` using a dictionary `wordList` is a sequence of words `beginWord -> s1 -> s2 -> ... -> sk` such that:

  - Every adjacent pair of words differs by a single letter.
  - Every `si` for `1 <= i <= k` is in `wordList`. Note that `beginWord` does not need to be in `wordList`.
  - `sk == endWord`

  Given two words, `beginWord` and `endWord`, and a dictionary `wordList`, return *the **number of words** in the **shortest transformation sequence** from `beginWord` to `endWord`, or `0` if no such sequence exists*.

constraints: |
  - `1 <= beginWord.length <= 10`
  - `endWord.length == beginWord.length`
  - `1 <= wordList.length <= 5000`
  - `wordList[i].length == beginWord.length`
  - `beginWord`, `endWord`, and `wordList[i]` consist of lowercase English letters
  - `beginWord != endWord`
  - All the words in `wordList` are **unique**

examples:
  - input: 'beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]'
    output: "5"
    explanation: 'One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> "cog", which is 5 words long.'
  - input: 'beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]'
    output: "0"
    explanation: 'The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.'

explanation:
  intuition: |
    Imagine each word as a node in a graph. Two nodes are connected by an edge if they differ by exactly one letter. The problem then becomes: **find the shortest path** from `beginWord` to `endWord` in this graph.
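
    To make the edge rule concrete, here is a small sketch. `differs_by_one` and `neighbours_in` are hypothetical helper names, not part of any library. The code lists the neighbours of `hot` among the words of Example 1 (plus `beginWord`) by checking every candidate directly, which is fine for a handful of words.

    ```python
    def differs_by_one(a: str, b: str) -> bool:
        """True if equal-length words a and b differ in exactly one position."""
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

    def neighbours_in(word: str, words: list[str]) -> list[str]:
        """Return the words joined to `word` by an edge in the word graph."""
        return [w for w in words if differs_by_one(word, w)]

    words = ["hit", "hot", "dot", "dog", "lot", "log", "cog"]
    print(neighbours_in("hot", words))  # ['hit', 'dot', 'lot']
    ```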

    Why BFS? When searching for the shortest path in an unweighted graph (where every edge has the same "cost"), **Breadth-First Search** is the ideal algorithm. BFS explores all nodes at distance 1, then all nodes at distance 2, and so on. The first time we reach `endWord`, we are guaranteed to have found the shortest path.

    Think of it like ripples spreading outward from a stone dropped in water. Starting from `beginWord`, we explore all words reachable by changing one letter. Then from each of those words, we explore their one-letter neighbours. The "ripple" that first touches `endWord` tells us the shortest transformation length.
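
    The ripple picture can be made literal. The sketch below (the function name `ripple_levels` is hypothetical) groups the words reachable from `beginWord` by their BFS distance. On Example 1, `cog` first appears at index 4, which corresponds to a 5-word sequence.

    ```python
    def ripple_levels(begin_word: str, word_list: list[str]) -> list[list[str]]:
        """Group reachable words by BFS distance from begin_word (one list per ripple)."""
        word_set = set(word_list)
        seen = {begin_word}
        frontier = [begin_word]
        levels = []
        while frontier:
            levels.append(sorted(frontier))  # sorted only to make the output stable
            next_frontier = []
            for word in frontier:
                for i in range(len(word)):
                    for c in "abcdefghijklmnopqrstuvwxyz":
                        candidate = word[:i] + c + word[i + 1:]
                        if candidate in word_set and candidate not in seen:
                            seen.add(candidate)
                            next_frontier.append(candidate)
            frontier = next_frontier
        return levels

    levels = ripple_levels("hit", ["hot", "dot", "dog", "lot", "log", "cog"])
    print(levels)  # [['hit'], ['hot'], ['dot', 'lot'], ['dog', 'log'], ['cog']]
    ```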

    The key insight is recognising this as a **graph shortest-path problem** disguised as a string manipulation problem. Once you see the graph structure, BFS becomes the natural choice.

  approach: |
    We solve this using **Breadth-First Search (BFS)** with a word set for O(1) lookups:

    **Step 1: Handle early termination**

    - If `endWord` is not in `wordList`, return `0` immediately since no valid transformation exists
    - Convert `wordList` to a set for O(1) membership checks

    &nbsp;

    **Step 2: Initialise BFS data structures**

    - `queue`: Contains tuples of `(current_word, transformation_length)`, starting with `(beginWord, 1)`
    - `visited`: A set to track words we've already processed, preventing cycles

    &nbsp;

    **Step 3: Process the BFS queue**

    - Dequeue the front word and its current transformation length
    - If this word equals `endWord`, return the transformation length (shortest path found)
    - Otherwise, generate all possible one-letter transformations

    &nbsp;

    **Step 4: Generate neighbour words efficiently**

    - For each position in the word, try replacing it with every letter from `a` to `z`
    - If the new word exists in `wordList` and hasn't been visited:
      - Mark it as visited
      - Add it to the queue with `length + 1`
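
    Step 4 can be isolated into a small generator. This is a sketch that assumes the word set from Step 1 and the `visited` set from Step 2; the name `unvisited_neighbours` is hypothetical. It skips the replacement that keeps the original letter, since that just reproduces the current word. The caller remains responsible for marking yielded words as visited and enqueueing them.

    ```python
    from string import ascii_lowercase
    from typing import Iterator

    def unvisited_neighbours(word: str, word_set: set[str], visited: set[str]) -> Iterator[str]:
        """Yield dictionary words one letter away from `word` that are not yet visited."""
        for i in range(len(word)):
            for c in ascii_lowercase:
                if c == word[i]:
                    continue  # same letter: not a transformation
                candidate = word[:i] + c + word[i + 1:]
                if candidate in word_set and candidate not in visited:
                    yield candidate

    word_set = {"hot", "dot", "dog", "lot", "log", "cog"}
    print(sorted(unvisited_neighbours("hot", word_set, {"hit", "hot"})))  # ['dot', 'lot']
    ```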

    &nbsp;

    **Step 5: Return result**

    - If the queue empties without finding `endWord`, return `0`

    &nbsp;

    This approach guarantees we find the shortest path because BFS explores all words at distance `d` before any word at distance `d+1`.
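
    Putting Steps 1 to 5 together gives the sketch below (the function name `ladder_length_dequeue_check` is hypothetical). It checks for `endWord` when a word is dequeued, exactly as Step 3 describes. The BFS with Set Lookup solution instead tests each neighbour as it is generated, which lets it stop without enqueueing the final word; both return the same length.

    ```python
    from collections import deque

    def ladder_length_dequeue_check(begin_word: str, end_word: str, word_list: list[str]) -> int:
        # Step 1: set for fast membership; no path if end_word is not a dictionary word
        word_set = set(word_list)
        if end_word not in word_set:
            return 0

        # Step 2: queue of (word, transformation length) plus a visited set
        queue = deque([(begin_word, 1)])
        visited = {begin_word}

        # Step 3: process words in BFS order
        while queue:
            word, length = queue.popleft()
            if word == end_word:
                return length  # first arrival at end_word is the shortest path

            # Step 4: enqueue unvisited one-letter neighbours, marking them on enqueue
            for i in range(len(word)):
                for c in "abcdefghijklmnopqrstuvwxyz":
                    candidate = word[:i] + c + word[i + 1:]
                    if candidate in word_set and candidate not in visited:
                        visited.add(candidate)
                        queue.append((candidate, length + 1))

        # Step 5: queue exhausted without reaching end_word
        return 0
    ```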

  common_pitfalls:
    - title: Using DFS Instead of BFS
      description: |
        DFS will find *a* path but not necessarily the *shortest* path. DFS explores one branch deeply before backtracking, so it might find a longer transformation sequence first.

        For example, on Example 1, a DFS that tries positions left to right and letters `a` to `z` first reaches `hit -> hot -> dot -> lot -> log -> cog` (6 words) and returns that, never reporting the 5-word `hit -> hot -> dot -> dog -> cog`. On larger inputs, the first path DFS finds can be far longer than the shortest one.

        BFS guarantees the shortest path in unweighted graphs because it explores level by level.
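
        The sketch below reproduces this on Example 1 with a plain recursive DFS that stops at the first path it finds (the name `dfs_first_path_length` is hypothetical). It returns 6, while BFS returns the true minimum of 5. It is an illustration only: on a 5000-word list, the recursion could exceed Python's default recursion limit.

        ```python
        def dfs_first_path_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
            """Length (in words) of the FIRST path DFS finds, which need not be the shortest."""
            word_set = set(word_list)
            visited = {begin_word}

            def dfs(word: str, length: int) -> int:
                if word == end_word:
                    return length
                for i in range(len(word)):
                    for c in "abcdefghijklmnopqrstuvwxyz":
                        candidate = word[:i] + c + word[i + 1:]
                        if candidate in word_set and candidate not in visited:
                            visited.add(candidate)
                            found = dfs(candidate, length + 1)
                            if found:
                                return found  # stop at the first path, shortest or not
                return 0

            return dfs(begin_word, 1)

        print(dfs_first_path_length("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))  # 6
        ```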
      wrong_approach: "Use DFS with path tracking"
      correct_approach: "Use BFS to guarantee shortest path"

    - title: Comparing Every Word Pair (O(n^2) Neighbour Check)
      description: |
        A naive approach compares every word against every other word to find neighbours differing by one letter. With `n` words of length `m`, this is O(n^2 * m) just for building the graph.

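        Instead, for each word, generate all possible one-letter variations and check if they exist in the word set. That is `n * m * 26` candidate words, each costing O(m) to build and hash, so O(n * m^2 * 26) overall. This is much faster whenever `n` is large compared with `26 * m`.

        With `wordList.length <= 5000` and word length up to 10, the generation approach builds about 1.3M candidate words (roughly 13M character operations), versus up to 250M character comparisons for the naive `n^2 * m` approach.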
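
        The sketch below contrasts the two ways of building the adjacency lists (`graph_pairwise` and `graph_by_generation` are hypothetical names). Both produce the same graph; they differ only in how many candidates they examine.

        ```python
        from itertools import combinations
        from string import ascii_lowercase

        def differs_by_one(a: str, b: str) -> bool:
            return sum(x != y for x, y in zip(a, b)) == 1

        def graph_pairwise(words: list[str]) -> dict[str, set[str]]:
            """Naive: compare every pair of words, O(n^2 * m)."""
            graph = {w: set() for w in words}
            for a, b in combinations(words, 2):
                if differs_by_one(a, b):
                    graph[a].add(b)
                    graph[b].add(a)
            return graph

        def graph_by_generation(words: list[str]) -> dict[str, set[str]]:
            """Generate every one-letter variation and keep those in the set, O(n * m^2 * 26)."""
            word_set = set(words)
            graph = {w: set() for w in words}
            for w in words:
                for i in range(len(w)):
                    for c in ascii_lowercase:
                        if c != w[i]:
                            candidate = w[:i] + c + w[i + 1:]
                            if candidate in word_set:
                                graph[w].add(candidate)
            return graph

        words = ["hit", "hot", "dot", "dog", "lot", "log", "cog"]
        assert graph_pairwise(words) == graph_by_generation(words)
        print(sorted(graph_by_generation(words)["hot"]))  # ['dot', 'hit', 'lot']
        ```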
      wrong_approach: "Compare every pair of words"
      correct_approach: "Generate variations and check set membership"

    - title: Forgetting to Check if endWord Exists
      description: |
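        If `endWord` is not in `wordList`, no valid transformation can exist. Skipping this check means BFS runs to exhaustion before returning `0`. Worse, an implementation that compares each generated word with `endWord` before testing set membership (as the BFS with Set Lookup solution does) returns a non-zero length even though `endWord` is not a valid word.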

        Always validate inputs first: `if endWord not in word_set: return 0`.
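
        The sketch below (hypothetical name `ladder_length_without_check`) is the BFS with Set Lookup solution with the upfront check deliberately removed. Because it compares each generated word with `end_word` before testing membership in `word_set`, it reaches `cog` from `dog` on Example 2 and returns 5 instead of the correct 0.

        ```python
        from collections import deque

        def ladder_length_without_check(begin_word: str, end_word: str, word_list: list[str]) -> int:
            # DELIBERATELY BUGGY: the `end_word not in word_set` check has been removed
            word_set = set(word_list)
            queue = deque([(begin_word, 1)])
            visited = {begin_word}
            while queue:
                current_word, length = queue.popleft()
                for i in range(len(current_word)):
                    for c in "abcdefghijklmnopqrstuvwxyz":
                        next_word = current_word[:i] + c + current_word[i + 1:]
                        if next_word == end_word:  # fires even though end_word is not in word_set
                            return length + 1
                        if next_word in word_set and next_word not in visited:
                            visited.add(next_word)
                            queue.append((next_word, length + 1))
            return 0

        print(ladder_length_without_check("hit", "cog", ["hot", "dot", "dog", "lot", "log"]))  # 5 (correct answer: 0)
        ```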

    - title: Not Marking Words as Visited
      description: |
        Without tracking visited words, BFS can revisit the same word multiple times from different paths, leading to:
        - Infinite loops in graphs with cycles
        - Exponential time complexity as the same subgraphs are explored repeatedly

        Mark words as visited **when adding to the queue**, not when dequeuing. This prevents adding duplicates to the queue.
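
        The sketch below (hypothetical name `count_enqueues`) counts how many times words are pushed onto the queue under each policy. Both variants terminate because the visited set is still consulted. Marking at dequeue time, however, lets the same word be enqueued several times before it is first processed. On Example 1, this gives 7 pushes when marking on enqueue and 10 when marking on dequeue.

        ```python
        from collections import deque

        def count_enqueues(begin_word: str, word_list: list[str], mark_on_enqueue: bool) -> int:
            """Count queue pushes when marking visited on enqueue (True) or on dequeue (False)."""
            word_set = set(word_list)
            queue = deque([begin_word])
            visited = {begin_word} if mark_on_enqueue else set()
            pushes = 1  # begin_word itself
            while queue:
                word = queue.popleft()
                if not mark_on_enqueue:
                    if word in visited:
                        continue  # duplicate copy of a word that was already processed
                    visited.add(word)
                for i in range(len(word)):
                    for c in "abcdefghijklmnopqrstuvwxyz":
                        candidate = word[:i] + c + word[i + 1:]
                        if candidate in word_set and candidate not in visited:
                            if mark_on_enqueue:
                                visited.add(candidate)
                            queue.append(candidate)
                            pushes += 1
            return pushes

        words = ["hot", "dot", "dog", "lot", "log", "cog"]
        print(count_enqueues("hit", words, mark_on_enqueue=True))   # 7
        print(count_enqueues("hit", words, mark_on_enqueue=False))  # 10
        ```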
      wrong_approach: "Process words without tracking visited"
      correct_approach: "Mark visited when enqueuing to prevent duplicates"

  key_takeaways:
    - "**Graph recognition**: Many string transformation problems are graph shortest-path problems in disguise. When you see 'minimum steps' or 'shortest sequence', think BFS"
    - "**BFS for shortest path**: In unweighted graphs, BFS guarantees the shortest path. This is fundamental and appears in many problems"
    - "**Optimise neighbour generation**: Instead of comparing all pairs, generate possible variations and check set membership. This replaces O(n^2 * m) pairwise comparisons with O(n * m * alphabet_size) candidate lookups"
    - "**Foundation for Word Ladder II**: Word Ladder II (LeetCode 126) asks for all shortest transformation sequences, which requires tracking parent pointers during BFS"

  time_complexity: "O(n * m^2 * 26) where `n` is the number of words and `m` is the word length. For each word, we generate `m * 26` variations, and building and hashing each variation costs O(m)."
  space_complexity: "O(n * m). The visited set and queue can each hold up to `n` words of length `m`."

solutions:
  - approach_name: BFS with Set Lookup
    is_optimal: true
    code: |
      from collections import deque

      def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
          # Convert to set for O(1) lookups
          word_set = set(word_list)

          # Early termination: end_word must be reachable
          if end_word not in word_set:
              return 0

          # BFS setup: (current_word, transformation_count)
          queue = deque([(begin_word, 1)])
          visited = {begin_word}

          while queue:
              current_word, length = queue.popleft()

              # Try changing each character position
              for i in range(len(current_word)):
                  # Try all 26 letters
                  for c in 'abcdefghijklmnopqrstuvwxyz':
                      # Build the new word with one character changed
                      next_word = current_word[:i] + c + current_word[i+1:]

                      # Found the target!
                      if next_word == end_word:
                          return length + 1

                      # Valid unvisited word? Add to queue
                      if next_word in word_set and next_word not in visited:
                          visited.add(next_word)
                          queue.append((next_word, length + 1))

          # No path found
          return 0
    explanation: |
      **Time Complexity:** O(n * m^2 * 26) where n is the word list size and m is word length: each of the `m * 26` variations per word costs O(m) to build and hash.

      **Space Complexity:** O(n * m) for the visited set and queue.

      BFS explores words level by level, guaranteeing the first path found to `endWord` is the shortest. We optimise neighbour finding by generating all single-character variations rather than comparing against all words.

  - approach_name: Bidirectional BFS
    is_optimal: true
    code: |
      def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
          word_set = set(word_list)

          if end_word not in word_set:
              return 0

          # Search from both ends simultaneously
          front = {begin_word}
          back = {end_word}
          visited = set()
          length = 1

          while front and back:
              # Always expand the smaller frontier for efficiency
              if len(front) > len(back):
                  front, back = back, front

              next_front = set()

              for word in front:
                  for i in range(len(word)):
                      for c in 'abcdefghijklmnopqrstuvwxyz':
                          next_word = word[:i] + c + word[i+1:]

                          # Frontiers meet! Path found
                          if next_word in back:
                              return length + 1

                          if next_word in word_set and next_word not in visited:
                              visited.add(next_word)
                              next_front.add(next_word)

              front = next_front
              length += 1

          return 0
    explanation: |
      **Time Complexity:** O(n * m^2 * 26) in the worst case, but often much faster in practice because the two frontiers stay small.

      **Space Complexity:** O(n * m) for the visited set and frontiers.

      Bidirectional BFS searches from both `beginWord` and `endWord` simultaneously, always expanding the smaller frontier. When a word generated from one frontier is found in the other, the two searches have met and the shortest path is found. This reduces the number of explored words from roughly O(b^d) to O(b^(d/2)), where b is the branching factor and d is the path length, which gives a significant speedup on large graphs.

  - approach_name: BFS with Wildcard Preprocessing
    is_optimal: false
    code: |
      from collections import deque, defaultdict

      def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
          if end_word not in word_list:
              return 0

          # Preprocess: group words by wildcard patterns
          # "hot" -> ["*ot", "h*t", "ho*"]
          word_len = len(begin_word)
          patterns = defaultdict(list)

          for word in word_list:
              for i in range(word_len):
                  pattern = word[:i] + '*' + word[i+1:]
                  patterns[pattern].append(word)

          # BFS using pattern lookup
          queue = deque([(begin_word, 1)])
          visited = {begin_word}

          while queue:
              current_word, length = queue.popleft()

              # Find neighbours through shared patterns
              for i in range(word_len):
                  pattern = current_word[:i] + '*' + current_word[i+1:]

                  for neighbour in patterns[pattern]:
                      if neighbour == end_word:
                          return length + 1

                      if neighbour not in visited:
                          visited.add(neighbour)
                          queue.append((neighbour, length + 1))

          return 0
    explanation: |
      **Time Complexity:** O(n * m^2) overall: preprocessing builds `m` patterns of length `m` for each word, and the BFS does the same for each dequeued word.

      **Space Complexity:** O(n * m^2) for the pattern dictionary.

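      This approach preprocesses words into "wildcard buckets" (e.g., `h*t` would contain both `hot` and `hat`), so finding neighbours becomes a dictionary lookup per position. It trades extra memory (O(n * m^2) for the buckets) for cheaper neighbour finding. Instead of probing all 26 letters at each position, it only visits words that actually share a pattern, which helps most when few of those substitutions are real words.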
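
      For Example 1's word list, the buckets look like this. The sketch (hypothetical name `build_buckets`) mirrors the preprocessing loop of the solution above.

      ```python
      from collections import defaultdict

      def build_buckets(word_list: list[str]) -> dict[str, list[str]]:
          """Map each wildcard pattern to the words that match it, in input order."""
          buckets = defaultdict(list)
          for word in word_list:
              for i in range(len(word)):
                  buckets[word[:i] + "*" + word[i + 1:]].append(word)
          return dict(buckets)

      buckets = build_buckets(["hot", "dot", "dog", "lot", "log", "cog"])
      print(buckets["*ot"])  # ['hot', 'dot', 'lot']
      print(buckets["*og"])  # ['dog', 'log', 'cog']
      print(buckets["do*"])  # ['dot', 'dog']
      ```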