# codetutor/backend/data/questions/word-break.yaml

title: Word Break
slug: word-break
difficulty: medium
leetcode_id: 139
leetcode_url: https://leetcode.com/problems/word-break/
categories:
  - dynamic-programming
  - strings
  - hash-tables
patterns:
  - dynamic-programming

description: |
  Given a string `s` and a dictionary of strings `wordDict`, return `true` if `s` can be segmented into a space-separated sequence of one or more dictionary words.

  **Note** that the same word in the dictionary may be reused multiple times in the segmentation.

constraints: |
  - `1 <= s.length <= 300`
  - `1 <= wordDict.length <= 1000`
  - `1 <= wordDict[i].length <= 20`
  - `s` and `wordDict[i]` consist of only lowercase English letters
  - All the strings of `wordDict` are **unique**

examples:
  - input: 's = "leetcode", wordDict = ["leet","code"]'
    output: "true"
    explanation: 'Return true because "leetcode" can be segmented as "leet code".'
  - input: 's = "applepenapple", wordDict = ["apple","pen"]'
    output: "true"
    explanation: 'Return true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word.'
  - input: 's = "catsandog", wordDict = ["cats","dog","sand","and","cat"]'
    output: "false"
    explanation: "Cannot segment the string using only words from the dictionary."

explanation:
  intuition: |
    Imagine you're reading a string with all the spaces removed, like "ilovecoding", and you need to figure out if it can be split back into valid words using a given dictionary.

    Think of it like this: you're walking through the string character by character, and at each position you ask: "Is there any dictionary word that ends right here, AND was the position just before that word the end of a valid segmentation?"

    For "leetcode" with dictionary ["leet", "code"]:
    - At position 4, we find "leet" — and position 0 (the start) is a valid starting point
    - At position 8, we find "code" — and position 4 was already marked as valid
    - Therefore, the entire string can be segmented

    This is the **optimal substructure** that makes dynamic programming work: if we know which positions in the string can be reached by valid segmentations, we can determine if new positions are reachable by checking if any dictionary word "bridges" from a known-valid position.
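
    The "bridging" idea can be sketched directly. This is a minimal illustration rather than the solution shown later, and `reachable_positions` is a hypothetical helper name introduced only for this example:

    ```python
    def reachable_positions(s: str, word_dict: list[str]) -> list[int]:
        """Return every index that some sequence of dictionary words can reach."""
        reachable = {0}  # the start of the string is always a valid position
        for pos in range(len(s)):
            if pos not in reachable:
                continue  # no valid segmentation ends here, so nothing can bridge from it
            for word in word_dict:
                # A word that matches at `pos` bridges to the position just after it
                if s.startswith(word, pos):
                    reachable.add(pos + len(word))
        return sorted(reachable)


    print(reachable_positions("leetcode", ["leet", "code"]))  # [0, 4, 8]
    ```

    The string can be segmented exactly when `len(s)` appears in the result.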

  approach: |
    We solve this using **Bottom-Up Dynamic Programming**:

    **Step 1: Set up for efficient lookups**

    - Convert `wordDict` to a set for average O(1) membership checks
    - Create a `dp` array of size `n + 1` where `dp[i]` = "can the first `i` characters be segmented?"
    - Set `dp[0] = True` as the base case: an empty string is trivially "segmented"

    &nbsp;

    **Step 2: Build up solutions for each position**

    - For each position `i` from 1 to n (where we're checking if `s[:i]` can be segmented):
      - Try each possible starting position `j` from 0 to `i-1`
      - If `dp[j]` is True (meaning `s[:j]` can be segmented), check if `s[j:i]` is in the dictionary
      - If both conditions hold, set `dp[i] = True` and break (no need to check further)

    &nbsp;

    **Step 3: Return the answer**

    - Return `dp[n]` — whether the entire string can be segmented

    &nbsp;

    This approach fills the `dp` array once, from left to right, so each subproblem is solved exactly once and later entries reuse earlier results.
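
    To make the three steps concrete, the sketch below returns the whole `dp` array instead of only `dp[n]`. The helper name `dp_table` is an assumption made for illustration:

    ```python
    def dp_table(s: str, word_dict: list[str]) -> list[bool]:
        word_set = set(word_dict)  # Step 1: set for fast lookups
        n = len(s)
        dp = [False] * (n + 1)
        dp[0] = True  # Step 1: the empty prefix is trivially segmented

        # Step 2: dp[i] holds if a valid prefix s[:j] is followed by a word s[j:i]
        for i in range(1, n + 1):
            dp[i] = any(dp[j] and s[j:i] in word_set for j in range(i))

        return dp  # Step 3: the answer is the last entry, dp[n]


    print(dp_table("leetcode", ["leet", "code"]))
    # [True, False, False, False, True, False, False, False, True]
    ```

    Only indices 0, 4 and 8 are `True`, matching the walk-through of "leetcode" in the intuition.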

  common_pitfalls:
    - title: Exponential Backtracking
      description: |
        A naive recursive approach without memoisation leads to exponential time complexity. Consider the string "aaaaaaaaab" with dictionary ["a", "aa", "aaa", ...].

        At each position, you branch into multiple recursive calls. Without caching, the same subproblems are recomputed many times. With `n = 300`, this will almost certainly exceed the time limit (**TLE**).
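
        A rough way to see the blow-up is to count recursive calls with and without a cache. This is a sketch under stated assumptions: `count_calls` is a hypothetical helper, and the dictionary is cut down to three words for brevity.

        ```python
        def count_calls(s: str, word_dict: list[str], use_memo: bool) -> tuple[bool, int]:
            """Return (answer, number of recursive calls made)."""
            word_set = set(word_dict)
            memo: dict[int, bool] = {}
            calls = 0

            def can_break(start: int) -> bool:
                nonlocal calls
                calls += 1
                if start == len(s):
                    return True
                if use_memo and start in memo:
                    return memo[start]
                # Try every candidate first word beginning at `start`
                result = any(
                    s[start:end] in word_set and can_break(end)
                    for end in range(start + 1, len(s) + 1)
                )
                memo[start] = result
                return result

            return can_break(0), calls


        words = ["a", "aa", "aaa"]  # assumed subset of the dictionary above
        print(count_calls("aaaaaaaaab", words, use_memo=False))
        print(count_calls("aaaaaaaaab", words, use_memo=True))
        ```

        Both runs return `False`, but the memoised run makes far fewer calls, and the gap widens quickly as the string grows.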
      wrong_approach: "Pure recursion without memoisation"
      correct_approach: "Use DP array or memoisation to cache subproblem results"

    - title: Using List Instead of Set for Dictionary
      description: |
        Checking whether a word exists in a list means scanning the list, which costs O(d) string comparisons where d is the number of dictionary words. With up to 1000 words, this adds significant overhead inside the nested loops.

        Converting to a set gives O(1) average lookup, which can be the difference between passing and failing time limits.
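
        The difference can be measured with `timeit`. The word list below is a made-up stand-in for a 1000-word dictionary, and the timings will vary by machine, so no specific numbers are claimed:

        ```python
        import timeit

        word_list = [f"word{i}" for i in range(1000)]  # hypothetical dictionary
        word_set = set(word_list)
        probe = "missing"  # absent word: the list has to be scanned end to end

        list_time = timeit.timeit(lambda: probe in word_list, number=10_000)
        set_time = timeit.timeit(lambda: probe in word_set, number=10_000)
        print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
        ```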
      wrong_approach: "word in wordDict (list)"
      correct_approach: "word in word_set (set)"

    - title: Missing the Empty String Base Case
      description: |
        Forgetting to set `dp[0] = True` breaks the entire algorithm. The base case represents "the empty prefix is always valid" — it's the foundation from which we build all other solutions.

        Without it, no position in the string can ever become True.
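
        The effect is easy to reproduce. In this sketch the `set_base_case` flag is a hypothetical switch added only to demonstrate the failure:

        ```python
        def word_break_table(s: str, word_dict: list[str], set_base_case: bool = True) -> list[bool]:
            word_set = set(word_dict)
            n = len(s)
            dp = [False] * (n + 1)
            dp[0] = set_base_case  # leaving this False reproduces the pitfall
            for i in range(1, n + 1):
                dp[i] = any(dp[j] and s[j:i] in word_set for j in range(i))
            return dp


        print(word_break_table("leetcode", ["leet", "code"])[-1])                       # True
        print(any(word_break_table("leetcode", ["leet", "code"], set_base_case=False)))  # False
        ```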
      wrong_approach: "dp = [False] * (n + 1)"
      correct_approach: "dp = [False] * (n + 1); dp[0] = True"

    - title: Off-by-One Errors with String Slicing
      description: |
        The DP array is of size `n + 1` where `dp[i]` represents whether `s[:i]` (first i characters) can be segmented. Be careful that:
        - `dp[0]` corresponds to the empty string
        - `dp[n]` corresponds to the entire string `s[:n]` which is just `s`
        - When checking substring `s[j:i]`, this includes characters from index `j` up to but not including `i`
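
        A few quick checks of Python's half-open slicing, using the "leetcode" example, make the index conventions explicit:

        ```python
        s = "leetcode"
        n = len(s)

        assert s[:0] == ""   # dp[0] describes the empty prefix
        assert s[:n] == s    # dp[n] describes the whole string

        j, i = 4, 8
        assert s[j:i] == "code"      # indices 4, 5, 6, 7; index 8 is excluded
        assert len(s[j:i]) == i - j  # the slice length is always i - j
        ```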
      wrong_approach: "Confusing 1-indexed vs 0-indexed positions"
      correct_approach: "dp[i] means 'first i characters can be segmented'"

  key_takeaways:
    - "**Substring segmentation pattern**: This approach generalises to problems where you must partition a string into valid segments"
    - "**Set for dictionary lookup**: Always convert word lists to sets for O(1) containment checks"
    - "**Foundation for Word Break II**: The same DP logic extends to finding all valid segmentations, not just checking if one exists"
    - "**BFS alternative**: This problem can also be modelled as a graph where each position is a node, with edges to positions reachable by dictionary words"

  time_complexity: "O(n^3) in the worst case for the code as written. There are O(n^2) pairs (j, i), and slicing and hashing s[j:i] costs O(i - j), which can be as large as n. Because no dictionary word is longer than m = 20 characters, restricting j to the last m positions before i reduces this to O(n * m^2)."
  space_complexity: "O(n + k). The DP array uses O(n) space, and the word set uses O(k) where k is the total length of all dictionary words."

solutions:
  - approach_name: Bottom-Up DP
    is_optimal: true
    code: |
      def word_break(s: str, word_dict: list[str]) -> bool:
          # Convert to set for O(1) lookup
          word_set = set(word_dict)
          n = len(s)

          # dp[i] = True if s[:i] can be segmented
          dp = [False] * (n + 1)

          # Base case: empty string is always "segmented"
          dp[0] = True

          # Check each ending position
          for i in range(1, n + 1):
              # Try each possible starting position for the last word
              for j in range(i):
                  # If s[:j] can be segmented AND s[j:i] is a valid word
                  if dp[j] and s[j:i] in word_set:
                      dp[i] = True
                      break  # Found a valid segmentation, no need to check more

          return dp[n]
    explanation: |
      **Time Complexity:** O(n^3) in the worst case. Two nested loops over the string length give O(n^2) pairs (j, i), and slicing and hashing `s[j:i]` costs up to O(n) per pair.

      **Space Complexity:** O(n + k), for the DP array plus the word set.

      We build solutions from left to right. For each position i, we check if any dictionary word ends there by trying all possible starting positions j. If position j was reachable and the substring s[j:i] is a valid word, then position i is also reachable.
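
      Since the constraints cap every dictionary word at 20 characters, the inner loop never needs to look back further than the longest word. The variant below is a sketch of that optimisation, not part of the solution above; `word_break_bounded` and `max_len` are names introduced here:

      ```python
      def word_break_bounded(s: str, word_dict: list[str]) -> bool:
          word_set = set(word_dict)
          # Longest dictionary word; substrings longer than this can never match
          max_len = max(map(len, word_set), default=0)
          n = len(s)

          dp = [False] * (n + 1)
          dp[0] = True

          for i in range(1, n + 1):
              # Only consider last words of length 1..max_len
              for j in range(max(0, i - max_len), i):
                  if dp[j] and s[j:i] in word_set:
                      dp[i] = True
                      break

          return dp[n]
      ```

      Each substring is now at most `max_len` characters long, so both the number of candidates per position and the cost of each lookup are bounded by the maximum word length rather than by `n`.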

  - approach_name: Top-Down DP (Memoisation)
    is_optimal: false
    code: |
      def word_break(s: str, word_dict: list[str]) -> bool:
          word_set = set(word_dict)
          memo = {}

          def can_break(start: int) -> bool:
              # Base case: reached end of string
              if start == len(s):
                  return True

              # Return cached result if available
              if start in memo:
                  return memo[start]

              # Try each possible word starting at 'start'
              for end in range(start + 1, len(s) + 1):
                  word = s[start:end]
                  if word in word_set and can_break(end):
                      memo[start] = True
                      return True

              # No valid segmentation found from this position
              memo[start] = False
              return False

          return can_break(0)
    explanation: |
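      **Time Complexity:** O(n^3) in the worst case, the same as bottom-up: up to n memoised states, each trying up to n end positions with an O(n) slice and hash.

      **Space Complexity:** O(n + k), for the recursion stack and memoisation cache (O(n)) plus the word set (O(k)).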

      This recursive approach with memoisation is conceptually similar to the iterative DP. We try to segment starting from index 0, and for each starting position, we try all possible first words. Memoisation prevents recomputation of subproblems. Some find this more intuitive than the bottom-up approach.
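
      The manual `memo` dictionary can also be replaced by the standard library's `functools.lru_cache`. This is an equivalent sketch rather than the solution above; `word_break_cached` is a name introduced here:

      ```python
      from functools import lru_cache


      def word_break_cached(s: str, word_dict: list[str]) -> bool:
          word_set = set(word_dict)

          @lru_cache(maxsize=None)  # caches the result for each `start`
          def can_break(start: int) -> bool:
              if start == len(s):
                  return True
              # Try every candidate first word beginning at `start`
              return any(
                  s[start:end] in word_set and can_break(end)
                  for end in range(start + 1, len(s) + 1)
              )

          return can_break(0)
      ```

      The recursion depth is at most `len(s)`, which stays well within Python's default recursion limit for the stated constraint of 300 characters.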

  - approach_name: BFS
    is_optimal: false
    code: |
      from collections import deque

      def word_break(s: str, word_dict: list[str]) -> bool:
          word_set = set(word_dict)
          n = len(s)

          # visited[i] = True if we've processed position i
          visited = [False] * n

          # BFS queue holds starting positions to explore
          queue = deque([0])

          while queue:
              start = queue.popleft()

              # Skip if already processed
              if visited[start]:
                  continue
              visited[start] = True

              # Try all possible words starting at 'start'
              for end in range(start + 1, n + 1):
                  if s[start:end] in word_set:
                      # Reached the end of string
                      if end == n:
                          return True
                      # Add next position to explore
                      queue.append(end)

          return False
    explanation: |
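      **Time Complexity:** O(n^3) in the worst case. Each position is expanded at most once, and expanding it slices and hashes up to n substrings of length up to n.

      **Space Complexity:** O(n + k) for the visited array and word set. The queue can additionally hold duplicate positions, because positions are only deduplicated when popped, so it may grow to O(n * m) entries where m is the maximum word length.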

      BFS models the problem as a graph traversal. Each position in the string is a node. There's an edge from position i to position j if s[i:j] is a dictionary word. We perform BFS from position 0 and check if we can reach position n. This approach is particularly intuitive for those familiar with graph algorithms.
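
      The graph described above can be built explicitly to see its shape. This is an illustrative sketch only, since the BFS solution never materialises the edges; `build_edges` is a hypothetical helper:

      ```python
      def build_edges(s: str, word_dict: list[str]) -> dict[int, list[int]]:
          """Map each position i to the positions j where s[i:j] is a dictionary word."""
          word_set = set(word_dict)
          n = len(s)
          return {
              i: [j for j in range(i + 1, n + 1) if s[i:j] in word_set]
              for i in range(n)
          }


      edges = build_edges("catsandog", ["cats", "dog", "sand", "and", "cat"])
      print(edges[0], edges[3], edges[4], edges[7])  # [3, 4] [7] [7] []
      ```

      Position 7 has no outgoing edges, so position 9 (the end of the string) is unreachable from 0, which is why the third example returns false.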