# codetutor/backend/data/questions/check-if-an-original-string-exists-given-two-encoded-strings.yaml

title: Check if an Original String Exists Given Two Encoded Strings
slug: check-if-an-original-string-exists-given-two-encoded-strings
difficulty: hard
leetcode_id: 2060
leetcode_url: https://leetcode.com/problems/check-if-an-original-string-exists-given-two-encoded-strings/
categories:
  - strings
  - dynamic-programming
patterns:
  - dynamic-programming

description: |
  An original string, consisting of lowercase English letters, can be encoded by the following steps:

  - Arbitrarily **split** it into a **sequence** of some number of **non-empty** substrings.
  - Arbitrarily choose some elements (possibly none) of the sequence, and **replace** each with **its length** (as a numeric string).
  - **Concatenate** the sequence as the encoded string.

  For example, **one way** to encode an original string `"abcdefghijklmnop"` might be:

  - Split it as a sequence: `["ab", "cdefghijklmn", "o", "p"]`.
  - Choose the second and third elements to be replaced by their lengths, respectively. The sequence becomes `["ab", "12", "1", "p"]`.
  - Concatenate the elements of the sequence to get the encoded string: `"ab121p"`.

  Given two encoded strings `s1` and `s2`, consisting of lowercase English letters and digits `1-9` (inclusive), return `true` *if there exists an original string that could be encoded as **both*** `s1` *and* `s2`. Otherwise, return `false`.

  **Note:** The test cases are generated such that the number of consecutive digits in `s1` and `s2` does not exceed `3`.

constraints: |
  - `1 <= s1.length, s2.length <= 40`
  - `s1` and `s2` consist of digits `1-9` (inclusive), and lowercase English letters only.
  - The number of consecutive digits in `s1` and `s2` does not exceed `3`.

examples:
  - input: 's1 = "internationalization", s2 = "i18n"'
    output: "true"
    explanation: "It is possible that 'internationalization' was the original string. s1 keeps the full string unchanged, while s2 splits it as ['i', 'nternationalizatio', 'n'] and replaces the middle part with its length '18'."
  - input: 's1 = "l123e", s2 = "44"'
    output: "true"
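    explanation: "It is possible that 'leetcode' was the original string. s1 splits it as ['l', 'e', 'et', 'cod', 'e'] and replaces the second, third, and fourth parts with their lengths, giving 'l123e'. s2 splits it as ['leet', 'code'] and replaces both parts with their lengths, giving '44'."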
  - input: 's1 = "a5b", s2 = "c5b"'
    output: "false"
    explanation: "The original string encoded as s1 must start with 'a', but the original string encoded as s2 must start with 'c'. These are incompatible."

explanation:
  intuition: |
    Imagine you're trying to synchronise two tape recorders playing the same song, but each has been compressed in different ways. Some parts are played normally (letters), while other parts are fast-forwarded (numbers representing skipped characters).

    The key insight is that we need to track a **difference** in position between the two strings as we process them. When one string has a number, it's essentially saying "skip ahead by this many characters in the original." When both strings have letters, they must match exactly.

    Think of it like this: if `s1` says "skip 5 characters" and `s2` says "the next character is 'a'", then the first of those 5 skipped characters must be that 'a', and `s1` still has 4 skipped characters pending. We track this positional difference and explore all possibilities.

    The challenge is that numbers can combine in many ways (e.g., "12" could mean 12 characters, or "1" followed by "2" meaning 1+2=3 characters). We use dynamic programming with memoisation to avoid recomputing the same states.

  approach: |
    We solve this using **Dynamic Programming with Memoisation**:

    **Step 1: Define the state**

    - `i`: Current position in `s1`
    - `j`: Current position in `s2`
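    - `diff`: How many characters of the original string `s1` has covered minus how many `s2` has covered. Any surplus comes from a number and acts as "pending" wildcard characters
      - Positive `diff`: `s1` is ahead by `diff` characters, so `s2` must consume `diff` more characters to catch up
      - Negative `diff`: `s2` is ahead by `|diff|` characters, so `s1` must consume `|diff|` more characters to catch up
      - Zero `diff`: Both strings are at the same position in the original string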

    &nbsp;

    **Step 2: Handle the base case**

    - If both `i == len(s1)` and `j == len(s2)` and `diff == 0`, we've successfully matched both strings to the same original string
    - Return `True` in this case

    &nbsp;

    **Step 3: Process each state recursively**

    - **If `diff > 0`**: `s1` is ahead, so we need `s2` to catch up
      - If `s2` is already exhausted, this path fails
      - If `s2[j]` is a digit, try every prefix of the consecutive digit run as a number `num` and recurse with `diff - num`
      - If `s2[j]` is a letter, it consumes one character from the difference (`diff -= 1`)

    - **If `diff < 0`**: `s2` is ahead, so we need `s1` to catch up
      - If `s1` is already exhausted, this path fails
      - If `s1[i]` is a digit, try every prefix of the consecutive digit run as a number `num` and recurse with `diff + num`
      - If `s1[i]` is a letter, it consumes one character from the difference (`diff += 1`)

    - **If `diff == 0`**: Both are synchronised
      - If both have letters, they must match exactly
      - If either has a digit, try every prefix of its digit run as a number `num`; a number from `s1` makes `diff` positive and a number from `s2` makes it negative

    &nbsp;

    **Step 4: Use memoisation**

    - Cache results for `(i, j, diff)` states to avoid redundant computation
    - `diff` always stays within `[-999, 999]`: only the string that is behind advances, and a single number is at most 999 (3 digits), so the cache holds a bounded number of states
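
    To put the bound in numbers, the sketch below counts an upper limit on distinct `(i, j, diff)` states under the problem's constraints. It assumes `diff` stays within `[-999, 999]`, which holds when only the string that is behind is allowed to advance. The constant names are illustrative.

    ```python
    MAX_LEN = 40        # from the constraints: s1.length, s2.length <= 40
    MAX_NUMBER = 999    # at most 3 consecutive digits, each in 1-9

    positions = MAX_LEN + 1            # i (or j) ranges over 0..40 inclusive
    diff_values = 2 * MAX_NUMBER + 1   # assumed range: -999..999 inclusive

    max_states = positions * positions * diff_values
    print(max_states)  # 3360319
    ```

    Most of these states are never reached in practice, so this is a ceiling rather than an estimate of actual work.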

    &nbsp;

    This approach systematically explores all valid ways the two encoded strings could represent the same original string.
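
    The four steps above can be condensed into a short sketch. This is one possible arrangement rather than a definitive implementation: `possibly_equals_sketch` and the helper `numbers_from` are hypothetical names, and when `diff == 0` the sketch lets `s1` consume a number first, an ordering chosen here only for brevity.

    ```python
    from functools import lru_cache


    def possibly_equals_sketch(s1: str, s2: str) -> bool:
        def numbers_from(s: str, start: int):
            """Yield (value, next_index) for each prefix of the digit run at start."""
            value = 0
            k = start
            while k < len(s) and s[k].isdigit():
                value = value * 10 + int(s[k])
                k += 1
                yield value, k

        @lru_cache(maxsize=None)
        def dp(i: int, j: int, diff: int) -> bool:
            # Step 2: both strings exhausted; succeed only if nothing is pending.
            if i == len(s1) and j == len(s2):
                return diff == 0

            if diff > 0:  # s1 is ahead, so s2 must consume characters
                if j == len(s2):
                    return False
                if s2[j].isdigit():
                    return any(dp(i, k, diff - v) for v, k in numbers_from(s2, j))
                return dp(i, j + 1, diff - 1)  # a letter fills one pending slot

            if diff < 0:  # s2 is ahead, so s1 must consume characters
                if i == len(s1):
                    return False
                if s1[i].isdigit():
                    return any(dp(k, j, diff + v) for v, k in numbers_from(s1, i))
                return dp(i + 1, j, diff + 1)

            # diff == 0: a number on either side opens a new gap
            if i < len(s1) and s1[i].isdigit():
                return any(dp(k, j, v) for v, k in numbers_from(s1, i))
            if j < len(s2) and s2[j].isdigit():
                return any(dp(i, k, -v) for v, k in numbers_from(s2, j))
            # Otherwise both sides must show the same letter
            if i < len(s1) and j < len(s2) and s1[i] == s2[j]:
                return dp(i + 1, j + 1, 0)
            return False

        return dp(0, 0, 0)


    print(possibly_equals_sketch("internationalization", "i18n"))  # True
    print(possibly_equals_sketch("l123e", "44"))                   # True
    print(possibly_equals_sketch("a5b", "c5b"))                    # False
    ```

    Yielding every prefix of a digit run from one helper keeps the "try all splits" logic in a single place, and the remaining digits of the run are handled by the next recursive call.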

  common_pitfalls:
    - title: Mishandling Multi-Digit Numbers
      description: |
        The encoded strings can have up to 3 consecutive digits (e.g., "123" meaning 123 characters). A common mistake is to commit to a single interpretation, either reading each digit on its own or reading the whole run as one number, instead of considering all possible interpretations.

        For example, "12" could be:
        - The number 12 (twelve characters)
        - The number 1 followed by the number 2 (1 + 2 = 3 characters)

        You must explore **all valid splits** of consecutive digits, not just treat them as a single number.
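
        To make the branching concrete, the sketch below lists every total length a digit run can stand for. `digit_run_totals` is a hypothetical helper written for illustration and is not part of the solution.

        ```python
        def digit_run_totals(run: str) -> set:
            """Return every total length a run of digits can represent."""
            if not run:
                return {0}
            totals = set()
            for cut in range(1, len(run) + 1):
                head = int(run[:cut])  # read the first `cut` digits as one number
                for rest in digit_run_totals(run[cut:]):
                    totals.add(head + rest)
            return totals


        print(sorted(digit_run_totals("12")))   # [3, 12]
        print(sorted(digit_run_totals("123")))  # [6, 15, 24, 123]
        ```

        A 3-digit run has four possible splits, so the branching per run stays small.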
      wrong_approach: "Always parsing consecutive digits as one number"
      correct_approach: "Try all possible ways to split digit sequences"

    - title: Forgetting Negative Diff Values
      description: |
        The difference `diff` can be negative when `s2` is "ahead" of `s1`. If you only track positive differences, you'll miss valid matches.

        For example, if `s1 = "a2b"` and `s2 = "3b"`, parsing '3' from `s2` while `s1` is still at 'a' makes the diff -3 (`s2` is 3 characters ahead). The 'a' from `s1` then consumes one of those, leaving diff = -2, and the '2' from `s1` brings the diff back to 0 so the final 'b' characters can be compared.
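
        Replaying one successful path for `s1 = "a2b"` and `s2 = "3b"` shows the sign convention at work. `replay_diff` is a hypothetical helper; each step names which string consumes characters and how many.

        ```python
        def replay_diff(steps):
            """Return the diff after each step; s1 adds to diff, s2 subtracts."""
            diff = 0
            history = [diff]
            for side, length in steps:
                diff += length if side == "s1" else -length
                history.append(diff)
            return history


        # s2 reads '3' first, then s1 reads the letter 'a' and the number '2'.
        steps = [("s2", 3), ("s1", 1), ("s1", 2)]
        print(replay_diff(steps))  # [0, -3, -2, 0]
        ```

        Once the diff is back at 0, the remaining 'b' in each string is compared directly.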
      wrong_approach: "Only tracking when s1 is ahead"
      correct_approach: "Track both positive and negative differences"

    - title: Not Handling Empty Remaining Strings
      description: |
        When one string is exhausted but the other still has content, you need to handle this carefully.

        If `s1` is exhausted but `diff > 0` and `s2` has remaining digits, those digits might exactly cancel out the difference. Similarly for the reverse case.

        The recursion must continue until both strings are exhausted AND `diff == 0`.
      wrong_approach: "Returning False immediately when one string is exhausted"
      correct_approach: "Continue processing until both strings are exhausted with diff = 0"

  key_takeaways:
    - "**State-based DP**: When comparing two sequences with flexible interpretations, define a state that captures the essential difference between positions"
    - "**Difference tracking**: Instead of tracking absolute positions in a hypothetical original string, track the relative difference between the two encoded strings"
    - "**Exploring all splits**: When parsing numbers, consider all valid ways to split consecutive digits, not just the maximum length interpretation"
    - "**Memoisation is essential**: The state space is bounded (positions in both strings plus a bounded difference), making memoisation highly effective"

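  time_complexity: "O(n * m * D) where `n` and `m` are the lengths of `s1` and `s2`, and `D` is the number of distinct difference values. Each number has at most 3 digits, so `diff` stays within [-999, 999] and `D` is at most 1999. Each state tries only a constant number of digit-run prefixes."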
  space_complexity: "O(n * m * D) for the memoisation cache storing results for each unique `(i, j, diff)` state."

solutions:
  - approach_name: Dynamic Programming with Memoisation
    is_optimal: true
    code: |
      def possiblyEquals(s1: str, s2: str) -> bool:
          from functools import lru_cache

          @lru_cache(maxsize=None)
          def dp(i: int, j: int, diff: int) -> bool:
              # Base case: both strings exhausted and difference is zero
              if i == len(s1) and j == len(s2):
                  return diff == 0

              # If diff > 0, s1 is ahead - s2 needs to catch up
              if diff > 0:
                  if j < len(s2):
                      if s2[j].isdigit():
                          # Try all possible ways to parse consecutive digits
                          num = 0
                          for k in range(j, len(s2)):
                              if not s2[k].isdigit():
                                  break
                              num = num * 10 + int(s2[k])
                              # s2 advances by 'num' characters, reducing diff
                              if dp(i, k + 1, diff - num):
                                  return True
                      else:
                          # s2 has a letter - it consumes one from the difference
                          if dp(i, j + 1, diff - 1):
                              return True
                  return False

              # If diff < 0, s2 is ahead - s1 needs to catch up
              if diff < 0:
                  if i < len(s1):
                      if s1[i].isdigit():
                          # Try all possible ways to parse consecutive digits
                          num = 0
                          for k in range(i, len(s1)):
                              if not s1[k].isdigit():
                                  break
                              num = num * 10 + int(s1[k])
                              # s1 advances by 'num' characters, increasing diff
                              if dp(k + 1, j, diff + num):
                                  return True
                      else:
                          # s1 has a letter - it consumes one from the difference
                          if dp(i + 1, j, diff + 1):
                              return True
                  return False

              # diff == 0: both are synchronised
              # Both exhausted - already handled in base case
              if i == len(s1):
                  # s1 exhausted but s2 is not: this can never succeed, because any
                  # number in s2 leaves diff < 0 with nothing left in s1 to repay it
                  if s2[j].isdigit():
                      num = 0
                      for k in range(j, len(s2)):
                          if not s2[k].isdigit():
                              break
                          num = num * 10 + int(s2[k])
                          if dp(i, k + 1, -num):
                              return True
                  return False

              if j == len(s2):
                  # s2 exhausted but s1 is not: this can never succeed, because any
                  # number in s1 leaves diff > 0 with nothing left in s2 to repay it
                  if s1[i].isdigit():
                      num = 0
                      for k in range(i, len(s1)):
                          if not s1[k].isdigit():
                              break
                          num = num * 10 + int(s1[k])
                          if dp(k + 1, j, num):
                              return True
                  return False

              # Both have characters at current positions
              if s1[i].isdigit() and s2[j].isdigit():
                  # Both are digits - try all combinations
                  for k1 in range(i, len(s1)):
                      if not s1[k1].isdigit():
                          break
                      num1 = int(s1[i:k1+1])
                      for k2 in range(j, len(s2)):
                          if not s2[k2].isdigit():
                              break
                          num2 = int(s2[j:k2+1])
                          if dp(k1 + 1, k2 + 1, num1 - num2):
                              return True
                  return False

              if s1[i].isdigit():
                  # s1 has digit, s2 has letter
                  num = 0
                  for k in range(i, len(s1)):
                      if not s1[k].isdigit():
                          break
                      num = num * 10 + int(s1[k])
                      if dp(k + 1, j, num):
                          return True
                  return False

              if s2[j].isdigit():
                  # s2 has digit, s1 has letter
                  num = 0
                  for k in range(j, len(s2)):
                      if not s2[k].isdigit():
                          break
                      num = num * 10 + int(s2[k])
                      if dp(i, k + 1, -num):
                          return True
                  return False

              # Both are letters - they must match
              if s1[i] == s2[j]:
                  return dp(i + 1, j + 1, 0)
              return False

          return dp(0, 0, 0)
    explanation: |
      **Time Complexity:** O(n * m * D) where n and m are string lengths and D is the number of distinct difference values (`diff` stays within [-999, 999], so D is at most 1999).

      **Space Complexity:** O(n * m * D) for memoisation cache.

      We use recursion with memoisation to explore all valid ways the two encoded strings could represent the same original string. The key insight is tracking the "difference" between positions - when one string encodes characters as a number, it creates a positional difference that the other string must eventually match.