codetutor/backend/data/questions/interleaving-string.yaml


title: Interleaving String
slug: interleaving-string
difficulty: medium
leetcode_id: 97
leetcode_url: https://leetcode.com/problems/interleaving-string/
categories:
- strings
- dynamic-programming
patterns:
- dynamic-programming
function_signature: "def is_interleave(s1: str, s2: str, s3: str) -> bool:"
test_cases:
visible:
- input: { s1: "aabcc", s2: "dbbca", s3: "aadbbcbcac" }
expected: true
- input: { s1: "aabcc", s2: "dbbca", s3: "aadbbbaccc" }
expected: false
- input: { s1: "", s2: "", s3: "" }
expected: true
hidden:
- input: { s1: "a", s2: "b", s3: "ab" }
expected: true
- input: { s1: "a", s2: "b", s3: "ba" }
expected: true
- input: { s1: "abc", s2: "def", s3: "abcdef" }
expected: true
- input: { s1: "", s2: "abc", s3: "abc" }
expected: true
- input: { s1: "abc", s2: "", s3: "abc" }
expected: true
- input: { s1: "a", s2: "b", s3: "abc" }
expected: false
- input: { s1: "aaa", s2: "aaa", s3: "aaaaaa" }
expected: true
description: |
Given strings `s1`, `s2`, and `s3`, find whether `s3` is formed by an **interleaving** of `s1` and `s2`.
An **interleaving** of two strings `s` and `t` is a configuration where `s` and `t` are divided into `n` and `m` substrings respectively, such that:
- `s = s1 + s2 + ... + sn`
- `t = t1 + t2 + ... + tm`
- `|n - m| <= 1`
- The **interleaving** is `s1 + t1 + s2 + t2 + s3 + t3 + ...` or `t1 + s1 + t2 + s2 + t3 + s3 + ...`
**Note:** `a + b` is the concatenation of strings `a` and `b`.
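A minimal sketch of the definition above, assuming the split into substrings is already known. The helper name `interleave_parts` is hypothetical and only covers the ordering that starts with a piece of `s`:

```python
from itertools import zip_longest


def interleave_parts(s_parts: list[str], t_parts: list[str]) -> str:
    """Alternate the pieces as s1 + t1 + s2 + t2 + ..., starting with s."""
    # Starting with s, the definition allows s to have the same number of
    # pieces as t, or exactly one more.
    assert 0 <= len(s_parts) - len(t_parts) <= 1
    pieces = []
    for s_piece, t_piece in zip_longest(s_parts, t_parts, fillvalue=""):
        pieces.append(s_piece)
        pieces.append(t_piece)
    return "".join(pieces)


# Split used in the first example: s1 = "aa" + "bc" + "c", s2 = "dbbc" + "a"
merged = interleave_parts(["aa", "bc", "c"], ["dbbc", "a"])
print(merged)  # aadbbcbcac
```

The actual problem is harder than this helper suggests: the split is not given, so a solution has to decide whether any valid split exists.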
constraints: |
- `0 <= s1.length, s2.length <= 100`
- `0 <= s3.length <= 200`
- `s1`, `s2`, and `s3` consist of lowercase English letters.
examples:
- input: 's1 = "aabcc", s2 = "dbbca", s3 = "aadbbcbcac"'
output: "true"
explanation: "One way to obtain s3 is: Split s1 into s1 = \"aa\" + \"bc\" + \"c\", and s2 into s2 = \"dbbc\" + \"a\". Interleaving the two splits, we get \"aa\" + \"dbbc\" + \"bc\" + \"a\" + \"c\" = \"aadbbcbcac\"."
- input: 's1 = "aabcc", s2 = "dbbca", s3 = "aadbbbaccc"'
output: "false"
explanation: "It is impossible to interleave s2 with any other string to obtain s3."
- input: 's1 = "", s2 = "", s3 = ""'
output: "true"
explanation: "Two empty strings trivially interleave to form an empty string."
explanation:
intuition: |
Imagine you have two decks of cards (representing `s1` and `s2`), and you want to merge them into a single pile (`s3`) while preserving the relative order of cards within each original deck. The question is: can `s3` be formed by repeatedly taking the top card from either deck, in any order of turns?
The key insight is that at any point in building `s3`, you have a **choice**: take the next character from `s1` or from `s2`. This decision tree branches exponentially, but many branches lead to the same "state" — defined by how many characters we've used from each string.
Think of it as navigating a 2D grid where rows represent progress through `s1` and columns represent progress through `s2`. Starting at `(0, 0)`, you want to reach `(len(s1), len(s2))`. At each cell `(i, j)`, you can move down (use a character from `s1`) or right (use a character from `s2`), but only if that character matches the next character needed in `s3`, which is `s3[i + j]`.
This grid perspective reveals the **optimal substructure**: whether we can reach `(i, j)` depends only on whether we could reach `(i-1, j)` or `(i, j-1)` with a matching character. This is the hallmark of dynamic programming.
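One way to make the grid view concrete is a breadth-first search over states `(i, j)`. This is a sketch for illustration, not one of the solutions listed later; the function name `reachable_states` is an assumption:

```python
from collections import deque


def reachable_states(s1: str, s2: str, s3: str) -> set[tuple[int, int]]:
    """Return every (i, j) reachable from (0, 0) while matching s3 in order."""
    m, n = len(s1), len(s2)
    if m + n != len(s3):
        return set()
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        k = i + j  # index of the next character needed from s3
        # Consume the next character of s1 if it matches.
        if i < m and s1[i] == s3[k] and (i + 1, j) not in seen:
            seen.add((i + 1, j))
            queue.append((i + 1, j))
        # Consume the next character of s2 if it matches.
        if j < n and s2[j] == s3[k] and (i, j + 1) not in seen:
            seen.add((i, j + 1))
            queue.append((i, j + 1))
    return seen


states = reachable_states("aabcc", "dbbca", "aadbbcbcac")
print((5, 5) in states)  # True
```

Because each state is added to `seen` at most once, the search visits at most `(m + 1) * (n + 1)` cells no matter how many distinct paths lead to them.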
approach: |
We solve this using **2D Dynamic Programming**:
**Step 1: Early termination check**
- If `len(s1) + len(s2) != len(s3)`, return `False` immediately — the lengths don't match, so interleaving is impossible
&nbsp;
**Step 2: Initialize the DP table**
- Create a 2D boolean table `dp` of size `(len(s1) + 1) x (len(s2) + 1)`
- `dp[i][j]` represents: "Can the first `i` characters of `s1` and first `j` characters of `s2` interleave to form the first `i + j` characters of `s3`?"
- Set `dp[0][0] = True` — empty strings trivially interleave to form an empty string
&nbsp;
**Step 3: Fill the first row and column**
- First row (`dp[0][j]`): Can `s2[:j]` alone form `s3[:j]`? Only if all characters match sequentially
- First column (`dp[i][0]`): Can `s1[:i]` alone form `s3[:i]`? Only if all characters match sequentially
- These represent paths that use only one string
&nbsp;
**Step 4: Fill the rest of the table**
- For each cell `dp[i][j]`, check two possibilities:
- **From above** (`dp[i-1][j]`): If `s1[:i-1]` and `s2[:j]` can form `s3[:i+j-1]` and `s1[i-1] == s3[i+j-1]`, then `dp[i][j] = True`
- **From the left** (`dp[i][j-1]`): If `s1[:i]` and `s2[:j-1]` can form `s3[:i+j-1]` and `s2[j-1] == s3[i+j-1]`, then `dp[i][j] = True`
- Either path being valid makes the current state valid
&nbsp;
**Step 5: Return the answer**
- Return `dp[len(s1)][len(s2)]` — whether we can use all of both strings to form all of `s3`
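The steps above can be traced on a tiny input. The sketch below builds the full table for `s1 = "ab"`, `s2 = "cd"`, `s3 = "acbd"` and prints it; the helper name `build_table` and the example strings are assumptions chosen for illustration:

```python
def build_table(s1: str, s2: str, s3: str) -> list[list[bool]]:
    """Build the (m+1) x (n+1) table described in Steps 2 to 4."""
    m, n = len(s1), len(s2)
    assert m + n == len(s3), "Step 1 would already have returned False"
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    for i in range(m + 1):
        for j in range(n + 1):
            # Extend a valid (i-1, j) state with s1[i-1].
            if i > 0 and dp[i - 1][j] and s1[i - 1] == s3[i + j - 1]:
                dp[i][j] = True
            # Extend a valid (i, j-1) state with s2[j-1].
            if j > 0 and dp[i][j - 1] and s2[j - 1] == s3[i + j - 1]:
                dp[i][j] = True
    return dp


table = build_table("ab", "cd", "acbd")
for row in table:
    print(" ".join("T" if cell else "F" for cell in row))
# T F F
# T T F
# F T T
```

The `T` cells trace the only valid path: take `a` from `s1`, `c` from `s2`, `b` from `s1`, then `d` from `s2`.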
common_pitfalls:
- title: Exponential Brute Force
description: |
A naive recursive approach tries every possible way to interleave:
- At each position in `s3`, try matching with `s1` or `s2`
- This leads to `2^(m+n)` possibilities in the worst case
With `s1.length, s2.length <= 100`, this means up to `2^200` operations — astronomically too slow. The key insight is that many recursive calls compute the same subproblem (same `(i, j)` position), making this a perfect candidate for memoization or bottom-up DP.
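A rough way to see the blow-up is to count recursive calls with and without caching. The sketch below uses hypothetical helpers (`count_calls_naive`, `count_calls_memo`) and an adversarial input where every path fails only at the last character:

```python
from functools import lru_cache


def count_calls_naive(s1: str, s2: str, s3: str) -> tuple[bool, int]:
    """Plain recursion; returns (answer, number of recursive calls)."""
    if len(s1) + len(s2) != len(s3):
        return False, 0
    calls = 0

    def go(i: int, j: int) -> bool:
        nonlocal calls
        calls += 1
        if i == len(s1) and j == len(s2):
            return True
        k = i + j
        if i < len(s1) and s1[i] == s3[k] and go(i + 1, j):
            return True
        return j < len(s2) and s2[j] == s3[k] and go(i, j + 1)

    return go(0, 0), calls


def count_calls_memo(s1: str, s2: str, s3: str) -> tuple[bool, int]:
    """Same recursion, but each (i, j) is evaluated at most once."""
    if len(s1) + len(s2) != len(s3):
        return False, 0
    calls = 0

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        nonlocal calls
        calls += 1
        if i == len(s1) and j == len(s2):
            return True
        k = i + j
        if i < len(s1) and s1[i] == s3[k] and go(i + 1, j):
            return True
        return j < len(s2) and s2[j] == s3[k] and go(i, j + 1)

    return go(0, 0), calls


s1 = s2 = "a" * 8
s3 = "a" * 15 + "b"  # every path matches until the final character
naive_answer, naive_calls = count_calls_naive(s1, s2, s3)
memo_answer, memo_calls = count_calls_memo(s1, s2, s3)
print(naive_answer, memo_answer, naive_calls > memo_calls)
```

Both versions agree on the answer, but the cached version is bounded by the number of distinct `(i, j)` states, here at most `9 * 9 = 81`.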
wrong_approach: "Recursive backtracking without memoization"
correct_approach: "Dynamic programming with O(m*n) states"
- title: Forgetting the Length Check
description: |
If `len(s1) + len(s2) != len(s3)`, it's impossible to interleave — every character from `s1` and `s2` must appear exactly once in `s3`.
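Without this early check, the DP can return a false positive. For `s1 = "a"`, `s2 = "b"`, `s3 = "abc"`, the table only ever inspects `s3[:2]`, so `dp[1][1]` is `True` even though `s3` has an extra character. If `s3` is too short instead, indexing `s3[i + j - 1]` can raise an `IndexError`. Always verify lengths first.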
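A minimal sketch of this failure mode, using a deliberately flawed helper (the name `interleave_without_length_check` is hypothetical) that fills the table but omits the length guard:

```python
def interleave_without_length_check(s1: str, s2: str, s3: str) -> bool:
    """Deliberately flawed: the usual table fill, minus the length guard."""
    m, n = len(s1), len(s2)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    for i in range(m + 1):
        for j in range(n + 1):
            k = i + j - 1
            if i > 0 and dp[i - 1][j] and s1[i - 1] == s3[k]:
                dp[i][j] = True
            if j > 0 and dp[i][j - 1] and s2[j - 1] == s3[k]:
                dp[i][j] = True
    return dp[m][n]


# s3 has an extra trailing character that the table never looks at.
print(interleave_without_length_check("a", "b", "abc"))  # True (wrong)
```

With a too-short `s3` such as `"a"`, the same function raises `IndexError` when it reads `s3[1]`.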
wrong_approach: "Skip length validation"
correct_approach: "Check len(s1) + len(s2) == len(s3) upfront"
- title: Off-by-One Index Errors
description: |
The DP table has dimensions `(m+1) x (n+1)` to handle empty prefixes. When accessing characters:
- `dp[i][j]` uses `s1[i-1]` and `s2[j-1]` (0-indexed strings)
- The corresponding `s3` character is at index `i + j - 1`
Confusing 0-indexed strings with 1-indexed DP indices is a common source of bugs. Draw the grid and trace through an example to verify your indexing.
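As a quick check of the index mapping, the snippet below looks at the final cell `(m, n)` for a small assumed example and shows what happens when the `- 1` is forgotten:

```python
s1, s2, s3 = "ab", "cd", "acbd"
m, n = len(s1), len(s2)

# At cell (i, j) the prefixes are s1[:i] and s2[:j], so their last
# characters sit at 0-based indices i - 1 and j - 1.
i, j = m, n
last_s1 = s1[i - 1]       # 'b'
last_s2 = s2[j - 1]       # 'd'
target = s3[i + j - 1]    # 'd', the last character of s3[:i + j]

# Indexing with the DP index directly runs off the end of the string.
try:
    s1[i]
    off_by_one_is_safe = True
except IndexError:
    off_by_one_is_safe = False

print(last_s1, last_s2, target, off_by_one_is_safe)  # b d d False
```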
wrong_approach: "Using dp[i][j] with s1[i] and s2[j]"
correct_approach: "Using dp[i][j] with s1[i-1] and s2[j-1]"
key_takeaways:
- "**2D DP for two sequences**: When combining or comparing two sequences, think of a 2D grid where axes represent progress through each sequence"
- "**State definition is crucial**: Here, `dp[i][j]` captures whether prefixes of length `i` and `j` can form a prefix of `s3` — a clean, sufficient state"
- "**Space optimization possible**: The follow-up asks for `O(s2.length)` space — since each row only depends on the previous row and current row, you can use a 1D array"
- "**Early termination**: Simple checks like length validation can save significant computation and handle edge cases cleanly"
time_complexity: "O(m * n). We fill a 2D table of size `(m+1) x (n+1)` where `m = len(s1)` and `n = len(s2)`, with O(1) work per cell."
space_complexity: "O(m * n). We store a 2D boolean table. This can be optimized to O(n) by using a 1D array and updating in-place."
solutions:
- approach_name: 2D Dynamic Programming
is_optimal: true
code: |
def is_interleave(s1: str, s2: str, s3: str) -> bool:
    m, n = len(s1), len(s2)
    # Early termination: lengths must match
    if m + n != len(s3):
        return False
    # dp[i][j] = can s1[:i] and s2[:j] interleave to form s3[:i+j]?
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    # Base case: empty strings form empty string
    dp[0][0] = True
    # Fill first column: using only s1
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] and s1[i - 1] == s3[i - 1]
    # Fill first row: using only s2
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] and s2[j - 1] == s3[j - 1]
    # Fill the rest of the table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Index in s3 of the character placed last
            k = i + j - 1
            # Can we get here from above (row i-1), using s1[i-1]?
            from_s1 = dp[i - 1][j] and s1[i - 1] == s3[k]
            # Can we get here from the left (column j-1), using s2[j-1]?
            from_s2 = dp[i][j - 1] and s2[j - 1] == s3[k]
            dp[i][j] = from_s1 or from_s2
    return dp[m][n]
explanation: |
**Time Complexity:** O(m * n) — We iterate through every cell in the DP table once.
**Space Complexity:** O(m * n) — We store the full 2D table.
This solution builds up the answer by considering all valid ways to consume characters from `s1` and `s2`. Each cell represents a subproblem that's computed exactly once.
- approach_name: Space-Optimized DP (1D Array)
is_optimal: true
code: |
def is_interleave(s1: str, s2: str, s3: str) -> bool:
    m, n = len(s1), len(s2)
    # Early termination: lengths must match
    if m + n != len(s3):
        return False
    # Use 1D array: dp[j] represents dp[i][j] for current row i
    dp = [False] * (n + 1)
    # Fill the DP table row by row
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                dp[j] = True
            elif i == 0:
                # First row: only using s2
                dp[j] = dp[j - 1] and s2[j - 1] == s3[j - 1]
            elif j == 0:
                # First column: only using s1
                dp[j] = dp[j] and s1[i - 1] == s3[i - 1]
            else:
                # General case: from above (old dp[j], using s1)
                # or from the left (dp[j - 1], using s2)
                k = i + j - 1
                dp[j] = (dp[j] and s1[i - 1] == s3[k]) or \
                        (dp[j - 1] and s2[j - 1] == s3[k])
    return dp[n]
explanation: |
**Time Complexity:** O(m * n) — Same iteration as 2D approach.
**Space Complexity:** O(n) — Only one row of the DP table is stored.
This answers the follow-up question. Since each row only depends on the current and previous row values, we can overwrite the array in-place. `dp[j]` holds the "from above" value before we update it, and `dp[j-1]` holds the already-updated "from left" value.
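A possible refinement, sketched here as an assumption rather than as part of the listed solution: interleaving is symmetric in `s1` and `s2`, so swapping the inputs to make `s2` the shorter string brings the array down to `O(min(m, n))` entries. The name `is_interleave_min_space` is hypothetical:

```python
def is_interleave_min_space(s1: str, s2: str, s3: str) -> bool:
    m, n = len(s1), len(s2)
    if m + n != len(s3):
        return False
    # Swapping s1 and s2 does not change the answer, so index the
    # 1D array by the shorter string.
    if n > m:
        s1, s2, m, n = s2, s1, n, m
    dp = [False] * (n + 1)
    dp[0] = True
    # Row 0: only s2 is used.
    for j in range(1, n + 1):
        dp[j] = dp[j - 1] and s2[j - 1] == s3[j - 1]
    for i in range(1, m + 1):
        # Column 0: only s1 is used.
        dp[0] = dp[0] and s1[i - 1] == s3[i - 1]
        for j in range(1, n + 1):
            k = i + j - 1
            from_s1 = dp[j] and s1[i - 1] == s3[k]      # old value: row i-1
            from_s2 = dp[j - 1] and s2[j - 1] == s3[k]  # new value: row i
            dp[j] = from_s1 or from_s2
    return dp[n]


print(is_interleave_min_space("aabcc", "dbbca", "aadbbcbcac"))  # True
print(is_interleave_min_space("aabcc", "dbbca", "aadbbbaccc"))  # False
```

Handling row 0 and column 0 outside the inner loop also removes the per-cell branching, at the cost of slightly more setup code.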
- approach_name: Recursive with Memoization
is_optimal: false
code: |
def is_interleave(s1: str, s2: str, s3: str) -> bool:
    m, n = len(s1), len(s2)
    if m + n != len(s3):
        return False
    # Memoization cache
    memo = {}

    def dp(i: int, j: int) -> bool:
        # Base case: used all characters
        if i == m and j == n:
            return True
        # Check cache
        if (i, j) in memo:
            return memo[(i, j)]
        k = i + j  # Current position in s3
        result = False
        # Try using next character from s1
        if i < m and s1[i] == s3[k]:
            result = dp(i + 1, j)
        # Try using next character from s2
        if not result and j < n and s2[j] == s3[k]:
            result = dp(i, j + 1)
        memo[(i, j)] = result
        return result

    return dp(0, 0)
explanation: |
**Time Complexity:** O(m * n) — Each unique `(i, j)` pair is computed once.
**Space Complexity:** O(m * n) — For the memoization cache, plus O(m + n) recursion stack.
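This top-down approach is often more intuitive to write. It explores the decision tree but caches results to avoid redundant computation. The recursion depth is at most `m + n` (200 under the given constraints), which fits within Python's default recursion limit of 1000. Bottom-up DP avoids the recursion stack entirely, so it is generally the safer choice for longer inputs.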
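The same top-down idea can be written with the standard library's `functools.lru_cache` instead of a hand-rolled dictionary. This is a sketch of an equivalent variant; the names `is_interleave_cached` and `can_finish` are assumptions:

```python
from functools import lru_cache


def is_interleave_cached(s1: str, s2: str, s3: str) -> bool:
    m, n = len(s1), len(s2)
    if m + n != len(s3):
        return False

    @lru_cache(maxsize=None)
    def can_finish(i: int, j: int) -> bool:
        """Can s1[i:] and s2[j:] interleave to form s3[i + j:]?"""
        if i == m and j == n:
            return True
        k = i + j
        # Try the next character of s1 first, then the next of s2.
        if i < m and s1[i] == s3[k] and can_finish(i + 1, j):
            return True
        return j < n and s2[j] == s3[k] and can_finish(i, j + 1)

    return can_finish(0, 0)


print(is_interleave_cached("aabcc", "dbbca", "aadbbcbcac"))  # True
print(is_interleave_cached("a", "b", "abc"))                 # False
```

Defining `can_finish` inside the outer function gives each call its own cache, so results from one set of inputs never leak into another.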