title: Longest Common Subsequence
slug: longest-common-subsequence
difficulty: medium
leetcode_id: 1143
leetcode_url: https://leetcode.com/problems/longest-common-subsequence/
categories:
- strings
- dynamic-programming
patterns:
- slug: dynamic-programming
is_optimal: true

function_signature: "def longest_common_subsequence(text1: str, text2: str) -> int:"

test_cases:
visible:
- input: { text1: "abcde", text2: "ace" }
expected: 3
- input: { text1: "abc", text2: "abc" }
expected: 3
- input: { text1: "abc", text2: "def" }
expected: 0
hidden:
- input: { text1: "a", text2: "a" }
expected: 1
- input: { text1: "a", text2: "b" }
expected: 0
- input: { text1: "oxcpqrsvwf", text2: "shmtulqrypy" }
expected: 2
- input: { text1: "abcdefg", text2: "bdfxyz" }
expected: 3
- input: { text1: "aaaa", text2: "aaaa" }
expected: 4
- input: { text1: "ezupkr", text2: "ubmrapg" }
expected: 2

description: |
Given two strings `text1` and `text2`, return *the length of their longest **common subsequence***. If there is no **common subsequence**, return `0`.

A **subsequence** of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

For example, `"ace"` is a subsequence of `"abcde"`.

A **common subsequence** of two strings is a subsequence that is common to both strings.

constraints: |
- `1 <= text1.length, text2.length <= 1000`
- `text1` and `text2` consist of only lowercase English characters.

examples:
- input: 'text1 = "abcde", text2 = "ace"'
output: "3"
explanation: 'The longest common subsequence is "ace" and its length is 3.'
- input: 'text1 = "abc", text2 = "abc"'
output: "3"
explanation: 'The longest common subsequence is "abc" and its length is 3.'
- input: 'text1 = "abc", text2 = "def"'
output: "0"
explanation: "There is no such common subsequence, so the result is 0."

explanation:
intuition: |
Imagine you're comparing two sequences of characters, trying to find the longest chain of letters that appears in both, not necessarily consecutively, but in the same relative order.

Think of it like comparing two playlists of songs. You want to find the longest sequence of songs that appears in both playlists, where the songs appear in the same order (though not necessarily back-to-back). You can't rearrange songs; you can only skip ones that don't match.

The **key insight** is that this problem has **optimal substructure**: if we know the LCS of smaller prefixes of both strings, we can build up to the answer for the full strings. When the last characters match, we extend our subsequence; when they don't, we take the better of the two results obtained by dropping the last character of either the first string or the second.

This is a classic **dynamic programming** problem because:
1. We can break it into overlapping subproblems (comparing prefixes of different lengths)
2. The solution to larger problems depends on solutions to smaller ones
3. We can store intermediate results to avoid redundant computation

approach: |
We solve this using a **2D Dynamic Programming** approach with a table where `dp[i][j]` represents the length of the LCS of `text1[0:i]` and `text2[0:j]`.

**Step 1: Create the DP table**

- Create a 2D array `dp` of size `(m+1) x (n+1)` where `m = len(text1)` and `n = len(text2)`
- The extra row and column handle the base case of empty prefixes
- Initialise all values to `0` (the LCS of any string with an empty string has length `0`)

**Step 2: Fill the table using the recurrence relation**

- Iterate through each cell `dp[i][j]` for `i` from `1` to `m` and `j` from `1` to `n`
- If `text1[i-1] == text2[j-1]`: the characters match, so `dp[i][j] = dp[i-1][j-1] + 1`
- Otherwise: take the maximum of excluding one character from either string: `dp[i][j] = max(dp[i-1][j], dp[i][j-1])`

**Step 3: Return the result**

- The answer is in `dp[m][n]`, representing the LCS of the complete strings

The recurrence works because when the last characters match, we've found a common element and extend the LCS of the previous prefixes. When they don't match, the two last characters cannot both end the LCS, so we take the better result from ignoring one of them.
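
To make the three steps concrete, here is a minimal sketch that builds the table for the first example and prints it row by row. The helper name `lcs_table` is hypothetical and exists only for this illustration; it returns the whole table rather than just `dp[m][n]`.

```python
def lcs_table(text1: str, text2: str) -> list:
    """Return the full (m+1) x (n+1) DP table so it can be inspected."""
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


table = lcs_table("abcde", "ace")
for row in table:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 1]
# [0, 1, 1, 1]
# [0, 1, 2, 2]
# [0, 1, 2, 2]
# [0, 1, 2, 3]
```

Row `i` describes the prefix `text1[0:i]`; the bottom-right cell holds the answer, `3`.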

common_pitfalls:
- title: Confusing Subsequence with Substring
description: |
A **substring** must be contiguous (consecutive characters), while a **subsequence** allows gaps.

For `"abcde"` and `"ace"`:
- The longest common **substring** is `"a"` or `"c"` or `"e"` (length 1)
- The longest common **subsequence** is `"ace"` (length 3)

Using a substring algorithm (like checking all contiguous windows) will give the wrong answer. LCS requires dynamic programming because we need to track non-contiguous matches.
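
The difference can be checked directly. The sketch below puts a longest-common-substring routine next to the LCS recurrence; both function names are hypothetical, chosen only for this comparison. The only change between them is what happens on a mismatch: the substring version resets the run to zero, while the subsequence version carries the best earlier result forward.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run shared by a and b."""
    best = 0
    # run[j] = length of the common suffix of a[:i-1] and b[:j]
    run = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        new_run = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                new_run[j] = run[j - 1] + 1
                best = max(best, new_run[j])
            # On a mismatch the run stays 0: contiguity is broken.
        run = new_run
    return best


def longest_common_subsequence_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (gaps allowed)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # On a mismatch the best earlier result is kept.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


print(longest_common_substring_length("abcde", "ace"))    # 1
print(longest_common_subsequence_length("abcde", "ace"))  # 3
```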

wrong_approach: "Sliding window for contiguous matches"
correct_approach: "2D DP tracking all prefix combinations"

- title: The Brute Force Exponential Trap
description: |
A naive approach might try all possible subsequences of one string and check if each exists in the other.

For a string of length `n`, there are `2^n` possible subsequences. With constraints up to `1000` characters, that means up to `2^1000` candidates, which is computationally infeasible.

Plain recursion on the prefix recurrence has the same problem: without memoisation, it recomputes the same subproblems many times. The DP table ensures each subproblem is solved exactly once.
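
A rough way to see the blow-up is to count calls. The sketch below runs the same recurrence twice, once as plain recursion and once behind `functools.lru_cache`. The helper names and the call counter are illustrative assumptions, not part of any reference solution.

```python
from functools import lru_cache


def count_naive_calls(text1: str, text2: str) -> tuple:
    """Return (lcs_length, number_of_calls) for plain recursion."""
    calls = 0

    def lcs(i: int, j: int) -> int:
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0
        if text1[i - 1] == text2[j - 1]:
            return lcs(i - 1, j - 1) + 1
        return max(lcs(i - 1, j), lcs(i, j - 1))

    return lcs(len(text1), len(text2)), calls


def count_memo_calls(text1: str, text2: str) -> tuple:
    """Same recurrence, but each (i, j) state is evaluated at most once."""
    calls = 0

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        nonlocal calls
        calls += 1  # only runs on a cache miss
        if i == 0 or j == 0:
            return 0
        if text1[i - 1] == text2[j - 1]:
            return lcs(i - 1, j - 1) + 1
        return max(lcs(i - 1, j), lcs(i, j - 1))

    return lcs(len(text1), len(text2)), calls


# Two strings with no letters in common force the branching case everywhere.
naive_len, naive_calls = count_naive_calls("abcdefgh", "ijklmnop")
memo_len, memo_calls = count_memo_calls("abcdefgh", "ijklmnop")
print(naive_len, memo_len)
print(naive_calls, memo_calls)
```

For these two eight-character strings the memoised version evaluates at most `(8+1) * (8+1) = 81` states, while the plain version makes far more calls, and the gap widens quickly as the strings grow.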

wrong_approach: "Generate all subsequences and check membership"
correct_approach: "Bottom-up DP with O(m*n) time"

- title: Off-by-One Index Errors
description: |
The DP table has dimensions `(m+1) x (n+1)` to include the empty prefix base case.

When comparing characters, use `text1[i-1]` and `text2[j-1]` (not `text1[i]` and `text2[j]`) because `dp[i][j]` represents prefixes of length `i` and `j`, whose last characters sit at indices `i-1` and `j-1`.

A common mistake is using `text1[i]`, which either raises an index-out-of-bounds error or compares the wrong characters.
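
The index shift can be checked with a few lines. This is only an illustrative sketch of the prefix-to-index mapping, using a throwaway string:

```python
text1 = "abc"
m = len(text1)

# The prefix of length i ends at index i - 1.
for i in range(1, m + 1):
    prefix = text1[:i]
    assert prefix[-1] == text1[i - 1]

# Using text1[i] at i == m steps past the end of the string.
try:
    text1[m]
    out_of_range = False
except IndexError:
    out_of_range = True

print(out_of_range)  # True
```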

wrong_approach: "Compare text1[i] with text2[j] directly"
correct_approach: "Compare text1[i-1] with text2[j-1] when filling dp[i][j]"

key_takeaways:
- "**Classic DP problem**: LCS is a foundational dynamic programming problem that appears in many variations (edit distance, diff algorithms, DNA sequence alignment)"
- "**2D table pattern**: When comparing two sequences, a 2D DP table where `dp[i][j]` represents the answer for prefixes of length `i` and `j` is a common technique"
- "**Optimal substructure**: Match = extend previous result by 1; no match = take the best of two subproblems"
- "**Space optimisation possible**: Since each row only depends on the previous row, you can reduce space from O(m*n) to O(min(m,n)) using rolling arrays"

time_complexity: "O(m * n). We fill each of the `m * n` non-base cells of the DP table exactly once, where `m` and `n` are the lengths of the two strings."
space_complexity: "O(m * n). We use a 2D array of size `(m+1) x (n+1)` to store intermediate results. This can be optimised to O(min(m, n)) using a rolling array since we only need the previous row."

solutions:
- approach_name: 2D Dynamic Programming
is_optimal: true
code: |
def longest_common_subsequence(text1: str, text2: str) -> int:
    m, n = len(text1), len(text2)

    # Create DP table with extra row/col for empty string base case
    # dp[i][j] = LCS length of text1[0:i] and text2[0:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Fill the table row by row
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                # Characters match: extend LCS from diagonal
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # No match: take best of excluding one char from either string
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Answer is LCS of complete strings
    return dp[m][n]
explanation: |
**Time Complexity:** O(m * n). We iterate through every cell in the DP table once.

**Space Complexity:** O(m * n). We store the full 2D DP table.

This bottom-up approach builds the solution systematically. Each cell depends only on already-computed cells (top, left, and diagonal), so we fill row by row. The final cell contains the answer for the complete strings.
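
One thing the full table allows, and the problem itself does not ask for, is recovering an actual subsequence by walking back from `dp[m][n]` along the same top, left, and diagonal dependencies. The sketch below is an optional extension; `reconstruct_lcs` is a hypothetical helper name, and when several longest subsequences exist it returns just one of them.

```python
def reconstruct_lcs(text1: str, text2: str) -> str:
    """Return one longest common subsequence of text1 and text2."""
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right cell, collecting matched characters.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if text1[i - 1] == text2[j - 1]:
            chars.append(text1[i - 1])  # this character is part of the LCS
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # the value came from the cell above
        else:
            j -= 1  # the value came from the cell to the left
    return "".join(reversed(chars))


print(reconstruct_lcs("abcde", "ace"))  # ace
```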

- approach_name: Space-Optimised DP (Rolling Array)
is_optimal: true
code: |
def longest_common_subsequence(text1: str, text2: str) -> int:
    # Ensure text2 is the shorter string to minimise space
    if len(text1) < len(text2):
        text1, text2 = text2, text1

    m, n = len(text1), len(text2)

    # Only keep two rows: previous and current
    prev = [0] * (n + 1)
    curr = [0] * (n + 1)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                # Match: extend from diagonal (prev row, prev column)
                curr[j] = prev[j - 1] + 1
            else:
                # No match: best of top (prev[j]) or left (curr[j-1])
                curr[j] = max(prev[j], curr[j - 1])

        # Roll the arrays: current becomes previous for next iteration
        prev, curr = curr, prev

    # Answer is in prev (after the swap)
    return prev[n]
explanation: |
**Time Complexity:** O(m * n). Same iteration as the 2D approach.

**Space Complexity:** O(min(m, n)). Only two arrays of length `n+1` are used, where `n` is the length of the shorter string after the swap.

Since each row only depends on the immediately previous row, we can discard older rows. By swapping `prev` and `curr` after each row, we maintain a "rolling window" of just two rows. We also swap strings if needed to ensure we use the shorter length for our arrays.
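
The two rows can be squeezed into one if the diagonal value is saved before it is overwritten. This is a different variant from the two-row code above, sketched here under the same recurrence; the function name `lcs_single_row` and the `diag` and `above` variables are illustrative choices.

```python
def lcs_single_row(text1: str, text2: str) -> int:
    """LCS length using one row plus a saved diagonal value."""
    # Keep the row as short as possible by iterating over the longer string.
    if len(text1) < len(text2):
        text1, text2 = text2, text1

    n = len(text2)
    row = [0] * (n + 1)  # row[j] holds dp[i-1][j] until it is overwritten

    for ch in text1:
        diag = 0  # dp[i-1][j-1]; starts as dp[i-1][0], which is 0
        for j in range(1, n + 1):
            above = row[j]  # dp[i-1][j], saved before overwriting
            if ch == text2[j - 1]:
                row[j] = diag + 1
            else:
                row[j] = max(above, row[j - 1])
            diag = above  # becomes the diagonal for column j + 1
    return row[n]


print(lcs_single_row("abcde", "ace"))  # 3
```

The asymptotic space is still O(min(m, n)); the saving is a constant factor, at the cost of slightly less readable code.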

- approach_name: Recursive with Memoisation
is_optimal: false
code: |
def longest_common_subsequence(text1: str, text2: str) -> int:
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        # Base case: empty prefix
        if i == 0 or j == 0:
            return 0

        # Characters match: include in LCS
        if text1[i - 1] == text2[j - 1]:
            return lcs(i - 1, j - 1) + 1

        # No match: try excluding from each string
        return max(lcs(i - 1, j), lcs(i, j - 1))

    return lcs(len(text1), len(text2))
explanation: |
**Time Complexity:** O(m * n). Each unique `(i, j)` state is computed once due to memoisation.

**Space Complexity:** O(m * n). For the memoisation cache, plus O(m + n) for the recursion stack.

This top-down approach is more intuitive but uses more memory due to the recursion stack. The recursion depth can grow to about `m + n`, so inputs near the 1000-character limit can exceed Python's default recursion limit of 1000. It's useful for understanding the problem structure, but the iterative DP solution is generally preferred in interviews for its predictable space usage and no risk of stack overflow.
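
The "computed once" claim can be observed through the `cache_info()` method that `functools.lru_cache` attaches to the wrapped function. The sketch below is a small variation of the solution that also returns those statistics; the name `lcs_with_cache_stats` is hypothetical.

```python
from functools import lru_cache


def lcs_with_cache_stats(text1: str, text2: str):
    """Return (lcs_length, cache_info) for the memoised recursion."""

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 0
        if text1[i - 1] == text2[j - 1]:
            return lcs(i - 1, j - 1) + 1
        return max(lcs(i - 1, j), lcs(i, j - 1))

    length = lcs(len(text1), len(text2))
    return length, lcs.cache_info()


length, info = lcs_with_cache_stats("abcde", "ace")
print(length)                              # 3
print(info.misses <= (5 + 1) * (3 + 1))    # True
```

Each cache miss corresponds to one distinct `(i, j)` state, so the miss count can never exceed `(m+1) * (n+1)`.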