questions M-R

This commit is contained in:
2025-05-25 12:43:25 +01:00
parent 917c371529
commit 68699f35ec
62 changed files with 12841 additions and 0 deletions

@@ -0,0 +1,248 @@
title: Regular Expression Matching
slug: regular-expression-matching
difficulty: hard
leetcode_id: 10
leetcode_url: https://leetcode.com/problems/regular-expression-matching/
categories:
- strings
- dynamic-programming
- recursion
patterns:
- dynamic-programming
description: |
Given an input string `s` and a pattern `p`, implement regular expression matching with support for `'.'` and `'*'` where:
- `'.'` Matches any single character.
- `'*'` Matches zero or more of the preceding element.
The matching should cover the **entire** input string (not partial).
constraints: |
- `1 <= s.length <= 20`
- `1 <= p.length <= 20`
- `s` contains only lowercase English letters.
- `p` contains only lowercase English letters, `'.'`, and `'*'`.
- It is guaranteed that each appearance of the character `'*'` is preceded by a valid character to match.
examples:
- input: 's = "aa", p = "a"'
output: "false"
explanation: '"a" does not match the entire string "aa".'
- input: 's = "aa", p = "a*"'
output: "true"
explanation: '"*" means zero or more of the preceding element, "a". Therefore, by repeating "a" once, it becomes "aa".'
- input: 's = "ab", p = ".*"'
output: "true"
explanation: '".*" means "zero or more (*) of any character (.)".'
explanation:
intuition: |
Think of this problem as a **decision tree** where at each step you must decide how to match the current characters.
The key insight is that the `'*'` wildcard creates **branching possibilities**: when you see a pattern like `a*`, you can either:
1. **Use zero occurrences** of `a` (skip `a*` entirely and move on in the pattern)
2. **Use one or more occurrences** of `a` (if the current string character matches, consume it and keep the `a*` available for more matches)
This branching nature makes the problem a natural fit for **recursion** with **memoisation** (or bottom-up dynamic programming). Without memoisation, you'd repeatedly solve the same subproblems, leading to exponential time complexity.
The `'.'` wildcard is simpler: it just matches any single character, so treat it as a "universal match" when comparing characters.
The mental model is: "At each position, what are my options, and does *any* combination of choices lead to a full match?"
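To make the branching concrete, here is a minimal sketch that lists every way a single `x*` unit could consume a prefix of the string. The helper name `star_options` is made up for illustration and is not part of any solution below; each returned remainder corresponds to one branch of the decision tree.

```python
def star_options(s: str, x: str) -> list:
    """Return what is left of s after `x*` consumes 0, 1, 2, ... leading characters."""
    remainders = [s]  # zero occurrences: nothing consumed
    i = 0
    # Keep consuming while the next character matches x ('.' matches anything).
    while i < len(s) and (x == "." or s[i] == x):
        i += 1
        remainders.append(s[i:])
    return remainders


print(star_options("aab", "a"))  # ['aab', 'ab', 'b']
print(star_options("ab", "."))   # ['ab', 'b', '']
```

A full matcher has to check whether *any* of these remainders matches the rest of the pattern, which is where the overlapping subproblems come from.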
approach: |
We solve this using **Dynamic Programming** with a 2D table:
**Step 1: Define the DP state**
- `dp[i][j]`: Whether `s[0:i]` matches `p[0:j]`
- Our answer will be `dp[len(s)][len(p)]`
&nbsp;
**Step 2: Initialise the base cases**
- `dp[0][0] = True`: Empty string matches empty pattern
- `dp[0][j]`: Empty string can match patterns like `a*b*c*` where each `x*` uses zero occurrences
- `dp[i][0] = False` for `i > 0`: Non-empty string cannot match empty pattern
&nbsp;
**Step 3: Fill the DP table**
For each cell `dp[i][j]`, we consider the current pattern character `p[j-1]`:
- **Case 1: `p[j-1]` is `'*'`** (star wildcard)
  - *Option A*: Use zero occurrences of the preceding element: `dp[i][j] = dp[i][j-2]`
  - *Option B*: Use one or more occurrences (only if `s[i-1]` matches `p[j-2]`): `dp[i][j] = dp[i-1][j]`
  - We take the OR of both options
- **Case 2: `p[j-1]` is `'.'` or a letter**
  - Check if `s[i-1]` matches `p[j-1]` (either same letter or `'.'`)
  - If match: `dp[i][j] = dp[i-1][j-1]`
  - If no match: `dp[i][j] = False`
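As a worked trace of these transitions, the sketch below fills the table for `s = "aab"` and `p = "c*a*b"` and prints it, one row per prefix of `s`. The function name `fill_table` and the `T`/`.` rendering are illustrative choices, not part of the original write-up.

```python
def fill_table(s: str, p: str) -> list:
    """Build the full dp table using the transitions described above."""
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    for j in range(2, n + 1):  # base row: x* pairs may match nothing
        if p[j - 1] == "*":
            dp[0][j] = dp[0][j - 2]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == "*":
                zero = dp[i][j - 2]                                  # Option A
                more = p[j - 2] in (".", s[i - 1]) and dp[i - 1][j]  # Option B
                dp[i][j] = zero or more
            elif p[j - 1] in (".", s[i - 1]):
                dp[i][j] = dp[i - 1][j - 1]                          # Case 2
    return dp


for row in fill_table("aab", "c*a*b"):
    print("".join("T" if cell else "." for cell in row))
# T.T.T.
# ...TT.
# ....T.
# .....T
```

The bottom-right cell is `T`, so the full string matches the full pattern.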
&nbsp;
**Step 4: Return the result**
- Return `dp[len(s)][len(p)]`
common_pitfalls:
- title: Mishandling the Star Wildcard
description: |
The `'*'` doesn't stand alone; it modifies the **preceding character**. A common mistake is treating `*` as "match anything" like in shell globbing.
In regex matching, `a*` means "zero or more `a`s", not "anything". The pattern `.*` means "zero or more of any character" because `.` matches any single character.
Always process `*` together with its preceding character as a single unit.
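The difference is easy to see with Python's standard library, which offers both behaviours: `fnmatch` implements shell-style globbing, while `re` implements regex semantics. This is only a side-by-side illustration; the variable names are arbitrary.

```python
import fnmatch
import re

# Shell-style globbing: '*' on its own matches any run of characters.
glob_says = fnmatch.fnmatchcase("ab", "a*")

# Regex semantics: 'a*' means zero or more 'a', so it cannot cover the 'b'.
regex_says = re.fullmatch("a*", "ab") is not None

# '.*' is the regex spelling of "any run of characters".
dot_star_says = re.fullmatch(".*", "ab") is not None

print(glob_says, regex_says, dot_star_says)  # True False True
```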
wrong_approach: "Treating * as an independent wildcard"
correct_approach: "Process * with its preceding character as a unit"
- title: Forgetting the Zero-Match Case
description: |
When you see `x*` in the pattern, you might only consider matching one or more `x`s. But `*` means **zero or more**, so you must also consider skipping `x*` entirely.
For example, matching `s = "aab"` against `p = "c*a*b"`:
- `c*` matches zero `c`s
- `a*` matches two `a`s
- `b` matches `b`
Missing the zero-match case will cause incorrect results.
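The sketch below reproduces this bug on purpose. The `allow_zero` flag is a hypothetical switch added only for the demonstration: with it off, `x*` is wrongly treated as "one or more", and the example above fails.

```python
def matches(s: str, p: str, allow_zero: bool = True) -> bool:
    """Plain recursive matcher; allow_zero=False simulates the pitfall."""
    def go(i: int, j: int) -> bool:
        if j == len(p):
            return i == len(s)
        first = i < len(s) and p[j] in (s[i], ".")
        if j + 1 < len(p) and p[j + 1] == "*":
            # Consume one character, then either stay on x* or move past it.
            one_or_more = first and (go(i + 1, j) or go(i + 1, j + 2))
            # Skip x* entirely (the branch the buggy version forgets).
            zero = allow_zero and go(i, j + 2)
            return zero or one_or_more
        return first and go(i + 1, j + 1)

    return go(0, 0)


print(matches("aab", "c*a*b"))                    # True
print(matches("aab", "c*a*b", allow_zero=False))  # False: c* cannot be skipped
```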
wrong_approach: "Only considering one or more matches for x*"
correct_approach: "Always consider both zero matches (skip) and one-or-more matches"
- title: Incorrect Base Case for Empty String
description: |
An empty string `s` can still match certain patterns. For example:
- `s = ""` matches `p = "a*"` (zero `a`s)
- `s = ""` matches `p = "a*b*c*"` (zero of each)
You must carefully initialise `dp[0][j]` by checking if `p[0:j]` can match an empty string. This happens when the pattern consists entirely of `x*` pairs.
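For a valid pattern (every `*` preceded by a character, as the constraints guarantee), the "entirely `x*` pairs" condition can be checked directly. The helper `matches_empty` below is a hypothetical illustration of that condition; the DP solutions derive the same answer from `dp[0][j] = dp[0][j-2]`.

```python
def matches_empty(p: str) -> bool:
    """True if a valid pattern p can match the empty string."""
    # Every character must belong to an x* pair:
    # even length, with '*' at every odd index.
    return len(p) % 2 == 0 and all(c == "*" for c in p[1::2])


print(matches_empty("a*"))      # True
print(matches_empty("a*b*c*"))  # True
print(matches_empty("a*b"))     # False
```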
wrong_approach: "Assuming empty string only matches empty pattern"
correct_approach: "Check if pattern can reduce to empty via x* zero-matches"
- title: Off-by-One Errors in Indexing
description: |
The DP table has dimensions `(len(s)+1) x (len(p)+1)` to handle empty string/pattern cases. When accessing `s[i-1]` or `p[j-1]` from `dp[i][j]`, it's easy to make indexing mistakes.
Be consistent: `dp[i][j]` represents matching `s[0:i]` with `p[0:j]`, so the "current" characters are `s[i-1]` and `p[j-1]`.
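One way to keep the indices straight is to spell out what a cell refers to. The helper `cell_view` below is purely illustrative: it returns the two prefixes that `dp[i][j]` compares and their "current" (last) characters.

```python
def cell_view(s: str, p: str, i: int, j: int) -> tuple:
    """Describe dp[i][j]: the two prefixes and their last characters."""
    current_s = s[i - 1] if i > 0 else None  # no current char for an empty prefix
    current_p = p[j - 1] if j > 0 else None
    return s[:i], p[:j], current_s, current_p


print(cell_view("aab", "c*a*b", 2, 4))  # ('aa', 'c*a*', 'a', '*')
print(cell_view("aab", "c*a*b", 0, 0))  # ('', '', None, None)
```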
key_takeaways:
- "**DP on two sequences**: When matching/comparing two strings, think of a 2D DP table where `dp[i][j]` represents the answer for prefixes `s[0:i]` and `p[0:j]`"
- "**Handle wildcards as units**: `*` modifies its preceding character; process them together"
- "**Consider all branches**: The `*` creates branching (zero vs. one-or-more matches); use OR logic to combine possibilities"
- "**Foundation for harder problems**: This pattern extends to wildcard matching, edit distance, and other two-string DP problems"
time_complexity: "O(m * n), where m = len(s) and n = len(p). We fill a 2D table of size `(m+1) x (n+1)`, and each cell takes O(1) time."
space_complexity: "O(m * n) for the full 2D DP table. This can be optimised to O(n) with rolling arrays, since each row depends only on itself and the previous row."
solutions:
- approach_name: Dynamic Programming (Bottom-Up)
is_optimal: true
code: |
def is_match(s: str, p: str) -> bool:
    m, n = len(s), len(p)
    # dp[i][j] = True if s[0:i] matches p[0:j]
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    # Base case: empty string matches empty pattern
    dp[0][0] = True
    # Base case: empty string can match patterns like a*, a*b*, etc.
    for j in range(2, n + 1):
        # If current char is *, we can use zero occurrences of preceding char
        if p[j - 1] == '*':
            dp[0][j] = dp[0][j - 2]
    # Fill the DP table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                # Option 1: use zero occurrences of preceding element
                dp[i][j] = dp[i][j - 2]
                # Option 2: use one or more (if current char matches preceding pattern char)
                if p[j - 2] == '.' or p[j - 2] == s[i - 1]:
                    dp[i][j] = dp[i][j] or dp[i - 1][j]
            elif p[j - 1] == '.' or p[j - 1] == s[i - 1]:
                # Direct match: current chars match
                dp[i][j] = dp[i - 1][j - 1]
            # else: dp[i][j] remains False (no match)
    return dp[m][n]
explanation: |
**Time Complexity:** O(m * n) — We fill each cell of the `(m+1) x (n+1)` table exactly once.
**Space Complexity:** O(m * n) — We store the entire DP table.
This bottom-up approach builds the solution from smaller subproblems. The key transitions handle the `*` wildcard by considering both zero matches (skip) and one-or-more matches (consume and stay).
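The O(n) space optimisation can be sketched by keeping only two rows. This is one possible rolling-array variant under the same transitions, not a replacement for the solution above; the name `is_match_two_rows` is chosen here for illustration.

```python
def is_match_two_rows(s: str, p: str) -> bool:
    m, n = len(s), len(p)
    # prev plays the role of dp[i - 1]; it starts as the base row dp[0].
    prev = [False] * (n + 1)
    prev[0] = True
    for j in range(2, n + 1):
        if p[j - 1] == "*":
            prev[j] = prev[j - 2]
    for i in range(1, m + 1):
        # curr[0] stays False: a non-empty string never matches an empty pattern.
        curr = [False] * (n + 1)
        for j in range(1, n + 1):
            if p[j - 1] == "*":
                curr[j] = curr[j - 2]              # zero occurrences (same row)
                if p[j - 2] in (".", s[i - 1]):
                    curr[j] = curr[j] or prev[j]   # one or more (previous row)
            elif p[j - 1] in (".", s[i - 1]):
                curr[j] = prev[j - 1]              # direct match (previous row)
        prev = curr
    return prev[n]


print(is_match_two_rows("aa", "a"))   # False
print(is_match_two_rows("aa", "a*"))  # True
print(is_match_two_rows("ab", ".*"))  # True
```

The zero-occurrence branch reads from the row being built, which is why the current row has to be kept alongside the previous one.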
- approach_name: Recursion with Memoisation
is_optimal: true
code: |
def is_match(s: str, p: str) -> bool:
    memo = {}

    def dp(i: int, j: int) -> bool:
        """Check if s[i:] matches p[j:]"""
        if (i, j) in memo:
            return memo[(i, j)]
        # Base case: pattern exhausted
        if j == len(p):
            return i == len(s)
        # Check if first characters match
        first_match = i < len(s) and (p[j] == s[i] or p[j] == '.')
        # Handle star wildcard
        if j + 1 < len(p) and p[j + 1] == '*':
            # Option 1: skip x* (zero occurrences)
            # Option 2: use x* (if first char matches, consume it)
            result = dp(i, j + 2) or (first_match and dp(i + 1, j))
        else:
            # No star: must match current char and recurse
            result = first_match and dp(i + 1, j + 1)
        memo[(i, j)] = result
        return result

    return dp(0, 0)
explanation: |
**Time Complexity:** O(m * n) — Each unique `(i, j)` state is computed once and cached.
**Space Complexity:** O(m * n) — For the memoisation cache, plus O(m + n) recursion stack depth.
This top-down approach directly translates the recursive thinking. The memoisation dictionary prevents redundant computation of overlapping subproblems.
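If you prefer not to manage the dictionary by hand, the same idea can be expressed with `functools.lru_cache` from the standard library. This is an equivalent sketch under that assumption, with the function renamed to `is_match_cached` to keep it distinct from the solution above.

```python
from functools import lru_cache


def is_match_cached(s: str, p: str) -> bool:
    @lru_cache(maxsize=None)  # unbounded cache keyed on (i, j)
    def dp(i: int, j: int) -> bool:
        """Check if s[i:] matches p[j:]"""
        if j == len(p):
            return i == len(s)
        first_match = i < len(s) and p[j] in (s[i], ".")
        if j + 1 < len(p) and p[j + 1] == "*":
            # Skip x*, or consume one character and stay on x*.
            return dp(i, j + 2) or (first_match and dp(i + 1, j))
        return first_match and dp(i + 1, j + 1)

    return dp(0, 0)


print(is_match_cached("aab", "c*a*b"))  # True
```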
- approach_name: Recursion (Brute Force)
is_optimal: false
code: |
def is_match(s: str, p: str) -> bool:
    def dp(i: int, j: int) -> bool:
        """Check if s[i:] matches p[j:]"""
        # Base case: pattern exhausted
        if j == len(p):
            return i == len(s)
        # Check if first characters match
        first_match = i < len(s) and (p[j] == s[i] or p[j] == '.')
        # Handle star wildcard
        if j + 1 < len(p) and p[j + 1] == '*':
            # Option 1: skip x* (zero occurrences)
            # Option 2: use x* (if first char matches, consume it)
            return dp(i, j + 2) or (first_match and dp(i + 1, j))
        else:
            # No star: must match current char and recurse
            return first_match and dp(i + 1, j + 1)

    return dp(0, 0)
explanation: |
**Time Complexity:** O(2^(m+n)) in the worst case — Without memoisation, the same subproblems are recomputed exponentially many times.
**Space Complexity:** O(m + n) — Recursion stack depth.
This naive recursive solution is correct but extremely slow. Patterns with many `*` wildcards cause exponential branching. Included to show why memoisation is essential.
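A rough way to see the difference is to count recursive calls with and without a cache on an input that cannot match. The helper `count_calls` and the chosen input are illustrative assumptions; the exact counts depend on the input, so none are quoted here.

```python
def count_calls(s: str, p: str, use_memo: bool) -> tuple:
    """Return (match result, number of dp calls made)."""
    calls = 0
    memo = {}

    def dp(i: int, j: int) -> bool:
        nonlocal calls
        calls += 1
        if use_memo and (i, j) in memo:
            return memo[(i, j)]
        if j == len(p):
            result = i == len(s)
        else:
            first = i < len(s) and p[j] in (s[i], ".")
            if j + 1 < len(p) and p[j + 1] == "*":
                result = dp(i, j + 2) or (first and dp(i + 1, j))
            else:
                result = first and dp(i + 1, j + 1)
        if use_memo:
            memo[(i, j)] = result
        return result

    return dp(0, 0), calls


# Many stars and a guaranteed mismatch at the end force every branch to be explored.
s, p = "a" * 12 + "b", "a*" * 6 + "c"
naive_result, naive_calls = count_calls(s, p, use_memo=False)
memo_result, memo_calls = count_calls(s, p, use_memo=True)
print(naive_result, memo_result)
print(f"naive: {naive_calls} calls, memoised: {memo_calls} calls")
```

Both versions return the same answer; only the amount of repeated work differs.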