title: Regular Expression Matching
slug: regular-expression-matching
difficulty: hard
leetcode_id: 10
leetcode_url: https://leetcode.com/problems/regular-expression-matching/
categories:
  - strings
  - dynamic-programming
  - recursion
patterns:
  - slug: dynamic-programming
    is_optimal: true
function_signature: "def is_match(s: str, p: str) -> bool:"
test_cases:
  visible:
    - input: { s: "aa", p: "a" }
      expected: false
    - input: { s: "aa", p: "a*" }
      expected: true
    - input: { s: "ab", p: ".*" }
      expected: true
  hidden:
    - input: { s: "a", p: "a" }
      expected: true
    - input: { s: "a", p: "." }
      expected: true
    - input: { s: "", p: "a*" }
      expected: true
    - input: { s: "aab", p: "c*a*b" }
      expected: true
    - input: { s: "mississippi", p: "mis*is*p*." }
      expected: false
description: |
  Given an input string `s` and a pattern `p`, implement regular expression matching with support for `'.'` and `'*'` where:

  - `'.'` Matches any single character.
  - `'*'` Matches zero or more of the preceding element.

  The matching should cover the **entire** input string (not partial).
constraints: |
  - `1 <= s.length <= 20`
  - `1 <= p.length <= 20`
  - `s` contains only lowercase English letters.
  - `p` contains only lowercase English letters, `'.'`, and `'*'`.
  - It is guaranteed for each appearance of the character `'*'`, there will be a previous valid character to match.
examples:
  - input: 's = "aa", p = "a"'
    output: "false"
    explanation: '"a" does not match the entire string "aa".'
  - input: 's = "aa", p = "a*"'
    output: "true"
    explanation: '"*" means zero or more of the preceding element, "a". Therefore, by repeating "a" once, it becomes "aa".'
  - input: 's = "ab", p = ".*"'
    output: "true"
    explanation: '".*" means "zero or more (*) of any character (.)".'
explanation:
  intuition: |
    Think of this problem as a **decision tree** where at each step you must decide how to match the current characters.

    The key insight is that the `'*'` wildcard creates **branching possibilities**: when you see a pattern like `a*`, you can either:

    1. **Use zero occurrences** of `a` (skip `a*` entirely and move on in the pattern)
    2. **Use one or more occurrences** of `a` (if the current string character matches, consume it and keep the `a*` available for more matches)

    This branching nature makes the problem a natural fit for **recursion** with **memoisation** (or bottom-up dynamic programming). Without memoisation, you'd repeatedly solve the same subproblems, leading to exponential time complexity.

    The `'.'` wildcard is simpler: it just matches any single character, so treat it as a "universal match" when comparing characters.

    The mental model is: "At each position, what are my options, and does *any* combination of choices lead to a full match?"
  approach: |
    We solve this using **Dynamic Programming** with a 2D table:

    **Step 1: Define the DP state**

    - `dp[i][j]`: Whether `s[0:i]` matches `p[0:j]`
    - Our answer will be `dp[len(s)][len(p)]`

    **Step 2: Initialise the base cases**

    - `dp[0][0] = True`: Empty string matches empty pattern
    - `dp[0][j]`: Empty string matches `p[0:j]` only when that prefix consists entirely of `x*` pairs, each used zero times (for example `a*b*c*`); set `dp[0][j] = dp[0][j-2]` when `p[j-1]` is `'*'`
    - `dp[i][0] = False` for `i > 0`: Non-empty string cannot match empty pattern

    **Step 3: Fill the DP table**

    For each cell `dp[i][j]`, we consider the current pattern character `p[j-1]`:

    - **Case 1: `p[j-1]` is `'*'`** (star wildcard)
      - *Option A*: Use zero occurrences of the preceding element: `dp[i][j] = dp[i][j-2]`
      - *Option B*: Use one or more occurrences (only if `s[i-1]` matches `p[j-2]`, either the same letter or `'.'`): `dp[i][j] = dp[i-1][j]`
      - We take the OR of both options
    - **Case 2: `p[j-1]` is `'.'` or a letter**
      - Check if `s[i-1]` matches `p[j-1]` (either same letter or `'.'`)
      - If match: `dp[i][j] = dp[i-1][j-1]`
      - If no match: `dp[i][j] = False`

    **Step 4: Return the result**

    - Return `dp[len(s)][len(p)]`
  common_pitfalls:
    - title: Mishandling the Star Wildcard
      description: |
        The `'*'` doesn't stand alone; it modifies the **preceding character**. A common mistake is treating `*` as "match anything" like in shell globbing.

        In regex matching, `a*` means "zero or more `a`s", not "anything". The pattern `.*` means "zero or more of any character" because `.` matches any single character.

        Always process `*` together with its preceding character as a single unit.
      wrong_approach: "Treating * as an independent wildcard"
      correct_approach: "Process * with its preceding character as a unit"
    - title: Forgetting the Zero-Match Case
      description: |
        When you see `x*` in the pattern, you might only consider matching one or more `x`s. But `*` means **zero or more**, so you must also consider skipping `x*` entirely.

        For example, matching `s = "aab"` against `p = "c*a*b"`:

        - `c*` matches zero `c`s
        - `a*` matches two `a`s
        - `b` matches `b`

        Missing the zero-match case will cause incorrect results.
      wrong_approach: "Only considering one or more matches for x*"
      correct_approach: "Always consider both zero matches (skip) and one-or-more matches"
    - title: Incorrect Base Case for Empty String
      description: |
        An empty string `s` can still match certain patterns. For example:

        - `s = ""` matches `p = "a*"` (zero `a`s)
        - `s = ""` matches `p = "a*b*c*"` (zero of each)

        You must carefully initialise `dp[0][j]` by checking if `p[0:j]` can match an empty string. This happens when the pattern consists entirely of `x*` pairs.
      wrong_approach: "Assuming empty string only matches empty pattern"
      correct_approach: "Check if pattern can reduce to empty via x* zero-matches"
    - title: Off-by-One Errors in Indexing
      description: |
        The DP table has dimensions `(len(s)+1) x (len(p)+1)` to handle empty string/pattern cases. When accessing `s[i-1]` or `p[j-1]` from `dp[i][j]`, it's easy to make indexing mistakes.

        Be consistent: `dp[i][j]` represents matching `s[0:i]` with `p[0:j]`, so the "current" characters are `s[i-1]` and `p[j-1]`.
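
The empty-string base case and the indexing convention described in the last two pitfalls can be sketched in isolation. This is only an illustration: the helper name `empty_row` is hypothetical and not part of the reference solutions. It fills just row `dp[0]`, where index `j` stands for the prefix `p[0:j]` and the "current" pattern character is therefore `p[j - 1]`.

```python
def empty_row(p: str) -> list[bool]:
    """Return dp[0], where dp[0][j] says whether "" matches p[0:j]."""
    n = len(p)
    row = [False] * (n + 1)  # n + 1 entries: index j covers the prefix p[0:j]
    row[0] = True            # "" matches the empty pattern
    for j in range(2, n + 1):
        # p[j - 1] is the last character of the prefix p[0:j].
        if p[j - 1] == "*":
            # Use zero occurrences: drop the "x*" pair, reuse the shorter prefix.
            row[j] = row[j - 2]
    return row


print(empty_row("a*b*c*"))  # [True, False, True, False, True, False, True]
print(empty_row("a*b"))     # [True, False, True, False]
```

Only prefixes made up entirely of `x*` pairs end up `True`; a prefix that ends in a plain letter, such as `a*b`, cannot match the empty string.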
  key_takeaways:
    - "**DP on two sequences**: When matching/comparing two strings, think of a 2D DP table where `dp[i][j]` represents the answer for prefixes `s[0:i]` and `p[0:j]`"
    - "**Handle wildcards as units**: `*` modifies its preceding character; process them together"
    - "**Consider all branches**: The `*` creates branching (zero vs. one-or-more matches); use OR logic to combine possibilities"
    - "**Foundation for harder problems**: This pattern extends to wildcard matching, edit distance, and other two-string DP problems"
  time_complexity: "O(m * n), where m = len(s) and n = len(p). We fill a 2D table of size `(len(s)+1) x (len(p)+1)`, and each cell takes O(1) time."
  space_complexity: "O(m * n). We use a 2D DP table. This can be optimised to O(n) using rolling arrays, since each row depends only on itself and the previous row."
solutions:
  - approach_name: Dynamic Programming (Bottom-Up)
    is_optimal: true
    code: |
      def is_match(s: str, p: str) -> bool:
          m, n = len(s), len(p)

          # dp[i][j] = True if s[0:i] matches p[0:j]
          dp = [[False] * (n + 1) for _ in range(m + 1)]

          # Base case: empty string matches empty pattern
          dp[0][0] = True

          # Base case: empty string can match patterns like a*, a*b*, etc.
          for j in range(2, n + 1):
              # If current char is *, we can use zero occurrences of preceding char
              if p[j - 1] == '*':
                  dp[0][j] = dp[0][j - 2]

          # Fill the DP table
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  if p[j - 1] == '*':
                      # Option 1: use zero occurrences of preceding element
                      dp[i][j] = dp[i][j - 2]

                      # Option 2: use one or more (if current char matches preceding pattern char)
                      if p[j - 2] == '.' or p[j - 2] == s[i - 1]:
                          dp[i][j] = dp[i][j] or dp[i - 1][j]
                  elif p[j - 1] == '.' or p[j - 1] == s[i - 1]:
                      # Direct match: current chars match
                      dp[i][j] = dp[i - 1][j - 1]
                  # else: dp[i][j] remains False (no match)

          return dp[m][n]
    explanation: |
      **Time Complexity:** O(m * n). We fill each cell of the `(m+1) x (n+1)` table exactly once.

      **Space Complexity:** O(m * n). We store the entire DP table.

      This bottom-up approach builds the solution from smaller subproblems. The key transitions handle the `*` wildcard by considering both zero matches (skip) and one-or-more matches (consume and stay).
  - approach_name: Recursion with Memoisation
    is_optimal: true
    code: |
      def is_match(s: str, p: str) -> bool:
          memo = {}

          def dp(i: int, j: int) -> bool:
              """Check if s[i:] matches p[j:]"""
              if (i, j) in memo:
                  return memo[(i, j)]

              # Base case: pattern exhausted
              if j == len(p):
                  return i == len(s)

              # Check if first characters match
              first_match = i < len(s) and (p[j] == s[i] or p[j] == '.')

              # Handle star wildcard
              if j + 1 < len(p) and p[j + 1] == '*':
                  # Option 1: skip x* (zero occurrences)
                  # Option 2: use x* (if first char matches, consume it)
                  result = dp(i, j + 2) or (first_match and dp(i + 1, j))
              else:
                  # No star: must match current char and recurse
                  result = first_match and dp(i + 1, j + 1)

              memo[(i, j)] = result
              return result

          return dp(0, 0)
    explanation: |
      **Time Complexity:** O(m * n). Each unique `(i, j)` state is computed once and cached.

      **Space Complexity:** O(m * n) for the memoisation cache, plus O(m + n) recursion stack depth.

      This top-down approach is a direct translation of the recursive formulation. The memoisation dictionary prevents redundant computation of overlapping subproblems.
  - approach_name: Recursion (Brute Force)
    is_optimal: false
    code: |
      def is_match(s: str, p: str) -> bool:
          def dp(i: int, j: int) -> bool:
              """Check if s[i:] matches p[j:]"""
              # Base case: pattern exhausted
              if j == len(p):
                  return i == len(s)

              # Check if first characters match
              first_match = i < len(s) and (p[j] == s[i] or p[j] == '.')

              # Handle star wildcard
              if j + 1 < len(p) and p[j + 1] == '*':
                  # Option 1: skip x* (zero occurrences)
                  # Option 2: use x* (if first char matches, consume it)
                  return dp(i, j + 2) or (first_match and dp(i + 1, j))
              else:
                  # No star: must match current char and recurse
                  return first_match and dp(i + 1, j + 1)

          return dp(0, 0)
    explanation: |
      **Time Complexity:** O(2^(m+n)) in the worst case. Without memoisation, the same subproblems are recomputed exponentially many times.

      **Space Complexity:** O(m + n). Recursion stack depth.

      This naive recursive solution is correct but extremely slow. Patterns with many `*` wildcards cause exponential branching. Included to show why memoisation is essential.
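
The difference memoisation makes can be observed by counting recursive calls. The sketch below is illustrative only: `count_calls`, its `use_memo` flag, and the chosen input are hypothetical and not part of the solutions above. It runs the same recursion with and without a cache on a pattern with several `*` pairs that ultimately fails to match, so every branch has to be explored.

```python
def count_calls(s: str, p: str, use_memo: bool) -> tuple[bool, int]:
    """Return (match result, number of dp() invocations)."""
    memo: dict[tuple[int, int], bool] = {}
    calls = 0

    def dp(i: int, j: int) -> bool:
        nonlocal calls
        calls += 1
        if use_memo and (i, j) in memo:
            return memo[(i, j)]
        if j == len(p):
            # Pattern exhausted: succeed only if the string is exhausted too.
            result = i == len(s)
        else:
            first_match = i < len(s) and p[j] in (s[i], ".")
            if j + 1 < len(p) and p[j + 1] == "*":
                # Zero occurrences of x*, or consume one character and keep x*.
                result = dp(i, j + 2) or (first_match and dp(i + 1, j))
            else:
                result = first_match and dp(i + 1, j + 1)
        if use_memo:
            memo[(i, j)] = result
        return result

    return dp(0, 0), calls


# Assumed test input: twelve a's then "b", against six "a*" pairs then "c".
s, p = "a" * 12 + "b", "a*" * 6 + "c"
plain_result, plain_calls = count_calls(s, p, use_memo=False)
memo_result, memo_calls = count_calls(s, p, use_memo=True)
print(plain_result, memo_result)  # False False
print(plain_calls > memo_calls)   # True
```

Both runs return the same answer; only the amount of repeated work differs, and the gap widens as more `x*` pairs are added to the pattern.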