questions M-R

2025-05-25 12:43:25 +01:00
parent 917c371529
commit 68699f35ec
62 changed files with 12841 additions and 0 deletions
--- a/backend/data/questions/regular-expression-matching.yaml
+++ b/backend/data/questions/regular-expression-matching.yaml
@@ -0,0 +1,248 @@
+title: Regular Expression Matching
+slug: regular-expression-matching
+difficulty: hard
+leetcode_id: 10
+leetcode_url: https://leetcode.com/problems/regular-expression-matching/
+categories:
+  - strings
+  - dynamic-programming
+  - recursion
+patterns:
+  - dynamic-programming
+
+description: |
+  Given an input string `s` and a pattern `p`, implement regular expression matching with support for `'.'` and `'*'` where:
+
+  - `'.'` Matches any single character.
+  - `'*'` Matches zero or more of the preceding element.
+
+  The matching should cover the **entire** input string (not partial).
+
+constraints: |
+  - `1 <= s.length <= 20`
+  - `1 <= p.length <= 20`
+  - `s` contains only lowercase English letters.
+  - `p` contains only lowercase English letters, `'.'`, and `'*'`.
+  - It is guaranteed that for each appearance of the character `'*'`, there will be a previous valid character to match.
+
+examples:
+  - input: 's = "aa", p = "a"'
+    output: "false"
+    explanation: '"a" does not match the entire string "aa".'
+  - input: 's = "aa", p = "a*"'
+    output: "true"
+    explanation: '"*" means zero or more of the preceding element, "a". Therefore, by repeating "a" once, it becomes "aa".'
+  - input: 's = "ab", p = ".*"'
+    output: "true"
+    explanation: '".*" means "zero or more (*) of any character (.)".'
+
+explanation:
+  intuition: |
+    Think of this problem as a **decision tree** where at each step you must decide how to match the current characters.
+
+    The key insight is that the `'*'` wildcard creates **branching possibilities**: when you see a pattern like `a*`, you can either:
+    1. **Use zero occurrences** of `a` (skip `a*` entirely and move on in the pattern)
+    2. **Use one or more occurrences** of `a` (if the current string character matches, consume it and keep the `a*` available for more matches)
+
+    This branching nature makes the problem a natural fit for **recursion** with **memoisation** (or bottom-up dynamic programming). Without memoisation, you'd repeatedly solve the same subproblems, leading to exponential time complexity.
+
+    The `'.'` wildcard is simpler: it just matches any single character, so treat it as a "universal match" when comparing characters.
+
+    The mental model is: "At each position, what are my options, and does *any* combination of choices lead to a full match?"
+
+  approach: |
+    We solve this using **Dynamic Programming** with a 2D table:
+
+    **Step 1: Define the DP state**
+
+    - `dp[i][j]`: Whether `s[0:i]` matches `p[0:j]`
+    - Our answer will be `dp[len(s)][len(p)]`
+
+    &nbsp;
+
+    **Step 2: Initialise the base cases**
+
+    - `dp[0][0] = True`: Empty string matches empty pattern
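+    - `dp[0][j]`: Empty string matches only patterns built entirely from `x*` pairs (such as `a*b*c*`, where each `x*` uses zero occurrences), so `dp[0][j] = dp[0][j-2]` when `p[j-1]` is `'*'` and `False` otherwise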
+    - `dp[i][0] = False` for `i > 0`: Non-empty string cannot match empty pattern
+
+    &nbsp;
+
+    **Step 3: Fill the DP table**
+
+    For each cell `dp[i][j]`, we consider the current pattern character `p[j-1]`:
+
+    - **Case 1: `p[j-1]` is `'*'`** (star wildcard)
+      - *Option A*: Use zero occurrences of the preceding element, which contributes `dp[i][j-2]`
+      - *Option B*: Use one or more occurrences (only if `s[i-1]` matches `p[j-2]`, i.e. `p[j-2]` is `'.'` or equals `s[i-1]`), which contributes `dp[i-1][j]`
+      - `dp[i][j]` is the OR of both options
+
+    - **Case 2: `p[j-1]` is `'.'` or a letter**
+      - Check if `s[i-1]` matches `p[j-1]` (either same letter or `'.'`)
+      - If match: `dp[i][j] = dp[i-1][j-1]`
+      - If no match: `dp[i][j] = False`
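+
+    As a minimal sketch of the Step 3 rules, the value of a single cell can be computed by a small helper. The name `cell_value` is hypothetical, and the sketch assumes row `i-1` and the earlier columns of row `i` are already filled:
+
+    ```python
+    def cell_value(dp, s, p, i, j):
+        """Return dp[i][j] from already-filled neighbours (i >= 1, j >= 1)."""
+        if p[j - 1] == '*':
+            # Option A: zero occurrences of the preceding element
+            result = dp[i][j - 2]
+            # Option B: one or more, only if s[i-1] matches p[j-2]
+            if p[j - 2] in ('.', s[i - 1]):
+                result = result or dp[i - 1][j]
+            return result
+        # Case 2: '.' or a letter must match s[i-1]
+        if p[j - 1] in ('.', s[i - 1]):
+            return dp[i - 1][j - 1]
+        return False
+    ```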
+
+    &nbsp;
+
+    **Step 4: Return the result**
+
+    - Return `dp[len(s)][len(p)]`
+
+  common_pitfalls:
+    - title: Mishandling the Star Wildcard
+      description: |
+        The `'*'` doesn't stand alone; it modifies the **preceding character**. A common mistake is treating `*` as "match anything" like in shell globbing.
+
+        In regex matching, `a*` means "zero or more `a`s", not "anything". The pattern `.*` means "zero or more of any character" because `.` matches any single character.
+
+        Always process `*` together with its preceding character as a single unit.
+      wrong_approach: "Treating * as an independent wildcard"
+      correct_approach: "Process * with its preceding character as a unit"
+
+    - title: Forgetting the Zero-Match Case
+      description: |
+        When you see `x*` in the pattern, you might only consider matching one or more `x`s. But `*` means **zero or more**, so you must also consider skipping `x*` entirely.
+
+        For example, matching `s = "aab"` against `p = "c*a*b"`:
+        - `c*` matches zero `c`s
+        - `a*` matches two `a`s
+        - `b` matches `b`
+
+        Missing the zero-match case will cause incorrect results.
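+
+    For valid patterns over lowercase letters, `'.'` and `'*'` here should behave like the same operators in Python's `re` module, so `re.fullmatch` can serve as a quick cross-check of the example above. This is only a sanity-check sketch, not the intended solution, and the helper name `reference_match` is hypothetical:
+
+        ```python
+        import re
+
+        def reference_match(s: str, p: str) -> bool:
+            # fullmatch requires the whole string to match, as the problem does
+            return re.fullmatch(p, s) is not None
+
+        print(reference_match("aab", "c*a*b"))  # True: zero c's, two a's, one b
+        print(reference_match("aab", "ca*b"))   # False: without the star, one c is required
+        ```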
+      wrong_approach: "Only considering one or more matches for x*"
+      correct_approach: "Always consider both zero matches (skip) and one-or-more matches"
+
+    - title: Incorrect Base Case for Empty String
+      description: |
+        An empty string `s` can still match certain patterns. For example:
+        - `s = ""` matches `p = "a*"` (zero `a`s)
+        - `s = ""` matches `p = "a*b*c*"` (zero of each)
+
+        You must carefully initialise `dp[0][j]` by checking if `p[0:j]` can match an empty string. This happens when the pattern consists entirely of `x*` pairs.
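+
+        A minimal sketch of that initialisation, assuming a hypothetical helper `empty_row` that returns row 0 of the table:
+
+        ```python
+        def empty_row(p: str) -> list:
+            """row[j] is True when p[0:j] can match the empty string."""
+            row = [False] * (len(p) + 1)
+            row[0] = True  # empty pattern matches empty string
+            for j in range(2, len(p) + 1):
+                # a trailing x* can be dropped, so defer to the prefix without it
+                if p[j - 1] == '*':
+                    row[j] = row[j - 2]
+            return row
+
+        print(empty_row("a*b*"))  # [True, False, True, False, True]
+        print(empty_row("a*b"))   # [True, False, True, False]
+        ```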
+      wrong_approach: "Assuming empty string only matches empty pattern"
+      correct_approach: "Check if pattern can reduce to empty via x* zero-matches"
+
+    - title: Off-by-One Errors in Indexing
+      description: |
+        The DP table has dimensions `(len(s)+1) x (len(p)+1)` to handle empty string/pattern cases. When accessing `s[i-1]` or `p[j-1]` from `dp[i][j]`, it's easy to make indexing mistakes.
+
+        Be consistent: `dp[i][j]` represents matching `s[0:i]` with `p[0:j]`, so the "current" characters are `s[i-1]` and `p[j-1]`.
+
+  key_takeaways:
+    - "**DP on two sequences**: When matching/comparing two strings, think of a 2D DP table where `dp[i][j]` represents the answer for prefixes `s[0:i]` and `p[0:j]`"
+    - "**Handle wildcards as units**: `*` modifies its preceding character; process them together"
+    - "**Consider all branches**: The `*` creates branching (zero vs. one-or-more matches); use OR logic to combine possibilities"
+    - "**Foundation for harder problems**: This pattern extends to wildcard matching, edit distance, and other two-string DP problems"
+
+  time_complexity: "O(m * n), where m = len(s) and n = len(p). We fill a 2D table of size `(m+1) x (n+1)`, and each cell takes O(1) time."
+  space_complexity: "O(m * n). We use a 2D DP table. This can be optimised to O(n) using rolling arrays, since each cell depends only on the current row and the previous row."
+
+solutions:
+  - approach_name: Dynamic Programming (Bottom-Up)
+    is_optimal: true
+    code: |
+      def is_match(s: str, p: str) -> bool:
+          m, n = len(s), len(p)
+          # dp[i][j] = True if s[0:i] matches p[0:j]
+          dp = [[False] * (n + 1) for _ in range(m + 1)]
+
+          # Base case: empty string matches empty pattern
+          dp[0][0] = True
+
+          # Base case: empty string can match patterns like a*, a*b*, etc.
+          for j in range(2, n + 1):
+              # If current char is *, we can use zero occurrences of preceding char
+              if p[j - 1] == '*':
+                  dp[0][j] = dp[0][j - 2]
+
+          # Fill the DP table
+          for i in range(1, m + 1):
+              for j in range(1, n + 1):
+                  if p[j - 1] == '*':
+                      # Option 1: use zero occurrences of preceding element
+                      dp[i][j] = dp[i][j - 2]
+
+                      # Option 2: use one or more (if current char matches preceding pattern char)
+                      if p[j - 2] == '.' or p[j - 2] == s[i - 1]:
+                          dp[i][j] = dp[i][j] or dp[i - 1][j]
+
+                  elif p[j - 1] == '.' or p[j - 1] == s[i - 1]:
+                      # Direct match: current chars match
+                      dp[i][j] = dp[i - 1][j - 1]
+                  # else: dp[i][j] remains False (no match)
+
+          return dp[m][n]
+    explanation: |
+      **Time Complexity:** O(m * n) — We fill each cell of the `(m+1) x (n+1)` table exactly once.
+
+      **Space Complexity:** O(m * n) — We store the entire DP table.
+
+      This bottom-up approach builds the solution from smaller subproblems. The key transitions handle the `*` wildcard by considering both zero matches (skip) and one-or-more matches (consume and stay).
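+
+      Each cell reads only the current row and the previous row, so the table can be reduced to two rows of length `n + 1`. The following is a sketch under that observation rather than one of the listed solutions, and the name `is_match_two_rows` is hypothetical:
+
+      ```python
+      def is_match_two_rows(s: str, p: str) -> bool:
+          m, n = len(s), len(p)
+          # prev holds row i-1; it starts as row 0 (empty string vs each pattern prefix)
+          prev = [False] * (n + 1)
+          prev[0] = True
+          for j in range(2, n + 1):
+              if p[j - 1] == '*':
+                  prev[j] = prev[j - 2]
+
+          for i in range(1, m + 1):
+              # curr[0] stays False: a non-empty string never matches an empty pattern
+              curr = [False] * (n + 1)
+              for j in range(1, n + 1):
+                  if p[j - 1] == '*':
+                      # zero occurrences reads the current row,
+                      # one-or-more reads the previous row
+                      curr[j] = curr[j - 2] or (
+                          p[j - 2] in ('.', s[i - 1]) and prev[j]
+                      )
+                  elif p[j - 1] in ('.', s[i - 1]):
+                      curr[j] = prev[j - 1]
+              prev = curr
+          return prev[n]
+      ```
+
+      The time complexity stays O(m * n); only the extra space drops to O(n).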
+
+  - approach_name: Recursion with Memoisation
+    is_optimal: true
+    code: |
+      def is_match(s: str, p: str) -> bool:
+          memo = {}
+
+          def dp(i: int, j: int) -> bool:
+              """Check if s[i:] matches p[j:]"""
+              if (i, j) in memo:
+                  return memo[(i, j)]
+
+              # Base case: pattern exhausted
+              if j == len(p):
+                  return i == len(s)
+
+              # Check if first characters match
+              first_match = i < len(s) and (p[j] == s[i] or p[j] == '.')
+
+              # Handle star wildcard
+              if j + 1 < len(p) and p[j + 1] == '*':
+                  # Option 1: skip x* (zero occurrences)
+                  # Option 2: use x* (if first char matches, consume it)
+                  result = dp(i, j + 2) or (first_match and dp(i + 1, j))
+              else:
+                  # No star: must match current char and recurse
+                  result = first_match and dp(i + 1, j + 1)
+
+              memo[(i, j)] = result
+              return result
+
+          return dp(0, 0)
+    explanation: |
+      **Time Complexity:** O(m * n) — Each unique `(i, j)` state is computed once and cached.
+
+      **Space Complexity:** O(m * n) — For the memoisation cache, plus O(m + n) recursion stack depth.
+
+      This top-down approach directly translates the recursive thinking. The memoisation dictionary prevents redundant computation of overlapping subproblems.
+
+  - approach_name: Recursion (Brute Force)
+    is_optimal: false
+    code: |
+      def is_match(s: str, p: str) -> bool:
+          def dp(i: int, j: int) -> bool:
+              """Check if s[i:] matches p[j:]"""
+              # Base case: pattern exhausted
+              if j == len(p):
+                  return i == len(s)
+
+              # Check if first characters match
+              first_match = i < len(s) and (p[j] == s[i] or p[j] == '.')
+
+              # Handle star wildcard
+              if j + 1 < len(p) and p[j + 1] == '*':
+                  # Option 1: skip x* (zero occurrences)
+                  # Option 2: use x* (if first char matches, consume it)
+                  return dp(i, j + 2) or (first_match and dp(i + 1, j))
+              else:
+                  # No star: must match current char and recurse
+                  return first_match and dp(i + 1, j + 1)
+
+          return dp(0, 0)
+    explanation: |
+      **Time Complexity:** O(2^(m+n)) in the worst case — Without memoisation, the same subproblems are recomputed exponentially many times.
+
+      **Space Complexity:** O(m + n) — Recursion stack depth.
+
+      This naive recursive solution is correct but extremely slow. Patterns with many `*` wildcards cause exponential branching. Included to show why memoisation is essential.
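+
+      To see the effect of memoisation, one option is to count how often the recursive helper runs with and without the cache. This is an illustrative sketch: `count_calls` and its `use_memo` flag are hypothetical instrumentation, and the input is just one example with many `*` wildcards and no possible match:
+
+      ```python
+      def count_calls(s: str, p: str, use_memo: bool) -> int:
+          """Count how many times the recursive helper is entered."""
+          calls = 0
+          memo = {}
+
+          def dp(i: int, j: int) -> bool:
+              nonlocal calls
+              calls += 1
+              if use_memo and (i, j) in memo:
+                  return memo[(i, j)]
+              if j == len(p):
+                  result = i == len(s)
+              else:
+                  first_match = i < len(s) and p[j] in ('.', s[i])
+                  if j + 1 < len(p) and p[j + 1] == '*':
+                      result = dp(i, j + 2) or (first_match and dp(i + 1, j))
+                  else:
+                      result = first_match and dp(i + 1, j + 1)
+              memo[(i, j)] = result  # only read back when use_memo is True
+              return result
+
+          dp(0, 0)
+          return calls
+
+      s, p = "a" * 12 + "b", "a*" * 6 + "c"
+      print(count_calls(s, p, use_memo=False), count_calls(s, p, use_memo=True))
+      ```
+
+      On this input the uncached count should come out far larger than the cached one, because no branch ever succeeds and so the `or` never short-circuits.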