diff --git a/backend/data/questions/coin-change.yaml b/backend/data/questions/coin-change.yaml index cf79b4a..a917376 100644 --- a/backend/data/questions/coin-change.yaml +++ b/backend/data/questions/coin-change.yaml @@ -10,90 +10,138 @@ patterns: - dynamic-programming description: | - You are given an integer array `coins` representing coins of different denominations and an - integer `amount` representing a total amount of money. + You are given an integer array `coins` representing coins of different denominations and an integer `amount` representing a total amount of money. - Return the fewest number of coins that you need to make up that amount. If that amount of - money cannot be made up by any combination of the coins, return -1. + Return *the fewest number of coins that you need to make up that amount*. If that amount of money cannot be made up by any combination of the coins, return `-1`. - You may assume that you have an infinite number of each kind of coin. + You may assume that you have an **infinite number** of each kind of coin. constraints: | - - 1 <= coins.length <= 12 - - 1 <= coins[i] <= 2^31 - 1 - - 0 <= amount <= 10^4 + - `1 <= coins.length <= 12` + - `1 <= coins[i] <= 2^31 - 1` + - `0 <= amount <= 10^4` examples: - input: "coins = [1,2,5], amount = 11" output: "3" - explanation: "11 = 5 + 5 + 1" + explanation: "11 = 5 + 5 + 1, using 3 coins." - input: "coins = [2], amount = 3" output: "-1" - explanation: "Cannot make amount 3 with only coin 2." + explanation: "Cannot make amount 3 with only coin denomination 2." - input: "coins = [1], amount = 0" output: "0" - explanation: "Amount 0 needs 0 coins." + explanation: "Amount 0 requires 0 coins." explanation: - approach: | - 1. Create a DP array where dp[i] = min coins needed for amount i - 2. Initialize dp[0] = 0 (zero coins for zero amount) - 3. For each amount from 1 to target, try each coin - 4. If coin <= current amount, dp[i] = min(dp[i], dp[i - coin] + 1) - 5. 
Return dp[amount] if valid, else -1 - intuition: | - This is the classic unbounded knapsack problem. For each amount, we ask: "What's the - minimum coins needed if I use coin c as the last coin?" + Imagine you're at a vending machine that gives change. You want to give the customer exactly 11 cents using the fewest coins possible from denominations [1, 2, 5]. How do you think about this? - If we use coin c last, we need 1 + dp[amount - c] coins. We try all possible "last coins" - and take the minimum. This optimal substructure makes it perfect for DP. + Think of it like this: if I knew the minimum coins needed for amounts 0 through 10, I could figure out amount 11 by asking: "What if I use a 1-cent coin last? A 2-cent? A 5-cent?" + + - Using 1-cent last: I need `coins(10) + 1` + - Using 2-cent last: I need `coins(9) + 1` + - Using 5-cent last: I need `coins(6) + 1` + + The answer is the **minimum** of these options. This is the **optimal substructure** that makes dynamic programming work. + + This is the classic **unbounded knapsack** pattern — "unbounded" because we can use each coin infinitely many times. 
+ + approach: | + We solve this using **Bottom-Up Dynamic Programming**: + + **Step 1: Create and initialise the DP array** + + - Create `dp` of size `amount + 1`, where `dp[i]` = minimum coins for amount `i` + - Initialise all values to infinity (or `amount + 1`) — meaning "impossible so far" + - Set `dp[0] = 0` as the base case: zero coins needed for amount zero + +   + + **Step 2: Build up solutions for each amount** + + - For each amount `i` from 1 to `amount`: + - Try each coin denomination + - If `coin <= i` (coin fits), check if using this coin improves our answer: + - `dp[i] = min(dp[i], dp[i - coin] + 1)` + - The `+1` accounts for using this coin; `dp[i - coin]` is the subproblem + +   + + **Step 3: Return the answer** + + - If `dp[amount]` is still infinity, return `-1` (impossible) + - Otherwise, return `dp[amount]` + +   + + This builds solutions from smaller amounts to larger ones, ensuring we always have the subproblem solutions ready when we need them. common_pitfalls: - - title: Wrong initialization + - title: Wrong Initialisation description: | - Initialize dp array to infinity (or amount + 1), not 0. - dp[0] = 0 is the only base case. - wrong_approach: "Initializing all dp values to 0" + Initialising the DP array to `0` instead of infinity is a critical error. If `dp[5] = 0`, we'd incorrectly think amount 5 needs 0 coins! + + We use `float('inf')` (or `amount + 1` as a practical upper bound) to represent "not yet achievable". Only `dp[0] = 0` should start as zero. + wrong_approach: "dp = [0] * (amount + 1)" correct_approach: "dp = [float('inf')] * (amount + 1); dp[0] = 0" - - title: Not checking if subproblem is solvable + - title: Greedy Doesn't Work Here description: | - Before using dp[i - coin], ensure i >= coin and dp[i - coin] is valid. + It's tempting to always use the largest coin first (greedy). But this fails! 
- - title: Returning wrong value for impossible case + **Counterexample**: `coins = [1, 3, 4]`, `amount = 6` + - Greedy: `4 + 1 + 1 = 6` (3 coins) + - Optimal: `3 + 3 = 6` (2 coins) + + The greedy choice can block us from finding the actual minimum. + wrong_approach: "Always pick the largest coin that fits" + correct_approach: "Try all coins and take the minimum via DP" + + - title: Not Checking for Impossible Cases description: | - If dp[amount] is still infinity, return -1, not infinity. + If `dp[amount]` is still infinity after filling the table, no valid combination exists. Return `-1`, not infinity or some arbitrary value. + + This happens when no combination of the available coins sums exactly to the amount (e.g., `coins = [2]`, `amount = 3`). + wrong_approach: "return dp[amount]" + correct_approach: "return dp[amount] if dp[amount] != float('inf') else -1" key_takeaways: - - Classic unbounded knapsack problem - - Bottom-up DP builds solution from smaller amounts - - Try each coin as the "last coin" for each amount - - Greedy doesn't work here (counterexample: coins=[1,3,4], amount=6) + - "**Unbounded knapsack pattern**: Each item (coin) can be used unlimited times — different from 0/1 knapsack" + - "**Greedy fails for coin change**: Classic counterexample shows why DP is necessary" + - "**Bottom-up builds confidence**: Solving smaller amounts first guarantees subproblems are ready" + - "**Foundation for variations**: This extends to counting combinations, finding exact change with fewest bills, etc." - time_complexity: "O(amount × coins)" - space_complexity: "O(amount)" - complexity_explanation: | - Time: For each amount (1 to target), we try each coin. - Space: DP array of size amount + 1. + time_complexity: "O(amount × n). For each amount from 1 to target, we try each of the n coins." + space_complexity: "O(amount). The DP array stores one value per amount from 0 to target."
solutions: - - approach_name: Bottom-Up DP (Optimal) + - approach_name: Bottom-Up DP is_optimal: true code: | def coin_change(coins: list[int], amount: int) -> int: + # dp[i] = minimum coins needed for amount i + # Initialise to "impossible" (any value > amount works) dp = [float('inf')] * (amount + 1) + + # Base case: 0 coins needed for amount 0 dp[0] = 0 + # Build solutions from amount 1 to target for i in range(1, amount + 1): + # Try each coin as the "last coin" used for coin in coins: + # Can only use this coin if it fits and subproblem is solvable if coin <= i and dp[i - coin] != float('inf'): dp[i] = min(dp[i], dp[i - coin] + 1) + # Return result, or -1 if impossible return dp[amount] if dp[amount] != float('inf') else -1 explanation: | - Build up from amount 0. For each amount, try using each coin as the last coin. - Take the minimum of all valid options. + **Time Complexity:** O(amount × n) — For each of `amount` values, we check n coins. + + **Space Complexity:** O(amount) — DP array of size `amount + 1`. + + We build up from amount 0. For each amount, we try using each coin as the last coin and take the minimum. The key insight: if we know the minimum coins for smaller amounts, we can compute larger amounts by adding one coin at a time. - approach_name: BFS (Alternative) is_optimal: false @@ -104,6 +152,7 @@ solutions: if amount == 0: return 0 + # BFS: each "level" represents using one more coin visited = {0} queue = deque([(0, 0)]) # (current_sum, num_coins) @@ -112,14 +161,21 @@ solutions: for coin in coins: next_sum = current + coin + + # Found exact amount if next_sum == amount: return num_coins + 1 + + # Valid state we haven't seen if next_sum < amount and next_sum not in visited: visited.add(next_sum) queue.append((next_sum, num_coins + 1)) + # No valid combination found return -1 explanation: | - BFS finds shortest path in unweighted graph. - First time we reach 'amount' is the minimum coins. - Less space-efficient than DP for this problem. 
+ **Time Complexity:** O(amount × n) — Similar to DP in the worst case. + + **Space Complexity:** O(amount) — Visited set can hold up to `amount` states. + + BFS naturally finds the shortest path (minimum coins) in an unweighted graph. Each state is a sum, and edges represent adding a coin. The first time we reach `amount`, we've found the minimum. Generally less efficient than DP for this problem due to queue overhead. diff --git a/backend/data/questions/container-with-most-water.yaml b/backend/data/questions/container-with-most-water.yaml index 724e32c..1d162cb 100644 --- a/backend/data/questions/container-with-most-water.yaml +++ b/backend/data/questions/container-with-most-water.yaml @@ -11,83 +11,126 @@ patterns: - greedy description: | - You are given an integer array `height` of length n. There are n vertical lines drawn such - that the two endpoints of the ith line are (i, 0) and (i, height[i]). + You are given an integer array `height` of length `n`. There are `n` vertical lines drawn such that the two endpoints of the ith line are `(i, 0)` and `(i, height[i])`. - Find two lines that together with the x-axis form a container, such that the container - contains the most water. + Find two lines that together with the x-axis form a container, such that the container contains the most water. - Return the maximum amount of water a container can store. + Return *the maximum amount of water a container can store*. + + **Note:** You may not slant the container. constraints: | - - n == height.length - - 2 <= n <= 10^5 - - 0 <= height[i] <= 10^4 + - `n == height.length` + - `2 <= n <= 10^5` + - `0 <= height[i] <= 10^4` examples: - input: "height = [1,8,6,2,5,4,8,3,7]" output: "49" - explanation: "Lines at index 1 (height 8) and 8 (height 7) form container with area 7 * 7 = 49." + explanation: "Lines at indices 1 (height 8) and 8 (height 7) form a container with area = min(8, 7) × (8 - 1) = 7 × 7 = 49." 
- input: "height = [1,1]" output: "1" - explanation: "Only container possible has area 1 * 1 = 1." + explanation: "The only possible container has area = min(1, 1) × 1 = 1." explanation: - approach: | - 1. Start with two pointers at the far left and far right - 2. Calculate the area formed by the two lines - 3. Move the pointer pointing to the shorter line inward - 4. Track maximum area seen - 5. Continue until pointers meet - intuition: | - Area = width × height. The height is limited by the shorter line. + Visualise the problem: you have vertical lines of varying heights, and you want to trap the most water between two of them. The water level is limited by the **shorter** line (water would spill over), and the width is the distance between the lines. - Starting with maximum width (endpoints), we can only improve by finding taller lines. - If we move the taller pointer, width decreases and height can't increase beyond the - shorter line — so area can only decrease or stay the same. + Think of it like this: `Area = width × min(left_height, right_height)` - Moving the shorter pointer gives us a chance to find a taller line that could - increase the height enough to compensate for the reduced width. + If we start with pointers at both ends, we have **maximum width**. The only way to potentially increase area is to find taller lines. But here's the key insight: + + If `height[left] < height[right]`, the current area is limited by `height[left]`. Moving `right` inward would: + - **Decrease** width (always) + - Keep height limited by `height[left]` (at best) or make it smaller + + So moving the **taller** pointer can never improve the area! We should always move the **shorter** pointer, hoping to find a taller line that compensates for the reduced width. + + This greedy choice is provably optimal because we're eliminating pairs that cannot possibly be better than what we've already found. 
+ + approach: | + We solve this using **Two Pointers with Greedy Movement**: + + **Step 1: Initialise pointers and tracking variable** + + - `left = 0` (start of array) + - `right = len(height) - 1` (end of array) + - `max_water = 0` to track the best area found + +   + + **Step 2: Calculate area and move pointers** + + - While `left < right`: + - Calculate current area: `width × min(height[left], height[right])` + - Update `max_water` if current area is larger + - Move the pointer pointing to the **shorter** line inward: + - If `height[left] < height[right]`: increment `left` + - Otherwise: decrement `right` + +   + + **Step 3: Return the maximum** + + - After pointers meet, return `max_water` + +   + + This works because moving the shorter pointer gives us the only chance to find a better solution. Moving the taller pointer cannot improve the area — it's mathematically impossible. common_pitfalls: - - title: Moving the wrong pointer + - title: Moving the Wrong Pointer description: | - Always move the pointer pointing to the shorter line. Moving the taller one - cannot possibly increase the area since height is constrained by the shorter. - wrong_approach: "Moving left pointer always or randomly" - correct_approach: "Move the pointer with smaller height[i]" + The algorithm only works if you move the **shorter** pointer. Moving the taller one (or moving randomly) breaks the correctness proof. - - title: Using nested loops + Why? If `height[left] < height[right]`, the area is constrained by `height[left]`. Any container with `left` as one boundary cannot have more water than we've already calculated (width would be smaller, height at most `height[left]`). + + Moving `left` eliminates these inferior possibilities and might find a taller left boundary. 
+ wrong_approach: "Always move left pointer, or move randomly" + correct_approach: "Always move the pointer with the smaller height" + + - title: Using Nested Loops (Brute Force) description: | - O(n²) brute force is unnecessary. Two pointers achieve O(n) by eliminating - suboptimal pairs without checking them. + Checking all O(n²) pairs works but is far too slow for `n = 10^5` (roughly 5 billion pairs). + + The two-pointer approach achieves O(n) by intelligently eliminating pairs that cannot be optimal. Each pointer moves at most n times total. + wrong_approach: "for i in range(n): for j in range(i+1, n): ..." + correct_approach: "Two pointers moving inward based on height comparison" + + - title: Forgetting to Update Maximum + description: | + Calculate the area **before** moving the pointer, and always update `max_water`. It's easy to forget one of these steps and miss the optimal answer. + wrong_approach: "Moving pointer before calculating area" + correct_approach: "Calculate area first, update max, then move pointer" key_takeaways: - - Two pointers from opposite ends is powerful for optimization - - Moving the limiting factor gives the best chance of improvement - - Greedy choice (move shorter) is provably optimal here - - Width × min(heights) is the key formula + - "**Two pointers from opposite ends**: A powerful pattern for optimisation on sorted/indexed data" + - "**Move the limiting factor**: When one element constrains the result, changing it gives the best chance of improvement" + - "**Greedy with proof**: This isn't just heuristic — moving the shorter pointer is provably optimal" + - "**O(n) vs O(n²)**: Two pointers can eliminate the need for nested loops when the problem has the right structure" - time_complexity: "O(n)" - space_complexity: "O(1)" - complexity_explanation: | - Time: Each pointer moves at most n times total. - Space: Only a few variables for pointers and max area. + time_complexity: "O(n).
Each pointer moves at most n times, and we do O(1) work per step." + space_complexity: "O(1). Only a few variables for pointers and the maximum area." solutions: - - approach_name: Two Pointers (Optimal) + - approach_name: Two Pointers is_optimal: true code: | def max_area(height: list[int]) -> int: - left, right = 0, len(height) - 1 + left = 0 + right = len(height) - 1 max_water = 0 while left < right: + # Calculate width and height of current container width = right - left h = min(height[left], height[right]) + + # Update maximum if this container is larger max_water = max(max_water, width * h) + # Move the pointer pointing to the shorter line + # (moving the taller one can't improve the result) if height[left] < height[right]: left += 1 else: @@ -95,5 +138,8 @@ solutions: return max_water explanation: | - Start from both ends and move the shorter line inward. - Track maximum area found during traversal. + **Time Complexity:** O(n) — Each pointer moves at most n positions total. + + **Space Complexity:** O(1) — Only constant extra space used. + + We start with maximum width and greedily try to find taller lines. By always moving the shorter pointer, we ensure we don't miss any potentially better containers. The proof: any container involving the shorter line at its current position has already been considered (or would have less area due to reduced width). diff --git a/backend/data/questions/longest-substring-without-repeating.yaml b/backend/data/questions/longest-substring-without-repeating.yaml index da6bf66..1b52d46 100644 --- a/backend/data/questions/longest-substring-without-repeating.yaml +++ b/backend/data/questions/longest-substring-without-repeating.yaml @@ -10,11 +10,11 @@ patterns: - sliding-window description: | - Given a string `s`, find the length of the longest substring without repeating characters. + Given a string `s`, find the length of the **longest substring** without repeating characters. 
constraints: | - - 0 <= s.length <= 5 * 10^4 - - s consists of English letters, digits, symbols and spaces + - `0 <= s.length <= 5 × 10^4` + - `s` consists of English letters, digits, symbols and spaces examples: - input: 's = "abcabcbb"' @@ -25,90 +25,146 @@ examples: explanation: "The answer is 'b', with length 1." - input: 's = "pwwkew"' output: "3" - explanation: "The answer is 'wke', with length 3." + explanation: "The answer is 'wke', with length 3. Note that 'pwke' is a subsequence, not a substring." explanation: - approach: | - 1. Use a sliding window with left and right pointers - 2. Maintain a set of characters in the current window - 3. Expand right pointer, adding characters to the set - 4. When a duplicate is found, shrink from left until duplicate is removed - 5. Track maximum window size throughout - intuition: | - We're looking for the longest contiguous substring where all characters are unique. - A sliding window naturally represents a substring. + Imagine a window sliding across the string. The window represents our current substring candidate. We want to expand this window as much as possible while keeping all characters inside it unique. - When we encounter a duplicate, the current window is invalid. Instead of restarting - from scratch, we shrink the window from the left until the duplicate is removed. - This way, we never revisit characters unnecessarily. + Think of it like this: you're scanning through a document with a highlighter. You want to find the longest stretch you can highlight where no letter appears twice. When you hit a repeat, you need to move the start of your highlight forward until the duplicate is gone. + + The key insight is that we don't need to restart from scratch when we find a duplicate. If we've seen 'a' before at position 3, and we see 'a' again at position 7, we just need to move our window's left edge past position 3. Everything between positions 4 and 7 might still be valid! 
+ + This is the **sliding window** pattern: expand the right edge to explore, contract the left edge to maintain validity. + + approach: | + We solve this using a **Sliding Window with a Set**: + + **Step 1: Initialise the window and tracking** + + - `left = 0`: Left edge of our window + - `char_set = set()`: Characters currently in our window + - `max_length = 0`: Best length found so far + - The right edge is controlled by our loop iteration + +   + + **Step 2: Expand the window (right pointer)** + + - For each character at position `right`: + - If `s[right]` is already in `char_set`, we have a duplicate + - Before adding it, we must shrink from the left until the duplicate is removed + +   + + **Step 3: Shrink the window (left pointer)** + + - While `s[right]` is in `char_set`: + - Remove `s[left]` from the set + - Increment `left` + - This "slides" the window past the previous occurrence + +   + + **Step 4: Add current character and update maximum** + + - Add `s[right]` to `char_set` + - Update `max_length = max(max_length, right - left + 1)` + +   + + The window always contains unique characters, and we track the maximum size it achieves. common_pitfalls: - - title: Resetting the window incorrectly + - title: Resetting the Window Completely on Duplicate description: | - When finding a duplicate, don't start over from the next character. - Instead, shrink from the left until the duplicate is removed. + A common mistake is to set `left = right` when finding a duplicate, effectively restarting the search. This loses valid characters that could still be part of a longer substring. + + For example, in `"abcdb"`, when we hit the second `'b'`, we should move `left` from 0 to 2 (just past the first `'b'`), keeping `"cd"` in our window. Resetting to `left = right` would discard `"cd"` unnecessarily. 
wrong_approach: "left = right when duplicate found" - correct_approach: "Increment left until duplicate is out of window" + correct_approach: "Increment left until duplicate is removed from window" - - title: Not handling empty string + - title: Off-by-One in Length Calculation description: | - An empty string should return 0. Make sure the algorithm handles this. + The length of a window from index `left` to `right` (inclusive) is `right - left + 1`, not `right - left`. - - title: Off-by-one in length calculation + For `left = 2, right = 5`, the substring has 4 characters (indices 2, 3, 4, 5), not 3. + wrong_approach: "max_length = max(max_length, right - left)" + correct_approach: "max_length = max(max_length, right - left + 1)" + + - title: Not Handling Empty String description: | - Window length is right - left + 1, or just track max before removing. + An empty string `""` should return `0`. The algorithm handles this naturally (the loop never executes), but it's worth verifying. + + Similarly, a single character `"a"` should return `1`. 
+ wrong_approach: "Assuming string has at least one character" + correct_approach: "Algorithm works for empty strings — returns 0" key_takeaways: - - Sliding window is ideal for substring problems with constraints - - Use a set or map to track elements in current window - - Shrinking from one end maintains the contiguous property - - This pattern appears in many "longest/shortest with constraint" problems + - "**Sliding window for substrings**: When looking for contiguous sequences with constraints, sliding window is often the answer" + - "**Expand and contract**: Right pointer explores, left pointer maintains validity" + - "**Set for uniqueness checking**: O(1) membership testing makes the algorithm efficient" + - "**Optimisation with hash map**: Store the last index of each character to jump `left` directly instead of incrementing" - time_complexity: "O(n)" - space_complexity: "O(min(m, n))" - complexity_explanation: | - Time: Each character is visited at most twice (once by right, once by left). - Space: The set holds at most min(n, m) characters where m is the charset size. + time_complexity: "O(n). Each character is visited at most twice — once by the right pointer, once by the left pointer." + space_complexity: "O(min(m, n)). The set holds at most min(n, m) characters, where m is the character set size (e.g., 26 for lowercase letters, 128 for ASCII)." 
solutions: - - approach_name: Sliding Window with Set (Optimal) + - approach_name: Sliding Window with Set is_optimal: true code: | def length_of_longest_substring(s: str) -> int: + # Set to track characters in current window char_set = set() left = 0 max_length = 0 + # Right pointer expands the window for right in range(len(s)): + # Shrink window until duplicate is removed while s[right] in char_set: char_set.remove(s[left]) left += 1 + # Add current character to window char_set.add(s[right]) + + # Update maximum length max_length = max(max_length, right - left + 1) return max_length explanation: | - Expand window by moving right pointer. - When duplicate found, shrink from left until window is valid again. + **Time Complexity:** O(n) — Each character added and removed from set at most once. - - approach_name: Optimized with Hash Map + **Space Complexity:** O(min(m, n)) — Set holds unique characters in window. + + We maintain a sliding window containing only unique characters. When we encounter a duplicate, we shrink from the left until it's removed. The window size at each step represents a valid substring length. + + - approach_name: Optimised with Hash Map is_optimal: true code: | def length_of_longest_substring(s: str) -> int: + # Map character to its most recent index char_index = {} left = 0 max_length = 0 for right, char in enumerate(s): + # If char seen before AND within current window if char in char_index and char_index[char] >= left: + # Jump left pointer past the previous occurrence left = char_index[char] + 1 + # Update character's latest index char_index[char] = right + + # Update maximum length max_length = max(max_length, right - left + 1) return max_length explanation: | - Store the last index of each character. - Jump left pointer directly past the duplicate instead of shrinking one by one. + **Time Complexity:** O(n) — Single pass through the string. + + **Space Complexity:** O(min(m, n)) — Hash map stores character indices. 
+ + Instead of shrinking the window one character at a time, we store each character's last index. When we find a duplicate, we jump `left` directly past the previous occurrence. The condition `char_index[char] >= left` ensures we only consider duplicates within the current window (old occurrences outside the window are ignored). diff --git a/backend/data/questions/median-of-two-sorted-arrays.yaml b/backend/data/questions/median-of-two-sorted-arrays.yaml index c3813f2..c82d6fd 100644 --- a/backend/data/questions/median-of-two-sorted-arrays.yaml +++ b/backend/data/questions/median-of-two-sorted-arrays.yaml @@ -10,18 +10,17 @@ patterns: - binary-search description: | - Given two sorted arrays `nums1` and `nums2` of size m and n respectively, return the median - of the two sorted arrays. + Given two sorted arrays `nums1` and `nums2` of size `m` and `n` respectively, return **the median** of the two sorted arrays. - The overall run time complexity should be O(log(m+n)). + The overall run time complexity should be **O(log(m+n))**. constraints: | - - nums1.length == m - - nums2.length == n - - 0 <= m <= 1000 - - 0 <= n <= 1000 - - 1 <= m + n <= 2000 - - -10^6 <= nums1[i], nums2[i] <= 10^6 + - `nums1.length == m` + - `nums2.length == n` + - `0 <= m <= 1000` + - `0 <= n <= 1000` + - `1 <= m + n <= 2000` + - `-10^6 <= nums1[i], nums2[i] <= 10^6` examples: - input: "nums1 = [1,3], nums2 = [2]" @@ -32,91 +31,138 @@ examples: explanation: "Merged array is [1,2,3,4]. Median is (2+3)/2 = 2.5." explanation: - approach: | - 1. Binary search on the smaller array for partition point - 2. Partition both arrays such that left half has (m+n+1)//2 elements - 3. Check if partition is valid: max(left) <= min(right) - 4. If valid, compute median from boundary elements - 5. Adjust binary search bounds based on comparison - intuition: | - The median divides the combined array into two halves of equal size. We don't need to - actually merge; we just need to find the correct partition. 
+ The median divides a sorted array into two equal halves. For two sorted arrays, we need to find a **partition** that puts exactly half the total elements on the left and half on the right. - If we choose i elements from nums1 for the left half, we need (m+n+1)//2 - i from nums2. - Binary search on i (0 to m) to find where nums1[i-1] <= nums2[j] and nums2[j-1] <= nums1[i]. + Think of it like this: imagine cutting both arrays with vertical lines. If we take `i` elements from `nums1` and `j` elements from `nums2` for the "left half", we need `i + j = (m + n + 1) // 2`. For this partition to be valid: + - Everything in the left half ≤ Everything in the right half - This is O(log min(m,n)) since we binary search on the smaller array. + The key insight: once we choose `i` (how many from `nums1`), `j` is determined. So we **binary search on `i`**! + + For a valid partition: + - `nums1[i-1] <= nums2[j]` (left of nums1 ≤ right of nums2) + - `nums2[j-1] <= nums1[i]` (left of nums2 ≤ right of nums1) + + If not valid, adjust `i`: if `nums1[i-1] > nums2[j]`, we took too many from nums1 — decrease `i`. 
+ + approach: | + We solve this using **Binary Search on Partition**: + + **Step 1: Ensure nums1 is the smaller array** + + - If `m > n`, swap the arrays + - This guarantees a valid `j` always exists and improves efficiency + +   + + **Step 2: Binary search for the correct partition** + + - Search for `i` in range `[0, m]` (elements taken from nums1) + - Calculate `j = half_len - i` where `half_len = (m + n + 1) // 2` + - For each `i`, check if partition is valid + +   + + **Step 3: Handle boundary cases with infinity** + + - If `i = 0`, there's no left element in nums1 → use `-infinity` + - If `i = m`, there's no right element in nums1 → use `+infinity` + - Same for `j = 0` and `j = n` in nums2 + +   + + **Step 4: Compute the median** + + - If partition is valid: + - **Odd total**: median = `max(left1, left2)` + - **Even total**: median = `(max(left1, left2) + min(right1, right2)) / 2` + - If not valid, adjust binary search bounds + +   + + The median is formed by the boundary elements at the valid partition. common_pitfalls: - - title: Not handling edge cases at partition + - title: Not Handling Boundary Cases description: | - When partition is at array boundary (i=0 or i=m), use -inf or inf for boundary values. - wrong_approach: "Accessing nums1[i-1] when i=0" - correct_approach: "Use float('-inf') if i == 0" + When `i = 0` or `i = m`, there's no left or right element in nums1. Accessing `nums1[i-1]` or `nums1[i]` would be out of bounds. - - title: Binary searching on the longer array - description: | - Always binary search on the shorter array to ensure valid partition exists - and for better efficiency. + Use `float('-inf')` for missing left elements and `float('inf')` for missing right elements. This ensures comparisons always work correctly. 
+ wrong_approach: "Accessing nums1[i-1] when i = 0" + correct_approach: "nums1_left = float('-inf') if i == 0 else nums1[i-1]" - - title: Odd vs even total length + - title: Binary Searching on the Longer Array description: | - For odd total, median is max of left half. - For even, it's average of max(left) and min(right). + Always search on the shorter array. If `m > n` and we search on nums1, `j = half_len - i` might become negative (invalid). + + Swapping ensures `j` is always valid: `0 <= j <= n`. + wrong_approach: "Binary searching on the longer array" + correct_approach: "if m > n: swap arrays, then binary search on the shorter one" + + - title: Odd vs Even Total Length + description: | + For **odd** total `(m + n)`: the median is a single value — `max(left1, left2)`. + For **even** total: the median is the average of two middle values. + + Getting this wrong produces incorrect results for half the test cases. + wrong_approach: "Always averaging two values" + correct_approach: "Check (m + n) % 2 and handle odd/even separately" key_takeaways: - - Binary search on partition, not on values - - Partition both arrays to have equal halves - - Handle boundary conditions with infinity - - O(log min(m,n)) is achievable + - "**Binary search on partition, not values**: Search for how many elements to take from nums1" + - "**Partition both arrays to split total elements in half**: Once we choose `i`, `j` is determined" + - "**Handle boundaries with infinity**: Prevents index errors at array edges" + - "**O(log min(m,n))**: Binary search on the smaller array is sufficient" - time_complexity: "O(log min(m,n))" - space_complexity: "O(1)" - complexity_explanation: | - Time: Binary search on the smaller array. - Space: Only constant extra variables. + time_complexity: "O(log min(m, n)). Binary search on the smaller array." + space_complexity: "O(1). Only constant extra variables for pointers and boundary values." 
solutions: - - approach_name: Binary Search on Partition (Optimal) + - approach_name: Binary Search on Partition is_optimal: true code: | def find_median_sorted_arrays(nums1: list[int], nums2: list[int]) -> float: - # Ensure nums1 is the smaller array + # Ensure nums1 is the smaller array for valid j values if len(nums1) > len(nums2): nums1, nums2 = nums2, nums1 m, n = len(nums1), len(nums2) - left, right = 0, m - half_len = (m + n + 1) // 2 + half_len = (m + n + 1) // 2 # Size of left half (ceiling for odd total) + + left, right = 0, m # Binary search bounds for i while left <= right: - i = (left + right) // 2 # Partition in nums1 - j = half_len - i # Partition in nums2 + i = (left + right) // 2 # Elements from nums1 in left half + j = half_len - i # Elements from nums2 in left half - # Handle edge cases with infinity + # Handle boundary cases with infinity nums1_left = float('-inf') if i == 0 else nums1[i - 1] nums1_right = float('inf') if i == m else nums1[i] nums2_left = float('-inf') if j == 0 else nums2[j - 1] nums2_right = float('inf') if j == n else nums2[j] + # Check if partition is valid if nums1_left <= nums2_right and nums2_left <= nums1_right: - # Found valid partition + # Valid partition found — compute median if (m + n) % 2 == 1: + # Odd total: median is max of left half return max(nums1_left, nums2_left) else: + # Even total: median is average of middle two return (max(nums1_left, nums2_left) + min(nums1_right, nums2_right)) / 2 elif nums1_left > nums2_right: - # Too many elements from nums1 in left half + # Too many from nums1, decrease i right = i - 1 else: - # Too few elements from nums1 in left half + # Too few from nums1, increase i left = i + 1 - return 0.0 # Should never reach here + return 0.0 # Should never reach here with valid input explanation: | - Binary search to find correct partition point in the smaller array. - Partition is valid when all left elements <= all right elements. - Compute median from the four boundary elements. 
+ **Time Complexity:** O(log min(m, n)) — Binary search on the smaller array. + + **Space Complexity:** O(1) — Only constant extra variables. + + We binary search for the correct partition point in the smaller array. A valid partition has all left elements ≤ all right elements. Once found, the median is computed from the four boundary elements: max of left side for odd totals, average of max-left and min-right for even totals. diff --git a/backend/data/questions/merge-k-sorted-lists.yaml b/backend/data/questions/merge-k-sorted-lists.yaml index d0ba9a1..734a3fd 100644 --- a/backend/data/questions/merge-k-sorted-lists.yaml +++ b/backend/data/questions/merge-k-sorted-lists.yaml @@ -10,22 +10,22 @@ patterns: - heap description: | - You are given an array of k linked-lists `lists`, each linked-list is sorted in ascending order. + You are given an array of `k` linked-lists `lists`, each linked-list is sorted in **ascending order**. Merge all the linked-lists into one sorted linked-list and return it. constraints: | - - k == lists.length - - 0 <= k <= 10^4 - - 0 <= lists[i].length <= 500 - - -10^4 <= lists[i][j] <= 10^4 - - lists[i] is sorted in ascending order - - The sum of lists[i].length will not exceed 10^4 + - `k == lists.length` + - `0 <= k <= 10^4` + - `0 <= lists[i].length <= 500` + - `-10^4 <= lists[i][j] <= 10^4` + - `lists[i]` is sorted in **ascending order** + - The sum of `lists[i].length` will not exceed `10^4` examples: - input: "lists = [[1,4,5],[1,3,4],[2,6]]" output: "[1,1,2,3,4,4,5,6]" - explanation: "Merge three sorted lists into one." + explanation: "Merge three sorted lists: [1,4,5] + [1,3,4] + [2,6] = [1,1,2,3,4,4,5,6]" - input: "lists = []" output: "[]" explanation: "No lists to merge." @@ -34,50 +34,84 @@ examples: explanation: "Single empty list." explanation: - approach: | - 1. Use a min-heap to track the smallest element among all list heads - 2. Add the first node from each non-empty list to the heap - 3. 
Pop the smallest node, add it to the result - 4. If that node has a next, add it to the heap - 5. Continue until heap is empty - intuition: | - At each step, we need to find the minimum among k candidates (the heads of each list). - A min-heap gives us this minimum in O(log k) time. + Imagine you have k piles of sorted cards, and you want to combine them into one sorted pile. At each step, you need to pick the smallest card among all the piles' top cards. - Since each node is pushed and popped from the heap exactly once, and we have N total nodes, - the overall complexity is O(N log k). + The naive approach — scan all k top cards each time — takes O(k) per pick. With N total cards, that's O(N × k). + + Think of it like this: we need a data structure that efficiently gives us the minimum among k elements and lets us update when we take one. A **min-heap** does exactly this in O(log k) time! + + The algorithm: + 1. Add the head of each list to a min-heap + 2. Pop the smallest node, add it to the result + 3. If that node has a next, push it to the heap + 4. Repeat until the heap is empty + + Each node is pushed and popped exactly once, giving O(N log k) total. + + approach: | + We solve this using a **Min-Heap (Priority Queue)**: + + **Step 1: Initialise the heap** + + - Create an empty min-heap + - Add the head node of each non-empty list + - Use `(node.val, index, node)` tuples for proper ordering (index breaks ties) + +   + + **Step 2: Build the result list** + + - Create a `dummy` node for easy construction + - While the heap is not empty: + - Pop the minimum node + - Append it to the result list + - If the popped node has a `.next`, push that next node to the heap + +   + + **Step 3: Return the merged list** + + - Return `dummy.next` + +   + + The heap always contains at most k nodes (one from each list), so each push/pop is O(log k). With N total nodes, total time is O(N log k). 
common_pitfalls: - - title: Not handling empty lists + - title: Not Handling Empty Lists description: | - Some lists might be empty (null). Filter them out or check before adding to heap. - wrong_approach: "Adding null to heap" - correct_approach: "if node: heappush(...)" + Some lists in the input might be empty (`None`). Adding `None` to the heap will cause errors. - - title: Heap comparison with ListNode - description: | - Python's heapq can't compare ListNode objects directly. - Either use a tuple (value, index, node) or define __lt__ on ListNode. + Filter out empty lists or check before pushing: `if node: heappush(...)`. + wrong_approach: "heappush(heap, (node.val, i, node)) without checking" + correct_approach: "if node: heappush(heap, (node.val, i, node))" - - title: Not advancing the list pointer + - title: Python Heap Comparison with ListNode description: | - After adding a node to result, push its next node to the heap, not the same node. + Python's `heapq` compares tuples element by element. If two nodes have the same value, it tries to compare the nodes themselves, which fails (no `__lt__` defined). + + Use `(value, index, node)` tuples where `index` is unique — this breaks ties deterministically. + wrong_approach: "(node.val, node) — fails when values are equal" + correct_approach: "(node.val, unique_index, node)" + + - title: Not Advancing to the Next Node + description: | + After popping a node and adding it to the result, push its **next** node to the heap, not the same node. Otherwise, you'll process the same node forever. 
+ wrong_approach: "heappush(heap, (node.val, i, node)) after popping" + correct_approach: "heappush(heap, (node.next.val, i, node.next))" key_takeaways: - - Min-heap efficiently finds minimum among k elements - - This is a k-way merge algorithm - - Total work is O(N log k) where N is total nodes - - Same pattern works for merging k sorted arrays + - "**Min-heap for k-way merge**: Efficiently find the minimum among k candidates in O(log k)" + - "**This is the merge step of external merge sort**: Same pattern for merging k sorted files" + - "**Total time O(N log k)**: Each of N nodes is pushed/popped once, each operation is O(log k)" + - "**Divide and conquer alternative**: Pair up lists and merge, reducing k by half each round — same complexity" - time_complexity: "O(N log k)" - space_complexity: "O(k)" - complexity_explanation: | - Time: Each of N nodes is pushed and popped once, each operation is O(log k). - Space: Heap holds at most k nodes at any time. + time_complexity: "O(N log k). Each of the N total nodes is pushed and popped from the heap once, and each heap operation is O(log k)." + space_complexity: "O(k). The heap holds at most k nodes at any time (one from each list)." 
solutions: - - approach_name: Min-Heap (Optimal) + - approach_name: Min-Heap is_optimal: true code: | import heapq @@ -90,7 +124,8 @@ solutions: def merge_k_lists(lists: list[ListNode | None]) -> ListNode | None: heap = [] - # Add first node from each list with index for tie-breaking + # Add first node from each non-empty list + # Use index for tie-breaking (avoids comparing ListNode objects) for i, node in enumerate(lists): if node: heapq.heappush(heap, (node.val, i, node)) @@ -99,18 +134,24 @@ solutions: current = dummy while heap: + # Pop the smallest node val, i, node = heapq.heappop(heap) + + # Add to result list current.next = node current = current.next + # Push next node from the same list if node.next: heapq.heappush(heap, (node.next.val, i, node.next)) return dummy.next explanation: | - Use heap to always get the smallest current head. - Push next node when popping to maintain k candidates. - Index in tuple handles equal values (tie-breaking). + **Time Complexity:** O(N log k) — N nodes, each with O(log k) heap operations. + + **Space Complexity:** O(k) — Heap holds at most k nodes. + + The min-heap always contains the smallest unprocessed node from each list. We pop the minimum, add it to our result, and push its successor. The index in the tuple provides stable tie-breaking for equal values. - approach_name: Divide and Conquer is_optimal: true @@ -120,6 +161,7 @@ solutions: return None def merge_two(l1: ListNode | None, l2: ListNode | None) -> ListNode | None: + """Merge two sorted lists into one.""" dummy = ListNode() current = dummy @@ -135,6 +177,7 @@ solutions: current.next = l1 or l2 return dummy.next + # Repeatedly merge pairs until one list remains while len(lists) > 1: merged = [] for i in range(0, len(lists), 2): @@ -145,5 +188,8 @@ solutions: return lists[0] explanation: | - Pair up lists and merge, reducing k to k/2 each round. - Same complexity as heap approach but iterative merge logic. 
+ **Time Complexity:** O(N log k) — log k rounds, each processing N nodes total. + + **Space Complexity:** O(k) — The `merged` list of list heads is rebuilt each round (at most k/2 entries); the nodes themselves are reused, so no new nodes are allocated. + + Pair up lists and merge each pair, reducing k to k/2 each round. After log k rounds, one list remains. Each round processes all N nodes, giving O(N log k) total — the same time complexity as the heap approach, using a list of heads in place of a heap. diff --git a/backend/data/questions/merge-two-sorted-lists.yaml b/backend/data/questions/merge-two-sorted-lists.yaml index 0910cd3..bfa0097 100644 --- a/backend/data/questions/merge-two-sorted-lists.yaml +++ b/backend/data/questions/merge-two-sorted-lists.yaml @@ -12,15 +12,14 @@ patterns: description: | You are given the heads of two sorted linked lists `list1` and `list2`. - Merge the two lists into one sorted list. The list should be made by splicing together - the nodes of the first two lists. + Merge the two lists into one **sorted** list. The list should be made by splicing together the nodes of the first two lists. - Return the head of the merged linked list. + Return *the head of the merged linked list*. constraints: | - - The number of nodes in both lists is in the range [0, 50]. - - -100 <= Node.val <= 100 - - Both list1 and list2 are sorted in non-decreasing order. + - The number of nodes in both lists is in the range `[0, 50]` + - `-100 <= Node.val <= 100` + - Both `list1` and `list2` are sorted in **non-decreasing** order examples: - input: "list1 = [1,2,4], list2 = [1,3,4]" @@ -34,61 +33,83 @@ examples: explanation: "One list empty, return the other." explanation: - approach: | - 1. Create a dummy node to simplify edge cases (avoids special handling for the head) - 2. Use a current pointer starting at the dummy node - 3. While both lists have nodes: - - Compare the values at the heads of both lists - - Attach the smaller node to current.next - - Advance the pointer of the list we took from - - Advance current to the newly attached node - 4.
Attach any remaining nodes from the non-empty list - 5. Return dummy.next (the actual head of the merged list) - intuition: | - Since both lists are already sorted, we can build the merged list by repeatedly taking - the smaller of the two current heads. This is the merge step from merge sort. + Imagine you have two sorted piles of numbered cards, and you want to combine them into one sorted pile. The natural approach is to always take the smaller of the two top cards and add it to your result pile. - The dummy node technique is a common pattern for linked list problems. It eliminates - the need for special logic to initialize the head of the result list. + Think of it like the **merge step in merge sort** — you have two already-sorted sequences, and you need to combine them while maintaining sorted order. The key insight is that since both lists are sorted, the smallest unprocessed element is always at the front of one of the two lists. - Think of it like merging two sorted piles of cards — always take the smaller top card. + The **dummy node technique** is a powerful pattern for linked list construction. Instead of handling the "first node" as a special case, we create a placeholder node to start our result list. This lets us always use `current.next = ...` without checking if we're setting the head. 
+ + approach: | + We solve this using an **Iterative Merge with Dummy Node**: + + **Step 1: Create a dummy node and current pointer** + + - Create a `dummy` node as a placeholder (its value doesn't matter) + - Set `current` to point to `dummy` — this is where we'll build our result + - The dummy eliminates special-case logic for initialising the head + +   + + **Step 2: Compare and attach nodes** + + - While both lists have remaining nodes: + - Compare the values at the heads of `list1` and `list2` + - Attach the smaller node to `current.next` + - Advance the pointer of whichever list we took from + - Move `current` forward to the newly attached node + +   + + **Step 3: Attach remaining nodes** + + - When the loop ends, one list may still have nodes + - Simply attach the entire remaining portion: `current.next = list1 or list2` + - No need to iterate — the remaining nodes are already sorted and linked! + +   + + **Step 4: Return the merged list** + + - Return `dummy.next` — this skips the placeholder and gives the actual head + - The dummy node itself is not part of the result common_pitfalls: - - title: Forgetting to handle empty lists + - title: Forgetting to Handle Empty Lists description: | - One or both lists might be empty. The dummy node pattern handles this naturally, - but without it, you need explicit null checks. - wrong_approach: "Assuming both lists have at least one node" - correct_approach: "Use dummy node or check for null at the start" + One or both input lists might be `None`. Without the dummy node pattern, you'd need explicit null checks to initialise the result head. - - title: Not linking remaining nodes + For example, if `list1` is empty, the correct answer is simply `list2`. The dummy node handles this naturally — if one list is empty, the loop never runs, and we just attach the non-empty list. 
+ wrong_approach: "Assuming both lists have at least one node" + correct_approach: "Use dummy node, which handles empty lists automatically" + + - title: Iterating Through Remaining Nodes One-by-One description: | - After the main loop, one list might still have nodes. Don't iterate through - them — just link the entire remaining portion. - wrong_approach: "Looping through remaining nodes one by one" + After the main loop, one list might still have nodes. A common mistake is to loop through them individually. + + Since the remaining nodes are already sorted and linked together, you can attach the entire remainder with a single assignment: `current.next = list1 if list1 else list2`. This is O(1), not O(remaining length). + wrong_approach: "while list1: current.next = list1; list1 = list1.next" correct_approach: "current.next = list1 or list2" - - title: Returning dummy instead of dummy.next + - title: Returning dummy Instead of dummy.next description: | - The dummy node is just a placeholder. The actual merged list starts at dummy.next. + The dummy node is just a construction helper — it's not part of the actual result. The merged list starts at `dummy.next`. + + Returning `dummy` would include an extra node with whatever value you initialised it with (usually 0). 
wrong_approach: "return dummy" correct_approach: "return dummy.next" key_takeaways: - - Dummy nodes simplify linked list construction - - This is the merge step of merge sort - - Comparing and advancing pointers is a fundamental linked list technique - - Can also be solved recursively with elegant code + - "**Dummy node pattern**: Eliminates special-case handling for the head node in linked list construction" + - "**This IS merge sort's merge step**: Understanding this prepares you for implementing full merge sort on linked lists" + - "**Compare and advance**: The two-pointer technique of comparing heads and advancing one pointer is fundamental to many linked list problems" + - "**Recursive alternative**: This problem has an elegant recursive solution that's worth understanding, though it uses O(n+m) stack space" - time_complexity: "O(n + m)" - space_complexity: "O(1)" - complexity_explanation: | - Time: We visit each node exactly once, where n and m are the lengths of the two lists. - Space: We only use a few pointers; we reuse existing nodes (no new nodes created). + time_complexity: "O(n + m). We visit each node exactly once, where n and m are the lengths of the two lists." + space_complexity: "O(1). We only use a few pointer variables; we reuse existing nodes without allocating new ones." 
solutions: - - approach_name: Iterative with Dummy Node (Optimal) + - approach_name: Iterative with Dummy Node is_optimal: true code: | class ListNode: @@ -100,9 +121,11 @@ solutions: list1: ListNode | None, list2: ListNode | None, ) -> ListNode | None: + # Dummy node simplifies head handling dummy = ListNode() current = dummy + # Compare and attach smaller node each iteration while list1 and list2: if list1.val <= list2.val: current.next = list1 @@ -112,13 +135,17 @@ solutions: list2 = list2.next current = current.next - # Attach remaining nodes + # Attach any remaining nodes (already sorted) current.next = list1 if list1 else list2 + # Skip dummy, return actual head return dummy.next explanation: | - Use a dummy node to build the result. Compare heads and attach the smaller one. - Finally, attach any remaining nodes from the non-empty list. + **Time Complexity:** O(n + m) — Each node is visited exactly once. + + **Space Complexity:** O(1) — Only pointer variables used; existing nodes are reused. + + We use a dummy node to avoid special-casing the head. In each iteration, we attach the smaller of the two current nodes and advance that list's pointer. Finally, we attach any remaining nodes and return `dummy.next`. - approach_name: Recursive is_optimal: false @@ -127,18 +154,24 @@ solutions: list1: ListNode | None, list2: ListNode | None, ) -> ListNode | None: + # Base cases: if either list is empty, return the other if not list1: return list2 if not list2: return list1 + # Recursive case: attach smaller head and recurse if list1.val <= list2.val: + # list1 is smaller, it becomes head of result list1.next = merge_two_lists(list1.next, list2) return list1 else: + # list2 is smaller, it becomes head of result list2.next = merge_two_lists(list1, list2.next) return list2 explanation: | - Elegant recursive solution. Base case: return the non-null list. - Recursive case: attach smaller head and recurse on remaining lists. - Space is O(n+m) due to recursion stack. 
+ **Time Complexity:** O(n + m) — Each node is processed once. + + **Space Complexity:** O(n + m) — Recursion stack depth equals total number of nodes. + + This elegant recursive solution chooses the smaller head, then recursively merges the rest. The base case handles empty lists. While beautiful, the iterative approach is preferred for large lists due to stack space limitations. diff --git a/backend/data/questions/number-of-islands.yaml b/backend/data/questions/number-of-islands.yaml index 9a15034..efe417e 100644 --- a/backend/data/questions/number-of-islands.yaml +++ b/backend/data/questions/number-of-islands.yaml @@ -9,19 +9,18 @@ categories: patterns: - dfs - bfs + - matrix-traversal description: | - Given an m x n 2D binary grid `grid` which represents a map of '1's (land) and '0's (water), - return the number of islands. + Given an `m × n` 2D binary grid `grid` which represents a map of `'1'`s (land) and `'0'`s (water), return *the number of islands*. - An island is surrounded by water and is formed by connecting adjacent lands horizontally - or vertically. You may assume all four edges of the grid are surrounded by water. + An **island** is surrounded by water and is formed by connecting adjacent lands **horizontally or vertically**. You may assume all four edges of the grid are surrounded by water. constraints: | - - m == grid.length - - n == grid[i].length - - 1 <= m, n <= 300 - - grid[i][j] is '0' or '1' + - `m == grid.length` + - `n == grid[i].length` + - `1 <= m, n <= 300` + - `grid[i][j]` is `'0'` or `'1'` examples: - input: | @@ -32,7 +31,7 @@ examples: ["0","0","0","0","0"] ] output: "1" - explanation: "All land cells are connected, forming one island." + explanation: "All land cells are connected horizontally/vertically, forming one island." - input: | grid = [ ["1","1","0","0","0"], @@ -41,53 +40,93 @@ examples: ["0","0","0","1","1"] ] output: "3" - explanation: "Three separate connected components of land." 
+ explanation: "Three separate groups of connected land cells — three islands." explanation: - approach: | - 1. Iterate through every cell in the grid - 2. When a '1' (land) is found, increment island count - 3. Use DFS/BFS to mark all connected land cells as visited - 4. Continue iteration until all cells are processed - intuition: | - Each island is a connected component of '1's. We need to count these components. + Imagine looking at a map from above. Each `'1'` is a piece of land, and you want to count how many distinct landmasses (islands) exist. Two pieces of land belong to the same island if you can walk from one to the other without crossing water (moving only up, down, left, or right — not diagonally). - When we find an unvisited '1', we've discovered a new island. We then "sink" the entire - island by marking all connected '1's as visited (either change to '0' or use a visited set). - This ensures we don't count the same island multiple times. + Think of it like this: when you step onto a piece of land, you want to "explore" the entire island by visiting all connected land cells. Once you've seen the whole island, you mark it as "visited" so you don't count it again. Then you continue scanning the map for the next unvisited piece of land. + + This is the classic **connected components** problem on a grid. Each island is one connected component of `'1'`s. We count components by: + 1. Finding an unvisited land cell (new island found!) + 2. Exploring all connected land cells (mark the whole island as visited) + 3. 
Repeat until every cell has been processed + + approach: | + We solve this using **DFS to Explore and Mark Islands**: + + **Step 1: Iterate through every cell** + + - Scan the grid row by row, column by column + - We're looking for unvisited `'1'`s — each one represents a new island + +   + + **Step 2: When land is found, count it and explore** + + - Increment the island count + - Use DFS (or BFS) to visit all connected land cells + - Mark each visited cell by changing `'1'` to `'0'` (this "sinks" the island to avoid recounting) + +   + + **Step 3: DFS exploration** + + - From the current cell, recursively explore all four directions (up, down, left, right) + - Stop when: out of bounds, or cell is water (`'0'`) + - Mark the cell as visited **before** recursive calls to prevent infinite loops + +   + + **Step 4: Return the count** + + - After processing all cells, return the island count + +   + + This works because once we've explored an island, all its cells are marked as `'0'`, so we'll never trigger a new exploration from those cells again. common_pitfalls: - - title: Not marking visited cells + - title: Not Marking Cells as Visited description: | - Without marking cells as visited, you'll count the same island multiple times - or get infinite loops in DFS/BFS. + Without marking visited cells, two things go wrong: + 1. You'll count the same island multiple times (each cell triggers a new count) + 2. DFS/BFS will revisit cells infinitely, causing a stack overflow or infinite loop + + The cleanest solution is to modify the grid itself — change `'1'` to `'0'` when visited. Alternatively, use a separate `visited` set, but this uses extra space. 
wrong_approach: "Not modifying grid or using visited set" - correct_approach: "Mark cell as '0' or add to visited set when processing" + correct_approach: "grid[r][c] = '0' immediately when visiting" - - title: Diagonal connections + - title: Including Diagonal Connections description: | - Islands only connect horizontally and vertically, not diagonally. - Only explore 4 directions, not 8. + The problem states islands connect **horizontally or vertically** only. Diagonal cells are NOT considered adjacent. - - title: Boundary checks + Check only 4 directions: `(r+1,c), (r-1,c), (r,c+1), (r,c-1)`. Don't include the 4 diagonal directions. + wrong_approach: "Exploring 8 directions including diagonals" + correct_approach: "Explore only 4 orthogonal directions" + + - title: Boundary Check Errors description: | - Always check if row/col are within bounds before accessing grid. + Before accessing `grid[r][c]`, always verify that `r` and `c` are within bounds: + - `0 <= r < rows` + - `0 <= c < cols` + + Missing these checks causes index-out-of-bounds errors. + wrong_approach: "Accessing grid[r][c] without bounds check" + correct_approach: "Check bounds first: if r < 0 or r >= rows or c < 0 or c >= cols: return" key_takeaways: - - Grid problems often reduce to graph traversal - - DFS or BFS both work for exploring connected components - - Modifying input can serve as "visited" tracking - - This pattern applies to many "count components" problems + - "**Grid = implicit graph**: Each cell is a node; adjacent cells are connected by edges" + - "**DFS/BFS for connected components**: Classic technique for counting or exploring connected regions" + - "**In-place marking**: Modifying input to track visited state saves space (when allowed)" + - "**Foundation for many grid problems**: Flood fill, maze solving, region counting all use this pattern" - time_complexity: "O(m × n)" - space_complexity: "O(m × n)" - complexity_explanation: | - Time: Each cell is visited at most once. 
- Space: DFS recursion stack or BFS queue can hold O(m × n) cells in worst case. + time_complexity: "O(m × n). Each cell is visited at most once by the main loop and at most once by DFS/BFS." + space_complexity: "O(m × n). In the worst case (all land), the DFS recursion stack or BFS queue can hold all cells." solutions: - - approach_name: DFS (Optimal) + - approach_name: DFS is_optimal: true code: | def num_islands(grid: list[list[str]]) -> int: @@ -98,28 +137,36 @@ solutions: islands = 0 def dfs(r: int, c: int) -> None: + # Stop if out of bounds or water if r < 0 or r >= rows or c < 0 or c >= cols: return if grid[r][c] != '1': return - grid[r][c] = '0' # Mark as visited + # Mark as visited by "sinking" the land + grid[r][c] = '0' - dfs(r + 1, c) - dfs(r - 1, c) - dfs(r, c + 1) - dfs(r, c - 1) + # Explore all four directions + dfs(r + 1, c) # down + dfs(r - 1, c) # up + dfs(r, c + 1) # right + dfs(r, c - 1) # left + # Scan every cell in the grid for r in range(rows): for c in range(cols): if grid[r][c] == '1': + # Found new island! Count it and explore islands += 1 dfs(r, c) return islands explanation: | - When land is found, increment count and sink the entire island using DFS. - Modifying the grid serves as our visited marker. + **Time Complexity:** O(m × n) — Each cell visited at most twice (once by loop, once by DFS). + + **Space Complexity:** O(m × n) — Recursion stack in worst case (grid is all land in a snake pattern). + + When we find unvisited land, we increment our count and use DFS to "sink" the entire island by marking all connected land as water. This prevents recounting. 
- approach_name: BFS is_optimal: true @@ -135,14 +182,18 @@ solutions: def bfs(start_r: int, start_c: int) -> None: queue = deque([(start_r, start_c)]) - grid[start_r][start_c] = '0' + grid[start_r][start_c] = '0' # Mark starting cell while queue: r, c = queue.popleft() + + # Explore all four directions for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]: nr, nc = r + dr, c + dc + + # Add unvisited land to queue if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1': - grid[nr][nc] = '0' + grid[nr][nc] = '0' # Mark before adding to queue queue.append((nr, nc)) for r in range(rows): @@ -153,5 +204,8 @@ solutions: return islands explanation: | - Same logic using BFS instead of DFS. - Avoids recursion stack but uses queue space. + **Time Complexity:** O(m × n) — Same as DFS. + + **Space Complexity:** O(min(m, n)) — Queue holds at most one "frontier" layer, which is bounded by the smaller dimension. + + BFS explores level by level rather than depth-first. Mark cells as visited **when adding to queue** (not when processing) to avoid adding duplicates. diff --git a/backend/data/questions/three-sum.yaml b/backend/data/questions/three-sum.yaml index 0712014..f557b16 100644 --- a/backend/data/questions/three-sum.yaml +++ b/backend/data/questions/three-sum.yaml @@ -11,109 +11,159 @@ patterns: - two-pointers description: | - Given an integer array `nums`, return all the triplets [nums[i], nums[j], nums[k]] such that - i != j, i != k, and j != k, and nums[i] + nums[j] + nums[k] == 0. + Given an integer array `nums`, return all the triplets `[nums[i], nums[j], nums[k]]` such that `i != j`, `i != k`, and `j != k`, and `nums[i] + nums[j] + nums[k] == 0`. - Notice that the solution set must not contain duplicate triplets. + Notice that the solution set must **not contain duplicate triplets**. 
constraints: | - - 3 <= nums.length <= 3000 - - -10^5 <= nums[i] <= 10^5 + - `3 <= nums.length <= 3000` + - `-10^5 <= nums[i] <= 10^5` examples: - input: "nums = [-1,0,1,2,-1,-4]" output: "[[-1,-1,2],[-1,0,1]]" - explanation: "The distinct triplets that sum to zero." + explanation: "The distinct triplets that sum to zero are [-1,-1,2] and [-1,0,1]." - input: "nums = [0,1,1]" output: "[]" explanation: "No triplet sums to zero." - input: "nums = [0,0,0]" output: "[[0,0,0]]" - explanation: "Only one triplet sums to zero." + explanation: "The only triplet [0,0,0] sums to zero." explanation: - approach: | - 1. Sort the array - 2. For each element nums[i], find pairs that sum to -nums[i] - 3. Use two pointers (left, right) to find pairs in the remaining array - 4. Skip duplicates at each level to avoid duplicate triplets - intuition: | - After sorting, for each fixed element nums[i], we need to find nums[j] + nums[k] = -nums[i]. - This reduces to the Two Sum II problem on a sorted array, solvable with two pointers. + Finding three numbers that sum to zero seems complex, but we can reduce it to a simpler problem we already know how to solve. - Sorting enables two things: efficient two-pointer search and easy duplicate skipping. - We skip duplicates by checking if current value equals previous value. + Think of it like this: if we **fix** one number (call it `a`), then we need to find two numbers that sum to `-a`. This is exactly the Two Sum problem! But instead of using a hash map (which makes duplicate handling tricky), we can use two pointers on a **sorted** array. + + Sorting gives us two superpowers: + 1. **Two pointers work**: With a sorted array, if our sum is too small, move left pointer right; if too big, move right pointer left + 2. **Easy duplicate skipping**: Adjacent duplicates become neighbours, so `if nums[i] == nums[i-1]: skip` + + The algorithm: for each element `nums[i]`, use two pointers on the remaining array to find pairs summing to `-nums[i]`. 
+ + approach: | + We solve this using **Sort + Two Pointers**: + + **Step 1: Sort the array** + + - Sorting enables two-pointer technique and easy duplicate detection + - Time: O(n log n), which doesn't affect overall O(n²) complexity + +   + + **Step 2: Fix the first element and find pairs** + + - For each `i` from 0 to n-3: + - Skip if `i > 0` and `nums[i] == nums[i-1]` (avoid duplicate triplets) + - **Early termination**: If `nums[i] > 0`, stop — no triplet can sum to zero (all remaining elements are positive) + - Set `left = i + 1`, `right = n - 1` + - Find pairs using two pointers + +   + + **Step 3: Two-pointer search for each fixed element** + + - Calculate `total = nums[i] + nums[left] + nums[right]` + - If `total < 0`: we need larger values, move `left` right + - If `total > 0`: we need smaller values, move `right` left + - If `total == 0`: found a triplet! + - Add `[nums[i], nums[left], nums[right]]` to result + - Skip duplicates for both pointers, keeping `left < right`: `while left < right and nums[left] == nums[left+1]: left += 1` (and symmetrically for `right`) + - Move both pointers inward + +   + + **Step 4: Return all unique triplets** + + Duplicate skipping happens at three levels: the outer loop, left pointer, and right pointer. common_pitfalls: - - title: Not handling duplicates + - title: Not Handling Duplicates Properly description: | - Without duplicate skipping, you'll return duplicate triplets. - Skip at the outer loop and both pointers. - wrong_approach: "Not skipping when nums[i] == nums[i-1]" - correct_approach: "if i > 0 and nums[i] == nums[i-1]: continue" + Without careful duplicate skipping, you'll return duplicate triplets like `[-1,-1,2]` multiple times. - - title: Wrong pointer skipping - description: | - After finding a valid triplet, skip duplicates for both left and right pointers - while maintaining left < right. + Duplicates must be handled at **all three levels**: + 1. Outer loop: `if i > 0 and nums[i] == nums[i-1]: continue` + 2. Left pointer: `while left < right and nums[left] == nums[left+1]: left += 1` + 3.
Right pointer: `while left < right and nums[right] == nums[right-1]: right -= 1` + wrong_approach: "Using a set of tuples (works but slower)" + correct_approach: "Skip adjacent duplicates at each level" - - title: Starting i too late + - title: Duplicate Skipping Order After Finding Triplet description: | - The outer loop should start at index 0. Also, skip if nums[i] > 0 since - sorted array means no valid triplet possible. + After finding a valid triplet, skip duplicates **before** moving both pointers. A common bug is skipping duplicates incorrectly, leading to missing triplets or infinite loops. + + The sequence should be: (1) add triplet, (2) skip left duplicates, (3) skip right duplicates, (4) move both pointers inward (`left += 1`, `right -= 1`). + wrong_approach: "Moving pointers before skipping duplicates" + correct_approach: "Skip duplicates first, then move both pointers" + + - title: Missing Early Termination + description: | + Once `nums[i] > 0` in a sorted array, no valid triplet can exist (every remaining element is at least `nums[i]` and therefore positive, so any sum of three is positive). + + This optimisation can significantly speed up cases with many positive numbers.
+ wrong_approach: "Continuing to search when nums[i] > 0" + correct_approach: "if nums[i] > 0: break" key_takeaways: - - Reduce N-sum to (N-1)-sum by fixing one element - - Sorting enables two-pointer approach and duplicate handling - - Duplicate skipping happens at multiple levels - - Time complexity is O(n²) — can't do better for returning all triplets + - "**Reduce N-sum to (N-1)-sum**: Fix one element and solve a smaller problem — this pattern extends to 4Sum, kSum" + - "**Sorting enables two pointers**: Transforms O(n²) lookup per element into O(n)" + - "**Multi-level duplicate handling**: When returning all unique solutions, handle duplicates at every decision point" + - "**Time complexity is O(n²)**: Can't do better when returning all triplets (there can be O(n²) triplets)" - time_complexity: "O(n²)" - space_complexity: "O(log n) to O(n)" - complexity_explanation: | - Time: O(n log n) for sorting + O(n²) for the two-pointer search. - Space: Depends on sorting algorithm (log n for in-place, n for non-in-place). + time_complexity: "O(n²). Sorting is O(n log n), then for each of n elements, the two-pointer search is O(n)." + space_complexity: "O(log n) to O(n). Depends on the sorting algorithm — O(log n) for in-place sorts, O(n) for others. The output is not counted as extra space." 
solutions: - - approach_name: Sort + Two Pointers (Optimal) + - approach_name: Sort + Two Pointers is_optimal: true code: | def three_sum(nums: list[int]) -> list[list[int]]: - nums.sort() + nums.sort() # Enable two pointers and duplicate detection result = [] + n = len(nums) - for i in range(len(nums) - 2): - # Skip duplicates for i + for i in range(n - 2): + # Skip duplicates for the first element if i > 0 and nums[i] == nums[i - 1]: continue - # Early termination + # Early termination: if smallest element is positive, no solution if nums[i] > 0: break - left, right = i + 1, len(nums) - 1 + # Two pointers for the remaining array + left, right = i + 1, n - 1 while left < right: total = nums[i] + nums[left] + nums[right] if total < 0: + # Need larger sum, move left pointer left += 1 elif total > 0: + # Need smaller sum, move right pointer right -= 1 else: + # Found a triplet result.append([nums[i], nums[left], nums[right]]) - # Skip duplicates for left and right + # Skip duplicates for left pointer while left < right and nums[left] == nums[left + 1]: left += 1 + # Skip duplicates for right pointer while left < right and nums[right] == nums[right - 1]: right -= 1 + # Move both pointers for next pair left += 1 right -= 1 return result explanation: | - Fix one element and use two pointers to find the other two. - Skip duplicates at all levels to avoid duplicate triplets. + **Time Complexity:** O(n²) — O(n log n) sort + O(n) two-pointer search for each of O(n) elements. + + **Space Complexity:** O(log n) to O(n) — Sorting space; output not counted. + + We sort the array, then for each element, use two pointers to find pairs that complete the triplet. Careful duplicate skipping at all three levels ensures we return only unique triplets. 
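The takeaway that fixing one element reduces N-sum to (N-1)-sum can be sketched as a recursive helper. This is a minimal illustration and not part of the YAML in this change: `k_sum`, `solve`, and `two_sum` are hypothetical names, and the sketch assumes `k >= 2`.

```python
def k_sum(nums: list[int], target: int, k: int) -> list[list[int]]:
    """Hypothetical helper: unique k-tuples of nums that sum to target (k >= 2)."""
    nums = sorted(nums)  # Sorted copy; the caller's list is left untouched
    n = len(nums)

    def two_sum(start: int, goal: int) -> list[list[int]]:
        # Base case: the same two-pointer search used in three_sum
        pairs = []
        left, right = start, n - 1
        while left < right:
            total = nums[left] + nums[right]
            if total < goal:
                left += 1
            elif total > goal:
                right -= 1
            else:
                pairs.append([nums[left], nums[right]])
                # Skip duplicates on both sides, then move inward
                while left < right and nums[left] == nums[left + 1]:
                    left += 1
                while left < right and nums[right] == nums[right - 1]:
                    right -= 1
                left += 1
                right -= 1
        return pairs

    def solve(start: int, size: int, goal: int) -> list[list[int]]:
        if size == 2:
            return two_sum(start, goal)
        result = []
        for i in range(start, n - size + 1):
            # Skip duplicates for the element fixed at this level
            if i > start and nums[i] == nums[i - 1]:
                continue
            # Fix nums[i], then solve the (size - 1)-sum on the rest
            for rest in solve(i + 1, size - 1, goal - nums[i]):
                result.append([nums[i]] + rest)
        return result

    return solve(0, k, target)


print(k_sum([-1, 0, 1, 2, -1, -4], 0, 3))  # [[-1, -1, 2], [-1, 0, 1]]
```

With `k = 3` and `target = 0` this reproduces the first example's output. Each extra level of recursion adds one factor of n, so under these assumptions the running time is O(n^(k-1)) for `k >= 2`.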
diff --git a/backend/data/questions/trapping-rain-water.yaml b/backend/data/questions/trapping-rain-water.yaml index 10ab08c..bba2a3b 100644 --- a/backend/data/questions/trapping-rain-water.yaml +++ b/backend/data/questions/trapping-rain-water.yaml @@ -12,68 +12,102 @@ patterns: - monotonic-stack description: | - Given n non-negative integers representing an elevation map where the width of each bar is 1, - compute how much water it can trap after raining. + Given `n` non-negative integers representing an elevation map where the width of each bar is `1`, compute how much water it can trap after raining. constraints: | - - n == height.length - - 1 <= n <= 2 * 10^4 - - 0 <= height[i] <= 10^5 + - `n == height.length` + - `1 <= n <= 2 × 10^4` + - `0 <= height[i] <= 10^5` examples: - input: "height = [0,1,0,2,1,0,1,3,2,1,2,1]" output: "6" - explanation: "6 units of water are trapped between the bars." + explanation: "The elevation map traps 6 units of water between the bars." - input: "height = [4,2,0,3,2,5]" output: "9" - explanation: "9 units of water are trapped." + explanation: "Water fills the valleys: 2 + 4 + 1 + 2 = 9 units." explanation: - approach: | - 1. Use two pointers from left and right - 2. Track maximum height seen from each side - 3. Move the pointer with smaller max height - 4. Water at current position = max_height - current_height - 5. Add to total and continue until pointers meet - intuition: | - Water at any position is determined by the minimum of the maximum heights to its left - and right, minus the current height. + Visualise the elevation map as a cross-section of terrain. After rain, water fills the valleys but can't rise above the surrounding walls. - With two pointers, we track left_max and right_max. If left_max < right_max, water at - the left pointer is limited by left_max (the right side is guaranteed to be at least - as tall). We process and move the pointer with the smaller maximum. 
+ Think of it like this: at any position `i`, the water level is determined by the **shorter** of the two walls — the tallest bar to the left and the tallest bar to the right. Water can't rise higher than this "limiting wall" without spilling over. + + For position `i`: + - `left_max` = maximum height to the left of i + - `right_max` = maximum height to the right of i + - Water level at i = `min(left_max, right_max)` + - Water trapped at i = `water_level - height[i]` (if positive) + + The clever insight for the two-pointer approach: if we know `left_max < right_max`, the water at the left position is limited by `left_max` — we don't need to know the exact `right_max`, just that it's bigger. This lets us process from both ends simultaneously. + + approach: | + We solve this using **Two Pointers**: + + **Step 1: Initialise pointers and tracking variables** + + - `left = 0`, `right = n - 1` (start at both ends) + - `left_max = 0`, `right_max = 0` (maximum heights seen so far) + - `water = 0` (total water trapped) + +   + + **Step 2: Process from both ends** + + - While `left < right`: + - If `height[left] < height[right]`: + - If `height[left] >= left_max`: update `left_max` + - Else: water trapped = `left_max - height[left]`, add to total + - Move `left` right + - Else: + - If `height[right] >= right_max`: update `right_max` + - Else: water trapped = `right_max - height[right]`, add to total + - Move `right` left + +   + + **Step 3: Return total water** + + - After pointers meet, all positions have been processed + +   + + Why process the shorter side? If `height[left] < height[right]`, the water at `left` is bounded by `left_max` (the right side is guaranteed to be at least as tall as `height[right]`, which is bigger). We can safely compute water at `left` without knowing the exact `right_max`. common_pitfalls: - - title: Only considering one side + - title: Only Considering One Side description: | - Water level is determined by BOTH sides. 
You need to track maximum from left AND right. + Water level at any position depends on BOTH the left maximum and right maximum. If you only track one side, you'll compute incorrect water levels. + + The two-pointer approach cleverly tracks both sides by processing the limiting side first. wrong_approach: "Only tracking left_max" correct_approach: "Track both left_max and right_max" - - title: Counting bars instead of water + - title: Counting Bar Height as Water description: | - Water trapped at position i is max_height - height[i], not max_height. - The bar itself takes up space. + Water trapped at position i is `max_height - height[i]`, not just `max_height`. The bar itself occupies space and can't hold water. + wrong_approach: "water += left_max" + correct_approach: "water += left_max - height[left]" - - title: Not updating max heights + - title: Not Updating Max Before Computing Water description: | - Update left_max or right_max before calculating water, not after. + Update `left_max` or `right_max` **before** computing water. If the current bar is taller than the previous max, no water is trapped there (it's a new "wall"). + + The code should check: if current >= max, update max; else compute water. 
+ wrong_approach: "Computing water, then updating max" + correct_approach: "if height >= max: max = height; else: water += max - height" key_takeaways: - - Two pointers eliminate need for O(n) precomputation - - Water level = min(left_max, right_max) - current_height - - Always process the side with smaller max (guaranteed bound) - - This can also be solved with monotonic stack or DP + - "**Water level = min(left_max, right_max)**: The shorter wall determines the water level" + - "**Two pointers eliminate precomputation**: No need to precompute left_max and right_max arrays" + - "**Process the limiting side**: If left is shorter, it's bounded by left_max; process it and move inward" + - "**Multiple approaches exist**: DP (precompute arrays), monotonic stack, and two pointers all work" - time_complexity: "O(n)" - space_complexity: "O(1)" - complexity_explanation: | - Time: Single pass with two pointers. - Space: Only a few variables for pointers and max values. + time_complexity: "O(n). Single pass with two pointers, processing each position once." + space_complexity: "O(1). Only a few variables for pointers and maximum values." solutions: - - approach_name: Two Pointers (Optimal) + - approach_name: Two Pointers is_optimal: true code: | def trap(height: list[int]) -> int: @@ -86,12 +120,16 @@ solutions: while left < right: if height[left] < height[right]: + # Left side is the limiting factor if height[left] >= left_max: + # New wall — update max, no water here left_max = height[left] else: + # Valley — water trapped up to left_max water += left_max - height[left] left += 1 else: + # Right side is the limiting factor if height[right] >= right_max: right_max = height[right] else: @@ -100,30 +138,38 @@ solutions: return water explanation: | - Process from both ends. Move the pointer with smaller max height. - Add water based on the difference between max height and current height. + **Time Complexity:** O(n) — Single pass through the array. 
+ + **Space Complexity:** O(1) — Only constant extra space. + + We process from both ends, always moving the pointer on the shorter side. If the current height exceeds the running max, it becomes the new max (a wall). Otherwise, water is trapped equal to the difference between max and current height. The key insight: processing the shorter side first guarantees correct water calculation. - approach_name: Monotonic Stack is_optimal: false code: | def trap(height: list[int]) -> int: - stack = [] # stores indices + stack = [] # Stores indices of bars in decreasing height water = 0 for i, h in enumerate(height): + # Pop shorter bars and calculate water in the valley while stack and h > height[stack[-1]]: - top = stack.pop() + bottom = stack.pop() if not stack: - break + break # No left boundary + # Calculate water in this layer width = i - stack[-1] - 1 - bounded_height = min(h, height[stack[-1]]) - height[top] + bounded_height = min(h, height[stack[-1]]) - height[bottom] water += width * bounded_height stack.append(i) return water explanation: | - Stack stores indices of bars in decreasing height order. - When a taller bar is found, calculate water trapped in the "valley". + **Time Complexity:** O(n) — Each index pushed and popped at most once. + + **Space Complexity:** O(n) — Stack can hold up to n indices. + + The stack maintains bars in decreasing order. When we encounter a taller bar, we pop shorter bars and calculate water trapped in the "valley" between the current bar and the previous taller bar on the stack. Water is computed layer by layer, horizontally. diff --git a/backend/data/questions/word-search.yaml b/backend/data/questions/word-search.yaml index c1e65c6..e711082 100644 --- a/backend/data/questions/word-search.yaml +++ b/backend/data/questions/word-search.yaml @@ -11,108 +11,160 @@ patterns: - dfs description: | - Given an m x n grid of characters `board` and a string `word`, return true if `word` exists - in the grid. 
+ Given an `m × n` grid of characters `board` and a string `word`, return `true` if `word` exists in the grid. - The word can be constructed from letters of sequentially adjacent cells, where adjacent cells - are horizontally or vertically neighboring. The same letter cell may not be used more than once. + The word can be constructed from letters of sequentially **adjacent** cells, where adjacent cells are horizontally or vertically neighboring. The same letter cell may **not be used more than once**. constraints: | - - m == board.length - - n == board[i].length - - 1 <= m, n <= 6 - - 1 <= word.length <= 15 - - board and word consist of only lowercase and uppercase English letters + - `m == board.length` + - `n == board[i].length` + - `1 <= m, n <= 6` + - `1 <= word.length <= 15` + - `board` and `word` consist of only lowercase and uppercase English letters examples: - input: 'board = [["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], word = "ABCCED"' output: "true" - explanation: "Path exists starting from top-left corner." + explanation: "Path: A(0,0) → B(0,1) → C(0,2) → C(1,2) → E(2,2) → D(2,1)" - input: 'board = [["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], word = "SEE"' output: "true" - explanation: "Path exists." + explanation: "Path: S(1,3) → E(2,3) → E(2,2)" - input: 'board = [["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], word = "ABCB"' output: "false" - explanation: "Would need to reuse 'B' cell." + explanation: "Would require reusing the 'B' cell at (0,1)." explanation: - approach: | - 1. For each cell, try to start the word from there - 2. Use DFS with backtracking to explore all paths - 3. Mark cells as visited during exploration - 4. Unmark cells when backtracking (restore state) - 5. If entire word is matched, return true - intuition: | - This is a classic backtracking problem. We explore paths character by character, - and if we reach a dead end (no valid next character), we backtrack and try a - different direction. 
+ Imagine walking through a maze of letters, trying to spell out a word. At each step, you can move up, down, left, or right to an adjacent cell. But there's a rule: you can't step on the same cell twice. - The key is marking cells as visited during exploration to avoid reusing them, - then unmarking when we backtrack to allow other paths to use them. + This is a classic **backtracking** problem. We try a path, and if it leads to a dead end (wrong character or no valid moves), we **backtrack** — undo our steps and try a different direction. + + Think of it like this: + 1. Start from any cell that matches the first character + 2. From there, try to find the second character in any adjacent cell + 3. Mark cells as "visited" to prevent reuse + 4. If we hit a dead end, **unmark** the cell and try another path + 5. If we match all characters, we've found the word! + + The key insight is that backtracking requires **restoring state** after each failed attempt. + + approach: | + We solve this using **DFS with Backtracking**: + + **Step 1: Try every cell as a starting point** + + - Iterate through all cells in the grid + - For each cell, attempt to find the word starting there + +   + + **Step 2: Define the DFS function** + + - `dfs(row, col, index)` returns True if we can find `word[index:]` starting from `(row, col)` + - Base case: if `index == len(word)`, we've matched everything — return True + - Boundary check: if out of bounds, return False + - Character check: if `board[row][col] != word[index]`, return False + +   + + **Step 3: Mark, explore, and unmark** + + - **Mark**: Temporarily change `board[row][col]` to `'#'` to prevent reuse + - **Explore**: Recursively check all four directions with `index + 1` + - **Unmark**: Restore `board[row][col]` to its original value (backtrack) + +   + + **Step 4: Return result** + + - If any DFS call returns True, the word exists + - If all starting points fail, return False + +   + + The unmarking step is crucial — it allows 
other paths to use the same cell. common_pitfalls: - - title: Not restoring visited state + - title: Not Restoring Visited State description: | - After exploring a path, you must unmark the cell as visited. - Otherwise, other paths from earlier cells can't use it. - wrong_approach: "Only marking, never unmarking" - correct_approach: "Mark before recursion, unmark after" + After exploring a path, you **must** restore the cell's original value. Otherwise, other paths can't use that cell. - - title: Modifying board permanently - description: | - If you change board[r][c] to mark as visited, restore it after backtracking. + ```python + # WRONG: Cell stays marked forever + board[r][c] = '#' + result = dfs(...) + return result - - title: Checking word completion too late + # RIGHT: Restore after exploring + board[r][c] = '#' + result = dfs(...) + board[r][c] = original_value # Backtrack! + return result + ``` + wrong_approach: "Only marking cells, never restoring" + correct_approach: "Store original value, mark, explore, restore" + + - title: Checking Word Completion Too Late description: | - Check if entire word is matched (index == len(word)) at the start of DFS, - before any bounds/character checks. + Check if `index == len(word)` **before** bounds and character checks. Otherwise, when we've matched all characters, we might return False due to being "out of bounds" at the next position. + wrong_approach: "Checking bounds/character before word completion" + correct_approach: "if index == len(word): return True # Check first!" + + - title: Not Trying All Directions + description: | + You must explore all four directions: up, down, left, right. Missing any direction means missing potential valid paths. 
+ + Use short-circuit OR: `dfs(r+1,c) or dfs(r-1,c) or dfs(r,c+1) or dfs(r,c-1)` + wrong_approach: "Only checking some directions" + correct_approach: "Explore all four orthogonal directions" key_takeaways: - - Backtracking = DFS with state restoration - - Mark and unmark visited cells around recursive calls - - Early termination when full word is found - - Grid constraints allow brute force (small board size) + - "**Backtracking = DFS + state restoration**: Mark before recursion, unmark after" + - "**Early termination**: Return True as soon as the word is found" + - "**In-place marking**: Using `'#'` to mark cells avoids extra space for a visited set" + - "**Small constraints enable brute force**: With m, n ≤ 6 and word ≤ 15, exponential exploration is acceptable" - time_complexity: "O(m × n × 3^L)" - space_complexity: "O(L)" - complexity_explanation: | - Time: Start from each cell, explore up to 3 directions (not the one we came from) for L characters. - Space: Recursion depth is at most word length L. + time_complexity: "O(m × n × 3^L). We try each cell as a start, and from each cell, we explore up to 3 directions (excluding where we came from) for L characters." + space_complexity: "O(L). The recursion stack depth equals the word length L." 
solutions: - - approach_name: DFS with Backtracking (Optimal) + - approach_name: DFS with Backtracking is_optimal: true code: | def exist(board: list[list[str]], word: str) -> bool: rows, cols = len(board), len(board[0]) def dfs(r: int, c: int, i: int) -> bool: + # Base case: found all characters if i == len(word): return True + # Boundary check if r < 0 or r >= rows or c < 0 or c >= cols: return False + + # Character mismatch if board[r][c] != word[i]: return False - # Mark as visited - temp = board[r][c] + # Mark cell as visited (temporarily) + original = board[r][c] board[r][c] = '#' - # Explore all 4 directions + # Explore all four directions found = ( - dfs(r + 1, c, i + 1) or - dfs(r - 1, c, i + 1) or - dfs(r, c + 1, i + 1) or - dfs(r, c - 1, i + 1) + dfs(r + 1, c, i + 1) or # down + dfs(r - 1, c, i + 1) or # up + dfs(r, c + 1, i + 1) or # right + dfs(r, c - 1, i + 1) # left ) - # Restore (backtrack) - board[r][c] = temp + # Restore cell (backtrack) + board[r][c] = original return found + # Try every cell as starting point for r in range(rows): for c in range(cols): if dfs(r, c, 0): @@ -120,5 +172,8 @@ solutions: return False explanation: | - Try starting from each cell. Use DFS to match characters one by one. - Mark cells temporarily, then restore when backtracking. + **Time Complexity:** O(m × n × 3^L) — Each starting cell can explore 3 directions per character. + + **Space Complexity:** O(L) — Recursion depth equals word length. + + We try each cell as a starting point. DFS matches characters one by one, marking cells to prevent reuse. After exploring, we restore the cell's value (backtrack) to allow other paths to use it. Short-circuit OR provides early termination.
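For contrast with the in-place `'#'` marking, the same backtracking can be sketched with an explicit visited set. This is an illustrative variant, not the solution in this change: `exist_with_visited_set` is a hypothetical name, and the sketch assumes a non-empty rectangular board.

```python
def exist_with_visited_set(board: list[list[str]], word: str) -> bool:
    """Hypothetical variant of exist() that never mutates the board."""
    rows, cols = len(board), len(board[0])
    visited: set[tuple[int, int]] = set()  # Cells on the current path

    def dfs(r: int, c: int, i: int) -> bool:
        # Base case first: all characters matched
        if i == len(word):
            return True
        # Boundary check
        if not (0 <= r < rows and 0 <= c < cols):
            return False
        # Reject reused cells and character mismatches
        if (r, c) in visited or board[r][c] != word[i]:
            return False

        visited.add((r, c))  # Mark
        found = (
            dfs(r + 1, c, i + 1)
            or dfs(r - 1, c, i + 1)
            or dfs(r, c + 1, i + 1)
            or dfs(r, c - 1, i + 1)
        )
        visited.remove((r, c))  # Unmark (backtrack)
        return found

    # Try every cell as a starting point
    return any(dfs(r, c, 0) for r in range(rows) for c in range(cols))


board = [["A", "B", "C", "E"], ["S", "F", "C", "S"], ["A", "D", "E", "E"]]
print(exist_with_visited_set(board, "ABCCED"))  # True
print(exist_with_visited_set(board, "ABCB"))    # False
```

The set holds at most L cells at a time, so the extra space stays O(L) alongside the recursion stack. In exchange for that extra storage, the input board is never written to, which may matter if callers share it.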