diff --git a/backend/data/questions/find-in-mountain-array.yaml b/backend/data/questions/find-in-mountain-array.yaml new file mode 100644 index 0000000..a07036e --- /dev/null +++ b/backend/data/questions/find-in-mountain-array.yaml @@ -0,0 +1,248 @@ +title: Find in Mountain Array +slug: find-in-mountain-array +difficulty: hard +leetcode_id: 1095 +leetcode_url: https://leetcode.com/problems/find-in-mountain-array/ +categories: + - arrays + - binary-search +patterns: + - binary-search + +description: | + *(This problem is an **interactive problem**.)* + + You may recall that an array `arr` is a **mountain array** if and only if: + + - `arr.length >= 3` + - There exists some `i` with `0 < i < arr.length - 1` such that: + - `arr[0] < arr[1] < ... < arr[i - 1] < arr[i]` + - `arr[i] > arr[i + 1] > ... > arr[arr.length - 1]` + + Given a mountain array `mountainArr`, return the **minimum** index such that `mountainArr.get(index) == target`. If such an index does not exist, return `-1`. + + **You cannot access the mountain array directly.** You may only access the array using a `MountainArray` interface: + + - `MountainArray.get(k)` returns the element of the array at index `k` (0-indexed). + - `MountainArray.length()` returns the length of the array. + + Submissions making more than `100` calls to `MountainArray.get` will be judged *Wrong Answer*. Also, any solutions that attempt to circumvent the judge will result in disqualification. + +constraints: | + - `3 <= mountainArr.length() <= 10^4` + - `0 <= target <= 10^9` + - `0 <= mountainArr.get(index) <= 10^9` + +examples: + - input: "mountainArr = [1,2,3,4,5,3,1], target = 3" + output: "2" + explanation: "3 exists in the array, at index=2 and index=5. Return the minimum index, which is 2." + - input: "mountainArr = [0,1,2,4,2,1], target = 3" + output: "-1" + explanation: "3 does not exist in the array, so we return -1." + +explanation: + intuition: | + Imagine a mountain with a single peak. 
You're standing at the base and need to find a specific elevation marker — but you can only check the elevation at a limited number of points (100 checks maximum). + + The key insight is that a mountain array is actually **two sorted arrays joined at the peak**: the left side is strictly increasing, and the right side is strictly decreasing. This structure is perfect for binary search! + + Think of it like this: if we can find the peak, we've essentially split the problem into two simpler binary searches: + 1. Search the ascending left side (standard binary search) + 2. If not found, search the descending right side (reversed binary search) + + But here's the crucial detail: we want the **minimum index**. Since the left side has smaller indices than the right side, we should search the left side first. If we find the target there, we're done — no need to check the right side. + + The challenge is that each `get()` call is expensive (limited to 100 total), so we must use binary search for all three operations: finding the peak and searching both sides. 
+ + approach: | + We solve this using **Three Binary Searches**: + + **Step 1: Find the peak index** + + - Use binary search to locate the peak (maximum element) + - At each midpoint, compare `arr[mid]` with `arr[mid + 1]` + - If `arr[mid] < arr[mid + 1]`, we're on the ascending side — peak is to the right + - If `arr[mid] > arr[mid + 1]`, we're on the descending side or at the peak — search left, keeping `mid` as a candidate + - When `left == right`, we've found the peak +   + + **Step 2: Binary search the ascending (left) side** + + - Search from index `0` to `peak` using standard binary search + - If `arr[mid] < target`, move right; if `arr[mid] > target`, move left + - If found, return immediately (this guarantees minimum index) +   + + **Step 3: Binary search the descending (right) side** + + - Only if not found on the left side + - Search from index `peak + 1` to `n - 1` + - Since this side is **decreasing**, the comparisons are reversed: + - If `arr[mid] > target`, move right (smaller values are to the right) + - If `arr[mid] < target`, move left +   + + **Step 4: Return the result** + + - If found on either side, return that index + - Otherwise, return `-1` + + common_pitfalls: + - title: Exceeding the Call Limit + description: | + With at most `100` calls to `MountainArray.get()` and array length up to `10^4`, a linear scan is not an option. + + Each binary search takes about `log2(10^4) ≈ 14` iterations. Peak finding makes two `get()` calls per iteration (`mid` and `mid + 1`), while each side search makes one, so the total is roughly `2 * 14 + 14 + 14 = 56` calls, well within the limit. Caching values you've already fetched can reduce redundant calls further. + wrong_approach: "Linear scan or excessive get() calls" + correct_approach: "Three binary searches with O(log n) calls each" + + - title: Forgetting to Search Left Side First + description: | + The problem asks for the **minimum index**. If the target appears on both the ascending and descending sides (like `3` in `[1,2,3,4,5,3,1]`), you must return the smaller index. + + Always search the left (ascending) side first and return immediately if found. 
Only search the right side if the left search fails. + wrong_approach: "Searching right side first or both sides without priority" + correct_approach: "Search ascending side first, return immediately if found" + + - title: Incorrect Binary Search Direction on Descending Side + description: | + The descending (right) side of the mountain is sorted in **reverse order**. Standard binary search logic must be inverted: + + - In ascending order: `arr[mid] < target` means move right + - In descending order: `arr[mid] < target` means move **left** (larger values are to the left) + + Mixing up these directions causes incorrect results. + wrong_approach: "Using same comparison logic for both sides" + correct_approach: "Invert comparisons for the descending side" + + - title: Off-by-One in Peak Finding + description: | + When finding the peak, be careful with boundary conditions. The peak can never be at index `0` or `n-1` (by definition of mountain array), so you can initialise `left = 1` and `right = n - 2`. + + Starting from `left = 0` and `right = n - 1` is also safe: with `while left < right` and a floor midpoint, `mid` is always strictly less than `right`, so `mid + 1` stays within bounds. + + key_takeaways: + - "**Decompose the problem**: A mountain array is two sorted subarrays — find the peak first, then binary search each half" + - "**Binary search on structure**: When data has a predictable structure (sorted, bitonic, rotated), binary search can dramatically reduce search time" + - "**Order matters for ties**: When finding minimum/maximum index, search the appropriate half first to short-circuit early" + - "**Interactive problems**: Limited API calls force O(log n) solutions — linear scans are not acceptable" + + time_complexity: "O(log n). Three binary searches, each taking O(log n) time in the worst case." + space_complexity: "O(1). We only use a constant number of variables for indices and bounds." 
+ +solutions: + - approach_name: Triple Binary Search + is_optimal: true + code: | + # MountainArray interface is provided by the judge: + # class MountainArray: + # def get(self, index: int) -> int: ... + # def length(self) -> int: ... + + class Solution: + def findInMountainArray(self, target: int, mountain_arr: 'MountainArray') -> int: + n = mountain_arr.length() + + # Step 1: Find the peak index using binary search + left, right = 0, n - 1 + while left < right: + mid = (left + right) // 2 + # If the value at mid is less than the value at mid + 1, peak is to the right + if mountain_arr.get(mid) < mountain_arr.get(mid + 1): + left = mid + 1 + else: + # Peak is at mid or to the left + right = mid + peak = left + + # Step 2: Binary search on ascending (left) side [0, peak] + left, right = 0, peak + while left <= right: + mid = (left + right) // 2 + val = mountain_arr.get(mid) + if val == target: + return mid # Found on left side = minimum index + elif val < target: + left = mid + 1 + else: + right = mid - 1 + + # Step 3: Binary search on descending (right) side [peak+1, n-1] + left, right = peak + 1, n - 1 + while left <= right: + mid = (left + right) // 2 + val = mountain_arr.get(mid) + if val == target: + return mid + # Descending order: larger values on left, smaller on right + elif val > target: + left = mid + 1 # Move right to find smaller values + else: + right = mid - 1 # Move left to find larger values + + return -1 # Target not found in either half + explanation: | + **Time Complexity:** O(log n) — Three binary searches, each O(log n). + + **Space Complexity:** O(1) — Only constant extra space for variables. + + We first locate the peak using binary search by comparing adjacent elements. Then we search the ascending left side with standard binary search. If not found, we search the descending right side with inverted comparisons. Peak finding makes two `get()` calls per iteration and each side search makes one, so the total is roughly `2 * log2(n) + log2(n) + log2(n) = 4 * log2(10^4) ≈ 4 * 14 = 56` calls, well within the 100-call limit. 
+ + - approach_name: Triple Binary Search with Caching + is_optimal: false + code: | + class Solution: + def findInMountainArray(self, target: int, mountain_arr: 'MountainArray') -> int: + n = mountain_arr.length() + cache = {} # Cache to avoid redundant get() calls + + def get(i: int) -> int: + if i not in cache: + cache[i] = mountain_arr.get(i) + return cache[i] + + # Find peak + left, right = 0, n - 1 + while left < right: + mid = (left + right) // 2 + if get(mid) < get(mid + 1): + left = mid + 1 + else: + right = mid + peak = left + + # Search ascending side + left, right = 0, peak + while left <= right: + mid = (left + right) // 2 + val = get(mid) + if val == target: + return mid + elif val < target: + left = mid + 1 + else: + right = mid - 1 + + # Search descending side + left, right = peak + 1, n - 1 + while left <= right: + mid = (left + right) // 2 + val = get(mid) + if val == target: + return mid + elif val > target: + left = mid + 1 + else: + right = mid - 1 + + return -1 + explanation: | + **Time Complexity:** O(log n) — Same as the optimal solution. + + **Space Complexity:** O(log n) — Cache stores at most O(log n) values. + + This variation adds a cache dictionary to avoid redundant `get()` calls. While the asymptotic complexity is the same, caching can reduce the actual number of API calls when the same index is accessed multiple times (e.g., the peak index might be checked during both the peak-finding phase and the left-side search). This is a practical optimisation for interactive problems with strict call limits. 
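To see how many `get()` calls the two variants actually make, the sketch below wraps a plain list in a counting test double. `CountingMountainArray`, `find_in_mountain_array`, its `use_cache` flag and `count_calls` are hypothetical names introduced for this illustration only; the judge's real `MountainArray` is not available locally, so this is an approximation under that assumption rather than the judge's behaviour.

```python
class CountingMountainArray:
    """Hypothetical stand-in for the judge's MountainArray; counts get() calls."""

    def __init__(self, data):
        self._data = list(data)
        self.calls = 0

    def get(self, index: int) -> int:
        self.calls += 1
        return self._data[index]

    def length(self) -> int:
        return len(self._data)


def find_in_mountain_array(target, mountain_arr, use_cache=False):
    """Same three binary searches as the solutions above, with caching as a flag."""
    n = mountain_arr.length()
    cache = {}

    def get(i):
        if not use_cache:
            return mountain_arr.get(i)
        if i not in cache:
            cache[i] = mountain_arr.get(i)
        return cache[i]

    # Find the peak: two reads per iteration (mid and mid + 1)
    left, right = 0, n - 1
    while left < right:
        mid = (left + right) // 2
        if get(mid) < get(mid + 1):
            left = mid + 1
        else:
            right = mid
    peak = left

    # Ascending side first, then descending side with reversed comparisons
    for lo, hi, ascending in ((0, peak, True), (peak + 1, n - 1, False)):
        while lo <= hi:
            mid = (lo + hi) // 2
            val = get(mid)
            if val == target:
                return mid
            if (val < target) == ascending:
                lo = mid + 1  # target lies further right on this side
            else:
                hi = mid - 1  # target lies further left on this side
    return -1


def count_calls(data, target, use_cache):
    """Run one search and report (result index, number of get() calls)."""
    arr = CountingMountainArray(data)
    index = find_in_mountain_array(target, arr, use_cache)
    return index, arr.calls


# A 10^4-element mountain peaking at index 6000; target 0 is absent,
# so both side searches run to completion.
data = [10_000 - abs(i - 6_000) for i in range(10_000)]
print("without cache:", count_calls(data, 0, use_cache=False))
print("with cache:   ", count_calls(data, 0, use_cache=True))
```

Because the cached variant follows the same control flow and only skips repeated reads, its call count can never exceed that of the uncached variant.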
diff --git a/backend/data/questions/find-k-closest-elements.yaml b/backend/data/questions/find-k-closest-elements.yaml new file mode 100644 index 0000000..9a7be66 --- /dev/null +++ b/backend/data/questions/find-k-closest-elements.yaml @@ -0,0 +1,189 @@ +title: Find K Closest Elements +slug: find-k-closest-elements +difficulty: medium +leetcode_id: 658 +leetcode_url: https://leetcode.com/problems/find-k-closest-elements/ +categories: + - arrays + - binary-search + - two-pointers +patterns: + - binary-search + - two-pointers + +description: | + Given a **sorted** integer array `arr`, two integers `k` and `x`, return the `k` closest integers to `x` in the array. The result should also be sorted in ascending order. + + An integer `a` is closer to `x` than an integer `b` if: + + - `|a - x| < |b - x|`, or + - `|a - x| == |b - x|` and `a < b` + +constraints: | + - `1 <= k <= arr.length` + - `1 <= arr.length <= 10^4` + - `arr` is sorted in **ascending** order + - `-10^4 <= arr[i], x <= 10^4` + +examples: + - input: "arr = [1,2,3,4,5], k = 4, x = 3" + output: "[1,2,3,4]" + explanation: "Elements 2, 3 and 4 are within distance 1 of x=3. Elements 1 and 5 are both at distance 2, so the tie goes to the smaller value and 1 is kept." + - input: "arr = [1,1,2,3,4,5], k = 4, x = -1" + output: "[1,1,2,3]" + explanation: "Every element lies to the right of x=-1, so the four smallest values are the closest." + +explanation: + intuition: | + Imagine you're standing at position `x` on a number line, and the sorted array represents points along that line. You need to find the `k` points closest to where you're standing. + + The key insight is that the answer is always a **contiguous subarray** of length `k`. Why? Because the array is sorted! If the elements at indices `i` and `j` (with `j > i + 1`) are both in your answer, then every element between them is at least as close to `x` as the farther of the two, so it belongs in the answer too. 
+ + Think of it like this: you're looking for a **sliding window** of size `k` that captures the `k` closest elements. The question becomes: where should this window start? + + Instead of searching for elements, we can **binary search for the left boundary** of this window. For any starting position, we compare whether the left edge or the element just past the right edge is further from `x`. This tells us whether to move the window left or right. + + approach: | + We solve this using **Binary Search for Window Start**: + + **Step 1: Define the search space** + + - We're searching for the starting index of a window of size `k` + - The starting index can range from `0` to `len(arr) - k` + - Set `left = 0`, `right = len(arr) - k` + +   + + **Step 2: Binary search for optimal start position** + + - While `left < right`: + - Calculate `mid = left + (right - left) // 2` + - Compare `x - arr[mid]` with `arr[mid + k] - x` + - If `x - arr[mid] > arr[mid + k] - x`: + - The left edge is further from `x` than the element just past the right edge + - Move the window right: `left = mid + 1` + - Else: + - The left edge is closer (or equal), keep it as a candidate + - `right = mid` + +   + + **Step 3: Return the window** + + - Return `arr[left:left + k]` + +   + + Why compare `x - arr[mid]` instead of using absolute value? When the left edge is to the left of `x`, `x - arr[mid]` gives the distance. When the right edge past the window is to the right of `x`, `arr[mid + k] - x` gives that distance. This comparison tells us which side should be excluded. + + common_pitfalls: + - title: Sorting with Custom Key + description: | + A common first approach is to sort the array by distance to `x`: + ```python + sorted(arr, key=lambda a: (abs(a - x), a))[:k] + ``` + + This works but has **O(n log n)** time complexity. Since the array is already sorted, we can do better with O(log n + k) using binary search. 
+ wrong_approach: "Sort by distance, take first k" + correct_approach: "Binary search for window start position" + + - title: Using Absolute Values in Comparison + description: | + When comparing distances during binary search, using `abs(arr[mid] - x)` vs `abs(arr[mid + k] - x)` can lead to subtle bugs. + + The comparison `x - arr[mid] > arr[mid + k] - x` works because: + - If the two values straddle `x`, both sides are the true distances, and a tie keeps the smaller (left) element + - If both values lie on the same side of `x`, the signs still point the window toward `x`; this matters when `arr[mid] == arr[mid + k]` because of duplicates, where the absolute distances tie and give no direction + + For example, with `arr = [1,1,2,2,2,2,2,3,3]`, `k = 3`, `x = 3`, the `abs()` version compares `2` against `2` at `mid = 3`, sees a tie, and moves left, ending on `[2,2,2]` instead of the correct window `[2,3,3]`. + wrong_approach: "abs(arr[mid] - x) vs abs(arr[mid + k] - x)" + correct_approach: "x - arr[mid] vs arr[mid + k] - x" + + - title: Wrong Search Space Bounds + description: | + The right bound must be `len(arr) - k`, not `len(arr) - 1`. We're searching for the *start* of a window of size `k`, so the maximum valid start index is `n - k`. + + If `arr = [1,2,3,4,5]` and `k = 3`, valid start indices are 0, 1, 2 (giving windows [1,2,3], [2,3,4], [3,4,5]). + wrong_approach: "right = len(arr) - 1" + correct_approach: "right = len(arr) - k" + + key_takeaways: + - "**Contiguous subarray insight**: In a sorted array, the k closest elements form a contiguous window" + - "**Binary search for boundaries**: Instead of searching for elements, search for the optimal window position" + - "**Comparison without abs()**: When comparing distances on opposite sides, signed arithmetic handles it correctly" + - "**Foundation for window problems**: This technique extends to other problems about finding optimal subarrays in sorted data" + + time_complexity: "O(log(n - k) + k). Binary search takes O(log(n - k)), and returning the slice takes O(k)." + space_complexity: "O(k). The returned list contains k elements. The binary search itself uses O(1) extra space." 
+ +solutions: + - approach_name: Binary Search for Window Start + is_optimal: true + code: | + def find_closest_elements(arr: list[int], k: int, x: int) -> list[int]: + # Search for the starting index of the k-element window + left, right = 0, len(arr) - k + + while left < right: + mid = left + (right - left) // 2 + + # Compare left edge distance vs element just past right edge + if x - arr[mid] > arr[mid + k] - x: + # Left edge is further, move window right + left = mid + 1 + else: + # Left edge is closer (or equal), keep as candidate + right = mid + + # Return the k-element window starting at left + return arr[left:left + k] + explanation: | + **Time Complexity:** O(log(n - k) + k) — Binary search over n - k + 1 positions, plus slicing k elements. + + **Space Complexity:** O(k) — Output array of k elements. + + We binary search for the optimal starting position of a window of size k. The comparison `x - arr[mid] > arr[mid + k] - x` determines if the left boundary or the element just past the right boundary is further from x. This guides us toward the optimal window. + + - approach_name: Two Pointers (Shrinking Window) + is_optimal: false + code: | + def find_closest_elements(arr: list[int], k: int, x: int) -> list[int]: + left, right = 0, len(arr) - 1 + + # Shrink window until it has exactly k elements + while right - left >= k: + # Compare distances of left and right edges to x + if abs(arr[left] - x) > abs(arr[right] - x): + # Left edge is further, exclude it + left += 1 + else: + # Right edge is further (or equal), exclude it + # Equal case: prefer smaller value (left), so exclude right + right -= 1 + + return arr[left:right + 1] + explanation: | + **Time Complexity:** O(n - k) — We shrink the window n - k times. + + **Space Complexity:** O(k) — Output array of k elements. + + Start with the full array and repeatedly remove the element furthest from x until k elements remain. 
When distances are equal, remove the larger (right) element to satisfy the tie-breaking rule. Simpler to understand than binary search but slower for small k. + + - approach_name: Sort by Distance + is_optimal: false + code: | + def find_closest_elements(arr: list[int], k: int, x: int) -> list[int]: + # Sort by distance to x, then by value for tie-breaking + sorted_arr = sorted(arr, key=lambda a: (abs(a - x), a)) + + # Take k closest and sort by value for output + result = sorted(sorted_arr[:k]) + + return result + explanation: | + **Time Complexity:** O(n log n) — Sorting dominates. + + **Space Complexity:** O(n) — Sorted copy of the array. + + Sort all elements by their distance to x (with value as tie-breaker), take the first k, then sort again by value for the output. This ignores the fact that the input is already sorted, making it less efficient than the binary search approach. diff --git a/backend/data/questions/find-median-from-data-stream.yaml b/backend/data/questions/find-median-from-data-stream.yaml new file mode 100644 index 0000000..3f3ec61 --- /dev/null +++ b/backend/data/questions/find-median-from-data-stream.yaml @@ -0,0 +1,200 @@ +title: Find Median from Data Stream +slug: find-median-from-data-stream +difficulty: hard +leetcode_id: 295 +leetcode_url: https://leetcode.com/problems/find-median-from-data-stream/ +categories: + - heap + - sorting +patterns: + - heap + +description: | + The **median** is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values. + + - For example, for `arr = [2, 3, 4]`, the median is `3`. + - For example, for `arr = [2, 3]`, the median is `(2 + 3) / 2 = 2.5`. + + Implement the `MedianFinder` class: + + - `MedianFinder()` initialises the `MedianFinder` object. + - `void addNum(int num)` adds the integer `num` from the data stream to the data structure. + - `double findMedian()` returns the median of all elements so far. 
Answers within `10^-5` of the actual answer will be accepted. + +constraints: | + - `-10^5 <= num <= 10^5` + - There will be at least one element in the data structure before calling `findMedian`. + - At most `5 * 10^4` calls will be made to `addNum` and `findMedian`. + +examples: + - input: | + ["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"] + [[], [1], [2], [], [3], []] + output: "[null, null, null, 1.5, null, 2.0]" + explanation: | + MedianFinder medianFinder = new MedianFinder(); + medianFinder.addNum(1); // arr = [1] + medianFinder.addNum(2); // arr = [1, 2] + medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2) + medianFinder.addNum(3); // arr = [1, 2, 3] + medianFinder.findMedian(); // return 2.0 + +explanation: + intuition: | + Imagine you're watching numbers flow by on a conveyor belt, and at any moment someone might ask: "What's the median of all numbers you've seen so far?" + + The naive approach would be to keep a sorted list and insert each new number in its correct position. But insertion into a sorted list takes O(n) time, which becomes too slow with many operations. + + Here's the key insight: **you don't need the entire sorted list to find the median**. You only need quick access to the middle element(s). Think of splitting the numbers into two halves: + + - The **smaller half** — all numbers less than or equal to the median + - The **larger half** — all numbers greater than or equal to the median + + If you had instant access to the **maximum of the smaller half** and the **minimum of the larger half**, you could compute the median immediately. This is exactly what two heaps provide: + + - A **max-heap** for the smaller half (gives you the largest of the small numbers) + - A **min-heap** for the larger half (gives you the smallest of the large numbers) + + By keeping these heaps balanced (differing in size by at most 1), the median is always at the top of one or both heaps. 
+ + approach: | + We solve this using the **Two Heaps** pattern: + + **Step 1: Initialise two heaps** + + - `max_heap`: A max-heap to store the smaller half of numbers (in Python, we negate values since `heapq` is a min-heap) + - `min_heap`: A min-heap to store the larger half of numbers + - We maintain the invariant: `len(max_heap) >= len(min_heap)` and they differ by at most 1 + +   + + **Step 2: Adding a number** + + - First, add the new number to `max_heap` (the smaller half) + - Then, move the largest from `max_heap` to `min_heap` to ensure all elements in `max_heap` are smaller than those in `min_heap` + - If `min_heap` becomes larger than `max_heap`, move one element back to balance + + This "add-then-balance" approach ensures both heaps stay balanced and maintain the correct ordering. + +   + + **Step 3: Finding the median** + + - If total count is odd: the median is the top of `max_heap` (the larger heap) + - If total count is even: the median is the average of both heap tops + +   + + This approach guarantees O(log n) insertion and O(1) median retrieval. + + common_pitfalls: + - title: Sorted List Insertion Trap + description: | + A tempting first approach is to maintain a sorted list using binary search insertion: + - Use `bisect.insort()` to insert each number in O(log n) search time + - But the actual insertion into the list still takes O(n) time due to shifting elements + + With up to `5 * 10^4` operations, this O(n) insertion leads to O(n^2) total time, which may cause TLE. + wrong_approach: "Sorted list with binary search insertion" + correct_approach: "Two heaps for O(log n) insertion" + + - title: Single Heap Mistake + description: | + You might think one heap is enough — just keep all elements and find the middle. But heaps only give you efficient access to one extreme (min or max), not the middle. + + Finding the median in a single heap requires removing half the elements, which is O(n log n) per query. 
+ wrong_approach: "Single heap with repeated extraction" + correct_approach: "Two heaps splitting at the median" + + - title: Heap Imbalance + description: | + If the heaps become unbalanced (size difference > 1), the median calculation breaks. For example, if `max_heap` has 5 elements and `min_heap` has 2, the top of `max_heap` is not the median. + + Always rebalance after each insertion to maintain the invariant: `0 <= len(max_heap) - len(min_heap) <= 1`. + wrong_approach: "Inserting without rebalancing" + correct_approach: "Rebalance heaps after every insertion" + + - title: Python Heap Negation + description: | + Python's `heapq` module only provides a min-heap. To simulate a max-heap, you must negate values when pushing and negate again when popping. + + Forgetting to negate leads to incorrect ordering — you'd get the minimum of the smaller half instead of the maximum. + wrong_approach: "Using heapq as max-heap without negation" + correct_approach: "Negate values: push -x, pop and negate result" + + key_takeaways: + - "**Two Heaps pattern**: Split data at the median using a max-heap for the lower half and min-heap for the upper half" + - "**Streaming data structure**: This design handles continuous data with O(log n) updates and O(1) queries" + - "**Heap balancing invariant**: Keep heap sizes within 1 of each other to ensure the median is always accessible at the tops" + - "**Foundation for variations**: This technique extends to finding other percentiles or handling weighted medians" + + time_complexity: "O(log n) per `addNum` call due to heap insertion and rebalancing. O(1) per `findMedian` call since we only access heap tops." + space_complexity: "O(n) where n is the total number of elements added, as all elements are stored across the two heaps." 
+ +solutions: + - approach_name: Two Heaps + is_optimal: true + code: | + import heapq + + class MedianFinder: + def __init__(self): + # Max-heap for smaller half (store negated values) + self.max_heap = [] + # Min-heap for larger half + self.min_heap = [] + + def addNum(self, num: int) -> None: + # Always add to max_heap first (negate for max-heap behaviour) + heapq.heappush(self.max_heap, -num) + + # Move largest from max_heap to min_heap + # This ensures max_heap elements <= min_heap elements + heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap)) + + # Rebalance: max_heap should have equal or one more element + if len(self.min_heap) > len(self.max_heap): + heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap)) + + def findMedian(self) -> float: + # Odd total: median is top of max_heap + if len(self.max_heap) > len(self.min_heap): + return -self.max_heap[0] + # Even total: average of both tops + return (-self.max_heap[0] + self.min_heap[0]) / 2 + explanation: | + **Time Complexity:** O(log n) for `addNum` — each heap operation is O(log n). O(1) for `findMedian` — just accessing heap tops. + + **Space Complexity:** O(n) — storing all n elements across two heaps. + + We maintain two heaps that split the data at the median. The max-heap holds the smaller half, the min-heap holds the larger half. After each insertion, we rebalance to keep sizes within 1. The median is always accessible at the top(s) of the heaps. 
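As a usage sketch, the snippet below replays a stream through the two-heap structure and collects the median after every insertion. The `MedianFinder` class repeats the solution above so the block runs on its own; `running_medians` is a hypothetical helper added for illustration, and the comparison against `statistics.median` is only a sanity check, not part of the original solution.

```python
import heapq
import random
import statistics


class MedianFinder:
    def __init__(self):
        self.max_heap = []  # smaller half, stored negated
        self.min_heap = []  # larger half

    def addNum(self, num: int) -> None:
        # Push to the smaller half, then hand its largest value to the larger half
        heapq.heappush(self.max_heap, -num)
        heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        # Rebalance so max_heap holds the same number of elements or one more
        if len(self.min_heap) > len(self.max_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def findMedian(self) -> float:
        if len(self.max_heap) > len(self.min_heap):
            return -self.max_heap[0]
        return (-self.max_heap[0] + self.min_heap[0]) / 2


def running_medians(stream):
    """Hypothetical helper: median after each element of the stream is added."""
    finder = MedianFinder()
    medians = []
    for num in stream:
        finder.addNum(num)
        medians.append(finder.findMedian())
    return medians


# Replay the example from the problem statement
print(running_medians([1, 2, 3]))

# Sanity check against the standard library on a random stream
random.seed(0)
stream = [random.randint(-10**5, 10**5) for _ in range(500)]
expected = [statistics.median(stream[: i + 1]) for i in range(len(stream))]
print(running_medians(stream) == expected)
```

The cross-check recomputes the median from scratch for every prefix, which is slow but easy to trust; the two-heap version should agree with it at each step.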
+ + - approach_name: Sorted List with Binary Search + is_optimal: false + code: | + import bisect + + class MedianFinder: + def __init__(self): + # Maintain a sorted list of all numbers + self.nums = [] + + def addNum(self, num: int) -> None: + # Binary search to find insertion point: O(log n) + # But actual insertion shifts elements: O(n) + bisect.insort(self.nums, num) + + def findMedian(self) -> float: + n = len(self.nums) + mid = n // 2 + # Odd length: return middle element + if n % 2 == 1: + return self.nums[mid] + # Even length: return average of two middle elements + return (self.nums[mid - 1] + self.nums[mid]) / 2 + explanation: | + **Time Complexity:** O(n) for `addNum` — binary search is O(log n) but list insertion is O(n). O(1) for `findMedian` — direct index access. + + **Space Complexity:** O(n) — storing all n elements in a list. + + This approach maintains a sorted list. While conceptually simple and gives O(1) median lookup, the O(n) insertion time makes it impractical for large inputs. It's included to illustrate why heaps are necessary. diff --git a/backend/data/questions/find-minimum-in-rotated-sorted-array.yaml b/backend/data/questions/find-minimum-in-rotated-sorted-array.yaml new file mode 100644 index 0000000..4b8b822 --- /dev/null +++ b/backend/data/questions/find-minimum-in-rotated-sorted-array.yaml @@ -0,0 +1,164 @@ +title: Find Minimum in Rotated Sorted Array +slug: find-minimum-in-rotated-sorted-array +difficulty: medium +leetcode_id: 153 +leetcode_url: https://leetcode.com/problems/find-minimum-in-rotated-sorted-array/ +categories: + - arrays + - binary-search +patterns: + - binary-search + +description: | + Suppose an array of length `n` sorted in ascending order is **rotated** between `1` and `n` times. 
For example, the array `nums = [0,1,2,4,5,6,7]` might become: + + - `[4,5,6,7,0,1,2]` if it was rotated 4 times + - `[0,1,2,4,5,6,7]` if it was rotated 7 times (back to original) + + Given the sorted rotated array `nums` of **unique** elements, return *the minimum element of this array*. + + You must write an algorithm that runs in **O(log n)** time. + +constraints: | + - `n == nums.length` + - `1 <= n <= 5000` + - `-5000 <= nums[i] <= 5000` + - All the integers of `nums` are **unique** + - `nums` is sorted and rotated between 1 and n times + +examples: + - input: "nums = [3,4,5,1,2]" + output: "1" + explanation: "Original array was [1,2,3,4,5] rotated 3 times." + - input: "nums = [4,5,6,7,0,1,2]" + output: "0" + explanation: "Original array was [0,1,2,4,5,6,7] rotated 4 times." + - input: "nums = [11,13,15,17]" + output: "11" + explanation: "Array was rotated 4 times (full rotation), so minimum is first element." + +explanation: + intuition: | + Visualise a rotated sorted array: it's like taking a sorted array, cutting it somewhere, and swapping the two pieces. This creates a **pivot point** — the place where the large values suddenly drop to small values. + + For example, in `[4,5,6,7,0,1,2]`, the pivot is between 7 and 0. The minimum element is always at this pivot point! + + Think of it like this: the array has two sorted "halves". One half has larger values, the other has smaller values. The minimum is the first element of the smaller half. + + How do we find it with binary search? Compare `nums[mid]` with `nums[right]`: + - If `nums[mid] > nums[right]`: We're in the "larger" half. The pivot (minimum) must be to the right. + - If `nums[mid] <= nums[right]`: We're in the "smaller" half or at the minimum. The pivot is at `mid` or to the left. + + Why compare with `right` instead of `left`? Because comparing with `right` consistently tells us which "half" we're in, regardless of how much the array was rotated. 
+ + approach: | + We solve this using **Modified Binary Search**: + + **Step 1: Initialise pointers** + + - `left = 0`, `right = len(nums) - 1` + - The minimum must be somewhere in `[left, right]` + +   + + **Step 2: Binary search with right comparison** + + - While `left < right`: + - Calculate `mid = left + (right - left) // 2` + - If `nums[mid] > nums[right]`: + - The pivot (minimum) is in the right half + - Set `left = mid + 1` (exclude mid — it's too large) + - Else (`nums[mid] <= nums[right]`): + - The pivot is at `mid` or in the left half + - Set `right = mid` (keep mid in consideration) + +   + + **Step 3: Return the minimum** + + - When `left == right`, we've found the minimum + - Return `nums[left]` + +   + + This works because we're essentially searching for the "boundary" where the array transitions from large values to small values. + + common_pitfalls: + - title: Comparing with Left Instead of Right + description: | + Comparing `nums[mid]` with `nums[left]` doesn't work consistently. Consider `[2, 1]`: + - `mid = 0`, `nums[mid] = 2`, `nums[left] = 2` + - `nums[mid] > nums[left]` is false, but the minimum is on the right! + + Comparing with `nums[right]` works because the right element is always either in the "smaller half" (after pivot) or the array isn't rotated. + wrong_approach: "if nums[mid] > nums[left]: search right" + correct_approach: "if nums[mid] > nums[right]: search right" + + - title: Using left <= right Loop Condition + description: | + For this problem, use `while left < right`. When `left == right`, we've found the answer. Using `<=` can cause infinite loops because we're not always excluding `mid`. + wrong_approach: "while left <= right" + correct_approach: "while left < right" + + - title: Excluding mid Incorrectly + description: | + When `nums[mid] <= nums[right]`, `mid` could be the minimum! We must keep it in consideration by setting `right = mid`, not `right = mid - 1`. 
+ + When `nums[mid] > nums[right]`, we know `mid` is definitely not the minimum (it's larger than something to its right), so `left = mid + 1` is safe. + wrong_approach: "right = mid - 1 when nums[mid] <= nums[right]" + correct_approach: "right = mid (keep mid as a candidate)" + + key_takeaways: + - "**Binary search on rotated arrays**: Compare with the right boundary to determine which half contains the answer" + - "**Understanding the structure**: A rotated sorted array has two sorted segments — find the boundary between them" + - "**Careful with boundary updates**: `mid + 1` vs `mid` depends on whether mid can be the answer" + - "**Foundation for harder problems**: This technique extends to searching for any element in rotated arrays" + + time_complexity: "O(log n). Each iteration halves the search space." + space_complexity: "O(1). Only a constant number of variables are used." + +solutions: + - approach_name: Binary Search + is_optimal: true + code: | + def find_min(nums: list[int]) -> int: + left, right = 0, len(nums) - 1 + + while left < right: + mid = left + (right - left) // 2 + + if nums[mid] > nums[right]: + # Mid is in the "larger" half + # Minimum must be to the right of mid + left = mid + 1 + else: + # Mid is in the "smaller" half (or at the minimum) + # Minimum is at mid or to the left + right = mid + + # left == right, pointing to the minimum + return nums[left] + explanation: | + **Time Complexity:** O(log n) — Search space halves each iteration. + + **Space Complexity:** O(1) — Constant extra space. + + We compare `nums[mid]` with `nums[right]` to determine which half contains the minimum. If `nums[mid] > nums[right]`, the minimum lies strictly to the right of `mid`. Otherwise, it's at `mid` or on the left. The loop converges to the exact position of the minimum. 
+ + - approach_name: Linear Scan + is_optimal: false + code: | + def find_min(nums: list[int]) -> int: + # Find where sorted order breaks + for i in range(1, len(nums)): + if nums[i] < nums[i - 1]: + return nums[i] + + # No break found — array wasn't rotated (or rotated fully) + return nums[0] + explanation: | + **Time Complexity:** O(n) — Scans through the array. + + **Space Complexity:** O(1) — Constant extra space. + + Find the point where the sorted order breaks (current element less than previous). The element at that point is the minimum. If no break is found, the array wasn't rotated, so return the first element. This doesn't meet the O(log n) requirement but is useful for understanding the problem. diff --git a/backend/data/questions/find-the-duplicate-number.yaml b/backend/data/questions/find-the-duplicate-number.yaml new file mode 100644 index 0000000..8da3dad --- /dev/null +++ b/backend/data/questions/find-the-duplicate-number.yaml @@ -0,0 +1,226 @@ +title: Find the Duplicate Number +slug: find-the-duplicate-number +difficulty: medium +leetcode_id: 287 +leetcode_url: https://leetcode.com/problems/find-the-duplicate-number/ +categories: + - arrays + - two-pointers +patterns: + - fast-slow-pointers + - binary-search + +description: | + Given an array of integers `nums` containing `n + 1` integers where each integer is in the range `[1, n]` inclusive. + + There is only **one repeated number** in `nums`, return *this repeated number*. + + You must solve the problem **without** modifying the array `nums` and using only constant extra space. + +constraints: | + - `1 <= n <= 10^5` + - `nums.length == n + 1` + - `1 <= nums[i] <= n` + - All the integers in `nums` appear only **once** except for **precisely one integer** which appears **two or more** times + +examples: + - input: "nums = [1,3,4,2,2]" + output: "2" + explanation: "The number 2 appears twice in the array." 
+ - input: "nums = [3,1,3,4,2]" + output: "3" + explanation: "The number 3 appears twice in the array." + - input: "nums = [3,3,3,3,3]" + output: "3" + explanation: "The number 3 appears five times in the array." + +explanation: + intuition: | + This problem has a beautiful constraint: the array has `n + 1` elements but values are only in the range `[1, n]`. By the **Pigeonhole Principle**, at least one value must repeat. + + The key insight is to view the array as a **linked list** where each value points to the next index. Since values are in `[1, n]` and we have indices `[0, n]`, treating `nums[i]` as "next pointer" creates a valid linked structure. + + Think of it like this: if we start at index `0` and repeatedly jump to `nums[current_index]`, we create a sequence. Because one number repeats, two different indices point to the same location — this creates a **cycle**! The duplicate number is the entry point of this cycle. + + For example, with `nums = [1,3,4,2,2]`: + - Index 0 → value 1 → jump to index 1 + - Index 1 → value 3 → jump to index 3 + - Index 3 → value 2 → jump to index 2 + - Index 2 → value 4 → jump to index 4 + - Index 4 → value 2 → jump to index 2 (cycle!) + + The cycle exists because both index 3 and index 4 have value `2`. Floyd's Tortoise and Hare algorithm finds exactly where this cycle begins. 
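The jump sequence listed above can be reproduced with a few lines of code. This is a minimal sketch for illustration; `trace_jumps` is a hypothetical helper name, not part of the solutions in this file. It starts at index `0` and follows `nums[current]` a fixed number of times.

```python
def trace_jumps(nums: list[int], steps: int) -> list[int]:
    """Follow index -> nums[index] from index 0 and return the visited indices."""
    path = [0]
    for _ in range(steps):
        # The value at the current index is the next index to visit
        path.append(nums[path[-1]])
    return path


print(trace_jumps([1, 3, 4, 2, 2], 6))  # [0, 1, 3, 2, 4, 2, 4]
```

Index `2` is the first one to appear twice in the path, which matches the duplicate value `2` being the cycle entrance.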
+ + approach: | + We solve this using **Floyd's Cycle Detection** (Tortoise and Hare): + + **Step 1: Detect the cycle** + + - `slow`: Moves one step at a time (`slow = nums[slow]`) + - `fast`: Moves two steps at a time (`fast = nums[nums[fast]]`) + - Both start at index `0` + - Keep moving until they meet — this proves a cycle exists + +   + + **Step 2: Find the cycle entrance** + + - Reset `slow` to index `0`, keep `fast` at the meeting point + - Move both pointers one step at a time + - The point where they meet again is the duplicate number + +   + + **Why does this work?** + + Let's say the distance from start to cycle entrance is `F`, and the cycle length is `C`. When slow and fast first meet: + - Slow has traveled `F + a` steps (where `a` is distance into the cycle) + - Fast has traveled `2(F + a)` steps + - Both are at the same node inside the cycle, so the extra steps fast took must be a whole number of laps: `2(F + a) - (F + a) = kC` for some integer `k >= 1`, so `F + a = kC` + + This means `F = kC - a`. When we reset slow to start and both move at the same speed, slow travels `F` steps to reach the entrance, while fast travels `F = kC - a` steps from its position `a` into the cycle. That puts fast at offset `kC`, a whole number of laps past the entrance, so it lands on the entrance as well. + +   + + **Step 3: Return the result** + + - The meeting point in phase 2 is the duplicate value + + common_pitfalls: + - title: Using Extra Space + description: | + A common first instinct is to use a hash set to track seen numbers: + + ```python + seen = set() + for num in nums: + if num in seen: + return num + seen.add(num) + ``` + + While this works and runs in O(n) time, it uses O(n) space. The problem explicitly requires **O(1) space**, so this approach violates the constraints. 
+ wrong_approach: "Hash set to track seen numbers" + correct_approach: "Floyd's cycle detection using the array itself" + + - title: Modifying the Array + description: | + Another tempting approach is to mark visited indices by negating values: + + ```python + for num in nums: + idx = abs(num) + if nums[idx] < 0: + return idx + nums[idx] = -nums[idx] + ``` + + This is O(n) time and O(1) space, but it **modifies the input array**, which the problem forbids. The cycle detection approach leaves the array untouched. + wrong_approach: "Negating values to mark as visited" + correct_approach: "Read-only traversal with two pointers" + + - title: Sorting the Array + description: | + Sorting and finding adjacent duplicates is intuitive but has two problems: + - It modifies the array (or requires O(n) space for a copy) + - It's O(n log n) time, not optimal + + The cycle detection method achieves O(n) time with O(1) space without modification. + wrong_approach: "Sort and find adjacent duplicates" + correct_approach: "Floyd's algorithm for O(n) time, O(1) space" + + - title: Confusing Index with Value + description: | + In Floyd's algorithm, we treat values as pointers to indices. A common mistake is confusing when to use the value versus the index. + + Remember: `slow = nums[slow]` means "jump to the index that equals the current value." The duplicate is a **value**, not an index — it's what gets returned after phase 2. 
+ + key_takeaways: + - "**Cycle detection pattern**: When array values can be treated as pointers (value in valid index range), consider Floyd's algorithm" + - "**Pigeonhole Principle**: With `n + 1` items in `n` slots, at least one slot must have multiple items — guaranteeing a duplicate exists" + - "**Creative problem reframing**: Transforming an array duplicate problem into a linked list cycle problem unlocks an elegant O(1) space solution" + - "**Two-phase approach**: First detect *that* a cycle exists (fast catches slow), then find *where* it starts (both at same speed)" + + time_complexity: "O(n). Each pointer traverses at most O(n) steps in both phases." + space_complexity: "O(1). Only two pointer variables are used, regardless of input size." + +solutions: + - approach_name: Floyd's Cycle Detection + is_optimal: true + code: | + def find_duplicate(nums: list[int]) -> int: + # Phase 1: Find the intersection point in the cycle + slow = nums[0] + fast = nums[0] + + # Move slow by 1, fast by 2 until they meet + while True: + slow = nums[slow] # One step + fast = nums[nums[fast]] # Two steps + if slow == fast: + break + + # Phase 2: Find the entrance to the cycle (the duplicate) + slow = nums[0] # Reset slow to start + + # Move both at same speed until they meet at cycle entrance + while slow != fast: + slow = nums[slow] + fast = nums[fast] + + # The meeting point is the duplicate number + return slow + explanation: | + **Time Complexity:** O(n) — Each pointer visits at most n nodes in each phase. + + **Space Complexity:** O(1) — Only two pointer variables used. + + By treating array values as "next pointers," we transform this into a cycle detection problem. The duplicate causes a cycle because two indices point to the same value. Floyd's algorithm finds the cycle entrance in linear time with constant space. 
+ + - approach_name: Binary Search on Value Range + is_optimal: false + code: | + def find_duplicate(nums: list[int]) -> int: + # Search the value range [1, n], not the array indices + low, high = 1, len(nums) - 1 + + while low < high: + mid = (low + high) // 2 + + # Count numbers <= mid + count = sum(1 for num in nums if num <= mid) + + # If count > mid, duplicate is in [low, mid] + # Otherwise, duplicate is in [mid+1, high] + if count > mid: + high = mid + else: + low = mid + 1 + + return low + explanation: | + **Time Complexity:** O(n log n) — Binary search over n values, each iteration scans n elements. + + **Space Complexity:** O(1) — Only a few variables used. + + This approach binary searches the *value* range, not the array. If there are more than `mid` numbers in `[1, mid]`, the duplicate must be in that range (Pigeonhole Principle). While not optimal, this demonstrates binary search on answer space rather than on array indices. + + - approach_name: Hash Set + is_optimal: false + code: | + def find_duplicate(nums: list[int]) -> int: + seen = set() + + for num in nums: + # If we've seen this number before, it's the duplicate + if num in seen: + return num + seen.add(num) + + return -1 # Should never reach here given constraints + explanation: | + **Time Complexity:** O(n) — Single pass through the array. + + **Space Complexity:** O(n) — Hash set stores up to n elements. + + The most intuitive approach: track seen numbers and return when we find a repeat. While this violates the O(1) space constraint, it's included to show the trade-off between space and algorithmic complexity. Understanding why this isn't acceptable motivates learning Floyd's algorithm. 
diff --git a/backend/data/questions/find-the-town-judge.yaml b/backend/data/questions/find-the-town-judge.yaml new file mode 100644 index 0000000..d88744d --- /dev/null +++ b/backend/data/questions/find-the-town-judge.yaml @@ -0,0 +1,180 @@ +title: Find the Town Judge +slug: find-the-town-judge +difficulty: easy +leetcode_id: 997 +leetcode_url: https://leetcode.com/problems/find-the-town-judge/ +categories: + - arrays + - graphs + - hash-tables +patterns: + - greedy + +description: | + In a town, there are `n` people labeled from `1` to `n`. There is a rumor that one of these people is secretly the town judge. + + If the town judge exists, then: + + 1. The town judge trusts nobody. + 2. Everybody (except for the town judge) trusts the town judge. + 3. There is exactly one person that satisfies properties **1** and **2**. + + You are given an array `trust` where `trust[i] = [a_i, b_i]` representing that the person labeled `a_i` trusts the person labeled `b_i`. If a trust relationship does not exist in the `trust` array, then such a trust relationship does not exist. + + Return *the label of the town judge if the town judge exists and can be identified, or return* `-1` *otherwise*. + +constraints: | + - `1 <= n <= 1000` + - `0 <= trust.length <= 10^4` + - `trust[i].length == 2` + - All the pairs of `trust` are **unique** + - `a_i != b_i` + - `1 <= a_i, b_i <= n` + +examples: + - input: "n = 2, trust = [[1,2]]" + output: "2" + explanation: "Person 1 trusts person 2, but person 2 trusts no one. Since person 2 is trusted by everyone else (just person 1) and trusts nobody, person 2 is the town judge." + - input: "n = 3, trust = [[1,3],[2,3]]" + output: "3" + explanation: "Both person 1 and person 2 trust person 3, while person 3 trusts nobody. Person 3 satisfies both conditions." + - input: "n = 3, trust = [[1,3],[2,3],[3,1]]" + output: "-1" + explanation: "Person 3 is trusted by everyone else, but person 3 also trusts person 1. 
Since the town judge must trust nobody, there is no valid town judge." + +explanation: + intuition: | + Think of the trust relationships as a directed graph where each person is a node, and an edge from `a` to `b` means "person `a` trusts person `b`." + + The town judge has a very specific signature in this graph: + - **Zero outgoing edges**: They trust nobody + - **Exactly `n-1` incoming edges**: Everyone else trusts them + + Imagine you're counting votes at a town meeting. Each trust relationship is like a vote of confidence. The judge receives votes from everyone but casts no votes themselves. If we track the "net trust score" for each person (votes received minus votes cast), the judge would have a score of exactly `n-1`. + + This insight transforms a graph problem into a simple counting problem: instead of building complex data structures, we just need to track a single number for each person. + + approach: | + We solve this using a **Trust Score Approach**: + + **Step 1: Initialise a trust score array** + + - Create an array `trust_score` of size `n+1` (using 1-based indexing to match person labels) + - Each position starts at `0`, representing the net trust balance for that person + +   + + **Step 2: Process each trust relationship** + + - For each `[a, b]` pair in the `trust` array: + - Decrement `trust_score[a]` by 1 (person `a` trusts someone, so they lose a point) + - Increment `trust_score[b]` by 1 (person `b` is trusted, so they gain a point) + +   + + **Step 3: Find the town judge** + + - Iterate through people `1` to `n` + - The town judge is the person with `trust_score[i] == n - 1` + - This means they received `n-1` trust votes and cast 0 votes themselves + +   + + **Step 4: Return the result** + + - If found, return the judge's label + - If no one has a trust score of `n-1`, return `-1` + +   + + This approach works because the trust score naturally captures both conditions: trusting nobody (no deductions) and being trusted by everyone else 
(exactly `n-1` additions). + + common_pitfalls: + - title: Using Two Separate Arrays + description: | + A common approach is to maintain two arrays: one for "trusts count" (outgoing edges) and one for "trusted by count" (incoming edges). Then checking if `trusted_by[i] == n-1` and `trusts[i] == 0`. + + While correct, this uses unnecessary space. The single trust score approach combines both conditions into one value, halving space usage and simplifying the logic. + wrong_approach: "Two arrays tracking incoming and outgoing separately" + correct_approach: "Single array with net trust score (incoming - outgoing)" + + - title: Forgetting the Single Person Case + description: | + When `n = 1` and `trust = []`, the single person is the town judge by definition. They trust nobody (vacuously true since there's no one to trust) and are trusted by everyone else (vacuously true since there's no one else). + + The trust score approach handles this naturally: person 1 has a score of `0`, and we need `n - 1 = 0`, so they qualify as the judge. + wrong_approach: "Special-casing n=1 with extra conditionals" + correct_approach: "Let the algorithm handle it naturally" + + - title: Using 0-Based Indexing Incorrectly + description: | + People are labeled from `1` to `n`, not `0` to `n-1`. Using a size-`n` array with 0-based indexing requires translating indices, which is error-prone. + + Using a size `n+1` array and ignoring index 0 keeps the code simple and matches the problem's labeling directly. 
+ wrong_approach: "Size-n array with index translation" + correct_approach: "Size-(n+1) array with 1-based indexing" + + key_takeaways: + - "**Graph degree insight**: In directed graphs, problems about nodes with specific in-degree and out-degree can often be solved by tracking net degree (in - out)" + - "**Space optimisation**: When tracking two related quantities (trusts vs trusted-by), consider if a single combined metric suffices" + - "**Constraint-driven design**: The judge's unique property (trusted by `n-1`, trusts `0`) translates directly to a net score of `n-1`" + - "**Foundation for graph problems**: This in-degree/out-degree counting technique appears in problems like finding celebrities, detecting cycles, and topological sorting" + + time_complexity: "O(n + t) where `t` is the length of the trust array. We initialise an array of size `n` and iterate through all `t` trust relationships, then check `n` people." + space_complexity: "O(n). We use a single array of size `n+1` to store the trust score for each person." + +solutions: + - approach_name: Trust Score + is_optimal: true + code: | + def find_judge(n: int, trust: list[list[int]]) -> int: + # Use n+1 size for 1-based indexing (people labeled 1 to n) + trust_score = [0] * (n + 1) + + # Process each trust relationship + for a, b in trust: + # Person a trusts someone, so they can't be the judge + trust_score[a] -= 1 + # Person b is trusted, gaining one vote + trust_score[b] += 1 + + # Find the person with trust score of n-1 + # This means: trusted by n-1 people, trusts nobody + for i in range(1, n + 1): + if trust_score[i] == n - 1: + return i + + # No valid town judge found + return -1 + explanation: | + **Time Complexity:** O(n + t) where t is the number of trust relationships. We process each relationship once and scan through n people. + + **Space Complexity:** O(n) for the trust score array. 
+ + The key insight is that a net trust score of `n-1` uniquely identifies the judge: they received votes from all `n-1` other people (contributing +n-1) and cast no votes themselves (contributing 0). + + - approach_name: Two Arrays (In-degree and Out-degree) + is_optimal: false + code: | + def find_judge(n: int, trust: list[list[int]]) -> int: + # Track how many people each person trusts (out-degree) + trusts_count = [0] * (n + 1) + # Track how many people trust each person (in-degree) + trusted_by_count = [0] * (n + 1) + + for a, b in trust: + trusts_count[a] += 1 + trusted_by_count[b] += 1 + + # Judge trusts nobody (out-degree = 0) and is trusted by all others (in-degree = n-1) + for i in range(1, n + 1): + if trusts_count[i] == 0 and trusted_by_count[i] == n - 1: + return i + + return -1 + explanation: | + **Time Complexity:** O(n + t) where t is the number of trust relationships. + + **Space Complexity:** O(n) but uses two arrays instead of one. + + This approach explicitly tracks in-degree and out-degree separately, making the logic clearer but using twice the space. The optimal solution combines these into a single net score. diff --git a/backend/data/questions/first-missing-positive.yaml b/backend/data/questions/first-missing-positive.yaml new file mode 100644 index 0000000..9ab24d4 --- /dev/null +++ b/backend/data/questions/first-missing-positive.yaml @@ -0,0 +1,212 @@ +title: First Missing Positive +slug: first-missing-positive +difficulty: hard +leetcode_id: 41 +leetcode_url: https://leetcode.com/problems/first-missing-positive/ +categories: + - arrays + - hash-tables +patterns: + - matrix-traversal + +description: | + Given an unsorted integer array `nums`, return the *smallest positive integer* that is *not present* in `nums`. + + You must implement an algorithm that runs in `O(n)` time and uses `O(1)` auxiliary space. 
+ +constraints: | + - `1 <= nums.length <= 10^5` + - `-2^31 <= nums[i] <= 2^31 - 1` + +examples: + - input: "nums = [1,2,0]" + output: "3" + explanation: "The numbers in the range [1,2] are all in the array." + - input: "nums = [3,4,-1,1]" + output: "2" + explanation: "1 is in the array but 2 is missing." + - input: "nums = [7,8,9,11,12]" + output: "1" + explanation: "The smallest positive integer 1 is missing." + +explanation: + intuition: | + At first glance, this problem seems straightforward — just find the smallest positive integer not in the array. But the real challenge lies in the **O(n) time and O(1) space** constraints. These constraints rule out sorting (O(n log n)) and hash sets (O(n) space). + + The key insight is to **use the array itself as a hash table**. Think of it like assigning seats in a row: if you have `n` seats numbered 1 through `n`, you want each person with ticket number `i` to sit in seat `i`. After everyone is seated, you walk through the row and find the first empty seat — that's your answer. + + Why does this work? The first missing positive must be in the range `[1, n+1]` where `n` is the array length. If all numbers 1 through `n` are present, the answer is `n+1`. Otherwise, some number in `[1, n]` is missing, and we want the smallest one. + + By placing each value `x` at index `x-1` (so value `1` goes to index `0`, value `2` goes to index `1`, etc.), we transform the array into a lookup table. Then a single scan reveals the first position where the value doesn't match its expected index. 
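The seating analogy above can be sketched as follows. This is an illustrative sketch under stated assumptions: `place_values` is a hypothetical helper, and it works on a copy of the input only so the before and after states can be compared (the actual solution rearranges in place to keep O(1) auxiliary space).

```python
def place_values(nums: list[int]) -> list[int]:
    """Return a copy of nums where each value x in [1, n] sits at index x - 1 when possible."""
    seats = list(nums)  # copy for demonstration only
    n = len(seats)

    for i in range(n):
        # Keep moving the value at seat i to its own seat until seat i holds
        # an out-of-range value or a value whose seat is already correctly filled
        while 1 <= seats[i] <= n and seats[seats[i] - 1] != seats[i]:
            target = seats[i] - 1
            seats[i], seats[target] = seats[target], seats[i]

    return seats


print(place_values([3, 4, -1, 1]))  # [1, -1, 3, 4]
```

In the rearranged array, index `1` holds `-1` instead of `2`, so `2` is the first missing positive.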
+ + approach: | + We solve this using **Cyclic Sort** (in-place rearrangement): + + **Step 1: Rearrange the array** + + - Iterate through each position in the array + - For each element `nums[i]`, if it's a positive integer in the range `[1, n]` and not already in its correct position, swap it to where it belongs + - Continue swapping at the current position until the element there is either out of range or already correct + - This ensures each valid value ends up at index `value - 1` + +   + + **Step 2: Find the first missing positive** + + - Scan through the rearranged array + - The first index `i` where `nums[i] != i + 1` indicates that `i + 1` is missing + - Return `i + 1` as the answer + +   + + **Step 3: Handle the all-present case** + + - If all positions contain their expected values (1, 2, 3, ..., n), the answer is `n + 1` + +   + + The cyclic sort approach works because we're essentially building a perfect hash function: value `x` maps to index `x - 1`. By rearranging in-place, we use constant extra space while achieving linear time. + + common_pitfalls: + - title: Using a Hash Set + description: | + The most natural approach is to use a hash set to store all positive numbers, then iterate from 1 upward to find the first missing: + + ```python + seen = set(nums) + for i in range(1, len(nums) + 2): + if i not in seen: + return i + ``` + + While this is O(n) time, it uses **O(n) space** for the hash set, violating the space constraint. The problem explicitly requires O(1) auxiliary space. + wrong_approach: "Hash set for O(1) lookup" + correct_approach: "Use the array itself as a hash table via cyclic sort" + + - title: Sorting the Array + description: | + Another tempting approach is to sort the array first, then scan for the first missing positive: + + ```python + nums.sort() + # Find first missing... + ``` + + Sorting takes **O(n log n)** time, which violates the O(n) time constraint. 
Even if you're okay with that, this approach still requires careful handling of duplicates and negatives. + wrong_approach: "Sort first, then scan" + correct_approach: "Cyclic sort achieves O(n) time" + + - title: Infinite Loop During Swapping + description: | + When implementing the swap logic, you must check if the target position already contains the correct value: + + ```python + # Wrong: may infinite loop if duplicates exist + while 1 <= nums[i] <= n: + swap(nums[i], nums[nums[i] - 1]) + + # Correct: stop if already in place or duplicate + while 1 <= nums[i] <= n and nums[i] != nums[nums[i] - 1]: + swap(...) + ``` + + Without the second condition, swapping identical values creates an infinite loop. + wrong_approach: "Only check range bounds" + correct_approach: "Also check if target position already has the correct value" + + - title: Forgetting the n+1 Case + description: | + If the array contains exactly [1, 2, 3, ..., n], then no number in the array is missing — the answer is `n + 1`. Make sure your final scan handles this edge case, typically by returning `n + 1` if the entire array is correctly positioned. + wrong_approach: "Only scan the array without a fallback" + correct_approach: "Return n + 1 if all positions are correct" + + key_takeaways: + - "**Cyclic sort pattern**: When values have a natural position (like 1 to n mapping to indices 0 to n-1), consider rearranging the array in-place" + - "**Array as hash table**: The array itself can serve as a constant-space lookup structure when the value range is bounded" + - "**Constraint-driven design**: The O(1) space requirement is the key hint that we must modify the input array rather than use auxiliary data structures" + - "**Related problems**: This technique applies to finding duplicates, missing numbers, and other permutation-based problems" + + time_complexity: "O(n). Each element is swapped at most once to its correct position, and we make two linear passes through the array." 
+ space_complexity: "O(1). We only use a constant number of variables; all rearrangement happens in-place." + +solutions: + - approach_name: Cyclic Sort + is_optimal: true + code: | + def first_missing_positive(nums: list[int]) -> int: + n = len(nums) + + # Phase 1: Place each value at its correct index + # Value x should be at index x-1 + for i in range(n): + # Keep swapping until current element is in place or invalid + while 1 <= nums[i] <= n and nums[i] != nums[nums[i] - 1]: + # Swap nums[i] to its correct position + correct_idx = nums[i] - 1 + nums[i], nums[correct_idx] = nums[correct_idx], nums[i] + + # Phase 2: Find first position where value doesn't match index + 1 + for i in range(n): + if nums[i] != i + 1: + return i + 1 + + # All values 1 to n are present, so answer is n + 1 + return n + 1 + explanation: | + **Time Complexity:** O(n) — Although there's a nested while loop, each element is moved at most once to its final position, giving O(n) total swaps. + + **Space Complexity:** O(1) — Only a few variables are used; the array is modified in-place. + + The algorithm works in two phases: first, we rearrange the array so that value `i` sits at index `i-1`. Then we scan to find the first mismatch. This clever use of the input array as a hash table satisfies both the time and space constraints. + + - approach_name: Hash Set + is_optimal: false + code: | + def first_missing_positive(nums: list[int]) -> int: + # Store all positive numbers in a set + num_set = set(nums) + + # Check each positive integer starting from 1 + for i in range(1, len(nums) + 2): + if i not in num_set: + return i + + # This line is never reached given the loop bounds + return len(nums) + 1 + explanation: | + **Time Complexity:** O(n) — Building the set and scanning are both linear. + + **Space Complexity:** O(n) — The hash set stores up to n elements. + + This approach is intuitive and correct, but uses O(n) extra space, violating the problem's constraints. 
It's included to illustrate the natural solution that the cyclic sort approach improves upon. + + - approach_name: Index Marking + is_optimal: true + code: | + def first_missing_positive(nums: list[int]) -> int: + n = len(nums) + + # Step 1: Replace non-positive and out-of-range values with n+1 + for i in range(n): + if nums[i] <= 0 or nums[i] > n: + nums[i] = n + 1 + + # Step 2: Mark presence by negating values at corresponding indices + for i in range(n): + val = abs(nums[i]) + if val <= n: + # Mark index val-1 as "seen" by making it negative + nums[val - 1] = -abs(nums[val - 1]) + + # Step 3: Find first positive value (indicates missing number) + for i in range(n): + if nums[i] > 0: + return i + 1 + + return n + 1 + explanation: | + **Time Complexity:** O(n) — Three linear passes through the array. + + **Space Complexity:** O(1) — Only modifies the array in-place. + + This alternative approach uses the sign of each element as a flag. After replacing invalid values with `n+1`, we mark the presence of value `x` by negating the element at index `x-1`. Finally, the first positive element indicates the missing number. Both this and cyclic sort are optimal solutions. diff --git a/backend/data/questions/four-sum.yaml b/backend/data/questions/four-sum.yaml new file mode 100644 index 0000000..f885d70 --- /dev/null +++ b/backend/data/questions/four-sum.yaml @@ -0,0 +1,219 @@ +title: 4Sum +slug: four-sum +difficulty: medium +leetcode_id: 18 +leetcode_url: https://leetcode.com/problems/4sum/ +categories: + - arrays + - two-pointers + - sorting +patterns: + - two-pointers + +description: | + Given an array `nums` of `n` integers, return *an array of all the **unique** quadruplets* `[nums[a], nums[b], nums[c], nums[d]]` such that: + + - `0 <= a, b, c, d < n` + - `a`, `b`, `c`, and `d` are **distinct** + - `nums[a] + nums[b] + nums[c] + nums[d] == target` + + You may return the answer in **any order**. 
+ +constraints: | + - `1 <= nums.length <= 200` + - `-10^9 <= nums[i] <= 10^9` + - `-10^9 <= target <= 10^9` + +examples: + - input: "nums = [1,0,-1,0,-2,2], target = 0" + output: "[[-2,-1,1,2],[-2,0,0,2],[-1,0,0,1]]" + explanation: "Three unique quadruplets sum to 0: [-2,-1,1,2], [-2,0,0,2], and [-1,0,0,1]." + - input: "nums = [2,2,2,2,2], target = 8" + output: "[[2,2,2,2]]" + explanation: "The only quadruplet that sums to 8 is [2,2,2,2]." + +explanation: + intuition: | + If you've solved **3Sum**, 4Sum follows the same reduction strategy: fix one element and solve a smaller problem. + + Think of it like peeling an onion. 3Sum reduces to 2Sum by fixing one element. Similarly, **4Sum reduces to 3Sum** by fixing the first element, which then reduces to 2Sum. Each layer peels away one dimension of complexity. + + The key insight is that after sorting, we can use **two nested loops** to fix the first two elements, then apply the familiar two-pointer technique to find the remaining pair. This gives us O(n³) time — the best we can do when there can be O(n³) valid quadruplets. + + Sorting remains essential for two reasons: + 1. **Two pointers work**: Adjusting sum by moving pointers left or right + 2. 
**Duplicate skipping**: Equal values become neighbours, so we can easily skip them + + approach: | + We solve this using **Sort + Two Nested Loops + Two Pointers**: + + **Step 1: Sort the array** + + - Sorting enables two-pointer technique and easy duplicate detection + - Time: O(n log n), dominated by the O(n³) main algorithm + +   + + **Step 2: Fix the first element** + + - For each `i` from 0 to n-4: + - Skip if `i > 0` and `nums[i] == nums[i-1]` (avoid duplicate quadruplets) + - **Early termination**: If `nums[i] + nums[i+1] + nums[i+2] + nums[i+3] > target`, break (smallest possible sum exceeds target) + - **Skip if too small**: If `nums[i] + nums[n-3] + nums[n-2] + nums[n-1] < target`, continue (largest possible sum is still less than target) + +   + + **Step 3: Fix the second element** + + - For each `j` from i+1 to n-3: + - Skip if `nums[j] == nums[j-1]` and `j > i + 1` (avoid duplicates) + - Apply similar early termination and skip optimisations + +   + + **Step 4: Two-pointer search for remaining pair** + + - Set `left = j + 1`, `right = n - 1` + - Calculate `total = nums[i] + nums[j] + nums[left] + nums[right]` + - If `total < target`: move `left` right + - If `total > target`: move `right` left + - If `total == target`: found a quadruplet! + - Add to result, skip duplicates, move both pointers + +   + + **Step 5: Return all unique quadruplets** + + Duplicate skipping happens at all four levels: outer loop, second loop, left pointer, and right pointer. + + common_pitfalls: + - title: Integer Overflow + description: | + With constraints `-10^9 <= nums[i] <= 10^9` and `-10^9 <= target <= 10^9`, the sum of four numbers can reach `4 × 10^9`, which **overflows 32-bit integers**. + + In languages like C++ or Java, you must use `long long` or `long` types for the sum calculation. In Python, integers have arbitrary precision, so this isn't an issue — but be aware when porting to other languages. 
+ wrong_approach: "Using int for sum in C++/Java" + correct_approach: "Use long/long long or cast during addition" + + - title: Incorrect Duplicate Skipping at Second Level + description: | + When skipping duplicates for the second element `j`, you must check `j > i + 1` before comparing `nums[j] == nums[j-1]`. + + Without this check, you might skip the very first valid `j` after `i`, missing valid quadruplets. + + Example: `nums = [0,0,0,0]`, `target = 0` — if you skip when `j == i + 1`, you'd incorrectly skip `j = 1` when comparing to `nums[0]`. + wrong_approach: "if nums[j] == nums[j-1]: continue (always)" + correct_approach: "if j > i + 1 and nums[j] == nums[j-1]: continue" + + - title: Missing Early Termination Optimisations + description: | + Unlike 3Sum where you can break when `nums[i] > 0` (since target is 0), 4Sum has a variable target. The optimisations become: + + - **Break** if `nums[i] + nums[i+1] + nums[i+2] + nums[i+3] > target` — smallest sum exceeds target + - **Continue** if `nums[i] + nums[n-3] + nums[n-2] + nums[n-1] < target` — largest sum too small + + Without these, you may TLE on edge cases with skewed distributions. + wrong_approach: "No early termination checks" + correct_approach: "Check minimum and maximum possible sums at each level" + + key_takeaways: + - "**Generalise N-sum**: Fix k-2 elements with nested loops, then apply two pointers — this pattern works for any kSum" + - "**Time complexity is O(n^(k-1))**: For 4Sum, it's O(n³); for kSum in general, O(n^(k-1)) is optimal when there can be that many solutions" + - "**Early termination matters**: Checking minimum and maximum possible sums can dramatically prune the search space" + - "**Duplicate handling at every level**: Each nested loop needs its own duplicate skip logic with the correct boundary check" + + time_complexity: "O(n³). Sorting is O(n log n), then two nested O(n) loops each contain an O(n) two-pointer search." + space_complexity: "O(log n) to O(n). 
Depends on the sorting algorithm — O(log n) for in-place sorts, O(n) for others. The output is not counted as extra space." + +solutions: + - approach_name: Sort + Two Pointers + is_optimal: true + code: | + def four_sum(nums: list[int], target: int) -> list[list[int]]: + nums.sort() # Enable two pointers and duplicate detection + result = [] + n = len(nums) + + for i in range(n - 3): + # Skip duplicates for first element + if i > 0 and nums[i] == nums[i - 1]: + continue + + # Early termination: smallest possible sum exceeds target + if nums[i] + nums[i + 1] + nums[i + 2] + nums[i + 3] > target: + break + + # Skip: largest possible sum with nums[i] is still too small + if nums[i] + nums[n - 3] + nums[n - 2] + nums[n - 1] < target: + continue + + for j in range(i + 1, n - 2): + # Skip duplicates for second element (note: j > i + 1) + if j > i + 1 and nums[j] == nums[j - 1]: + continue + + # Early termination for inner loop + if nums[i] + nums[j] + nums[j + 1] + nums[j + 2] > target: + break + + # Skip if largest sum with nums[i], nums[j] is too small + if nums[i] + nums[j] + nums[n - 2] + nums[n - 1] < target: + continue + + # Two pointers for remaining pair + left, right = j + 1, n - 1 + + while left < right: + total = nums[i] + nums[j] + nums[left] + nums[right] + + if total < target: + left += 1 + elif total > target: + right -= 1 + else: + # Found a quadruplet + result.append([nums[i], nums[j], nums[left], nums[right]]) + + # Skip duplicates for left pointer + while left < right and nums[left] == nums[left + 1]: + left += 1 + # Skip duplicates for right pointer + while left < right and nums[right] == nums[right - 1]: + right -= 1 + + # Move both pointers + left += 1 + right -= 1 + + return result + explanation: | + **Time Complexity:** O(n³) — O(n log n) sort + two nested O(n) loops with O(n) two-pointer search inside. + + **Space Complexity:** O(log n) to O(n) — Sorting space; output not counted. 
+ + We sort the array, then use two nested loops to fix the first two elements. For each pair, two pointers find the remaining pair that completes the target sum. Early termination and skip optimisations prune many unnecessary iterations. + + - approach_name: Brute Force + is_optimal: false + code: | + def four_sum(nums: list[int], target: int) -> list[list[int]]: + n = len(nums) + result = set() # Use set to avoid duplicates + + # Try all possible quadruplets + for i in range(n): + for j in range(i + 1, n): + for k in range(j + 1, n): + for l in range(k + 1, n): + if nums[i] + nums[j] + nums[k] + nums[l] == target: + # Sort tuple to handle duplicates + quad = tuple(sorted([nums[i], nums[j], nums[k], nums[l]])) + result.add(quad) + + return [list(q) for q in result] + explanation: | + **Time Complexity:** O(n⁴) — Four nested loops checking all combinations. + + **Space Complexity:** O(k) — Where k is the number of unique quadruplets stored in the set. + + This naive approach checks every possible combination of four elements. While correct, it's too slow for larger inputs. With n=200, this means up to 64 million iterations. The optimal solution reduces this to O(n³) by using sorting and two pointers. diff --git a/backend/data/questions/gas-station.yaml b/backend/data/questions/gas-station.yaml new file mode 100644 index 0000000..148946d --- /dev/null +++ b/backend/data/questions/gas-station.yaml @@ -0,0 +1,178 @@ +title: Gas Station +slug: gas-station +difficulty: medium +leetcode_id: 134 +leetcode_url: https://leetcode.com/problems/gas-station/ +categories: + - arrays +patterns: + - greedy + +description: | + There are `n` gas stations along a circular route, where the amount of gas at the ith station is `gas[i]`. + + You have a car with an unlimited gas tank and it costs `cost[i]` of gas to travel from the ith station to its next (i + 1)th station. You begin the journey with an empty tank at one of the gas stations. 
+ + Given two integer arrays `gas` and `cost`, return *the starting gas station's index if you can travel around the circuit once in the clockwise direction, otherwise return* `-1`. If there exists a solution, it is **guaranteed** to be **unique**. + +constraints: | + - `n == gas.length == cost.length` + - `1 <= n <= 10^5` + - `0 <= gas[i], cost[i] <= 10^4` + - The input is generated such that the answer is unique + +examples: + - input: "gas = [1,2,3,4,5], cost = [3,4,5,1,2]" + output: "3" + explanation: "Start at station 3 (index 3) and fill up with 4 units of gas. Your tank = 0 + 4 = 4. Travel to station 4: tank = 4 - 1 + 5 = 8. Travel to station 0: tank = 8 - 2 + 1 = 7. Travel to station 1: tank = 7 - 3 + 2 = 6. Travel to station 2: tank = 6 - 4 + 3 = 5. Travel to station 3: cost is 5, gas is just enough. Return 3." + - input: "gas = [2,3,4], cost = [3,4,3]" + output: "-1" + explanation: "Starting from any station, you cannot complete the circuit. For example, starting at station 2 with 4 gas, you can reach station 1 but cannot travel back to station 2 (requires 4 gas, you only have 3)." + +explanation: + intuition: | + Imagine you're planning a road trip around a circular route with gas stations. At each station, you can fill up some gas, but travelling to the next station costs some gas. The question is: **can you find a starting point where you never run out of fuel?** + + The key insight comes from two observations: + + **Observation 1: Total gas must be enough.** If the total gas available across all stations is less than the total cost to travel the entire circuit, it's impossible to complete the trip from *any* starting point. Conversely, if total gas >= total cost, a solution is **guaranteed** to exist. + + **Observation 2: If you fail at station `j`, skip all previous candidates.** Suppose you start at station `i` and run out of gas at station `j`. You might think: "Maybe I should try starting at `i+1`?" 
But here's the crucial insight — if you couldn't reach `j` starting from `i` with a full journey's worth of gas from stations `i` to `j-1`, then starting from any station *between* `i` and `j` would give you even *less* gas (you'd miss the contributions from earlier stations). So **all stations from `i` to `j` are invalid starting points**. + + This means when we fail, we can jump our candidate start directly to `j+1`, making this a linear-time algorithm. + + approach: | + We solve this using a **Single Pass Greedy Approach**: + + **Step 1: Initialise tracking variables** + + - `total_tank`: Tracks the cumulative surplus/deficit across all stations (used to check if a solution exists) + - `current_tank`: Tracks the current fuel level from our candidate starting point + - `start_station`: The index of our current candidate starting point, initialised to `0` + +   + + **Step 2: Iterate through each station** + + - For each station `i`, calculate `gas[i] - cost[i]` (the net gain/loss at this station) + - Add this value to both `total_tank` and `current_tank` + - If `current_tank` becomes negative, it means we can't reach station `i+1` from our current `start_station` + - When this happens, reset `start_station` to `i+1` and reset `current_tank` to `0` + +   + + **Step 3: Check feasibility and return** + + - After the loop, if `total_tank >= 0`, a solution exists and `start_station` is our answer + - If `total_tank < 0`, return `-1` (not enough total gas) + +   + + The greedy choice — skipping all stations between our failed start and the failure point — is valid because those intermediate stations would only give us less fuel to work with. + + common_pitfalls: + - title: Trying Every Starting Point + description: | + A brute force approach would try starting at each station and simulate the entire trip: + - For each starting station `i`, simulate travelling all `n` stations + - Check if the tank ever goes negative + + This results in **O(n^2) time complexity**. 
With `n` up to `10^5`, this means up to 10 billion operations — a guaranteed **Time Limit Exceeded (TLE)**. + wrong_approach: "Nested loops simulating from each start" + correct_approach: "Single pass with smart candidate elimination" + + - title: Not Understanding Why We Can Skip Stations + description: | + When you fail at station `j` starting from station `i`, it might seem wasteful to skip directly to `j+1`. Why not try `i+1`? + + The reason is mathematical: if you reached stations `i+1`, `i+2`, ..., `j-1` with non-negative fuel (otherwise you would have failed earlier), but still failed at `j`, then starting at any of those intermediate stations means you'd have *less* accumulated fuel when you reach `j`. + + For example, if stations give net gains of `[+3, -1, -1, -2]` and you fail at index 3 starting from index 0, starting at index 1 means you miss the +3 from station 0, making failure even more certain. + wrong_approach: "Increment start by 1 when failing" + correct_approach: "Jump start to failure_point + 1" + + - title: Forgetting to Check Total Feasibility + description: | + Just finding a valid `start_station` candidate isn't enough. You must verify that the **total gas >= total cost** for the entire circuit. + + The `total_tank` variable serves this purpose. Even if we find a candidate, if `total_tank < 0` at the end, no solution exists. 
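A short sketch of why the final check matters, using the second example from the problem statement. The function `candidate_without_check` is hypothetical and deliberately omits the `total_tank` test to show what goes wrong.

```python
def candidate_without_check(gas: list[int], cost: list[int]) -> int:
    """Hypothetical buggy variant: returns the candidate without verifying total feasibility."""
    current_tank = 0
    start_station = 0
    for i in range(len(gas)):
        current_tank += gas[i] - cost[i]
        if current_tank < 0:
            # Same candidate elimination as the real algorithm
            start_station = i + 1
            current_tank = 0
    return start_station


gas, cost = [2, 3, 4], [3, 4, 3]
print(candidate_without_check(gas, cost))  # 2, although no valid start exists
print(sum(gas) - sum(cost))                # -1, so the correct answer is -1
```

The candidate survives only because the loop never wraps around to test it against the earlier stations; the total surplus is what covers that remaining stretch.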
+ wrong_approach: "Only tracking current_tank" + correct_approach: "Track both current_tank and total_tank" + + key_takeaways: + - "**Greedy elimination**: When a candidate fails, use problem structure to eliminate multiple candidates at once, not just one" + - "**Global vs local tracking**: Use separate variables for local decisions (`current_tank`) and global feasibility (`total_tank`)" + - "**Circular problems**: Often can be solved with a single linear pass by tracking cumulative state" + - "**Proof intuition**: If total resources >= total cost, a valid starting point must exist — this is a key insight for many resource allocation problems" + + time_complexity: "O(n). We traverse both arrays exactly once, performing constant-time operations at each station." + space_complexity: "O(1). We only use three integer variables (`total_tank`, `current_tank`, `start_station`) regardless of input size." + +solutions: + - approach_name: Single Pass Greedy + is_optimal: true + code: | + def can_complete_circuit(gas: list[int], cost: list[int]) -> int: + # Track total surplus to check if solution exists + total_tank = 0 + # Track current surplus from candidate start + current_tank = 0 + # Our candidate starting station + start_station = 0 + + for i in range(len(gas)): + # Net gain/loss at this station + net = gas[i] - cost[i] + total_tank += net + current_tank += net + + # If we can't reach the next station from current start + if current_tank < 0: + # All stations from start to i are invalid + # Try starting from the next station + start_station = i + 1 + current_tank = 0 + + # If total gas >= total cost, solution exists at start_station + # Otherwise, impossible to complete the circuit + return start_station if total_tank >= 0 else -1 + explanation: | + **Time Complexity:** O(n) — Single pass through both arrays. + + **Space Complexity:** O(1) — Only three integer variables used. 
+ + The key insight is that if we fail to reach station `j` from station `i`, all stations between `i` and `j` are also invalid starting points. Combined with tracking total feasibility, this gives us an elegant linear solution. + + - approach_name: Brute Force + is_optimal: false + code: | + def can_complete_circuit(gas: list[int], cost: list[int]) -> int: + n = len(gas) + + # Try each station as a starting point + for start in range(n): + tank = 0 + can_complete = True + + # Simulate travelling around the circuit + for i in range(n): + # Current station index (wrapping around) + station = (start + i) % n + # Fill up and travel to next station + tank += gas[station] - cost[station] + + # Ran out of gas before reaching next station + if tank < 0: + can_complete = False + break + + if can_complete: + return start + + return -1 + explanation: | + **Time Complexity:** O(n^2) — For each of n starting points, we simulate travelling n stations. + + **Space Complexity:** O(1) — Only tracking tank and loop variables. + + This approach is correct but inefficient. It tries every possible starting station and simulates the full circuit. With n up to 10^5, this will cause TLE. Included to illustrate why the greedy optimisation is necessary. diff --git a/backend/data/questions/generate-parentheses.yaml b/backend/data/questions/generate-parentheses.yaml new file mode 100644 index 0000000..ebe5811 --- /dev/null +++ b/backend/data/questions/generate-parentheses.yaml @@ -0,0 +1,195 @@ +title: Generate Parentheses +slug: generate-parentheses +difficulty: medium +leetcode_id: 22 +leetcode_url: https://leetcode.com/problems/generate-parentheses/ +categories: + - strings + - recursion +patterns: + - backtracking + +description: | + Given `n` pairs of parentheses, write a function to *generate all combinations of well-formed parentheses*. 
+ + A well-formed parentheses string has equal numbers of opening and closing parentheses, with every closing parenthesis matching a preceding opening one. + +constraints: | + - `1 <= n <= 8` + +examples: + - input: "n = 3" + output: '["((()))","(()())","(())()","()(())","()()()"]' + explanation: "All 5 valid combinations of 3 pairs of parentheses." + - input: "n = 1" + output: '["()"]' + explanation: "With just one pair, there's only one valid combination." + +explanation: + intuition: | + Imagine you're building a string character by character, and at each step you can choose to add either an opening `(` or a closing `)` parenthesis. + + The key insight is that not every choice is valid. A string of parentheses is **well-formed** if at any point while reading left-to-right, the number of closing parentheses never exceeds the number of opening ones. Think of it like a balance: each `(` adds +1 to the balance, and each `)` subtracts 1. The balance must never go negative. + + This naturally leads to a **decision tree** approach. At each position, we branch based on what characters we *can* legally add: + - We can add `(` if we haven't used all `n` opening parentheses yet + - We can add `)` if we have more opening parentheses than closing ones (i.e., there's an unmatched `(` to close) + + By exploring all valid paths through this decision tree, we generate exactly the set of well-formed parentheses strings — no more, no less. + + approach: | + We solve this using **Backtracking** — systematically building candidates and abandoning paths that can't lead to valid solutions. 
+ + **Step 1: Define the recursive state** + + - `current`: The string we're building + - `open_count`: Number of `(` parentheses used so far + - `close_count`: Number of `)` parentheses used so far + +   + + **Step 2: Identify base case** + + - When `len(current) == 2 * n`, we've placed all parentheses + - Add the completed string to our results list + +   + + **Step 3: Define recursive choices** + + - **Add `(`**: Only if `open_count < n` (we haven't used all opening parentheses) + - **Add `)`**: Only if `close_count < open_count` (there's an unmatched `(` to close) + +   + + **Step 4: Backtrack after each choice** + + - After exploring a path, the recursion naturally "unwinds" + - Since we pass strings (immutable in Python), backtracking is implicit + - With mutable structures, you'd explicitly remove the last character + +   + + The constraints on when we can add each character ensure we only generate valid combinations, making this more efficient than generating all permutations and filtering. + + common_pitfalls: + - title: Generating All Permutations Then Filtering + description: | + A naive approach might generate all possible strings of `(` and `)` characters, then filter for valid ones. + + With `n = 8`, that's `2^16 = 65,536` strings to generate and validate, but only `1,430` are valid (the 8th Catalan number). This wastes significant computation on invalid strings. + + The backtracking approach only explores valid paths, never generating invalid strings in the first place. + wrong_approach: "Generate all 2^(2n) strings, filter valid ones" + correct_approach: "Use constraints during generation to only build valid strings" + + - title: Forgetting the Close Constraint + description: | + It's tempting to think you can always add a `)` as long as you haven't used all `n` of them. 
But consider building with `n = 2`: + + Starting with `()`, if you add `)` next you get `())` — this is invalid because the third character closes a parenthesis that was never opened. + + The rule is: you can only add `)` when `close_count < open_count`, not just when `close_count < n`. + wrong_approach: "Add ) whenever close_count < n" + correct_approach: "Add ) only when close_count < open_count" + + - title: Modifying Strings In-Place Incorrectly + description: | + In languages with mutable strings or when using a list to build the string, forgetting to backtrack (remove the last character after recursion) leads to corrupted results. + + In Python, passing `current + '('` creates a new string, so backtracking is automatic. But if using a list like `current.append('(')`, you must call `current.pop()` after the recursive call returns. + + key_takeaways: + - "**Backtracking pattern**: Build solutions incrementally, using constraints to prune invalid paths early" + - "**Decision tree thinking**: Visualise recursive problems as trees where each node is a choice point" + - "**Catalan numbers**: The count of valid parentheses combinations follows the Catalan sequence — this appears in many combinatorial problems" + - "**Constraint propagation**: Encoding validity rules into the recursion conditions is more efficient than post-hoc filtering" + + time_complexity: "O(4^n / √n). This is the nth Catalan number, representing the count of valid combinations. Each valid string takes O(n) to construct." + space_complexity: "O(n). The recursion depth is at most `2n` (the length of each string), and we store the current string being built." 
+ +solutions: + - approach_name: Backtracking + is_optimal: true + code: | + def generate_parenthesis(n: int) -> list[str]: + result = [] + + def backtrack(current: str, open_count: int, close_count: int): + # Base case: we've placed all 2n parentheses + if len(current) == 2 * n: + result.append(current) + return + + # Choice 1: Add opening parenthesis if we haven't used all n + if open_count < n: + backtrack(current + '(', open_count + 1, close_count) + + # Choice 2: Add closing parenthesis if it won't make string invalid + if close_count < open_count: + backtrack(current + ')', open_count, close_count + 1) + + backtrack('', 0, 0) + return result + explanation: | + **Time Complexity:** O(4^n / √n) — The number of valid sequences is the nth Catalan number. + + **Space Complexity:** O(n) — Recursion stack depth plus the current string being built. + + We recursively build strings by making valid choices at each step. The constraints (`open_count < n` and `close_count < open_count`) ensure we never explore invalid paths, making this efficient despite the exponential output size. + + - approach_name: Iterative with Stack + is_optimal: false + code: | + def generate_parenthesis(n: int) -> list[str]: + result = [] + # Stack holds tuples of (current_string, open_count, close_count) + stack = [('', 0, 0)] + + while stack: + current, open_count, close_count = stack.pop() + + # Base case: complete string + if len(current) == 2 * n: + result.append(current) + continue + + # Add closing parenthesis option first (will be processed second due to LIFO) + if close_count < open_count: + stack.append((current + ')', open_count, close_count + 1)) + + # Add opening parenthesis option + if open_count < n: + stack.append((current + '(', open_count + 1, close_count)) + + return result + explanation: | + **Time Complexity:** O(4^n / √n) — Same as recursive, we explore all valid paths. 
+ + **Space Complexity:** O(n²) auxiliary — The stack holds at most about 2n pending states (one unexplored sibling per depth level), each a string of length up to 2n; the result list itself still holds all Catalan(n) strings. 
+ + This converts the recursion to an explicit stack, which can be useful in languages with limited recursion depth. The logic is identical — we just manage the call stack manually. Note that space complexity is worse because we store all pending states on the heap rather than using the call stack. + + - approach_name: Dynamic Programming + is_optimal: false + code: | + def generate_parenthesis(n: int) -> list[str]: + # dp[i] contains all valid strings with i pairs of parentheses + dp = [[] for _ in range(n + 1)] + dp[0] = [''] # Base case: empty string for 0 pairs + + for i in range(1, n + 1): + # Build strings for i pairs using smaller subproblems + # Pattern: "(" + dp[j] + ")" + dp[i-1-j] for all valid j + for j in range(i): + for left in dp[j]: + for right in dp[i - 1 - j]: + dp[i].append('(' + left + ')' + right) + + return dp[n] + explanation: | + **Time Complexity:** O(4^n / √n) — We generate all Catalan(n) strings. + + **Space Complexity:** O(4^n / √n) — We store all valid strings for all values up to n. + + This builds solutions bottom-up. For `i` pairs, we consider all ways to split: `j` pairs inside the first `()` and `i-1-j` pairs after it. While correct, this uses more memory than backtracking since it stores all intermediate results. diff --git a/backend/data/questions/greatest-common-divisor-of-strings.yaml b/backend/data/questions/greatest-common-divisor-of-strings.yaml new file mode 100644 index 0000000..acdf17a --- /dev/null +++ b/backend/data/questions/greatest-common-divisor-of-strings.yaml @@ -0,0 +1,188 @@ +title: Greatest Common Divisor of Strings +slug: greatest-common-divisor-of-strings +difficulty: easy +leetcode_id: 1071 +leetcode_url: https://leetcode.com/problems/greatest-common-divisor-of-strings/ +categories: + - strings + - math +patterns: + - greedy + +description: | + For two strings `s` and `t`, we say "`t` divides `s`" if and only if `s = t + t + t + ... + t` (i.e., `t` is concatenated with itself one or more times). 
+ + Given two strings `str1` and `str2`, return *the largest string* `x` *such that* `x` *divides both* `str1` *and* `str2`. + +constraints: | + - `1 <= str1.length, str2.length <= 1000` + - `str1` and `str2` consist of English uppercase letters. + +examples: + - input: 'str1 = "ABCABC", str2 = "ABC"' + output: '"ABC"' + explanation: '"ABC" divides both strings. "ABCABC" = "ABC" + "ABC" and "ABC" = "ABC".' + - input: 'str1 = "ABABAB", str2 = "ABAB"' + output: '"AB"' + explanation: '"AB" divides both strings. "ABABAB" = "AB" + "AB" + "AB" and "ABAB" = "AB" + "AB".' + - input: 'str1 = "LEET", str2 = "CODE"' + output: '""' + explanation: "There is no string that divides both str1 and str2." + +explanation: + intuition: | + This problem cleverly connects string manipulation to a fundamental mathematical concept: the **Greatest Common Divisor (GCD)**. + + Think of it like this: if a string `x` can "divide" both `str1` and `str2`, then `x` repeated some number of times equals `str1`, and `x` repeated another number of times equals `str2`. This is exactly analogous to how a number `d` divides both `a` and `b` if `a = d * m` and `b = d * n` for some integers `m` and `n`. + + The key insight is that **if a common divisor string exists, the length of the GCD string must be the GCD of the two string lengths**. Why? Because if `x` divides both strings, then `len(str1)` must be a multiple of `len(x)` and `len(str2)` must be a multiple of `len(x)`. The *largest* such length is exactly `gcd(len(str1), len(str2))`. + + But there's one more critical check: **not all string pairs have a common divisor**. For example, `"LEET"` and `"CODE"` have no common divisor because they're fundamentally incompatible. The elegant way to check compatibility is: if `str1 + str2 == str2 + str1`, then a common divisor exists. If the strings are "made of the same building block," the order of concatenation doesn't matter. 
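The two observations above (the concatenation test and the GCD of the lengths) can be checked on the problem's own examples with a short sketch. The helper name `shares_building_block` is hypothetical; `math.gcd` is the standard-library function.

```python
from math import gcd


def shares_building_block(str1: str, str2: str) -> bool:
    """Hypothetical helper: True when concatenation order does not matter."""
    return str1 + str2 == str2 + str1


for a, b in [("ABABAB", "ABAB"), ("LEET", "CODE")]:
    if shares_building_block(a, b):
        # A common divisor exists, so its length is the GCD of the lengths
        length = gcd(len(a), len(b))
        print(a, b, "->", repr(a[:length]))
    else:
        print(a, b, "-> no common divisor")
```

For the first pair both concatenations give `"ABABABABAB"` and `gcd(6, 4)` is 2, so the prefix `"AB"` is the answer; for the second pair `"LEETCODE"` differs from `"CODELEET"`, so the lengths are never consulted.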
+ + approach: | + We solve this using a **GCD-based Approach**: + + **Step 1: Check if a common divisor exists** + + - Concatenate `str1 + str2` and `str2 + str1` + - If these are not equal, the strings have no common divisor — return an empty string + - This check works because if both strings are built from the same repeating pattern, the order of concatenation won't matter + +   + + **Step 2: Calculate the GCD of the string lengths** + + - Use the Euclidean algorithm to find `gcd(len(str1), len(str2))` + - This gives us the length of the largest possible common divisor string + +   + + **Step 3: Return the GCD string** + + - Return the prefix of `str1` (or `str2`) with length equal to the GCD + - Since we've verified compatibility in Step 1, this prefix is guaranteed to divide both strings + +   + + The mathematical foundation makes this solution both elegant and efficient — we avoid brute-force checking of all possible divisor strings. + + common_pitfalls: + - title: Brute Force All Prefixes + description: | + A naive approach might try every possible prefix of the shorter string and check if it divides both strings. For each candidate prefix of length `k`, you'd verify if `str1` and `str2` are composed entirely of that prefix. + + While correct, this is unnecessarily slow. With strings up to length 1000, and checking each prefix by iterating through both strings, you could do up to O(n^2) work. + + The GCD approach reduces this to O(n) string concatenation checks plus O(log(min(n, m))) for the GCD calculation. + wrong_approach: "Try every prefix and check divisibility" + correct_approach: "Use mathematical GCD on lengths after compatibility check" + + - title: Forgetting the Compatibility Check + description: | + You might be tempted to just compute `gcd(len(str1), len(str2))` and return that prefix. But this fails for cases like `str1 = "LEET"`, `str2 = "CODE"`. 
+ + The GCD of 4 and 4 is 4, but `"LEET"` does not equal `"CODE"` — there's no common divisor string at all! The `str1 + str2 == str2 + str1` check catches this: `"LEETCODE"` ≠ `"CODELEET"`. + wrong_approach: "Just return str1[:gcd(len(str1), len(str2))]" + correct_approach: "First verify str1 + str2 == str2 + str1" + + - title: Checking Wrong String for Prefix + description: | + After finding the GCD length, some might try to construct the result by repeating characters or using complex logic. Simply take a prefix of either string — since we've verified they're compatible, both strings start with the same pattern. + wrong_approach: "Complex construction of the result string" + correct_approach: "Return str1[:gcd_length] directly" + + key_takeaways: + - "**Mathematical insight**: String divisibility mirrors integer divisibility — the GCD concept transfers directly" + - "**Compatibility check first**: The `str1 + str2 == str2 + str1` test elegantly verifies that a common pattern exists" + - "**Euclidean algorithm**: The GCD of two numbers can be computed efficiently in O(log(min(a, b))) time" + - "**Pattern recognition**: Look for mathematical analogies when problems involve repetition or divisibility" + + time_complexity: "O(n + m). We perform two string concatenations of total length `n + m`, one equality check of length `n + m`, and a GCD calculation in O(log(min(n, m)))." + space_complexity: "O(n + m). We create two concatenated strings of length `n + m` for the compatibility check." 
+ +solutions: + - approach_name: GCD of Lengths + is_optimal: true + code: | + from math import gcd + + def gcd_of_strings(str1: str, str2: str) -> str: + # Check if a common divisor pattern exists + # If both strings are made of the same repeating unit, + # concatenation order doesn't matter + if str1 + str2 != str2 + str1: + return "" + + # The GCD string length must be the GCD of both lengths + gcd_length = gcd(len(str1), len(str2)) + + # Return the prefix of that length + return str1[:gcd_length] + explanation: | + **Time Complexity:** O(n + m) — String concatenation and comparison dominate. + + **Space Complexity:** O(n + m) — For the concatenated strings. + + This elegant solution leverages the mathematical relationship between string divisibility and integer GCD. The concatenation equality check `str1 + str2 == str2 + str1` is a brilliant way to verify that both strings share a common building block pattern. + + - approach_name: Iterative GCD Check + is_optimal: false + code: | + def gcd_of_strings(str1: str, str2: str) -> str: + def gcd(a: int, b: int) -> int: + # Euclidean algorithm + while b: + a, b = b, a % b + return a + + def divides(s: str, t: str) -> bool: + # Check if s divides t (t is s repeated) + if len(t) % len(s) != 0: + return False + times = len(t) // len(s) + return s * times == t + + # Find GCD length + gcd_len = gcd(len(str1), len(str2)) + candidate = str1[:gcd_len] + + # Verify it actually divides both strings + if divides(candidate, str1) and divides(candidate, str2): + return candidate + return "" + explanation: | + **Time Complexity:** O(n + m) — Checking divisibility for both strings. + + **Space Complexity:** O(n + m) — The candidate has length gcd(n, m), but the repeated copies built by `s * times` are as long as the inputs. + + This approach is more explicit: it computes the candidate GCD string, then verifies it actually divides both inputs. 
While correct and intuitive, it's slightly less elegant than the concatenation trick which handles the compatibility check in one comparison. 
+ + - approach_name: Brute Force + is_optimal: false + code: | + def gcd_of_strings(str1: str, str2: str) -> str: + # Try all possible prefix lengths, from largest to smallest + min_len = min(len(str1), len(str2)) + + for length in range(min_len, 0, -1): + # Skip if lengths aren't divisible + if len(str1) % length != 0 or len(str2) % length != 0: + continue + + # Get candidate prefix + candidate = str1[:length] + + # Check if candidate divides both strings + times1 = len(str1) // length + times2 = len(str2) // length + + if candidate * times1 == str1 and candidate * times2 == str2: + return candidate + + return "" + explanation: | + **Time Complexity:** O(min(n, m) * (n + m)) — For each candidate length, we check both strings. + + **Space Complexity:** O(n + m) — For the repeated candidate strings. + + This brute force approach tries every possible prefix length from largest to smallest. While it works, it's inefficient because it doesn't leverage the mathematical insight that the answer length must be `gcd(n, m)`. Included to illustrate why the GCD approach is superior. diff --git a/backend/data/questions/group-anagrams.yaml b/backend/data/questions/group-anagrams.yaml new file mode 100644 index 0000000..b927bc1 --- /dev/null +++ b/backend/data/questions/group-anagrams.yaml @@ -0,0 +1,156 @@ +title: Group Anagrams +slug: group-anagrams +difficulty: medium +leetcode_id: 49 +leetcode_url: https://leetcode.com/problems/group-anagrams/ +categories: + - strings + - hash-tables + - sorting +patterns: + - hashing + +description: | + Given an array of strings `strs`, group the **anagrams** together. You can return the answer in **any order**. + + An **anagram** is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once. 
+ +constraints: | + - `1 <= strs.length <= 10^4` + - `0 <= strs[i].length <= 100` + - `strs[i]` consists of lowercase English letters + +examples: + - input: 'strs = ["eat","tea","tan","ate","nat","bat"]' + output: '[["bat"],["nat","tan"],["ate","eat","tea"]]' + explanation: "Words with the same letters are grouped together." + - input: 'strs = [""]' + output: '[[""]]' + explanation: "Empty string forms its own group." + - input: 'strs = ["a"]' + output: '[["a"]]' + explanation: "Single character forms its own group." + +explanation: + intuition: | + What makes two words anagrams? They have exactly the same letters in exactly the same quantities. "eat" and "tea" both have one 'e', one 'a', and one 't'. + + Think of it like this: if you sort the letters of any anagram, you get the same result. `sorted("eat") = "aet"` and `sorted("tea") = "aet"`. This sorted form is a **canonical representation** — a fingerprint that's identical for all anagrams. + + So the strategy is simple: for each word, compute its fingerprint (sorted letters), and group words with the same fingerprint together. A hash map is perfect for this — the fingerprint is the key, and each key maps to a list of original words. + + There's an alternative fingerprint: instead of sorting, count each letter's frequency. `"eat"` becomes `(1,0,0,0,1,0,...,1,0,0)` — a tuple of 26 counts. This is O(k) instead of O(k log k), better for long strings. 
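The two fingerprints described above can be sketched side by side. This is a minimal illustration that assumes lowercase `a`-`z` input, as the constraints state; the helper names `sorted_key` and `count_key` are hypothetical and are not part of the solutions in this file.

```python
def sorted_key(s: str) -> str:
    """Fingerprint by sorting the letters: O(k log k) for a string of length k."""
    return "".join(sorted(s))


def count_key(s: str) -> tuple[int, ...]:
    """Fingerprint by counting letters a-z: O(k) for a string of length k."""
    counts = [0] * 26
    for ch in s:
        counts[ord(ch) - ord("a")] += 1
    return tuple(counts)  # tuples are hashable, so this can be a dict key


# Anagrams share a fingerprint under either scheme; non-anagrams do not.
print(sorted_key("eat") == sorted_key("tea"))  # True
print(count_key("eat") == count_key("tea"))    # True
print(sorted_key("eat") == sorted_key("tan"))  # False
```

Either key can be used with a `defaultdict(list)`; the count key trades a fixed 26-slot tuple per string for avoiding the sort.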
+ + approach: | + We solve this using **Hash Map with Sorted String Keys**: + + **Step 1: Create a hash map for grouping** + + - Use a `defaultdict(list)` so we can append to non-existent keys + - Keys will be the canonical form (sorted string) + - Values will be lists of original strings + +   + + **Step 2: Process each string** + + - For each string `s` in the input: + - Compute the key: `''.join(sorted(s))` + - Append the original string to `groups[key]` + +   + + **Step 3: Return all groups** + + - Return `list(groups.values())` — each value is one anagram group + +   + + Why does sorting work? Two strings are anagrams if and only if they contain the same characters. Sorting arranges characters in a canonical order, so anagrams produce identical sorted strings. + + common_pitfalls: + - title: Using Unhashable Types as Dictionary Keys + description: | + In Python, `sorted(s)` returns a **list**, which can't be a dictionary key (lists are mutable, hence unhashable). + + You must convert to a hashable type: + - `''.join(sorted(s))` → string key + - `tuple(sorted(s))` → tuple key + wrong_approach: "groups[sorted(s)].append(s)" + correct_approach: "groups[''.join(sorted(s))].append(s)" + + - title: Forgetting Empty Strings + description: | + An empty string `""` is a valid input. `sorted("")` returns `[]`, and `''.join([])` returns `""`. The algorithm handles this correctly, but edge case testing is important. + wrong_approach: "Assuming all strings are non-empty" + correct_approach: "Empty strings are handled naturally — they form their own group" + + - title: Using Regular Dict Without Default + description: | + With a regular `dict`, you must check if a key exists before appending: + ```python + if key not in groups: + groups[key] = [] + groups[key].append(s) + ``` + Using `defaultdict(list)` eliminates this boilerplate. 
+ wrong_approach: "groups[key].append(s) with regular dict (KeyError)" + correct_approach: "Use defaultdict(list) for automatic list creation" + + key_takeaways: + - "**Canonical form for grouping**: Anagrams share a canonical representation (sorted or counted)" + - "**Hash map for grouping**: When grouping by some property, use that property as the key" + - "**Sorting vs counting**: Sorting is O(k log k), counting is O(k) — counting is faster for long strings" + - "**defaultdict simplifies code**: Eliminates key-existence checks when building lists" + + time_complexity: "O(n × k log k). We process n strings, and sorting each string of length k takes O(k log k). With the counting approach, this becomes O(n × k)." + space_complexity: "O(n × k). We store all n strings in the hash map. Each string has length up to k." + +solutions: + - approach_name: Sorted String Key + is_optimal: true + code: | + from collections import defaultdict + + def group_anagrams(strs: list[str]) -> list[list[str]]: + # Map: sorted string -> list of original strings + groups = defaultdict(list) + + for s in strs: + # All anagrams sort to the same string + key = ''.join(sorted(s)) + groups[key].append(s) + + # Return all groups (order doesn't matter) + return list(groups.values()) + explanation: | + **Time Complexity:** O(n × k log k) — Sorting each of n strings of average length k. + + **Space Complexity:** O(n × k) — Storing all strings in the hash map. + + Sorting gives each string a canonical form. All anagrams produce the same sorted string, so they end up in the same bucket. Simple, readable, and efficient enough for most cases. 
+ + - approach_name: Character Count Key + is_optimal: true + code: | + from collections import defaultdict + + def group_anagrams(strs: list[str]) -> list[list[str]]: + groups = defaultdict(list) + + for s in strs: + # Count frequency of each letter (a-z) + count = [0] * 26 + for c in s: + count[ord(c) - ord('a')] += 1 + + # Use tuple of counts as key (tuples are hashable) + groups[tuple(count)].append(s) + + return list(groups.values()) + explanation: | + **Time Complexity:** O(n × k) — Counting is O(k) per string, better than O(k log k) sorting. + + **Space Complexity:** O(n × k) — Same as sorted approach. + + Instead of sorting, we count the frequency of each letter. Two strings are anagrams if and only if they have identical character counts. The count array is converted to a tuple (hashable) for use as a dictionary key. This is faster for long strings. diff --git a/backend/data/questions/guess-number-higher-or-lower.yaml b/backend/data/questions/guess-number-higher-or-lower.yaml new file mode 100644 index 0000000..efb92de --- /dev/null +++ b/backend/data/questions/guess-number-higher-or-lower.yaml @@ -0,0 +1,170 @@ +title: Guess Number Higher or Lower +slug: guess-number-higher-or-lower +difficulty: easy +leetcode_id: 374 +leetcode_url: https://leetcode.com/problems/guess-number-higher-or-lower/ +categories: + - binary-search +patterns: + - binary-search + +description: | + We are playing the Guess Game. The game is as follows: + + I pick a number from `1` to `n`. You have to guess which number I picked (the number I picked stays the same throughout the game). + + Every time you guess wrong, I will tell you whether the number I picked is higher or lower than your guess. + + You call a pre-defined API `int guess(int num)`, which returns three possible results: + + - `-1`: Your guess is higher than the number I picked (i.e. `num > pick`). + - `1`: Your guess is lower than the number I picked (i.e. `num < pick`). 
+ - `0`: Your guess is equal to the number I picked (i.e. `num == pick`). + + Return *the number that I picked*. + +constraints: | + - `1 <= n <= 2^31 - 1` + - `1 <= pick <= n` + +examples: + - input: "n = 10, pick = 6" + output: "6" + explanation: "Using binary search, we narrow down the range until we find 6." + - input: "n = 1, pick = 1" + output: "1" + explanation: "There's only one number, so the answer is 1." + - input: "n = 2, pick = 1" + output: "1" + explanation: "We guess the middle (or lower), and the API tells us we found it or to go lower." + +explanation: + intuition: | + Imagine playing a number guessing game with a friend. They're thinking of a number between 1 and 100, and after each guess, they tell you "higher" or "lower". What's the smartest strategy? + + The optimal approach is to **always guess the middle of the remaining range**. If they say "higher", you eliminate all numbers below your guess. If they say "lower", you eliminate all numbers above. Each guess cuts the search space in half. + + Think of it like searching for a word in a dictionary. You don't start from page 1 and check every page — you open to the middle, see if your word comes before or after, and repeat. This is the essence of **binary search**. + + The key insight is that the API feedback (`-1`, `0`, `1`) directly tells us which half of the range to keep searching. We're guaranteed to find the answer because we systematically narrow down until only one number remains. 
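To see the halving strategy in action, here is a small sketch that replaces the judge's `guess` API with a local stand-in and records every guess. The names `make_guess_api` and `find_pick` are assumptions made for this illustration; the real API is provided by the judge and takes only `num`.

```python
from typing import Callable


def make_guess_api(pick: int) -> tuple[Callable[[int], int], list[int]]:
    """Build a stand-in for the judge's guess() plus a log of every guess made."""
    calls: list[int] = []

    def guess(num: int) -> int:
        calls.append(num)
        if num > pick:
            return -1  # guess is higher than the picked number
        if num < pick:
            return 1   # guess is lower than the picked number
        return 0

    return guess, calls


def find_pick(n: int, guess: Callable[[int], int]) -> int:
    """Always guess the middle of the remaining range [low, high]."""
    low, high = 1, n
    while low <= high:
        mid = low + (high - low) // 2
        result = guess(mid)
        if result == 0:
            return mid
        if result == -1:
            high = mid - 1  # discard mid and everything above it
        else:
            low = mid + 1   # discard mid and everything below it
    return -1


guess, calls = make_guess_api(pick=6)
print(find_pick(10, guess))  # 6
print(calls)                 # [5, 8, 6]
```

With `n = 10` and `pick = 6`, three guesses are enough, and each guess discards roughly half of the remaining range.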
+ + approach: | + We solve this using **Binary Search**: + + **Step 1: Initialise the search boundaries** + + - `low`: Set to `1` (the smallest possible number) + - `high`: Set to `n` (the largest possible number) + +   + + **Step 2: Binary search loop** + + - While `low <= high`, calculate the middle point: `mid = low + (high - low) // 2` + - Call `guess(mid)` to get feedback + - If result is `0`: We found the number — return `mid` + - If result is `-1`: Our guess is too high — search in the lower half by setting `high = mid - 1` + - If result is `1`: Our guess is too low — search in the upper half by setting `low = mid + 1` + +   + + **Step 3: Return the result** + + - The loop is guaranteed to find the answer since `1 <= pick <= n` + +   + + Note: We use `mid = low + (high - low) // 2` instead of `(low + high) // 2` to avoid integer overflow when `low + high` exceeds the maximum integer value. + + common_pitfalls: + - title: Integer Overflow in Midpoint Calculation + description: | + A classic bug is calculating the midpoint as `(low + high) // 2`. When `low` and `high` are both large (close to `2^31 - 1`), their sum overflows. + + For example, if `low = 2^30` and `high = 2^31 - 1`, then `low + high` exceeds the 32-bit signed integer limit, causing incorrect behavior or runtime errors. + + Always use `low + (high - low) // 2` to safely compute the midpoint. + wrong_approach: "(low + high) // 2" + correct_approach: "low + (high - low) // 2" + + - title: Linear Search + description: | + Guessing numbers one by one from `1` to `n` works but is extremely inefficient. With `n` up to `2^31 - 1` (over 2 billion), a linear approach could require billions of guesses. + + Binary search guarantees finding the answer in at most `log2(n)` guesses — about 31 guesses for the maximum `n`. 
+ wrong_approach: "Loop from 1 to n, calling guess(i)" + correct_approach: "Binary search halving the range each time" + + - title: Off-by-One Errors + description: | + When updating `low` and `high`, you must exclude the middle element since we already checked it: + + - When `guess(mid)` returns `-1` (guess too high), set `high = mid - 1`, not `high = mid` + - When `guess(mid)` returns `1` (guess too low), set `low = mid + 1`, not `low = mid` + + Using `mid` instead of `mid - 1` or `mid + 1` can cause infinite loops. + + key_takeaways: + - "**Binary search foundation**: This problem teaches the core binary search template — divide the search space in half based on a condition" + - "**Overflow prevention**: Always use `low + (high - low) // 2` for midpoint calculation in production code" + - "**Interactive problems**: Problems with API calls follow the same patterns — the API response guides which half to search" + - "**Logarithmic efficiency**: Binary search reduces `O(n)` to `O(log n)`, essential for large input ranges" + + time_complexity: "O(log n). Each guess eliminates half of the remaining candidates, so we need at most log2(n) guesses." + space_complexity: "O(1). We only use three variables (`low`, `high`, `mid`) regardless of the input size." + +solutions: + - approach_name: Binary Search + is_optimal: true + code: | + # The guess API is already defined for you. 
+ # @param num: your guess + # @return -1 if num is higher than the picked number + # 1 if num is lower than the picked number + # otherwise return 0 + # def guess(num: int) -> int: + + def guess_number(n: int) -> int: + low, high = 1, n + + while low <= high: + # Safe midpoint calculation to avoid overflow + mid = low + (high - low) // 2 + + result = guess(mid) + + if result == 0: + # Found the number + return mid + elif result == -1: + # Guess is too high, search lower half + high = mid - 1 + else: + # Guess is too low, search upper half + low = mid + 1 + + # Should never reach here given problem constraints + return -1 + explanation: | + **Time Complexity:** O(log n) — Each iteration halves the search space. + + **Space Complexity:** O(1) — Only three integer variables used. + + This is the classic binary search template. We maintain a range `[low, high]` and repeatedly guess the middle value. Based on the API response, we eliminate half the range until we find the target. + + - approach_name: Linear Search + is_optimal: false + code: | + def guess_number(n: int) -> int: + # Check each number from 1 to n + for i in range(1, n + 1): + if guess(i) == 0: + return i + + return -1 + explanation: | + **Time Complexity:** O(n) — In the worst case, we check every number. + + **Space Complexity:** O(1) — Only loop variable used. + + This brute force approach checks numbers sequentially. While correct, it's far too slow for large `n` (up to 2 billion). With `n = 2^31 - 1`, this could require over 2 billion API calls, causing Time Limit Exceeded. Included to illustrate why binary search is necessary. 
diff --git a/backend/data/questions/hand-of-straights.yaml b/backend/data/questions/hand-of-straights.yaml new file mode 100644 index 0000000..f599e78 --- /dev/null +++ b/backend/data/questions/hand-of-straights.yaml @@ -0,0 +1,191 @@ +title: Hand of Straights +slug: hand-of-straights +difficulty: medium +leetcode_id: 846 +leetcode_url: https://leetcode.com/problems/hand-of-straights/ +categories: + - arrays + - hash-tables + - sorting +patterns: + - greedy + - heap + +description: | + Alice has some number of cards and she wants to rearrange the cards into groups so that each group is of size `groupSize`, and consists of `groupSize` consecutive cards. + + Given an integer array `hand` where `hand[i]` is the value written on the ith card and an integer `groupSize`, return `true` if she can rearrange the cards, or `false` otherwise. + +constraints: | + - `1 <= hand.length <= 10^4` + - `0 <= hand[i] <= 10^9` + - `1 <= groupSize <= hand.length` + +examples: + - input: "hand = [1,2,3,6,2,3,4,7,8], groupSize = 3" + output: "true" + explanation: "Alice's hand can be rearranged as [1,2,3], [2,3,4], [6,7,8]." + - input: "hand = [1,2,3,4,5], groupSize = 4" + output: "false" + explanation: "Alice's hand cannot be rearranged into groups of 4 because 5 is not divisible by 4." + +explanation: + intuition: | + Imagine you're organising a deck of cards into runs of consecutive numbers, like arranging cards in a hand of rummy. + + The key insight is that **the smallest card in any valid arrangement must start a group**. Why? Because no smaller card exists to precede it in a consecutive sequence. So if the smallest card is `3`, you must form a group starting at `3` (i.e., `3, 4, 5` for `groupSize = 3`). + + Think of it like this: you're forced to "use up" the smallest remaining card first. Once you commit to starting a group with that card, you must find the next `groupSize - 1` consecutive cards to complete the group. 
If any of those cards are missing, the arrangement is impossible. + + This greedy approach works because: + - Every card must belong to exactly one group + - The smallest card has no choice — it must start a group + - By always processing the smallest unused card, we systematically build all possible groups + + approach: | + We solve this using a **Greedy Approach with Hash Map Counting**: + + **Step 1: Check divisibility** + + - If `len(hand)` is not divisible by `groupSize`, return `false` immediately + - We need exactly `n / groupSize` complete groups + +   + + **Step 2: Count card frequencies** + + - Use a hash map to count occurrences of each card value + - This allows O(1) lookups and decrements + +   + + **Step 3: Sort unique card values** + + - Sort the unique card values (or use a min-heap) + - This ensures we always process the smallest available card first + +   + + **Step 4: Greedily form groups** + + - For each smallest card value with count > 0: + - Attempt to form a group starting at this value + - For each of the next `groupSize` consecutive values: + - If the count is 0 (card unavailable), return `false` + - Decrement the count of each card used + - If all groups formed successfully, return `true` + +   + + The greedy choice of always starting from the smallest available card guarantees correctness because that card has no other valid placement. + + common_pitfalls: + - title: Forgetting the Divisibility Check + description: | + Before any complex logic, check if `len(hand) % groupSize == 0`. + + For example, with `hand = [1,2,3,4,5]` and `groupSize = 4`, it's impossible to form complete groups of 4 from 5 cards. This quick check avoids unnecessary computation. 
+ wrong_approach: "Skipping the divisibility check and processing all cards" + correct_approach: "Return false immediately if total cards aren't divisible by groupSize" + + - title: Not Processing Cards in Sorted Order + description: | + If you try to form groups starting from arbitrary cards, you might use up cards needed for smaller sequences. + + For example, with `hand = [1,2,3,4,5,6]` and `groupSize = 3`, starting a group at `2` takes `[2,3,4]` and strands `1`, `5`, and `6`, so you would wrongly report failure even though `[1,2,3]` and `[4,5,6]` is a valid arrangement. + + By always starting groups from the **smallest available card**, you ensure deterministic and correct grouping. + wrong_approach: "Processing cards in arbitrary order" + correct_approach: "Sort cards and always start groups from the smallest value" + + - title: Using Cards Multiple Times + description: | + Each card can only belong to one group. When forming a group, you must decrement the count for each card used. + + A common bug is checking if a card exists but forgetting to reduce its count, leading to the same card being "used" in multiple groups. + wrong_approach: "Checking card existence without decrementing counts" + correct_approach: "Decrement count immediately after using each card" + + key_takeaways: + - "**Greedy with constraints**: When elements have no flexibility in placement (smallest must start a group), greedy works" + - "**Hash map for frequency tracking**: Counting occurrences enables efficient lookups and updates in O(1)" + - "**Sort to establish processing order**: Sorting unique values ensures we always handle the most constrained element first" + - "**Early termination**: Simple checks like divisibility can save significant computation" + + time_complexity: "O(n log n). Sorting the unique card values dominates. The grouping phase visits each card at most once, contributing O(n)." 
+ space_complexity: "O(n). The hash map stores counts for up to `n` unique card values." + +solutions: + - approach_name: Greedy with Hash Map + is_optimal: true + code: | + from collections import Counter + + def is_n_straight_hand(hand: list[int], group_size: int) -> bool: + # Quick check: total cards must be divisible by group size + if len(hand) % group_size != 0: + return False + + # Count frequency of each card value + card_count = Counter(hand) + + # Process cards in sorted order (smallest first) + for card in sorted(card_count): + # If this card has remaining copies, it must start a group + count = card_count[card] + if count > 0: + # Try to form 'count' groups starting at this card + for i in range(group_size): + # Need 'count' copies of each consecutive card + if card_count[card + i] < count: + return False # Not enough cards to complete groups + card_count[card + i] -= count + + return True + explanation: | + **Time Complexity:** O(n log n) — Sorting unique values takes O(k log k) where k ≤ n, and we process each card once. + + **Space Complexity:** O(n) — Hash map stores up to n entries. + + We count card frequencies, then iterate through sorted values. When a card has remaining copies, we greedily form as many groups as possible starting from that card. If any consecutive card is missing, we return false. 
+ + - approach_name: Min-Heap Approach + is_optimal: false + code: | + from collections import Counter + import heapq + + def is_n_straight_hand(hand: list[int], group_size: int) -> bool: + if len(hand) % group_size != 0: + return False + + card_count = Counter(hand) + # Min-heap of unique card values + min_heap = list(card_count.keys()) + heapq.heapify(min_heap) + + while min_heap: + # Get smallest card (must start a group) + smallest = min_heap[0] + + # Form one group starting at smallest + for i in range(group_size): + card = smallest + i + if card_count[card] == 0: + return False # Card unavailable + + card_count[card] -= 1 + # Remove from heap if exhausted + if card_count[card] == 0: + # Only remove if it's the heap minimum + if card != min_heap[0]: + return False # Gap in sequence + heapq.heappop(min_heap) + + return True + explanation: | + **Time Complexity:** O(n log n) — Heap operations for each card removal. + + **Space Complexity:** O(n) — Hash map and heap storage. + + This approach uses a min-heap to always access the smallest card. We form one group at a time, removing cards from the heap when exhausted. The constraint that we can only pop the heap minimum ensures consecutive sequences are valid. This is slightly less efficient than the hash map approach but demonstrates an alternative technique. 
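The "gap in sequence" check is easiest to see on a hand where it fires. The sketch below restates the min-heap approach under a hypothetical name, `can_split`, so it can run on its own, and tries it on the first example from this file plus a small hand chosen for illustration.

```python
from collections import Counter
import heapq


def can_split(hand: list[int], group_size: int) -> bool:
    """Restatement of the min-heap approach; `can_split` is an illustrative name."""
    if len(hand) % group_size != 0:
        return False

    remaining = Counter(hand)
    heap = list(remaining)  # unique card values
    heapq.heapify(heap)

    while heap:
        start = heap[0]  # smallest remaining value must start a group
        for card in range(start, start + group_size):
            if remaining[card] == 0:
                return False  # needed card is unavailable
            remaining[card] -= 1
            if remaining[card] == 0:
                # An exhausted card must be the current minimum; otherwise a
                # smaller card is still waiting and can never be completed.
                if card != heap[0]:
                    return False
                heapq.heappop(heap)
    return True


print(can_split([1, 2, 3, 6, 2, 3, 4, 7, 8], 3))  # True
print(can_split([1, 1, 2, 2, 3, 4], 3))           # False
```

In the second hand, the first group `[1, 2, 3]` uses up the only `3` while a `1` is still in the heap. The second `1` would also need a `3`, so the function returns `False` at that point.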
diff --git a/backend/data/questions/happy-number.yaml b/backend/data/questions/happy-number.yaml new file mode 100644 index 0000000..13b6e35 --- /dev/null +++ b/backend/data/questions/happy-number.yaml @@ -0,0 +1,195 @@ +title: Happy Number +slug: happy-number +difficulty: easy +leetcode_id: 202 +leetcode_url: https://leetcode.com/problems/happy-number/ +categories: + - math + - hash-tables +patterns: + - fast-slow-pointers + +function_signature: "def is_happy(n: int) -> bool:" + +test_cases: + visible: + - input: { n: 19 } + expected: true + - input: { n: 2 } + expected: false + - input: { n: 1 } + expected: true + hidden: + - input: { n: 7 } + expected: true + - input: { n: 4 } + expected: false + - input: { n: 100 } + expected: true + +description: | + Write an algorithm to determine if a number `n` is happy. + + A **happy number** is a number defined by the following process: + + - Starting with any positive integer, replace the number by the sum of the squares of its digits. + - Repeat the process until the number equals `1` (where it will stay), or it **loops endlessly in a cycle** which does not include `1`. + - Those numbers for which this process **ends in 1** are happy. + + Return `true` if `n` is a happy number, and `false` if not. + +constraints: | + - `1 <= n <= 2^31 - 1` + +examples: + - input: "n = 19" + output: "true" + explanation: "1^2 + 9^2 = 82 -> 8^2 + 2^2 = 68 -> 6^2 + 8^2 = 100 -> 1^2 + 0^2 + 0^2 = 1" + - input: "n = 2" + output: "false" + explanation: "The sequence 2 -> 4 -> 16 -> 37 -> 58 -> 89 -> 145 -> 42 -> 20 -> 4 enters a cycle that never reaches 1." + +explanation: + intuition: | + Think of this problem as following a path through a maze of numbers. Starting from `n`, you compute the sum of squared digits to get the next number, then repeat. The key insight is that this sequence must eventually do one of two things: either reach `1` (happy!) or enter a cycle (unhappy). + + Why must it cycle? 
Because the sum of squared digits for any number has an upper bound. For a number with `d` digits, the maximum sum is `d * 81` (when all digits are `9`). For the largest input (`2^31 - 1`, which has 10 digits), the maximum possible sum is 810. So after at most one step, you're working with numbers in a bounded range, and a bounded sequence that never terminates must eventually repeat. + + This cycle-detection insight opens up two elegant solutions: + + 1. **Hash Set**: Track every number you've seen. If you see a repeat before reaching `1`, there's a cycle. + 2. **Floyd's Cycle Detection (Fast-Slow Pointers)**: Use two "runners" through the sequence at different speeds. If there's a cycle, the fast runner will eventually lap the slow runner. + + The fast-slow pointer approach is particularly elegant because it uses O(1) space instead of O(n) for storing visited numbers. + + approach: | + We solve this using **Floyd's Cycle Detection** (also known as the tortoise and hare algorithm): + + **Step 1: Define a helper function** + + - `get_next(n)`: Computes the sum of squares of digits + - Extract each digit using modulo and integer division + - Square each digit and accumulate the sum + +   + + **Step 2: Initialise two pointers** + + - `slow`: Starts at `n`, moves one step at a time + - `fast`: Starts at `get_next(n)`, moves two steps at a time + +   + + **Step 3: Run the cycle detection loop** + + - While `fast != 1` and `fast != slow`: + - Move `slow` one step: `slow = get_next(slow)` + - Move `fast` two steps: `fast = get_next(get_next(fast))` + - If they meet before reaching `1`, there's a cycle (unhappy) + - If `fast` reaches `1`, the number is happy + +   + + **Step 4: Return the result** + + - Return `fast == 1` + - If `fast` is `1`, we found happiness; otherwise we detected a cycle + + common_pitfalls: + - title: Infinite Loop Without Cycle Detection + description: | + A naive approach might just keep computing the next number forever: + + ```python + while n 
!= 1: + n = sum_of_squares(n) + return True + ``` + + This will never terminate for unhappy numbers like `2`, which cycle endlessly through `2 -> 4 -> 16 -> 37 -> 58 -> 89 -> 145 -> 42 -> 20 -> 4 -> ...` + + You **must** detect cycles, either with a hash set or Floyd's algorithm. + wrong_approach: "Loop until n equals 1" + correct_approach: "Track visited numbers or use Floyd's cycle detection" + + - title: Forgetting Edge Cases + description: | + The number `1` is already happy (sum of squares of `1` is `1`). Single-digit numbers like `7` are also happy (`7 -> 49 -> 97 -> 130 -> 10 -> 1`). + + Make sure your initial setup handles these correctly. With Floyd's algorithm, initialising `slow = n` and `fast = get_next(n)` naturally handles `n = 1` because `fast` immediately becomes `1`. + + - title: Integer Overflow in get_next + description: | + When extracting digits, some implementations might use string conversion which is slower. The mathematical approach using `n % 10` and `n // 10` is both faster and avoids any potential issues with very large numbers during intermediate steps. + + However, since the sum of squared digits is bounded (maximum ~810 for 10-digit numbers), overflow is not a concern for the result. + + key_takeaways: + - "**Cycle detection pattern**: Floyd's algorithm (fast-slow pointers) is useful whenever you need to detect cycles in a sequence with O(1) space" + - "**Bounded sequences**: Recognising that the sequence values are bounded (max ~810) proves that cycles must occur for non-happy numbers" + - "**Math vs Hash Table tradeoff**: The hash set approach is simpler to understand but uses O(k) space where k is the cycle length; Floyd's uses O(1)" + - "**Related problems**: This pattern applies to Linked List Cycle, Find the Duplicate Number, and other sequence-based cycle problems" + + time_complexity: "O(log n). The number of digits in n is O(log n), and we process each number in the sequence. 
The sequence length is bounded by a constant for any starting value." + space_complexity: "O(1) for Floyd's algorithm, or O(log n) for the hash set approach (storing visited numbers)." + +solutions: + - approach_name: Floyd's Cycle Detection + is_optimal: true + code: | + def is_happy(n: int) -> bool: + def get_next(num: int) -> int: + """Calculate sum of squares of digits.""" + total = 0 + while num > 0: + digit = num % 10 # Extract last digit + total += digit * digit # Add its square + num //= 10 # Remove last digit + return total + + # Floyd's algorithm: slow moves 1 step, fast moves 2 steps + slow = n + fast = get_next(n) + + # Continue until fast reaches 1 or they meet (cycle detected) + while fast != 1 and slow != fast: + slow = get_next(slow) # One step + fast = get_next(get_next(fast)) # Two steps + + # Happy if we reached 1, unhappy if cycle detected + return fast == 1 + explanation: | + **Time Complexity:** O(log n) — Each number has O(log n) digits to process, and the sequence is bounded. + + **Space Complexity:** O(1) — Only uses two pointer variables regardless of input size. + + Floyd's cycle detection elegantly solves the problem: if a cycle exists, the fast pointer will eventually catch up to the slow pointer. If no cycle exists (happy number), fast reaches 1 first. + + - approach_name: Hash Set + is_optimal: false + code: | + def is_happy(n: int) -> bool: + def get_next(num: int) -> int: + """Calculate sum of squares of digits.""" + total = 0 + while num > 0: + digit = num % 10 + total += digit * digit + num //= 10 + return total + + # Track all numbers we've seen + seen = set() + + while n != 1 and n not in seen: + seen.add(n) # Mark current number as visited + n = get_next(n) # Move to next in sequence + + # Happy if we reached 1, unhappy if we saw a repeat + return n == 1 + explanation: | + **Time Complexity:** O(log n) — Same as Floyd's approach. + + **Space Complexity:** O(log n) — Stores visited numbers in the set. 
+ + This approach is more intuitive: just remember what you've seen. If you see a number twice before reaching 1, you're in a cycle. The tradeoff is using extra memory for the set. diff --git a/backend/data/questions/house-robber-ii.yaml b/backend/data/questions/house-robber-ii.yaml new file mode 100644 index 0000000..2e7bc57 --- /dev/null +++ b/backend/data/questions/house-robber-ii.yaml @@ -0,0 +1,213 @@ +title: House Robber II +slug: house-robber-ii +difficulty: medium +leetcode_id: 213 +leetcode_url: https://leetcode.com/problems/house-robber-ii/ +categories: + - arrays + - dynamic-programming +patterns: + - dynamic-programming + +description: | + You are a professional robber planning to rob houses along a street. Each house has a certain amount of money stashed. All houses at this place are **arranged in a circle**. That means the first house is the neighbour of the last one. Meanwhile, adjacent houses have a security system connected, and **it will automatically contact the police if two adjacent houses were broken into on the same night**. + + Given an integer array `nums` representing the amount of money of each house, return *the maximum amount of money you can rob tonight without alerting the police*. + +constraints: | + - `1 <= nums.length <= 100` + - `0 <= nums[i] <= 1000` + +examples: + - input: "nums = [2,3,2]" + output: "3" + explanation: "You cannot rob house 1 (money = 2) and then rob house 3 (money = 2), because they are adjacent houses." + - input: "nums = [1,2,3,1]" + output: "4" + explanation: "Rob house 1 (money = 1) and then rob house 3 (money = 3). Total amount you can rob = 1 + 3 = 4." + - input: "nums = [1,2,3]" + output: "3" + explanation: "Rob house 3 (money = 3). Houses 1 and 3 are adjacent in the circle, so robbing both (1 + 3 = 4) is not allowed." + +explanation: + intuition: | + This problem is a clever extension of the classic House Robber problem. The twist? The houses are arranged in a **circle**, meaning the first and last houses are neighbours. 
+ + Think of it like this: imagine the houses arranged around a cul-de-sac instead of a straight street. If you rob the first house, you can't rob the last one (they share a fence). Conversely, if you rob the last house, you can't rob the first. + + The key insight is that **you can never rob both the first and last house** — they're mutually exclusive. This transforms the circular problem into two linear problems: + - **Scenario A**: Rob from houses `0` to `n-2` (exclude the last house) + - **Scenario B**: Rob from houses `1` to `n-1` (exclude the first house) + + The answer is simply the maximum of these two scenarios. Each scenario is just the original House Robber problem, which we solve with dynamic programming! + + approach: | + We solve this by **reducing the circular problem to two linear problems**: + + **Step 1: Handle the edge case** + + - If there's only one house, return `nums[0]` — no circular constraint applies + +   + + **Step 2: Define a helper function for linear House Robber** + + - This function solves the original problem on a subarray + - Use two variables (`prev1`, `prev2`) to track the maximum money achievable + - Recurrence: `current = max(nums[i] + prev2, prev1)` + +   + + **Step 3: Run the helper on two scenarios** + + - `rob_linear(nums[0:n-1])`: Exclude the last house (can rob the first) + - `rob_linear(nums[1:n])`: Exclude the first house (can rob the last) + - These two ranges cover every valid plan: robbing both the first and last house is never allowed, so any valid plan skips at least one of them and fits within one of the scenarios + +   + + **Step 4: Return the maximum** + + - `max(scenario_a, scenario_b)` gives the optimal answer + - One of these scenarios will contain the true optimal solution + + common_pitfalls: + - title: Treating It Like a Linear Array + description: | + A common mistake is to directly apply the House Robber I solution without considering the circular constraint. 
+ + For `nums = [2, 3, 2]`: + - Linear approach might yield `2 + 2 = 4` (houses 0 and 2) + - But houses 0 and 2 are adjacent in a circle! + - Correct answer is `3` (just house 1) + + Always remember: in a circle, index `0` and index `n-1` are neighbours. + wrong_approach: "Apply House Robber I directly" + correct_approach: "Split into two linear subproblems excluding first or last house" + + - title: Forgetting the Single House Case + description: | + When `nums.length == 1`, both scenarios (`nums[0:0]` and `nums[1:1]`) would be empty arrays, returning 0. + + But with one house, there are no neighbours — you can simply rob it! Always handle this edge case explicitly by returning `nums[0]` when `n == 1`. + wrong_approach: "Let the helper function handle all cases" + correct_approach: "Check n == 1 before splitting into scenarios" + + - title: Off-by-One in Array Slicing + description: | + When excluding the last house, use `nums[0:n-1]` (indices 0 to n-2 inclusive). + When excluding the first house, use `nums[1:n]` (indices 1 to n-1 inclusive). + + Python's slice notation is `[start:end)` — end is exclusive. A common error is: + - `nums[0:n-2]` — misses one house + - `nums[1:n+1]` — goes out of bounds + + Double-check your slice boundaries match the scenarios described. + + key_takeaways: + - "**Problem reduction**: Convert a harder problem (circular) into simpler subproblems (linear)" + - "**Mutual exclusion insight**: When constraints create mutually exclusive choices, solve each case separately" + - "**Reuse existing solutions**: House Robber II builds directly on House Robber I — recognise when you can leverage solved subproblems" + - "**Pattern for circular arrays**: Many circular array problems can be solved by breaking the cycle and running linear algorithms twice" + + time_complexity: "O(n). We run the linear House Robber algorithm twice, each taking O(n) time, giving O(2n) = O(n)." + space_complexity: "O(1). 
The space-optimised linear algorithm uses only two variables, and we run it twice sequentially." + +solutions: + - approach_name: Two-Pass Dynamic Programming + is_optimal: true + code: | + def rob(nums: list[int]) -> int: + # Edge case: single house has no circular constraint + if len(nums) == 1: + return nums[0] + + def rob_linear(houses: list[int]) -> int: + """Solve the linear House Robber problem.""" + prev2 = 0 # Max money from two houses back + prev1 = 0 # Max money from previous house + + for money in houses: + # Rob this house + prev2, or skip and keep prev1 + current = max(money + prev2, prev1) + prev2 = prev1 + prev1 = current + + return prev1 + + n = len(nums) + # Scenario A: exclude last house (can rob first) + # Scenario B: exclude first house (can rob last) + return max(rob_linear(nums[:n-1]), rob_linear(nums[1:])) + explanation: | + **Time Complexity:** O(n) — Two linear passes through subarrays of size n-1. + + **Space Complexity:** O(1) for the DP state (two variables per pass). Note that the slices `nums[:n-1]` and `nums[1:]` each copy O(n) elements in Python, so auxiliary space is O(n) unless index ranges are used instead. + + By excluding either the first or last house, we break the circular constraint and can apply the standard House Robber DP approach. The maximum of both scenarios gives us the optimal answer because any valid solution must exclude at least one of the endpoints. + + - approach_name: Two-Pass with Explicit Ranges + is_optimal: false + code: | + def rob(nums: list[int]) -> int: + n = len(nums) + + # Edge case: single house + if n == 1: + return nums[0] + + def rob_range(start: int, end: int) -> int: + """Rob houses from index start to end (inclusive).""" + prev2 = 0 + prev1 = 0 + + for i in range(start, end + 1): + current = max(nums[i] + prev2, prev1) + prev2 = prev1 + prev1 = current + + return prev1 + + # Exclude last house OR exclude first house + return max(rob_range(0, n - 2), rob_range(1, n - 1)) + explanation: | + **Time Complexity:** O(n) — Two passes through subarrays. + + **Space Complexity:** O(1) — Constant extra space.
+ + This version uses explicit index ranges instead of array slicing. It avoids the O(n) list copies that slicing creates, so the auxiliary space is genuinely O(1). The logic is identical: solve two linear subproblems and take the maximum. + + - approach_name: DP with Array (Educational) + is_optimal: false + code: | + def rob(nums: list[int]) -> int: + n = len(nums) + + if n == 1: + return nums[0] + if n == 2: + return max(nums[0], nums[1]) + + def rob_linear(houses: list[int]) -> int: + """Standard House Robber with DP array.""" + m = len(houses) + if m == 1: + return houses[0] + + dp = [0] * m + dp[0] = houses[0] + dp[1] = max(houses[0], houses[1]) + + for i in range(2, m): + dp[i] = max(houses[i] + dp[i - 2], dp[i - 1]) + + return dp[m - 1] + + # Two scenarios: exclude last or exclude first + return max(rob_linear(nums[:-1]), rob_linear(nums[1:])) + explanation: | + **Time Complexity:** O(n) — Two linear passes. + + **Space Complexity:** O(n) — DP arrays of size n-1 for each pass. + + This version explicitly builds the DP table, making the recurrence relation easier to trace. Each `dp[i]` represents the maximum money from houses 0 to i in that subarray. While less space-efficient, this is useful for understanding the DP transition before optimising to O(1) space. diff --git a/backend/data/questions/house-robber-iii.yaml b/backend/data/questions/house-robber-iii.yaml new file mode 100644 index 0000000..abcb67e --- /dev/null +++ b/backend/data/questions/house-robber-iii.yaml @@ -0,0 +1,246 @@ +title: House Robber III +slug: house-robber-iii +difficulty: medium +leetcode_id: 337 +leetcode_url: https://leetcode.com/problems/house-robber-iii/ +categories: + - trees + - dynamic-programming +patterns: + - dfs + - dynamic-programming + +description: | + The thief has found himself a new place for his thievery again. There is only one entrance to this area, called `root`. + + Besides the `root`, each house has one and only one parent house.
After a tour, the smart thief realised that all houses in this place form a **binary tree**. It will automatically contact the police if **two directly-linked houses were broken into on the same night**. + + Given the `root` of the binary tree, return *the maximum amount of money the thief can rob without alerting the police*. + +constraints: | + - The number of nodes in the tree is in the range `[1, 10^4]` + - `0 <= Node.val <= 10^4` + +examples: + - input: "root = [3,2,3,null,3,null,1]" + output: "7" + explanation: "Maximum amount of money the thief can rob = 3 + 3 + 1 = 7 (root and two grandchildren)." + - input: "root = [3,4,5,1,3,null,1]" + output: "9" + explanation: "Maximum amount of money the thief can rob = 4 + 5 = 9 (the two children of root)." + +explanation: + intuition: | + Picture a family tree where each person holds some cash. You want to collect as much money as possible, but there's a catch: if you take money from someone, you can't take from their direct parent or children — only from grandparents, grandchildren, or unrelated branches. + + The key insight is that at every node, you face a **binary choice**: + + - **Rob this node**: You get its value, but you *cannot* rob its children. However, you *can* rob its grandchildren (the children's children). + - **Skip this node**: You don't get its value, but you're free to rob its children (and potentially their children too). + + Think of it like this: each node needs to report back two pieces of information to its parent — *"Here's how much you can get if you rob me, and here's how much you can get if you skip me."* The parent then uses both pieces to make its own optimal decision. + + This naturally suggests a **post-order DFS** approach: process children first, collect their "rob/skip" information, then compute the current node's optimal values. + + approach: | + We solve this using **Tree DP with DFS**, where each node returns a pair of values: `(rob_this_node, skip_this_node)`. 
+ + **Step 1: Define what each node returns** + + - `rob`: Maximum money if we rob this node (includes node's value, but excludes children) + - `skip`: Maximum money if we skip this node (children are free to be robbed or skipped) + +   + + **Step 2: Handle the base case** + + - For a `null` node (empty subtree), return `(0, 0)` — no money either way + - This provides the termination condition for our recursion + +   + + **Step 3: Recurse on children (post-order DFS)** + + - Call the function on `left` child, getting `(left_rob, left_skip)` + - Call the function on `right` child, getting `(right_rob, right_skip)` + - We now know the optimal values for both subtrees + +   + + **Step 4: Calculate current node's values** + + - `rob_current = node.val + left_skip + right_skip` + - If we rob this node, children must be skipped + - `skip_current = max(left_rob, left_skip) + max(right_rob, right_skip)` + - If we skip this node, each child independently chooses its best option + +   + + **Step 5: Return the answer** + + - At the root, return `max(rob_root, skip_root)` + - This gives the global maximum across all valid robbery plans + + common_pitfalls: + - title: Naive Recursion Without Memoisation + description: | + A tempting approach is to write a simple recursive function: + + ```python + def rob(node): + if not node: + return 0 + # Rob this node + grandchildren + rob_this = node.val + if node.left: + rob_this += rob(node.left.left) + rob(node.left.right) + if node.right: + rob_this += rob(node.right.left) + rob(node.right.right) + # Skip this node, rob children + skip_this = rob(node.left) + rob(node.right) + return max(rob_this, skip_this) + ``` + + This recalculates the same subtrees multiple times. For example, `rob(node.left.left)` is called directly when robbing the current node, and again inside `rob(node.left)` when skipping it. On a deep, skewed tree the repeated calls grow exponentially with the height (bounded by **O(2^n)**), which will cause TLE.
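To make the repeated work concrete, the sketch below counts how often the naive recursion is entered on a left-skewed chain. The `TreeNode` class mirrors the usual LeetCode definition; the `chain` helper and the call counter are hypothetical additions for illustration only.

```python
from __future__ import annotations


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def chain(n: int) -> TreeNode | None:
    """Build a left-skewed tree of n nodes, each holding 1 (hypothetical helper)."""
    root = None
    for _ in range(n):
        root = TreeNode(1, left=root)
    return root


def count_naive_calls(root: TreeNode | None) -> int:
    """Count invocations of the naive recursion on non-null nodes."""
    calls = 0

    def rob(node):
        nonlocal calls
        if not node:
            return 0
        calls += 1
        # Rob this node + grandchildren
        rob_this = node.val
        if node.left:
            rob_this += rob(node.left.left) + rob(node.left.right)
        if node.right:
            rob_this += rob(node.right.left) + rob(node.right.right)
        # Skip this node, rob children
        skip_this = rob(node.left) + rob(node.right)
        return max(rob_this, skip_this)

    rob(root)
    return calls
```

On a chain, each call spawns one call on the child and one on the grandchild, so the count follows a Fibonacci-style recurrence: 10 nodes already trigger 143 calls, whereas the pair-returning DFS would visit each node once.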
+ wrong_approach: "Simple recursion visiting same nodes repeatedly" + correct_approach: "Return (rob, skip) pair so each node is visited exactly once" + + - title: Forgetting the Skip Option Gives Freedom + description: | + When you skip a node, you're not forced to rob its children — you simply have the *option* to rob them. + + The correct formula for `skip_current` is: + ``` + skip = max(left_rob, left_skip) + max(right_rob, right_skip) + ``` + + A common mistake is writing `skip = left_rob + right_rob`, which forces robbing both children. But sometimes skipping a child yields more money (e.g., if the grandchildren have higher values). + wrong_approach: "skip = left_rob + right_rob" + correct_approach: "skip = max(left_rob, left_skip) + max(right_rob, right_skip)" + + - title: Confusing Tree DP with Array DP + description: | + Unlike House Robber I (array) where you track `dp[i-1]` and `dp[i-2]`, tree DP tracks relationships via parent-child edges, not indices. + + You can't simply apply the array recurrence `dp[i] = max(nums[i] + dp[i-2], dp[i-1])` because: + - Trees have multiple children (not just one "previous" element) + - The "skip two" concept becomes "skip direct link" (rob grandchildren, not `i-2`) + - Each node can have 0, 1, or 2 children + + The pair-returning approach `(rob, skip)` is the tree analogue of the space-optimised array DP. + + key_takeaways: + - "**Tree DP pattern**: Return multiple values (rob/skip) from recursion to avoid redundant computation" + - "**Post-order traversal**: Process children first, then compute parent's answer from children's results" + - "**Binary choice at each node**: Rob (take value, skip children) vs Skip (children choose freely)" + - "**Generalises House Robber**: Same core constraint (no adjacent), different data structure (tree vs array)" + + time_complexity: "O(n). Each node is visited exactly once during the DFS traversal, and we do O(1) work per node." + space_complexity: "O(h) where h is the tree height. 
The recursion stack can grow as deep as the tree. In the worst case (skewed tree), this is O(n); for a balanced tree, it's O(log n)." + +solutions: + - approach_name: Tree DP with DFS + is_optimal: true + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def rob(root: TreeNode | None) -> int: + def dfs(node: TreeNode | None) -> tuple[int, int]: + # Base case: null node contributes nothing + if not node: + return (0, 0) + + # Post-order: process children first + left_rob, left_skip = dfs(node.left) + right_rob, right_skip = dfs(node.right) + + # If we rob this node, we must skip both children + rob_current = node.val + left_skip + right_skip + + # If we skip this node, each child chooses its best option + skip_current = max(left_rob, left_skip) + max(right_rob, right_skip) + + return (rob_current, skip_current) + + rob_root, skip_root = dfs(root) + return max(rob_root, skip_root) + explanation: | + **Time Complexity:** O(n) — Each node visited exactly once. + + **Space Complexity:** O(h) — Recursion stack depth equals tree height. + + Each node returns a pair: (max if robbed, max if skipped). The parent combines these to compute its own pair. At the root, we take the maximum of both options. This eliminates redundant computation by ensuring each subtree is evaluated exactly once. 
+ + - approach_name: Naive Recursion (TLE) + is_optimal: false + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def rob(root: TreeNode | None) -> int: + if not root: + return 0 + + # Option 1: Rob this node + grandchildren + rob_this = root.val + if root.left: + rob_this += rob(root.left.left) + rob(root.left.right) + if root.right: + rob_this += rob(root.right.left) + rob(root.right.right) + + # Option 2: Skip this node, consider children + skip_this = rob(root.left) + rob(root.right) + + return max(rob_this, skip_this) + explanation: | + **Time Complexity:** O(2^n) — Exponential due to overlapping subproblems. + + **Space Complexity:** O(h) — Recursion stack depth. + + This approach correctly identifies the two choices (rob or skip) but recalculates the same subtrees multiple times. For example, `rob(root.left.left)` is computed directly as a grandchild call and again inside `rob(root.left)`. This causes TLE on large trees. Included to illustrate why the naive recursion needs either memoisation or the pair-returning approach. + + - approach_name: Recursion with Memoisation + is_optimal: false + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def rob(root: TreeNode | None) -> int: + memo = {} + + def helper(node: TreeNode | None) -> int: + if not node: + return 0 + if node in memo: + return memo[node] + + # Option 1: Rob this node + grandchildren + rob_this = node.val + if node.left: + rob_this += helper(node.left.left) + helper(node.left.right) + if node.right: + rob_this += helper(node.right.left) + helper(node.right.right) + + # Option 2: Skip this node, consider children + skip_this = helper(node.left) + helper(node.right) + + memo[node] = max(rob_this, skip_this) + return memo[node] + + return helper(root) + explanation: | + **Time Complexity:** O(n) — Each node computed once due to memoisation.
+ + **Space Complexity:** O(n) — Hash map stores result for each node, plus O(h) recursion stack. + + Adding memoisation to the naive approach fixes the exponential blowup. However, this uses O(n) extra space for the hash map, whereas the pair-returning approach achieves the same time complexity with only O(h) space. This solution is correct and efficient, but the optimal approach is more elegant. diff --git a/backend/data/questions/house-robber.yaml b/backend/data/questions/house-robber.yaml new file mode 100644 index 0000000..c17e5fd --- /dev/null +++ b/backend/data/questions/house-robber.yaml @@ -0,0 +1,182 @@ +title: House Robber +slug: house-robber +difficulty: medium +leetcode_id: 198 +leetcode_url: https://leetcode.com/problems/house-robber/ +categories: + - arrays + - dynamic-programming +patterns: + - dynamic-programming + +description: | + You are a professional robber planning to rob houses along a street. Each house has a certain amount of money stashed, the only constraint stopping you from robbing each of them is that adjacent houses have security systems connected and **it will automatically contact the police if two adjacent houses were broken into on the same night**. + + Given an integer array `nums` representing the amount of money of each house, return *the maximum amount of money you can rob tonight without alerting the police*. + +constraints: | + - `1 <= nums.length <= 100` + - `0 <= nums[i] <= 400` + +examples: + - input: "nums = [1,2,3,1]" + output: "4" + explanation: "Rob house 1 (money = 1) and then rob house 3 (money = 3). Total amount you can rob = 1 + 3 = 4." + - input: "nums = [2,7,9,3,1]" + output: "12" + explanation: "Rob house 1 (money = 2), rob house 3 (money = 9) and rob house 5 (money = 1). Total amount you can rob = 2 + 9 + 1 = 12." + +explanation: + intuition: | + Imagine walking down a street, deciding which houses to rob. At each house, you face a simple choice: **rob it or skip it**. 
+ + If you rob the current house, you can't rob the previous one (they're adjacent). But if you skip the current house, you keep whatever maximum you could achieve up to the previous house. + + Think of it like this: for every house, you're asking *"What's better — taking this house plus the best I could do two houses ago, or skipping this house and keeping the best I could do at the previous house?"* + + This is the core insight: the **optimal decision at each house only depends on the optimal decisions for the previous two houses**. This "overlapping subproblems" property makes it a textbook dynamic programming problem. + + The key realisation is that you don't need to track *which specific houses* you robbed — you only need to track the **maximum money possible** up to each point. + + approach: | + We solve this using **Dynamic Programming with Space Optimisation**: + + **Step 1: Define the recurrence relation** + + - Let `dp[i]` represent the maximum money we can rob from houses `0` to `i` + - At each house `i`, we have two choices: + - **Rob house `i`**: Take `nums[i]` plus the best from two houses back: `nums[i] + dp[i-2]` + - **Skip house `i`**: Keep the best from the previous house: `dp[i-1]` + - The recurrence: `dp[i] = max(nums[i] + dp[i-2], dp[i-1])` + +   + + **Step 2: Recognise we only need two variables** + + - The recurrence only looks back two steps (`dp[i-1]` and `dp[i-2]`) + - Instead of storing an entire array, use two variables: + - `prev1`: Maximum money up to the previous house (i.e., `dp[i-1]`) + - `prev2`: Maximum money up to two houses back (i.e., `dp[i-2]`) + +   + + **Step 3: Iterate through each house** + + - For each house, calculate: `current = max(nums[i] + prev2, prev1)` + - Update variables: `prev2 = prev1`, then `prev1 = current` + - This "slides" our window of knowledge forward by one house + +   + + **Step 4: Return the result** + + - After processing all houses, `prev1` contains the maximum money achievable + - Return `prev1` + 
+ + common_pitfalls: + - title: The Greedy Trap (Alternating Houses) + description: | + A common first instinct is to simply take every other house — either all even-indexed or all odd-indexed houses. + + This fails for cases like `nums = [2, 1, 1, 2]`: + - Even indices (0, 2): `2 + 1 = 3` + - Odd indices (1, 3): `1 + 2 = 3` + - But optimal is indices (0, 3): `2 + 2 = 4` + + The pattern of which houses to rob isn't regular — it depends on the actual values. You might skip two houses in a row if the third house has a high value. + wrong_approach: "Take every other house" + correct_approach: "DP considering all valid combinations" + + - title: Off-by-One Errors in Base Cases + description: | + The array-based DP requires handling base cases carefully: + - If there's only one house, return `nums[0]` + - If there are two houses, return `max(nums[0], nums[1])` + + Forgetting these edge cases leads to index out-of-bounds errors or incorrect results for small inputs. + wrong_approach: "Start iteration at index 0 without base cases" + correct_approach: "Handle n=1 and n=2 explicitly, then iterate from index 2" + + - title: Confusing the Variable Updates + description: | + When using the space-optimised approach, the order of updates matters: + + ```python + # WRONG: prev2 is overwritten before max() reads its old value + prev2 = prev1 + prev1 = max(nums[i] + prev2, prev1) + + # CORRECT: Calculate first, then update in order + current = max(nums[i] + prev2, prev1) + prev2 = prev1 + prev1 = current + ``` + + Always calculate the new value *before* updating the variables it depends on.
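One way to sidestep the ordering issue in Python is simultaneous (tuple) assignment, which evaluates the whole right-hand side before rebinding either name. The following is a minimal sketch; `rob_tuple` is an illustrative name, not one of the solutions in this file.

```python
def rob_tuple(nums: list[int]) -> int:
    """Space-optimised House Robber using tuple assignment (illustrative variant)."""
    prev2, prev1 = 0, 0
    for money in nums:
        # The right-hand side is evaluated with the OLD prev1 and prev2,
        # then both names are rebound together.
        prev2, prev1 = prev1, max(money + prev2, prev1)
    return prev1
```

Because both variables start at 0, this variant also needs no special case for a single house.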
+ + key_takeaways: + - "**Classic DP pattern**: When optimal solutions depend on previous optimal solutions, think dynamic programming" + - "**Space optimisation**: If recurrence only looks back a fixed number of steps, replace the array with variables" + - "**Greedy doesn't always work**: Problems with non-local dependencies (like adjacency constraints) often need DP" + - "**Foundation for variants**: This logic extends to House Robber II (circular street) and House Robber III (binary tree)" + + time_complexity: "O(n). We iterate through the array exactly once, making a constant-time decision at each house." + space_complexity: "O(1). We only use two variables (`prev1` and `prev2`) regardless of input size, thanks to space optimisation." + +solutions: + - approach_name: Dynamic Programming (Space Optimised) + is_optimal: true + code: | + def rob(nums: list[int]) -> int: + # Edge case: only one house + if len(nums) == 1: + return nums[0] + + # prev2 = max money from two houses back + # prev1 = max money from previous house + prev2 = 0 + prev1 = nums[0] + + for i in range(1, len(nums)): + # Choice: rob this house + prev2, or skip and keep prev1 + current = max(nums[i] + prev2, prev1) + # Slide the window forward + prev2 = prev1 + prev1 = current + + return prev1 + explanation: | + **Time Complexity:** O(n) — Single pass through the array. + + **Space Complexity:** O(1) — Only two variables used. + + We iterate once, at each step choosing the better option: rob current house (add to best from 2 houses back) or skip (keep best from previous house). The space optimisation works because we only ever look back two positions. 
+ + - approach_name: Dynamic Programming (Array) + is_optimal: false + code: | + def rob(nums: list[int]) -> int: + n = len(nums) + + # Edge cases + if n == 1: + return nums[0] + if n == 2: + return max(nums[0], nums[1]) + + # dp[i] = max money from houses 0..i + dp = [0] * n + dp[0] = nums[0] + dp[1] = max(nums[0], nums[1]) + + for i in range(2, n): + # Rob house i (add to best from i-2) or skip (keep best from i-1) + dp[i] = max(nums[i] + dp[i - 2], dp[i - 1]) + + return dp[n - 1] + explanation: | + **Time Complexity:** O(n) — Single pass through the array. + + **Space Complexity:** O(n) — We store the entire DP array. + + This version explicitly builds the DP table, making the recurrence relation clearer. Each `dp[i]` represents the maximum money achievable from houses 0 to i. While correct, it uses more space than necessary since we only need the last two values. diff --git a/backend/data/questions/implement-queue-using-stacks.yaml b/backend/data/questions/implement-queue-using-stacks.yaml new file mode 100644 index 0000000..8574421 --- /dev/null +++ b/backend/data/questions/implement-queue-using-stacks.yaml @@ -0,0 +1,204 @@ +title: Implement Queue using Stacks +slug: implement-queue-using-stacks +difficulty: easy +leetcode_id: 232 +leetcode_url: https://leetcode.com/problems/implement-queue-using-stacks/ +categories: + - stack + - queue +patterns: + - monotonic-stack + +description: | + Implement a first in first out (FIFO) queue using only two stacks. The implemented queue should support all the functions of a normal queue (`push`, `peek`, `pop`, and `empty`). + + Implement the `MyQueue` class: + + - `void push(int x)` Pushes element `x` to the back of the queue. + - `int pop()` Removes the element from the front of the queue and returns it. + - `int peek()` Returns the element at the front of the queue. + - `boolean empty()` Returns `true` if the queue is empty, `false` otherwise. 
+ + **Notes:** + + - You must use **only** standard operations of a stack, which means only `push to top`, `peek/pop from top`, `size`, and `is empty` operations are valid. + - Depending on your language, the stack may not be supported natively. You may simulate a stack using a list or deque (double-ended queue) as long as you use only a stack's standard operations. + +constraints: | + - `1 <= x <= 9` + - At most `100` calls will be made to `push`, `pop`, `peek`, and `empty` + - All the calls to `pop` and `peek` are valid + +examples: + - input: | + ["MyQueue", "push", "push", "peek", "pop", "empty"] + [[], [1], [2], [], [], []] + output: "[null, null, null, 1, 1, false]" + explanation: | + MyQueue myQueue = new MyQueue(); + myQueue.push(1); // queue is: [1] + myQueue.push(2); // queue is: [1, 2] (leftmost is front of the queue) + myQueue.peek(); // return 1 + myQueue.pop(); // return 1, queue is [2] + myQueue.empty(); // return false + +explanation: + intuition: | + Think of this problem like having two buckets to simulate a conveyor belt. + + A **queue** follows First-In-First-Out (FIFO) order — the first item added is the first to leave, like a line at a coffee shop. A **stack** follows Last-In-First-Out (LIFO) order — the last item added is the first to leave, like a stack of plates. + + The key insight is that **reversing a stack gives you the opposite order**. If you push elements `1, 2, 3` onto a stack, they come out as `3, 2, 1`. But if you pop them all onto a *second* stack, they reverse again to `1, 2, 3` — exactly FIFO order! + + So we use two stacks: + - An **input stack** where new elements are pushed + - An **output stack** from which elements are popped + + When we need to pop or peek and the output stack is empty, we transfer all elements from the input stack to the output stack. This reversal converts the LIFO order to FIFO order. 
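The double reversal can be checked with two plain Python lists used strictly as stacks (`append` and `pop` from the end). This is a small sketch; `drain_in_fifo_order` is a hypothetical helper name.

```python
def drain_in_fifo_order(items: list[int]) -> list[int]:
    """Push items onto one stack, move them to a second, then pop them all."""
    input_stack: list[int] = []
    output_stack: list[int] = []

    for x in items:
        input_stack.append(x)  # push in arrival order

    while input_stack:
        # First reversal: the newest element ends up at the bottom of output_stack
        output_stack.append(input_stack.pop())

    result = []
    while output_stack:
        # Popping reverses again, so elements leave in arrival order
        result.append(output_stack.pop())
    return result
```

For example, `drain_in_fifo_order([1, 2, 3])` returns `[1, 2, 3]`, the same order the elements were pushed.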
+ + approach: | + We solve this using a **Two-Stack Approach** with lazy transfer: + + **Step 1: Initialise two stacks** + + - `input_stack`: Where we push new elements + - `output_stack`: Where we pop/peek elements from + +   + + **Step 2: Push operation** + + - Simply push the element onto `input_stack` + - This is always O(1) + +   + + **Step 3: Pop/Peek operation** + + - If `output_stack` is empty, transfer all elements from `input_stack` to `output_stack` + - Each transfer reverses the order, converting LIFO to FIFO + - Pop or peek from `output_stack` + +   + + **Step 4: Empty check** + + - Queue is empty only when both stacks are empty + +   + + The lazy transfer approach is key: we only move elements when necessary, which gives us amortised O(1) time per operation. + + common_pitfalls: + - title: Transferring on Every Operation + description: | + A naive approach might transfer elements between stacks on every push or pop. This leads to O(n) for every operation. + + The correct approach uses **lazy transfer**: only move elements from input to output when output is empty and we need to pop/peek. Each element is moved at most twice (once to input, once to output), giving amortised O(1). + wrong_approach: "Transfer between stacks on every operation" + correct_approach: "Lazy transfer only when output stack is empty" + + - title: Forgetting to Check Output Stack First + description: | + When implementing pop/peek, you must first check if the output stack has elements before transferring from input. If you always transfer, you'll break the FIFO order. + + For example, if output has `[1]` and input has `[2, 3]`, transferring would make output `[1, 3, 2]` which is wrong. + wrong_approach: "Always transfer from input to output" + correct_approach: "Only transfer when output is empty" + + - title: Not Handling Peek Efficiently + description: | + Some implementations might pop, save the value, then push back for peek. This is unnecessary. 
+ + Since we have access to the top of the output stack, we can simply return the top element without modification. + + key_takeaways: + - "**Data structure simulation**: You can simulate one data structure with another by understanding their fundamental properties" + - "**Amortised analysis**: Each element is pushed and popped at most twice total, so n operations take O(n) time overall" + - "**Lazy evaluation**: Deferring work (transfers) until necessary often improves average performance" + - "**Related problem**: The inverse problem — implementing a stack using queues (LeetCode 225) — uses similar reversal logic" + + time_complexity: "O(1) amortised for all operations. Each element is moved between stacks at most once, so n operations take O(n) total." + space_complexity: "O(n). We store all n elements across the two stacks." + +solutions: + - approach_name: Two Stacks with Lazy Transfer + is_optimal: true + code: | + class MyQueue: + def __init__(self): + # Input stack: where we push new elements + self.input_stack = [] + # Output stack: where we pop/peek from + self.output_stack = [] + + def push(self, x: int) -> None: + # Always push to input stack - O(1) + self.input_stack.append(x) + + def pop(self) -> int: + # Ensure output stack has elements + self._transfer_if_needed() + # Pop from output stack - FIFO order + return self.output_stack.pop() + + def peek(self) -> int: + # Ensure output stack has elements + self._transfer_if_needed() + # Return top of output stack without removing + return self.output_stack[-1] + + def empty(self) -> bool: + # Queue is empty only when both stacks are empty + return not self.input_stack and not self.output_stack + + def _transfer_if_needed(self) -> None: + # Only transfer when output is empty - lazy evaluation + if not self.output_stack: + # Move all elements from input to output + # This reverses order: LIFO -> FIFO + while self.input_stack: + self.output_stack.append(self.input_stack.pop()) + explanation: | + **Time 
Complexity:** O(1) amortised for all operations. + - `push`: O(1) — direct append + - `pop`/`peek`: O(1) amortised — each element is transferred at most once + - `empty`: O(1) — two boolean checks + + **Space Complexity:** O(n) — storing n elements across both stacks. + + The key insight is lazy transfer: we only move elements when the output stack is empty. Since each element moves from input to output exactly once, the amortised cost per operation is O(1). + + - approach_name: Two Stacks with Eager Transfer + is_optimal: false + code: | + class MyQueue: + def __init__(self): + self.stack1 = [] # Main storage + self.stack2 = [] # Temporary for reversal + + def push(self, x: int) -> None: + # Move everything to stack2 + while self.stack1: + self.stack2.append(self.stack1.pop()) + # Push new element to bottom of stack1 + self.stack1.append(x) + # Move everything back + while self.stack2: + self.stack1.append(self.stack2.pop()) + + def pop(self) -> int: + # Top of stack1 is front of queue + return self.stack1.pop() + + def peek(self) -> int: + return self.stack1[-1] + + def empty(self) -> bool: + return not self.stack1 + explanation: | + **Time Complexity:** O(n) for push, O(1) for pop/peek/empty. + + **Space Complexity:** O(n) — storing n elements across both stacks. + + This approach maintains FIFO order at all times by doing expensive work during push. Every push transfers all elements twice. While pop/peek become O(1), push is O(n), making this less efficient than lazy transfer when pushes are frequent. 
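The amortised bound for lazy transfer can be observed by counting transfers. The sketch below uses a hypothetical `CountingQueue` class with a `moves` counter added purely for illustration; it is not part of the problem's interface.

```python
class CountingQueue:
    """Two-stack queue that counts input -> output transfers (illustrative)."""

    def __init__(self) -> None:
        self.input_stack: list[int] = []
        self.output_stack: list[int] = []
        self.moves = 0  # total elements moved from input to output

    def push(self, x: int) -> None:
        self.input_stack.append(x)

    def pop(self) -> int:
        # Transfer only when the output stack has run dry
        if not self.output_stack:
            while self.input_stack:
                self.output_stack.append(self.input_stack.pop())
                self.moves += 1
        return self.output_stack.pop()
```

After 100 pushes followed by 100 pops, `moves` is exactly 100: each element crosses from input to output once, which is why the total work stays linear in the number of operations.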
diff --git a/backend/data/questions/implement-stack-using-queues.yaml b/backend/data/questions/implement-stack-using-queues.yaml new file mode 100644 index 0000000..15a9b81 --- /dev/null +++ b/backend/data/questions/implement-stack-using-queues.yaml @@ -0,0 +1,216 @@ +title: Implement Stack using Queues +slug: implement-stack-using-queues +difficulty: easy +leetcode_id: 225 +leetcode_url: https://leetcode.com/problems/implement-stack-using-queues/ +categories: + - stack + - queue +patterns: + - monotonic-stack + +description: | + Implement a last-in-first-out (LIFO) stack using only two queues. The implemented stack should support all the functions of a normal stack (`push`, `top`, `pop`, and `empty`). + + Implement the `MyStack` class: + + - `void push(int x)` Pushes element `x` to the top of the stack. + - `int pop()` Removes the element on the top of the stack and returns it. + - `int top()` Returns the element on the top of the stack. + - `boolean empty()` Returns `true` if the stack is empty, `false` otherwise. + + **Notes:** + + - You must use **only** standard operations of a queue, which means only `push to back`, `peek/pop from front`, `size`, and `is empty` operations are valid. + - Depending on your language, the queue may not be supported natively. You may simulate a queue using a list or deque (double-ended queue) as long as you use only a queue's standard operations. + + **Follow-up:** Can you implement the stack using only one queue? 
+ +constraints: | + - `1 <= x <= 9` + - At most `100` calls will be made to `push`, `pop`, `top`, and `empty` + - All the calls to `pop` and `top` are valid + +examples: + - input: | + ["MyStack", "push", "push", "top", "pop", "empty"] + [[], [1], [2], [], [], []] + output: "[null, null, null, 2, 2, false]" + explanation: | + MyStack myStack = new MyStack(); + myStack.push(1); // stack is: [1] + myStack.push(2); // stack is: [1, 2] (rightmost is top of stack) + myStack.top(); // return 2 + myStack.pop(); // return 2, stack is [1] + myStack.empty(); // return false + +explanation: + intuition: | + Think of this problem like having a queue of people, but you want the *last* person who joined to be served first — the opposite of how a normal queue works. + + A **stack** follows Last-In-First-Out (LIFO) order — the most recently added item is the first to leave, like a stack of plates. A **queue** follows First-In-First-Out (FIFO) order — the first item added is the first to leave, like a line at a coffee shop. + + The key insight is that **rotating a queue puts the back element at the front**. If we push a new element to a queue and then rotate all the *other* elements behind it (by dequeuing from front and enqueuing to back), the new element ends up at the front — exactly where we need it for LIFO access! + + For example, if the queue contains `[1, 2]` (front to back) and we push `3`: + 1. Append `3`: the queue becomes `[1, 2, 3]` + 2. First rotation: dequeue `1`, enqueue `1` → `[2, 3, 1]` + 3. Second rotation: dequeue `2`, enqueue `2` → `[3, 1, 2]` + + Now `3` (the most recent) is at the front, ready to be popped first! 
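The rotation walked through above can be traced with a short sketch. The helper name `push_with_rotation` is hypothetical and exists only for illustration; it restricts itself to `append` and `popleft`, which is an assumption about how the queue-only constraint maps onto `collections.deque`.

```python
from collections import deque


def push_with_rotation(queue: deque, x: int) -> list[list[int]]:
    """Append x, then rotate the older elements behind it.

    Returns a snapshot of the queue (front to back) after each step.
    Hypothetical helper, for illustration only.
    """
    queue.append(x)
    snapshots = [list(queue)]
    # Rotate once for every element that was in the queue before x
    for _ in range(len(queue) - 1):
        queue.append(queue.popleft())
        snapshots.append(list(queue))
    return snapshots


q = deque([1, 2])
steps = push_with_rotation(q, 3)
# steps == [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
```

The last snapshot has `3` at the front, so a subsequent `popleft` would return the most recently pushed value.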
+ + approach: | + We solve this using a **Single Queue with Rotation** approach: + + **Step 1: Initialise one queue** + + - `queue`: A deque used with only queue operations (append to back, pop from front) + +   + + **Step 2: Push operation** + + - Append the new element to the back of the queue + - Rotate the queue by moving all *previous* elements behind the new one + - Specifically: pop from front and append to back, `n-1` times (where `n` is the size of the queue after appending) + - After rotation, the new element is at the front + +   + + **Step 3: Pop operation** + + - Simply pop from the front of the queue + - Since we maintain LIFO order, the front element is always the most recently pushed + +   + + **Step 4: Top operation** + + - Return the front element without removing it + - Use index access `queue[0]` or peek operation + +   + + **Step 5: Empty check** + + - Return whether the queue is empty + +   + + This approach makes push O(n) but keeps pop and top at O(1), a reasonable trade-off when `top` and `pop` together are called at least as often as `push`. + + common_pitfalls: + - title: Rotating the Wrong Number of Times + description: | + When pushing a new element, you need to rotate exactly `n-1` elements (the elements that were already in the queue before the push), not `n` elements. + + If you rotate `n` times, you'll move the new element to the back again, undoing the work: + - Push `3` to `[1, 2]` → `[1, 2, 3]` + - Rotate 3 times: `[2, 3, 1]` → `[3, 1, 2]` → `[1, 2, 3]` (back to original!) + + The correct rotation count is `len(queue) - 1` after appending. + wrong_approach: "Rotate n times after pushing" + correct_approach: "Rotate n-1 times (previous size) after pushing" + + - title: Using Two Queues Unnecessarily + description: | + The problem mentions "two queues", but the follow-up asks if you can do it with one. Many solutions use two queues and swap between them, which adds complexity without benefit. + + The single-queue rotation approach is simpler and equally efficient. 
The two-queue approach might seem more intuitive at first, but it requires more bookkeeping. + + - title: Confusing Queue Operations with Deque Operations + description: | + In Python, `collections.deque` supports both `appendleft` and `append`, but a true queue only allows: + - `append` (enqueue to back) + - `popleft` (dequeue from front) + - `len` and checking if empty + + Using `appendleft` or `pop` (from back) violates the "queue only" constraint. Make sure your solution only uses valid queue operations. + + key_takeaways: + - "**Data structure simulation**: You can simulate one data structure with another by understanding their ordering properties" + - "**Rotation technique**: Moving elements from front to back of a queue is a powerful way to reorder elements" + - "**Trade-off decisions**: Making push expensive (O(n)) keeps pop/top cheap (O(1)) — choose based on expected usage patterns" + - "**Related problem**: The inverse problem — implementing a queue using stacks (LeetCode 232) — uses a similar reversal concept" + + time_complexity: "O(n) for push, O(1) for pop/top/empty. Each push rotates n-1 elements, while other operations access the front directly." + space_complexity: "O(n). We store all n elements in a single queue." 
+ +solutions: + - approach_name: Single Queue with Rotation + is_optimal: true + code: | + from collections import deque + + class MyStack: + def __init__(self): + # Single queue to simulate stack + self.queue = deque() + + def push(self, x: int) -> None: + # Add new element to back + self.queue.append(x) + # Rotate all previous elements behind it + # This puts the new element at the front + for _ in range(len(self.queue) - 1): + self.queue.append(self.queue.popleft()) + + def pop(self) -> int: + # Front of queue is top of stack (most recent) + return self.queue.popleft() + + def top(self) -> int: + # Return front without removing + return self.queue[0] + + def empty(self) -> bool: + return len(self.queue) == 0 + explanation: | + **Time Complexity:** + - `push`: O(n) — rotate n-1 elements + - `pop`: O(1) — remove from front + - `top`: O(1) — access front element + - `empty`: O(1) — check length + + **Space Complexity:** O(n) — storing n elements in the queue. + + The key insight is rotating during push: after adding a new element to the back, we cycle all previous elements behind it. This ensures the most recently pushed element is always at the front, ready for O(1) pop/top access. 
+ + - approach_name: Two Queues with Transfer + is_optimal: false + code: | + from collections import deque + + class MyStack: + def __init__(self): + self.q1 = deque() # Main queue + self.q2 = deque() # Temporary queue + + def push(self, x: int) -> None: + # Push to temporary queue first + self.q2.append(x) + # Move all elements from q1 to q2 + while self.q1: + self.q2.append(self.q1.popleft()) + # Swap queues - q2 becomes the new main queue + self.q1, self.q2 = self.q2, self.q1 + + def pop(self) -> int: + # Front of q1 is top of stack + return self.q1.popleft() + + def top(self) -> int: + return self.q1[0] + + def empty(self) -> bool: + return len(self.q1) == 0 + explanation: | + **Time Complexity:** + - `push`: O(n) — transfer all elements + - `pop`: O(1) — remove from front + - `top`: O(1) — access front element + - `empty`: O(1) — check length + + **Space Complexity:** O(n) — storing n elements across two queues. + + This approach uses two queues. On each push, we add the new element to an empty queue, then transfer all elements from the main queue. This puts the new element at the front. While correct, the single-queue rotation approach is simpler. diff --git a/backend/data/questions/implement-trie-prefix-tree.yaml b/backend/data/questions/implement-trie-prefix-tree.yaml new file mode 100644 index 0000000..66db4ba --- /dev/null +++ b/backend/data/questions/implement-trie-prefix-tree.yaml @@ -0,0 +1,245 @@ +title: Implement Trie (Prefix Tree) +slug: implement-trie-prefix-tree +difficulty: medium +leetcode_id: 208 +leetcode_url: https://leetcode.com/problems/implement-trie-prefix-tree/ +categories: + - strings + - hash-tables +patterns: + - trie + +description: | + A [**trie**](https://en.wikipedia.org/wiki/Trie) (pronounced as "try") or **prefix tree** is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker. 
+ + Implement the `Trie` class: + + - `Trie()` Initialises the trie object. + - `void insert(String word)` Inserts the string `word` into the trie. + - `boolean search(String word)` Returns `true` if the string `word` is in the trie (i.e., was inserted before), and `false` otherwise. + - `boolean startsWith(String prefix)` Returns `true` if there is a previously inserted string `word` that has the prefix `prefix`, and `false` otherwise. + +constraints: | + - `1 <= word.length, prefix.length <= 2000` + - `word` and `prefix` consist only of lowercase English letters. + - At most `3 * 10^4` calls **in total** will be made to `insert`, `search`, and `startsWith`. + +examples: + - input: | + ["Trie", "insert", "search", "search", "startsWith", "insert", "search"] + [[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]] + output: "[null, null, true, false, true, null, true]" + explanation: | + Trie trie = new Trie(); + trie.insert("apple"); + trie.search("apple"); // return True + trie.search("app"); // return False + trie.startsWith("app"); // return True + trie.insert("app"); + trie.search("app"); // return True + +explanation: + intuition: | + Imagine building a word-completion system like the one in your phone's keyboard. When you type "app", the system suggests "apple", "application", "approve", and so on. How can we efficiently store thousands of words and quickly find all words that start with a given prefix? + + A **trie** (prefix tree) is the perfect data structure for this. Think of it like a tree where each node represents a single character, and paths from the root to nodes spell out words or prefixes. Unlike a hash table which stores complete words, a trie shares common prefixes among words, making it extremely efficient for prefix-based operations. 
+ + Visualise it like a family tree of letters: + ``` + root + | + a + | + p + | + p + / \ + l (end of "app") + | + e + | + (end of "apple") + ``` + + The key insight is that **each node stores its children** (the next possible characters) and a **flag indicating if a complete word ends here**. This allows us to distinguish between "app" being a complete word vs. just a prefix of "apple". + + approach: | + We implement the Trie using nodes, where each node contains: + - A dictionary/hashmap mapping characters to child nodes + - A boolean flag indicating if a word ends at this node + +   + + **Step 1: Define the TrieNode structure** + + - `children`: A dictionary to store child nodes, keyed by character + - `is_end_of_word`: A boolean flag, initially `False` + +   + + **Step 2: Implement insert(word)** + + - Start at the root node + - For each character in the word: + - If the character doesn't exist in current node's children, create a new node + - Move to the child node for this character + - After processing all characters, mark the final node as `is_end_of_word = True` + +   + + **Step 3: Implement search(word)** + + - Start at the root node + - For each character in the word: + - If the character doesn't exist in current node's children, return `False` + - Move to the child node for this character + - After processing all characters, return the value of `is_end_of_word` + - This distinguishes between finding a prefix vs. a complete word + +   + + **Step 4: Implement startsWith(prefix)** + + - Follow the same traversal as `search` + - The only difference: return `True` if we successfully traverse all characters + - We don't need to check `is_end_of_word` since we only care if the prefix exists + +   + + The beauty of this approach is that all three operations share the same traversal logic, just with different termination conditions. 
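As a compact illustration of the shared traversal, here is a minimal sketch that uses nested dictionaries instead of a node class. The `END` marker key and the function names `insert`, `walk`, `search`, and `starts_with` are hypothetical choices for this sketch; the marker is safe only because inputs are assumed to be lowercase letters, as the constraints state.

```python
from __future__ import annotations

END = "$"  # hypothetical end-of-word marker; never a lowercase letter


def insert(root: dict, word: str) -> None:
    node = root
    for ch in word:
        # Reuse the existing child or create an empty one
        node = node.setdefault(ch, {})
    node[END] = True


def walk(root: dict, s: str) -> dict | None:
    """Shared traversal: return the node reached by s, or None."""
    node = root
    for ch in s:
        if ch not in node:
            return None
        node = node[ch]
    return node


def search(root: dict, word: str) -> bool:
    node = walk(root, word)
    # A full word needs both the path and the end marker
    return node is not None and END in node


def starts_with(root: dict, prefix: str) -> bool:
    # A prefix only needs the path to exist
    return walk(root, prefix) is not None


trie: dict = {}
insert(trie, "apple")
results = [search(trie, "apple"), search(trie, "app"), starts_with(trie, "app")]
insert(trie, "app")
results.append(search(trie, "app"))
# results == [True, False, True, True]
```

The only difference between `search` and `starts_with` is the check applied to the node that `walk` returns.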
+ + common_pitfalls: + - title: Confusing search() with startsWith() + description: | + A common mistake is implementing `search()` the same way as `startsWith()` — both traverse the trie, but they have different success conditions. + + For example, if we insert "apple" and then call `search("app")`, we should return `False` because "app" was never inserted as a complete word. However, `startsWith("app")` should return `True` because "apple" starts with "app". + + The fix: `search()` must check `is_end_of_word` after traversal, while `startsWith()` only needs to confirm the path exists. + wrong_approach: "Return True after traversing the prefix successfully in search()" + correct_approach: "Check is_end_of_word flag after traversal for search(), not for startsWith()" + + - title: Using Arrays Instead of Hash Maps + description: | + Some implementations use a fixed-size array of 26 elements (for lowercase letters a-z) instead of a hash map. While this works, it has drawbacks: + + - Wastes memory for sparse nodes (most nodes won't have all 26 children) + - Less flexible if requirements change (e.g., supporting uppercase or other characters) + + Using a hash map is more space-efficient for typical use cases and more adaptable. + wrong_approach: "Fixed array children[26] for every node" + correct_approach: "Dictionary/hash map for children" + + - title: Forgetting to Initialise the Root + description: | + The root node is special — it doesn't represent any character but serves as the starting point for all operations. Forgetting to initialise it in the constructor leads to null pointer errors. + + Always create an empty root node in `__init__()` with an empty children dictionary. 
+ + key_takeaways: + - "**Trie fundamentals**: Each node has children (a map of character → node) and an `is_end_of_word` flag" + - "**Prefix sharing**: Tries naturally share common prefixes, making them memory-efficient for related words" + - "**O(m) operations**: All operations (insert, search, startsWith) run in O(m) time where m is the word/prefix length — independent of how many words are stored" + - "**Foundation for advanced problems**: Tries are essential for autocomplete, spell checking, word search, and problems like Word Search II" + + time_complexity: "O(m) for all operations, where `m` is the length of the word or prefix being processed. We traverse at most `m` nodes." + space_complexity: "O(n * m) in the worst case, where `n` is the number of words and `m` is the average word length. However, shared prefixes reduce actual space usage significantly." + +solutions: + - approach_name: Hash Map Based Trie + is_optimal: true + code: | + class TrieNode: + def __init__(self): + # Maps character -> child TrieNode + self.children: dict[str, 'TrieNode'] = {} + # True if a complete word ends at this node + self.is_end_of_word: bool = False + + + class Trie: + def __init__(self): + # Root node doesn't represent any character + self.root = TrieNode() + + def insert(self, word: str) -> None: + node = self.root + for char in word: + # Create child node if it doesn't exist + if char not in node.children: + node.children[char] = TrieNode() + # Move to the child node + node = node.children[char] + # Mark the end of the word + node.is_end_of_word = True + + def search(self, word: str) -> bool: + node = self._traverse(word) + # Word exists only if we found the path AND it's marked as end + return node is not None and node.is_end_of_word + + def startsWith(self, prefix: str) -> bool: + # Prefix exists if we can traverse to it (don't need end marker) + return self._traverse(prefix) is not None + + def _traverse(self, s: str) -> TrieNode | None: + """Helper to traverse the 
trie following string s. + Returns the final node if path exists, None otherwise.""" + node = self.root + for char in s: + if char not in node.children: + return None + node = node.children[char] + return node + explanation: | + **Time Complexity:** O(m) for all operations, where m is the length of the input string. + + **Space Complexity:** O(n * m) worst case for storing n words of average length m. + + This implementation uses a hash map for children, providing O(1) average-case lookup per character. The `_traverse` helper method eliminates code duplication between `search` and `startsWith`. + + - approach_name: Array Based Trie + is_optimal: false + code: | + class TrieNode: + def __init__(self): + # Fixed array for 26 lowercase letters (a=0, b=1, ..., z=25) + self.children: list[TrieNode | None] = [None] * 26 + self.is_end_of_word: bool = False + + + class Trie: + def __init__(self): + self.root = TrieNode() + + def insert(self, word: str) -> None: + node = self.root + for char in word: + # Convert character to index (a=0, b=1, etc.) + index = ord(char) - ord('a') + if node.children[index] is None: + node.children[index] = TrieNode() + node = node.children[index] + node.is_end_of_word = True + + def search(self, word: str) -> bool: + node = self._traverse(word) + return node is not None and node.is_end_of_word + + def startsWith(self, prefix: str) -> bool: + return self._traverse(prefix) is not None + + def _traverse(self, s: str) -> TrieNode | None: + node = self.root + for char in s: + index = ord(char) - ord('a') + if node.children[index] is None: + return None + node = node.children[index] + return node + explanation: | + **Time Complexity:** O(m) for all operations — same as hash map version. + + **Space Complexity:** O(n * 26 * m) worst case, since each node allocates 26 slots. + + This approach uses a fixed-size array instead of a hash map. It has O(1) guaranteed lookup (no hash collisions), but wastes memory for sparse nodes. 
Useful when you know the character set is small and fixed. diff --git a/backend/data/questions/insert-interval.yaml b/backend/data/questions/insert-interval.yaml new file mode 100644 index 0000000..955dddf --- /dev/null +++ b/backend/data/questions/insert-interval.yaml @@ -0,0 +1,194 @@ +title: Insert Interval +slug: insert-interval +difficulty: medium +leetcode_id: 57 +leetcode_url: https://leetcode.com/problems/insert-interval/ +categories: + - arrays +patterns: + - intervals + +description: | + You are given an array of non-overlapping intervals `intervals` where `intervals[i] = [start_i, end_i]` represent the start and the end of the ith interval and `intervals` is sorted in ascending order by `start_i`. You are also given an interval `newInterval = [start, end]` that represents the start and end of another interval. + + Insert `newInterval` into `intervals` such that `intervals` is still sorted in ascending order by `start_i` and `intervals` still does not have any overlapping intervals (merge overlapping intervals if necessary). + + Return `intervals` *after the insertion*. + + **Note** that you don't need to modify `intervals` in-place. You can make a new array and return it. + +constraints: | + - `0 <= intervals.length <= 10^4` + - `intervals[i].length == 2` + - `0 <= start_i <= end_i <= 10^5` + - `intervals` is sorted by `start_i` in **ascending** order + - `newInterval.length == 2` + - `0 <= start <= end <= 10^5` + +examples: + - input: "intervals = [[1,3],[6,9]], newInterval = [2,5]" + output: "[[1,5],[6,9]]" + explanation: "The new interval [2,5] overlaps with [1,3], so they merge into [1,5]. The interval [6,9] doesn't overlap and remains unchanged." + - input: "intervals = [[1,2],[3,5],[6,7],[8,10],[12,16]], newInterval = [4,8]" + output: "[[1,2],[3,10],[12,16]]" + explanation: "The new interval [4,8] overlaps with [3,5], [6,7], and [8,10]. These all merge into [3,10]. Intervals [1,2] and [12,16] don't overlap." 
+ - input: "intervals = [], newInterval = [5,7]" + output: "[[5,7]]" + explanation: "When there are no existing intervals, simply return the new interval." + +explanation: + intuition: | + Imagine you have a timeline with several non-overlapping time blocks already scheduled, and you need to add a new meeting. Some existing blocks might overlap with your new meeting and need to be combined into one larger block. + + The key insight is that the intervals are **already sorted** by start time. This means we can process them in order, and any interval that overlaps with our new interval must be *consecutive* in the list. There can't be a non-overlapping interval sandwiched between two overlapping ones. + + Think of it as walking through a sorted list of events: first, we encounter events that end before our new event starts (no overlap, keep them). Then we hit events that overlap with ours (merge them all together). Finally, we see events that start after our merged event ends (no overlap, keep them too). + + This three-phase approach lets us solve the problem in a single pass through the intervals. 
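To make the three phases concrete, the sketch below classifies each interval of the second example as before, overlapping, or after. The function name `split_phases` is hypothetical, and the list comprehensions are chosen for readability rather than as the single-pass algorithm itself.

```python
def split_phases(intervals: list[list[int]], new_interval: list[int]):
    """Group intervals by their relation to new_interval (illustrative only)."""
    start, end = new_interval
    # Ends before the new interval starts: no overlap
    before = [iv for iv in intervals if iv[1] < start]
    # Starts after the new interval ends: no overlap
    after = [iv for iv in intervals if iv[0] > end]
    # Everything else touches or intersects the new interval
    overlapping = [iv for iv in intervals if iv[1] >= start and iv[0] <= end]
    return before, overlapping, after


intervals = [[1, 2], [3, 5], [6, 7], [8, 10], [12, 16]]
before, overlapping, after = split_phases(intervals, [4, 8])

# Merge the new interval with the overlapping group
merged = [min([4] + [iv[0] for iv in overlapping]),
          max([8] + [iv[1] for iv in overlapping])]
result = before + [merged] + after
# result == [[1, 2], [3, 10], [12, 16]]
```

Because the input is sorted and non-overlapping, the overlapping group is always a contiguous run, which is what allows a single pass to replace the three separate scans used here.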
+ + approach: | + We solve this using a **Single Pass with Three Phases**: + + **Step 1: Initialise result list** + + - `result`: Empty list to store our final intervals + +   + + **Step 2: Add all intervals that come before the new interval** + + - Iterate through intervals while `intervals[i].end < newInterval.start` + - These intervals end before our new interval starts, so no overlap + - Add each of these directly to `result` + +   + + **Step 3: Merge all overlapping intervals** + + - Continue iterating while `intervals[i].start <= newInterval.end` + - These intervals overlap with our new interval (they start before it ends) + - Expand `newInterval` to encompass each overlapping interval: + - `newInterval.start = min(newInterval.start, intervals[i].start)` + - `newInterval.end = max(newInterval.end, intervals[i].end)` + - After processing all overlaps, add the merged `newInterval` to `result` + +   + + **Step 4: Add all remaining intervals** + + - Any remaining intervals start after our merged interval ends + - Add each of these directly to `result` + +   + + **Step 5: Return the result** + + - Return the `result` list containing all non-overlapping intervals + + common_pitfalls: + - title: Forgetting to Handle Edge Cases + description: | + The new interval might need to be inserted at the very beginning (before all existing intervals), at the very end (after all existing intervals), or the input might be empty. + + For example, with `intervals = [[3,5],[6,9]]` and `newInterval = [1,2]`, the new interval comes before everything. With `newInterval = [10,12]`, it comes after everything. + + The three-phase approach handles these naturally: if no intervals are "before," phase 2 starts immediately. If no intervals overlap, we just add the new interval. If no intervals are "after," we're done after phase 3. 
+ wrong_approach: "Assuming the new interval always overlaps with something" + correct_approach: "Handle all three phases even if some are empty" + + - title: Incorrect Overlap Detection + description: | + Two intervals `[a, b]` and `[c, d]` overlap if and only if `a <= d` AND `c <= b`. A common mistake is checking only one condition. + + For example, `[1, 5]` and `[3, 7]` overlap because `1 <= 7` AND `3 <= 5`. + + In our algorithm, we use: "no overlap before" means `end < newStart`, and "overlap" means `start <= newEnd`. These conditions partition all intervals correctly. + wrong_approach: "Checking only if starts overlap or only if ends overlap" + correct_approach: "Check both conditions: interval.end >= new.start AND interval.start <= new.end" + + - title: Mutating the New Interval Incorrectly + description: | + When merging, you must expand `newInterval` using both `min` for the start and `max` for the end. A common bug is only updating one bound. + + For example, merging `[4, 8]` with `[3, 5]` should give `[3, 8]`, not `[4, 8]` or `[3, 5]`. + wrong_approach: "Only updating end or only updating start during merge" + correct_approach: "Always update: start = min(start, interval.start), end = max(end, interval.end)" + + key_takeaways: + - "**Intervals pattern**: When intervals are sorted, overlapping intervals are always consecutive, enabling single-pass solutions" + - "**Three-phase structure**: Before, during, and after overlap is a common pattern for interval insertion and merging problems" + - "**Overlap condition**: Two intervals `[a,b]` and `[c,d]` overlap if and only if `max(a,c) <= min(b,d)`" + - "**Foundation for harder problems**: This technique extends to Merge Intervals, Meeting Rooms, and interval scheduling problems" + + time_complexity: "O(n). We traverse the list of intervals exactly once, performing constant-time operations for each interval." + space_complexity: "O(n). We create a new result list that stores all intervals. 
In the worst case (no merging), this contains n+1 intervals." + +solutions: + - approach_name: Single Pass with Three Phases + is_optimal: true + code: | + def insert(intervals: list[list[int]], newInterval: list[int]) -> list[list[int]]: + result = [] + i = 0 + n = len(intervals) + + # Phase 1: Add all intervals that end before newInterval starts + while i < n and intervals[i][1] < newInterval[0]: + result.append(intervals[i]) + i += 1 + + # Phase 2: Merge all overlapping intervals with newInterval + while i < n and intervals[i][0] <= newInterval[1]: + # Expand newInterval to include the overlapping interval + newInterval[0] = min(newInterval[0], intervals[i][0]) + newInterval[1] = max(newInterval[1], intervals[i][1]) + i += 1 + + # Add the merged interval + result.append(newInterval) + + # Phase 3: Add all intervals that start after newInterval ends + while i < n: + result.append(intervals[i]) + i += 1 + + return result + explanation: | + **Time Complexity:** O(n) — Single pass through all intervals. + + **Space Complexity:** O(n) — Result list stores up to n+1 intervals. + + We process intervals in three phases: (1) add non-overlapping intervals before, (2) merge all overlapping intervals into one, (3) add non-overlapping intervals after. The sorted property guarantees overlapping intervals are consecutive. 
+ + - approach_name: Binary Search Optimisation + is_optimal: false + code: | + import bisect + + def insert(intervals: list[list[int]], newInterval: list[int]) -> list[list[int]]: + if not intervals: + return [newInterval] + + # Find where overlaps might start and end using binary search + starts = [interval[0] for interval in intervals] + ends = [interval[1] for interval in intervals] + + # Find first interval that might overlap (ends >= newInterval start) + left = bisect.bisect_left(ends, newInterval[0]) + + # Find last interval that might overlap (starts <= newInterval end) + right = bisect.bisect_right(starts, newInterval[1]) + + # If there are overlapping intervals, merge them + if left < right: + newInterval[0] = min(newInterval[0], intervals[left][0]) + newInterval[1] = max(newInterval[1], intervals[right - 1][1]) + + # Build result: before + merged + after + return intervals[:left] + [newInterval] + intervals[right:] + explanation: | + **Time Complexity:** O(n) — While binary search is O(log n), slicing creates new lists in O(n). + + **Space Complexity:** O(n) — Creating lists for starts, ends, and the result. + + This approach uses binary search to find the range of overlapping intervals quickly. While binary search itself is O(log n), the overall complexity remains O(n) due to list slicing. This approach is more elegant but not faster in practice. It's included to show how binary search can identify overlap boundaries. 
diff --git a/backend/data/questions/insert-into-a-binary-search-tree.yaml b/backend/data/questions/insert-into-a-binary-search-tree.yaml new file mode 100644 index 0000000..caf864c --- /dev/null +++ b/backend/data/questions/insert-into-a-binary-search-tree.yaml @@ -0,0 +1,182 @@ +title: Insert into a Binary Search Tree +slug: insert-into-a-binary-search-tree +difficulty: easy +leetcode_id: 701 +leetcode_url: https://leetcode.com/problems/insert-into-a-binary-search-tree/ +categories: + - trees +patterns: + - tree-traversal + +description: | + You are given the `root` node of a binary search tree (BST) and a `val` to insert into the tree. Return *the root node of the BST after the insertion*. It is **guaranteed** that the new value does not exist in the original BST. + + **Notice** that there may exist multiple valid ways for the insertion, as long as the tree remains a BST after insertion. You can return **any of them**. + +constraints: | + - `0 <= Number of nodes <= 10^4` + - `-10^8 <= Node.val <= 10^8` + - All values `Node.val` are **unique** + - `-10^8 <= val <= 10^8` + - It's **guaranteed** that `val` does not exist in the original BST + +examples: + - input: "root = [4,2,7,1,3], val = 5" + output: "[4,2,7,1,3,5]" + explanation: "Insert 5 as the left child of 7, since 5 < 7 and 7 has no left child. Another valid answer would insert 5 elsewhere while maintaining BST properties." + - input: "root = [40,20,60,10,30,50,70], val = 25" + output: "[40,20,60,10,30,50,70,null,null,25]" + explanation: "Navigate right from 20 (since 25 > 20), then left from 30 (since 25 < 30). Insert 25 as the left child of 30." + - input: "root = [], val = 5" + output: "[5]" + explanation: "When the tree is empty, the new value becomes the root." + +explanation: + intuition: | + Think of a BST as a decision tree for binary search. At each node, you ask: "Is my value smaller or larger than this node?" The answer tells you which direction to go — left for smaller, right for larger. 
+ + The key insight is that **every value has exactly one "correct" leaf position** where it can be inserted without restructuring the tree. You simply follow the BST property until you hit an empty spot (`None`), and that's where the new node belongs. + + Imagine you're looking up a word in a dictionary. You flip to the middle, decide if your word comes before or after, and keep narrowing down. When you reach the exact spot where your word *would* be if it existed, that's where you insert it. + + This means insertion is essentially a **search that ends at a null pointer**, and we replace that null with our new node. + + approach: | + We solve this using a **Recursive BST Traversal**: + + **Step 1: Handle the base case** + + - If `root` is `None`, we've found the insertion point + - Create and return a new `TreeNode` with the given value + +   + + **Step 2: Decide which subtree to explore** + + - If `val < root.val`, the new node belongs in the **left subtree** + - If `val > root.val`, the new node belongs in the **right subtree** + +   + + **Step 3: Recursively insert and connect** + + - Make a recursive call on the appropriate child + - Assign the result back to `root.left` or `root.right` + - This automatically handles the case where the child was `None` + +   + + **Step 4: Return the root** + + - Return the (unchanged) root to maintain the tree structure + - The recursive assignment ensures the new node gets properly linked + +   + + The recursion naturally terminates when we reach a `None` child, which becomes the insertion point. By assigning the recursive result back to the parent's child pointer, we elegantly connect the new node. + + common_pitfalls: + - title: Forgetting to Handle the Empty Tree + description: | + When `root` is `None` (empty tree), you must return a new node as the root. Some solutions only handle the case of inserting into existing trees, causing a crash or returning `None` for empty input. 
+ + Always check for `root is None` as your base case and return the new node. + wrong_approach: "Only handling non-empty trees" + correct_approach: "Check if root is None and return new TreeNode(val)" + + - title: Not Connecting the New Node + description: | + A common mistake is to traverse to the correct position but forget to actually link the new node to its parent. Simply creating a new node isn't enough — you must assign it to the parent's `left` or `right` pointer. + + The recursive approach handles this elegantly by assigning `root.left = insertIntoBST(root.left, val)`. + wrong_approach: "Creating node without linking to parent" + correct_approach: "Assign recursive result back to parent's child pointer" + + - title: Modifying Existing Node Values + description: | + BST insertion adds a **new node**, not modifying an existing one. Don't try to swap values or restructure the tree — simply find the correct empty spot and insert there. + + The problem guarantees the value doesn't exist, so you'll always reach a `None` position. + wrong_approach: "Changing existing node values" + correct_approach: "Always insert as a new leaf node" + + key_takeaways: + - "**BST property drives insertion**: Left for smaller, right for larger — no complex logic needed" + - "**Recursion simplifies tree operations**: The base case handles insertion, recursive calls handle navigation" + - "**Insertion is search + create**: Follow the search path until hitting `None`, then insert" + - "**Foundation for BST operations**: This pattern extends to deletion, search, and validation problems" + + time_complexity: "O(h) where h is the height of the tree. In a balanced BST, h = log(n), giving O(log n). In the worst case (skewed tree), h = n, giving O(n)." + space_complexity: "O(h) for the recursion stack. In a balanced tree this is O(log n), worst case O(n) for a skewed tree. The iterative approach achieves O(1) space." 
+ +solutions: + - approach_name: Recursive + is_optimal: true + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def insert_into_bst(root: TreeNode | None, val: int) -> TreeNode: + # Base case: found the insertion point + if root is None: + return TreeNode(val) + + # Decide which subtree to insert into + if val < root.val: + # Value belongs in left subtree + root.left = insert_into_bst(root.left, val) + else: + # Value belongs in right subtree + root.right = insert_into_bst(root.right, val) + + # Return root to maintain tree structure + return root + explanation: | + **Time Complexity:** O(h) — We traverse from root to a leaf, where h is the tree height. + + **Space Complexity:** O(h) — Recursion stack depth equals the path length. + + The recursive approach elegantly handles both navigation and connection. When we hit `None`, we return the new node, and the parent's assignment (`root.left = ...`) automatically links it. + + - approach_name: Iterative + is_optimal: true + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def insert_into_bst(root: TreeNode | None, val: int) -> TreeNode: + # Handle empty tree + if root is None: + return TreeNode(val) + + # Find the correct position + current = root + while True: + if val < current.val: + # Go left + if current.left is None: + # Found insertion point + current.left = TreeNode(val) + break + current = current.left + else: + # Go right + if current.right is None: + # Found insertion point + current.right = TreeNode(val) + break + current = current.right + + return root + explanation: | + **Time Complexity:** O(h) — Same traversal as recursive approach. + + **Space Complexity:** O(1) — No recursion stack, only a single pointer. + + The iterative approach uses a while loop to find the insertion point, then directly links the new node. 
This avoids recursion overhead and is slightly more efficient in practice, though both have the same time complexity. diff --git a/backend/data/questions/integer-break.yaml b/backend/data/questions/integer-break.yaml new file mode 100644 index 0000000..7ce9513 --- /dev/null +++ b/backend/data/questions/integer-break.yaml @@ -0,0 +1,215 @@ +title: Integer Break +slug: integer-break +difficulty: medium +leetcode_id: 343 +leetcode_url: https://leetcode.com/problems/integer-break/ +categories: + - dynamic-programming + - math +patterns: + - dynamic-programming + - greedy + +description: | + Given an integer `n`, break it into the sum of `k` **positive integers**, where `k >= 2`, and maximize the product of those integers. + + Return *the maximum product you can get*. + +constraints: | + - `2 <= n <= 58` + +examples: + - input: "n = 2" + output: "1" + explanation: "2 = 1 + 1, 1 x 1 = 1." + - input: "n = 10" + output: "36" + explanation: "10 = 3 + 3 + 4, 3 x 3 x 4 = 36." + +explanation: + intuition: | + Imagine you have a rope of length `n` and you must cut it into at least two pieces. You want the **product of the piece lengths** to be as large as possible. + + The key mathematical insight is: **3s are magical**. When you break a number into parts, using 3s (with some 2s for adjustment) produces the maximum product. + + Why? Consider that for any number greater than 4, breaking off a 3 gives a better product than keeping it whole. For example, `6` as a single piece contributes 6 to the product, but `3 + 3` contributes `3 x 3 = 9`. + + Think of it like this: you're trying to pack as many 3s as possible because they're the most "efficient" multiplier. The exceptions are: + - If the remainder is 1, you should take one 3 back and make two 2s instead (because `2 x 2 = 4 > 3 x 1 = 3`) + - If the remainder is 2, just keep it as a 2 + + This greedy insight can also be approached with dynamic programming, where we build up optimal products for smaller numbers. 
+ + approach: | + We can solve this using either a **Mathematical (Greedy)** approach or **Dynamic Programming**. The math approach is O(1), but DP helps understand the structure. + + **Mathematical Approach:** + + **Step 1: Handle base cases** + + - If `n == 2`: Return `1` (must split into `1 + 1`) + - If `n == 3`: Return `2` (must split into `1 + 2`, giving `1 x 2 = 2`) + +   + + **Step 2: Divide n by 3 to determine the split** + + - If `n % 3 == 0`: Use all 3s. The answer is `3^(n/3)` + - If `n % 3 == 1`: Use one fewer 3 and add two 2s. The answer is `3^(n/3 - 1) x 4` + - If `n % 3 == 2`: Use all 3s plus one 2. The answer is `3^(n/3) x 2` + +   + + **Dynamic Programming Approach:** + + **Step 1: Initialise the DP array** + + - Create array `dp` of size `n + 1` where `dp[i]` represents the maximum product for integer `i` + - Set `dp[1] = 1` as the base case + +   + + **Step 2: Fill the DP table** + + - For each `i` from `2` to `n`: + - Try every possible first cut `j` from `1` to `i - 1` + - The product is either `j x (i - j)` (if we don't break further) or `j x dp[i - j]` (if we continue breaking) + - Take the maximum across all cuts + +   + + **Step 3: Return the result** + + - Return `dp[n]` + + common_pitfalls: + - title: Forgetting Base Cases + description: | + For `n = 2` and `n = 3`, the optimal "unforced" choice would be to not break at all, but the problem requires at least 2 pieces. + + For `n = 2`: Must return `1` (from `1 + 1`) + For `n = 3`: Must return `2` (from `1 + 2`), not `3` + + When using DP, remember that `dp[2]` and `dp[3]` used as subproblems can return their full value (2 and 3), since the "must break" constraint only applies to the original number. 
+ wrong_approach: "Returning 2 for n=2 or 3 for n=3" + correct_approach: "Handle n=2 and n=3 as special cases with forced splits" + + - title: Not Considering Both Options in DP + description: | + When computing `dp[i]`, for each cut position `j`, you must consider two options: + - `j x (i - j)`: Don't break the remaining part further + - `j x dp[i - j]`: Break the remaining part optimally + + Missing the first option means you miss cases where not breaking further is optimal. For example, when `i = 4` and `j = 2`, the answer is `2 x 2 = 4`, not `2 x dp[2] = 2 x 1 = 2`. + wrong_approach: "Only considering j x dp[i - j]" + correct_approach: "max(j x (i - j), j x dp[i - j]) for each j" + + - title: Using 1s in the Split + description: | + Including 1 in your split is almost always suboptimal. For any factor of 1, you could add that 1 to another factor to increase the product. + + For example, `3 + 3 + 1` gives `3 x 3 x 1 = 9`, but `3 + 4` gives `3 x 4 = 12`. + + The only time 1 appears is in forced base cases (`n = 2` and `n = 3`). + wrong_approach: "Splitting into parts that include 1" + correct_approach: "Only use 2s and 3s (except for base cases)" + + key_takeaways: + - "**The power of 3**: For maximising products, 3 is the optimal factor (provable via calculus or discrete analysis)" + - "**Greedy meets math**: Sometimes a mathematical insight replaces the need for DP entirely, reducing O(n^2) to O(1)" + - "**DP transition structure**: The pattern of choosing whether to break further (`j x (i-j)` vs `j x dp[i-j]`) appears in many partition problems" + - "**Related problems**: This connects to *Cutting a Rod*, *Partition Equal Subset Sum*, and other optimisation over partitions" + + time_complexity: "O(1) for the mathematical approach. O(n^2) for dynamic programming, as we compute each `dp[i]` by iterating through all possible cuts." + space_complexity: "O(1) for the mathematical approach. O(n) for dynamic programming to store the `dp` array." 
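One way to build confidence in the closed-form rule is to compare it with the DP recurrence for every `n` allowed by the constraints (`2 <= n <= 58`). The sketch below assumes the two formulations described above; the function names `integer_break_math` and `integer_break_dp` are chosen here for illustration.

```python
def integer_break_math(n: int) -> int:
    # Forced splits for the two smallest inputs
    if n == 2:
        return 1
    if n == 3:
        return 2
    quotient, remainder = divmod(n, 3)
    if remainder == 0:
        return 3 ** quotient
    if remainder == 1:
        # Trade one 3 + 1 for 2 + 2
        return 3 ** (quotient - 1) * 4
    return 3 ** quotient * 2


def integer_break_dp(n: int) -> int:
    # dp[i] = best product when i is split into at least two parts
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        for j in range(1, i):
            # Either keep (i - j) whole or break it further
            dp[i] = max(dp[i], j * (i - j), j * dp[i - j])
    return dp[n]


# Compare both approaches over the full constraint range
mismatches = [n for n in range(2, 59)
              if integer_break_math(n) != integer_break_dp(n)]
print(mismatches)
print(integer_break_math(10))
```

An empty `mismatches` list means the greedy formula and the DP agree on every valid input, which is a useful regression check if either implementation is modified.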
+ +solutions: + - approach_name: Mathematical (Greedy) + is_optimal: true + code: | + def integer_break(n: int) -> int: + # Base cases: must split, but would prefer not to + if n == 2: + return 1 # 1 + 1 = 2, product = 1 + if n == 3: + return 2 # 1 + 2 = 3, product = 2 + + # For n >= 4, use as many 3s as possible + if n % 3 == 0: + # n is divisible by 3, use all 3s + return 3 ** (n // 3) + elif n % 3 == 1: + # Remainder 1: take one 3 back, use 2 + 2 instead + # (because 2 x 2 = 4 > 3 x 1 = 3) + return 3 ** (n // 3 - 1) * 4 + else: + # Remainder 2: use all 3s plus one 2 + return 3 ** (n // 3) * 2 + explanation: | + **Time Complexity:** O(1) — Just arithmetic operations (exponentiation is O(log n) but with small exponents here). + + **Space Complexity:** O(1) — Only a few variables used. + + The mathematical insight is that 3 is the optimal factor. For any `n >= 5`, breaking off a 3 and multiplying gives a larger product than keeping the number whole. We handle remainders: if `n % 3 == 1`, we use `2 + 2` instead of `3 + 1` since `4 > 3`. + + - approach_name: Dynamic Programming + is_optimal: false + code: | + def integer_break(n: int) -> int: + # dp[i] = maximum product for integer i + dp = [0] * (n + 1) + dp[1] = 1 # Base case + + for i in range(2, n + 1): + for j in range(1, i): + # Option 1: don't break (i - j) further + # Option 2: break (i - j) optimally using dp + product = max(j * (i - j), j * dp[i - j]) + dp[i] = max(dp[i], product) + + return dp[n] + explanation: | + **Time Complexity:** O(n^2) — For each `i` from 2 to n, we try all cuts from 1 to i-1. + + **Space Complexity:** O(n) — We store the dp array of size n+1. + + For each number `i`, we try every possible first cut `j`. The remaining part `i - j` can either stay whole (giving `j * (i - j)`) or be broken further (giving `j * dp[i - j]`). We take the maximum across all possibilities. 
+ + - approach_name: Recursion with Memoization + is_optimal: false + code: | + def integer_break(n: int) -> int: + memo = {} + + def helper(num: int, must_break: bool) -> int: + # If we've computed this before, return cached result + if (num, must_break) in memo: + return memo[(num, must_break)] + + # Base case + if num <= 1: + return num + + # If we don't have to break, we can return num itself + if not must_break: + result = num + else: + result = 0 + + # Try all possible first cuts + for i in range(1, num): + # First piece is i, remaining is num - i (which can stay whole) + product = i * helper(num - i, False) + result = max(result, product) + + memo[(num, must_break)] = result + return result + + # Start with must_break=True since we need at least 2 pieces + return helper(n, True) + explanation: | + **Time Complexity:** O(n^2) — Each subproblem is solved once, and each takes O(n) to compute. + + **Space Complexity:** O(n) — Memoization cache plus recursion stack. + + This top-down approach explicitly tracks whether we're forced to break. The original call has `must_break=True`, but recursive calls for remaining parts use `must_break=False` since they can stay whole if that's optimal. diff --git a/backend/data/questions/interleaving-string.yaml b/backend/data/questions/interleaving-string.yaml new file mode 100644 index 0000000..497c6bf --- /dev/null +++ b/backend/data/questions/interleaving-string.yaml @@ -0,0 +1,249 @@ +title: Interleaving String +slug: interleaving-string +difficulty: medium +leetcode_id: 97 +leetcode_url: https://leetcode.com/problems/interleaving-string/ +categories: + - strings + - dynamic-programming +patterns: + - dynamic-programming + +description: | + Given strings `s1`, `s2`, and `s3`, find whether `s3` is formed by an **interleaving** of `s1` and `s2`. + + An **interleaving** of two strings `s` and `t` is a configuration where `s` and `t` are divided into `n` and `m` substrings respectively, such that: + + - `s = s1 + s2 + ... 
+ sn` + - `t = t1 + t2 + ... + tm` + - `|n - m| <= 1` + - The **interleaving** is `s1 + t1 + s2 + t2 + s3 + t3 + ...` or `t1 + s1 + t2 + s2 + t3 + s3 + ...` + + **Note:** `a + b` is the concatenation of strings `a` and `b`. + +constraints: | + - `0 <= s1.length, s2.length <= 100` + - `0 <= s3.length <= 200` + - `s1`, `s2`, and `s3` consist of lowercase English letters. + +examples: + - input: 's1 = "aabcc", s2 = "dbbca", s3 = "aadbbcbcac"' + output: "true" + explanation: "One way to obtain s3 is: Split s1 into s1 = \"aa\" + \"bc\" + \"c\", and s2 into s2 = \"dbbc\" + \"a\". Interleaving the two splits, we get \"aa\" + \"dbbc\" + \"bc\" + \"a\" + \"c\" = \"aadbbcbcac\"." + - input: 's1 = "aabcc", s2 = "dbbca", s3 = "aadbbbaccc"' + output: "false" + explanation: "It is impossible to interleave s2 with any other string to obtain s3." + - input: 's1 = "", s2 = "", s3 = ""' + output: "true" + explanation: "Two empty strings trivially interleave to form an empty string." + +explanation: + intuition: | + Imagine you have two decks of cards (representing `s1` and `s2`), and you want to merge them into a single pile (`s3`) while preserving the relative order of cards within each original deck. The question is: can `s3` be formed by picking cards alternately (or in any valid interleaving pattern) from the tops of these two decks? + + The key insight is that at any point in building `s3`, you have a **choice**: take the next character from `s1` or from `s2`. This decision tree branches exponentially, but many branches lead to the same "state" — defined by how many characters we've used from each string. + + Think of it as navigating a 2D grid where the x-axis represents progress through `s1` and the y-axis represents progress through `s2`. Starting at `(0, 0)`, you want to reach `(len(s1), len(s2))`. At each cell `(i, j)`, you can move right (use a character from `s1`) or down (use a character from `s2`) — but only if that character matches the next character needed in `s3`. 
+ + This grid perspective reveals the **optimal substructure**: whether we can reach `(i, j)` depends only on whether we could reach `(i-1, j)` or `(i, j-1)` with a matching character. This is the hallmark of dynamic programming. + + approach: | + We solve this using **2D Dynamic Programming**: + + **Step 1: Early termination check** + + - If `len(s1) + len(s2) != len(s3)`, return `False` immediately — the lengths don't match, so interleaving is impossible + +   + + **Step 2: Initialize the DP table** + + - Create a 2D boolean table `dp` of size `(len(s1) + 1) x (len(s2) + 1)` + - `dp[i][j]` represents: "Can the first `i` characters of `s1` and first `j` characters of `s2` interleave to form the first `i + j` characters of `s3`?" + - Set `dp[0][0] = True` — empty strings trivially interleave to form an empty string + +   + + **Step 3: Fill the first row and column** + + - First row (`dp[0][j]`): Can `s2[:j]` alone form `s3[:j]`? Only if all characters match sequentially + - First column (`dp[i][0]`): Can `s1[:i]` alone form `s3[:i]`? 
Only if all characters match sequentially + - These represent paths that use only one string + +   + + **Step 4: Fill the rest of the table** + + - For each cell `dp[i][j]`, check two possibilities: + - **From the left** (`dp[i-1][j]`): If we could form `s3[:i+j-1]` and `s1[i-1] == s3[i+j-1]`, then `dp[i][j] = True` + - **From above** (`dp[i][j-1]`): If we could form `s3[:i+j-1]` and `s2[j-1] == s3[i+j-1]`, then `dp[i][j] = True` + - Either path being valid makes the current state valid + +   + + **Step 5: Return the answer** + + - Return `dp[len(s1)][len(s2)]` — whether we can use all of both strings to form all of `s3` + + common_pitfalls: + - title: Exponential Brute Force + description: | + A naive recursive approach tries every possible way to interleave: + - At each position in `s3`, try matching with `s1` or `s2` + - This leads to `2^(m+n)` possibilities in the worst case + + With `s1.length, s2.length <= 100`, this means up to `2^200` operations — astronomically too slow. The key insight is that many recursive calls compute the same subproblem (same `(i, j)` position), making this a perfect candidate for memoization or bottom-up DP. + wrong_approach: "Recursive backtracking without memoization" + correct_approach: "Dynamic programming with O(m*n) states" + + - title: Forgetting the Length Check + description: | + If `len(s1) + len(s2) != len(s3)`, it's impossible to interleave — every character from `s1` and `s2` must appear exactly once in `s3`. + + Without this early check, your DP might give false positives for cases like `s1 = "a"`, `s2 = "b"`, `s3 = "ab"` (valid) vs `s3 = "abc"` (invalid — extra character). Always verify lengths first. + wrong_approach: "Skip length validation" + correct_approach: "Check len(s1) + len(s2) == len(s3) upfront" + + - title: Off-by-One Index Errors + description: | + The DP table has dimensions `(m+1) x (n+1)` to handle empty prefixes. 
When accessing characters: + - `dp[i][j]` uses `s1[i-1]` and `s2[j-1]` (0-indexed strings) + - The corresponding `s3` character is at index `i + j - 1` + + Confusing 0-indexed strings with 1-indexed DP indices is a common source of bugs. Draw the grid and trace through an example to verify your indexing. + wrong_approach: "Using dp[i][j] with s1[i] and s2[j]" + correct_approach: "Using dp[i][j] with s1[i-1] and s2[j-1]" + + key_takeaways: + - "**2D DP for two sequences**: When combining or comparing two sequences, think of a 2D grid where axes represent progress through each sequence" + - "**State definition is crucial**: Here, `dp[i][j]` captures whether prefixes of length `i` and `j` can form a prefix of `s3` — a clean, sufficient state" + - "**Space optimization possible**: The follow-up asks for `O(s2.length)` space — since each row only depends on the previous row and current row, you can use a 1D array" + - "**Early termination**: Simple checks like length validation can save significant computation and handle edge cases cleanly" + + time_complexity: "O(m * n). We fill a 2D table of size `(m+1) x (n+1)` where `m = len(s1)` and `n = len(s2)`, with O(1) work per cell." + space_complexity: "O(m * n). We store a 2D boolean table. This can be optimized to O(n) by using a 1D array and updating in-place." + +solutions: + - approach_name: 2D Dynamic Programming + is_optimal: true + code: | + def is_interleave(s1: str, s2: str, s3: str) -> bool: + m, n = len(s1), len(s2) + + # Early termination: lengths must match + if m + n != len(s3): + return False + + # dp[i][j] = can s1[:i] and s2[:j] interleave to form s3[:i+j]? 
+ dp = [[False] * (n + 1) for _ in range(m + 1)] + + # Base case: empty strings form empty string + dp[0][0] = True + + # Fill first column: using only s1 + for i in range(1, m + 1): + dp[i][0] = dp[i - 1][0] and s1[i - 1] == s3[i - 1] + + # Fill first row: using only s2 + for j in range(1, n + 1): + dp[0][j] = dp[0][j - 1] and s2[j - 1] == s3[j - 1] + + # Fill the rest of the table + for i in range(1, m + 1): + for j in range(1, n + 1): + # Current position in s3 + k = i + j - 1 + # Can we get here from the left (using s1[i-1])? + from_s1 = dp[i - 1][j] and s1[i - 1] == s3[k] + # Can we get here from above (using s2[j-1])? + from_s2 = dp[i][j - 1] and s2[j - 1] == s3[k] + dp[i][j] = from_s1 or from_s2 + + return dp[m][n] + explanation: | + **Time Complexity:** O(m * n) — We iterate through every cell in the DP table once. + + **Space Complexity:** O(m * n) — We store the full 2D table. + + This solution builds up the answer by considering all valid ways to consume characters from `s1` and `s2`. Each cell represents a subproblem that's computed exactly once. + + - approach_name: Space-Optimized DP (1D Array) + is_optimal: true + code: | + def is_interleave(s1: str, s2: str, s3: str) -> bool: + m, n = len(s1), len(s2) + + # Early termination: lengths must match + if m + n != len(s3): + return False + + # Use 1D array: dp[j] represents dp[i][j] for current row i + dp = [False] * (n + 1) + + # Fill the DP table row by row + for i in range(m + 1): + for j in range(n + 1): + if i == 0 and j == 0: + dp[j] = True + elif i == 0: + # First row: only using s2 + dp[j] = dp[j - 1] and s2[j - 1] == s3[j - 1] + elif j == 0: + # First column: only using s1 + dp[j] = dp[j] and s1[i - 1] == s3[i - 1] + else: + # General case: from left (s1) or from above (s2) + k = i + j - 1 + dp[j] = (dp[j] and s1[i - 1] == s3[k]) or \ + (dp[j - 1] and s2[j - 1] == s3[k]) + + return dp[n] + explanation: | + **Time Complexity:** O(m * n) — Same iteration as 2D approach. 
+ + **Space Complexity:** O(n) — Only one row of the DP table is stored. + + This answers the follow-up question. Since each row only depends on the current and previous row values, we can overwrite the array in-place. Before it is updated, `dp[j]` still holds the previous row's value `dp[i-1][j]` (the move that consumes `s1[i-1]`, labelled "from left" in the code comment), and `dp[j-1]` holds the already-updated current-row value `dp[i][j-1]` (the move that consumes `s2[j-1]`, labelled "from above"). + + - approach_name: Recursive with Memoization + is_optimal: false + code: | + def is_interleave(s1: str, s2: str, s3: str) -> bool: + m, n = len(s1), len(s2) + + if m + n != len(s3): + return False + + # Memoization cache + memo = {} + + def dp(i: int, j: int) -> bool: + # Base case: used all characters + if i == m and j == n: + return True + + # Check cache + if (i, j) in memo: + return memo[(i, j)] + + k = i + j # Current position in s3 + result = False + + # Try using next character from s1 + if i < m and s1[i] == s3[k]: + result = dp(i + 1, j) + + # Try using next character from s2 + if not result and j < n and s2[j] == s3[k]: + result = dp(i, j + 1) + + memo[(i, j)] = result + return result + + return dp(0, 0) + explanation: | + **Time Complexity:** O(m * n) — Each unique `(i, j)` pair is computed once. + + **Space Complexity:** O(m * n) — For the memoization cache, plus O(m + n) recursion stack. + + This top-down approach is often more intuitive to write. It explores the decision tree but caches results to avoid redundant computation. The bottom-up DP is generally preferred for avoiding stack overflow on large inputs. 
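To check any of the solutions above against the listed examples, a small harness helps. The sketch below uses a compact one-row DP written for this illustration (the first row is filled before the main loop, which is a stylistic choice rather than the layout used in the question file) and runs it on the three examples plus two edge cases.

```python
def is_interleave(s1: str, s2: str, s3: str) -> bool:
    m, n = len(s1), len(s2)
    # Every character of s1 and s2 must be used exactly once
    if m + n != len(s3):
        return False

    # dp[j] stands for dp[i][j] of the current row i
    dp = [False] * (n + 1)
    dp[0] = True

    # Row 0: s3 prefix built from s2 alone
    for j in range(1, n + 1):
        dp[j] = dp[j - 1] and s2[j - 1] == s3[j - 1]

    for i in range(1, m + 1):
        # Column 0: s3 prefix built from s1 alone
        dp[0] = dp[0] and s1[i - 1] == s3[i - 1]
        for j in range(1, n + 1):
            k = i + j - 1  # index of the s3 character being matched
            take_s1 = dp[j] and s1[i - 1] == s3[k]      # previous row
            take_s2 = dp[j - 1] and s2[j - 1] == s3[k]  # current row
            dp[j] = take_s1 or take_s2

    return dp[n]


cases = [
    ("aabcc", "dbbca", "aadbbcbcac"),
    ("aabcc", "dbbca", "aadbbbaccc"),
    ("", "", ""),
    ("a", "b", "abc"),   # length mismatch
    ("a", "", "a"),      # one empty input
]
results = [is_interleave(*case) for case in cases]
print(results)
```

The first three cases are the examples from the problem statement; the last two cover the length check and an empty input string.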
diff --git a/backend/data/questions/invert-binary-tree.yaml b/backend/data/questions/invert-binary-tree.yaml new file mode 100644 index 0000000..2fad6af --- /dev/null +++ b/backend/data/questions/invert-binary-tree.yaml @@ -0,0 +1,214 @@ +title: Invert Binary Tree +slug: invert-binary-tree +difficulty: easy +leetcode_id: 226 +leetcode_url: https://leetcode.com/problems/invert-binary-tree/ +categories: + - trees + - recursion +patterns: + - tree-traversal + - dfs + - bfs + +description: | + Given the `root` of a binary tree, invert the tree, and return *its root*. + + Inverting a binary tree means swapping the left and right children of every node in the tree, creating a mirror image of the original structure. + +constraints: | + - The number of nodes in the tree is in the range `[0, 100]` + - `-100 <= Node.val <= 100` + +examples: + - input: "root = [4,2,7,1,3,6,9]" + output: "[4,7,2,9,6,3,1]" + explanation: "The tree is inverted by swapping left and right children at every level. Node 4's children swap (2↔7), then node 7's children swap (6↔9) and node 2's children swap (1↔3)." + - input: "root = [2,1,3]" + output: "[2,3,1]" + explanation: "The root's left child (1) and right child (3) are swapped." + - input: "root = []" + output: "[]" + explanation: "An empty tree remains empty after inversion." + +explanation: + intuition: | + Imagine holding a mirror up to a binary tree. The reflection you see is the inverted tree — every left branch becomes a right branch, and vice versa. + + The key insight is that **inverting a tree is a recursive operation**: to invert a tree rooted at any node, you simply swap its left and right children, then recursively invert each subtree. This naturally follows the structure of the tree itself. + + Think of it like this: if someone asked you to mirror-flip a family tree diagram, you'd swap each parent's children, then do the same for each of those children's subtrees. 
The operation is the same at every level — a perfect fit for recursion. + + The beauty of this problem is that the recursive solution directly mirrors the problem definition: invert the left subtree, invert the right subtree, then swap them. + + approach: | + We solve this using a **Recursive (DFS) Approach**: + + **Step 1: Handle the base case** + + - If the node is `None`, return `None` immediately + - This handles empty trees and serves as the recursion termination condition + +   + + **Step 2: Swap the children** + + - Store the left child in a temporary variable (or use simultaneous assignment) + - Assign the right child to the left + - Assign the stored left child to the right + - This mirrors the current node's immediate children + +   + + **Step 3: Recursively invert subtrees** + + - Call `invert_tree` on the new left child (which was originally the right child) + - Call `invert_tree` on the new right child (which was originally the left child) + - This ensures all descendants are also inverted + +   + + **Step 4: Return the root** + + - Return the current node after its subtree has been fully inverted + - This allows the recursion to build back up correctly + +   + + The order of swapping vs recursing doesn't matter — you can swap first then recurse, or recurse first then swap. Both produce the same result because every node gets visited and swapped exactly once. + + common_pitfalls: + - title: Forgetting the Base Case + description: | + Without checking for `None`, your recursion will crash when it tries to access `.left` or `.right` on a null node. + + Always start recursive tree functions with: + ```python + if not root: + return None + ``` + wrong_approach: "Directly accessing root.left without null check" + correct_approach: "Check if root is None before any operations" + + - title: Only Swapping at One Level + description: | + A common mistake is to swap the root's children but forget to recursively process the subtrees. 
+ + For example, with `[4,2,7,1,3,6,9]`, only swapping at the root gives `[4,7,2,6,9,1,3]` instead of the expected `[4,7,2,9,6,3,1]`: each subtree moves as a whole, so the grandchildren under `7` and `2` keep their original left-to-right order. You need to continue swapping at every level. + wrong_approach: "Only swapping root.left and root.right" + correct_approach: "Recursively invert both subtrees after swapping" + + - title: Overwriting Before Saving + description: | + If you write `root.left = root.right` first, you lose the original left child before you can assign it to the right. + + Use either a temporary variable or Python's simultaneous assignment: + ```python + root.left, root.right = root.right, root.left + ``` + wrong_approach: "Sequential assignment without temp variable" + correct_approach: "Simultaneous swap or use temporary variable" + + key_takeaways: + - "**Recursive tree operations**: Many tree problems have elegant recursive solutions where you process subtrees and combine results" + - "**Base case discipline**: Always handle the `None` case first in tree recursion" + - "**Multiple valid approaches**: This can be solved with DFS (recursive or stack-based) or BFS (queue-based), all with the same O(n) time complexity" + - "**Famous interview problem**: This problem gained notoriety when a senior engineer reportedly couldn't solve it on a whiteboard, highlighting that even simple problems require practice" + + time_complexity: "O(n). We visit each node exactly once to swap its children, where n is the number of nodes in the tree." + space_complexity: "O(h). The recursion stack can grow to the height of the tree. In the worst case (skewed tree), this is O(n). For a balanced tree, it's O(log n)." 
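The examples in the problem statement are given as level-order lists, so a small round-trip harness makes the inversion easy to verify. In the sketch below, `build_tree` and `to_level_order` are hypothetical helpers written for this illustration; `build_tree` assumes a complete level-order list with no `None` gaps, which holds for the examples here but not for arbitrary LeetCode inputs.

```python
from collections import deque


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def invert_tree(root):
    if root is None:
        return None
    # Invert both subtrees, then swap them in one simultaneous assignment
    root.left, root.right = invert_tree(root.right), invert_tree(root.left)
    return root


def build_tree(values):
    # Hypothetical helper: assumes a complete level-order list (no None gaps)
    if not values:
        return None
    nodes = [TreeNode(v) for v in values]
    for i, node in enumerate(nodes):
        left, right = 2 * i + 1, 2 * i + 2
        if left < len(nodes):
            node.left = nodes[left]
        if right < len(nodes):
            node.right = nodes[right]
    return nodes[0]


def to_level_order(root):
    # Hypothetical helper: breadth-first list of node values
    out = []
    queue = deque([root] if root else [])
    while queue:
        node = queue.popleft()
        out.append(node.val)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return out


inverted = to_level_order(invert_tree(build_tree([4, 2, 7, 1, 3, 6, 9])))
print(inverted)

# Inverting twice should restore the original tree
twice = to_level_order(invert_tree(invert_tree(build_tree([4, 2, 7, 1, 3, 6, 9]))))
print(twice)
```

The double inversion is a cheap property check: any implementation that skips a level or loses a child will generally fail to restore the original order.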
+ +solutions: + - approach_name: Recursive DFS + is_optimal: true + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def invert_tree(root: TreeNode | None) -> TreeNode | None: + # Base case: empty tree or leaf's null child + if not root: + return None + + # Swap the left and right children + root.left, root.right = root.right, root.left + + # Recursively invert both subtrees + invert_tree(root.left) + invert_tree(root.right) + + # Return the root of the inverted tree + return root + explanation: | + **Time Complexity:** O(n) — Each node is visited exactly once. + + **Space Complexity:** O(h) — Recursion stack depth equals tree height. + + This elegant solution directly mirrors the problem definition. We swap children at the current node, then recursively handle subtrees. The simultaneous assignment `root.left, root.right = root.right, root.left` safely swaps without needing a temporary variable. + + - approach_name: Iterative BFS + is_optimal: true + code: | + from collections import deque + + def invert_tree(root: TreeNode | None) -> TreeNode | None: + if not root: + return None + + # Use a queue for level-order traversal + queue = deque([root]) + + while queue: + # Process the next node + node = queue.popleft() + + # Swap its children + node.left, node.right = node.right, node.left + + # Add children to queue for processing + if node.left: + queue.append(node.left) + if node.right: + queue.append(node.right) + + return root + explanation: | + **Time Complexity:** O(n) — Each node is visited exactly once. + + **Space Complexity:** O(w) — Queue holds at most one level, where w is the maximum width of the tree. In the worst case (complete tree), this is O(n/2) = O(n). + + This iterative approach uses BFS to visit nodes level by level. At each node, we swap its children and add them to the queue. This avoids recursion stack overhead but uses explicit queue memory instead. 
+ + - approach_name: Iterative DFS (Stack) + is_optimal: true + code: | + def invert_tree(root: TreeNode | None) -> TreeNode | None: + if not root: + return None + + # Use a stack for depth-first traversal + stack = [root] + + while stack: + # Process the next node + node = stack.pop() + + # Swap its children + node.left, node.right = node.right, node.left + + # Add children to stack for processing + if node.left: + stack.append(node.left) + if node.right: + stack.append(node.right) + + return root + explanation: | + **Time Complexity:** O(n) — Each node is visited exactly once. + + **Space Complexity:** O(h) — Stack depth equals tree height in the worst case. + + This converts the recursive DFS to an iterative approach using an explicit stack. The logic is identical to the recursive version but avoids potential stack overflow for very deep trees. The order of processing differs from BFS but the final result is the same. diff --git a/backend/data/questions/ipo.yaml b/backend/data/questions/ipo.yaml new file mode 100644 index 0000000..2e7f307 --- /dev/null +++ b/backend/data/questions/ipo.yaml @@ -0,0 +1,207 @@ +title: IPO +slug: ipo +difficulty: hard +leetcode_id: 502 +leetcode_url: https://leetcode.com/problems/ipo/ +categories: + - arrays + - heap + - sorting +patterns: + - heap + - greedy + +description: | + Suppose LeetCode will start its **IPO** soon. In order to sell a good price of its shares to Venture Capital, LeetCode would like to work on some projects to increase its capital before the **IPO**. Since it has limited resources, it can only finish at most `k` distinct projects before the **IPO**. Help LeetCode design the best way to maximize its total capital after finishing at most `k` distinct projects. + + You are given `n` projects where the ith project has a pure profit `profits[i]` and a minimum capital of `capital[i]` is needed to start it. + + Initially, you have `w` capital. 
When you finish a project, you will obtain its pure profit and the profit will be added to your total capital. + + Pick a list of **at most** `k` distinct projects from given projects to **maximize your final capital**, and return *the final maximized capital*. + + The answer is guaranteed to fit in a 32-bit signed integer. + +constraints: | + - `1 <= k <= 10^5` + - `0 <= w <= 10^9` + - `n == profits.length` + - `n == capital.length` + - `1 <= n <= 10^5` + - `0 <= profits[i] <= 10^4` + - `0 <= capital[i] <= 10^9` + +examples: + - input: "k = 2, w = 0, profits = [1,2,3], capital = [0,1,1]" + output: "4" + explanation: "Since your initial capital is 0, you can only start the project indexed 0. After finishing it you will obtain profit 1 and your capital becomes 1. With capital 1, you can either start the project indexed 1 or the project indexed 2. Since you can choose at most 2 projects, you need to finish the project indexed 2 to get the maximum capital. Therefore, output the final maximized capital, which is 0 + 1 + 3 = 4." + - input: "k = 3, w = 0, profits = [1,2,3], capital = [0,1,2]" + output: "6" + explanation: "With initial capital 0, start project 0 (profit 1, capital becomes 1). With capital 1, start project 1 (profit 2, capital becomes 3). With capital 3, start project 2 (profit 3, capital becomes 6)." + +explanation: + intuition: | + Imagine you're an investor with limited starting capital, and you want to grow your wealth as quickly as possible by completing projects. Each project requires a minimum investment (capital) to start, but once completed, you pocket the profit and can reinvest. + + The key insight is a **greedy observation**: at any point, among all the projects you *can* afford (those with `capital[i] <= current_capital`), you should always pick the one with the **highest profit**. Why? Because maximising your capital at each step opens up more project options for future rounds. 
+ + Think of it like this: you're standing at the edge of a pool of projects. Some are within reach (you can afford them), others are too expensive. Among those within reach, grab the most valuable one. After completing it, your reach extends further, potentially unlocking more lucrative projects. + + This greedy strategy works because: + 1. Completing a project never decreases your capital (profits are non-negative) + 2. More capital means more options — you can only unlock projects, never lose access to previously affordable ones + 3. We want to maximise final capital, so greedily maximising at each step is optimal + + The challenge is efficiently finding the highest-profit affordable project at each step, which is where the **two-heap** (or sorted array + max-heap) approach shines. + + approach: | + We solve this using a **Greedy approach with a Max-Heap**: + + **Step 1: Pair and sort projects by capital requirement** + + - Create pairs of `(capital[i], profits[i])` for each project + - Sort these pairs by capital requirement in ascending order + - This allows us to efficiently unlock projects as our capital grows + +   + + **Step 2: Initialise tracking variables** + + - `current_capital`: Set to `w` (our starting capital) + - `max_heap`: Empty heap to store profits of affordable projects (use negative values for max-heap in Python) + - `project_index`: Set to `0` to track which projects we've processed + +   + + **Step 3: Repeat up to k times (greedy selection)** + + - **Unlock projects**: While there are unprocessed projects and the next project's capital requirement is within our budget, push its profit onto the max-heap and move to the next project + - **Select best project**: If the heap is non-empty, pop the maximum profit and add it to `current_capital` + - **Early exit**: If no projects are affordable (heap is empty), we cannot proceed further — break early + +   + + **Step 4: Return the result** + + - Return `current_capital` after completing up to `k` 
projects + +   + + The sorting ensures we process projects in order of affordability, while the max-heap lets us instantly retrieve the highest-profit option among all currently affordable projects. + + common_pitfalls: + - title: Brute Force Selection + description: | + A naive approach might scan all projects at each step to find the best affordable one: + - For each of `k` rounds, scan all `n` projects + - Check if affordable and track the maximum profit + + This results in **O(k × n)** time complexity. With `k` and `n` both up to `10^5`, this means up to 10 billion operations — causing **Time Limit Exceeded (TLE)**. + + The heap-based approach reduces this to O(n log n) for sorting + O(k log n) for heap operations. + wrong_approach: "Linear scan for best affordable project each round" + correct_approach: "Max-heap to track affordable projects" + + - title: Using a Min-Heap Instead of Max-Heap + description: | + Python's `heapq` module implements a min-heap by default. If you push profits directly, you'll get the *smallest* profit, not the largest. + + Always negate profits when pushing (`-profit`) and negate again when popping to get the actual maximum. Alternatively, use a max-heap wrapper. + wrong_approach: "heappush(heap, profit)" + correct_approach: "heappush(heap, -profit) and negate when popping" + + - title: Forgetting Early Termination + description: | + If at any point no projects are affordable (heap is empty after unlocking), continuing the loop is wasteful. More importantly, trying to pop from an empty heap causes an error. + + Always check if the heap is non-empty before popping. If empty, break out of the loop early — no further progress is possible. + wrong_approach: "Always iterate k times" + correct_approach: "Break early if no affordable projects remain" + + - title: Not Sorting by Capital + description: | + Without sorting by capital requirement, you'd need to scan all projects each round to find affordable ones. 
Sorting by capital allows linear unlocking as your capital grows — once a project is unaffordable, all subsequent ones (in sorted order) are too. + wrong_approach: "Check all projects for affordability each round" + correct_approach: "Sort by capital, unlock in order as budget grows" + + key_takeaways: + - "**Greedy + Heap pattern**: When repeatedly selecting the 'best' option from a growing set, use a heap to efficiently track candidates" + - "**Two-phase processing**: Sort by one criterion (capital) to control unlocking, heap by another (profit) to optimise selection" + - "**Greedy validity**: This greedy approach works because completing projects only increases capital, never restricting future options" + - "**Real-world analogy**: This mirrors investment strategies where you reinvest profits to access larger opportunities — a common pattern in scheduling and resource allocation problems" + + time_complexity: "O(n log n). Sorting takes O(n log n), and we perform at most n heap pushes and k heap pops, each O(log n)." + space_complexity: "O(n). We store all projects as pairs and the heap can hold up to n profit values." 
+ +solutions: + - approach_name: Greedy with Max-Heap + is_optimal: true + code: | + import heapq + + def find_maximized_capital(k: int, w: int, profits: list[int], capital: list[int]) -> int: + n = len(profits) + + # Pair projects as (capital_required, profit) and sort by capital + projects = sorted(zip(capital, profits)) + + current_capital = w + max_heap = [] # Max-heap (using negative values) + project_index = 0 + + for _ in range(k): + # Unlock all projects we can now afford + while project_index < n and projects[project_index][0] <= current_capital: + # Push negative profit for max-heap behavior + heapq.heappush(max_heap, -projects[project_index][1]) + project_index += 1 + + # If no affordable projects, we're done + if not max_heap: + break + + # Take the most profitable affordable project + current_capital += -heapq.heappop(max_heap) + + return current_capital + explanation: | + **Time Complexity:** O(n log n) — Sorting dominates; heap operations are O(log n) each. + + **Space Complexity:** O(n) — Storage for sorted pairs and heap. + + We sort projects by capital requirement, then greedily select the highest-profit affordable project at each step using a max-heap. The sorted order ensures we efficiently unlock projects as our capital grows. 
+ + - approach_name: Brute Force + is_optimal: false + code: | + def find_maximized_capital(k: int, w: int, profits: list[int], capital: list[int]) -> int: + n = len(profits) + current_capital = w + completed = [False] * n # Track which projects are done + + for _ in range(k): + best_profit = -1 + best_index = -1 + + # Find the best affordable project + for i in range(n): + if not completed[i] and capital[i] <= current_capital: + if profits[i] > best_profit: + best_profit = profits[i] + best_index = i + + # No affordable project found + if best_index == -1: + break + + # Complete the best project + completed[best_index] = True + current_capital += best_profit + + return current_capital + explanation: | + **Time Complexity:** O(k × n) — For each of k rounds, scan all n projects. + + **Space Complexity:** O(n) — Boolean array to track completed projects. + + This approach scans all projects each round to find the best affordable one. While correct, it's too slow for large inputs where k and n approach 10^5. Included to illustrate why the heap optimisation is necessary. diff --git a/backend/data/questions/island-perimeter.yaml b/backend/data/questions/island-perimeter.yaml new file mode 100644 index 0000000..05647c4 --- /dev/null +++ b/backend/data/questions/island-perimeter.yaml @@ -0,0 +1,218 @@ +title: Island Perimeter +slug: island-perimeter +difficulty: easy +leetcode_id: 463 +leetcode_url: https://leetcode.com/problems/island-perimeter/ +categories: + - arrays + - math +patterns: + - matrix-traversal + +description: | + You are given a `row x col` grid representing a map where `grid[i][j] = 1` represents land and `grid[i][j] = 0` represents water. + + Grid cells are connected **horizontally/vertically** (not diagonally). The `grid` is completely surrounded by water, and there is exactly one island (i.e., one or more connected land cells). + + The island doesn't have "lakes", meaning the water inside isn't connected to the water around the island. 
One cell is a square with side length 1. The grid is rectangular, width and height don't exceed 100. + + Determine the perimeter of the island. + +constraints: | + - `row == grid.length` + - `col == grid[i].length` + - `1 <= row, col <= 100` + - `grid[i][j]` is `0` or `1` + - There is exactly one island in `grid` + +examples: + - input: "grid = [[0,1,0,0],[1,1,1,0],[0,1,0,0],[1,1,0,0]]" + output: "16" + explanation: "The perimeter is formed by counting the edges of land cells that touch water or the grid boundary." + - input: "grid = [[1]]" + output: "4" + explanation: "A single land cell has 4 sides, all contributing to the perimeter." + - input: "grid = [[1,0]]" + output: "4" + explanation: "The single land cell is surrounded by water on one side and grid boundaries on the others." + +explanation: + intuition: | + Imagine looking at the island from above, like a map. Each land cell is a square with 4 sides. The **perimeter** is the total length of the island's outer boundary — every edge where land meets water or the edge of the grid. + + Think of it like this: if you placed a fence around every land cell, you'd have 4 fence segments per cell. But when two land cells are **adjacent** (share an edge), those touching sides are *internal* to the island — they shouldn't count toward the perimeter. + + The key insight is: **each land cell contributes 4 to the perimeter, minus 1 for each land neighbour it has**. Why does a shared edge cost 2 overall? Because when two cells share an edge, that edge is counted by both cells, but it's actually an internal edge that shouldn't be part of the perimeter. We lose 1 from each cell's contribution. + + Alternatively, you can think of it as: for each land cell, count how many of its 4 sides touch water or the boundary. That's its direct contribution to the perimeter. 
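As a minimal sketch of the per-cell view, the snippet below computes each land cell's contribution as 4 minus its number of adjacent land cells and sums the results. The helper names `cell_contribution` and `perimeter_by_contribution` are hypothetical, chosen only for this illustration.

```python
def cell_contribution(grid: list[list[int]], i: int, j: int) -> int:
    """Perimeter contributed by the land cell at (i, j)."""
    rows, cols = len(grid), len(grid[0])
    land_neighbours = 0
    # Look at the four orthogonal neighbours; out-of-bounds counts as water
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols and grid[ni][nj] == 1:
            land_neighbours += 1
    # Each side facing another land cell is internal, so it is not perimeter
    return 4 - land_neighbours


def perimeter_by_contribution(grid: list[list[int]]) -> int:
    """Sum the contributions of all land cells."""
    return sum(
        cell_contribution(grid, i, j)
        for i in range(len(grid))
        for j in range(len(grid[0]))
        if grid[i][j] == 1
    )
```

For the grid `[[1,1]]`, each cell has one land neighbour and contributes 3, giving a perimeter of 6.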
+ + approach: | + We solve this using a **Simple Counting** approach: + + **Step 1: Initialise counters** + + - `perimeter`: Set to `0` to accumulate the total perimeter + +   + + **Step 2: Iterate through every cell in the grid** + + - Use nested loops to visit each cell at position `(i, j)` + - If `grid[i][j] == 1` (it's a land cell), count its perimeter contribution + +   + + **Step 3: For each land cell, check all 4 sides** + + - **Top side**: If `i == 0` (top boundary) or `grid[i-1][j] == 0` (water above), add 1 + - **Bottom side**: If `i == rows-1` (bottom boundary) or `grid[i+1][j] == 0` (water below), add 1 + - **Left side**: If `j == 0` (left boundary) or `grid[i][j-1] == 0` (water left), add 1 + - **Right side**: If `j == cols-1` (right boundary) or `grid[i][j+1] == 0` (water right), add 1 + +   + + **Step 4: Return the total perimeter** + + - After checking all cells, return the accumulated `perimeter` + +   + + This approach works because we directly count the edges that form the island's boundary — any edge touching water or the grid boundary contributes to the perimeter. + + common_pitfalls: + - title: Counting Internal Edges + description: | + A common mistake is to count 4 for every land cell without subtracting shared edges between adjacent land cells. + + For example, if two land cells are horizontally adjacent, the edge between them is internal — it's not part of the perimeter. You must either subtract these internal edges or only count edges that touch water/boundary. + + With the grid `[[1,1]]`, simply counting `4 * 2 = 8` is wrong. The correct answer is `6` because the two cells share one edge. + wrong_approach: "Count 4 for every land cell" + correct_approach: "Count edges touching water or boundary only" + + - title: Index Out of Bounds + description: | + When checking neighbours, it's easy to accidentally access `grid[i-1][j]` when `i == 0`, causing an index error. 
+ + Always check boundary conditions **before** accessing neighbouring cells. The order of conditions matters: `i == 0 or grid[i-1][j] == 0` short-circuits correctly, but `grid[i-1][j] == 0 or i == 0` will crash. + wrong_approach: "Check neighbour first, then boundary" + correct_approach: "Check boundary first (short-circuit evaluation)" + + - title: Overcomplicating with DFS/BFS + description: | + While DFS or BFS can solve this problem, they're unnecessary complexity for this particular task. The problem states there's exactly one island with no lakes, so you don't need to track visited cells or flood-fill. + + A simple double loop examining each cell independently is cleaner and equally efficient at O(rows * cols). + wrong_approach: "Implement full DFS/BFS traversal" + correct_approach: "Simple iteration checking each cell's edges" + + key_takeaways: + - "**Count contributions**: Each land cell contributes its edges that touch water or boundaries" + - "**Boundary checks first**: Use short-circuit evaluation to avoid index errors when checking neighbours" + - "**Matrix traversal pattern**: Iterating through a 2D grid with nested loops is fundamental for many problems" + - "**Simplicity wins**: Don't overcomplicate — this problem doesn't need DFS/BFS despite being tagged as such" + + time_complexity: "O(m * n). We visit each cell in the grid exactly once, where m is the number of rows and n is the number of columns." + space_complexity: "O(1). We only use a single counter variable regardless of input size." 
+ +solutions: + - approach_name: Simple Counting + is_optimal: true + code: | + def island_perimeter(grid: list[list[int]]) -> int: + rows, cols = len(grid), len(grid[0]) + perimeter = 0 + + for i in range(rows): + for j in range(cols): + # Only process land cells + if grid[i][j] == 1: + # Check all 4 sides - add 1 for each edge touching water/boundary + + # Top: boundary or water above + if i == 0 or grid[i - 1][j] == 0: + perimeter += 1 + + # Bottom: boundary or water below + if i == rows - 1 or grid[i + 1][j] == 0: + perimeter += 1 + + # Left: boundary or water to the left + if j == 0 or grid[i][j - 1] == 0: + perimeter += 1 + + # Right: boundary or water to the right + if j == cols - 1 or grid[i][j + 1] == 0: + perimeter += 1 + + return perimeter + explanation: | + **Time Complexity:** O(m * n) — We iterate through every cell once. + + **Space Complexity:** O(1) — Only a counter variable is used. + + For each land cell, we check its 4 neighbours. If a neighbour is water or out of bounds, that edge contributes to the perimeter. This direct approach is clean and efficient. + + - approach_name: Count Land and Subtract Neighbours + is_optimal: true + code: | + def island_perimeter(grid: list[list[int]]) -> int: + rows, cols = len(grid), len(grid[0]) + land_cells = 0 + neighbour_edges = 0 + + for i in range(rows): + for j in range(cols): + if grid[i][j] == 1: + land_cells += 1 + + # Count neighbours (only check right and down to avoid double counting) + if i < rows - 1 and grid[i + 1][j] == 1: + neighbour_edges += 1 + if j < cols - 1 and grid[i][j + 1] == 1: + neighbour_edges += 1 + + # Each land cell contributes 4, each shared edge removes 2 from perimeter + return land_cells * 4 - neighbour_edges * 2 + explanation: | + **Time Complexity:** O(m * n) — Single pass through the grid. + + **Space Complexity:** O(1) — Two counter variables. + + This alternative approach uses the formula: `perimeter = 4 * land_cells - 2 * shared_edges`. 
Each land cell starts with 4 sides. Each pair of adjacent land cells shares an edge, removing 2 from the total perimeter (1 from each cell). We only check right and down neighbours to avoid counting each shared edge twice. + + - approach_name: DFS Traversal + is_optimal: false + code: | + def island_perimeter(grid: list[list[int]]) -> int: + rows, cols = len(grid), len(grid[0]) + visited = set() + + def dfs(i: int, j: int) -> int: + # Out of bounds or water - this edge contributes 1 to perimeter + if i < 0 or i >= rows or j < 0 or j >= cols or grid[i][j] == 0: + return 1 + + # Already visited - don't count again + if (i, j) in visited: + return 0 + + visited.add((i, j)) + + # Explore all 4 directions and sum up perimeter + return (dfs(i - 1, j) + dfs(i + 1, j) + + dfs(i, j - 1) + dfs(i, j + 1)) + + # Find the first land cell and start DFS + for i in range(rows): + for j in range(cols): + if grid[i][j] == 1: + return dfs(i, j) + + return 0 + explanation: | + **Time Complexity:** O(m * n) — Each cell is visited at most once. + + **Space Complexity:** O(m * n) — Visited set and recursion stack in worst case. + + DFS explores the island by recursively visiting connected land cells. When we hit water or the boundary, that's a perimeter edge (return 1). When we hit a visited cell, return 0 to avoid double counting. While correct, this uses extra space and is overkill for this problem. diff --git a/backend/data/questions/jump-game-ii.yaml b/backend/data/questions/jump-game-ii.yaml new file mode 100644 index 0000000..ef438dd --- /dev/null +++ b/backend/data/questions/jump-game-ii.yaml @@ -0,0 +1,178 @@ +title: Jump Game II +slug: jump-game-ii +difficulty: medium +leetcode_id: 45 +leetcode_url: https://leetcode.com/problems/jump-game-ii/ +categories: + - arrays + - dynamic-programming +patterns: + - greedy + +description: | + You are given a **0-indexed** array of integers `nums` of length `n`. You are initially positioned at index `0`. 
+ + Each element `nums[i]` represents the maximum length of a forward jump from index `i`. In other words, if you are at index `i`, you can jump to any index `i + j` where: + + - `0 <= j <= nums[i]` and + - `i + j < n` + + Return *the minimum number of jumps to reach index* `n - 1`. The test cases are generated such that you can reach index `n - 1`. + +constraints: | + - `1 <= nums.length <= 10^4` + - `0 <= nums[i] <= 1000` + - It's guaranteed that you can reach `nums[n - 1]` + +examples: + - input: "nums = [2,3,1,1,4]" + output: "2" + explanation: "The minimum number of jumps to reach the last index is 2. Jump 1 step from index 0 to 1, then 3 steps to the last index." + - input: "nums = [2,3,0,1,4]" + output: "2" + explanation: "Jump 1 step from index 0 to 1, then 3 steps to the last index." + +explanation: + intuition: | + Imagine you're standing at the start of a path with numbered tiles, and each tile tells you the maximum distance you can leap forward. Your goal is to reach the end in as few jumps as possible. + + Think of it like a **level-based exploration**: from your current position, you can reach a range of tiles. Within that range, you want to pick the tile that lets you jump the *farthest* on your next move. This is the **greedy insight** — at each "level" (jump), choose the landing spot that maximises your future reach. + + Visualise it as expanding waves: your first jump creates a "wave" of reachable positions. From all positions in that wave, you determine how far the next wave can extend. Each wave represents one jump. + + The key observation is that you don't need to try every possible path. By always tracking the farthest point reachable within your current jump's range, you guarantee the minimum number of jumps. This works because the indices reachable within `k` jumps always form a contiguous range starting at index `0`: any jump may be shortened, so every index before the farthest one is reachable too. Only the right-hand boundary of that range matters, and pushing it as far as possible with each jump never rules out a shorter route. 
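To make the "expanding waves" picture concrete, here is a small sketch that returns the index range first reached by each successive jump. The function name `jump_levels` is hypothetical and used only for illustration. The number of ranges it returns equals the minimum number of jumps.

```python
def jump_levels(nums: list[int]) -> list[tuple[int, int]]:
    """Return the (start, end) index range first reached by each jump."""
    n = len(nums)
    levels: list[tuple[int, int]] = []
    start, end = 0, 0  # wave 0 is just the starting index
    while end < n - 1:
        # Farthest index reachable from any position in the current wave
        farthest = max(i + nums[i] for i in range(start, end + 1))
        if farthest <= end:
            # No progress possible; the problem guarantees this never happens
            break
        # The next wave covers everything newly reachable, capped at the last index
        start, end = end + 1, min(farthest, n - 1)
        levels.append((start, end))
    return levels
```

For `nums = [2,3,1,1,4]`, the first wave covers indices 1 to 2 and the second covers indices 3 to 4, so two jumps are enough. Each index is scanned in exactly one wave, so the total work stays linear.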
+ + approach: | + We solve this using a **Greedy (BFS-like) Approach**: + + **Step 1: Handle edge cases** + + - If the array has only one element, we're already at the destination — return `0` jumps + +   + + **Step 2: Initialise tracking variables** + + - `jumps`: Counter for the number of jumps made, starting at `0` + - `current_end`: The farthest index reachable with the current number of jumps (initially `0`) + - `farthest`: The farthest index we can reach from any position within the current range (initially `0`) + +   + + **Step 3: Iterate through the array** + + - For each index `i` from `0` to `n - 2` (we don't need to process the last index): + - Update `farthest` to be the maximum of `farthest` and `i + nums[i]` + - When we reach `current_end` (the boundary of our current jump range): + - Increment `jumps` — we must take another jump + - Update `current_end` to `farthest` — this is now our new reachable boundary + - If `current_end` reaches or exceeds the last index, we can stop + +   + + **Step 4: Return the result** + + - Return `jumps` after processing the array + +   + + This approach works because we're essentially doing a BFS level by level. Each "level" represents positions reachable in exactly `k` jumps. We greedily extend to the farthest reachable point at each level, ensuring minimum jumps. + + common_pitfalls: + - title: Using Dynamic Programming When Greedy Suffices + description: | + A natural first approach is DP: let `dp[i]` be the minimum jumps to reach index `i`. For each position, check all positions that can reach it. + + While correct, this is **O(n^2) time complexity**. For `n = 10^4`, this means up to 100 million operations, which may cause TLE. + + The greedy approach achieves **O(n)** by recognising that we don't need to track exact paths — just the farthest reachable point at each jump level. 
+ wrong_approach: "DP with O(n^2) transitions" + correct_approach: "Greedy tracking of reachable range per jump" + + - title: Processing the Last Index + description: | + A subtle bug occurs when iterating through all indices including `n - 1`. If the last index happens to equal `current_end`, you'd incorrectly count an extra jump. + + We only need to iterate to `n - 2`. Once we know we can reach the last index, we're done. Processing the last index itself is unnecessary and can inflate the jump count. + wrong_approach: "Iterating i from 0 to n - 1" + correct_approach: "Iterating i from 0 to n - 2" + + - title: Forgetting to Update Farthest Before Checking Boundary + description: | + The order of operations matters. You must update `farthest = max(farthest, i + nums[i])` *before* checking if `i == current_end`. + + If you check the boundary first and then update farthest, you miss accounting for the current position's reach, potentially getting a wrong answer. + wrong_approach: "Check boundary, then update farthest" + correct_approach: "Update farthest, then check boundary" + + key_takeaways: + - "**Greedy as implicit BFS**: When optimising for minimum steps in reachability problems, think of expanding 'waves' or 'levels' of positions reachable in k jumps" + - "**Track ranges, not paths**: Instead of enumerating all possible paths (exponential), track the reachable range at each step (linear)" + - "**Foundation for Jump Game variants**: This pattern extends to problems with obstacles, costs, or different movement rules" + - "**Recognise when DP is overkill**: If the problem has optimal substructure but greedy choice works, prefer the simpler O(n) greedy solution" + + time_complexity: "O(n). We traverse the array exactly once, processing each element in constant time." + space_complexity: "O(1). We only use three variables (`jumps`, `current_end`, `farthest`), regardless of input size." 
+ +solutions: + - approach_name: Greedy (BFS-like) + is_optimal: true + code: | + def jump(nums: list[int]) -> int: + n = len(nums) + # Already at destination + if n <= 1: + return 0 + + jumps = 0 # Number of jumps taken + current_end = 0 # Farthest we can reach with current jumps + farthest = 0 # Farthest we can reach from positions in current range + + # Don't process last index - we just need to reach it + for i in range(n - 1): + # Update the farthest point reachable from current position + farthest = max(farthest, i + nums[i]) + + # Reached the end of current jump's range + if i == current_end: + jumps += 1 # Must take another jump + current_end = farthest # Extend range to farthest reachable + + # Early exit if we can reach the end + if current_end >= n - 1: + break + + return jumps + explanation: | + **Time Complexity:** O(n) — Single pass through the array. + + **Space Complexity:** O(1) — Only three integer variables used. + + We simulate a BFS where each "level" represents positions reachable with the same number of jumps. At each level, we track the farthest position we can reach, then "jump" to extend our range. The greedy choice of always extending to the farthest point guarantees minimum jumps. + + - approach_name: Dynamic Programming + is_optimal: false + code: | + def jump(nums: list[int]) -> int: + n = len(nums) + # dp[i] = minimum jumps to reach index i + dp = [float('inf')] * n + dp[0] = 0 # Start position needs 0 jumps + + for i in range(n): + # Skip unreachable positions + if dp[i] == float('inf'): + continue + + # Update all positions reachable from i + for j in range(1, nums[i] + 1): + if i + j < n: + dp[i + j] = min(dp[i + j], dp[i] + 1) + + return dp[n - 1] + explanation: | + **Time Complexity:** O(n × m) where m is the average jump length — can be O(n^2) in worst case. + + **Space Complexity:** O(n) — DP array storing minimum jumps to each index. + + For each position, we update all positions reachable from it. 
While correct, this is slower than the greedy approach because we're doing redundant work. Included to illustrate why greedy is preferred when it works. diff --git a/backend/data/questions/jump-game-vii.yaml b/backend/data/questions/jump-game-vii.yaml new file mode 100644 index 0000000..3aabaae --- /dev/null +++ b/backend/data/questions/jump-game-vii.yaml @@ -0,0 +1,206 @@ +title: Jump Game VII +slug: jump-game-vii +difficulty: medium +leetcode_id: 1871 +leetcode_url: https://leetcode.com/problems/jump-game-vii/ +categories: + - strings + - dynamic-programming +patterns: + - bfs + - sliding-window + - dynamic-programming + +description: | + You are given a **0-indexed** binary string `s` and two integers `minJump` and `maxJump`. In the beginning, you are standing at index `0`, which is equal to `'0'`. You can move from index `i` to index `j` if the following conditions are fulfilled: + + - `i + minJump <= j <= min(i + maxJump, s.length - 1)`, and + - `s[j] == '0'`. + + Return `true` *if you can reach index* `s.length - 1` *in* `s`*, or* `false` *otherwise*. + +constraints: | + - `2 <= s.length <= 10^5` + - `s[i]` is either `'0'` or `'1'` + - `s[0] == '0'` + - `1 <= minJump <= maxJump < s.length` + +examples: + - input: 's = "011010", minJump = 2, maxJump = 3' + output: "true" + explanation: "In the first step, move from index 0 to index 3. In the second step, move from index 3 to index 5." + - input: 's = "01101110", minJump = 2, maxJump = 3' + output: "false" + explanation: "There is no way to reach the last index starting from index 0." + +explanation: + intuition: | + Imagine you're hopping across stepping stones in a river, where `'0'` represents a safe stone and `'1'` represents water. From any stone, you can only jump forward between `minJump` and `maxJump` steps. 
+ + The naive approach would be to try every possible jump from every reachable position — but with up to 10^5 characters and a jump range that could span thousands of indices, this becomes extremely slow. + + The key insight is that we need to efficiently track **which positions are reachable** and then, for each new position, check if **any** reachable position can jump to it. Instead of checking every source position individually, we can use a **sliding window** or **prefix sum** to answer "is there any reachable position in the valid jump range?" in O(1) time. + + Think of it like this: as you scan left to right, you maintain a count of how many reachable positions fall within the "jump window" for the current index. If that count is positive and the current character is `'0'`, you can reach it. + + approach: | + We solve this using **Dynamic Programming with Prefix Sum optimization**: + + **Step 1: Set up the DP array** + + - Create a boolean array `dp` where `dp[i]` indicates whether index `i` is reachable + - Set `dp[0] = True` since we start at index 0 + - Initialize a counter `reachable` to track reachable positions in the current window + +   + + **Step 2: Iterate through the string** + + - For each index `i` from 1 to `n-1`: + - First, check if a new position has entered our "from" window: if `i >= minJump` and `dp[i - minJump]` is true, increment `reachable` + - Then, check if a position has left the window: if `i > maxJump` and `dp[i - maxJump - 1]` is true, decrement `reachable` + - If `reachable > 0` and `s[i] == '0'`, mark `dp[i] = True` + +   + + **Step 3: Return the result** + + - Return `dp[n - 1]` — whether the last index is reachable + +   + + This approach works because the sliding window maintains a count of all reachable positions that could potentially jump to the current index. We add positions as they enter the jump range and remove them as they exit. 
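The intuition also mentions a prefix sum as an alternative to the running window count. The sketch below shows that variant under the same DP definition; the name `can_reach_prefix` and the `prefix` array layout are assumptions made for this illustration, not part of the solutions listed for this problem.

```python
def can_reach_prefix(s: str, min_jump: int, max_jump: int) -> bool:
    """Reachability DP where range queries are answered with a prefix sum."""
    n = len(s)
    dp = [False] * n
    dp[0] = True
    # prefix[i] = number of reachable indices among 0 .. i-1
    prefix = [0] * (n + 1)
    prefix[1] = 1
    for i in range(1, n):
        if s[i] == '0':
            lo = max(0, i - max_jump)  # leftmost index that can jump to i
            hi = i - min_jump          # rightmost index that can jump to i
            # hi < i, so prefix[hi + 1] is already final at this point
            if hi >= lo and prefix[hi + 1] - prefix[lo] > 0:
                dp[i] = True
        prefix[i + 1] = prefix[i] + (1 if dp[i] else 0)
    return dp[n - 1]
```

Compared with the running counter, this uses one extra array of size `n + 1`, but the source range `[lo, hi]` is written out explicitly, which can make off-by-one mistakes easier to spot.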
+ + common_pitfalls: + - title: The BFS/DFS Timeout + description: | + A natural approach is to use BFS or DFS to explore all reachable positions. However, with a string of length `10^5` and a jump range that could span thousands of indices, this approach degenerates to O(n * (maxJump - minJump)), which is potentially O(n^2). A visited set alone does not help, because every position still scans its whole jump range. + + For example, with `n = 100000`, `minJump = 1`, and `maxJump = 50000`, each position could have 50,000 neighbors to explore! + wrong_approach: "Plain BFS/DFS exploring all neighbors in jump range" + correct_approach: "BFS with a farthest-scanned pointer so each index is examined at most once, or DP with sliding window" + + - title: Checking Every Source Position + description: | + For each index `i`, you might think to loop through all `j` from `i - maxJump` to `i - minJump` to check if any `dp[j]` is true. This is O(n * range) which is too slow. + + The sliding window/prefix sum optimization reduces this to O(1) per index by maintaining a running count of reachable positions in the valid range. + wrong_approach: "For each i, loop through all j in jump range" + correct_approach: "Maintain sliding window count of reachable positions" + + - title: Off-by-One Errors in Window Bounds + description: | + The jump constraints are `i + minJump <= j <= i + maxJump`. When working backwards (asking "can I reach index j?"), the valid source range is `j - maxJump <= i <= j - minJump`. 
+ + Be careful when a position enters and exits the window: + - Position `j - minJump` enters the window when we reach index `j` + - Position `j - maxJump - 1` exits the window when we reach index `j` + wrong_approach: "Incorrect window boundary calculations" + correct_approach: "Carefully track when positions enter (at i - minJump) and exit (at i - maxJump - 1)" + + key_takeaways: + - "**Sliding window for range queries**: When you need to check if *any* value in a range satisfies a condition, maintain a running count as the window slides" + - "**Prefix sum for cumulative queries**: This pattern appears frequently — counting elements in ranges can be done in O(1) with preprocessing" + - "**Optimizing DP transitions**: When DP transitions involve checking a range of previous states, look for ways to avoid the inner loop" + - "**Jump Game series**: This problem extends the classic Jump Game pattern — earlier versions use greedy, this one requires DP with optimization" + + time_complexity: "O(n). We iterate through the string once, and each position enters and exits the sliding window exactly once." + space_complexity: "O(n). We use a DP array of size `n` to track reachability of each position." 
+ +solutions: + - approach_name: DP with Sliding Window + is_optimal: true + code: | + def can_reach(s: str, min_jump: int, max_jump: int) -> bool: + n = len(s) + # dp[i] = True if we can reach index i + dp = [False] * n + dp[0] = True # Start at index 0 + + # Count of reachable positions in the current jump window + reachable = 0 + + for i in range(1, n): + # Position (i - min_jump) just entered our "can jump from" window + if i >= min_jump and dp[i - min_jump]: + reachable += 1 + + # Position (i - max_jump - 1) just left the window + if i > max_jump and dp[i - max_jump - 1]: + reachable -= 1 + + # If any reachable position can jump here and it's a '0', mark reachable + if reachable > 0 and s[i] == '0': + dp[i] = True + + return dp[n - 1] + explanation: | + **Time Complexity:** O(n) — Single pass through the string with O(1) work per index. + + **Space Complexity:** O(n) — DP array to track reachability. + + The sliding window maintains a count of positions that could jump to the current index. As we move right, positions enter the window when they're exactly `minJump` away, and exit when they're more than `maxJump` away. + + - approach_name: BFS with Optimization + is_optimal: false + code: | + from collections import deque + + def can_reach(s: str, min_jump: int, max_jump: int) -> bool: + n = len(s) + if s[n - 1] == '1': + return False + + queue = deque([0]) + # Track the farthest index we've added to avoid duplicates + farthest = 0 + + while queue: + i = queue.popleft() + + # Start of jump range: don't re-explore already visited indices + start = max(i + min_jump, farthest + 1) + end = min(i + max_jump, n - 1) + + for j in range(start, end + 1): + if s[j] == '0': + if j == n - 1: + return True + queue.append(j) + + # Update farthest to avoid revisiting + farthest = max(farthest, i + max_jump) + + return False + explanation: | + **Time Complexity:** O(n) — Each index is added to the queue at most once due to the `farthest` optimization. 
+ + **Space Complexity:** O(n) — Queue can hold up to n positions in the worst case. + + This BFS approach uses a `farthest` pointer to avoid re-exploring indices. When processing position `i`, we only explore indices beyond what we've already added. This ensures each index is visited at most once, giving linear time. + + - approach_name: Brute Force DP + is_optimal: false + code: | + def can_reach(s: str, min_jump: int, max_jump: int) -> bool: + n = len(s) + dp = [False] * n + dp[0] = True + + for i in range(1, n): + if s[i] == '1': + continue + + # Check all positions that could jump to i + for j in range(max(0, i - max_jump), i - min_jump + 1): + if dp[j]: + dp[i] = True + break + + return dp[n - 1] + explanation: | + **Time Complexity:** O(n * (maxJump - minJump)) — For each position, we check up to `maxJump - minJump + 1` previous positions. + + **Space Complexity:** O(n) — DP array. + + This approach is correct but too slow for large inputs. With `n = 10^5` and a large jump range, this becomes O(n^2) and will TLE. Included to illustrate why the sliding window optimization is necessary. diff --git a/backend/data/questions/jump-game.yaml b/backend/data/questions/jump-game.yaml new file mode 100644 index 0000000..005f42d --- /dev/null +++ b/backend/data/questions/jump-game.yaml @@ -0,0 +1,182 @@ +title: Jump Game +slug: jump-game +difficulty: medium +leetcode_id: 55 +leetcode_url: https://leetcode.com/problems/jump-game/ +categories: + - arrays + - dynamic-programming +patterns: + - greedy + - dynamic-programming + +description: | + You are given an integer array `nums`. You are initially positioned at the array's **first index**, and each element in the array represents your maximum jump length at that position. + + Return `true` *if you can reach the last index*, or `false` *otherwise*. 
+ +constraints: | + - `1 <= nums.length <= 10^4` + - `0 <= nums[i] <= 10^5` + +examples: + - input: "nums = [2,3,1,1,4]" + output: "true" + explanation: "Jump 1 step from index 0 to 1, then 3 steps to the last index." + - input: "nums = [3,2,1,0,4]" + output: "false" + explanation: "You will always arrive at index 3 no matter what. Its maximum jump length is 0, which makes it impossible to reach the last index." + +explanation: + intuition: | + Imagine you're hopping across stepping stones to reach the other side of a river. Each stone tells you the *maximum* distance you can jump from it — but you can choose to jump any shorter distance too. + + The key insight is that you don't need to track every possible path. Instead, think about it this way: **what's the farthest position you can possibly reach?** As you walk through the array, each position extends your reach. If at any point you find yourself stuck (your current position is beyond your maximum reach), you know you'll never make it. + + Think of it like filling a gas tank: at each position, you're potentially adding "fuel" (jump range) to extend how far you can go. The question becomes: can you keep extending your reach until it covers the finish line? + + This greedy approach works because we only care about *whether* we can reach the end, not *how* we reach it. If we can reach position `i`, and from `i` we can jump to position `j`, then we can definitely reach `j` — we don't need to track the exact path. 
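The reach-extension idea above can be made concrete with a small trace. This is only an illustrative sketch: the helper name `trace_max_reach` is hypothetical and is not one of this file's solutions. It records `max_reach` after each visited index and stops as soon as an index lies beyond it.

```python
def trace_max_reach(nums: list[int]) -> tuple[list[int], bool]:
    """Return (max_reach after each visited index, whether the last index is reachable).

    Hypothetical helper for illustration; not part of the stored solutions.
    """
    max_reach = 0
    history = []
    for i, jump in enumerate(nums):
        if i > max_reach:
            # Index i lies beyond everything reachable so far: we are stuck.
            return history, False
        # Extend the reach using the jump available at index i.
        max_reach = max(max_reach, i + jump)
        history.append(max_reach)
    return history, True


if __name__ == "__main__":
    print(trace_max_reach([2, 3, 1, 1, 4]))
    print(trace_max_reach([3, 2, 1, 0, 4]))
```

In the second input the reach stalls at 3, so index 4 is never visited and the trace ends early.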
+ + approach: | + We solve this using a **Greedy Approach** by tracking the maximum reachable index: + + **Step 1: Initialise the maximum reach** + + - `max_reach`: Set to `0` initially (we start at index 0, which we can trivially reach) + +   + + **Step 2: Iterate through the array** + + - For each index `i`, first check if `i > max_reach` + - If yes, we're stuck — we can't even reach this position, so return `false` + - If no, calculate how far we can reach from here: `i + nums[i]` + - Update `max_reach` to be the maximum of its current value and `i + nums[i]` + - This ensures we always track the farthest point we could possibly reach + +   + + **Step 3: Return the result** + + - If we complete the loop without getting stuck, return `true` + - We know we can reach the end because `max_reach` must be at least `n - 1` + +   + + The greedy choice at each step (always extend our reach as far as possible) guarantees we find a solution if one exists. + + common_pitfalls: + - title: Simulating Every Possible Jump Path + description: | + A common first instinct is to use recursion or BFS to explore all possible jump sequences. This leads to **exponential time complexity** because from each position, you have up to `nums[i]` choices of where to jump next. + + With `nums.length <= 10^4` and `nums[i] <= 10^5`, this approach will cause a **Time Limit Exceeded (TLE)** error. The greedy approach avoids this by recognising that we only need to track the maximum reach, not every individual path. + wrong_approach: "Recursively exploring all jump combinations" + correct_approach: "Track maximum reachable index in single pass" + + - title: Forgetting to Check Reachability Before Updating + description: | + A subtle bug occurs when you update `max_reach` without first checking if the current index is reachable. 
Consider `nums = [0, 2, 3]`: + + - At index 0: `max_reach = 0 + 0 = 0` + - At index 1: If you don't check reachability, you'd calculate `max_reach = 1 + 2 = 3` + - But index 1 was never reachable from index 0! + + Always check `i <= max_reach` before processing position `i`. + wrong_approach: "Update max_reach without checking if current index is reachable" + correct_approach: "Check i <= max_reach before processing each position" + + - title: Off-by-One Errors with Array Length + description: | + Remember that the goal is to reach the **last index** (position `n - 1`), not to jump beyond the array. Your condition should check whether `max_reach >= n - 1`, not `max_reach >= n`. + + For a single-element array `[0]`, you're already at the last index, so the answer is `true` even though you can't jump anywhere. + + key_takeaways: + - "**Greedy reachability**: When you only need to know *if* a destination is reachable (not *how*), tracking the maximum reachable position is often sufficient" + - "**Single-pass efficiency**: By maintaining running state (`max_reach`), we avoid expensive path enumeration and achieve O(n) time" + - "**Foundation for Jump Game II**: This problem extends to finding the *minimum* number of jumps (LeetCode #45), which uses a similar greedy interval approach" + - "**Early termination**: The greedy approach allows us to return `false` as soon as we detect we're stuck, avoiding unnecessary computation" + + time_complexity: "O(n). We traverse the array exactly once, performing constant-time operations at each index." + space_complexity: "O(1). We only use a single variable (`max_reach`) regardless of input size." 
+ +solutions: + - approach_name: Greedy (Maximum Reach) + is_optimal: true + code: | + def can_jump(nums: list[int]) -> bool: + # Track the farthest index we can reach + max_reach = 0 + + for i in range(len(nums)): + # If current index is beyond our reach, we're stuck + if i > max_reach: + return False + + # Update max reach from current position + # We can jump up to nums[i] steps from index i + max_reach = max(max_reach, i + nums[i]) + + # If we processed all indices, we can reach the end + return True + explanation: | + **Time Complexity:** O(n) — Single pass through the array. + + **Space Complexity:** O(1) — Only one variable used. + + We iterate through each index, checking if it's reachable and updating our maximum reach. If we ever find ourselves at an unreachable position, we return `false`. Otherwise, completing the loop means the end is reachable. + + - approach_name: Greedy (Backward) + is_optimal: true + code: | + def can_jump(nums: list[int]) -> bool: + # Start with the goal at the last index + goal = len(nums) - 1 + + # Work backwards through the array + for i in range(len(nums) - 2, -1, -1): + # If we can reach the goal from position i, + # then position i becomes our new goal + if i + nums[i] >= goal: + goal = i + + # If goal moved all the way to index 0, we can reach the end + return goal == 0 + explanation: | + **Time Complexity:** O(n) — Single pass through the array (backwards). + + **Space Complexity:** O(1) — Only one variable used. + + This alternative greedy approach works backwards from the end. We ask: "What's the leftmost position that can reach my current goal?" Each time we find such a position, it becomes the new goal. If the goal reaches index 0, we know the end is reachable from the start. 
+ + - approach_name: Dynamic Programming + is_optimal: false + code: | + def can_jump(nums: list[int]) -> bool: + n = len(nums) + # dp[i] indicates whether index i is reachable + dp = [False] * n + dp[0] = True # Starting position is always reachable + + for i in range(n): + # Skip unreachable positions + if not dp[i]: + continue + + # Mark all positions reachable from i + for j in range(1, nums[i] + 1): + if i + j < n: + dp[i + j] = True + + # Early exit if we've reached the end + if dp[n - 1]: + return True + + return dp[n - 1] + explanation: | + **Time Complexity:** O(n × max(nums[i])) — For each position, we may mark up to `nums[i]` subsequent positions. + + **Space Complexity:** O(n) — Boolean array tracking reachability of each position. + + This DP approach explicitly tracks which positions are reachable. While correct, it's slower than the greedy approach because it does redundant work marking positions that have already been marked. Included to illustrate the progression from DP thinking to greedy optimisation. diff --git a/backend/data/questions/k-closest-points-to-origin.yaml b/backend/data/questions/k-closest-points-to-origin.yaml new file mode 100644 index 0000000..0b8fe62 --- /dev/null +++ b/backend/data/questions/k-closest-points-to-origin.yaml @@ -0,0 +1,197 @@ +title: K Closest Points to Origin +slug: k-closest-points-to-origin +difficulty: medium +leetcode_id: 973 +leetcode_url: https://leetcode.com/problems/k-closest-points-to-origin/ +categories: + - arrays + - heap + - sorting +patterns: + - heap + +description: | + Given an array of `points` where `points[i] = [x_i, y_i]` represents a point on the **X-Y** plane and an integer `k`, return the `k` closest points to the origin `(0, 0)`. + + The distance between two points on the **X-Y** plane is the Euclidean distance (i.e., `sqrt((x1 - x2)^2 + (y1 - y2)^2)`). + + You may return the answer in **any order**. The answer is **guaranteed** to be **unique** (except for the order that it is in). 
+ +constraints: | + - `1 <= k <= points.length <= 10^4` + - `-10^4 <= x_i, y_i <= 10^4` + +examples: + - input: "points = [[1,3],[-2,2]], k = 1" + output: "[[-2,2]]" + explanation: "The distance between (1, 3) and the origin is sqrt(10). The distance between (-2, 2) and the origin is sqrt(8). Since sqrt(8) < sqrt(10), (-2, 2) is closer to the origin. We only want the closest k = 1 points from the origin, so the answer is just [[-2,2]]." + - input: "points = [[3,3],[5,-1],[-2,4]], k = 2" + output: "[[3,3],[-2,4]]" + explanation: "The answer [[-2,4],[3,3]] would also be accepted since any order is valid." + +explanation: + intuition: | + Imagine you have a map with several pins representing locations, and you're standing at the center (the origin). You need to find the `k` pins closest to you. + + The **core insight** is that we need to efficiently select the k smallest values from a collection — this is the classic **top-k problem**. While sorting all points would work, it's more work than necessary. We don't need full ordering; we just need to identify which k points are closest. + + Think of it like this: imagine you're a bouncer at a club with a capacity of `k` people. As people (points) arrive, you only let them in if there's room or if they're "better" (closer) than someone already inside. You don't need to rank everyone perfectly — you just need to maintain the best k at any moment. + + A **max-heap of size k** is perfect for this. The heap always holds the k closest points seen so far. When we encounter a new point, we compare it to the *farthest* point in our heap (the max). If the new point is closer, we evict the farthest and add the new one. + + **Key optimization**: Since we only care about *relative* distances, we can compare squared distances (`x^2 + y^2`) instead of actual Euclidean distances. This avoids expensive square root calculations without affecting correctness. 
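The squared-distance claim above can be checked directly. The sketch below assumes a hypothetical helper name, `squared_distance`, and uses sample points chosen so that no two distances tie. It compares the ordering produced by squared distances with the ordering produced by `math.hypot`.

```python
import math


def squared_distance(point: list[int]) -> int:
    """Squared Euclidean distance from the origin (hypothetical helper)."""
    x, y = point
    return x * x + y * y


def same_ordering(points: list[list[int]]) -> bool:
    """Check that sorting by squared distance matches sorting by true distance."""
    by_squared = sorted(points, key=squared_distance)
    by_euclidean = sorted(points, key=lambda p: math.hypot(p[0], p[1]))
    return by_squared == by_euclidean


if __name__ == "__main__":
    sample = [[3, 3], [5, -1], [-2, 4], [1, 3], [-2, 2]]
    print(sorted(sample, key=squared_distance))
    print(same_ordering(sample))
```

Because the square root is strictly increasing on non-negative numbers, dropping it cannot change which of two points is closer.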
+ + approach: | + We solve this using a **Max-Heap of Size K** approach: + + **Step 1: Define a distance function** + + - Create a helper to compute squared Euclidean distance: `x^2 + y^2` + - We use squared distance to avoid the expensive `sqrt()` operation — comparing `d1^2` vs `d2^2` gives the same ordering as `d1` vs `d2` + +   + + **Step 2: Build a max-heap of size k** + + - Iterate through each point in the input + - Push each point onto a max-heap (in Python, negate the distance for a max-heap using `heapq`) + - If the heap size exceeds `k`, pop the largest (farthest) point + +   + + **Step 3: Extract results from the heap** + + - After processing all points, the heap contains exactly the k closest points + - Extract and return these points + +   + + **Why this works**: By maintaining a max-heap of size k, the root is always the *farthest* among our k candidates. When a closer point arrives, it replaces the farthest, ensuring we always have the k closest. This is more efficient than sorting when `k << n`. + + common_pitfalls: + - title: Computing Actual Euclidean Distance + description: | + A common mistake is to compute the actual Euclidean distance using `sqrt(x^2 + y^2)` for every point. + + While mathematically correct, the `sqrt()` function is computationally expensive. Since we only need to *compare* distances (not their exact values), squared distances work just as well: if `a^2 < b^2` and both are positive, then `a < b`. + + This optimisation can provide a noticeable performance boost, especially with `10^4` points. + wrong_approach: "Using sqrt(x^2 + y^2) for distance" + correct_approach: "Using x^2 + y^2 for distance comparison" + + - title: Sorting All Points + description: | + The naive approach is to sort all n points by distance and take the first k. + + This gives O(n log n) time complexity regardless of k. When k is small (e.g., k = 10 with n = 10,000), we're doing far more work than necessary. 
+ + The heap approach is O(n log k), which is significantly faster when `k << n`. For k = 10 and n = 10,000, the logarithmic factor drops from log2(10,000) ≈ 13.3 to log2(10) ≈ 3.3, roughly a 4x reduction. + wrong_approach: "Sort all points, take first k" + correct_approach: "Use a max-heap of size k" + + - title: Using a Min-Heap Instead of Max-Heap + description: | + If you use a min-heap and push all n points, you'd need to pop k times at the end. This requires O(n) space for the full heap. + + A max-heap of size k is more memory-efficient (O(k) space) and naturally evicts the farthest point when a closer one arrives. In Python, since `heapq` is a min-heap by default, negate the distances to simulate a max-heap. + wrong_approach: "Min-heap with all n points" + correct_approach: "Max-heap limited to size k" + + key_takeaways: + - "**Top-k pattern**: When you need the k smallest/largest elements, a heap of size k is often optimal: O(n log k) beats O(n log n) sorting when `k << n`" + - "**Squared distance optimisation**: Avoid `sqrt()` when comparing distances; squared distances preserve ordering and are faster to compute" + - "**Max-heap for k smallest**: Use a max-heap to track k smallest values; the root lets you quickly check if a new element belongs" + - "**Related problems**: This pattern applies to Kth Largest Element, Top K Frequent Elements, and similar selection problems" + + time_complexity: "O(n log k). We iterate through all n points, and each heap operation (push/pop) takes O(log k) time since the heap is capped at size k." + space_complexity: "O(k). The heap stores at most k points at any time." 
+ +solutions: + - approach_name: Max-Heap + is_optimal: true + code: | + import heapq + + def k_closest(points: list[list[int]], k: int) -> list[list[int]]: + # Max-heap to store k closest points (negate distance for max-heap) + max_heap = [] + + for x, y in points: + # Squared distance avoids expensive sqrt() + dist = x * x + y * y + # Push negative distance to simulate max-heap + heapq.heappush(max_heap, (-dist, [x, y])) + + # If heap exceeds size k, remove the farthest point + if len(max_heap) > k: + heapq.heappop(max_heap) + + # Extract the k closest points from the heap + return [point for _, point in max_heap] + explanation: | + **Time Complexity:** O(n log k) — We process n points, each heap operation is O(log k). + + **Space Complexity:** O(k) — The heap stores at most k elements. + + By maintaining a max-heap of size k, we efficiently track the k closest points. The negative distance trick converts Python's min-heap into a max-heap, ensuring the farthest point is always at the root for quick comparison and removal. + + - approach_name: Sort by Distance + is_optimal: false + code: | + def k_closest(points: list[list[int]], k: int) -> list[list[int]]: + # Sort all points by squared distance from origin + points.sort(key=lambda p: p[0] * p[0] + p[1] * p[1]) + + # Return the first k points + return points[:k] + explanation: | + **Time Complexity:** O(n log n) — Sorting dominates the complexity. + + **Space Complexity:** O(1) to O(n) — Depends on the sorting algorithm used. + + This approach is simpler to implement and may be preferred when k is close to n. However, for small k values relative to n, the heap approach is more efficient. The simplicity makes this a good choice when optimisation isn't critical. 
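The sort-based solution above reorders the caller's `points` list in place. If that side effect is unwanted, one possible variant, sketched here under the assumption that a standard-library helper is acceptable, uses `heapq.nsmallest`, which returns a new list and leaves the input untouched. The wrapper name `k_closest_nsmallest` is hypothetical.

```python
import heapq


def k_closest_nsmallest(points: list[list[int]], k: int) -> list[list[int]]:
    """Return the k points nearest the origin without mutating `points`.

    Hypothetical wrapper around the standard-library heapq.nsmallest.
    """
    # The key is the squared distance, so no sqrt() is needed.
    return heapq.nsmallest(k, points, key=lambda p: p[0] * p[0] + p[1] * p[1])


if __name__ == "__main__":
    pts = [[3, 3], [5, -1], [-2, 4]]
    print(k_closest_nsmallest(pts, 2))
    print(pts)  # the original order is preserved
```

The result comes back ordered from nearest to farthest, which the problem permits because any order is accepted.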
+ + - approach_name: Quickselect + is_optimal: false + code: | + import random + + def k_closest(points: list[list[int]], k: int) -> list[list[int]]: + def dist(point: list[int]) -> int: + return point[0] * point[0] + point[1] * point[1] + + def partition(left: int, right: int, pivot_idx: int) -> int: + pivot_dist = dist(points[pivot_idx]) + # Move pivot to end + points[pivot_idx], points[right] = points[right], points[pivot_idx] + store_idx = left + + # Move all closer points to the left + for i in range(left, right): + if dist(points[i]) < pivot_dist: + points[store_idx], points[i] = points[i], points[store_idx] + store_idx += 1 + + # Move pivot to its final position + points[store_idx], points[right] = points[right], points[store_idx] + return store_idx + + left, right = 0, len(points) - 1 + while left < right: + pivot_idx = random.randint(left, right) + pivot_idx = partition(left, right, pivot_idx) + + if pivot_idx == k: + break + elif pivot_idx < k: + left = pivot_idx + 1 + else: + right = pivot_idx - 1 + + return points[:k] + explanation: | + **Time Complexity:** O(n) average, O(n^2) worst case — Quickselect has linear average time. + + **Space Complexity:** O(1) — In-place partitioning. + + Quickselect is theoretically optimal with O(n) average time. It partitions the array around a pivot, similar to quicksort, but only recurses into the partition containing the k-th element. The randomised pivot selection helps avoid worst-case scenarios. However, the heap approach is often preferred in practice due to its guaranteed O(n log k) bound. 
diff --git a/backend/data/questions/koko-eating-bananas.yaml b/backend/data/questions/koko-eating-bananas.yaml new file mode 100644 index 0000000..e96242b --- /dev/null +++ b/backend/data/questions/koko-eating-bananas.yaml @@ -0,0 +1,174 @@ +title: Koko Eating Bananas +slug: koko-eating-bananas +difficulty: medium +leetcode_id: 875 +leetcode_url: https://leetcode.com/problems/koko-eating-bananas/ +categories: + - arrays + - binary-search +patterns: + - binary-search + +description: | + Koko loves to eat bananas. There are `n` piles of bananas, the ith pile has `piles[i]` bananas. The guards have gone and will come back in `h` hours. + + Koko can decide her bananas-per-hour eating speed of `k`. Each hour, she chooses some pile of bananas and eats `k` bananas from that pile. If the pile has less than `k` bananas, she eats all of them instead and will not eat any more bananas during that hour. + + Koko likes to eat slowly but still wants to finish eating all the bananas before the guards return. + + Return *the minimum integer* `k` *such that she can eat all the bananas within* `h` *hours*. + +constraints: | + - `1 <= piles.length <= 10^4` + - `piles.length <= h <= 10^9` + - `1 <= piles[i] <= 10^9` + +examples: + - input: "piles = [3,6,7,11], h = 8" + output: "4" + explanation: "At speed k = 4, Koko takes ceil(3/4) + ceil(6/4) + ceil(7/4) + ceil(11/4) = 1 + 2 + 2 + 3 = 8 hours, which exactly meets the deadline." + - input: "piles = [30,11,23,4,20], h = 5" + output: "30" + explanation: "With only 5 hours and 5 piles, Koko must finish each pile in exactly 1 hour. She needs k = 30 (the largest pile) to eat any pile in one hour." + - input: "piles = [30,11,23,4,20], h = 6" + output: "23" + explanation: "With 6 hours for 5 piles, Koko has one extra hour. At k = 23, the pile of 30 takes ceil(30/23) = 2 hours, while all others take 1 hour each, totaling 6 hours." + +explanation: + intuition: | + Imagine you're given a dial that controls Koko's eating speed. 
Turn it up, and she finishes faster but eats more per hour than necessary. Turn it down, and she enjoys smaller bites but risks running out of time. + + The key insight is that this dial creates a **monotonic relationship**: if Koko can finish at speed `k`, she can definitely finish at any speed greater than `k`. Conversely, if she can't finish at speed `k`, she can't finish at any slower speed either. + + This monotonicity is the hallmark of a **binary search on the answer** problem. Instead of searching for an element in an array, we're searching for the minimum valid value in a range of possible speeds. + + Think of it like this: imagine all possible speeds from `1` to `max(piles)` laid out on a number line. At some point, there's a boundary — speeds below it fail (too slow), and speeds at or above it succeed. Binary search efficiently finds that boundary. + + approach: | + We use **Binary Search on the Answer** to find the minimum valid eating speed: + + **Step 1: Define the search space** + + - `left`: Set to `1` — the minimum possible speed (eating at least one banana per hour) + - `right`: Set to `max(piles)` — eating faster than the largest pile is wasteful since Koko can only eat from one pile per hour + +   + + **Step 2: Implement the feasibility check** + + - For a given speed `k`, calculate the total hours needed to eat all piles + - For each pile, the hours needed is `ceil(pile / k)`, which equals `(pile + k - 1) // k` using integer math + - If total hours `<= h`, the speed is feasible + +   + + **Step 3: Binary search for the minimum valid speed** + + - Calculate `mid = (left + right) // 2` + - If `mid` is a feasible speed, it might be our answer, but there could be a smaller valid speed — search left by setting `right = mid` + - If `mid` is not feasible, we need a faster speed — search right by setting `left = mid + 1` + - Continue until `left == right` + +   + + **Step 4: Return the result** + + - Return `left` (or `right`) as the minimum speed that 
allows Koko to finish on time + + common_pitfalls: + - title: Trying All Speeds Linearly + description: | + A naive approach checks every speed from `1` to `max(piles)` and returns the first one that works. + + With `max(piles)` up to `10^9`, this linear search performs up to a billion iterations, far too slow. + + Binary search reduces this to at most `log2(10^9) ≈ 30` iterations. + wrong_approach: "Linear search from 1 to max(piles)" + correct_approach: "Binary search on the speed range" + + - title: Incorrect Ceiling Division + description: | + When calculating hours for a pile, we need `ceil(pile / k)`. Using regular integer division `pile // k` gives the floor, which undercounts. + + For example, `pile = 7, k = 4`: floor is `1`, but Koko actually needs `2` hours (one hour for 4 bananas, another for the remaining 3). + + Use the formula `(pile + k - 1) // k` or Python's `math.ceil(pile / k)` for correct results. + wrong_approach: "pile // k (floor division)" + correct_approach: "(pile + k - 1) // k (ceiling division)" + + - title: Wrong Search Space Bounds + description: | + Setting `right` too high (e.g., `sum(piles)`) wastes iterations. The maximum useful speed is `max(piles)` because eating faster doesn't help: Koko still spends one hour per pile regardless. Using `h` as the upper bound is not just wasteful but incorrect, because the answer can exceed `h` (for `piles = [30,11,23,4,20]` and `h = 5`, the answer is `30`). + + Setting `left` to `0` causes division by zero errors. The minimum meaningful speed is `1`. + wrong_approach: "left = 0, right = sum(piles), or right = h" + correct_approach: "left = 1, right = max(piles)" + + - title: Off-by-One in Binary Search + description: | + When searching for a minimum valid value: + - If `mid` works, set `right = mid` (not `mid - 1`) because `mid` could be the answer + - If `mid` fails, set `left = mid + 1` + + Using `right = mid - 1` when `mid` is valid might skip the answer. The loop condition `left < right` ensures we converge correctly. 
+ wrong_approach: "right = mid - 1 when mid is feasible" + correct_approach: "right = mid when mid is feasible" + + key_takeaways: + - "**Binary search on the answer**: When asked to find the minimum/maximum value satisfying a condition, and the condition is monotonic, binary search applies" + - "**Monotonicity is key**: If a speed `k` works, all larger speeds work too — this sorted property enables binary search" + - "**Ceiling division pattern**: `(a + b - 1) // b` computes `ceil(a / b)` using only integers, avoiding floating-point issues" + - "**Similar problems**: This pattern applies to Capacity To Ship Packages Within D Days, Split Array Largest Sum, and Magnetic Force Between Two Balls" + + time_complexity: "O(n log m). Binary search runs `O(log m)` iterations where `m = max(piles)`, and each feasibility check scans all `n` piles." + space_complexity: "O(1). We only use a constant number of variables for the search bounds and hour calculations." + +solutions: + - approach_name: Binary Search on Answer + is_optimal: true + code: | + def min_eating_speed(piles: list[int], h: int) -> int: + # Search space: minimum speed 1, maximum speed is largest pile + left, right = 1, max(piles) + + while left < right: + mid = (left + right) // 2 + + # Calculate total hours needed at speed mid + hours_needed = sum((pile + mid - 1) // mid for pile in piles) + + if hours_needed <= h: + # Speed mid works, but maybe we can go slower + right = mid + else: + # Too slow, need to eat faster + left = mid + 1 + + return left + explanation: | + **Time Complexity:** O(n log m) — Binary search over `m = max(piles)` speeds, each iteration scans `n` piles. + + **Space Complexity:** O(1) — Only constant extra space used. + + We binary search for the minimum speed where Koko can finish on time. The feasibility check sums up the hours needed for each pile using ceiling division. 
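To see the monotonic boundary that the binary search relies on, the feasibility check can be pulled out and evaluated at every speed for the first example. This is a sketch for illustration only: the names `hours_needed` and `feasibility_profile` are hypothetical, and enumerating every speed is practical only for tiny inputs like this one.

```python
def hours_needed(piles: list[int], speed: int) -> int:
    """Total hours to finish all piles at the given speed (ceiling division per pile)."""
    return sum((pile + speed - 1) // speed for pile in piles)


def feasibility_profile(piles: list[int], h: int) -> list[bool]:
    """Feasibility of each speed 1..max(piles); entry i corresponds to speed i + 1."""
    return [hours_needed(piles, k) <= h for k in range(1, max(piles) + 1)]


if __name__ == "__main__":
    piles, h = [3, 6, 7, 11], 8
    profile = feasibility_profile(piles, h)
    print(profile)
    # The first True marks the minimum feasible speed.
    print(profile.index(True) + 1)
```

The profile is a run of `False` values followed by a run of `True` values, and binary search locates the switch point without evaluating every speed.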
+ + - approach_name: Linear Search + is_optimal: false + code: | + def min_eating_speed(piles: list[int], h: int) -> int: + # Try every speed from 1 up to max pile + for k in range(1, max(piles) + 1): + # Calculate hours needed at this speed + hours_needed = sum((pile + k - 1) // k for pile in piles) + + # Return first speed that works + if hours_needed <= h: + return k + + return max(piles) + explanation: | + **Time Complexity:** O(n × m) — Checks up to `m = max(piles)` speeds, each requiring O(n) time. + + **Space Complexity:** O(1) — Only constant extra space used. + + This brute force approach tries every possible speed starting from 1. While correct, it times out on large inputs where `max(piles)` can be up to `10^9`. Included to illustrate why binary search is essential. diff --git a/backend/data/questions/kth-largest-element-in-a-stream.yaml b/backend/data/questions/kth-largest-element-in-a-stream.yaml new file mode 100644 index 0000000..723c553 --- /dev/null +++ b/backend/data/questions/kth-largest-element-in-a-stream.yaml @@ -0,0 +1,186 @@ +title: Kth Largest Element in a Stream +slug: kth-largest-element-in-a-stream +difficulty: easy +leetcode_id: 703 +leetcode_url: https://leetcode.com/problems/kth-largest-element-in-a-stream/ +categories: + - heap + - arrays +patterns: + - heap + +description: | + Design a class to find the `k`th largest element in a stream. + + Note that it is the `k`th largest element in the sorted order, not the `k`th distinct element. + + Implement the `KthLargest` class: + + - `KthLargest(int k, int[] nums)` — Initializes the object with the integer `k` and the stream of test scores `nums`. + - `int add(int val)` — Adds a new test score `val` to the stream and returns the element representing the `k`th largest element in the pool of test scores so far. 
+ +constraints: | + - `0 <= nums.length <= 10^4` + - `1 <= k <= nums.length + 1` + - `-10^4 <= nums[i] <= 10^4` + - `-10^4 <= val <= 10^4` + - At most `10^4` calls will be made to `add` + +examples: + - input: | + ["KthLargest", "add", "add", "add", "add", "add"] + [[3, [4, 5, 8, 2]], [3], [5], [10], [9], [4]] + output: "[null, 4, 5, 5, 8, 8]" + explanation: | + KthLargest kthLargest = new KthLargest(3, [4, 5, 8, 2]); + kthLargest.add(3); // return 4 (stream: [2,3,4,5,8], 3rd largest = 4) + kthLargest.add(5); // return 5 (stream: [2,3,4,5,5,8], 3rd largest = 5) + kthLargest.add(10); // return 5 (stream: [2,3,4,5,5,8,10], 3rd largest = 5) + kthLargest.add(9); // return 8 (stream: [2,3,4,5,5,8,9,10], 3rd largest = 8) + kthLargest.add(4); // return 8 (stream: [2,3,4,4,5,5,8,9,10], 3rd largest = 8) + - input: | + ["KthLargest", "add", "add", "add", "add"] + [[4, [7, 7, 7, 7, 8, 3]], [2], [10], [9], [9]] + output: "[null, 7, 7, 7, 8]" + explanation: | + KthLargest kthLargest = new KthLargest(4, [7, 7, 7, 7, 8, 3]); + kthLargest.add(2); // return 7 (4th largest = 7) + kthLargest.add(10); // return 7 (4th largest = 7) + kthLargest.add(9); // return 7 (4th largest = 7) + kthLargest.add(9); // return 8 (4th largest = 8) + +explanation: + intuition: | + Imagine you're running a leaderboard for the top `k` players in a game. You don't need to track *everyone* — just the top `k`. When a new player joins: + + - If they're not good enough to crack the top `k`, you ignore them + - If they are, they bump out the current `k`th place player + + The key insight is: **the `k`th largest element is always the smallest element in the top `k` group**. If we maintain exactly `k` elements (the `k` largest seen so far), the minimum of this group is our answer. + + A **min-heap** of size `k` is perfect for this. The heap property guarantees the smallest element sits at the top. 
After each insertion, if our heap grows beyond `k` elements, we pop the smallest — ensuring we always keep exactly the `k` largest values, with the `k`th largest conveniently sitting at the heap's root. + + Think of it like a bouncer at an exclusive club: the venue only holds `k` people. When someone new arrives, if they're more important than the least important person inside, they swap places. The bouncer (heap root) always knows who's on the bubble. + + approach: | + We use a **Min-Heap of size k** to solve this efficiently: + + **Step 1: Initialise the heap** + + - Create an empty min-heap + - Add all elements from the initial `nums` array to the heap + - After adding each element, if heap size exceeds `k`, pop the minimum + +   + + **Step 2: Implement the add operation** + + - Push the new value onto the heap + - If heap size exceeds `k`, pop the minimum (it's no longer in the top `k`) + - Return the heap's minimum — this is the `k`th largest + +   + + **Why this works:** + + - The heap always contains exactly the `k` largest elements seen so far + - The min-heap property ensures the smallest of these (the `k`th largest overall) is at the root + - We only pop elements smaller than the `k`th largest, preserving correctness + + common_pitfalls: + - title: Using a Max-Heap Instead of Min-Heap + description: | + A max-heap gives you the *largest* element at the root, but we need the `k`th largest. With a max-heap of size `k`, you'd have to traverse to find the minimum. + + The trick is counter-intuitive: use a **min-heap** of size `k`. The root gives you the minimum of the `k` largest elements — which is exactly the `k`th largest overall. + wrong_approach: "Max-heap of size k" + correct_approach: "Min-heap of size k" + + - title: Keeping All Elements + description: | + Storing all `n` elements and sorting to find the `k`th largest gives O(n log n) per query. With up to `10^4` calls to `add`, this becomes too slow. 
+ + By maintaining only `k` elements in the heap, each `add` operation is O(log k), which is much faster when `k << n`. + wrong_approach: "Sort all elements on each query" + correct_approach: "Maintain a fixed-size heap of k elements" + + - title: Forgetting to Handle Initial Array + description: | + The constructor receives an initial array `nums` that may have more than `k` elements. You must process these through the heap first, trimming down to size `k` before any `add` calls. + + If you skip this step, your heap won't be properly initialised and the first few `add` calls will return wrong results. + wrong_approach: "Ignore nums in constructor" + correct_approach: "Heapify nums and trim to size k in constructor" + + key_takeaways: + - "**Min-heap for k largest**: A min-heap of size `k` efficiently tracks the `k`th largest element — it's the heap's root" + - "**Bounded heap pattern**: Maintain a fixed-size heap by popping after each push when size exceeds `k`" + - "**O(log k) vs O(log n)**: Limiting heap size to `k` gives faster operations than keeping all elements" + - "**Foundation for streaming problems**: This pattern applies to any 'top k' problem in a data stream (e.g., top k frequent, k closest points)" + + time_complexity: "O(n log k) for initialisation where `n` is the size of `nums`, and O(log k) for each `add` call. Each heap operation (push/pop) takes O(log k) time since the heap never exceeds size `k`." + space_complexity: "O(k). We only store at most `k` elements in the heap at any time, regardless of how many elements are added to the stream." 
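As a hedged illustration of the bounded-heap pattern summarised above, the sketch below replays the first example and records the heap root after every `add`. The helper name `trace_kth_largest` is hypothetical and not part of the question file; `heapq.heappushpop` is used as a stand-in for "push, then pop the smallest" once the heap already holds `k` values.

```python
import heapq


def trace_kth_largest(k: int, nums: list[int], stream: list[int]) -> list[int]:
    """Return the kth largest value observed after each element of `stream`."""
    heap: list[int] = []      # min-heap holding at most k values
    results: list[int] = []

    def push(val: int) -> None:
        if len(heap) < k:
            heapq.heappush(heap, val)
        else:
            # Equivalent to push followed by pop of the smallest value
            heapq.heappushpop(heap, val)

    for num in nums:          # constructor phase
        push(num)
    for val in stream:        # add() phase
        push(val)
        results.append(heap[0])
    return results


print(trace_kth_largest(3, [4, 5, 8, 2], [3, 5, 10, 9, 4]))  # [4, 5, 5, 8, 8]
```

The printed list matches the expected output of the first example, which is a quick way to sanity-check the invariant that the root is always the `k`th largest.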
+ +solutions: + - approach_name: Min-Heap + is_optimal: true + code: | + import heapq + + class KthLargest: + def __init__(self, k: int, nums: list[int]): + self.k = k + self.heap = [] + + # Add initial elements to the heap + for num in nums: + heapq.heappush(self.heap, num) + # Keep only the k largest elements + if len(self.heap) > k: + heapq.heappop(self.heap) + + def add(self, val: int) -> int: + # Add new value to the heap + heapq.heappush(self.heap, val) + + # If heap exceeds k, remove the smallest + if len(self.heap) > self.k: + heapq.heappop(self.heap) + + # The root of min-heap is the kth largest + return self.heap[0] + explanation: | + **Time Complexity:** O(n log k) for constructor, O(log k) per `add` call — heap operations on a heap of size `k`. + + **Space Complexity:** O(k) — the heap stores at most `k` elements. + + We maintain a min-heap of exactly `k` elements. The smallest element in this heap (the root) is the `k`th largest overall. When adding a new element, if the heap grows beyond `k`, we pop the smallest — it's no longer in the top `k`. + + - approach_name: Sorted List + is_optimal: false + code: | + import bisect + + class KthLargest: + def __init__(self, k: int, nums: list[int]): + self.k = k + # Keep a sorted list of the k largest elements + self.sorted_list = sorted(nums, reverse=True)[:k] + self.sorted_list.reverse() # Ascending order for bisect + + def add(self, val: int) -> int: + # Insert in sorted position + bisect.insort(self.sorted_list, val) + + # Keep only k largest (remove smallest if needed) + if len(self.sorted_list) > self.k: + self.sorted_list.pop(0) + + # Return the kth largest (smallest in our k-size list) + return self.sorted_list[0] + explanation: | + **Time Complexity:** O(n log n) for constructor, O(k) per `add` call — `bisect.insort` is O(k) due to shifting elements. + + **Space Complexity:** O(k) — stores at most `k` elements. + + This approach uses a sorted list with binary search insertion. 
While the space is the same, the O(k) insertion time makes it slower than the heap approach for large `k`. The heap's O(log k) operations are more efficient. diff --git a/backend/data/questions/kth-largest-element-in-an-array.yaml b/backend/data/questions/kth-largest-element-in-an-array.yaml new file mode 100644 index 0000000..8ab2aff --- /dev/null +++ b/backend/data/questions/kth-largest-element-in-an-array.yaml @@ -0,0 +1,211 @@ +title: Kth Largest Element in an Array +slug: kth-largest-element-in-an-array +difficulty: medium +leetcode_id: 215 +leetcode_url: https://leetcode.com/problems/kth-largest-element-in-an-array/ +categories: + - arrays + - sorting + - heap +patterns: + - heap + - binary-search + +description: | + Given an integer array `nums` and an integer `k`, return *the* `k`th *largest element in the array*. + + Note that it is the `k`th largest element in the sorted order, not the `k`th distinct element. + + Can you solve it without sorting? + +constraints: | + - `1 <= k <= nums.length <= 10^5` + - `-10^4 <= nums[i] <= 10^4` + +examples: + - input: "nums = [3,2,1,5,6,4], k = 2" + output: "5" + explanation: "The sorted array is [1,2,3,4,5,6]. The 2nd largest element is 5." + - input: "nums = [3,2,3,1,2,4,5,5,6], k = 4" + output: "4" + explanation: "The sorted array is [1,2,2,3,3,4,5,5,6]. The 4th largest element is 4." + +explanation: + intuition: | + Imagine you have a collection of exam scores and you want to find the student who ranked `k`th from the top. The most straightforward approach would be to sort all scores and pick the `k`th one from the end — but can we do better? + + Think of it like this: if you only need to find *one* specific ranking, do you really need to sort *everything*? This is similar to finding the tallest person in a room versus sorting everyone by height — the first task is much simpler. + + The key insight is that we don't need a fully sorted array. 
We only need to find the element that would be at position `n - k` if the array were sorted (0-indexed). This opens the door to more efficient approaches: + + 1. **Heap approach**: Maintain a "top k" collection using a min-heap of size `k`. Any element smaller than our current `k`th largest can be discarded. + + 2. **Quickselect approach**: Use the partitioning logic from quicksort, but only recurse into the half that contains our target position. + + Both avoid the full `O(n log n)` cost of sorting when we only need partial ordering. + + approach: | + We'll focus on the **Min-Heap approach** as the primary solution due to its consistent performance and clarity: + + **Step 1: Understand the heap strategy** + + - We maintain a min-heap of size `k` + - The min-heap always contains the `k` largest elements seen so far + - The root of the heap (minimum of these `k` elements) is our answer + +   + + **Step 2: Initialise the heap** + + - Create an empty min-heap + - We'll use Python's `heapq` which implements a min-heap + +   + + **Step 3: Process each element** + + - For each number in the array: + - If the heap has fewer than `k` elements, push the number + - Otherwise, if the number is larger than the heap's minimum (root), replace the root with this number + - This ensures we always keep the `k` largest elements + +   + + **Step 4: Return the result** + + - The root of the heap is the `k`th largest element + - Return `heap[0]` + +   + + **Why this works**: By keeping exactly `k` elements and always removing the smallest when we exceed capacity, we guarantee that the smallest element in our heap is larger than all discarded elements — making it exactly the `k`th largest overall. + + common_pitfalls: + - title: Off-by-One with Heap Size + description: | + A common mistake is confusion about when to push vs. replace in the heap. + + If you always push and then pop when size exceeds `k`, you might accidentally pop the element you just added if it's the smallest. 
That is the intended behaviour rather than a bug: a value smaller than the current `k`th largest does not belong in the top `k`, so discarding it straight away is correct. The genuine off-by-one mistakes are popping when the size merely equals `k` (which leaves only `k - 1` elements) and comparing against the root before the heap holds `k` elements. + + You can either push unconditionally and pop when the size exceeds `k`, or compare with the root first and replace it only when the new value is larger. Both are correct; the second skips heap operations for values that would be discarded anyway. + wrong_approach: "Complex conditional logic that's easy to get wrong" + correct_approach: "Push then pop if size > k, or use heappushpop for efficiency" + + - title: Using Max-Heap Incorrectly + description: | + Some attempt to use a max-heap of the entire array and pop `k - 1` times, then read the root. While correct, this has a cost: + + - Building a max-heap: `O(n)` + - Popping `k - 1` times: `O(k log n)` + - Total: `O(n + k log n)` time and `O(n)` space for the heap + + A min-heap of size `k` runs in `O(n log k)` time but needs only `O(k)` space, never copies or mutates the whole input, and also works when elements arrive as a stream. + wrong_approach: "Max-heap of all elements, pop k-1 times" + correct_approach: "Min-heap of size k, maintaining the k largest" + + - title: Forgetting Python's heapq is Min-Heap Only + description: | + Python's `heapq` only provides a min-heap. To simulate a max-heap, you must negate values when pushing and negate again when popping. + + For this problem, a min-heap is actually what we want: we keep the `k` largest elements by discarding elements smaller than our current `k`th largest. + wrong_approach: "Assuming heapq has a max-heap option" + correct_approach: "Use min-heap directly for finding kth largest" + + key_takeaways: + - "**Partial ordering insight**: When you only need one specific rank, you don't need to sort everything; use a heap or quickselect instead" + - "**Min-heap for top-k**: A min-heap of size `k` naturally maintains the `k` largest elements, with the `k`th largest at the root" + - "**Trade-off awareness**: Heap gives `O(n log k)` guaranteed; Quickselect gives `O(n)` average but `O(n^2)` worst case" + - "**Foundation pattern**: This technique applies to streaming data, top-k frequent elements, and many ranking problems" + + time_complexity: "O(n log k).
We iterate through all `n` elements, and each heap operation (push/pop) takes `O(log k)` time since the heap size is bounded by `k`." + space_complexity: "O(k). We maintain a heap containing at most `k` elements." + +solutions: + - approach_name: Min-Heap + is_optimal: true + code: | + import heapq + + def find_kth_largest(nums: list[int], k: int) -> int: + # Min-heap to store the k largest elements + heap = [] + + for num in nums: + # Add current number to heap + heapq.heappush(heap, num) + + # If heap exceeds size k, remove the smallest + # This ensures we keep only the k largest elements + if len(heap) > k: + heapq.heappop(heap) + + # The root of min-heap is the kth largest + return heap[0] + explanation: | + **Time Complexity:** O(n log k) — We process each of `n` elements with heap operations costing `O(log k)`. + + **Space Complexity:** O(k) — The heap stores at most `k` elements. + + This approach maintains a min-heap of the `k` largest elements seen so far. By keeping the heap size at `k` and using a min-heap, the smallest element in our collection (the root) is always the `k`th largest overall. 
+ + - approach_name: Quickselect + is_optimal: true + code: | + import random + + def find_kth_largest(nums: list[int], k: int) -> int: + # Convert kth largest to index in sorted array + # kth largest = element at index (n - k) in ascending order + target_index = len(nums) - k + + def quickselect(left: int, right: int) -> int: + # Random pivot to avoid worst-case on sorted input + pivot_idx = random.randint(left, right) + pivot = nums[pivot_idx] + + # Move pivot to end + nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx] + + # Partition: elements < pivot go to the left + store_idx = left + for i in range(left, right): + if nums[i] < pivot: + nums[store_idx], nums[i] = nums[i], nums[store_idx] + store_idx += 1 + + # Move pivot to its final sorted position + nums[store_idx], nums[right] = nums[right], nums[store_idx] + + # Check if we found the target + if store_idx == target_index: + return nums[store_idx] + elif store_idx < target_index: + # Target is in the right partition + return quickselect(store_idx + 1, right) + else: + # Target is in the left partition + return quickselect(left, store_idx - 1) + + return quickselect(0, len(nums) - 1) + explanation: | + **Time Complexity:** O(n) average, O(n^2) worst case. The average case is linear because we only recurse into one half. Random pivot selection makes the worst case very unlikely when values are mostly distinct; with many equal values this two-way partition puts every duplicate on the same side and can still degrade towards O(n^2), which a three-way partition avoids. + + **Space Complexity:** O(log n) average for recursion stack, O(n) worst case. + + Quickselect uses the partitioning logic from quicksort but only recurses into the partition containing our target index. This reduces the expected work from `O(n log n)` to `O(n)`. Note that it reorders `nums` in place. + + - approach_name: Sorting + is_optimal: false + code: | + def find_kth_largest(nums: list[int], k: int) -> int: + # Sort in descending order + nums.sort(reverse=True) + + # Return the kth element (0-indexed, so k-1) + return nums[k - 1] + explanation: | + **Time Complexity:** O(n log n). Dominated by the sorting step.
+ + **Space Complexity:** O(1) to O(n) — Depends on the sorting algorithm used (in-place vs. not). + + The simplest approach: sort and index. While not optimal for this specific problem, it's worth knowing as a baseline. For small arrays or when `k` is close to `n`, the practical difference may be negligible. diff --git a/backend/data/questions/kth-smallest-element-in-a-bst.yaml b/backend/data/questions/kth-smallest-element-in-a-bst.yaml new file mode 100644 index 0000000..dfc953d --- /dev/null +++ b/backend/data/questions/kth-smallest-element-in-a-bst.yaml @@ -0,0 +1,212 @@ +title: Kth Smallest Element in a BST +slug: kth-smallest-element-in-a-bst +difficulty: medium +leetcode_id: 230 +leetcode_url: https://leetcode.com/problems/kth-smallest-element-in-a-bst/ +categories: + - trees + - recursion +patterns: + - dfs + - tree-traversal + +description: | + Given the `root` of a binary search tree, and an integer `k`, return the `k`th smallest value (**1-indexed**) of all the values of the nodes in the tree. + +constraints: | + - The number of nodes in the tree is `n` + - `1 <= k <= n <= 10^4` + - `0 <= Node.val <= 10^4` + +examples: + - input: "root = [3,1,4,null,2], k = 1" + output: "1" + explanation: "The inorder traversal of the tree is [1, 2, 3, 4]. The 1st smallest element is 1." + - input: "root = [5,3,6,2,4,null,null,1], k = 3" + output: "3" + explanation: "The inorder traversal is [1, 2, 3, 4, 5, 6]. The 3rd smallest element is 3." + +explanation: + intuition: | + The key insight comes from the fundamental property of a **Binary Search Tree (BST)**: for any node, all values in its left subtree are smaller, and all values in its right subtree are larger. + + What does this mean for traversal? If you visit nodes in **inorder** order (left → current → right), you get all values in **sorted ascending order**! + + Imagine walking through the tree: you go as far left as possible first (smallest values), then visit the current node, then explore the right subtree. 
This naturally produces a sorted sequence. + + So finding the kth smallest becomes simple: perform an inorder traversal and return the kth element you encounter. No need to collect all values first — you can count as you go and stop early once you've found it. + + Think of it like this: the BST is already "pre-sorted" by its structure. The inorder traversal simply reads this sorted order. + + approach: | + We solve this using **Inorder Traversal with Early Termination**: + + **Step 1: Understand the traversal pattern** + + - Inorder traversal visits: left subtree → current node → right subtree + - For a BST, this visits nodes in ascending sorted order + - We count each node we visit until we reach the kth one + +   + + **Step 2: Set up tracking variables** + + - `count`: Track how many nodes we've visited (starts at `0`) + - `result`: Store the kth smallest value once found + +   + + **Step 3: Perform inorder traversal** + + - Recursively traverse the left subtree + - Increment `count` when visiting the current node + - If `count == k`, we've found our answer — store it and stop + - Otherwise, recursively traverse the right subtree + +   + + **Step 4: Early termination** + + - Once we've found the kth element, there's no need to continue traversing + - We can use a flag or check if result is set to stop recursion early + +   + + This approach leverages the BST property to avoid sorting, achieving O(H + k) time where H is the tree height. + + common_pitfalls: + - title: Collecting All Values Then Sorting + description: | + A naive approach collects all node values into a list, sorts it, and returns the kth element. + + This works but wastes both time and space. With n nodes, you'd use O(n) space for the list and O(n log n) time for sorting. + + Since BST's inorder traversal is already sorted, we can get O(H + k) time and O(H) space instead. 
+ wrong_approach: "Collect all values, sort, return kth" + correct_approach: "Use inorder traversal property — already sorted" + + - title: Not Using Early Termination + description: | + Even with inorder traversal, visiting all n nodes is wasteful when k is small. If k = 1, why traverse the entire tree? + + Always check if you've found the kth element and stop early. This improves average case performance significantly, especially when k << n. + wrong_approach: "Complete full inorder traversal" + correct_approach: "Stop as soon as kth element is found" + + - title: Off-by-One Errors with 1-Indexed k + description: | + The problem states k is **1-indexed**, meaning k = 1 refers to the smallest element, not the second smallest. + + Be careful with your counter: if you start counting at 0, the kth element is found when `count == k`, not `count == k - 1`. + + Always clarify the indexing in your head before implementing. + wrong_approach: "Returning element at index k-1 after 0-indexed counting" + correct_approach: "Increment count first, then check if count equals k" + + key_takeaways: + - "**BST inorder = sorted order**: This fundamental property is the key to many BST problems" + - "**Early termination**: Stop traversing once you have the answer — don't process unnecessary nodes" + - "**Iterative vs recursive**: Both work; iterative uses an explicit stack and can be easier to control for early termination" + - "**Follow-up insight**: For frequent queries with modifications, augment nodes with subtree sizes for O(H) queries" + + time_complexity: "O(H + k). We descend H levels to reach the leftmost node, then visit k nodes. For a balanced tree, this is O(log n + k)." + space_complexity: "O(H). The recursion stack or explicit stack holds at most H nodes, where H is the tree height — O(log n) for balanced, O(n) for skewed." 
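To make the "inorder equals sorted order" and early-termination points concrete, here is a minimal sketch that uses a generator rather than the counter-based solutions in this file. The names `inorder` and `kth_smallest_lazy` are hypothetical; the `TreeNode` class mirrors the one used by the solutions. Because generators are lazy, `itertools.islice` pulls only the first `k` values and the rest of the tree is never visited.

```python
from itertools import islice
from typing import Iterator, Optional


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def inorder(node: Optional[TreeNode]) -> Iterator[int]:
    """Lazily yield the values of a BST in ascending order."""
    if node is not None:
        yield from inorder(node.left)
        yield node.val
        yield from inorder(node.right)


def kth_smallest_lazy(root: Optional[TreeNode], k: int) -> int:
    # Skip the first k - 1 values, then take the next one (k is 1-indexed)
    return next(islice(inorder(root), k - 1, None))


# Tree from the second example: [5,3,6,2,4,null,null,1]
root = TreeNode(5, TreeNode(3, TreeNode(2, TreeNode(1)), TreeNode(4)), TreeNode(6))
print(list(inorder(root)))         # [1, 2, 3, 4, 5, 6]
print(kth_smallest_lazy(root, 3))  # 3
```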
+ +solutions: + - approach_name: Inorder Traversal (Recursive) + is_optimal: true + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def kth_smallest(root: TreeNode | None, k: int) -> int: + count = 0 + result = 0 + + def inorder(node: TreeNode | None) -> bool: + nonlocal count, result + + if not node: + return False + + # Traverse left subtree first (smaller values) + if inorder(node.left): + return True # Already found, stop early + + # Visit current node + count += 1 + if count == k: + result = node.val + return True # Found the kth smallest + + # Traverse right subtree (larger values) + return inorder(node.right) + + inorder(root) + return result + explanation: | + **Time Complexity:** O(H + k) — Descend to leftmost node (H steps), then visit k nodes. + + **Space Complexity:** O(H) — Recursion stack depth equals tree height. + + We perform inorder traversal, counting nodes as we visit them. The `True` return value signals that we've found the answer and should stop recursing. This early termination avoids unnecessary traversal when k is small. + + - approach_name: Inorder Traversal (Iterative with Stack) + is_optimal: true + code: | + def kth_smallest(root: TreeNode | None, k: int) -> int: + stack = [] + current = root + count = 0 + + while stack or current: + # Go as far left as possible + while current: + stack.append(current) + current = current.left + + # Process the leftmost unvisited node + current = stack.pop() + count += 1 + + # Check if this is the kth smallest + if count == k: + return current.val + + # Move to the right subtree + current = current.right + + return -1 # Should never reach here if k is valid + explanation: | + **Time Complexity:** O(H + k) — Same as recursive approach. + + **Space Complexity:** O(H) — Explicit stack replaces recursion stack. + + The iterative approach uses an explicit stack to simulate recursion. 
We push nodes while going left, pop to visit, then move right. This gives more control over termination and avoids potential stack overflow for very deep trees. + + - approach_name: Collect and Sort (Suboptimal) + is_optimal: false + code: | + def kth_smallest(root: TreeNode | None, k: int) -> int: + values = [] + + def collect(node: TreeNode | None) -> None: + if not node: + return + # Collect all values via any traversal + values.append(node.val) + collect(node.left) + collect(node.right) + + collect(root) + values.sort() + return values[k - 1] # k is 1-indexed + explanation: | + **Time Complexity:** O(n log n) — Collecting is O(n), sorting is O(n log n). + + **Space Complexity:** O(n) — Store all n values. + + This approach ignores the BST property entirely. It collects all values, sorts them, and returns the kth element. While correct, it's inefficient and doesn't leverage the fact that inorder traversal of a BST is already sorted. Included to illustrate why understanding data structure properties matters. diff --git a/backend/data/questions/largest-rectangle-in-histogram.yaml b/backend/data/questions/largest-rectangle-in-histogram.yaml new file mode 100644 index 0000000..d5e6417 --- /dev/null +++ b/backend/data/questions/largest-rectangle-in-histogram.yaml @@ -0,0 +1,204 @@ +title: Largest Rectangle in Histogram +slug: largest-rectangle-in-histogram +difficulty: hard +leetcode_id: 84 +leetcode_url: https://leetcode.com/problems/largest-rectangle-in-histogram/ +categories: + - arrays + - stack +patterns: + - monotonic-stack + +description: | + Given an array of integers `heights` representing the histogram's bar height where the width of each bar is `1`, return *the area of the largest rectangle in the histogram*. 
+ +constraints: | + - `1 <= heights.length <= 10^5` + - `0 <= heights[i] <= 10^4` + +examples: + - input: "heights = [2,1,5,6,2,3]" + output: "10" + explanation: "The largest rectangle is formed using bars at indices 2 and 3 (heights 5 and 6), with width 2 and height 5, giving area = 10 units." + - input: "heights = [2,4]" + output: "4" + explanation: "The largest rectangle is the single bar of height 4 with width 1, giving area = 4 units." + +explanation: + intuition: | + Imagine you're standing at each bar in the histogram, trying to figure out how far you can extend a rectangle horizontally while keeping that bar's height as the minimum. + + For any bar at position `i`, the largest rectangle that uses this bar's height extends: + - **Left**: until we hit a bar shorter than `heights[i]` + - **Right**: until we hit a bar shorter than `heights[i]` + + If `left_boundary` and `right_boundary` are the indices of those first shorter bars (`-1` and `n` when no such bar exists), the area is `heights[i] * (right_boundary - left_boundary - 1)`. + + The brute force approach would check every bar and scan left and right to find boundaries, but that's O(n^2). The key insight is that a **monotonic stack** can find these boundaries efficiently. + + Think of it like this: as you scan left-to-right, maintain a stack of bar indices in **non-decreasing order of height**. When you encounter a bar shorter than the stack's top, you've found the right boundary for all taller bars on the stack. Pop them off and calculate their areas; the new stack top gives you their left boundary. + + This works because the stack invariant guarantees that for any bar we pop, all bars between its left boundary (the new stack top) and right boundary (current index) are at least as tall.
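The boundary formula above can be checked directly. The following is a minimal sketch, not the single-pass solution from this file: it makes two monotonic-stack passes to find, for each bar, the index of the nearest strictly shorter bar on each side, then applies the area formula. The helper name `nearest_shorter_boundaries` and the out-of-range defaults `-1` and `n` are assumptions made for illustration.

```python
def nearest_shorter_boundaries(heights: list[int]) -> tuple[list[int], list[int]]:
    """For each bar, return indices of the nearest strictly shorter bar on each side."""
    n = len(heights)
    left = [-1] * n   # -1 means no shorter bar to the left
    right = [n] * n   # n means no shorter bar to the right

    stack: list[int] = []
    for i in range(n):
        # Discard bars that are not strictly shorter than the current one
        while stack and heights[stack[-1]] >= heights[i]:
            stack.pop()
        if stack:
            left[i] = stack[-1]
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):
        while stack and heights[stack[-1]] >= heights[i]:
            stack.pop()
        if stack:
            right[i] = stack[-1]
        stack.append(i)

    return left, right


heights = [2, 1, 5, 6, 2, 3]
left, right = nearest_shorter_boundaries(heights)
areas = [h * (r - l - 1) for h, l, r in zip(heights, left, right)]
print(left)        # [-1, -1, 1, 2, 1, 4]
print(right)       # [1, 6, 4, 4, 6, 6]
print(max(areas))  # 10
```

The single-pass solution computes the same boundaries implicitly: the right boundary is the index that triggers a pop, and the left boundary is the stack top after the pop.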
+ + approach: | + We solve this using a **Monotonic Stack**: + + **Step 1: Initialise variables** + + - `stack`: Empty list to store indices of bars in increasing height order + - `max_area`: Set to `0` to track the largest rectangle found + +   + + **Step 2: Iterate through each bar** + + - For each index `i` with height `heights[i]`: + - While the stack is non-empty AND the current height is less than the height at the stack's top index: + - Pop the top index as `height_idx` — this bar can't extend further right + - Calculate the width: if stack is empty, width is `i` (bar extends to the beginning); otherwise width is `i - stack[-1] - 1` + - Calculate area as `heights[height_idx] * width` + - Update `max_area` if this area is larger + - Push current index `i` onto the stack + +   + + **Step 3: Process remaining bars in the stack** + + - After the loop, bars remaining in the stack extend all the way to the right end + - Pop each remaining index and calculate its area using `n` as the right boundary + - Width calculation: if stack becomes empty, width is `n`; otherwise width is `n - stack[-1] - 1` + +   + + **Step 4: Return the result** + + - Return `max_area` + +   + + The monotonic stack ensures each bar is pushed and popped at most once, giving O(n) time complexity. + + common_pitfalls: + - title: Forgetting to Process Remaining Stack + description: | + After iterating through all bars, some indices may still be on the stack. These represent bars that extend all the way to the right edge of the histogram. + + Forgetting to process these remaining bars will miss valid rectangles. For example, with `heights = [2, 4]`, after the loop the stack contains both indices. Without processing, you'd return `0` instead of `4`. 
+ wrong_approach: "Only calculate areas during the main loop" + correct_approach: "Process remaining stack elements using array length as right boundary" + + - title: Incorrect Width Calculation + description: | + When popping a bar from the stack, its left boundary isn't always the immediately preceding bar. It's the bar at the new stack top (or the start if stack is empty). + + A common mistake is calculating width as `i - popped_index`, but this is wrong whenever taller bars to the left of the popped bar were already removed from the stack. The correct width is `i - stack[-1] - 1` (or just `i` if stack is empty). + + For example, in `[6, 5, 1]`, index 0 (height 6) is popped when we reach index 1, and index 1 is pushed. At index 2 we pop index 1 (height 5); the stack is now empty, so the width is `2`, because a rectangle of height 5 also covers the taller bar at index 0. That gives area `5 * 2 = 10`, the correct answer. Using `i - popped_index = 2 - 1 = 1` would give area 5 and miss it. + wrong_approach: "Width = current_index - popped_index" + correct_approach: "Width = current_index - new_stack_top - 1 (or current_index if stack empty)" + + - title: Brute Force Time Limit Exceeded + description: | + The naive O(n^2) approach (for each bar, scan left and right to find boundaries) will cause TLE with `heights.length <= 10^5`. + + 10^5 elements means up to 10^10 operations, which is far too slow. The monotonic stack reduces this to O(n) by computing all boundaries in a single pass.
+ wrong_approach: "For each bar, scan left and right to find boundaries" + correct_approach: "Use monotonic stack to find boundaries in O(n)" + + key_takeaways: + - "**Monotonic stack pattern**: When you need to find the next smaller/larger element for all positions, a monotonic stack provides O(n) efficiency" + - "**Width calculation insight**: The left boundary for a popped element is always the new stack top, not the previous element" + - "**Process the stack after iteration**: Elements remaining in the stack have implicit right boundaries at the array end" + - "**Foundation for related problems**: This technique extends to problems like Maximal Rectangle, Trapping Rain Water, and daily temperatures" + + time_complexity: "O(n). Each bar index is pushed onto and popped from the stack at most once, giving 2n operations total." + space_complexity: "O(n). In the worst case (strictly increasing heights), all n indices are stored on the stack simultaneously." + +solutions: + - approach_name: Monotonic Stack + is_optimal: true + code: | + def largest_rectangle_area(heights: list[int]) -> int: + stack = [] # Store indices of bars in increasing height order + max_area = 0 + n = len(heights) + + for i in range(n): + # Pop bars that can't extend further right + while stack and heights[i] < heights[stack[-1]]: + height_idx = stack.pop() + height = heights[height_idx] + # Width extends from new stack top to current index + width = i if not stack else i - stack[-1] - 1 + max_area = max(max_area, height * width) + stack.append(i) + + # Process remaining bars - they extend to the right edge + while stack: + height_idx = stack.pop() + height = heights[height_idx] + # Width extends from new stack top to array end + width = n if not stack else n - stack[-1] - 1 + max_area = max(max_area, height * width) + + return max_area + explanation: | + **Time Complexity:** O(n) — Each index is pushed and popped at most once. 
+ + **Space Complexity:** O(n) — Stack may contain all indices in worst case. + + We maintain a stack of indices in increasing height order. When we encounter a shorter bar, we pop taller bars and calculate their areas since they can't extend further right. The stack top after popping gives the left boundary. + + - approach_name: Monotonic Stack with Sentinel + is_optimal: true + code: | + def largest_rectangle_area(heights: list[int]) -> int: + # Add sentinel bars: 0-height at start and end + heights = [0] + heights + [0] + stack = [0] # Stack starts with left sentinel + max_area = 0 + + for i in range(1, len(heights)): + while heights[i] < heights[stack[-1]]: + height = heights[stack.pop()] + # Width between current index and new stack top + width = i - stack[-1] - 1 + max_area = max(max_area, height * width) + stack.append(i) + + return max_area + explanation: | + **Time Complexity:** O(n) — Same as basic approach. + + **Space Complexity:** O(n) — Stack plus modified heights array. + + Adding sentinel bars of height 0 at both ends eliminates edge case handling. The left sentinel ensures the stack is never empty, and the right sentinel forces all remaining bars to be processed. This leads to cleaner code at the cost of a slightly modified input array. + + - approach_name: Brute Force + is_optimal: false + code: | + def largest_rectangle_area(heights: list[int]) -> int: + max_area = 0 + n = len(heights) + + for i in range(n): + height = heights[i] + # Find left boundary - first bar shorter than current + left = i + while left > 0 and heights[left - 1] >= height: + left -= 1 + # Find right boundary - first bar shorter than current + right = i + while right < n - 1 and heights[right + 1] >= height: + right += 1 + # Calculate area with current bar's height + width = right - left + 1 + max_area = max(max_area, height * width) + + return max_area + explanation: | + **Time Complexity:** O(n^2) — For each bar, we scan left and right. 
+ + **Space Complexity:** O(1) — Only tracking indices and area. + + For each bar, we expand left and right while adjacent bars are at least as tall. This finds the maximum width rectangle using each bar's height. While correct, this approach is too slow for large inputs (TLE on LeetCode) because it re-scans the same regions repeatedly. diff --git a/backend/data/questions/last-stone-weight-ii.yaml b/backend/data/questions/last-stone-weight-ii.yaml new file mode 100644 index 0000000..490005f --- /dev/null +++ b/backend/data/questions/last-stone-weight-ii.yaml @@ -0,0 +1,200 @@ +title: Last Stone Weight II +slug: last-stone-weight-ii +difficulty: medium +leetcode_id: 1049 +leetcode_url: https://leetcode.com/problems/last-stone-weight-ii/ +categories: + - dynamic-programming + - arrays +patterns: + - dynamic-programming + +description: | + You are given an array of integers `stones` where `stones[i]` is the weight of the ith stone. + + We are playing a game with the stones. On each turn, we choose any two stones and smash them together. Suppose the stones have weights `x` and `y` with `x <= y`. The result of this smash is: + + - If `x == y`, both stones are destroyed, and + - If `x != y`, the stone of weight `x` is destroyed, and the stone of weight `y` has new weight `y - x`. + + At the end of the game, there is **at most one** stone left. + + Return *the smallest possible weight of the left stone*. If there are no stones left, return `0`. + +constraints: | + - `1 <= stones.length <= 30` + - `1 <= stones[i] <= 100` + +examples: + - input: "stones = [2,7,4,1,8,1]" + output: "1" + explanation: "We can combine 2 and 4 to get 2, so the array converts to [2,7,1,8,1] then, we can combine 7 and 8 to get 1, so the array converts to [2,1,1,1] then, we can combine 2 and 1 to get 1, so the array converts to [1,1,1] then, we can combine 1 and 1 to get 0, so the array converts to [1], then that's the optimal value." 
+ - input: "stones = [31,26,33,21,40]" + output: "5" + explanation: "Smash 40 and 31 to get 9, so the array converts to [26,33,21,9]. Smash 33 and 26 to get 7, so the array converts to [21,9,7]. Smash 21 and 9 to get 12, so the array converts to [12,7]. Smash 12 and 7 to get 5. This matches the best partition: {31,26,21} sums to 78 and {33,40} sums to 73, a difference of 5." + +explanation: + intuition: | + At first glance, this looks like a simulation problem — keep smashing stones until one remains. But simulating every possible order of smashing would be exponentially complex. There must be a deeper insight. + + Here's the key realisation: **smashing is equivalent to assigning signs**. When we smash stones `x` and `y`, we get `|y - x|`. If we keep smashing the results, we're essentially computing nested differences such as `||a - b| - c|`, which always expand to a signed sum like `a - b - c` or `c - a + b`. In the end, each original stone contributes either positively or negatively to the final result. + + Think of it like this: imagine labelling each stone with `+` or `-`. The final stone's weight equals `|(sum of stones with + signs) - (sum of stones with - signs)|`. We want to minimise this difference. + + This transforms the problem into: **partition the stones into two groups such that the absolute difference between their sums is minimised**. This is the classic "minimum subset sum difference" problem, which is a variant of the 0/1 knapsack. + + If the total sum is `S`, and one group has sum `subset_sum`, the other has sum `S - subset_sum`. The difference is `|S - 2 * subset_sum|`. To minimise this, we want `subset_sum` as close to `S / 2` as possible. 
+ + approach: | + We solve this using **0/1 Knapsack DP** to find the closest achievable sum to half the total: + + **Step 1: Calculate the target** + + - Compute `total = sum(stones)` + - Our goal: find the largest `subset_sum <= total // 2` that we can form + - The answer will be `total - 2 * subset_sum` + +   + + **Step 2: Create the DP set** + + - Use a set `dp` to track all achievable sums + - Initialise with `{0}` — we can always achieve sum 0 (empty subset) + +   + + **Step 3: Process each stone** + + - For each stone, we can either include it or not (0/1 knapsack) + - For each existing sum `s` in our set, `s + stone` is now also achievable + - Add these new sums to our set (but only up to `total // 2` to save space) + +   + + **Step 4: Find the best partition** + + - The largest value in `dp` that doesn't exceed `total // 2` is our best `subset_sum` + - Return `total - 2 * subset_sum` + +   + + The set-based approach is elegant and efficient for this problem's constraints. With `stones.length <= 30` and `stones[i] <= 100`, the maximum total is 3000, making this approach very practical. + + common_pitfalls: + - title: Trying to Simulate Smashing + description: | + The naive approach of simulating all possible smashing orders has exponential complexity. With 30 stones, there are far too many orderings to try. + + The key insight is recognising this as a partitioning problem, not a simulation problem. Once you see that smashing assigns implicit +/- signs, the path to DP becomes clear. + wrong_approach: "Recursively try all pairs of stones to smash" + correct_approach: "Reduce to minimum partition difference using DP" + + - title: Using Unbounded Knapsack + description: | + Unlike Coin Change where each coin can be used infinitely, here each stone can only be used **once**. This is 0/1 knapsack, not unbounded knapsack. + + If you iterate incorrectly, you might count the same stone multiple times, leading to wrong answers. 
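A minimal sketch of how the double counting arises, assuming a boolean-array DP. The helper name `reachable_sums` and its `forwards` flag are hypothetical and exist only for this illustration. Iterating sums forwards lets `dp[s - stone]` already include the current stone, while iterating backwards does not.

```python
def reachable_sums(stones: list[int], limit: int, forwards: bool) -> list[int]:
    """Return every sum in 0..limit that the chosen update order marks as achievable."""
    dp = [False] * (limit + 1)
    dp[0] = True  # the empty subset always sums to 0

    for stone in stones:
        if forwards:
            # Buggy order: dp[s - stone] may have been set using this same stone
            order = range(stone, limit + 1)
        else:
            # Correct 0/1 order: dp[s - stone] still reflects earlier stones only
            order = range(limit, stone - 1, -1)
        for s in order:
            if dp[s - stone]:
                dp[s] = True

    return [s for s, ok in enumerate(dp) if ok]


# A single stone of weight 3 can only ever contribute 0 or 3.
print(reachable_sums([3], 9, forwards=True))   # [0, 3, 6, 9]
print(reachable_sums([3], 9, forwards=False))  # [0, 3]
```

The forwards run reports 6 and 9 even though only one stone exists, which is exactly the unbounded-knapsack behaviour this pitfall warns about.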
+ wrong_approach: "Updating dp in place while iterating sums forwards, so dp[s - stone] may already include the current stone" + correct_approach: "Iterate sums backwards in the array, or collect new sums in a separate set before merging" + + - title: Forgetting to Limit the Target + description: | + Since we want `subset_sum` closest to `total / 2`, we only need to track sums up to `total // 2`. Tracking larger sums is redundant — if one group has sum `> total / 2`, the other has sum `< total / 2`, and we'd already have that smaller sum. + + This optimisation keeps memory usage reasonable. + wrong_approach: "Track all possible sums up to total" + correct_approach: "Only track sums up to total // 2" + + key_takeaways: + - "**Problem reduction**: Recognising that smashing stones = partitioning with +/- signs transforms an intractable simulation into a classic DP problem" + - "**0/1 Knapsack pattern**: Each stone can be used at most once — this is the defining characteristic of 0/1 knapsack" + - "**Minimum partition difference**: Finding two subsets with minimum sum difference is equivalent to finding one subset closest to half the total" + - "**Set-based DP**: Using a set to track achievable sums is clean and efficient for moderate constraints" + + time_complexity: "O(n × S) where n is the number of stones and S is the total sum. For each stone, we potentially add up to S/2 new sums." + space_complexity: "O(S) where S is the total sum. The set stores at most S/2 + 1 achievable sums." 
+ +solutions: + - approach_name: Set-Based DP + is_optimal: true + code: | + def last_stone_weight_ii(stones: list[int]) -> int: + total = sum(stones) + target = total // 2 + + # dp stores all achievable subset sums + dp = {0} + + for stone in stones: + # For each existing sum, we can add this stone + # Create new set to avoid modifying during iteration + new_sums = set() + for s in dp: + if s + stone <= target: + new_sums.add(s + stone) + dp.update(new_sums) + + # Find largest achievable sum <= target + best_sum = max(dp) + + # Difference between two groups: (total - best_sum) - best_sum + return total - 2 * best_sum + explanation: | + **Time Complexity:** O(n × S) — For each of n stones, we iterate through sums up to S/2. + + **Space Complexity:** O(S) — Set stores achievable sums up to S/2. + + We track all achievable subset sums using a set. For each stone, we compute new achievable sums by adding it to existing sums. The largest sum we can achieve that doesn't exceed half the total gives us the closest partition to equal, minimising the leftover stone weight. + + - approach_name: Boolean Array DP + is_optimal: true + code: | + def last_stone_weight_ii(stones: list[int]) -> int: + total = sum(stones) + target = total // 2 + + # dp[i] = True if sum i is achievable + dp = [False] * (target + 1) + dp[0] = True # Empty subset has sum 0 + + for stone in stones: + # Iterate backwards to avoid using same stone twice + for s in range(target, stone - 1, -1): + if dp[s - stone]: + dp[s] = True + + # Find largest achievable sum + for s in range(target, -1, -1): + if dp[s]: + return total - 2 * s + + return total # Fallback (shouldn't reach here) + explanation: | + **Time Complexity:** O(n × S) — Same as set-based approach. + + **Space Complexity:** O(S) — Boolean array of size S/2 + 1. + + This uses a classic 0/1 knapsack boolean array. The key is iterating **backwards** when updating — this ensures each stone is only counted once per subset. 
If we iterated forwards, we'd potentially add the same stone multiple times. + + - approach_name: Brute Force (Exponential) + is_optimal: false + code: | + def last_stone_weight_ii(stones: list[int]) -> int: + def find_min_diff(index: int, sum1: int, sum2: int) -> int: + # Base case: all stones assigned + if index == len(stones): + return abs(sum1 - sum2) + + # Try putting current stone in group 1 or group 2 + put_in_group1 = find_min_diff(index + 1, sum1 + stones[index], sum2) + put_in_group2 = find_min_diff(index + 1, sum1, sum2 + stones[index]) + + return min(put_in_group1, put_in_group2) + + return find_min_diff(0, 0, 0) + explanation: | + **Time Complexity:** O(2^n) — Each stone has 2 choices, giving 2^n subsets. + + **Space Complexity:** O(n) — Recursion stack depth. + + This brute force tries all possible partitions by assigning each stone to either group. While correct, it's far too slow for n=30 (over a billion combinations). Included to illustrate the problem structure and why DP is necessary. diff --git a/backend/data/questions/last-stone-weight.yaml b/backend/data/questions/last-stone-weight.yaml new file mode 100644 index 0000000..117f3ff --- /dev/null +++ b/backend/data/questions/last-stone-weight.yaml @@ -0,0 +1,165 @@ +title: Last Stone Weight +slug: last-stone-weight +difficulty: easy +leetcode_id: 1046 +leetcode_url: https://leetcode.com/problems/last-stone-weight/ +categories: + - arrays + - heap +patterns: + - heap + +description: | + You are given an array of integers `stones` where `stones[i]` is the weight of the ith stone. + + We are playing a game with the stones. On each turn, we choose the **heaviest two stones** and smash them together. Suppose the heaviest two stones have weights `x` and `y` with `x <= y`. The result of this smash is: + + - If `x == y`, both stones are destroyed, and + - If `x != y`, the stone of weight `x` is destroyed, and the stone of weight `y` has new weight `y - x`. 
+ + At the end of the game, there is **at most one** stone left. + + Return *the weight of the last remaining stone*. If there are no stones left, return `0`. + +constraints: | + - `1 <= stones.length <= 30` + - `1 <= stones[i] <= 1000` + +examples: + - input: "stones = [2,7,4,1,8,1]" + output: "1" + explanation: "We combine 7 and 8 to get 1 so the array converts to [2,4,1,1,1], then we combine 2 and 4 to get 2 so the array converts to [2,1,1,1], then we combine 2 and 1 to get 1 so the array converts to [1,1,1], then we combine 1 and 1 to get 0 so the array converts to [1]. That's the value of the last stone." + - input: "stones = [1]" + output: "1" + explanation: "There is only one stone, so we return its weight directly." + +explanation: + intuition: | + Imagine you have a collection of rocks and you keep smashing the two largest ones together. After each collision, either both rocks disappear (if equal weight) or you're left with a smaller rock (the difference in weights). + + The key insight is that we always need quick access to the **two heaviest stones**. After smashing, we might need to put a new stone back and find the next two heaviest. This screams for a **max heap** (priority queue) — a data structure designed for exactly this: efficiently finding and removing the maximum element. + + Think of it like this: the heap is a "smart pile" that always keeps the biggest stone on top. When you grab the top two stones, smash them, and toss the result back in, the pile automatically rearranges to put the new biggest stone on top. + + Without a heap, you'd have to re-sort the array after each smash, which is inefficient. The heap lets you do the same thing in logarithmic time per operation. 
+ + approach: | + We solve this using a **Max Heap** approach: + + **Step 1: Build a max heap from the stones** + + - Python's `heapq` module implements a *min* heap, so we negate all values to simulate a max heap + - Use `heapify()` to convert the list into a heap in O(n) time + +   + + **Step 2: Simulate the smashing process** + + - While the heap has more than one stone: + - Pop the two largest stones (negate to get actual values) + - If they're not equal, push the difference back (negated) + - If they're equal, both are destroyed (don't push anything back) + +   + + **Step 3: Return the result** + + - If the heap is empty, return `0` (all stones destroyed each other) + - Otherwise, return the remaining stone's weight (negated back to positive) + +   + + This simulation directly follows the problem rules, and the heap ensures we always grab the two heaviest stones efficiently. + + common_pitfalls: + - title: Using Sort Instead of Heap + description: | + A tempting approach is to sort the array, take the two largest, and re-sort after each operation. While this works, it's inefficient: + + - Sorting takes O(n log n) per smash + - With up to n smashes, total time becomes O(n^2 log n) + + The heap approach does each operation in O(log n), giving O(n log n) total. + + For this problem's small constraints (n <= 30), sorting works fine, but heaps are the right tool for this pattern. + wrong_approach: "Re-sorting after each smash" + correct_approach: "Use a max heap for O(log n) insert/extract" + + - title: Forgetting Python Uses Min Heap + description: | + Python's `heapq` is a min heap, not a max heap. If you push positive values, `heappop()` gives you the *smallest* element, not the largest. + + The fix is to negate values: push `-stone` and negate again when popping. This "flips" the ordering so the largest original value becomes the smallest negated value (and thus pops first). 
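A short illustration of the negation trick, using only the standard `heapq` functions. The sample weights come from the first example, and the variable names are chosen just for this sketch.

```python
import heapq

stones = [2, 7, 4, 1, 8, 1]

# Raw values: heappop returns the lightest stone, which is not what we want.
min_heap = list(stones)
heapq.heapify(min_heap)
smallest = heapq.heappop(min_heap)

# Negated values: heappop returns the most negative entry,
# which is the heaviest stone once it is negated again.
max_heap = [-s for s in stones]
heapq.heapify(max_heap)
heaviest = -heapq.heappop(max_heap)
second_heaviest = -heapq.heappop(max_heap)

print(smallest, heaviest, second_heaviest)  # 1 8 7
```

The negation has to be applied consistently on both sides: once when pushing and once when popping.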
+ wrong_approach: "Using heapq with positive values" + correct_approach: "Negate values to simulate max heap" + + - title: Not Handling the Empty Heap Case + description: | + When all stones perfectly cancel out (e.g., `[2, 2]`), the heap becomes empty. You must check if the heap is empty before trying to return the last element. + + Returning `0` when the heap is empty is specified in the problem: "If there are no stones left, return `0`." + + key_takeaways: + - "**Max heap pattern**: When you need repeated access to the maximum (or minimum) element with insertions, use a heap" + - "**Python heap trick**: Negate values to convert `heapq` (min heap) into a max heap" + - "**Simulation problems**: Sometimes the solution is just carefully implementing the rules with the right data structure" + - "**Foundation for harder problems**: This pattern extends to problems like merging stones with costs, scheduling, or any greedy selection of extremes" + + time_complexity: "O(n log n). Building the heap takes O(n). There are at most `n - 1` smashes, each performing two pops and at most one push, and every heap operation takes O(log n) time." + space_complexity: "O(n). The heap holds all `n` stones initially, and building the list of negated values takes O(n) extra space." + +solutions: + - approach_name: Max Heap + is_optimal: true + code: | + import heapq + + def last_stone_weight(stones: list[int]) -> int: + # Negate values to simulate max heap (Python's heapq is min heap) + heap = [-s for s in stones] + heapq.heapify(heap) + + # Smash stones until one or none remain + while len(heap) > 1: + # Pop two heaviest stones (negate to get actual values) + first = -heapq.heappop(heap) + second = -heapq.heappop(heap) + + # If they're not equal, push the difference back + if first != second: + heapq.heappush(heap, -(first - second)) + + # Return last stone or 0 if none left + return -heap[0] if heap else 0 + explanation: | + **Time Complexity:** O(n log n) — Each of the up to n-1 smash operations involves two pops and at most one push, each O(log n). 
+ + **Space Complexity:** O(n) — The heap stores all n stones initially. + + We use a max heap to efficiently find and remove the two heaviest stones. After each smash, if there's a remainder, we push it back. The process continues until at most one stone remains. + + - approach_name: Sorting (Simulation) + is_optimal: false + code: | + def last_stone_weight(stones: list[int]) -> int: + # Keep smashing until one or none left + while len(stones) > 1: + # Sort to get heaviest at the end + stones.sort() + + # Pop two heaviest + first = stones.pop() + second = stones.pop() + + # If not equal, push remainder back + if first != second: + stones.append(first - second) + + # Return last stone or 0 if none + return stones[0] if stones else 0 + explanation: | + **Time Complexity:** O(n^2 log n) — We sort (O(n log n)) up to n times. + + **Space Complexity:** O(1) extra — We modify the input list in place (O(n) if we count the temporary memory used by the sort itself). + + This approach sorts the array each iteration to find the two heaviest stones. It's simpler to understand but less efficient. Works fine for the small constraints (n <= 30) but doesn't scale well. The heap approach is preferred for interview settings to demonstrate knowledge of efficient data structures. diff --git a/backend/data/questions/lemonade-change.yaml b/backend/data/questions/lemonade-change.yaml new file mode 100644 index 0000000..bfb5435 --- /dev/null +++ b/backend/data/questions/lemonade-change.yaml @@ -0,0 +1,177 @@ +title: Lemonade Change +slug: lemonade-change +difficulty: easy +leetcode_id: 860 +leetcode_url: https://leetcode.com/problems/lemonade-change/ +categories: + - arrays +patterns: + - greedy + +description: | + At a lemonade stand, each lemonade costs `$5`. Customers are standing in a queue to buy from you and order one at a time (in the order specified by `bills`). Each customer will only buy one lemonade and pay with either a `$5`, `$10`, or `$20` bill. 
You must provide the correct change to each customer so that the net transaction is that the customer pays `$5`. + + Note that you do not have any change in hand at first. + + Given an integer array `bills` where `bills[i]` is the bill the ith customer pays, return `true` *if you can provide every customer with the correct change*, or `false` *otherwise*. + +constraints: | + - `1 <= bills.length <= 10^5` + - `bills[i]` is either `5`, `10`, or `20` + +examples: + - input: "bills = [5,5,5,10,20]" + output: "true" + explanation: "From the first 3 customers, we collect three $5 bills. From the fourth customer, we collect a $10 bill and give back a $5. From the fifth customer, we give a $10 bill and a $5 bill. Since all customers got correct change, we output true." + - input: "bills = [5,5,10,10,20]" + output: "false" + explanation: "From the first two customers, we collect two $5 bills. For the next two customers, we collect a $10 bill and give back a $5 bill each. For the last customer, we cannot give the change of $15 back because we only have two $10 bills (no $5 bills left)." + +explanation: + intuition: | + Imagine you're actually running a lemonade stand with a cash register. You start with an empty register and customers line up to pay. + + The key insight is that **$5 bills are the most valuable** — not because of their face value, but because they're the most *versatile* for making change. A $5 bill can be used to: + - Give change for a $10 bill (need one $5) + - Give change for a $20 bill (need one $10 + one $5, OR three $5s) + + Think of it like this: when a customer pays with $20, you have two options for the $15 change: + 1. One $10 + one $5 (preferred — uses the less versatile $10) + 2. Three $5 bills (backup — depletes your precious $5s) + + The greedy choice is always to **preserve your $5 bills** when possible. 
Use $10 bills first when giving change for $20, because $10 bills can only be used for one purpose (change for $20), while $5 bills can be used for both $10 and $20 transactions. + + approach: | + We solve this using a **Greedy Simulation**: + + **Step 1: Initialise counters** + + - `fives`: Count of $5 bills in hand, starts at `0` + - `tens`: Count of $10 bills in hand, starts at `0` + - We don't need to track $20 bills — they can never be used as change + +   + + **Step 2: Process each customer in order** + + - **If customer pays $5**: No change needed. Increment `fives` by 1 + - **If customer pays $10**: Need to give $5 change. If `fives == 0`, return `false`. Otherwise, decrement `fives`, increment `tens` + - **If customer pays $20**: Need to give $15 change. Try the greedy choice first: + - If we have at least one $10 and one $5: use them (decrement both) + - Else if we have at least three $5s: use them (decrement `fives` by 3) + - Else: return `false` — we can't make change + +   + + **Step 3: Return the result** + + - If we process all customers successfully, return `true` + +   + + The greedy strategy works because using a $10 bill when available always leaves us in a better (or equal) position than using three $5 bills. + + common_pitfalls: + - title: Not Prioritising $10 Bills for $20 Change + description: | + When giving change for $20, some might randomly choose between using three $5s or one $10 + one $5. This can lead to failure. + + For example, with `bills = [5,5,5,5,10,20,10]`: + - When the $20 arrives, you hold three $5s and one $10 + - If you pay the $15 change with three $5s, you have no $5 left for the final $10 customer and fail + - If you pay with $10 + $5 instead, you keep two $5s and can serve the final customer + + Always prefer using $10 bills for $20 change — $5 bills are more versatile. + wrong_approach: "Random or first-available bill selection" + correct_approach: "Greedy: prefer $10 + $5 over three $5s" + + - title: Tracking $20 Bills + description: | + It's tempting to track all bill types, but $20 bills are useless for making change. 
You can never give a $20 bill back to a customer (since the most change needed is $15). + + Tracking $20 bills wastes space and adds unnecessary complexity. + wrong_approach: "Maintaining a counter for $20 bills" + correct_approach: "Only track $5 and $10 bills" + + - title: Forgetting the Empty Register Start + description: | + The problem states you start with no change. If the first customer pays with anything other than $5, you immediately fail. + + For input `bills = [10,5,5,5,5]`, the answer is `false` because you can't give change to the very first customer. + wrong_approach: "Assuming some initial change is available" + correct_approach: "Start with fives = 0 and tens = 0" + + key_takeaways: + - "**Greedy pattern**: When multiple valid choices exist, prefer the one that keeps more options open (preserve $5 bills)" + - "**Simulation**: Sometimes the best approach is to simulate the process step by step" + - "**Track only what matters**: $20 bills are never used as change, so don't track them" + - "**Order matters**: The greedy choice ($10 + $5 over three $5s) ensures we handle future customers optimally" + + time_complexity: "O(n). We process each customer exactly once with O(1) operations per customer." + space_complexity: "O(1). We only use two integer counters (`fives` and `tens`) regardless of input size." 
+ +solutions: + - approach_name: Greedy Simulation + is_optimal: true + code: | + def lemonade_change(bills: list[int]) -> bool: + # Track only $5 and $10 bills (we never use $20 for change) + fives = 0 + tens = 0 + + for bill in bills: + if bill == 5: + # No change needed, just collect the $5 + fives += 1 + + elif bill == 10: + # Need to give $5 change + if fives == 0: + return False # Can't make change + fives -= 1 + tens += 1 + + else: # bill == 20 + # Need to give $15 change + # Greedy: prefer using $10 + $5 to preserve $5 bills + if tens > 0 and fives > 0: + tens -= 1 + fives -= 1 + elif fives >= 3: + fives -= 3 + else: + return False # Can't make change + + return True # Successfully served all customers + explanation: | + **Time Complexity:** O(n) — Single pass through the bills array. + + **Space Complexity:** O(1) — Only two counters used. + + We simulate serving each customer, making the greedy choice to preserve $5 bills when possible. The key insight is that $10 + $5 is always preferred over three $5s for $20 change, because $5 bills are needed for both $10 and $20 transactions while $10 bills are only useful for $20 transactions. + + - approach_name: Brute Force Simulation + is_optimal: false + code: | + def lemonade_change(bills: list[int]) -> bool: + # Track all bills (including $20, though unnecessary) + cash = {5: 0, 10: 0, 20: 0} + + for bill in bills: + cash[bill] += 1 # Receive payment + change_needed = bill - 5 + + # Try to make change using largest bills first + for denomination in [20, 10, 5]: + while change_needed >= denomination and cash[denomination] > 0: + change_needed -= denomination + cash[denomination] -= 1 + + if change_needed > 0: + return False # Couldn't make exact change + + return True + explanation: | + **Time Complexity:** O(n) — Still linear, but with more operations per customer. + + **Space Complexity:** O(1) — Fixed-size dictionary. 
+ + This approach uses a general change-making algorithm: try to use the largest bills first. While it works, it's unnecessarily complex for this specific problem. It also tracks $20 bills (which are never used) and uses a loop where direct conditionals suffice. The greedy simulation above is cleaner and more efficient in practice. diff --git a/backend/data/questions/letter-combinations-of-a-phone-number.yaml b/backend/data/questions/letter-combinations-of-a-phone-number.yaml new file mode 100644 index 0000000..1fb3509 --- /dev/null +++ b/backend/data/questions/letter-combinations-of-a-phone-number.yaml @@ -0,0 +1,211 @@ +title: Letter Combinations of a Phone Number +slug: letter-combinations-of-a-phone-number +difficulty: medium +leetcode_id: 17 +leetcode_url: https://leetcode.com/problems/letter-combinations-of-a-phone-number/ +categories: + - strings + - hash-tables + - recursion +patterns: + - backtracking + +description: | + Given a string containing digits from `2-9` inclusive, return all possible letter combinations that the number could represent. Return the answer in **any order**. + + A mapping of digits to letters (just like on the telephone buttons) is given below. Note that `1` does not map to any letters. + + | Digit | Letters | + |-------|---------| + | 2 | a, b, c | + | 3 | d, e, f | + | 4 | g, h, i | + | 5 | j, k, l | + | 6 | m, n, o | + | 7 | p, q, r, s | + | 8 | t, u, v | + | 9 | w, x, y, z | + +constraints: | + - `0 <= digits.length <= 4` + - `digits[i]` is a digit in the range `['2', '9']` + +examples: + - input: 'digits = "23"' + output: '["ad","ae","af","bd","be","bf","cd","ce","cf"]' + explanation: "Digit 2 maps to 'abc' and digit 3 maps to 'def'. Combining each letter from 2 with each letter from 3 gives 9 combinations." + - input: 'digits = ""' + output: "[]" + explanation: "Empty input returns an empty list." + - input: 'digits = "2"' + output: '["a","b","c"]' + explanation: "Digit 2 maps to 'abc', so we return all three letters." 
+ +explanation: + intuition: | + Think of this problem like an old-school phone where you had to press buttons multiple times to type letters. Each digit opens up a **set of choices**, and we need to explore every possible combination of those choices. + + Imagine you're at a crossroads where each path branches into multiple smaller paths. For the input `"23"`: + - First, you see three paths: `a`, `b`, `c` (from digit `2`) + - From each of those paths, you see three more paths: `d`, `e`, `f` (from digit `3`) + - You must walk down every possible route to collect all combinations + + This is a classic **backtracking** scenario: build a solution character by character, explore all possibilities at each step, and backtrack to try other options. + + The key insight is that we're essentially computing a **Cartesian product** of letter sets. For `n` digits where each digit maps to `k` letters on average, we'll generate roughly `k^n` combinations — and we need to visit them all. + + approach: | + We solve this using **Backtracking (DFS)**: + + **Step 1: Handle the edge case** + + - If the input `digits` is empty, return an empty list immediately + - This avoids unnecessary processing and edge case bugs + +   + + **Step 2: Create the digit-to-letter mapping** + + - Build a dictionary mapping each digit (`'2'` through `'9'`) to its corresponding letters + - Example: `'2'` → `'abc'`, `'7'` → `'pqrs'` + +   + + **Step 3: Define a recursive backtracking function** + + - `backtrack(index, current_combination)`: + - `index`: which digit we're currently processing + - `current_combination`: the string built so far + +   + + **Step 4: Base case — combination complete** + + - If `index == len(digits)`, we've processed all digits + - Add `current_combination` to our results list + +   + + **Step 5: Recursive case — explore all letters for current digit** + + - Get the letters corresponding to `digits[index]` + - For each letter: + - Append it to `current_combination` + - Recursively 
call `backtrack(index + 1, ...)` + - The recursion naturally "backtracks" when it returns + +   + + **Step 6: Return all collected combinations** + + - After the recursion completes, return the results list + + common_pitfalls: + - title: Forgetting the Empty Input Case + description: | + If `digits = ""`, you should return `[]`, not `[""]`. + + A common mistake is initializing the result with an empty string and building from there, which would incorrectly return `[""]` for empty input. + + Always check for empty input at the start and return an empty list. + wrong_approach: "Returning [''] for empty input" + correct_approach: "Check if digits is empty and return [] immediately" + + - title: Using Iteration Instead of Backtracking + description: | + While you can solve this iteratively by building combinations level by level, it's harder to visualise and more error-prone. + + The iterative approach works but misses the opportunity to practice the fundamental backtracking pattern that's essential for harder problems like N-Queens, permutations, and subsets. + wrong_approach: "Complex iterative logic with nested loops" + correct_approach: "Clean recursive backtracking with clear base case" + + - title: String Concatenation in Loops + description: | + In Python, repeatedly concatenating strings with `+` in a loop creates new string objects each time, leading to O(n^2) behaviour. + + For this problem with `digits.length <= 4`, it's not a performance issue. But for larger inputs, use a list and `''.join()` at the end. 
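As a sketch of the list-and-join variant, the recursion can share one mutable buffer and undo each choice explicitly. The function name `letter_combinations_join` and the `path` buffer are illustrative assumptions, not names used elsewhere in this file.

```python
def letter_combinations_join(digits: str) -> list[str]:
    # Empty input yields no combinations at all
    if not digits:
        return []

    phone_map = {
        '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
        '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'
    }

    result: list[str] = []
    path: list[str] = []  # shared buffer of chosen letters

    def backtrack(index: int) -> None:
        # One letter has been chosen per digit: join once and record it
        if index == len(digits):
            result.append("".join(path))
            return

        for letter in phone_map[digits[index]]:
            path.append(letter)   # make the choice
            backtrack(index + 1)  # explore
            path.pop()            # unmake the choice

    backtrack(0)
    return result


print(letter_combinations_join("23"))
```

Here the undo step is visible as `path.pop()`, whereas passing `current + letter` into the recursive call undoes the choice implicitly.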
+ wrong_approach: "current = current + letter in tight loops" + correct_approach: "Use list append and join, or accept small overhead for clarity" + + key_takeaways: + - "**Backtracking template**: This problem demonstrates the core backtracking pattern — make a choice, explore, unmake the choice (implicitly via recursion)" + - "**Cartesian product**: Combining elements from multiple sets is a fundamental operation that backtracking handles elegantly" + - "**Hash map for mapping**: Using a dictionary to map digits to letters keeps the code clean and extensible" + - "**Foundation for harder problems**: This exact pattern scales to permutations, combinations, subsets, and constraint satisfaction problems" + + time_complexity: "O(4^n * n) where `n` is the length of `digits`. In the worst case (all 7s or 9s), each digit maps to 4 letters. We generate up to 4^n combinations, and each combination takes O(n) time to build." + space_complexity: "O(n) for the recursion stack depth, not counting the output. The maximum recursion depth equals the number of digits." 
+ +solutions: + - approach_name: Backtracking (DFS) + is_optimal: true + code: | + def letter_combinations(digits: str) -> list[str]: + # Edge case: empty input + if not digits: + return [] + + # Mapping of digits to letters (like a phone keypad) + phone_map = { + '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl', + '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz' + } + + result = [] + + def backtrack(index: int, current: str) -> None: + # Base case: we've processed all digits + if index == len(digits): + result.append(current) + return + + # Get letters for current digit + letters = phone_map[digits[index]] + + # Try each letter and recurse + for letter in letters: + backtrack(index + 1, current + letter) + + # Start backtracking from index 0 with empty string + backtrack(0, "") + return result + explanation: | + **Time Complexity:** O(4^n * n) — We generate up to 4^n combinations (when digits are 7 or 9), and building each string takes O(n). + + **Space Complexity:** O(n) — Recursion stack depth equals the number of digits. + + The backtracking approach naturally explores all paths in the decision tree. Each recursive call handles one digit, trying all its letters before returning. This pattern is fundamental to many combinatorial problems. 
+ + - approach_name: Iterative (BFS-like) + is_optimal: false + code: | + def letter_combinations(digits: str) -> list[str]: + # Edge case: empty input + if not digits: + return [] + + phone_map = { + '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl', + '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz' + } + + # Start with empty combination + result = [""] + + # Process each digit + for digit in digits: + letters = phone_map[digit] + # Build new combinations by appending each letter + new_result = [] + for combination in result: + for letter in letters: + new_result.append(combination + letter) + result = new_result + + return result + explanation: | + **Time Complexity:** O(4^n * n) — Same as backtracking, we still generate all combinations. + + **Space Complexity:** O(4^n) — We store all intermediate combinations at each level. + + This iterative approach builds combinations level by level. While it works, it uses more space than backtracking and doesn't teach the fundamental recursive pattern. It's included to show an alternative perspective. diff --git a/backend/data/questions/lfu-cache.yaml b/backend/data/questions/lfu-cache.yaml new file mode 100644 index 0000000..b255bb0 --- /dev/null +++ b/backend/data/questions/lfu-cache.yaml @@ -0,0 +1,342 @@ +title: LFU Cache +slug: lfu-cache +difficulty: hard +leetcode_id: 460 +leetcode_url: https://leetcode.com/problems/lfu-cache/ +categories: + - hash-tables + - linked-lists +patterns: + - heap + +description: | + Design and implement a data structure for a **Least Frequently Used (LFU)** cache. + + Implement the `LFUCache` class: + + - `LFUCache(int capacity)` Initialises the object with the `capacity` of the data structure. + - `int get(int key)` Gets the value of the `key` if the `key` exists in the cache. Otherwise, returns `-1`. + - `void put(int key, int value)` Update the value of the `key` if present, or inserts the `key` if not already present. 
When the cache reaches its `capacity`, it should invalidate and remove the **least frequently used** key before inserting a new item. For this problem, when there is a **tie** (i.e., two or more keys with the same frequency), the **least recently used** key would be invalidated. + + To determine the least frequently used key, a **use counter** is maintained for each key in the cache. The key with the smallest **use counter** is the least frequently used key. + + When a key is first inserted into the cache, its **use counter** is set to `1` (due to the `put` operation). The **use counter** for a key in the cache is incremented when either a `get` or `put` operation is called on it. + + The functions `get` and `put` must each run in **O(1)** average time complexity. + +constraints: | + - `1 <= capacity <= 10^4` + - `0 <= key <= 10^5` + - `0 <= value <= 10^9` + - At most `2 * 10^5` calls will be made to `get` and `put`. + +examples: + - input: | + ["LFUCache", "put", "put", "get", "put", "get", "get", "put", "get", "get", "get"] + [[2], [1, 1], [2, 2], [1], [3, 3], [2], [3], [4, 4], [1], [3], [4]] + output: "[null, null, null, 1, null, -1, 3, null, -1, 3, 4]" + explanation: | + LFUCache lfu = new LFUCache(2); + lfu.put(1, 1); // cache=[1,_], cnt(1)=1 + lfu.put(2, 2); // cache=[2,1], cnt(2)=1, cnt(1)=1 + lfu.get(1); // return 1, cache=[1,2], cnt(1)=2 + lfu.put(3, 3); // 2 is the LFU key because cnt(2)=1 is smallest, invalidate 2 + lfu.get(2); // return -1 (not found) + lfu.get(3); // return 3, cache=[3,1], cnt(3)=2 + lfu.put(4, 4); // Both 1 and 3 have cnt=2, but 1 is LRU, invalidate 1 + lfu.get(1); // return -1 (not found) + lfu.get(3); // return 3, cnt(3)=3 + lfu.get(4); // return 4, cnt(4)=2 + +explanation: + intuition: | + Think of the LFU cache like a **library with limited shelf space**. Each book has two properties: how many times it's been checked out (frequency) and when it was last touched (recency). 
When the shelves are full and a new book arrives, you remove the book with the fewest checkouts. If two books tie on checkout count, you remove the one that was touched longer ago. + + The challenge is achieving O(1) operations. A naive approach might scan all items to find the minimum frequency, but that's O(n). The key insight is to **group items by their frequency** using a clever data structure combination: + + 1. **Hash map for key lookup** — Instant access to any item's data and frequency + 2. **Frequency buckets** — Group all items with the same frequency together + 3. **Ordered list within each bucket** — Track recency order for tie-breaking + + When an item's frequency increases (from access), we simply move it from one bucket to the next. When we need to evict, we go to the lowest frequency bucket and remove the oldest item (the tail of that bucket's list). + + The trick to maintaining O(1) is tracking the `min_freq` variable. It only ever increases by 1 (when items are accessed) or resets to 1 (when new items are inserted). We never need to search for the minimum. 
+ + approach: | + We solve this using **Two Hash Maps + Doubly-Linked Lists**: + + **Step 1: Define the data structures** + + - `key_to_node`: Hash map from key to node (stores value, frequency, and list position) + - `freq_to_list`: Hash map from frequency to a doubly-linked list of nodes with that frequency + - `min_freq`: Integer tracking the current minimum frequency in the cache + - `capacity`: Maximum number of items the cache can hold + - `size`: Current number of items in the cache + +   + + **Step 2: Implement the `get` operation** + + - If key doesn't exist, return `-1` + - If key exists: + - Remove the node from its current frequency list + - Increment the node's frequency + - Add the node to the new frequency list (at the head, marking it as most recently used) + - Update `min_freq` if the old frequency list is now empty and was the minimum + - Return the value + +   + + **Step 3: Implement the `put` operation** + + - If capacity is 0, do nothing + - If key exists: + - Update the value + - Call the same "touch" logic as `get` to update frequency + - If key doesn't exist: + - If cache is at capacity, evict the LFU item (tail of `freq_to_list[min_freq]`) + - Create a new node with frequency 1 + - Add to `key_to_node` and `freq_to_list[1]` + - Reset `min_freq` to 1 (new items always have the lowest possible frequency) + +   + + **Step 4: Implement helper for moving nodes between frequency lists** + + - Remove node from old frequency's list + - If that list becomes empty and it was `min_freq`, increment `min_freq` + - Add node to new frequency's list at the head (most recently used position) + +   + + Using doubly-linked lists allows O(1) removal from anywhere and O(1) insertion at head. The hash maps provide O(1) key lookup and O(1) access to any frequency bucket. + + common_pitfalls: + - title: Using a Min-Heap for Frequency Tracking + description: | + A natural instinct is to use a min-heap to always know the minimum frequency. 
However, heaps have O(log n) operations for insertion and deletion. + + The problem requires O(1) average time. Instead, track `min_freq` as a simple integer that only changes in predictable ways: it resets to 1 on insert, and may increment by 1 when we access items and empty a frequency bucket. + wrong_approach: "Min-heap to find lowest frequency" + correct_approach: "Track min_freq integer, only increments or resets to 1" + + - title: Not Handling the Tie-Breaker Correctly + description: | + When multiple keys have the same frequency, the **least recently used** among them should be evicted. Within each frequency bucket, you need an ordered structure. + + Using a set or unordered collection loses recency information. A doubly-linked list with newest items at the head and oldest at the tail provides O(1) access to the LRU item for eviction. + wrong_approach: "Set or unordered collection for frequency groups" + correct_approach: "Doubly-linked list with head=MRU, tail=LRU" + + - title: Forgetting to Update min_freq on Access + description: | + When you access an item and increment its frequency, you might empty its old frequency bucket. If that bucket was the minimum, you need to update `min_freq`. + + For example, if `min_freq=2` and the only item with frequency 2 gets accessed (now frequency 3), `min_freq` should become 3. Forgetting this leaves `min_freq` pointing at an empty bucket, so the next eviction has nothing to remove. + wrong_approach: "Only update min_freq on eviction" + correct_approach: "Check if old frequency bucket is empty after access" + + - title: Zero Capacity Edge Case + description: | + The constraints here guarantee `capacity >= 1`, so this case cannot occur in this version of the problem, but a robust implementation should still handle a capacity of 0. With capacity 0, all `put` operations should be no-ops and all `get` operations should return -1. + + Check `if capacity == 0` at the start of `put`, before any eviction logic runs.
+ + key_takeaways: + - "**Compound data structures**: Complex cache problems often require combining multiple data structures (hash maps + linked lists) to achieve O(1) for different operations" + - "**Frequency bucketing**: Grouping items by frequency and tracking the minimum avoids expensive searches" + - "**Doubly-linked lists for O(1) removal**: When you need to remove items from the middle of a sequence in O(1), doubly-linked lists are the answer" + - "**LFU vs LRU**: LRU only tracks recency; LFU tracks frequency with recency as tie-breaker. LFU is more complex but can be more cache-efficient for certain access patterns" + + time_complexity: "O(1) for both `get` and `put` operations. Hash map lookups, linked list insertions/deletions, and frequency updates are all constant time." + space_complexity: "O(capacity). We store at most `capacity` items, each with constant overhead for hash map entries and list nodes." + +solutions: + - approach_name: Two Hash Maps with Doubly-Linked Lists + is_optimal: true + code: | + class Node: + """Doubly-linked list node storing key, value, and frequency.""" + def __init__(self, key: int, value: int): + self.key = key + self.value = value + self.freq = 1 # New items start with frequency 1 + self.prev = None + self.next = None + + + class DoublyLinkedList: + """Doubly-linked list with sentinel nodes for O(1) operations.""" + def __init__(self): + # Sentinel nodes simplify edge cases + self.head = Node(0, 0) # Dummy head (MRU side) + self.tail = Node(0, 0) # Dummy tail (LRU side) + self.head.next = self.tail + self.tail.prev = self.head + self.size = 0 + + def add_first(self, node: Node) -> None: + """Add node right after head (most recently used position).""" + node.next = self.head.next + node.prev = self.head + self.head.next.prev = node + self.head.next = node + self.size += 1 + + def remove(self, node: Node) -> None: + """Remove a node from anywhere in the list in O(1).""" + node.prev.next = node.next + node.next.prev = 
node.prev + self.size -= 1 + + def remove_last(self) -> Node: + """Remove and return the tail node (least recently used).""" + if self.size == 0: + return None + last = self.tail.prev + self.remove(last) + return last + + def is_empty(self) -> bool: + return self.size == 0 + + + class LFUCache: + def __init__(self, capacity: int): + self.capacity = capacity + self.size = 0 + self.min_freq = 0 + # Maps key -> Node + self.key_to_node: dict[int, Node] = {} + # Maps frequency -> DoublyLinkedList of nodes with that frequency + self.freq_to_list: dict[int, DoublyLinkedList] = {} + + def _update_freq(self, node: Node) -> None: + """Move node from current frequency bucket to next frequency bucket.""" + freq = node.freq + # Remove from current frequency list + self.freq_to_list[freq].remove(node) + + # If this was the min frequency list and it's now empty, increment min_freq + if freq == self.min_freq and self.freq_to_list[freq].is_empty(): + self.min_freq += 1 + + # Increment frequency and add to new list + node.freq += 1 + if node.freq not in self.freq_to_list: + self.freq_to_list[node.freq] = DoublyLinkedList() + self.freq_to_list[node.freq].add_first(node) + + def get(self, key: int) -> int: + if key not in self.key_to_node: + return -1 + + node = self.key_to_node[key] + # Update frequency (this also marks it as most recently used) + self._update_freq(node) + return node.value + + def put(self, key: int, value: int) -> None: + if self.capacity == 0: + return + + if key in self.key_to_node: + # Key exists: update value and frequency + node = self.key_to_node[key] + node.value = value + self._update_freq(node) + else: + # New key: check if we need to evict + if self.size >= self.capacity: + # Evict LFU (and LRU among ties) + lfu_list = self.freq_to_list[self.min_freq] + evicted = lfu_list.remove_last() + del self.key_to_node[evicted.key] + self.size -= 1 + + # Insert new node with frequency 1 + new_node = Node(key, value) + self.key_to_node[key] = new_node + if 1 not in 
self.freq_to_list: + self.freq_to_list[1] = DoublyLinkedList() + self.freq_to_list[1].add_first(new_node) + self.min_freq = 1 # New items always have the minimum frequency + self.size += 1 + explanation: | + **Time Complexity:** O(1) for both `get` and `put`. + + - Hash map lookups: O(1) + - Doubly-linked list add/remove: O(1) + - Frequency bucket access: O(1) + + **Space Complexity:** O(capacity). + + We maintain at most `capacity` nodes, each stored once in `key_to_node` and once in a frequency list. The number of frequency buckets is bounded by the number of operations, but nodes are shared references. + + - approach_name: OrderedDict per Frequency (Python-Specific) + is_optimal: true + code: | + from collections import OrderedDict, defaultdict + + + class LFUCache: + def __init__(self, capacity: int): + self.capacity = capacity + self.min_freq = 0 + # Maps key -> (value, frequency) + self.key_to_val_freq: dict[int, tuple[int, int]] = {} + # Maps frequency -> OrderedDict of keys (maintains insertion order) + # OrderedDict gives us O(1) move_to_end and popitem + self.freq_to_keys: dict[int, OrderedDict] = defaultdict(OrderedDict) + + def _update_freq(self, key: int) -> None: + """Increment frequency of key and move to appropriate bucket.""" + value, freq = self.key_to_val_freq[key] + + # Remove from current frequency bucket + del self.freq_to_keys[freq][key] + + # Update min_freq if we emptied the minimum bucket + if not self.freq_to_keys[freq] and freq == self.min_freq: + self.min_freq += 1 + + # Add to next frequency bucket + new_freq = freq + 1 + self.freq_to_keys[new_freq][key] = None # Value doesn't matter + self.key_to_val_freq[key] = (value, new_freq) + + def get(self, key: int) -> int: + if key not in self.key_to_val_freq: + return -1 + + self._update_freq(key) + return self.key_to_val_freq[key][0] + + def put(self, key: int, value: int) -> None: + if self.capacity == 0: + return + + if key in self.key_to_val_freq: + # Update existing key + _, freq = 
self.key_to_val_freq[key] + self.key_to_val_freq[key] = (value, freq) + self._update_freq(key) + else: + # Evict if at capacity + if len(self.key_to_val_freq) >= self.capacity: + # popitem(last=False) removes oldest (LRU) from min freq bucket + evicted_key, _ = self.freq_to_keys[self.min_freq].popitem(last=False) + del self.key_to_val_freq[evicted_key] + + # Insert new key with frequency 1 + self.key_to_val_freq[key] = (value, 1) + self.freq_to_keys[1][key] = None + self.min_freq = 1 + explanation: | + **Time Complexity:** O(1) average for both `get` and `put`. + + Python's `OrderedDict` maintains insertion order and provides O(1) `popitem()` and `move_to_end()`. We use it as a pseudo-linked-list where order represents recency. + + **Space Complexity:** O(capacity). + + This approach is more Pythonic and concise, leveraging built-in data structures. The trade-off is that it's language-specific and relies on Python's `OrderedDict` implementation details. diff --git a/backend/data/questions/linked-list-cycle.yaml b/backend/data/questions/linked-list-cycle.yaml new file mode 100644 index 0000000..4fc5d78 --- /dev/null +++ b/backend/data/questions/linked-list-cycle.yaml @@ -0,0 +1,190 @@ +title: Linked List Cycle +slug: linked-list-cycle +difficulty: easy +leetcode_id: 141 +leetcode_url: https://leetcode.com/problems/linked-list-cycle/ +categories: + - linked-lists + - two-pointers + - hash-tables +patterns: + - fast-slow-pointers + +description: | + Given `head`, the head of a linked list, determine if the linked list has a cycle in it. + + There is a cycle in a linked list if there is some node in the list that can be reached again by continuously following the `next` pointer. Internally, `pos` is used to denote the index of the node that tail's `next` pointer is connected to. **Note that `pos` is not passed as a parameter**. + + Return `true` *if there is a cycle in the linked list*. Otherwise, return `false`. 
+ +constraints: | + - `0 <= number of nodes <= 10^4` + - `-10^5 <= Node.val <= 10^5` + - `pos` is `-1` or a valid index in the linked list + +examples: + - input: "head = [3,2,0,-4], pos = 1" + output: "true" + explanation: "There is a cycle in the linked list, where the tail connects to the 1st node (0-indexed)." + - input: "head = [1,2], pos = 0" + output: "true" + explanation: "There is a cycle in the linked list, where the tail connects to the 0th node." + - input: "head = [1], pos = -1" + output: "false" + explanation: "There is no cycle in the linked list." + +explanation: + intuition: | + Imagine two runners on a circular track: one runs twice as fast as the other. + + If the track is truly circular (has a cycle), the fast runner will eventually "lap" the slow runner and they'll meet. If the track has an end (no cycle), the fast runner will simply reach the finish line without ever meeting the slow runner again. + + This is the core insight behind **Floyd's Cycle Detection Algorithm** (also called the "tortoise and hare" algorithm). We use two pointers moving at different speeds: + - The **slow pointer** moves one step at a time + - The **fast pointer** moves two steps at a time + + If there's a cycle, the fast pointer will eventually catch up to the slow pointer from behind (they'll meet inside the cycle). If there's no cycle, the fast pointer will reach `null` and we know the list terminates. + + Why does this work? Once both pointers enter the cycle, the fast pointer gains one node on the slow pointer with each iteration. Since the cycle has finite length, they're guaranteed to meet. 
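
The gap argument above can be checked with a small simulation. This is only a sketch under an assumed model: positions inside the cycle are plain integers `0..cycle_length-1` rather than real nodes, and `steps_until_meeting` is a hypothetical helper name, not part of the solution.

```python
def steps_until_meeting(cycle_length: int, slow_pos: int, fast_pos: int) -> int:
    """Count iterations until slow and fast share a position inside the cycle.

    Assumption: both pointers are already inside a cycle of `cycle_length`
    nodes, and positions are indices in 0..cycle_length-1.
    """
    steps = 0
    while slow_pos != fast_pos:
        slow_pos = (slow_pos + 1) % cycle_length  # slow advances one node
        fast_pos = (fast_pos + 2) % cycle_length  # fast advances two nodes
        steps += 1
    return steps


# Forward distance from fast to slow; it shrinks by exactly one per iteration.
cycle_length, slow_pos, fast_pos = 5, 0, 3
gap = (slow_pos - fast_pos) % cycle_length
meeting_steps = steps_until_meeting(cycle_length, slow_pos, fast_pos)
```

In this model the number of iterations equals the initial forward gap from `fast` to `slow`, which is always smaller than the cycle length, so the loop cannot run forever.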
+ + approach: | + We solve this using **Floyd's Cycle Detection (Fast-Slow Pointers)**: + + **Step 1: Handle edge cases** + + - If the list is empty (`head` is `null`) or has only one node with no cycle, return `false` + +   + + **Step 2: Initialise two pointers** + + - `slow`: Starts at `head`, moves one node at a time + - `fast`: Starts at `head`, moves two nodes at a time + +   + + **Step 3: Traverse the list** + + - While `fast` and `fast.next` are not `null`: + - Move `slow` forward by one: `slow = slow.next` + - Move `fast` forward by two: `fast = fast.next.next` + - If `slow == fast`, we've found a cycle — return `true` + +   + + **Step 4: Return the result** + + - If the loop exits (fast reached `null`), there's no cycle — return `false` + +   + + The beauty of this approach is that it uses constant space while guaranteeing detection if a cycle exists. + + common_pitfalls: + - title: Using Extra Space with Hash Set + description: | + A straightforward approach is to use a hash set to track visited nodes: + - Traverse the list, adding each node to a set + - If you encounter a node already in the set, there's a cycle + + This works correctly with **O(n) time**, but uses **O(n) space** for the hash set. The follow-up asks for O(1) space, which the fast-slow pointer approach achieves. + wrong_approach: "Hash set to track visited nodes (O(n) space)" + correct_approach: "Fast-slow pointers (O(1) space)" + + - title: Checking Node Values Instead of References + description: | + A common mistake is comparing `slow.val == fast.val` instead of `slow == fast`. + + Node *values* can be duplicated (the constraint allows values from `-10^5` to `10^5`), but node *references* (memory addresses) are unique. Two different nodes might have the same value, so comparing values could give false positives. + + Always compare the node references themselves, not their values. 
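
A minimal sketch of this pitfall, assuming the same `ListNode` shape used in the solutions. The `detect_cycle` helper and its `compare_values` flag are hypothetical and exist only to contrast the two comparisons side by side.

```python
class ListNode:
    def __init__(self, val: int = 0, next: "ListNode | None" = None):
        self.val = val
        self.next = next


def detect_cycle(head: "ListNode | None", compare_values: bool) -> bool:
    """Fast-slow traversal; `compare_values=True` reproduces the buggy check."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if compare_values:
            if slow.val == fast.val:  # buggy: distinct nodes may share a value
                return True
        elif slow is fast:  # correct: same node object
            return True
    return False


# Acyclic list 7 -> 7 -> 7 -> None, where every node holds the same value.
third = ListNode(7)
second = ListNode(7, third)
first = ListNode(7, second)

buggy_result = detect_cycle(first, compare_values=True)     # false positive
correct_result = detect_cycle(first, compare_values=False)  # no cycle found
```

Because `ListNode` does not define `__eq__`, `slow == fast` and `slow is fast` behave the same here; `is` is used in the sketch only to make the identity check explicit.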
+ wrong_approach: "Comparing slow.val == fast.val" + correct_approach: "Comparing slow == fast (reference equality)" + + - title: Null Pointer Exceptions + description: | + When moving the fast pointer two steps, you must check both `fast` and `fast.next` before accessing `fast.next.next`. + + If `fast` is `null`, accessing `fast.next` throws an error. If `fast.next` is `null`, accessing `fast.next.next` throws an error. + + The loop condition `while fast and fast.next` ensures both checks are satisfied before moving. + wrong_approach: "Moving fast without null checks" + correct_approach: "Check fast and fast.next before moving" + + key_takeaways: + - "**Floyd's algorithm**: The fast-slow pointer technique detects cycles in O(n) time and O(1) space — a fundamental pattern for linked list problems" + - "**Why they meet**: In a cycle, the fast pointer gains one position per iteration on the slow pointer, guaranteeing they meet within one cycle length" + - "**Reference vs value**: Always compare node references, not values, when checking for the same node" + - "**Foundation for harder problems**: This same technique extends to finding the cycle start point (LeetCode 142) and finding the middle of a linked list" + + time_complexity: "O(n). In the worst case, both pointers traverse the entire list. If there's a cycle, they meet within O(n) steps." + space_complexity: "O(1). We only use two pointer variables (`slow` and `fast`), regardless of the list size." 
+ +solutions: + - approach_name: Floyd's Cycle Detection (Fast-Slow Pointers) + is_optimal: true + code: | + class ListNode: + def __init__(self, val: int = 0, next: 'ListNode | None' = None): + self.val = val + self.next = next + + def has_cycle(head: ListNode | None) -> bool: + # Handle empty list + if not head: + return False + + # Initialise slow and fast pointers + slow = head + fast = head + + # Traverse until fast reaches the end + while fast and fast.next: + # Move slow one step + slow = slow.next + # Move fast two steps + fast = fast.next.next + + # If they meet, there's a cycle + if slow == fast: + return True + + # Fast reached the end — no cycle + return False + explanation: | + **Time Complexity:** O(n) — Each node is visited at most twice (once by slow, potentially twice by fast). + + **Space Complexity:** O(1) — Only two pointer variables are used. + + The fast pointer moves twice as fast as the slow pointer. If there's a cycle, the fast pointer will eventually catch up to the slow pointer inside the cycle. If there's no cycle, the fast pointer reaches the end. + + - approach_name: Hash Set + is_optimal: false + code: | + class ListNode: + def __init__(self, val: int = 0, next: 'ListNode | None' = None): + self.val = val + self.next = next + + def has_cycle(head: ListNode | None) -> bool: + # Track visited nodes + visited: set[ListNode] = set() + + current = head + while current: + # If we've seen this node before, there's a cycle + if current in visited: + return True + + # Mark this node as visited + visited.add(current) + current = current.next + + # Reached the end — no cycle + return False + explanation: | + **Time Complexity:** O(n) — We traverse each node once. + + **Space Complexity:** O(n) — We store up to n node references in the hash set. + + This approach is intuitive: track every node you visit, and if you see the same node twice, there's a cycle. While correct, it uses extra space that the fast-slow pointer approach avoids. 
diff --git a/backend/data/questions/longest-common-prefix.yaml b/backend/data/questions/longest-common-prefix.yaml new file mode 100644 index 0000000..9632ae6 --- /dev/null +++ b/backend/data/questions/longest-common-prefix.yaml @@ -0,0 +1,226 @@ +title: Longest Common Prefix +slug: longest-common-prefix +difficulty: easy +leetcode_id: 14 +leetcode_url: https://leetcode.com/problems/longest-common-prefix/ +categories: + - strings + - arrays +patterns: + - two-pointers + +function_signature: "def longest_common_prefix(strs: list[str]) -> str:" + +test_cases: + visible: + - input: { strs: ["flower", "flow", "flight"] } + expected: "fl" + - input: { strs: ["dog", "racecar", "car"] } + expected: "" + hidden: + - input: { strs: ["a"] } + expected: "a" + - input: { strs: ["", "b"] } + expected: "" + - input: { strs: ["abc", "abc", "abc"] } + expected: "abc" + - input: { strs: ["ab", "a"] } + expected: "a" + - input: { strs: ["cir", "car"] } + expected: "c" + +description: | + Write a function to find the longest common prefix string amongst an array of strings. + + If there is no common prefix, return an empty string `""`. + +constraints: | + - `1 <= strs.length <= 200` + - `0 <= strs[i].length <= 200` + - `strs[i]` consists of only lowercase English letters if it is non-empty. + +examples: + - input: 'strs = ["flower","flow","flight"]' + output: '"fl"' + explanation: "The first two characters 'f' and 'l' are common to all three strings." + - input: 'strs = ["dog","racecar","car"]' + output: '""' + explanation: "There is no common prefix among the input strings." + +explanation: + intuition: | + Imagine you have a stack of papers, each with a word written on it. You want to find how many letters at the start of each word are exactly the same across all papers. + + Think of it like aligning all the words vertically by their first character: + ``` + f l o w e r + f l o w + f l i g h t + ``` + + You scan column by column from left to right. 
As long as every word has the same character in that column, you include it in your prefix. The moment you find a mismatch (like 'o' vs 'i' in column 3 above), you stop — everything before that point is your longest common prefix. + + The key insight is that the common prefix can only be as long as the **shortest string** in the array, and we can stop as soon as any character differs. + + approach: | + We solve this using a **Vertical Scanning** approach: + + **Step 1: Handle edge case** + + - If the input array is empty, return an empty string `""` + +   + + **Step 2: Iterate character by character** + + - Use the first string as a reference + - For each character position `i` in the first string, compare it against the character at position `i` in every other string + +   + + **Step 3: Check for mismatches or end of string** + + - If any string is shorter than position `i`, we've reached the end of that string — return the prefix found so far + - If any string has a different character at position `i`, we've found a mismatch — return the prefix found so far + +   + + **Step 4: Build the prefix** + + - If all strings match at position `i`, continue to the next position + - After checking all positions in the first string, return it entirely (it's the common prefix) + +   + + This approach efficiently scans vertically through all strings simultaneously, stopping at the first point of divergence. + + common_pitfalls: + - title: Forgetting the Empty Array Case + description: | + If the input array is empty, there are no strings to compare. Attempting to access `strs[0]` will cause an index error. + + Always check for an empty array first and return `""` immediately. + wrong_approach: "Directly accessing strs[0] without checking array length" + correct_approach: "Check if strs is empty before processing" + + - title: Index Out of Bounds on Shorter Strings + description: | + When comparing character by character, some strings may be shorter than others. 
For example, with `["ab", "a"]`, checking index 1 on the second string causes an error. + + Always verify that the current index is within bounds for each string before accessing it: `if i >= len(strs[j])`. + wrong_approach: "Accessing strs[j][i] without checking length" + correct_approach: "Check i < len(strs[j]) before accessing the character" + + - title: Using the Horizontal Scanning Inefficiently + description: | + A horizontal approach compares strings pairwise: find the common prefix of strings 1 and 2, then compare that result with string 3, and so on. + + While correct, this can be less efficient in practice. If the first two strings share a long prefix but string 3 is very different, you've done unnecessary work. Vertical scanning stops at the first column with a mismatch across all strings. + wrong_approach: "Pairwise comparison accumulating prefixes" + correct_approach: "Vertical scanning comparing all strings at each position" + + key_takeaways: + - "**Vertical scanning pattern**: When comparing multiple sequences, scanning position-by-position across all sequences simultaneously can be more efficient than pairwise comparison" + - "**Early termination**: Stop as soon as you find a mismatch or reach the end of any string — no need to process further" + - "**Use the shortest string**: The common prefix can never be longer than the shortest string, so checking bounds is essential" + - "**Foundation for string problems**: This pattern of character-by-character comparison appears in many string matching problems" + + time_complexity: "O(S), where S is the sum of all characters in all strings. In the worst case, all strings are identical and we compare every character." + space_complexity: "O(1). We only use a few variables for iteration, not counting the output string." 
+ +solutions: + - approach_name: Vertical Scanning + is_optimal: true + code: | + def longest_common_prefix(strs: list[str]) -> str: + # Handle empty input + if not strs: + return "" + + # Use the first string as reference + for i in range(len(strs[0])): + char = strs[0][i] + + # Compare this character with all other strings + for j in range(1, len(strs)): + # Check if we've reached the end of this string + # or if the characters don't match + if i >= len(strs[j]) or strs[j][i] != char: + # Return prefix up to (but not including) position i + return strs[0][:i] + + # All characters in first string matched all other strings + return strs[0] + explanation: | + **Time Complexity:** O(S) — where S is the sum of all characters in all strings. We compare each character at most once. + + **Space Complexity:** O(1) — only using index variables, not counting the output. + + We scan vertically through all strings at each character position. The moment we find any mismatch or reach the end of any string, we return what we've found so far. + + - approach_name: Horizontal Scanning + is_optimal: false + code: | + def longest_common_prefix(strs: list[str]) -> str: + # Handle empty input + if not strs: + return "" + + # Start with the first string as the initial prefix + prefix = strs[0] + + # Compare prefix with each subsequent string + for i in range(1, len(strs)): + # Shrink prefix until it matches the start of current string + while not strs[i].startswith(prefix): + # Remove last character from prefix + prefix = prefix[:-1] + # No common prefix exists + if not prefix: + return "" + + return prefix + explanation: | + **Time Complexity:** O(S) — where S is the sum of all characters. In the worst case, we compare all characters. + + **Space Complexity:** O(1) — only storing the prefix reference. + + This approach starts with the first string as the candidate prefix and progressively shortens it until it matches the beginning of each subsequent string. 
While correct, it may do more work than vertical scanning when early strings share a long prefix but later strings diverge early. + + - approach_name: Binary Search + is_optimal: false + code: | + def longest_common_prefix(strs: list[str]) -> str: + # Handle empty input + if not strs: + return "" + + def is_common_prefix(length: int) -> bool: + """Check if first 'length' chars of strs[0] is a prefix of all strings.""" + prefix = strs[0][:length] + return all(s.startswith(prefix) for s in strs) + + # Find the minimum string length + min_len = min(len(s) for s in strs) + + # Binary search for the longest valid prefix length + low, high = 0, min_len + + while low < high: + # Use upper middle to avoid infinite loop + mid = (low + high + 1) // 2 + + if is_common_prefix(mid): + # Prefix of this length works, try longer + low = mid + else: + # Prefix too long, try shorter + high = mid - 1 + + return strs[0][:low] + explanation: | + **Time Complexity:** O(S * log(m)) — where S is the sum of all characters and m is the minimum string length. Binary search runs log(m) iterations, each checking all strings. + + **Space Complexity:** O(1) — only using variables for binary search. + + This approach uses binary search on the length of the prefix. While theoretically interesting, it's generally slower in practice than vertical scanning because it may repeatedly check the same characters. Included to demonstrate how binary search can apply to string problems. 
diff --git a/backend/data/questions/longest-common-subsequence.yaml b/backend/data/questions/longest-common-subsequence.yaml new file mode 100644 index 0000000..0b4291f --- /dev/null +++ b/backend/data/questions/longest-common-subsequence.yaml @@ -0,0 +1,208 @@ +title: Longest Common Subsequence +slug: longest-common-subsequence +difficulty: medium +leetcode_id: 1143 +leetcode_url: https://leetcode.com/problems/longest-common-subsequence/ +categories: + - strings + - dynamic-programming +patterns: + - dynamic-programming + +description: | + Given two strings `text1` and `text2`, return *the length of their longest **common subsequence***. If there is no **common subsequence**, return `0`. + + A **subsequence** of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters. + + For example, `"ace"` is a subsequence of `"abcde"`. + + A **common subsequence** of two strings is a subsequence that is common to both strings. + +constraints: | + - `1 <= text1.length, text2.length <= 1000` + - `text1` and `text2` consist of only lowercase English characters. + +examples: + - input: 'text1 = "abcde", text2 = "ace"' + output: "3" + explanation: 'The longest common subsequence is "ace" and its length is 3.' + - input: 'text1 = "abc", text2 = "abc"' + output: "3" + explanation: 'The longest common subsequence is "abc" and its length is 3.' + - input: 'text1 = "abc", text2 = "def"' + output: "0" + explanation: "There is no such common subsequence, so the result is 0." + +explanation: + intuition: | + Imagine you're comparing two sequences of characters, trying to find the longest chain of letters that appears in both — not necessarily consecutively, but in the same relative order. + + Think of it like comparing two playlists of songs. 
You want to find the longest sequence of songs that appears in both playlists, where the songs appear in the same order (though not necessarily back-to-back). You can't rearrange songs — you can only skip ones that don't match. + + The **key insight** is that this problem has **optimal substructure**: if we know the LCS of smaller prefixes of both strings, we can build up to the answer for the full strings. When characters match, we extend our subsequence; when they don't, we take the better result from either excluding the last character of the first string or the second. + + This is a classic **dynamic programming** problem because: + 1. We can break it into overlapping subproblems (comparing prefixes of different lengths) + 2. The solution to larger problems depends on solutions to smaller ones + 3. We can store intermediate results to avoid redundant computation + + approach: | + We solve this using a **2D Dynamic Programming** approach with a table where `dp[i][j]` represents the length of the LCS of `text1[0:i]` and `text2[0:j]`. + + **Step 1: Create the DP table** + + - Create a 2D array `dp` of size `(m+1) x (n+1)` where `m = len(text1)` and `n = len(text2)` + - The extra row and column handle the base case of empty prefixes + - Initialise all values to `0` (the LCS of any string with an empty string is `0`) + +   + + **Step 2: Fill the table using the recurrence relation** + + - Iterate through each cell `dp[i][j]` for `i` from `1` to `m` and `j` from `1` to `n` + - If `text1[i-1] == text2[j-1]`: the characters match, so `dp[i][j] = dp[i-1][j-1] + 1` + - Otherwise: take the maximum of excluding one character from either string: `dp[i][j] = max(dp[i-1][j], dp[i][j-1])` + +   + + **Step 3: Return the result** + + - The answer is in `dp[m][n]`, representing the LCS of the complete strings + +   + + The recurrence works because when characters match, we've found a common element and extend the LCS of the previous prefixes. 
When they don't match, we take the best LCS we can get by ignoring one character from either string. + + common_pitfalls: + - title: Confusing Subsequence with Substring + description: | + A **substring** must be contiguous (consecutive characters), while a **subsequence** allows gaps. + + For `"abcde"` and `"ace"`: + - The longest common **substring** is `"a"` or `"c"` or `"e"` (length 1) + - The longest common **subsequence** is `"ace"` (length 3) + + Using a substring algorithm (like checking all contiguous windows) will give the wrong answer. LCS requires dynamic programming because we need to track non-contiguous matches. + wrong_approach: "Sliding window for contiguous matches" + correct_approach: "2D DP tracking all prefix combinations" + + - title: The Brute Force Exponential Trap + description: | + A naive approach might try all possible subsequences of one string and check if each exists in the other. + + For a string of length `n`, there are `2^n` possible subsequences. With constraints up to `1000` characters, `2^1000` operations is far beyond what any machine can perform. + + Plain recursion without memoisation has the same problem: it recomputes the same subproblems many times. The DP table (or a memoisation cache) ensures each subproblem is solved exactly once. + wrong_approach: "Generate all subsequences and check membership" + correct_approach: "Bottom-up DP with O(m*n) time" + + - title: Off-by-One Index Errors + description: | + The DP table has dimensions `(m+1) x (n+1)` to include the empty prefix base case. + + When comparing characters, use `text1[i-1]` and `text2[j-1]` (not `text1[i]` and `text2[j]`) because `dp[i][j]` represents prefixes of length `i` and `j`. + + A common mistake is using `text1[i]`, which raises an index-out-of-bounds error when `i = m` and otherwise compares the wrong characters. 
+ wrong_approach: "Compare text1[i] with text2[j] directly" + correct_approach: "Compare text1[i-1] with text2[j-1] when filling dp[i][j]" + + key_takeaways: + - "**Classic DP problem**: LCS is a foundational dynamic programming problem that appears in many variations (edit distance, diff algorithms, DNA sequence alignment)" + - "**2D table pattern**: When comparing two sequences, a 2D DP table where `dp[i][j]` represents the answer for prefixes of length `i` and `j` is a common technique" + - "**Optimal substructure**: Match = extend previous result by 1; no match = take the best of two subproblems" + - "**Space optimisation possible**: Since each row only depends on the previous row, you can reduce space from O(m*n) to O(min(m,n)) using rolling arrays" + + time_complexity: "O(m * n). We fill each cell of the `m x n` DP table exactly once, where `m` and `n` are the lengths of the two strings." + space_complexity: "O(m * n). We use a 2D array of size `(m+1) x (n+1)` to store intermediate results. This can be optimised to O(min(m, n)) using a rolling array since we only need the previous row." + +solutions: + - approach_name: 2D Dynamic Programming + is_optimal: true + code: | + def longest_common_subsequence(text1: str, text2: str) -> int: + m, n = len(text1), len(text2) + + # Create DP table with extra row/col for empty string base case + # dp[i][j] = LCS length of text1[0:i] and text2[0:j] + dp = [[0] * (n + 1) for _ in range(m + 1)] + + # Fill the table row by row + for i in range(1, m + 1): + for j in range(1, n + 1): + if text1[i - 1] == text2[j - 1]: + # Characters match: extend LCS from diagonal + dp[i][j] = dp[i - 1][j - 1] + 1 + else: + # No match: take best of excluding one char from either string + dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]) + + # Answer is LCS of complete strings + return dp[m][n] + explanation: | + **Time Complexity:** O(m * n) — We iterate through every cell in the DP table once. 
+ + **Space Complexity:** O(m * n) — We store the full 2D DP table. + + This bottom-up approach builds the solution systematically. Each cell depends only on already-computed cells (top, left, and diagonal), so we fill row by row. The final cell contains the answer for the complete strings. + + - approach_name: Space-Optimised DP (Rolling Array) + is_optimal: true + code: | + def longest_common_subsequence(text1: str, text2: str) -> int: + # Ensure text2 is the shorter string to minimise space + if len(text1) < len(text2): + text1, text2 = text2, text1 + + m, n = len(text1), len(text2) + + # Only keep two rows: previous and current + prev = [0] * (n + 1) + curr = [0] * (n + 1) + + for i in range(1, m + 1): + for j in range(1, n + 1): + if text1[i - 1] == text2[j - 1]: + # Match: extend from diagonal (prev row, prev column) + curr[j] = prev[j - 1] + 1 + else: + # No match: best of top (prev[j]) or left (curr[j-1]) + curr[j] = max(prev[j], curr[j - 1]) + + # Roll the arrays: current becomes previous for next iteration + prev, curr = curr, prev + + # Answer is in prev (after the swap) + return prev[n] + explanation: | + **Time Complexity:** O(m * n) — Same iteration as the 2D approach. + + **Space Complexity:** O(min(m, n)) — Only two arrays of length `n+1` are used. + + Since each row only depends on the immediately previous row, we can discard older rows. By swapping `prev` and `curr` after each row, we maintain a "rolling window" of just two rows. We also swap strings if needed to ensure we use the shorter length for our arrays. 
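As a rough illustration of the recurrence shared by both bottom-up solutions, the sketch below fills the full table for the first example (`"abcde"` and `"ace"`) and prints it row by row. The helper name `lcs_table` is hypothetical and is introduced only for this trace.

```python
def lcs_table(text1: str, text2: str) -> list[list[int]]:
    """Return the full (m+1) x (n+1) LCS table (illustrative helper, not part of the solutions)."""
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                # Match: extend the LCS of the two shorter prefixes
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # No match: best of dropping one character from either prefix
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


table = lcs_table("abcde", "ace")
for row in table:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 1]
# [0, 1, 1, 1]
# [0, 1, 2, 2]
# [0, 1, 2, 2]
# [0, 1, 2, 3]
```

The bottom-right cell holds `3`, the expected answer. Each printed row is also what `prev` would contain after the corresponding outer iteration of the rolling-array version.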
+ + - approach_name: Recursive with Memoisation + is_optimal: false + code: | + def longest_common_subsequence(text1: str, text2: str) -> int: + from functools import lru_cache + + @lru_cache(maxsize=None) + def lcs(i: int, j: int) -> int: + # Base case: empty prefix + if i == 0 or j == 0: + return 0 + + # Characters match: include in LCS + if text1[i - 1] == text2[j - 1]: + return lcs(i - 1, j - 1) + 1 + + # No match: try excluding from each string + return max(lcs(i - 1, j), lcs(i, j - 1)) + + return lcs(len(text1), len(text2)) + explanation: | + **Time Complexity:** O(m * n) — Each unique `(i, j)` state is computed once due to memoisation. + + **Space Complexity:** O(m * n) — For the memoisation cache, plus O(m + n) for the recursion stack. + + This top-down approach is more intuitive but uses more memory due to the recursion stack. In Python the recursion can go up to `m + n` levels deep (about 2000 at the constraint limits), which exceeds the default recursion limit of 1000, so large inputs raise `RecursionError` unless the limit is raised with `sys.setrecursionlimit`. It's useful for understanding the problem structure, but the iterative DP solution is generally preferred in interviews for its predictable space usage and lack of stack-overflow risk. diff --git a/backend/data/questions/longest-consecutive-sequence.yaml b/backend/data/questions/longest-consecutive-sequence.yaml new file mode 100644 index 0000000..5766c46 --- /dev/null +++ b/backend/data/questions/longest-consecutive-sequence.yaml @@ -0,0 +1,231 @@ +title: Longest Consecutive Sequence +slug: longest-consecutive-sequence +difficulty: medium +leetcode_id: 128 +leetcode_url: https://leetcode.com/problems/longest-consecutive-sequence/ +categories: + - arrays + - hash-tables +patterns: + - union-find + +description: | + Given an unsorted array of integers `nums`, return *the length of the longest consecutive elements sequence*. + + You must write an algorithm that runs in `O(n)` time. + +constraints: | + - `0 <= nums.length <= 10^5` + - `-10^9 <= nums[i] <= 10^9` + +examples: + - input: "nums = [100, 4, 200, 1, 3, 2]" + output: "4" + explanation: "The longest consecutive elements sequence is [1, 2, 3, 4]. Therefore its length is 4." 
+ - input: "nums = [0, 3, 7, 2, 5, 8, 4, 6, 0, 1]" + output: "9" + explanation: "The longest consecutive elements sequence is [0, 1, 2, 3, 4, 5, 6, 7, 8]. Therefore its length is 9." + - input: "nums = [1, 0, 1, 2]" + output: "3" + explanation: "The longest consecutive elements sequence is [0, 1, 2]. Therefore its length is 3." + +explanation: + intuition: | + Imagine you have a collection of scattered puzzle pieces, each with a number on it. Your goal is to find the **longest chain** where each piece connects to the next (consecutive numbers). The naive approach would be to pick up each piece and search through all other pieces for its neighbour — but that's slow. + + The key insight is this: **a consecutive sequence always has a starting point** — a number that has no predecessor (`num - 1` doesn't exist in the array). If we can identify these starting points efficiently, we can then count forward from each one to find the sequence length. + + Think of it like this: instead of blindly searching, we first dump all the puzzle pieces into a bag (a hash set) for O(1) lookups. Then, for each piece, we ask: "Is there a piece with `num - 1`?" If not, this piece is the **start of a potential sequence**. We then count forward: does `num + 1` exist? Does `num + 2` exist? And so on. + + By only counting forward from sequence starts, we ensure each number is visited at most twice (once when added to the set, once when counted in a sequence), giving us O(n) time. 
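To make the idea of a sequence start concrete, here is a minimal sketch that lists the starting points for an input. The helper name `sequence_starts` is an assumption made for illustration; it is not part of the solutions in this file.

```python
def sequence_starts(nums: list[int]) -> list[int]:
    """Return the values that begin a consecutive run (illustrative helper)."""
    num_set = set(nums)  # O(1) average-case membership checks, duplicates removed
    # A value starts a run exactly when its predecessor is absent
    return sorted(n for n in num_set if n - 1 not in num_set)


print(sequence_starts([100, 4, 200, 1, 3, 2]))  # [1, 100, 200]
```

Only `1`, `100`, and `200` trigger a forward count. The values `2`, `3`, and `4` are skipped as starts because their predecessors exist. The `sorted` call is only there to make the printed output stable; the real algorithm does not need it.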
+ + approach: | + We solve this using a **Hash Set Approach**: + + **Step 1: Handle edge cases** + + - If the array is empty, return `0` + +   + + **Step 2: Build a hash set** + + - Convert the array to a set for O(1) lookups + - This also automatically handles duplicates + +   + + **Step 3: Find sequence starting points** + + - Iterate through each number in the set + - A number is a sequence start if `num - 1` is NOT in the set + - This ensures we only start counting from the beginning of each sequence + +   + + **Step 4: Count consecutive elements** + + - For each starting point, count how many consecutive numbers exist + - Keep checking if `num + 1`, `num + 2`, etc. are in the set + - Track the maximum sequence length found + +   + + **Step 5: Return the result** + + - Return the longest sequence length found + +   + + This approach is efficient because each number is processed at most twice: once to check if it's a starting point, and once when counting a sequence. + + common_pitfalls: + - title: The Sorting Trap + description: | + A natural first instinct is to sort the array and then scan for consecutive elements. While this works correctly, sorting takes **O(n log n)** time. + + The problem explicitly requires O(n) time complexity, so a sorting-based solution would fail this requirement. The hash set approach achieves O(n) time by trading space for time. + wrong_approach: "Sort then scan for consecutive elements" + correct_approach: "Use a hash set for O(1) lookups" + + - title: Counting From Every Number + description: | + If you try to count the sequence length starting from every number in the array, you'll get O(n²) time complexity in the worst case. + + For example, with `nums = [1, 2, 3, 4, 5]`, starting from `5` counts 1 element, from `4` counts 2, from `3` counts 3, and so on, for 1 + 2 + 3 + 4 + 5 = 15 steps in total, which grows as O(n²). + + The fix is to **only count from sequence starting points** (numbers where `num - 1` doesn't exist). 
This ensures each element is counted exactly once across all sequences. + wrong_approach: "Count sequence length from every element" + correct_approach: "Only count from elements where num - 1 is not in the set" + + - title: Not Handling Duplicates + description: | + The array may contain duplicate values (e.g., `[1, 0, 1, 2]`). If you iterate over the original array instead of the set, you might count the same sequence multiple times or get incorrect lengths. + + Using a set automatically deduplicates the input, ensuring each unique number is processed only once. + + key_takeaways: + - "**Hash set for O(1) lookups**: When you need to check membership repeatedly, convert to a set first" + - "**Identify sequence boundaries**: Only start counting from elements that begin a sequence (`num - 1` not present)" + - "**Each element visited once**: Smart iteration ensures O(n) despite nested-looking loops" + - "**Space-time tradeoff**: We use O(n) space to achieve O(n) time instead of O(n log n)" + + time_complexity: "O(n). Each number is visited at most twice — once when checking if it's a sequence start, and once when counting forward from a starting point." + space_complexity: "O(n). We store all unique elements in a hash set." + +solutions: + - approach_name: Hash Set + is_optimal: true + code: | + def longest_consecutive(nums: list[int]) -> int: + if not nums: + return 0 + + # Build a set for O(1) lookups + num_set = set(nums) + longest = 0 + + for num in num_set: + # Only start counting if this is the beginning of a sequence + # (i.e., num - 1 is not in the set) + if num - 1 not in num_set: + current_num = num + current_length = 1 + + # Count consecutive numbers + while current_num + 1 in num_set: + current_num += 1 + current_length += 1 + + # Update the longest sequence found + longest = max(longest, current_length) + + return longest + explanation: | + **Time Complexity:** O(n) — Each number is processed at most twice. 
+ + **Space Complexity:** O(n) — Hash set stores all unique elements. + + The key optimisation is only counting from sequence starting points. When we find a number where `num - 1` doesn't exist, we know it's the start of a new sequence and count forward from there. + + - approach_name: Sorting + is_optimal: false + code: | + def longest_consecutive(nums: list[int]) -> int: + if not nums: + return 0 + + # Sort the array + nums.sort() + + longest = 1 + current_length = 1 + + for i in range(1, len(nums)): + # Skip duplicates + if nums[i] == nums[i - 1]: + continue + + # Check if consecutive + if nums[i] == nums[i - 1] + 1: + current_length += 1 + else: + # Sequence broken, start fresh + longest = max(longest, current_length) + current_length = 1 + + return max(longest, current_length) + explanation: | + **Time Complexity:** O(n log n) — Dominated by the sorting step. + + **Space Complexity:** O(1) or O(n) — Depends on the sorting algorithm used. + + This approach sorts the array first, then scans linearly to find consecutive sequences. While simpler to understand, it doesn't meet the O(n) time requirement specified in the problem. Included here to illustrate the tradeoff between simplicity and optimal complexity. 
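One side effect of the sorting version above is that `nums.sort()` reorders the caller's list. A minimal sketch of a variant that leaves the input untouched and removes duplicates up front is shown below. The function name `longest_consecutive_sorted` is hypothetical, and the approach is still O(n log n), so it does not meet the problem's O(n) requirement either.

```python
def longest_consecutive_sorted(nums: list[int]) -> int:
    """Sorting-based variant that does not mutate its argument (illustrative)."""
    # sorted(set(...)) builds a new list, so duplicates vanish and nums is unchanged
    values = sorted(set(nums))

    longest = 0
    current = 0
    previous = None
    for value in values:
        if previous is not None and value == previous + 1:
            current += 1  # extends the current run
        else:
            current = 1   # starts a new run
        longest = max(longest, current)
        previous = value
    return longest


data = [1, 0, 1, 2]
print(longest_consecutive_sorted(data))  # 3
print(data)                              # [1, 0, 1, 2]
```

Because duplicates are removed before the scan, the explicit "skip duplicates" branch is no longer needed. The cost is O(n) extra space for the deduplicated copy.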
+ + - approach_name: Union-Find + is_optimal: false + code: | + def longest_consecutive(nums: list[int]) -> int: + if not nums: + return 0 + + # Map each number to its parent; every number starts as its own root + num_to_idx = {} + for i, num in enumerate(nums): + if num not in num_to_idx: + num_to_idx[num] = num + + # Union-Find with iterative path compression (avoids deep recursion on long chains) + def find(x): + root = x + while num_to_idx[root] != root: + root = num_to_idx[root] + # Point every node on the path directly at the root + while num_to_idx[x] != root: + parent = num_to_idx[x] + num_to_idx[x] = root + x = parent + return root + + def union(x, y): + root_x, root_y = find(x), find(y) + if root_x != root_y: + # Always point to the larger number + if root_x < root_y: + num_to_idx[root_x] = root_y + else: + num_to_idx[root_y] = root_x + + # Union consecutive numbers + for num in num_to_idx: + if num + 1 in num_to_idx: + union(num, num + 1) + + # Count sequence lengths by finding the root of each number + # and measuring distance to root + longest = 0 + for num in num_to_idx: + root = find(num) + longest = max(longest, root - num + 1) + + return longest + explanation: | + **Time Complexity:** O(n log n) in the worst case — Path compression alone gives O(log n) amortised time per operation. The O(n × α(n)) bound (α being the inverse Ackermann function) also needs union by rank or size, which this version skips so that each root stays the largest value in its set. + + **Space Complexity:** O(n) — Storage for the parent mapping. + + Union-Find groups consecutive numbers into the same set. While this is a valid near-linear approach, it's more complex than the hash set solution. The hash set approach is preferred for its simplicity and clarity. diff --git a/backend/data/questions/longest-happy-string.yaml b/backend/data/questions/longest-happy-string.yaml new file mode 100644 index 0000000..a3c4d41 --- /dev/null +++ b/backend/data/questions/longest-happy-string.yaml @@ -0,0 +1,237 @@ +title: Longest Happy String +slug: longest-happy-string +difficulty: medium +leetcode_id: 1405 +leetcode_url: https://leetcode.com/problems/longest-happy-string/ +categories: + - strings + - heap +patterns: + - greedy + - heap + +description: | + A string `s` is called **happy** if it satisfies the following conditions: + + - `s` only contains the letters `'a'`, `'b'`, and `'c'`. 
+ - `s` does not contain any of `"aaa"`, `"bbb"`, or `"ccc"` as a substring. + - `s` contains **at most** `a` occurrences of the letter `'a'`. + - `s` contains **at most** `b` occurrences of the letter `'b'`. + - `s` contains **at most** `c` occurrences of the letter `'c'`. + + Given three integers `a`, `b`, and `c`, return *the **longest possible happy** string*. If there are multiple longest happy strings, return *any of them*. If there is no such string, return *the empty string* `""`. + + A **substring** is a contiguous sequence of characters within a string. + +constraints: | + - `0 <= a, b, c <= 100` + - `a + b + c > 0` + +examples: + - input: "a = 1, b = 1, c = 7" + output: '"ccaccbcc"' + explanation: '"ccbccacc" would also be a correct answer.' + - input: "a = 7, b = 1, c = 0" + output: '"aabaa"' + explanation: "It is the only correct answer in this case." + +explanation: + intuition: | + Imagine you're filling a jar with coloured marbles (a, b, c), but you have a rule: **no more than two marbles of the same colour can sit adjacent to each other**. + + The key insight is that we should always **prioritise using the most abundant character** — but with a critical constraint. If the last two characters in our result are the same, we must pick a *different* character next, even if it's not the most abundant. + + Think of it like a balancing act: we want to "burn through" the character with the highest count as fast as possible (using it twice in a row when allowed), while using less frequent characters as "separators" to break up potential triplets. + + This greedy strategy works because: + 1. By always picking the most frequent valid character, we maximise the length of the result + 2. Using a character twice when possible (aa, bb, cc) is optimal — it depletes the larger counts faster + 3. 
When forced to use a less frequent character as a separator, we only use it once to minimise "waste" + + A **max-heap** naturally gives us the character with the highest remaining count at each step, making this approach efficient. + + approach: | + We solve this using a **Greedy approach with a Max-Heap**: + + **Step 1: Build the max-heap** + + - Create a max-heap containing tuples of `(count, character)` for each character with count > 0 + - Use negative counts in Python's `heapq` since it's a min-heap by default + +   + + **Step 2: Greedily build the string** + + - While the heap is not empty: + - Pop the character with the highest count + - Check the last two characters of the result + - **If the last two characters are the same as the popped character**: we cannot use it (would create "aaa", "bbb", or "ccc") + - Pop the next most frequent character instead + - Use it once, then push both back + - If no alternative exists, we're done + - **Otherwise**: use the most frequent character + - Use it twice if count >= 2 and it won't create a triplet + - Use it once otherwise + - Push it back if count > 0 + +   + + **Step 3: Return the result** + + - Return the built string + +   + + The greedy choice of always using the most frequent valid character ensures we build the longest possible happy string. + + common_pitfalls: + - title: Always Using the Most Frequent Without Checking + description: | + A naive greedy approach might always pick the most frequent character without checking if it would create a triplet. + + For example, with `a=2, b=2, c=1` and result so far `"aa"`, blindly picking 'a' again would create `"aaa"`. + + You must check the last two characters of the result before deciding which character to append. 
+ wrong_approach: "Always append the most frequent character" + correct_approach: "Check last two characters first, switch to second-most-frequent if needed" + + - title: Using Single Characters When Doubles Are Safe + description: | + When building the string, if the most frequent character doesn't match the last two, we can safely append it **twice** (if count >= 2). + + Using only one character at a time when two are safe means we don't deplete the larger counts fast enough, potentially leaving characters unused. + + For example, with `c=7, a=1, b=1`: the optimal result is "ccaccbcc" (length 8), whereas a strategy that never doubles up gets stuck at "cacbcc" (length 6) with three 'c's left over. + wrong_approach: "Always append just one character at a time" + correct_approach: "Append two characters when it's safe and count allows" + + - title: Not Handling the "No Valid Character" Case + description: | + When the last two characters are the same as the most frequent, and there's no second character available, the string is complete. + + Failing to handle this edge case can cause infinite loops or index errors. + + For example, with `a=3, b=0, c=0`, the answer is `"aa"` — we cannot use all three 'a's. + wrong_approach: "Assume there's always a valid character to append" + correct_approach: "Check if heap is empty after skipping the blocked character" + + key_takeaways: + - "**Greedy with constraints**: Always pick the locally optimal choice (most frequent), but respect the constraint (no triplets)" + - "**Max-heap for dynamic priorities**: When the 'best' option changes as you consume resources, a heap keeps priorities efficiently updated" + - "**Double usage optimisation**: When allowed, use the most frequent character twice to deplete large counts faster" + - "**Pattern recognition**: This problem combines greedy character selection with the 'reorganise string' pattern seen in problems like Task Scheduler" + + time_complexity: "O((a + b + c) * log 3) = O(n). 
Each loop iteration appends at least one character, so the loop runs at most `a + b + c` times, and heap operations on at most 3 elements are O(log 3) = O(1)." + space_complexity: "O(a + b + c) = O(n). The result string stores up to `a + b + c` characters. The heap uses O(1) space since it contains at most 3 elements." + +solutions: + - approach_name: Greedy with Max-Heap + is_optimal: true + code: | + import heapq + + def longest_diverse_string(a: int, b: int, c: int) -> str: + # Max-heap: use negative counts for max-heap behaviour + heap = [] + if a > 0: + heapq.heappush(heap, (-a, 'a')) + if b > 0: + heapq.heappush(heap, (-b, 'b')) + if c > 0: + heapq.heappush(heap, (-c, 'c')) + + result = [] + + while heap: + # Get the most frequent character + count1, char1 = heapq.heappop(heap) + + # Check if last two chars are the same as char1 + if len(result) >= 2 and result[-1] == char1 and result[-2] == char1: + # Can't use char1, try the second most frequent + if not heap: + break # No alternative, we're done + + count2, char2 = heapq.heappop(heap) + result.append(char2) # Use only once as separator + count2 += 1 # Decrement (negative, so add 1) + + if count2 < 0: + heapq.heappush(heap, (count2, char2)) + # Push char1 back unchanged + heapq.heappush(heap, (count1, char1)) + else: + # Safe to use char1 — use twice if possible + if -count1 >= 2: + result.append(char1) + result.append(char1) + count1 += 2 + else: + result.append(char1) + count1 += 1 + + if count1 < 0: + heapq.heappush(heap, (count1, char1)) + + return ''.join(result) + explanation: | + **Time Complexity:** O(n) where n = a + b + c — Each loop iteration appends at least one character, and heap operations on at most 3 elements are O(1). + + **Space Complexity:** O(n) — The result string can be up to length n. + + We greedily select the most frequent valid character at each step. When the most frequent would create a triplet, we use the second-most-frequent as a separator. Using characters twice when safe maximises output length. 
+ + - approach_name: Greedy Without Heap + is_optimal: false + code: | + def longest_diverse_string(a: int, b: int, c: int) -> str: + result = [] + counts = [a, b, c] + chars = ['a', 'b', 'c'] + + while True: + # Find the character with max count that won't create triplet + # Sort indices by count descending + order = sorted(range(3), key=lambda i: -counts[i]) + + added = False + for i in order: + if counts[i] == 0: + continue + + # Check if this char would create a triplet + if (len(result) >= 2 and + result[-1] == chars[i] and + result[-2] == chars[i]): + continue + + # Safe to add this character + result.append(chars[i]) + counts[i] -= 1 + + # Try to add a second one if safe + if (counts[i] > 0 and + (len(result) < 2 or + result[-1] != chars[i] or + result[-2] != chars[i])): + # Check if adding another would still be safe + # (won't create triplet with what follows) + # We only add two if this char has the max count + # to deplete it faster + if i == order[0]: + result.append(chars[i]) + counts[i] -= 1 + + added = True + break + + if not added: + break + + return ''.join(result) + explanation: | + **Time Complexity:** O(n) — Each iteration adds 1-2 characters, and sorting 3 elements is O(1). + + **Space Complexity:** O(n) — The result string can be up to length n. + + This approach manually tracks counts and sorts to find the most frequent valid character. While functionally equivalent, it's less elegant than the heap solution and slightly harder to generalise to more characters. 
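Because the problem accepts any longest happy string, comparing a solution's output against one fixed expected string can give false failures. A small validity checker is one way around that. The sketch below assumes a hypothetical helper name, `is_happy`, and checks only the three happiness conditions from the description; it does not prove that a string is the longest possible.

```python
def is_happy(s: str, a: int, b: int, c: int) -> bool:
    """Check the happiness conditions from the problem statement (illustrative helper)."""
    # Only the letters 'a', 'b', and 'c' are allowed
    if set(s) - set("abc"):
        return False
    # No run of three identical letters
    if any(ch * 3 in s for ch in "abc"):
        return False
    # Respect the per-letter budgets
    return s.count("a") <= a and s.count("b") <= b and s.count("c") <= c


print(is_happy("ccaccbcc", 1, 1, 7))  # True
print(is_happy("ccbccacc", 1, 1, 7))  # True
print(is_happy("aaa", 3, 0, 0))       # False
```

A test harness could combine this check with a length comparison against the expected answer, so that both `"ccaccbcc"` and `"ccbccacc"` pass for the first example.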
diff --git a/backend/data/questions/longest-increasing-path-in-a-matrix.yaml b/backend/data/questions/longest-increasing-path-in-a-matrix.yaml new file mode 100644 index 0000000..baaf8c9 --- /dev/null +++ b/backend/data/questions/longest-increasing-path-in-a-matrix.yaml @@ -0,0 +1,271 @@ +title: Longest Increasing Path in a Matrix +slug: longest-increasing-path-in-a-matrix +difficulty: hard +leetcode_id: 329 +leetcode_url: https://leetcode.com/problems/longest-increasing-path-in-a-matrix/ +categories: + - graphs + - dynamic-programming + - arrays +patterns: + - dfs + - dynamic-programming + - matrix-traversal + +description: | + Given an `m x n` integers `matrix`, return *the length of the longest increasing path in* `matrix`. + + From each cell, you can either move in four directions: left, right, up, or down. You **may not** move **diagonally** or move **outside the boundary** (i.e., wrap-around is not allowed). + +constraints: | + - `m == matrix.length` + - `n == matrix[i].length` + - `1 <= m, n <= 200` + - `0 <= matrix[i][j] <= 2^31 - 1` + +examples: + - input: "matrix = [[9,9,4],[6,6,8],[2,1,1]]" + output: "4" + explanation: "The longest increasing path is [1, 2, 6, 9]." + - input: "matrix = [[3,4,5],[3,2,6],[2,2,1]]" + output: "4" + explanation: "The longest increasing path is [3, 4, 5, 6]. Moving diagonally is not allowed." + - input: "matrix = [[1]]" + output: "1" + explanation: "A single cell forms a path of length 1." + +explanation: + intuition: | + Imagine the matrix as a landscape where each cell's value represents its elevation. You're trying to find the longest route where you're always climbing uphill. + + The key insight is that this problem has **optimal substructure**: the longest path starting from any cell equals 1 (the cell itself) plus the maximum of the longest paths from its valid neighbours (neighbours with strictly greater values). + + Think of it like water flowing downhill. 
If you flip the perspective and consider paths going from higher to lower values, water from any cell can only flow to cells with smaller values. The longest path from a cell is determined by where its "downstream" neighbours can reach. + + Here's why memoisation works so well: once you've computed the longest increasing path starting from cell `(i, j)`, that answer never changes. No matter which cell you're exploring later, if it can move to `(i, j)`, you already know the best path from there. This turns what would be exponential exploration into a linear traversal of the matrix. + + The directed acyclic graph (DAG) structure is crucial. Since we can only move to strictly greater values, there are no cycles. This guarantees that our DFS will terminate and that dynamic programming is applicable. + + approach: | + We solve this using **DFS with Memoisation**: + + **Step 1: Set up the memoisation cache** + + - Create a 2D array `memo` of the same dimensions as the matrix + - `memo[i][j]` will store the longest increasing path starting from cell `(i, j)` + - Initialise all values to `0` (or use a dictionary for sparse storage) + +   + + **Step 2: Define the DFS function** + + - For a cell `(i, j)`, if `memo[i][j]` is already computed (non-zero), return it immediately + - Otherwise, explore all four neighbours (up, down, left, right) + - For each neighbour `(ni, nj)` where `matrix[ni][nj] > matrix[i][j]`: + - Recursively compute the longest path from `(ni, nj)` + - Track the maximum path length among all valid neighbours + - Set `memo[i][j] = 1 + max_neighbour_path` (1 for the current cell plus the best continuation) + - Return `memo[i][j]` + +   + + **Step 3: Iterate through all cells** + + - For each cell in the matrix, call the DFS function + - Track the global maximum path length across all starting cells + - Cells with cached results will return immediately, ensuring each cell is fully computed only once + +   + + **Step 4: Return the result** + + - Return the 
maximum path length found + +   + + The memoisation ensures that each cell is visited and computed exactly once, giving us optimal time complexity. The DFS naturally handles the dependency order since smaller-value cells depend on larger-value cells, and there are no cycles. + + common_pitfalls: + - title: Brute Force Without Memoisation + description: | + A naive DFS that doesn't cache results will recompute paths from the same cell multiple times. + + Consider a matrix where many paths converge to the same cell. Without memoisation, you'd compute the path from that cell once for every path that reaches it. + + With a `200 x 200` matrix, this can lead to exponential time complexity, causing **Time Limit Exceeded** errors. + wrong_approach: "Plain DFS exploring all paths without caching" + correct_approach: "DFS with memoisation to cache computed path lengths" + + - title: Forgetting Boundary Checks + description: | + When exploring neighbours, you must check that the neighbour indices are within bounds before accessing the matrix. + + Accessing `matrix[-1][0]` or `matrix[m][n]` will cause index errors or incorrect results. + + Always validate `0 <= ni < m` and `0 <= nj < n` before comparing values. + wrong_approach: "Checking only the value condition without bounds" + correct_approach: "Check bounds first, then check if neighbour value is greater" + + - title: Using Non-Strict Inequality + description: | + The path must be **strictly increasing**. Using `>=` instead of `>` when comparing neighbour values can create infinite loops (since equal adjacent values would let you bounce back and forth forever). + + The problem specifies "increasing path", which means each step must go to a strictly larger value. + wrong_approach: "Using matrix[ni][nj] >= matrix[i][j]" + correct_approach: "Using matrix[ni][nj] > matrix[i][j]" + + - title: Modifying the Matrix + description: | + Some solutions attempt to mark visited cells by modifying the matrix values. 
This breaks the algorithm because: + + 1. You might need to visit the same cell from different starting points + 2. The memoised value depends on the original matrix values + + Use a separate `memo` array instead of modifying the input. + wrong_approach: "Setting matrix[i][j] = -1 to mark as visited" + correct_approach: "Use a separate memo array for caching" + + key_takeaways: + - "**DFS + Memoisation pattern**: When exploring paths in a DAG structure, memoisation converts exponential brute force into polynomial time" + - "**Recognising DAG structure**: The strictly increasing constraint ensures no cycles, making dynamic programming applicable" + - "**Top-down vs bottom-up**: This problem is naturally suited to top-down DP (DFS with memo) since we explore from arbitrary starting points" + - "**Matrix traversal foundation**: This pattern extends to many grid problems where you need to find optimal paths with constraints" + + time_complexity: "O(m * n). Each cell is computed exactly once and cached. The DFS visits each cell at most once for computation, with O(1) lookups for cached results." + space_complexity: "O(m * n). We use a 2D memo array of the same size as the input matrix. The recursion stack can also reach O(m * n) depth in the worst case (e.g., a strictly increasing snake path)." 
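The recursion-depth note above can matter in practice: CPython's default recursion limit is typically 1000, so a long snake-like path in a `200 x 200` matrix may exhaust it before the memoised DFS finishes. A minimal sketch of a bottom-up alternative follows, assuming cells are processed in decreasing value order so that every strictly greater neighbour is already final. The function name `longest_increasing_path_bottom_up` is illustrative and not part of the original solutions; the sort adds an O(m * n log(m * n)) factor in exchange for no recursion.

```python
def longest_increasing_path_bottom_up(matrix: list[list[int]]) -> int:
    # Hypothetical helper: same answer as the memoised DFS, without recursion.
    if not matrix or not matrix[0]:
        return 0

    m, n = len(matrix), len(matrix[0])
    # best[i][j] = longest increasing path starting from (i, j)
    best = [[1] * n for _ in range(m)]

    # Visit cells from largest to smallest value, so that any strictly
    # greater neighbour has already been finalised when we reach a cell.
    cells = sorted(
        ((matrix[i][j], i, j) for i in range(m) for j in range(n)),
        reverse=True,
    )

    for value, i, j in cells:
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            # Bounds check first, then the strictly increasing condition
            if 0 <= ni < m and 0 <= nj < n and matrix[ni][nj] > value:
                best[i][j] = max(best[i][j], 1 + best[ni][nj])

    return max(max(row) for row in best)
```

Equal-valued neighbours never depend on each other because the comparison is strict, so their relative order in the sort does not affect the result.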
+ +solutions: + - approach_name: DFS with Memoisation + is_optimal: true + code: | + def longest_increasing_path(matrix: list[list[int]]) -> int: + if not matrix or not matrix[0]: + return 0 + + m, n = len(matrix), len(matrix[0]) + # Cache to store longest path starting from each cell + memo = [[0] * n for _ in range(m)] + # Four directions: up, down, left, right + directions = [(-1, 0), (1, 0), (0, -1), (0, 1)] + + def dfs(i: int, j: int) -> int: + # Return cached result if already computed + if memo[i][j] != 0: + return memo[i][j] + + # At minimum, the path length is 1 (the cell itself) + max_length = 1 + + # Explore all four neighbours + for di, dj in directions: + ni, nj = i + di, j + dj + # Check bounds and strictly increasing condition + if 0 <= ni < m and 0 <= nj < n and matrix[ni][nj] > matrix[i][j]: + # Recurse and track the maximum path + max_length = max(max_length, 1 + dfs(ni, nj)) + + # Cache the result before returning + memo[i][j] = max_length + return max_length + + # Try starting from every cell and track global maximum + result = 0 + for i in range(m): + for j in range(n): + result = max(result, dfs(i, j)) + + return result + explanation: | + **Time Complexity:** O(m * n) — Each cell is computed exactly once due to memoisation. + + **Space Complexity:** O(m * n) — For the memo array and recursion stack. + + The DFS explores paths starting from each cell, but memoisation ensures we never recompute. The strictly increasing constraint guarantees no cycles, making this a DAG traversal problem perfectly suited for dynamic programming. 
+ + - approach_name: Topological Sort (BFS) + is_optimal: true + code: | + from collections import deque + + def longest_increasing_path(matrix: list[list[int]]) -> int: + if not matrix or not matrix[0]: + return 0 + + m, n = len(matrix), len(matrix[0]) + directions = [(-1, 0), (1, 0), (0, -1), (0, 1)] + + # outdegree[i][j] = count of neighbours with greater values + outdegree = [[0] * n for _ in range(m)] + + # Calculate outdegree for each cell + for i in range(m): + for j in range(n): + for di, dj in directions: + ni, nj = i + di, j + dj + if 0 <= ni < m and 0 <= nj < n and matrix[ni][nj] > matrix[i][j]: + outdegree[i][j] += 1 + + # Start BFS from cells with outdegree 0 (local maxima) + queue = deque() + for i in range(m): + for j in range(n): + if outdegree[i][j] == 0: + queue.append((i, j)) + + # BFS layer by layer, counting the number of layers + path_length = 0 + while queue: + path_length += 1 + # Process all cells at current level + for _ in range(len(queue)): + i, j = queue.popleft() + # Check all neighbours with smaller values + for di, dj in directions: + ni, nj = i + di, j + dj + if 0 <= ni < m and 0 <= nj < n and matrix[ni][nj] < matrix[i][j]: + outdegree[ni][nj] -= 1 + # If all larger neighbours processed, add to queue + if outdegree[ni][nj] == 0: + queue.append((ni, nj)) + + return path_length + explanation: | + **Time Complexity:** O(m * n) — Each cell is processed exactly once. + + **Space Complexity:** O(m * n) — For the outdegree array and queue. + + This approach treats the matrix as a DAG where edges point from smaller to larger values. We use topological sort starting from "sink" nodes (local maxima with no outgoing edges). The number of BFS layers equals the longest path length. This is an elegant alternative that avoids recursion. 
+ + - approach_name: Brute Force DFS + is_optimal: false + code: | + def longest_increasing_path(matrix: list[list[int]]) -> int: + if not matrix or not matrix[0]: + return 0 + + m, n = len(matrix), len(matrix[0]) + directions = [(-1, 0), (1, 0), (0, -1), (0, 1)] + + def dfs(i: int, j: int) -> int: + max_length = 1 + + for di, dj in directions: + ni, nj = i + di, j + dj + if 0 <= ni < m and 0 <= nj < n and matrix[ni][nj] > matrix[i][j]: + # No caching - recomputes every time + max_length = max(max_length, 1 + dfs(ni, nj)) + + return max_length + + result = 0 + for i in range(m): + for j in range(n): + result = max(result, dfs(i, j)) + + return result + explanation: | + **Time Complexity:** O(4^(m*n)) worst case — Exponential due to repeated exploration. + + **Space Complexity:** O(m * n) — Recursion stack depth. + + This naive approach recomputes paths from the same cell multiple times. While correct, it's far too slow for the given constraints and will result in TLE. Included to illustrate why memoisation is essential. diff --git a/backend/data/questions/longest-increasing-subsequence.yaml b/backend/data/questions/longest-increasing-subsequence.yaml new file mode 100644 index 0000000..aaab516 --- /dev/null +++ b/backend/data/questions/longest-increasing-subsequence.yaml @@ -0,0 +1,234 @@ +title: Longest Increasing Subsequence +slug: longest-increasing-subsequence +difficulty: medium +leetcode_id: 300 +leetcode_url: https://leetcode.com/problems/longest-increasing-subsequence/ +categories: + - arrays + - dynamic-programming + - binary-search +patterns: + - dynamic-programming + - binary-search + +description: | + Given an integer array `nums`, return *the length of the longest **strictly increasing subsequence***. + + A **subsequence** is a sequence that can be derived from an array by deleting some or no elements without changing the order of the remaining elements. 
+ +constraints: | + - `1 <= nums.length <= 2500` + - `-10^4 <= nums[i] <= 10^4` + +examples: + - input: "nums = [10,9,2,5,3,7,101,18]" + output: "4" + explanation: "The longest increasing subsequence is [2,3,7,101], therefore the length is 4." + - input: "nums = [0,1,0,3,2,3]" + output: "4" + explanation: "One possible longest increasing subsequence is [0,1,2,3]." + - input: "nums = [7,7,7,7,7,7,7]" + output: "1" + explanation: "The longest increasing subsequence is any single element, since all elements are equal and a strictly increasing sequence cannot have duplicates." + +explanation: + intuition: | + Imagine you're building a tower of blocks where each block you add must be larger than the one below it. You have a sequence of blocks laid out in a row, and you must pick blocks from left to right (you can skip blocks, but you can't go backwards). + + The question becomes: what's the tallest tower you can build? + + The key insight is that for each position in the array, we want to know: **"What's the longest increasing subsequence that ends at this position?"** If we can answer this for every position, the answer to the original problem is simply the maximum of all these values. + + Think of it like this: when you're at position `i`, you look back at all previous positions `j` where `nums[j] < nums[i]`. You can extend any increasing subsequence ending at `j` by adding `nums[i]` to it. So the longest subsequence ending at `i` is one more than the longest subsequence ending at any valid `j`. + + For optimal O(n log n) time, we use a different mental model: maintain a "patience sorting" pile where we track the smallest ending element for subsequences of each length. This allows us to use binary search to efficiently find where each new element fits. + + approach: | + We'll cover two approaches: the classic DP solution and the optimised binary search solution. 
+ + **Approach A: Dynamic Programming (O(n^2))** + + **Step 1: Initialise the DP array** + + - Create a `dp` array where `dp[i]` represents the length of the longest increasing subsequence ending at index `i` + - Initialise all values to `1` since every element is a subsequence of length 1 by itself + +   + + **Step 2: Fill the DP array** + + - For each index `i` from `1` to `n-1`, look at all previous indices `j` from `0` to `i-1` + - If `nums[j] < nums[i]`, we can extend the subsequence ending at `j` by adding `nums[i]` + - Update: `dp[i] = max(dp[i], dp[j] + 1)` + +   + + **Step 3: Return the maximum** + + - The answer is `max(dp)` since the longest subsequence might end at any position + +   + + **Approach B: Binary Search with Patience Sorting (O(n log n))** + + **Step 1: Initialise a "tails" array** + + - `tails[i]` represents the smallest ending element of all increasing subsequences of length `i+1` + - Start with an empty array + +   + + **Step 2: Process each element** + + - For each number in `nums`, use binary search to find its position in `tails` + - If the number is larger than all elements in `tails`, append it (we found a longer subsequence) + - Otherwise, replace the first element in `tails` that is >= the current number + - This maintains the invariant that `tails` is always sorted + +   + + **Step 3: Return the length** + + - The length of `tails` is the answer + + common_pitfalls: + - title: Confusing Subsequence with Subarray + description: | + A **subsequence** allows skipping elements while maintaining relative order. A **subarray** must be contiguous. + + For `[10,9,2,5,3,7,101,18]`: + - `[2,5,7,101]` is a valid subsequence (not contiguous, but maintains order) + - `[9,2,5]` is both a subarray and subsequence + + Using a sliding window or subarray approach will give wrong answers since you'd miss non-contiguous increasing sequences. 
+ wrong_approach: "Sliding window for contiguous elements" + correct_approach: "DP considering all previous elements or binary search on tails" + + - title: Forgetting Strictly Increasing + description: | + The problem asks for **strictly increasing**, meaning equal elements don't count. + + For `[1,3,3,5]`, the LIS is `[1,3,5]` with length 3, NOT `[1,3,3,5]` with length 4. + + In the DP approach, use `nums[j] < nums[i]` (strict inequality), not `<=`. + In the binary search approach, use `bisect_left` (not `bisect_right`) to handle duplicates correctly. + wrong_approach: "Using <= instead of < for comparison" + correct_approach: "Strict inequality: nums[j] < nums[i]" + + - title: O(n^2) Time Limit on Large Inputs + description: | + The basic DP solution is O(n^2). With `n = 2500`, this means up to 6.25 million operations, which is acceptable for this problem. + + However, if constraints were larger (e.g., `n = 10^5`), the DP approach would be too slow. The binary search approach scales to O(n log n) for such cases. + + Always check constraints to decide which approach is needed. + wrong_approach: "Always using O(n^2) DP without checking constraints" + correct_approach: "Use binary search for larger inputs" + + - title: Misunderstanding the Tails Array + description: | + The `tails` array in the binary search approach does **not** store an actual LIS. It stores the smallest possible ending element for subsequences of each length. + + For `[10,9,2,5,3,7]`: + - After processing: `tails = [2,3,7]` + - Here `[2,3,7]` happens to be a valid LIS + - For `[2,5,1]`: `tails = [1,5]`, but `[1,5]` is not a subsequence of the input because `1` comes after `5`; the actual LIS is `[2,5]` + + The length of `tails` is always correct, but its contents may not form a valid subsequence from the input.
+ wrong_approach: "Thinking tails array contains the actual LIS" + correct_approach: "Understand tails gives length only, not the actual subsequence" + + key_takeaways: + - "**Classic DP pattern**: When computing properties of subsequences, think about what information you need at each position and how previous positions contribute" + - "**Patience sorting insight**: Maintaining sorted auxiliary structures enables binary search optimisation, reducing O(n^2) to O(n log n)" + - "**Foundation for harder problems**: LIS appears in many variations (longest bitonic subsequence, Russian doll envelopes, box stacking) and understanding both approaches unlocks these" + - "**Subsequence vs subarray**: Always clarify whether the problem allows skipping elements \u2014 this fundamentally changes the approach" + + time_complexity: "O(n^2) for dynamic programming, O(n log n) for binary search. The DP approach compares each element with all previous elements; the binary search approach performs a log n search for each of n elements." + space_complexity: "O(n). Both approaches use an auxiliary array of size n (`dp` array for DP, `tails` array for binary search)." + +solutions: + - approach_name: Binary Search (Patience Sorting) + is_optimal: true + code: | + import bisect + + def length_of_lis(nums: list[int]) -> int: + # tails[i] = smallest ending element of all increasing + # subsequences of length i+1 + tails = [] + + for num in nums: + # Find position where num should be inserted + # bisect_left handles duplicates correctly (strict increase) + pos = bisect.bisect_left(tails, num) + + if pos == len(tails): + # num is larger than all tails - extend longest subsequence + tails.append(num) + else: + # Replace to maintain smallest possible tail + tails[pos] = num + + # Length of tails = length of longest increasing subsequence + return len(tails) + explanation: | + **Time Complexity:** O(n log n) — For each of n elements, we perform a binary search in O(log n).
+ + **Space Complexity:** O(n) — The tails array can grow up to size n. + + This approach maintains an array where `tails[i]` is the smallest ending element of all increasing subsequences of length `i+1`. By keeping tails sorted and using binary search, we efficiently determine whether to extend the longest subsequence or update an existing length's tail. + + - approach_name: Dynamic Programming + is_optimal: false + code: | + def length_of_lis(nums: list[int]) -> int: + n = len(nums) + # dp[i] = length of longest increasing subsequence ending at i + dp = [1] * n # Every element is a subsequence of length 1 + + for i in range(1, n): + # Check all previous elements + for j in range(i): + # If we can extend the subsequence ending at j + if nums[j] < nums[i]: + dp[i] = max(dp[i], dp[j] + 1) + + # LIS can end at any position + return max(dp) + explanation: | + **Time Complexity:** O(n^2) — Nested loops comparing each element with all previous elements. + + **Space Complexity:** O(n) — The dp array stores one value per element. + + This classic DP solution builds up the answer by computing, for each position, the longest increasing subsequence that ends at that position. The final answer is the maximum across all positions. While intuitive, this approach is slower than the binary search method for large inputs.
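The binary search solution returns only a length, because `tails` itself need not be a subsequence of the input. If the actual subsequence is needed, one common extension is to remember, for each element, the index of its predecessor. The sketch below assumes that extension; the names `longest_increasing_subsequence`, `tail_idx` and `prev` are illustrative and do not appear in the original solutions.

```python
import bisect


def longest_increasing_subsequence(nums: list[int]) -> list[int]:
    # Hypothetical helper: returns one longest strictly increasing subsequence.
    tails: list[int] = []      # tails[k] = smallest tail value for length k + 1
    tail_idx: list[int] = []   # tail_idx[k] = index in nums of tails[k]
    prev = [-1] * len(nums)    # prev[i] = index of the element before nums[i]

    for i, num in enumerate(nums):
        pos = bisect.bisect_left(tails, num)
        if pos == len(tails):
            tails.append(num)
            tail_idx.append(i)
        else:
            tails[pos] = num
            tail_idx[pos] = i
        # Predecessor is the current tail of the next-shorter length
        prev[i] = tail_idx[pos - 1] if pos > 0 else -1

    # Walk the predecessor links back from the tail of the longest length
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(nums[i])
        i = prev[i]
    return result[::-1]
```

For `[2, 5, 1]` the `tails` array ends as `[1, 5]`, which is not a subsequence of the input, while the predecessor links still recover `[2, 5]`.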
+ + - approach_name: Brute Force (Recursion with Memoisation) + is_optimal: false + code: | + def length_of_lis(nums: list[int]) -> int: + from functools import lru_cache + + n = len(nums) + + @lru_cache(maxsize=None) + def lis_ending_at(index: int) -> int: + """Return length of LIS ending at index.""" + # Base case: subsequence of just this element + max_length = 1 + + # Try extending from any valid previous position + for prev in range(index): + if nums[prev] < nums[index]: + max_length = max(max_length, 1 + lis_ending_at(prev)) + + return max_length + + # LIS can end at any position + return max(lis_ending_at(i) for i in range(n)) + explanation: | + **Time Complexity:** O(n^2) — Same as iterative DP due to memoisation. + + **Space Complexity:** O(n) — Memoisation cache and recursion stack. + + This recursive approach with memoisation is equivalent to the iterative DP but may be more intuitive for some. The recurrence relation is clear: the LIS ending at index `i` is 1 plus the maximum LIS ending at any previous index `j` where `nums[j] < nums[i]`, or just 1 if no such `j` exists. Included to show the connection between recursion and DP. diff --git a/backend/data/questions/longest-palindromic-substring.yaml b/backend/data/questions/longest-palindromic-substring.yaml new file mode 100644 index 0000000..36c9a04 --- /dev/null +++ b/backend/data/questions/longest-palindromic-substring.yaml @@ -0,0 +1,171 @@ +title: Longest Palindromic Substring +slug: longest-palindromic-substring +difficulty: medium +leetcode_id: 5 +leetcode_url: https://leetcode.com/problems/longest-palindromic-substring/ +categories: + - strings + - dynamic-programming +patterns: + - two-pointers + - dynamic-programming + +description: | + Given a string `s`, return *the longest palindromic substring* in `s`. + + A **palindrome** is a string that reads the same forward and backward.
+ +constraints: | + - `1 <= s.length <= 1000` + - `s` consists of only digits and English letters + +examples: + - input: 's = "babad"' + output: '"bab"' + explanation: '"aba" is also a valid answer — both have length 3.' + - input: 's = "cbbd"' + output: '"bb"' + explanation: "The longest palindromic substring is \"bb\" with length 2." + +explanation: + intuition: | + Every palindrome has a **center**. For odd-length palindromes like "aba", the center is the middle character 'b'. For even-length palindromes like "abba", the center is the gap between the two 'b's. + + Think of it like this: if we know the center, we can find the full palindrome by **expanding outward** — checking if the characters on both sides match. We keep expanding until they don't match. + + The strategy is simple: try every possible center (each character and each gap between characters), expand to find the longest palindrome for that center, and track the overall longest. + + This "expand around center" approach is intuitive and uses O(1) extra space, making it ideal for interviews. + + approach: | + We solve this using **Expand Around Center**: + + **Step 1: Define the expand helper function** + + - `expand(left, right)` returns the bounds of the longest palindrome centered at this position + - While `left >= 0` and `right < len(s)` and `s[left] == s[right]`: + - Expand: decrement `left`, increment `right` + - Return the bounds of the palindrome (after adjusting for the last failed expansion) + +   + + **Step 2: Try every possible center** + + - For each index `i`: + - Try **odd-length** palindrome: `expand(i, i)` — center is single character + - Try **even-length** palindrome: `expand(i, i+1)` — center is between characters + - Update the best result if either expansion found a longer palindrome + +   + + **Step 3: Return the longest palindrome** + + - Track `start` and `end` indices of the best palindrome found + - Return `s[start:end+1]` + +   + + Why does this work? 
By checking both odd and even centers at every position, we're guaranteed to find the center of the longest palindrome somewhere. + + common_pitfalls: + - title: Forgetting Even-Length Palindromes + description: | + If you only expand around single characters, you'll miss even-length palindromes like "abba" or "bb". + + Every position needs two expansion attempts: one for odd (center at `i`) and one for even (center between `i` and `i+1`). + wrong_approach: "Only calling expand(i, i)" + correct_approach: "Call both expand(i, i) and expand(i, i+1)" + + - title: Index Out of Bounds During Expansion + description: | + The expansion loop must check bounds **before** accessing characters. A common mistake is checking equality first, which causes an index error. + wrong_approach: "while s[left] == s[right] and left >= 0 and right < len(s)" + correct_approach: "while left >= 0 and right < len(s) and s[left] == s[right]" + + - title: Returning Length Instead of Substring + description: | + The problem asks for the actual substring, not just its length. Track the start and end positions of the best palindrome so you can extract it. + wrong_approach: "return max_length" + correct_approach: "return s[start:end+1]" + + key_takeaways: + - "**Expand around center**: O(n²) time, O(1) space — optimal for interviews" + - "**Handle both odd and even**: Check single-character centers AND gaps between characters" + - "**Track positions, not just length**: You need to return the actual substring" + - "**Manacher's algorithm**: Can solve in O(n) but is complex — not expected in interviews" + + time_complexity: "O(n²). For each of n possible centers, expansion can take up to O(n) time in the worst case." + space_complexity: "O(1). Only a few variables for tracking positions — no additional data structures." 
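For readers curious about the O(n) option mentioned in the takeaways, here is a hedged sketch of Manacher's algorithm. It assumes the usual trick of interleaving a separator (`#` here) so odd and even centers are handled uniformly; the function name `longest_palindrome_manacher` and the variable names are illustrative, not part of the original solutions.

```python
def longest_palindrome_manacher(s: str) -> str:
    # Hypothetical helper: O(n) longest palindromic substring.
    if not s:
        return ""

    # "abba" -> "#a#b#b#a#": every palindrome now has a single center index
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radius = [0] * n      # radius[i] = palindrome radius around t[i]
    center = right = 0    # rightmost palindrome found so far: center and right edge

    for i in range(n):
        if i < right:
            # Reuse the mirrored radius, capped at the known right edge
            radius[i] = min(right - i, radius[2 * center - i])

        # Expand beyond what the mirror guarantees
        a, b = i - radius[i] - 1, i + radius[i] + 1
        while a >= 0 and b < n and t[a] == t[b]:
            radius[i] += 1
            a -= 1
            b += 1

        if i + radius[i] > right:
            center, right = i, i + radius[i]

    # A radius in t equals the palindrome's length in s
    best_len, best_center = max((r, i) for i, r in enumerate(radius))
    start = (best_center - best_len) // 2
    return s[start:start + best_len]
```

When several palindromes share the maximum length, this sketch may return a different one than expand-around-center does (for `"babad"` it yields `"aba"`), which the problem accepts.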
+ +solutions: + - approach_name: Expand Around Center + is_optimal: true + code: | + def longest_palindrome(s: str) -> str: + def expand(left: int, right: int) -> tuple[int, int]: + """Expand around center and return palindrome bounds.""" + while left >= 0 and right < len(s) and s[left] == s[right]: + left -= 1 + right += 1 + # Return bounds of palindrome (undo last expansion) + return left + 1, right - 1 + + start, end = 0, 0 + + for i in range(len(s)): + # Try odd-length palindrome (single character center) + l1, r1 = expand(i, i) + if r1 - l1 > end - start: + start, end = l1, r1 + + # Try even-length palindrome (center between characters) + l2, r2 = expand(i, i + 1) + if r2 - l2 > end - start: + start, end = l2, r2 + + return s[start:end + 1] + explanation: | + **Time Complexity:** O(n²) — n centers, up to n expansion steps each. + + **Space Complexity:** O(1) — Only tracking start/end positions. + + For each position, we try both odd and even palindrome centers. The expand function returns the bounds of the longest palindrome for that center. We track the overall longest and return it at the end. + + - approach_name: Dynamic Programming + is_optimal: false + code: | + def longest_palindrome(s: str) -> str: + n = len(s) + if n < 2: + return s + + # dp[i][j] = True if s[i:j+1] is a palindrome + dp = [[False] * n for _ in range(n)] + start, max_len = 0, 1 + + # Single characters are palindromes + for i in range(n): + dp[i][i] = True + + # Check substrings of increasing length + for length in range(2, n + 1): + for i in range(n - length + 1): + j = i + length - 1 + + if length == 2: + # Two characters: palindrome if they match + dp[i][j] = s[i] == s[j] + else: + # Longer: palindrome if ends match AND inner is palindrome + dp[i][j] = s[i] == s[j] and dp[i + 1][j - 1] + + if dp[i][j] and length > max_len: + start, max_len = i, length + + return s[start:start + max_len] + explanation: | + **Time Complexity:** O(n²) — Fill n×n table. 
+ + **Space Complexity:** O(n²) — DP table storage. + + Build up palindrome information from smaller to larger substrings. `dp[i][j]` is True if substring from i to j is a palindrome. A substring is a palindrome if its ends match and its inner substring is also a palindrome. Track the longest one found. diff --git a/backend/data/questions/longest-turbulent-subarray.yaml b/backend/data/questions/longest-turbulent-subarray.yaml new file mode 100644 index 0000000..e5cbac3 --- /dev/null +++ b/backend/data/questions/longest-turbulent-subarray.yaml @@ -0,0 +1,237 @@ +title: Longest Turbulent Subarray +slug: longest-turbulent-subarray +difficulty: medium +leetcode_id: 978 +leetcode_url: https://leetcode.com/problems/longest-turbulent-subarray/ +categories: + - arrays + - dynamic-programming +patterns: + - sliding-window + - dynamic-programming + +description: | + Given an integer array `arr`, return *the length of a maximum size turbulent subarray of* `arr`. + + A subarray is **turbulent** if the comparison sign flips between each adjacent pair of elements in the subarray. + + More formally, a subarray `[arr[i], arr[i + 1], ..., arr[j]]` of `arr` is said to be turbulent if and only if: + + - For `i <= k < j`: + - `arr[k] > arr[k + 1]` when `k` is odd, and + - `arr[k] < arr[k + 1]` when `k` is even. + - Or, for `i <= k < j`: + - `arr[k] > arr[k + 1]` when `k` is even, and + - `arr[k] < arr[k + 1]` when `k` is odd. + +constraints: | + - `1 <= arr.length <= 4 * 10^4` + - `0 <= arr[i] <= 10^9` + +examples: + - input: "arr = [9,4,2,10,7,8,8,1,9]" + output: "5" + explanation: "arr[1] > arr[2] < arr[3] > arr[4] < arr[5], which gives the turbulent subarray [4,2,10,7,8] with length 5." + - input: "arr = [4,8,12,16]" + output: "2" + explanation: "All elements are strictly increasing, so the longest turbulent subarray is any pair of adjacent elements." + - input: "arr = [100]" + output: "1" + explanation: "A single element is trivially turbulent." 
+ +explanation: + intuition: | + Imagine a stock price chart that zigzags up and down. A **turbulent subarray** is like finding the longest stretch where the chart alternates direction at every point — up, then down, then up, then down (or vice versa). + + The key insight is that we don't care about the absolute values or even the parity of indices. What matters is whether the **comparison sign flips** between consecutive pairs. If we have `a < b`, the next comparison must be `b > c` for the sequence to remain turbulent. If we ever see `a < b < c` (same direction twice) or `a == b` (no direction), the turbulent sequence breaks. + + Think of it like walking on a wavy path: every step must change direction. The moment you take two steps in the same direction (or stand still), you've left the turbulent zone. + + This naturally leads to a **sliding window** or **dynamic programming** approach: extend the current turbulent sequence while the alternation holds, and reset when it breaks. + + approach: | + We solve this using a **Single Pass with Two Counters** approach: + + **Step 1: Handle the edge case** + + - If the array has only one element, return `1` immediately (a single element is trivially turbulent) + +   + + **Step 2: Initialise tracking variables** + + - `inc`: Length of the longest turbulent subarray ending at the current position where the last comparison was increasing (`arr[i-1] < arr[i]`) + - `dec`: Length of the longest turbulent subarray ending at the current position where the last comparison was decreasing (`arr[i-1] > arr[i]`) + - Both start at `1` since a single element has length 1 + - `result`: Tracks the maximum length seen, initialised to `1` + +   + + **Step 3: Iterate through the array starting from index 1** + + - For each position `i`, compare `arr[i]` with `arr[i-1]`: + - If `arr[i-1] < arr[i]` (increasing): Set `inc = dec + 1` (extend the previous decreasing sequence) and reset `dec = 1` + - If `arr[i-1] > arr[i]` (decreasing): Set `dec = 
inc + 1` (extend the previous increasing sequence) and reset `inc = 1` + - If `arr[i-1] == arr[i]` (equal): Reset both `inc = 1` and `dec = 1` (turbulence broken) + - Update `result` with `max(result, inc, dec)` + +   + + **Step 4: Return the result** + + - Return `result` after processing all elements + +   + + The key insight is that `inc` and `dec` track complementary states: to extend an increasing comparison, we need the previous comparison to have been decreasing (and vice versa). This is why we set `inc = dec + 1` when we see an increase. + + common_pitfalls: + - title: Misunderstanding the Turbulence Definition + description: | + A common mistake is thinking turbulence requires a specific starting direction or depends on index parity in absolute terms. The definition is actually simpler: **comparisons must alternate**. + + The two cases in the problem description (odd/even rules) just describe the two possible patterns: + - `< > < > ...` (starts with increase) + - `> < > < ...` (starts with decrease) + + Both are valid turbulent sequences. Focus on whether consecutive comparisons flip, not on the index values. + wrong_approach: "Checking if index is odd/even to determine expected comparison" + correct_approach: "Track whether the last comparison was increasing or decreasing, and check if the current one flips" + + - title: Forgetting Equal Elements Break Turbulence + description: | + When `arr[i-1] == arr[i]`, there's no comparison sign — it's neither increasing nor decreasing. This breaks any turbulent sequence. + + For example, in `[9,4,2,10,7,8,8,1,9]`, the sequence `8,8` breaks the turbulence, so we can't connect `[4,2,10,7,8]` with `[8,1,9]`. + + Always reset both counters when encountering equal adjacent elements. 
+ wrong_approach: "Ignoring equal elements or treating them as continuing the pattern" + correct_approach: "Reset both inc and dec to 1 when arr[i-1] == arr[i]" + + - title: Off-by-One in Sequence Length + description: | + A turbulent subarray with `k` elements has `k-1` comparisons. When extending, we add `1` to the previous counter (e.g., `inc = dec + 1`), not to the number of comparisons. + + For `[4, 2, 10]`: After seeing `4 > 2`, `dec = 2` (two elements). After seeing `2 < 10`, `inc = dec + 1 = 3` (three elements). This correctly counts the elements, not comparisons. + wrong_approach: "Counting comparisons instead of elements, or incorrect initialization" + correct_approach: "Initialize counters to 1 (single element) and add 1 when extending" + + key_takeaways: + - "**Dual-state DP**: When a sequence's validity depends on its ending condition, track multiple states (here, `inc` and `dec`) that feed into each other" + - "**Sliding window without explicit pointers**: The counters implicitly maintain a window — resetting to `1` is equivalent to starting a new window" + - "**Alternation patterns**: For problems requiring alternating conditions, track what the *last* state was and check if the *current* state differs" + - "**Pattern recognition**: This is similar to 'Wiggle Subsequence' (LeetCode 376), which asks for the longest *subsequence* (not subarray) with alternating differences" + + time_complexity: "O(n). We traverse the array exactly once, performing constant-time operations at each step." + space_complexity: "O(1). We only use a fixed number of variables (`inc`, `dec`, `result`) regardless of input size." 
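To make the counter updates concrete, the small sketch below records `(inc, dec)` after each position. The helper name `trace_counters` is hypothetical and exists only for illustration; it mirrors the update rules described in the approach rather than replacing any solution.

```python
def trace_counters(arr: list[int]) -> list[tuple[int, int]]:
    # Hypothetical helper: (inc, dec) after processing each index.
    inc = dec = 1
    states = [(inc, dec)]  # state for the first element alone

    for i in range(1, len(arr)):
        if arr[i - 1] < arr[i]:
            # Increase: extend the run that ended with a decrease
            inc, dec = dec + 1, 1
        elif arr[i - 1] > arr[i]:
            # Decrease: extend the run that ended with an increase
            inc, dec = 1, inc + 1
        else:
            # Equal neighbours break turbulence
            inc = dec = 1
        states.append((inc, dec))

    return states


print(trace_counters([4, 2, 10]))  # [(1, 1), (1, 2), (3, 1)]
```

The counters count elements, not comparisons: two comparisons in `[4, 2, 10]` give a final `inc` of 3.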
+ +solutions: + - approach_name: Single Pass with Two Counters + is_optimal: true + code: | + def max_turbulence_size(arr: list[int]) -> int: + n = len(arr) + if n == 1: + return 1 + + # inc: length of turbulent subarray ending here with last comparison increasing + # dec: length of turbulent subarray ending here with last comparison decreasing + inc = dec = 1 + result = 1 + + for i in range(1, n): + if arr[i - 1] < arr[i]: + # Current is increasing, extend from previous decreasing + inc = dec + 1 + dec = 1 # Reset decreasing counter + elif arr[i - 1] > arr[i]: + # Current is decreasing, extend from previous increasing + dec = inc + 1 + inc = 1 # Reset increasing counter + else: + # Equal elements break turbulence + inc = dec = 1 + + result = max(result, inc, dec) + + return result + explanation: | + **Time Complexity:** O(n) — Single pass through the array. + + **Space Complexity:** O(1) — Only three variables used. + + We maintain two counters that track the length of turbulent subarrays ending at the current position, distinguished by whether the last comparison was increasing or decreasing. When we see an increase, we can extend any previous sequence that ended with a decrease (and vice versa). Equal elements reset both counters since they break turbulence. 
+ + - approach_name: Explicit Sliding Window + is_optimal: true + code: | + def max_turbulence_size(arr: list[int]) -> int: + n = len(arr) + if n == 1: + return 1 + + # Helper to get comparison sign: -1, 0, or 1 + def cmp(a: int, b: int) -> int: + if a < b: + return -1 + elif a > b: + return 1 + return 0 + + result = 1 + left = 0 # Start of current turbulent window + + for right in range(1, n): + c = cmp(arr[right - 1], arr[right]) + + if c == 0: + # Equal elements: start fresh window after this position + left = right + elif right == left + 1: + # Second element of window: any non-zero comparison is valid + result = max(result, 2) + else: + # Check if comparison alternates from previous + prev_c = cmp(arr[right - 2], arr[right - 1]) + if c == prev_c: + # Same direction twice: start new window from previous position + left = right - 1 + + # Window size is right - left + 1 + result = max(result, right - left + 1) + + return result + explanation: | + **Time Complexity:** O(n) — Single pass through the array. + + **Space Complexity:** O(1) — Only tracking window boundaries. + + This version explicitly maintains a sliding window with `left` and `right` pointers. The window shrinks (by moving `left`) when turbulence breaks: either due to equal elements or two consecutive comparisons in the same direction. This approach is conceptually clearer for those familiar with sliding window patterns. 
+ + - approach_name: Dynamic Programming (Explicit) + is_optimal: false + code: | + def max_turbulence_size(arr: list[int]) -> int: + n = len(arr) + if n == 1: + return 1 + + # dp_inc[i]: length of turbulent subarray ending at i with arr[i-1] < arr[i] + # dp_dec[i]: length of turbulent subarray ending at i with arr[i-1] > arr[i] + dp_inc = [1] * n + dp_dec = [1] * n + + for i in range(1, n): + if arr[i - 1] < arr[i]: + dp_inc[i] = dp_dec[i - 1] + 1 + elif arr[i - 1] > arr[i]: + dp_dec[i] = dp_inc[i - 1] + 1 + # If equal, both remain 1 (initialized value) + + return max(max(dp_inc), max(dp_dec)) + explanation: | + **Time Complexity:** O(n) — Single pass to fill DP arrays. + + **Space Complexity:** O(n) — Two arrays of length n. + + This explicit DP formulation makes the recurrence clear: `dp_inc[i]` depends on `dp_dec[i-1]` and vice versa. While correct, it uses O(n) space unnecessarily — since we only need the previous values, we can reduce to O(1) space as shown in the optimal solution. Included here to illustrate the DP structure before optimization. diff --git a/backend/data/questions/longest-valid-parentheses.yaml b/backend/data/questions/longest-valid-parentheses.yaml new file mode 100644 index 0000000..8aef82b --- /dev/null +++ b/backend/data/questions/longest-valid-parentheses.yaml @@ -0,0 +1,259 @@ +title: Longest Valid Parentheses +slug: longest-valid-parentheses +difficulty: hard +leetcode_id: 32 +leetcode_url: https://leetcode.com/problems/longest-valid-parentheses/ +categories: + - strings + - stack + - dynamic-programming +patterns: + - dynamic-programming + - monotonic-stack + +description: | + Given a string containing just the characters `'('` and `')'`, return *the length of the longest valid (well-formed) parentheses substring*. + + A valid parentheses substring is one where every opening parenthesis `'('` has a corresponding closing parenthesis `')'` and they are properly nested. 
+ +constraints: | + - `0 <= s.length <= 3 * 10^4` + - `s[i]` is `'('` or `')'` + +examples: + - input: 's = "(()"' + output: "2" + explanation: "The longest valid parentheses substring is \"()\"." + - input: 's = ")()())"' + output: "4" + explanation: "The longest valid parentheses substring is \"()()\"." + - input: 's = ""' + output: "0" + explanation: "An empty string has no valid parentheses." + +explanation: + intuition: | + Imagine you're reading through a string of parentheses and trying to find the longest stretch where they're perfectly balanced. + + The key insight is that a valid parentheses string can be **broken by unmatched characters**. An unmatched `)` at position `i` means any valid substring must start *after* `i`. Similarly, an unmatched `(` at position `j` means any valid substring ending before `j` cannot extend past it. + + Think of it like this: unmatched parentheses act as **barriers** that divide the string into segments. Within each segment, we need to find how far the valid matching extends. + + There are two elegant ways to approach this: + + 1. **Stack approach**: Use a stack to track indices of unmatched `(` characters. When we see a `)`, we either match it with a `(` (pop from stack) or mark it as a barrier. The stack always holds indices that "break" the valid sequence. + + 2. **Dynamic Programming**: For each position, calculate the length of the longest valid substring *ending* at that position. A `)` at position `i` can extend a valid substring if there's a matching `(` available. + + The stack approach is more intuitive once you see it: we push indices as barriers, and the distance from the current index to the top of the stack gives us the length of the current valid segment. 
+ + approach: | + We'll use the **Stack Approach** as our optimal solution: + + **Step 1: Initialise the stack with a base index** + + - Push `-1` onto the stack as a "floor" or base index + - This handles the edge case where a valid substring starts from index `0` + - The stack will store indices of unmatched `(` characters and barrier positions + +   + + **Step 2: Iterate through each character** + + - For each character at index `i`: + - If it's `'('`: push `i` onto the stack (potential start of valid sequence) + - If it's `')'`: pop from the stack (try to match with a `(`) + +   + + **Step 3: Calculate valid length after each `)`** + + - After popping for a `)`: + - If the stack is **empty**: this `)` is unmatched, push `i` as a new barrier + - If the stack is **not empty**: calculate `i - stack.top()` as the length of the current valid substring + - Update `max_length` with the maximum value seen + +   + + **Step 4: Return the result** + + - Return `max_length` after processing all characters + +   + + **Why this works**: The stack always contains indices that "break" valid sequences. The distance from the current index to the stack top represents how far back the current valid sequence extends. + + common_pitfalls: + - title: Forgetting the Base Index + description: | + Without pushing `-1` initially, the first valid substring starting from index `0` won't be calculated correctly. + + For example, with `s = "()"`: + - At index `0`, push `0` + - At index `1`, pop `0`, stack is now empty + - Without a base, we can't calculate `1 - (-1) = 2` + + Always initialise with `-1` to handle edge cases cleanly. + wrong_approach: "Start with an empty stack" + correct_approach: "Push -1 as the base index before processing" + + - title: Confusing Valid Substring vs Total Matches + description: | + This problem asks for the longest **contiguous** valid substring, not the total number of matched pairs. 
+ + For `s = "()(())"`: + - Total matched pairs: 3 (length 6) + - The whole string is one valid substring, so the answer is also 6 + + For `s = "())()"`: + - Total matched pairs: 2 (length 4) + - But the longest valid substring is only 2 (`()` at the beginning or at the end) + + The unmatched `)` at index 2 breaks the string into separate segments. + wrong_approach: "Count total matched pairs" + correct_approach: "Track longest contiguous valid segment" + + - title: Using O(n) Space When O(1) is Possible + description: | + While the stack solution is intuitive and efficient at O(n) space, there's actually an O(1) space solution using two-pass counting. + + For interviews, the stack approach is typically expected, but knowing the O(1) solution demonstrates deeper understanding. + wrong_approach: "Only knowing the stack approach" + correct_approach: "Understand both stack O(n) and two-pass O(1) approaches" + + - title: Off-by-One Errors in Length Calculation + description: | + When calculating `i - stack.top()`, remember that this already gives the length, so do not add 1. + + For example, if `i = 5` and `stack.top() = 1`: + - Length = `5 - 1 = 4` (positions 2, 3, 4, 5) + - The valid substring covers indices 2 through 5 inclusive; the barrier at index 1 is excluded + + Make sure your mental model matches: `stack.top()` is the index just before the valid substring starts, so the difference is the length. + + key_takeaways: + - "**Stack for matching problems**: Using a stack to track indices (not just characters) is a powerful technique for parentheses and bracket matching" + - "**Barrier concept**: Unmatched characters act as barriers that reset the valid substring count" + - "**Base index trick**: Pushing `-1` as a base handles edge cases elegantly without special-casing" + - "**Related problems**: Valid Parentheses (#20), Generate Parentheses (#22), and Minimum Add to Make Parentheses Valid (#921) use similar concepts" + + time_complexity: "O(n). We traverse the string exactly once, and each index is pushed and popped from the stack at most once." + space_complexity: "O(n). 
In the worst case (all opening parentheses), the stack holds all n indices. The two-pass approach achieves O(1) space." + +solutions: + - approach_name: Stack with Index Tracking + is_optimal: true + code: | + def longest_valid_parentheses(s: str) -> int: + # Stack stores indices of unmatched '(' and barrier positions + # Start with -1 as base to handle valid substring starting at index 0 + stack = [-1] + max_length = 0 + + for i, char in enumerate(s): + if char == '(': + # Push index of '(' as potential start of valid sequence + stack.append(i) + else: + # Pop to match this ')' with a '(' + stack.pop() + + if not stack: + # Stack empty means this ')' is unmatched + # Push current index as new barrier + stack.append(i) + else: + # Calculate length of current valid substring + # Distance from current position to the last barrier + current_length = i - stack[-1] + max_length = max(max_length, current_length) + + return max_length + explanation: | + **Time Complexity:** O(n) — Single pass through the string. + + **Space Complexity:** O(n) — Stack can hold up to n indices in the worst case. + + The stack maintains a "barrier" at its top, representing the rightmost position that breaks valid parentheses. When we find a valid match, the distance from the current index to this barrier gives us the valid substring length. 
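As a hedged illustration of the barrier idea described above, the following sketch snapshots the stack after every character. The helper name `trace_longest_valid` is an assumption introduced for this example; the input string is taken from the examples in this file.

```python
def trace_longest_valid(s: str) -> tuple[int, list[list[int]]]:
    """Return the best length plus a snapshot of the stack after each character."""
    stack = [-1]  # base index acts as the initial barrier
    best = 0
    snapshots: list[list[int]] = []

    for i, char in enumerate(s):
        if char == '(':
            stack.append(i)
        else:
            stack.pop()
            if not stack:
                # Unmatched ')': it becomes the new barrier
                stack.append(i)
            else:
                # Distance back to the nearest barrier or unmatched '('
                best = max(best, i - stack[-1])
        snapshots.append(list(stack))

    return best, snapshots


best, snapshots = trace_longest_valid(")()())")
for i, snap in enumerate(snapshots):
    print(i, snap)
print("longest:", best)
```

In the printed snapshots the bottom of the stack is always the most recent barrier, which is why `i - stack[-1]` never needs a `+ 1`.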
+ + - approach_name: Dynamic Programming + is_optimal: false + code: | + def longest_valid_parentheses(s: str) -> int: + if not s: + return 0 + + n = len(s) + # dp[i] = length of longest valid substring ending at index i + dp = [0] * n + max_length = 0 + + for i in range(1, n): + if s[i] == ')': + if s[i - 1] == '(': + # Case 1: "()" pattern - extends previous valid substring + dp[i] = (dp[i - 2] if i >= 2 else 0) + 2 + elif i - dp[i - 1] > 0 and s[i - dp[i - 1] - 1] == '(': + # Case 2: "))" pattern - check if there's matching '(' + # before the valid substring ending at i-1 + dp[i] = dp[i - 1] + 2 + # Add any valid substring before the matching '(' + if i - dp[i - 1] >= 2: + dp[i] += dp[i - dp[i - 1] - 2] + + max_length = max(max_length, dp[i]) + + return max_length + explanation: | + **Time Complexity:** O(n) — Single pass through the string. + + **Space Complexity:** O(n) — DP array of size n. + + For each `)` at position `i`, we determine if it can extend a valid substring: + - If preceded by `(`, we have a `()` pair adding 2 to whatever came before + - If preceded by `)`, we look past the valid substring ending at `i-1` to find a matching `(` + + - approach_name: Two-Pass Counting + is_optimal: false + code: | + def longest_valid_parentheses(s: str) -> int: + # O(1) space solution using two passes + + max_length = 0 + left = right = 0 + + # Left to right pass + for char in s: + if char == '(': + left += 1 + else: + right += 1 + + if left == right: + # Balanced - this is a valid substring + max_length = max(max_length, 2 * right) + elif right > left: + # Too many ')' - reset counters + left = right = 0 + + # Right to left pass (handles excess '(' cases) + left = right = 0 + for char in reversed(s): + if char == '(': + left += 1 + else: + right += 1 + + if left == right: + max_length = max(max_length, 2 * left) + elif left > right: + # Too many '(' - reset counters + left = right = 0 + + return max_length + explanation: | + **Time Complexity:** O(n) — Two passes 
through the string. + + **Space Complexity:** O(1) — Only uses counter variables. + + This clever approach counts left and right parentheses. When counts match, we have a valid substring. We need two passes because a single pass can't handle both excess `(` and excess `)` cases. Left-to-right handles excess `)`, right-to-left handles excess `(`. diff --git a/backend/data/questions/lowest-common-ancestor-of-a-binary-search-tree.yaml b/backend/data/questions/lowest-common-ancestor-of-a-binary-search-tree.yaml new file mode 100644 index 0000000..bb54833 --- /dev/null +++ b/backend/data/questions/lowest-common-ancestor-of-a-binary-search-tree.yaml @@ -0,0 +1,172 @@ +title: Lowest Common Ancestor of a Binary Search Tree +slug: lowest-common-ancestor-of-a-binary-search-tree +difficulty: medium +leetcode_id: 235 +leetcode_url: https://leetcode.com/problems/lowest-common-ancestor-of-a-binary-search-tree/ +categories: + - trees +patterns: + - tree-traversal + - binary-search + +description: | + Given a binary search tree (BST), find the lowest common ancestor (LCA) node of two given nodes in the BST. + + According to the definition of LCA on Wikipedia: "The lowest common ancestor is defined between two nodes `p` and `q` as the lowest node in `T` that has both `p` and `q` as descendants (where we allow **a node to be a descendant of itself**)." + +constraints: | + - The number of nodes in the tree is in the range `[2, 10^5]` + - `-10^9 <= Node.val <= 10^9` + - All `Node.val` are **unique** + - `p != q` + - `p` and `q` will exist in the BST + +examples: + - input: "root = [6,2,8,0,4,7,9,null,null,3,5], p = 2, q = 8" + output: "6" + explanation: "The LCA of nodes 2 and 8 is 6." + - input: "root = [6,2,8,0,4,7,9,null,null,3,5], p = 2, q = 4" + output: "2" + explanation: "The LCA of nodes 2 and 4 is 2, since a node can be a descendant of itself according to the LCA definition." + - input: "root = [2,1], p = 2, q = 1" + output: "2" + explanation: "The LCA of nodes 2 and 1 is 2." 
+ +explanation: + intuition: | + The key insight is to **leverage the BST property**: for any node, all values in its left subtree are smaller, and all values in its right subtree are larger. + + Think of it like searching for a meeting point. Imagine you're standing at the root, and two people are trying to find each other — one at node `p` and one at node `q`. As you traverse down the tree, at some point you'll reach a node where the two people would need to go in **different directions** to reach their respective nodes. That splitting point is the LCA. + + More concretely: + - If both `p` and `q` are **smaller** than the current node, the LCA must be in the left subtree + - If both `p` and `q` are **larger** than the current node, the LCA must be in the right subtree + - If `p` and `q` are on **opposite sides** (or one equals the current node), then the current node is the LCA + + This is fundamentally different from finding the LCA in a general binary tree, where you'd need to search both subtrees. The BST ordering gives us a guaranteed direction at each step. 
+ + approach: | + We solve this using the **BST Property** to guide our traversal: + + **Step 1: Start at the root** + + - Begin traversal at the root node + - We'll move down the tree based on how `p` and `q` compare to the current node + +   + + **Step 2: Compare values and decide direction** + + - If both `p.val` and `q.val` are **less than** `current.val`, move to the left child + - If both `p.val` and `q.val` are **greater than** `current.val`, move to the right child + - Otherwise, we've found the split point — return the current node + +   + + **Step 3: The split point is the LCA** + + - When `p` and `q` lie on different sides of the current node (or one of them equals the current node), the current node is the lowest common ancestor + - Return this node as the answer + +   + + This works because the BST property guarantees that once `p` and `q` "split" to different subtrees, they can never reunite at a lower node. + + common_pitfalls: + - title: Ignoring the BST Property + description: | + A common mistake is treating this like a general binary tree LCA problem and recursing into both subtrees to find `p` and `q`. + + In a general binary tree, you'd need to search both children and check which subtree contains which node. But in a BST, you can determine which direction to go with a simple value comparison — O(1) per node instead of potentially visiting both subtrees. + + This makes the BST solution O(h) instead of O(n). + wrong_approach: "Search both subtrees like in a general binary tree" + correct_approach: "Use value comparisons to choose one direction at each step" + + - title: Forgetting a Node Can Be Its Own Ancestor + description: | + The problem states that a node can be a descendant of itself. If `p = 2` and `q = 4`, and node 2 is an ancestor of node 4 in the BST, then the LCA is 2, not some parent of 2. + + When checking the split condition, remember to handle the case where the current node equals `p` or `q`. 
In this case, the current node is the LCA because one node is an ancestor of the other. + wrong_approach: "Only return when p and q are on opposite sides" + correct_approach: "Return when p and q split OR when current equals p or q" + + - title: Incorrect Comparison Logic + description: | + Be careful with the comparison operators. The condition for moving left is when **both** values are less than current. Similarly for moving right. + + A common bug is using OR instead of AND: + - Wrong: `if p.val < current.val or q.val < current.val` (might miss the split) + - Correct: `if p.val < current.val and q.val < current.val` + wrong_approach: "Using OR logic for direction decisions" + correct_approach: "Using AND logic — both must be less/greater to continue" + + key_takeaways: + - "**Exploit BST ordering**: The BST property lets you make O(1) direction decisions, avoiding the need to search both subtrees" + - "**Split point = LCA**: The moment two values would need to go different directions, you've found their common ancestor" + - "**Iterative vs recursive**: Both approaches work, but iterative uses O(1) space vs O(h) for the recursive call stack" + - "**Foundation for harder problems**: This pattern extends to problems like finding paths between nodes or validating BST structure" + + time_complexity: "O(h) where h is the height of the tree. In a balanced BST, h = log(n). In the worst case (skewed tree), h = n." + space_complexity: "O(1) for the iterative solution. We only use a single pointer to traverse the tree, regardless of input size." 
+ +solutions: + - approach_name: Iterative Traversal + is_optimal: true + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def lowest_common_ancestor(root: TreeNode, p: TreeNode, q: TreeNode) -> TreeNode: + # Start at the root and traverse down + current = root + + while current: + # Both nodes are in the left subtree + if p.val < current.val and q.val < current.val: + current = current.left + # Both nodes are in the right subtree + elif p.val > current.val and q.val > current.val: + current = current.right + # Split point found — p and q are on different sides + # (or one of them equals current) + else: + return current + + return None # Should never reach here if p and q exist in tree + explanation: | + **Time Complexity:** O(h) — We traverse at most the height of the tree, making one comparison per level. + + **Space Complexity:** O(1) — Only a single pointer variable is used; no recursion stack. + + We exploit the BST property to navigate directly to the LCA. At each node, we compare both `p` and `q` values to decide whether to go left, right, or stop. The moment they would diverge (or one matches the current node), we've found the LCA. + + - approach_name: Recursive Traversal + is_optimal: false + code: | + class TreeNode: + def __init__(self, val=0, left=None, right=None): + self.val = val + self.left = left + self.right = right + + def lowest_common_ancestor(root: TreeNode, p: TreeNode, q: TreeNode) -> TreeNode: + # Both nodes are in the left subtree + if p.val < root.val and q.val < root.val: + return lowest_common_ancestor(root.left, p, q) + + # Both nodes are in the right subtree + if p.val > root.val and q.val > root.val: + return lowest_common_ancestor(root.right, p, q) + + # Split point — this is the LCA + return root + explanation: | + **Time Complexity:** O(h) — Same traversal pattern as iterative, visiting at most h nodes. 
+ + **Space Complexity:** O(h) — Recursive call stack can grow up to the height of the tree. + + This recursive version follows the same logic but uses the call stack instead of a loop. While elegant, it uses more space than the iterative approach. The recursive calls naturally unwind once we find the split point. diff --git a/backend/data/questions/lru-cache.yaml b/backend/data/questions/lru-cache.yaml new file mode 100644 index 0000000..e2c42df --- /dev/null +++ b/backend/data/questions/lru-cache.yaml @@ -0,0 +1,258 @@ +title: LRU Cache +slug: lru-cache +difficulty: medium +leetcode_id: 146 +leetcode_url: https://leetcode.com/problems/lru-cache/ +categories: + - hash-tables + - linked-lists +patterns: + - linkedlist-reversal + +description: | + Design a data structure that follows the constraints of a **Least Recently Used (LRU) cache**. + + Implement the `LRUCache` class: + + - `LRUCache(int capacity)` Initialise the LRU cache with **positive** size `capacity`. + - `int get(int key)` Return the value of the `key` if the key exists, otherwise return `-1`. + - `void put(int key, int value)` Update the value of the `key` if the `key` exists. Otherwise, add the `key-value` pair to the cache. If the number of keys exceeds the `capacity` from this operation, **evict** the least recently used key. + + The functions `get` and `put` must each run in `O(1)` average time complexity. + +constraints: | + - `1 <= capacity <= 3000` + - `0 <= key <= 10^4` + - `0 <= value <= 10^5` + - At most `2 * 10^5` calls will be made to `get` and `put`. 
+ +examples: + - input: | + ["LRUCache", "put", "put", "get", "put", "get", "put", "get", "get", "get"] + [[2], [1, 1], [2, 2], [1], [3, 3], [2], [4, 4], [1], [3], [4]] + output: "[null, null, null, 1, null, -1, null, -1, 3, 4]" + explanation: | + LRUCache lRUCache = new LRUCache(2); + lRUCache.put(1, 1); // cache is {1=1} + lRUCache.put(2, 2); // cache is {1=1, 2=2} + lRUCache.get(1); // return 1 + lRUCache.put(3, 3); // LRU key was 2, evicts key 2, cache is {1=1, 3=3} + lRUCache.get(2); // returns -1 (not found) + lRUCache.put(4, 4); // LRU key was 1, evicts key 1, cache is {4=4, 3=3} + lRUCache.get(1); // return -1 (not found) + lRUCache.get(3); // return 3 + lRUCache.get(4); // return 4 + +explanation: + intuition: | + Imagine a stack of plates in a restaurant kitchen. When a plate is used, it goes back on top of the stack. When you need a clean plate, you always grab from the top. The plate at the **bottom** of the stack is the one that hasn't been touched in the longest time — it's the "least recently used". + + An LRU cache works the same way: we need to track which items were accessed most recently, and when we run out of space, we evict the item that hasn't been touched in the longest time. + + The challenge is the **O(1) time requirement** for both `get` and `put`. A simple list would give us O(n) for finding elements. A hash map gives us O(1) lookup but doesn't track order. We need **both** capabilities simultaneously. + + The key insight is to combine two data structures: + - A **hash map** for O(1) key lookups + - A **doubly linked list** for O(1) insertion, deletion, and reordering + + The hash map points directly to nodes in the linked list, so we can find any element in O(1) time. The doubly linked list maintains the access order — most recently used at the head, least recently used at the tail. 
When we access an element, we can remove it from its current position and move it to the head in O(1) time because we have direct pointers to adjacent nodes. + + approach: | + We solve this using a **Hash Map + Doubly Linked List** combination: + + **Step 1: Define the node structure** + + - Create a `Node` class with `key`, `value`, `prev`, and `next` pointers + - The key is stored in the node so we can remove entries from the hash map during eviction + +   + + **Step 2: Initialise the data structures** + + - `cache`: A hash map mapping keys to their corresponding nodes + - `capacity`: The maximum number of items allowed + - `head` and `tail`: Dummy sentinel nodes that simplify edge case handling + - Connect `head.next = tail` and `tail.prev = head` initially (empty list between sentinels) + +   + + **Step 3: Implement helper methods** + + - `_remove(node)`: Remove a node from its current position in the doubly linked list + - `_add_to_head(node)`: Insert a node right after the head sentinel (marks it as most recently used) + +   + + **Step 4: Implement get(key)** + + - If key not in cache, return `-1` + - Otherwise, move the node to the head (mark as recently used) and return its value + +   + + **Step 5: Implement put(key, value)** + + - If key exists, update its value and move to head + - If key is new: + - Create a new node and add to head + - Add to the hash map + - If over capacity, remove the node before `tail` (the LRU item) and delete from hash map + +   + + Using sentinel nodes eliminates null checks when removing the first/last real node, making the code cleaner and less error-prone. + + common_pitfalls: + - title: Using a List for Access Tracking + description: | + A common first instinct is to use a regular list or array to track access order. However, moving an element to the front of a list requires O(n) time to shift elements. 
+ + With up to `2 * 10^5` operations and a capacity of up to `3000`, O(n) per operation means up to roughly 6 * 10^8 element moves in total, which risks **Time Limit Exceeded (TLE)**. + + The doubly linked list with direct node references allows O(1) removal and insertion. + wrong_approach: "Array or singly linked list for order tracking" + correct_approach: "Doubly linked list with hash map for O(1) node access" + + - title: Forgetting to Store Key in Node + description: | + When evicting the LRU item, you need to remove it from both the linked list AND the hash map. If the node doesn't store its key, you can't efficiently find which hash map entry to delete. + + Always store the key in the node so eviction can update the hash map in O(1) time. + wrong_approach: "Node only stores value" + correct_approach: "Node stores both key and value" + + - title: Not Handling the Update Case + description: | + When `put` is called with an existing key, some implementations add a new node without removing the old one. This corrupts the data structure and leads to incorrect eviction behaviour. + + Always check if the key exists first. If it does, update the existing node's value and move it to the head instead of creating a new node. + wrong_approach: "Always create new node on put" + correct_approach: "Check existence first, update if present" + + - title: Edge Cases with Sentinel Nodes + description: | + Without sentinel (dummy) nodes, removing the first or last real node requires special handling of null pointers. This leads to complex, error-prone code. + + Using dummy `head` and `tail` nodes means the first real node is always `head.next` and the last is always `tail.prev`. Removal logic becomes uniform for all nodes. + + key_takeaways: + - "**Combine data structures**: When one structure doesn't meet all requirements, combine two. Hash map + linked list gives O(1) lookup AND O(1) reordering." 
+ - "**Sentinel nodes simplify edge cases**: Dummy head/tail nodes eliminate null checks and special cases for first/last elements." + - "**Store redundant data when needed**: Keeping the key in the node seems redundant but enables O(1) eviction from the hash map." + - "**Classic interview pattern**: This exact combination (hash map + doubly linked list) appears in many cache and ordering problems." + + time_complexity: "O(1) for both `get` and `put`. Hash map lookup is O(1), and doubly linked list insertion/removal is O(1) with direct node references." + space_complexity: "O(capacity). We store at most `capacity` nodes in the linked list and `capacity` entries in the hash map." + +solutions: + - approach_name: Hash Map + Doubly Linked List + is_optimal: true + code: | + class Node: + """Doubly linked list node storing key-value pair.""" + def __init__(self, key: int = 0, value: int = 0): + self.key = key + self.value = value + self.prev: Node | None = None + self.next: Node | None = None + + + class LRUCache: + def __init__(self, capacity: int): + self.capacity = capacity + self.cache: dict[int, Node] = {} # key -> node + + # Sentinel nodes simplify edge cases + self.head = Node() # Most recently used after head + self.tail = Node() # Least recently used before tail + self.head.next = self.tail + self.tail.prev = self.head + + def _remove(self, node: Node) -> None: + """Remove node from its current position in the list.""" + prev_node = node.prev + next_node = node.next + prev_node.next = next_node + next_node.prev = prev_node + + def _add_to_head(self, node: Node) -> None: + """Add node right after head (marks as most recently used).""" + node.prev = self.head + node.next = self.head.next + self.head.next.prev = node + self.head.next = node + + def get(self, key: int) -> int: + if key not in self.cache: + return -1 + + # Move accessed node to head (most recently used) + node = self.cache[key] + self._remove(node) + self._add_to_head(node) + return node.value + + 
def put(self, key: int, value: int) -> None: + if key in self.cache: + # Update existing node and move to head + node = self.cache[key] + node.value = value + self._remove(node) + self._add_to_head(node) + else: + # Create new node + new_node = Node(key, value) + self.cache[key] = new_node + self._add_to_head(new_node) + + # Evict LRU if over capacity + if len(self.cache) > self.capacity: + lru_node = self.tail.prev # Node before tail is LRU + self._remove(lru_node) + del self.cache[lru_node.key] # Key stored in node! + explanation: | + **Time Complexity:** O(1) for both operations — hash map lookup and linked list manipulation are constant time. + + **Space Complexity:** O(capacity) — storing up to `capacity` nodes plus the hash map entries. + + The hash map provides instant key lookup, while the doubly linked list maintains access order. Sentinel nodes eliminate edge case handling. When evicting, we access the node before `tail` and use its stored key to clean up the hash map. + + - approach_name: OrderedDict (Python Built-in) + is_optimal: true + code: | + from collections import OrderedDict + + + class LRUCache: + """LRU Cache using Python's OrderedDict which maintains insertion order.""" + + def __init__(self, capacity: int): + self.capacity = capacity + # OrderedDict remembers insertion order + self.cache: OrderedDict[int, int] = OrderedDict() + + def get(self, key: int) -> int: + if key not in self.cache: + return -1 + + # Move to end (most recently used) + self.cache.move_to_end(key) + return self.cache[key] + + def put(self, key: int, value: int) -> None: + if key in self.cache: + # Update and move to end + self.cache.move_to_end(key) + + self.cache[key] = value + + # Evict oldest if over capacity + if len(self.cache) > self.capacity: + # popitem(last=False) removes first (oldest) item + self.cache.popitem(last=False) + explanation: | + **Time Complexity:** O(1) for both operations — `OrderedDict` uses a hash map + doubly linked list internally. 
+ + **Space Complexity:** O(capacity) — same as the manual implementation. + + Python's `OrderedDict` is essentially the same data structure we built manually. Using `move_to_end()` marks an item as recently used, and `popitem(last=False)` removes the oldest item. This is the pragmatic choice in a real Python codebase, but understanding the manual implementation is valuable for interviews.
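A short usage sketch may help confirm the eviction order. It replays the operation sequence from the example in this file against a compact `OrderedDict`-based cache, repeated here so the snippet runs on its own. The `replay` helper and the `(name, args)` operation format are assumptions made for this illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Compact OrderedDict-based cache, repeated so this snippet is self-contained."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache: OrderedDict[int, int] = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # drop the least recently used entry


def replay(capacity: int, operations: list[tuple[str, tuple[int, ...]]]) -> list[int]:
    """Hypothetical helper: run the operations and collect the results of each get."""
    cache = LRUCache(capacity)
    results: list[int] = []
    for name, args in operations:
        if name == "put":
            cache.put(*args)
        else:
            results.append(cache.get(*args))
    return results


operations = [
    ("put", (1, 1)), ("put", (2, 2)), ("get", (1,)),
    ("put", (3, 3)), ("get", (2,)), ("put", (4, 4)),
    ("get", (1,)), ("get", (3,)), ("get", (4,)),
]
print(replay(2, operations))
```

The collected values correspond to the non-null entries of the expected output in the example, in the same order.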