questions F-L

2025-05-25 11:47:04 +01:00
parent 798e0ba1df
commit 5dbe52df0d
54 changed files with 11235 additions and 0 deletions
--- a/backend/data/questions/kth-largest-element-in-an-array.yaml
+++ b/backend/data/questions/kth-largest-element-in-an-array.yaml
@@ -0,0 +1,211 @@
+title: Kth Largest Element in an Array
+slug: kth-largest-element-in-an-array
+difficulty: medium
+leetcode_id: 215
+leetcode_url: https://leetcode.com/problems/kth-largest-element-in-an-array/
+categories:
+  - arrays
+  - sorting
+  - heap
+patterns:
+  - heap
+  - binary-search
+
+description: |
+  Given an integer array `nums` and an integer `k`, return *the* `k`<sup>th</sup> *largest element in the array*.
+
+  Note that it is the `k`<sup>th</sup> largest element in the sorted order, not the `k`<sup>th</sup> distinct element.
+
+  Can you solve it without sorting?
+
+constraints: |
+  - `1 <= k <= nums.length <= 10^5`
+  - `-10^4 <= nums[i] <= 10^4`
+
+examples:
+  - input: "nums = [3,2,1,5,6,4], k = 2"
+    output: "5"
+    explanation: "The sorted array is [1,2,3,4,5,6]. The 2nd largest element is 5."
+  - input: "nums = [3,2,3,1,2,4,5,5,6], k = 4"
+    output: "4"
+    explanation: "The sorted array is [1,2,2,3,3,4,5,5,6]. The 4th largest element is 4."
+
+explanation:
+  intuition: |
+    Imagine you have a collection of exam scores and you want to find the student who ranked `k`<sup>th</sup> from the top. The most straightforward approach would be to sort all scores and pick the `k`<sup>th</sup> one from the end — but can we do better?
+
+    Think of it like this: if you only need to find *one* specific ranking, do you really need to sort *everything*? This is similar to finding the tallest person in a room versus sorting everyone by height — the first task is much simpler.
+
+    The key insight is that we don't need a fully sorted array. We only need to find the element that would be at position `n - k` if the array were sorted (0-indexed). This opens the door to more efficient approaches:
+
+    1. **Heap approach**: Maintain a "top k" collection using a min-heap of size `k`. Any element smaller than our current `k`<sup>th</sup> largest can be discarded.
+
+    2. **Quickselect approach**: Use the partitioning logic from quicksort, but only recurse into the half that contains our target position.
+
+    Both avoid the full `O(n log n)` cost of sorting when we only need partial ordering.
+
+  approach: |
+    We'll focus on the **Min-Heap approach** as the primary solution due to its consistent performance and clarity:
+
+    **Step 1: Understand the heap strategy**
+
+    - We maintain a min-heap of size `k`
+    - The min-heap always contains the `k` largest elements seen so far
+    - The root of the heap (minimum of these `k` elements) is our answer
+
+    &nbsp;
+
+    **Step 2: Initialise the heap**
+
+    - Create an empty min-heap
+    - We'll use Python's `heapq` which implements a min-heap
+
+    &nbsp;
+
+    **Step 3: Process each element**
+
+    - For each number in the array:
+      - If the heap has fewer than `k` elements, push the number
+      - Otherwise, if the number is larger than the heap's minimum (root), replace the root with this number
+    - This ensures we always keep the `k` largest elements
+
+    &nbsp;
+
+    **Step 4: Return the result**
+
+    - The root of the heap is the `k`<sup>th</sup> largest element
+    - Return `heap[0]`
+
+    &nbsp;
+
+    **Why this works**: By keeping at most `k` elements and always evicting the smallest once we exceed that capacity, every discarded element is less than or equal to everything still in the heap. The heap therefore holds the `k` largest elements, and its root (the smallest of them) is exactly the `k`<sup>th</sup> largest overall.
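+
+    Steps 2 to 4 can be sketched as follows. This is a minimal illustration rather than the reference solution: the function name `kth_largest_conditional` is hypothetical, and `heapq.heapreplace` is used for the "replace the root" step because it pops the minimum and pushes the new value in a single call.
+
+    ```python
+    import heapq
+
+    def kth_largest_conditional(nums: list[int], k: int) -> int:
+        heap: list[int] = []  # min-heap holding the k largest values seen so far
+        for num in nums:
+            if len(heap) < k:
+                # Heap not full yet: always keep the value
+                heapq.heappush(heap, num)
+            elif num > heap[0]:
+                # Larger than the current kth largest: replace the root
+                heapq.heapreplace(heap, num)
+        return heap[0]
+    ```
+
+    For `nums = [3,2,1,5,6,4]` and `k = 2`, the heap ends as `[5, 6]` and the function returns `5`.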
+
+  common_pitfalls:
+    - title: Off-by-One with Heap Size
+      description: |
+        A common mistake is confusion about when to push vs. replace in the heap.
+
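+        If you push unconditionally and then pop whenever the size exceeds `k`, the pop may remove the element you just added when it is the smallest. That is the intended behaviour, not a bug: such an element cannot be among the `k` largest. This version is simple and correct, at the cost of one extra push and pop for each element that is immediately discarded.
+
+        Alternatively, compare the new element with the heap's minimum *before* adding it, and replace the root only when the new element is larger. This avoids the wasted work, but the branching is easier to get wrong (for example, comparing against the root before the heap holds `k` elements).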
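+
+        The push-then-pop variant can also be written with `heapq.heappushpop`, which performs both steps in one call. A minimal sketch, assuming `1 <= k <= len(nums)` as the constraints state; the function name `kth_largest_pushpop` is hypothetical.
+
+        ```python
+        import heapq
+
+        def kth_largest_pushpop(nums: list[int], k: int) -> int:
+            # Seed the heap with the first k values so it starts at full size
+            heap = nums[:k]
+            heapq.heapify(heap)
+            for num in nums[k:]:
+                # Push num, then pop the smallest; the heap size stays at k
+                heapq.heappushpop(heap, num)
+            return heap[0]
+        ```
+
+        Because the heap is seeded with `k` values first, no size check is needed inside the loop.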
+      wrong_approach: "Complex conditional logic that's easy to get wrong"
+      correct_approach: "Push then pop if size > k, or use heappushpop for efficiency"
+
+    - title: Using Max-Heap Incorrectly
+      description: |
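+        Some attempt to build a max-heap of the entire array and pop `k - 1` times, leaving the answer at the root. This is correct, and its running time is competitive:
+
+        - Building a max-heap: `O(n)`
+        - Popping `k - 1` times: `O(k log n)`
+        - Total: `O(n + k log n)`
+
+        The real cost is memory: the heap must hold all `n` elements, so it needs `O(n)` space (in Python, a negated copy of the input) and cannot process a stream whose length is unknown. A min-heap of size `k` runs in `O(n log k)` time and uses only `O(k)` space, which matters when `k` is small relative to `n`.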
+      wrong_approach: "Max-heap of all elements, pop k-1 times"
+      correct_approach: "Min-heap of size k, maintaining the k largest"
+
+    - title: Forgetting Python's heapq is Min-Heap Only
+      description: |
+        Python's `heapq` only provides a min-heap. To simulate a max-heap, you must negate values when pushing and negate again when popping.
+
+        For this problem, a min-heap is actually what we want — we keep the `k` largest elements by discarding elements smaller than our current `k`<sup>th</sup> largest.
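+
+        For comparison, the negation trick looks like this when a max-heap is used for the same problem. It is a sketch only, with a hypothetical function name `kth_largest_max_heap`, and it copies the whole input, so it uses `O(n)` extra space.
+
+        ```python
+        import heapq
+
+        def kth_largest_max_heap(nums: list[int], k: int) -> int:
+            # Negate every value so heapq's min-heap orders by largest original value
+            max_heap = [-num for num in nums]
+            heapq.heapify(max_heap)
+            # Discard the k - 1 largest values
+            for _ in range(k - 1):
+                heapq.heappop(max_heap)
+            # Negate again to recover the original value
+            return -max_heap[0]
+        ```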
+      wrong_approach: "Assuming heapq has a max-heap option"
+      correct_approach: "Use min-heap directly for finding kth largest"
+
+  key_takeaways:
+    - "**Partial ordering insight**: When you only need one specific rank, you don't need to sort everything — use a heap or quickselect instead"
+    - "**Min-heap for top-k**: A min-heap of size `k` naturally maintains the `k` largest elements, with the `k`<sup>th</sup> largest at the root"
+    - "**Trade-off awareness**: Heap gives `O(n log k)` guaranteed; Quickselect gives `O(n)` average but `O(n^2)` worst case"
+    - "**Foundation pattern**: This technique applies to streaming data, top-k frequent elements, and many ranking problems"
+
+  time_complexity: "O(n log k). We iterate through all `n` elements, and each heap operation (push/pop) takes `O(log k)` time since the heap size is bounded by `k`."
+  space_complexity: "O(k). We maintain a heap containing at most `k` elements."
+
+solutions:
+  - approach_name: Min-Heap
+    is_optimal: true
+    code: |
+      import heapq
+
+      def find_kth_largest(nums: list[int], k: int) -> int:
+          # Min-heap to store the k largest elements
+          heap = []
+
+          for num in nums:
+              # Add current number to heap
+              heapq.heappush(heap, num)
+
+              # If heap exceeds size k, remove the smallest
+              # This ensures we keep only the k largest elements
+              if len(heap) > k:
+                  heapq.heappop(heap)
+
+          # The root of min-heap is the kth largest
+          return heap[0]
+    explanation: |
+      **Time Complexity:** O(n log k) — We process each of `n` elements with heap operations costing `O(log k)`.
+
+      **Space Complexity:** O(k) — The heap stores at most `k` elements.
+
+      This approach maintains a min-heap of the `k` largest elements seen so far. By keeping the heap size at `k` and using a min-heap, the smallest element in our collection (the root) is always the `k`<sup>th</sup> largest overall.
+
+  - approach_name: Quickselect
+    is_optimal: true
+    code: |
+      import random
+
+      def find_kth_largest(nums: list[int], k: int) -> int:
+          # Convert kth largest to index in sorted array
+          # kth largest = element at index (n - k) in ascending order
+          target_index = len(nums) - k
+
+          def quickselect(left: int, right: int) -> int:
+              # Random pivot to avoid worst-case on sorted input
+              pivot = nums[random.randint(left, right)]
+
+              # Three-way partition: [< pivot | == pivot | > pivot].
+              # Grouping every copy of the pivot avoids quadratic time and
+              # deep recursion when the input contains many duplicates.
+              lt, i, gt = left, left, right
+              while i <= gt:
+                  if nums[i] < pivot:
+                      nums[lt], nums[i] = nums[i], nums[lt]
+                      lt += 1
+                      i += 1
+                  elif nums[i] > pivot:
+                      nums[gt], nums[i] = nums[i], nums[gt]
+                      gt -= 1
+                  else:
+                      i += 1
+
+              # nums[lt..gt] now holds every copy of the pivot, in final position
+              if target_index < lt:
+                  # Target is in the left partition
+                  return quickselect(left, lt - 1)
+              elif target_index > gt:
+                  # Target is in the right partition
+                  return quickselect(gt + 1, right)
+              else:
+                  # Target lies inside the block of values equal to the pivot
+                  return pivot
+
+          return quickselect(0, len(nums) - 1)
+    explanation: |
+      **Time Complexity:** O(n) average, O(n^2) worst case — Average case is linear because we only recurse into one half. Random pivot selection makes worst case very unlikely.
+
+      **Space Complexity:** O(log n) average for recursion stack, O(n) worst case.
+
+      Quickselect uses the partitioning logic from quicksort but only recurses into the partition containing our target index. This reduces the expected work from `O(n log n)` to `O(n)`.
+
+  - approach_name: Sorting
+    is_optimal: false
+    code: |
+      def find_kth_largest(nums: list[int], k: int) -> int:
+          # Sort in descending order
+          nums.sort(reverse=True)
+
+          # Return the kth element (0-indexed, so k-1)
+          return nums[k - 1]
+    explanation: |
+      **Time Complexity:** O(n log n) — Dominated by the sorting step.
+
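+      **Space Complexity:** O(n) worst case in Python: `list.sort` uses Timsort, which sorts in place but may need up to `n / 2` temporary slots for merging. In general the figure ranges from O(1) to O(n), depending on the sorting algorithm used.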
+
+      The simplest approach: sort and index. While not optimal for this specific problem, it's worth knowing as a baseline. For small arrays or when `k` is close to `n`, the practical difference may be negligible.