title: Kth Largest Element in an Array
slug: kth-largest-element-in-an-array
difficulty: medium
leetcode_id: 215
leetcode_url: https://leetcode.com/problems/kth-largest-element-in-an-array/
categories:
  - arrays
  - sorting
  - heap
patterns:
  - slug: heap
    is_optimal: true
  - slug: binary-search
    is_optimal: false
function_signature: "def find_kth_largest(nums: list[int], k: int) -> int:"
test_cases:
  visible:
    - input: { nums: [3, 2, 1, 5, 6, 4], k: 2 }
      expected: 5
    - input: { nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4 }
      expected: 4
    - input: { nums: [1], k: 1 }
      expected: 1
  hidden:
    - input: { nums: [7, 7, 7, 7], k: 2 }
      expected: 7
    - input: { nums: [1, 2, 3, 4, 5], k: 5 }
      expected: 1
    - input: { nums: [1, 2, 3, 4, 5], k: 1 }
      expected: 5
    - input: { nums: [-1, -2, -3, -4], k: 1 }
      expected: -1
    - input: { nums: [5, 2, 4, 1, 3, 6, 0], k: 3 }
      expected: 4
    - input: { nums: [99, 99], k: 1 }
      expected: 99
description: |
  Given an integer array `nums` and an integer `k`, return *the* `k`^th *largest element in the array*.

  Note that it is the `k`^th largest element in the sorted order, not the `k`^th distinct element.

  Can you solve it without sorting?
constraints: |
  - `1 <= k <= nums.length <= 10^5`
  - `-10^4 <= nums[i] <= 10^4`
examples:
  - input: "nums = [3,2,1,5,6,4], k = 2"
    output: "5"
    explanation: "The sorted array is [1,2,3,4,5,6]. The 2nd largest element is 5."
  - input: "nums = [3,2,3,1,2,4,5,5,6], k = 4"
    output: "4"
    explanation: "The sorted array is [1,2,2,3,3,4,5,5,6]. The 4th largest element is 4."
explanation:
  intuition: |
    Imagine you have a collection of exam scores and you want to find the student who ranked `k`^th from the top. The most straightforward approach would be to sort all scores and pick the `k`^th one from the end, but can we do better?

    Think of it like this: if you only need to find *one* specific ranking, do you really need to sort *everything*? This is similar to finding the tallest person in a room versus sorting everyone by height: the first task is much simpler.

    The key insight is that we don't need a fully sorted array. We only need to find the element that would be at position `n - k` (0-indexed) if the array were sorted in ascending order. This opens the door to more efficient approaches:

    1. **Heap approach**: Maintain a "top k" collection using a min-heap of size `k`. Any element smaller than our current `k`^th largest can be discarded.
    2. **Quickselect approach**: Use the partitioning logic from quicksort, but only recurse into the half that contains our target position.

    Both avoid the full `O(n log n)` cost of sorting when we only need partial ordering.
  approach: |
    We'll focus on the **Min-Heap approach** as the primary solution due to its consistent performance and clarity:

    **Step 1: Understand the heap strategy**
    - We maintain a min-heap of size `k`
    - The min-heap always contains the `k` largest elements seen so far
    - The root of the heap (minimum of these `k` elements) is our answer

    **Step 2: Initialise the heap**
    - Create an empty min-heap
    - We'll use Python's `heapq`, which implements a min-heap

    **Step 3: Process each element**
    - For each number in the array:
      - If the heap has fewer than `k` elements, push the number
      - Otherwise, if the number is larger than the heap's minimum (root), replace the root with this number
    - This ensures we always keep the `k` largest elements

    **Step 4: Return the result**
    - The root of the heap is the `k`^th largest element
    - Return `heap[0]`

    **Why this works**: By keeping at most `k` elements and always removing the smallest when we exceed capacity, we guarantee that the smallest element in our heap is at least as large as every discarded element, which makes it exactly the `k`^th largest overall.
  common_pitfalls:
    - title: Off-by-One with Heap Size
      description: |
        A common source of confusion is when to push and when to replace in the heap. If you always push and then pop once the size exceeds `k`, the pop may remove the element you just added when it is the smallest. That is the intended behaviour, not a bug: an element no larger than the current minimum of a full heap cannot change the answer.

        Either compare the new element with the heap's minimum *before* adding it, or push unconditionally and pop whenever the size exceeds `k`. The second form is simpler and equally correct, though it does slightly more heap work.
      wrong_approach: "Complex conditional logic that's easy to get wrong"
      correct_approach: "Push then pop if size > k, or use heappushpop for efficiency"
    - title: Using Max-Heap Incorrectly
      description: |
        Some attempt to build a max-heap of the entire array and pop `k - 1` times, leaving the answer at the root. This is correct, and its running time is competitive:
        - Building a max-heap: `O(n)`
        - Popping `k - 1` times: `O(k log n)`
        - Total: `O(n + k log n)`

        The real cost is space: the heap holds all `n` elements (`O(n)` extra, or an in-place heapify that mutates the input), and the whole array must be available up front. A min-heap of size `k` runs in `O(n log k)` time but needs only `O(k)` space and also works on streaming input.
      wrong_approach: "Max-heap of all elements, pop k-1 times"
      correct_approach: "Min-heap of size k, maintaining the k largest"
    - title: Forgetting Python's heapq is Min-Heap Only
      description: |
        Python's `heapq` functions implement a min-heap by default. To simulate a max-heap portably, negate values when pushing and negate again when popping (dedicated `*_max` functions such as `heapq.heappush_max` exist only from Python 3.14).

        For this problem, a min-heap is actually what we want: we keep the `k` largest elements by discarding elements smaller than our current `k`^th largest.
      wrong_approach: "Assuming heapq pops the largest element by default"
      correct_approach: "Use min-heap directly for finding kth largest"
  key_takeaways:
    - "**Partial ordering insight**: When you only need one specific rank, you don't need to sort everything; use a heap or quickselect instead"
    - "**Min-heap for top-k**: A min-heap of size `k` naturally maintains the `k` largest elements, with the `k`^th largest at the root"
    - "**Trade-off awareness**: Heap gives `O(n log k)` guaranteed; Quickselect gives `O(n)` average but `O(n^2)` worst case"
    - "**Foundation pattern**: This technique applies to streaming data, top-k frequent elements, and many ranking problems"
  time_complexity: "O(n log k).
    We iterate through all `n` elements, and each heap operation (push/pop) takes `O(log k)` time since the heap size is bounded by `k`."
  space_complexity: "O(k). We maintain a heap containing at most `k` elements."
  pattern_comparison: |
    **Heap vs Quickselect: Choosing the Right Pattern**

    Both approaches avoid full sorting, but they have different characteristics:

    | Approach | Time (Avg) | Time (Worst) | Space | Modifies Input? |
    |----------|------------|--------------|-------|-----------------|
    | **Min Heap** | O(n log k) | O(n log k) | O(k) | No |
    | **Quickselect** | O(n) | O(n²) | O(log n) | Yes |
    | **Sorting** | O(n log n) | O(n log n) | O(1)-O(n) | Yes |

    **When to choose Heap:**
    - You need **guaranteed** performance (no worst-case quadratic time)
    - The input array shouldn't be modified
    - `k` is small relative to `n` (the O(log k) factor stays small)
    - You're working with streaming data

    **When to choose Quickselect:**
    - You need the **fastest average** performance
    - Modifying the input array is acceptable
    - You're comfortable with randomised algorithms
    - Space is at a premium (O(log n) recursion vs O(k) heap)

    **Interview tip:** Start with Heap for its simplicity and guaranteed bounds, then mention Quickselect as an optimisation if the interviewer asks about O(n) solutions.
solutions:
  - approach_name: Min-Heap
    is_optimal: true
    code: |
      import heapq

      def find_kth_largest(nums: list[int], k: int) -> int:
          # Min-heap to store the k largest elements
          heap = []

          for num in nums:
              # Add current number to heap
              heapq.heappush(heap, num)

              # If heap exceeds size k, remove the smallest
              # This ensures we keep only the k largest elements
              if len(heap) > k:
                  heapq.heappop(heap)

          # The root of min-heap is the kth largest
          return heap[0]
    explanation: |
      **Time Complexity:** O(n log k). We process each of `n` elements with heap operations costing `O(log k)`.

      **Space Complexity:** O(k). The heap stores at most `k` elements.

      This approach maintains a min-heap of the `k` largest elements seen so far. By keeping the heap size at `k` and using a min-heap, the smallest element in our collection (the root) is always the `k`^th largest overall.
  - approach_name: Quickselect
    is_optimal: true
    code: |
      import random

      def find_kth_largest(nums: list[int], k: int) -> int:
          # Convert kth largest to index in sorted array
          # kth largest = element at index (n - k) in ascending order
          target_index = len(nums) - k

          def quickselect(left: int, right: int) -> int:
              # Random pivot to avoid worst-case on sorted input
              pivot = nums[random.randint(left, right)]

              # Three-way partition of nums[left..right]:
              # [left, lt) < pivot, [lt, gt] == pivot, (gt, right] > pivot
              # Grouping equal values keeps duplicate-heavy input from
              # shrinking the range by only one element per call
              lt, i, gt = left, left, right
              while i <= gt:
                  if nums[i] < pivot:
                      nums[lt], nums[i] = nums[i], nums[lt]
                      lt += 1
                      i += 1
                  elif nums[i] > pivot:
                      nums[i], nums[gt] = nums[gt], nums[i]
                      gt -= 1
                  else:
                      i += 1

              # Check if we found the target
              if lt <= target_index <= gt:
                  return pivot
              elif target_index > gt:
                  # Target is in the right partition
                  return quickselect(gt + 1, right)
              else:
                  # Target is in the left partition
                  return quickselect(left, lt - 1)

          return quickselect(0, len(nums) - 1)
    explanation: |
      **Time Complexity:** O(n) average, O(n^2) worst case. The average case is linear because we only recurse into one side of each partition. Random pivot selection, together with a three-way partition that groups values equal to the pivot, makes the worst case very unlikely even when the input is sorted or contains many duplicates.

      **Space Complexity:** O(log n) average for the recursion stack, O(n) worst case.

      Quickselect uses the partitioning logic from quicksort but only recurses into the partition containing our target index. This reduces the expected work from `O(n log n)` to `O(n)`.
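
      Each recursive call in quickselect is a tail call, so the same idea can be sketched as a loop that needs no call stack at all. The version below is only an illustration: the name `find_kth_largest_iterative` is hypothetical, and it uses a three-way partition so that runs of equal values are resolved in a single pass.

      ```python
      import random


      def find_kth_largest_iterative(nums: list[int], k: int) -> int:
          # Hypothetical name; like the recursive version, this reorders nums in place
          target_index = len(nums) - k  # index of the answer in ascending order
          left, right = 0, len(nums) - 1

          while True:
              pivot = nums[random.randint(left, right)]

              # Three-way partition of nums[left..right]:
              # [left, lt) < pivot, [lt, gt] == pivot, (gt, right] > pivot
              lt, i, gt = left, left, right
              while i <= gt:
                  if nums[i] < pivot:
                      nums[lt], nums[i] = nums[i], nums[lt]
                      lt += 1
                      i += 1
                  elif nums[i] > pivot:
                      nums[i], nums[gt] = nums[gt], nums[i]
                      gt -= 1
                  else:
                      i += 1

              if target_index < lt:
                  right = lt - 1  # answer lies among the smaller values
              elif target_index > gt:
                  left = gt + 1  # answer lies among the larger values
              else:
                  return pivot  # target index falls in the block equal to the pivot
      ```

      Because the search range is tracked with two indices instead of stack frames, extra space is O(1), and an input such as `[7] * 100_000` cannot exhaust Python's recursion limit.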
  - approach_name: Sorting
    is_optimal: false
    code: |
      def find_kth_largest(nums: list[int], k: int) -> int:
          # Sort in descending order (in place, so the caller's list is reordered)
          nums.sort(reverse=True)

          # Return the kth element (0-indexed, so k-1)
          return nums[k - 1]
    explanation: |
      **Time Complexity:** O(n log n). Dominated by the sorting step.

      **Space Complexity:** O(1) to O(n). Depends on the sorting algorithm used (in-place vs. not); Python's `list.sort` can use up to O(n) auxiliary space.

      The simplest approach: sort and index. While not optimal for this specific problem, it's worth knowing as a baseline. For small arrays or when `k` is close to `n`, the practical difference may be negligible.
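
      As a baseline that leaves the input untouched, the standard library's `heapq.nlargest` can stand in for the full sort. This is a minimal sketch rather than a recommended solution, and the name `find_kth_largest_nlargest` is hypothetical.

      ```python
      import heapq


      def find_kth_largest_nlargest(nums: list[int], k: int) -> int:
          # Hypothetical helper name; heapq.nlargest returns the k largest values
          # in descending order, so the last one is the kth largest
          return heapq.nlargest(k, nums)[-1]


      # Cross-check every rank against a sorted copy (sorted() does not mutate sample)
      sample = [3, 2, 3, 1, 2, 4, 5, 5, 6]
      for rank in range(1, len(sample) + 1):
          expected = sorted(sample, reverse=True)[rank - 1]
          assert find_kth_largest_nlargest(sample, rank) == expected
      ```

      The same cross-check loop is a convenient way to validate any of the other solutions against the sorting baseline.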