263 lines
11 KiB
YAML
title: Kth Largest Element in an Array
|
|
slug: kth-largest-element-in-an-array
|
|
difficulty: medium
|
|
leetcode_id: 215
|
|
leetcode_url: https://leetcode.com/problems/kth-largest-element-in-an-array/
|
|
categories:
|
|
- arrays
|
|
- sorting
|
|
- heap
|
|
patterns:
|
|
- slug: heap
|
|
is_optimal: true
|
|
- slug: binary-search
|
|
is_optimal: false
|
|
|
|
function_signature: "def find_kth_largest(nums: list[int], k: int) -> int:"
|
|
|
|
test_cases:
|
|
visible:
|
|
- input: { nums: [3, 2, 1, 5, 6, 4], k: 2 }
|
|
expected: 5
|
|
- input: { nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4 }
|
|
expected: 4
|
|
- input: { nums: [1], k: 1 }
|
|
expected: 1
|
|
hidden:
|
|
- input: { nums: [7, 7, 7, 7], k: 2 }
|
|
expected: 7
|
|
- input: { nums: [1, 2, 3, 4, 5], k: 5 }
|
|
expected: 1
|
|
- input: { nums: [1, 2, 3, 4, 5], k: 1 }
|
|
expected: 5
|
|
- input: { nums: [-1, -2, -3, -4], k: 1 }
|
|
expected: -1
|
|
- input: { nums: [5, 2, 4, 1, 3, 6, 0], k: 3 }
|
|
expected: 4
|
|
- input: { nums: [99, 99], k: 1 }
|
|
expected: 99
|
|
|
|
description: |
|
|
Given an integer array `nums` and an integer `k`, return *the* `k`<sup>th</sup> *largest element in the array*.
|
|
|
|
Note that it is the `k`<sup>th</sup> largest element in the sorted order, not the `k`<sup>th</sup> distinct element.
|
|
|
|
Can you solve it without sorting?
|
|
|
|
constraints: |
|
|
- `1 <= k <= nums.length <= 10^5`
|
|
- `-10^4 <= nums[i] <= 10^4`
|
|
|
|
examples:
|
|
- input: "nums = [3,2,1,5,6,4], k = 2"
|
|
output: "5"
|
|
explanation: "The sorted array is [1,2,3,4,5,6]. The 2nd largest element is 5."
|
|
- input: "nums = [3,2,3,1,2,4,5,5,6], k = 4"
|
|
output: "4"
|
|
explanation: "The sorted array is [1,2,2,3,3,4,5,5,6]. The 4th largest element is 4."
|
|
|
|
explanation:
|
|
intuition: |
|
|
Imagine you have a collection of exam scores and you want to find the student who ranked `k`<sup>th</sup> from the top. The most straightforward approach would be to sort all scores and pick the `k`<sup>th</sup> one from the end — but can we do better?
|
|
|
|
Think of it like this: if you only need to find *one* specific ranking, do you really need to sort *everything*? This is similar to finding the tallest person in a room versus sorting everyone by height — the first task is much simpler.
|
|
|
|
The key insight is that we don't need a fully sorted array. We only need the element that would sit at position `n - k` (0-indexed) if the array were sorted in ascending order. This opens the door to more efficient approaches:
|
|
|
|
1. **Heap approach**: Maintain a "top k" collection using a min-heap of size `k`. Any element smaller than our current `k`<sup>th</sup> largest can be discarded.
|
|
|
|
2. **Quickselect approach**: Use the partitioning logic from quicksort, but only recurse into the half that contains our target position.
|
|
|
|
Both avoid the full `O(n log n)` cost of sorting when we only need partial ordering.
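
To make the `n - k` index concrete, here is a minimal sketch that uses a full sort purely as a reference point. The helper name `kth_largest_by_sorting` is hypothetical and is not one of the solutions in this file.

```python
def kth_largest_by_sorting(nums: list[int], k: int) -> int:
    # sorted() returns a new ascending list, so the input is left untouched
    ordered = sorted(nums)
    # The kth largest sits at index n - k (0-indexed) in ascending order
    return ordered[len(nums) - k]


print(kth_largest_by_sorting([3, 2, 1, 5, 6, 4], 2))  # 5
```

Any faster approach only has to agree with this baseline, which makes it a convenient oracle when testing.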
|
|
|
|
approach: |
|
|
We'll focus on the **Min-Heap approach** as the primary solution due to its consistent performance and clarity:
|
|
|
|
**Step 1: Understand the heap strategy**
|
|
|
|
- We maintain a min-heap of size `k`
|
|
- The min-heap always contains the `k` largest elements seen so far
|
|
- The root of the heap (minimum of these `k` elements) is our answer
|
|
|
|
|
|
|
|
**Step 2: Initialise the heap**
|
|
|
|
- Create an empty min-heap
|
|
- We'll use Python's `heapq` which implements a min-heap
|
|
|
|
|
|
|
|
**Step 3: Process each element**
|
|
|
|
- For each number in the array:
|
|
- If the heap has fewer than `k` elements, push the number
|
|
- Otherwise, if the number is larger than the heap's minimum (root), replace the root with this number
|
|
- This ensures we always keep the `k` largest elements
|
|
|
|
|
|
|
|
**Step 4: Return the result**
|
|
|
|
- The root of the heap is the `k`<sup>th</sup> largest element
|
|
- Return `heap[0]`
|
|
|
|
|
|
|
|
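**Why this works**: By keeping exactly `k` elements and always removing the smallest when we exceed capacity, we guarantee that the smallest element in our heap is at least as large as every discarded element (ties are possible when values repeat). The other `k - 1` elements in the heap are all greater than or equal to the root, which makes the root exactly the `k`<sup>th</sup> largest overall.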
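
Steps 1 to 4 can be sketched literally, with the explicit "compare against the root, then replace" branch. This is an illustrative variant under the stated constraints (`1 <= k <= len(nums)`); the function name `kth_largest_conditional` is an assumption made for this example.

```python
import heapq


def kth_largest_conditional(nums: list[int], k: int) -> int:
    heap: list[int] = []
    for num in nums:
        if len(heap) < k:
            # Still filling the heap: always push
            heapq.heappush(heap, num)
        elif num > heap[0]:
            # num beats the current kth largest: pop the root and push num
            heapq.heapreplace(heap, num)
        # Otherwise num cannot be among the k largest, so it is skipped
    return heap[0]
```

`heapq.heapreplace` pops the root and pushes the new value in a single sift, so an element that does not beat the root costs only one comparison.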
|
|
|
|
common_pitfalls:
|
|
- title: Off-by-One with Heap Size
|
|
description: |
|
|
A common mistake is confusion about when to push vs. replace in the heap.
|
|
|
|
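Pushing unconditionally and then popping whenever the size exceeds `k` is correct: if the element you just added is the smallest, it is the one that gets popped, which is exactly what should happen. The mistakes come from hand-rolled conditionals, such as comparing against the root before the heap holds `k` elements, or popping when the size reaches `k` instead of when it exceeds `k`.
|
|
|
|
Checking whether the new element is larger than the heap's minimum *before* touching the heap is a valid optimisation that skips unnecessary work, but the unconditional push-then-pop is simpler and harder to get wrong.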
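
As a minimal sketch of the push-then-pop pattern without any size bookkeeping in the loop, `heapq.heappushpop` performs both operations in one call. The function name `kth_largest_pushpop` is hypothetical, and the code assumes `1 <= k <= len(nums)` as stated in the constraints.

```python
import heapq


def kth_largest_pushpop(nums: list[int], k: int) -> int:
    # Seed the heap with a copy of the first k elements (input is not mutated)
    heap = nums[:k]
    heapq.heapify(heap)
    for num in nums[k:]:
        # Push num, then pop the smallest; if num itself is the smallest,
        # it is returned straight away and the heap is left unchanged
        heapq.heappushpop(heap, num)
    return heap[0]
```

Because the heap always holds exactly `k` elements after seeding, there is no `len(heap)` check to get wrong.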
|
|
wrong_approach: "Complex conditional logic that's easy to get wrong"
|
|
correct_approach: "Push then pop if size > k, or use heappushpop for efficiency"
|
|
|
|
- title: Using Max-Heap Incorrectly
|
|
description: |
|
|
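Some attempt to use a max-heap of the entire array and pop `k-1` times. This is correct, but it has a different cost profile:
|
|
|
|
- Building a max-heap: `O(n)`
|
|
- Popping `k-1` times: `O(k log n)`
|
|
- Total: `O(n + k log n)` time and `O(n)` space
|
|
|
|
The time bound is competitive, and for small `k` it can even beat the min-heap's `O(n log k)`. The real cost is space: the max-heap holds all `n` elements (or mutates the input), while a min-heap of size `k` needs only `O(k)` and can process elements one at a time.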
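
For comparison, the max-heap variant can be sketched as follows. Since `heapq` is used here as a min-heap, the values are negated to simulate a max-heap; the function name `kth_largest_max_heap` is an assumption made for this illustration.

```python
import heapq


def kth_largest_max_heap(nums: list[int], k: int) -> int:
    # Store negated values so the min-heap root is the largest original value
    max_heap = [-num for num in nums]  # O(n) extra space
    heapq.heapify(max_heap)            # O(n)
    # Discard the k - 1 largest values
    for _ in range(k - 1):
        heapq.heappop(max_heap)        # O(log n) each
    # The root is now the negated kth largest
    return -max_heap[0]
```

Note that the whole input has to be held in the heap before the first pop, which is what rules this variant out for streaming input.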
|
|
wrong_approach: "Max-heap of all elements, pop k-1 times"
|
|
correct_approach: "Min-heap of size k, maintaining the k largest"
|
|
|
|
- title: Forgetting Python's heapq is Min-Heap Only
|
|
description: |
|
|
Python's `heapq` only provides a min-heap. To simulate a max-heap, you must negate values when pushing and negate again when popping.
|
|
|
|
For this problem, a min-heap is actually what we want — we keep the `k` largest elements by discarding elements smaller than our current `k`<sup>th</sup> largest.
|
|
wrong_approach: "Assuming heapq has a max-heap option"
|
|
correct_approach: "Use min-heap directly for finding kth largest"
|
|
|
|
key_takeaways:
|
|
- "**Partial ordering insight**: When you only need one specific rank, you don't need to sort everything — use a heap or quickselect instead"
|
|
- "**Min-heap for top-k**: A min-heap of size `k` naturally maintains the `k` largest elements, with the `k`<sup>th</sup> largest at the root"
|
|
- "**Trade-off awareness**: Heap gives `O(n log k)` guaranteed; Quickselect gives `O(n)` average but `O(n^2)` worst case"
|
|
- "**Foundation pattern**: This technique applies to streaming data, top-k frequent elements, and many ranking problems"
|
|
|
|
time_complexity: "O(n log k). We iterate through all `n` elements, and each heap operation (push/pop) takes `O(log k)` time since the heap size is bounded by `k`."
|
|
space_complexity: "O(k). We maintain a heap containing at most `k` elements."
|
|
|
|
pattern_comparison: |
|
|
**Heap vs Quickselect: Choosing the Right Pattern**
|
|
|
|
Both approaches avoid full sorting, but they have different characteristics:
|
|
|
|
| Approach | Time (Avg) | Time (Worst) | Space | Modifies Input? |
|
|
|----------|------------|--------------|-------|-----------------|
|
|
| **Min Heap** | O(n log k) | O(n log k) | O(k) | No |
|
|
| **Quickselect** | O(n) | O(n²) | O(log n) | Yes |
|
|
| **Sorting** | O(n log n) | O(n log n) | O(1)-O(n) | Yes |
|
|
|
|
**When to choose Heap:**
|
|
- You need **guaranteed** performance (no worst-case quadratic time)
|
|
- The input array shouldn't be modified
|
|
- `k` is small relative to `n` (the O(log k) factor stays small)
|
|
- You're working with streaming data
|
|
|
|
**When to choose Quickselect:**
|
|
- You need the **fastest average** performance
|
|
- Modifying the input array is acceptable
|
|
- You're comfortable with randomised algorithms
|
|
- Space is at a premium (O(log n) recursion vs O(k) heap)
|
|
|
|
**Interview tip:** Start with Heap for its simplicity and guaranteed bounds, then mention Quickselect as an optimisation if the interviewer asks about O(n) solutions.
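
The streaming case is where the heap has no real competitor, because Quickselect needs random access to the whole array. The sketch below is illustrative only: `kth_largest_stream` is a hypothetical name, and the `ValueError` for short streams is an assumption about how such a case might be handled.

```python
import heapq
from collections.abc import Iterable


def kth_largest_stream(stream: Iterable[int], k: int) -> int:
    heap: list[int] = []
    for num in stream:
        if len(heap) < k:
            heapq.heappush(heap, num)
        else:
            # Heap is full: keep the k largest seen so far
            heapq.heappushpop(heap, num)
    if len(heap) < k:
        raise ValueError("stream produced fewer than k elements")
    return heap[0]


# Works with a generator, which cannot be indexed or partitioned in place
print(kth_largest_stream((x * x for x in range(10)), 3))  # 49
```

Only `k` values are ever held in memory, regardless of how long the stream runs.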
|
|
|
|
solutions:
|
|
- approach_name: Min-Heap
|
|
is_optimal: true
|
|
code: |
|
|
import heapq

def find_kth_largest(nums: list[int], k: int) -> int:
    # Min-heap to store the k largest elements
    heap = []

    for num in nums:
        # Add current number to heap
        heapq.heappush(heap, num)

        # If heap exceeds size k, remove the smallest
        # This ensures we keep only the k largest elements
        if len(heap) > k:
            heapq.heappop(heap)

    # The root of min-heap is the kth largest
    return heap[0]
|
|
explanation: |
|
|
**Time Complexity:** O(n log k) — We process each of `n` elements with heap operations costing `O(log k)`.
|
|
|
|
**Space Complexity:** O(k) — The heap stores at most `k` elements.
|
|
|
|
This approach maintains a min-heap of the `k` largest elements seen so far. By keeping the heap size at `k` and using a min-heap, the smallest element in our collection (the root) is always the `k`<sup>th</sup> largest overall.
|
|
|
|
- approach_name: Quickselect
|
|
is_optimal: true
|
|
code: |
|
|
import random

def find_kth_largest(nums: list[int], k: int) -> int:
    # Convert kth largest to index in sorted array
    # kth largest = element at index (n - k) in ascending order
    target_index = len(nums) - k

    def quickselect(left: int, right: int) -> int:
        # Random pivot to avoid worst-case on sorted input
        pivot = nums[random.randint(left, right)]

        # Three-way partition: [< pivot | == pivot | > pivot].
        # Grouping duplicates of the pivot keeps inputs with many
        # equal values from degrading to O(n^2) and deep recursion.
        lt, i, gt = left, left, right
        while i <= gt:
            if nums[i] < pivot:
                nums[lt], nums[i] = nums[i], nums[lt]
                lt += 1
                i += 1
            elif nums[i] > pivot:
                nums[gt], nums[i] = nums[i], nums[gt]
                gt -= 1
            else:
                i += 1

        # nums[lt..gt] now holds every copy of the pivot
        if lt <= target_index <= gt:
            return pivot
        elif target_index > gt:
            # Target is in the right partition
            return quickselect(gt + 1, right)
        else:
            # Target is in the left partition
            return quickselect(left, lt - 1)

    return quickselect(0, len(nums) - 1)
|
|
explanation: |
|
|
**Time Complexity:** O(n) average, O(n^2) worst case. The average case is linear because each step recurses into only one partition, so the expected work shrinks geometrically (roughly `n + n/2 + n/4 + ...`). Random pivot selection makes the worst case very unlikely.
|
|
|
|
**Space Complexity:** O(log n) average for recursion stack, O(n) worst case.
|
|
|
|
Quickselect uses the partitioning logic from quicksort but only recurses into the partition containing our target index. This reduces the expected work from `O(n log n)` to `O(n)`.
|
|
|
|
- approach_name: Sorting
|
|
is_optimal: false
|
|
code: |
|
|
def find_kth_largest(nums: list[int], k: int) -> int:
    # Sort in descending order (in place, so the caller's list is reordered)
    nums.sort(reverse=True)

    # Return the kth element (0-indexed, so k-1)
    return nums[k - 1]
|
|
explanation: |
|
|
**Time Complexity:** O(n log n) — Dominated by the sorting step.
|
|
|
|
**Space Complexity:** O(1) to O(n) — Depends on the sorting algorithm used (in-place vs. not).
|
|
|
|
The simplest approach: sort and index. While not optimal for this specific problem, it's worth knowing as a baseline. For small arrays or when `k` is close to `n`, the practical difference may be negligible.
|