# codetutor/backend/data/questions/kth-largest-element-in-a-stream.yaml

title: Kth Largest Element in a Stream
slug: kth-largest-element-in-a-stream
difficulty: easy
leetcode_id: 703
leetcode_url: https://leetcode.com/problems/kth-largest-element-in-a-stream/
categories:
- heap
- arrays
patterns:
- heap
function_signature: "class KthLargest:\n    def __init__(self, k: int, nums: list[int]):\n    def add(self, val: int) -> int:"
test_cases:
  visible:
    - input: { operations: ["KthLargest", "add", "add", "add", "add", "add"], arguments: [[3, [4, 5, 8, 2]], [3], [5], [10], [9], [4]] }
      expected: [null, 4, 5, 5, 8, 8]
    - input: { operations: ["KthLargest", "add", "add", "add", "add"], arguments: [[4, [7, 7, 7, 7, 8, 3]], [2], [10], [9], [9]] }
      expected: [null, 7, 7, 7, 8]
  hidden:
    - input: { operations: ["KthLargest", "add"], arguments: [[1, []], [-3]] }
      expected: [null, -3]
    - input: { operations: ["KthLargest", "add", "add"], arguments: [[1, [1, 2, 3]], [4], [5]] }
      expected: [null, 4, 5]
    - input: { operations: ["KthLargest", "add", "add", "add"], arguments: [[2, [0]], [1], [2], [3]] }
      expected: [null, 0, 1, 2]
    - input: { operations: ["KthLargest", "add", "add"], arguments: [[3, [5, 5, 5]], [5], [5]] }
      expected: [null, 5, 5]
    - input: { operations: ["KthLargest", "add"], arguments: [[1, [-10000]], [10000]] }
      expected: [null, 10000]
description: |
  Design a class to find the `k`<sup>th</sup> largest element in a stream.

  Note that it is the `k`<sup>th</sup> largest element in the sorted order, not the `k`<sup>th</sup> distinct element.

  Implement the `KthLargest` class:

  - `KthLargest(int k, int[] nums)`: Initializes the object with the integer `k` and the stream of test scores `nums`.
  - `int add(int val)`: Adds a new test score `val` to the stream and returns the `k`<sup>th</sup> largest element among all test scores seen so far.
constraints: |
  - `0 <= nums.length <= 10^4`
  - `1 <= k <= nums.length + 1`
  - `-10^4 <= nums[i] <= 10^4`
  - `-10^4 <= val <= 10^4`
  - At most `10^4` calls will be made to `add`
examples:
  - input: |
      ["KthLargest", "add", "add", "add", "add", "add"]
      [[3, [4, 5, 8, 2]], [3], [5], [10], [9], [4]]
    output: "[null, 4, 5, 5, 8, 8]"
    explanation: |
      KthLargest kthLargest = new KthLargest(3, [4, 5, 8, 2]);
      kthLargest.add(3); // return 4 (stream: [2,3,4,5,8], 3rd largest = 4)
      kthLargest.add(5); // return 5 (stream: [2,3,4,5,5,8], 3rd largest = 5)
      kthLargest.add(10); // return 5 (stream: [2,3,4,5,5,8,10], 3rd largest = 5)
      kthLargest.add(9); // return 8 (stream: [2,3,4,5,5,8,9,10], 3rd largest = 8)
      kthLargest.add(4); // return 8 (stream: [2,3,4,4,5,5,8,9,10], 3rd largest = 8)
  - input: |
      ["KthLargest", "add", "add", "add", "add"]
      [[4, [7, 7, 7, 7, 8, 3]], [2], [10], [9], [9]]
    output: "[null, 7, 7, 7, 8]"
    explanation: |
      KthLargest kthLargest = new KthLargest(4, [7, 7, 7, 7, 8, 3]);
      kthLargest.add(2); // return 7 (stream: [2,3,7,7,7,7,8], 4th largest = 7)
      kthLargest.add(10); // return 7 (stream: [2,3,7,7,7,7,8,10], 4th largest = 7)
      kthLargest.add(9); // return 7 (stream: [2,3,7,7,7,7,8,9,10], 4th largest = 7)
      kthLargest.add(9); // return 8 (stream: [2,3,7,7,7,7,8,9,9,10], 4th largest = 8)
explanation:
  intuition: |
    Imagine you're running a leaderboard for the top `k` players in a game. You don't need to track *everyone*, just the top `k`. When a new player joins:

    - If they're not good enough to crack the top `k`, you ignore them
    - If they are, they bump out the current `k`<sup>th</sup> place player

    The key insight is: **the `k`<sup>th</sup> largest element is always the smallest element in the top `k` group**. If we maintain the `k` largest elements seen so far, the minimum of this group is our answer.

    A **min-heap** of size `k` is a natural fit. The heap property guarantees the smallest element sits at the root. After each insertion, if the heap grows beyond `k` elements, we pop the smallest, so the heap always keeps the `k` largest values seen so far, with the `k`<sup>th</sup> largest sitting at the root.

    Think of it like a bouncer at an exclusive club that only holds `k` people. When someone new arrives, they get in only if they outrank the least important person inside, who then has to leave. The heap root is always that person on the bubble, so deciding whether a newcomer gets in only requires looking at the root.
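
    A quick way to check the key insight on a small stream: the sketch below (the helper names `kth_largest_via_top_k` and `kth_largest_via_sort` are hypothetical, not part of the solution) takes the top `k` values with `heapq.nlargest` and returns their minimum, next to a plain descending sort for comparison.

    ```python
    import heapq


    def kth_largest_via_top_k(values: list[int], k: int) -> int:
        # The k largest values, returned largest-first by nlargest
        top_k = heapq.nlargest(k, values)
        # The smallest member of the top-k group is the kth largest overall
        return min(top_k)


    def kth_largest_via_sort(values: list[int], k: int) -> int:
        # Reference answer: sort descending and read position k - 1
        return sorted(values, reverse=True)[k - 1]


    stream = [4, 5, 8, 2, 3]
    print(kth_largest_via_top_k(stream, 3))  # 4
    print(kth_largest_via_sort(stream, 3))  # 4
    ```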
  approach: |
    We use a **Min-Heap of size k** to solve this efficiently:

    **Step 1: Initialise the heap**
    - Create an empty min-heap
    - Add all elements from the initial `nums` array to the heap
    - After adding each element, if heap size exceeds `k`, pop the minimum
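
    Step 1 on its own, as a minimal sketch: `build_top_k_heap` is a hypothetical helper standing in for the constructor body of the Min-Heap solution, and it follows the push-then-trim steps listed above.

    ```python
    import heapq


    def build_top_k_heap(nums: list[int], k: int) -> list[int]:
        # Hypothetical helper: push each initial value, trimming back to k as we go
        heap: list[int] = []
        for num in nums:
            heapq.heappush(heap, num)
            if len(heap) > k:
                # The popped minimum cannot be among the k largest
                heapq.heappop(heap)
        return heap


    heap = build_top_k_heap([4, 5, 8, 2], 3)
    print(sorted(heap))  # [4, 5, 8]
    print(heap[0])  # 4
    ```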

    &nbsp;

    **Step 2: Implement the add operation**
    - Push the new value onto the heap
    - If heap size exceeds `k`, pop the minimum (it's no longer in the top `k`)
    - Return the heap's minimum, which is the `k`<sup>th</sup> largest
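
    Step 2 can be sketched the same way. `add_to_top_k_heap` is a hypothetical helper; the starting heap `[4, 5, 8]` is written out by hand as the top three values of `[4, 5, 8, 2]`, and the calls replay the first example.

    ```python
    import heapq


    def add_to_top_k_heap(heap: list[int], k: int, val: int) -> int:
        # Hypothetical helper: `heap` already holds the top k (or k - 1) values
        heapq.heappush(heap, val)
        if len(heap) > k:
            # Drop the smallest; it is no longer in the top k
            heapq.heappop(heap)
        # The root of the min-heap is the kth largest
        return heap[0]


    heap = [4, 5, 8]  # a valid min-heap holding the top 3 of [4, 5, 8, 2]
    results = [add_to_top_k_heap(heap, 3, v) for v in [3, 5, 10, 9, 4]]
    print(results)  # [4, 5, 5, 8, 8]
    ```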
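
    &nbsp;

    **Why this works:**
    - The heap always holds the `k` largest elements seen so far (all of them, if fewer than `k` have arrived)
    - The min-heap property ensures the smallest of these, the `k`<sup>th</sup> largest overall, is at the root
    - Each pop discards the minimum of `k + 1` candidates, which cannot be among the `k` largest (ties are harmless, because equal values are interchangeable), so correctness is preserved
    - The constraint `k <= nums.length + 1` means the constructor leaves at least `k - 1` elements, so after the first `add` the heap holds exactly `k` and the root is always a valid answer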
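
    These claims can be checked empirically. The sketch below is a hypothetical randomized test, not part of the solution: it replays random streams that respect `k <= nums.length + 1` and compares the heap against a full sort after every `add`. The helper `bounded_push` is an illustrative name for the push-then-trim step.

    ```python
    import heapq
    import random


    def bounded_push(heap: list[int], k: int, val: int) -> None:
        # Push, then trim back to k elements by discarding the minimum
        heapq.heappush(heap, val)
        if len(heap) > k:
            heapq.heappop(heap)


    def invariant_holds(trials: int = 200, seed: int = 0) -> bool:
        rng = random.Random(seed)
        for _ in range(trials):
            k = rng.randint(1, 5)
            # Respect the constraint k <= len(nums) + 1
            nums = [rng.randint(-10, 10) for _ in range(rng.randint(k - 1, 8))]
            heap: list[int] = []
            for num in nums:
                bounded_push(heap, k, num)
            seen = list(nums)
            for _ in range(10):
                val = rng.randint(-10, 10)
                seen.append(val)
                bounded_push(heap, k, val)
                # The heap must hold exactly the k largest values seen so far
                if sorted(heap) != sorted(seen)[-k:]:
                    return False
                # Its root must equal the kth largest from a full sort
                if heap[0] != sorted(seen, reverse=True)[k - 1]:
                    return False
        return True


    print(invariant_holds())  # True
    ```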
  common_pitfalls:
    - title: Using a Max-Heap Instead of Min-Heap
      description: |
        A max-heap gives you the *largest* element at the root, but we need the `k`<sup>th</sup> largest. With a max-heap you would have to keep every element and pop `k - 1` times (or scan the heap) on each query, and a max-heap capped at size `k` cannot cheaply find or evict its smallest element when a larger value arrives.

        The trick is counter-intuitive: use a **min-heap** of size `k`. The root gives you the minimum of the `k` largest elements, which is exactly the `k`<sup>th</sup> largest overall.
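
        The difference is easiest to see side by side. In this sketch (both function names are hypothetical), Python's `heapq` only provides a min-heap, so the max-heap is simulated by negating values. The max-heap version holds every value and discards `k - 1` roots per query, while the min-heap version never holds more than `k + 1` values.

        ```python
        import heapq


        def kth_largest_with_max_heap(values: list[int], k: int) -> int:
            # heapq is a min-heap, so store negated values to simulate a max-heap
            max_heap = [-v for v in values]
            heapq.heapify(max_heap)
            # The root is the largest; pop k - 1 times to expose the kth largest
            for _ in range(k - 1):
                heapq.heappop(max_heap)
            return -max_heap[0]


        def kth_largest_with_min_heap(values: list[int], k: int) -> int:
            # Bounded min-heap: trim back to k after every push
            heap: list[int] = []
            for v in values:
                heapq.heappush(heap, v)
                if len(heap) > k:
                    heapq.heappop(heap)
            return heap[0]


        stream = [4, 5, 8, 2, 3]
        print(kth_largest_with_max_heap(stream, 3))  # 4
        print(kth_largest_with_min_heap(stream, 3))  # 4
        ```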
      wrong_approach: "Max-heap of size k"
      correct_approach: "Min-heap of size k"
    - title: Keeping All Elements
      description: |
        Storing all `n` elements and sorting to find the `k`<sup>th</sup> largest costs O(n log n) per query, and `n` grows with every call. With up to `10^4` calls to `add`, this becomes too slow.

        By maintaining only `k` elements in the heap, each `add` operation is O(log k), which is much faster when `k << n`.
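
        For reference, the slow approach might look like this (a sketch; `KthLargestBySorting` is a hypothetical name). It returns correct answers, which makes it handy as a brute-force check, but every call re-sorts a list that keeps growing.

        ```python
        class KthLargestBySorting:
            def __init__(self, k: int, nums: list[int]):
                self.k = k
                # Keeps every value ever seen; the list never shrinks
                self.values = list(nums)

            def add(self, val: int) -> int:
                self.values.append(val)
                # Full sort on every call: O(n log n), with n growing each time
                return sorted(self.values, reverse=True)[self.k - 1]


        naive = KthLargestBySorting(3, [4, 5, 8, 2])
        print([naive.add(v) for v in [3, 5, 10, 9, 4]])  # [4, 5, 5, 8, 8]
        ```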
      wrong_approach: "Sort all elements on each query"
      correct_approach: "Maintain a fixed-size heap of k elements"
    - title: Forgetting to Handle the Initial Array
      description: |
        The constructor receives an initial array `nums` that may have more than `k` elements. You must process these through the heap first, trimming down to size `k` before any `add` calls.

        If you skip this step, the heap never sees the initial stream, so `add` can keep returning wrong results indefinitely: large values from `nums` that belong in the top `k` are simply missing.
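
        The sketch below contrasts a constructor that ignores `nums` with one that uses `heapq.heapify` and then trims, which is one way to read "heapify nums and trim to size k" (both class names are hypothetical). Heapifying costs O(n), and the trimming pops cost O((n - k) log n).

        ```python
        import heapq


        class KthLargestIgnoresNums:
            # Buggy on purpose: the initial stream never enters the heap
            def __init__(self, k: int, nums: list[int]):
                self.k = k
                self.heap: list[int] = []

            def add(self, val: int) -> int:
                heapq.heappush(self.heap, val)
                if len(self.heap) > self.k:
                    heapq.heappop(self.heap)
                return self.heap[0]


        class KthLargestHeapify:
            # Fixed constructor: heapify all of nums, then pop down to k elements
            def __init__(self, k: int, nums: list[int]):
                self.k = k
                self.heap = list(nums)
                heapq.heapify(self.heap)
                while len(self.heap) > k:
                    heapq.heappop(self.heap)

            def add(self, val: int) -> int:
                heapq.heappush(self.heap, val)
                if len(self.heap) > self.k:
                    heapq.heappop(self.heap)
                return self.heap[0]


        buggy = KthLargestIgnoresNums(3, [4, 5, 8, 2])
        fixed = KthLargestHeapify(3, [4, 5, 8, 2])
        print([buggy.add(v) for v in [3, 5, 10, 9, 4]])  # [3, 3, 3, 5, 5]
        print([fixed.add(v) for v in [3, 5, 10, 9, 4]])  # [4, 5, 5, 8, 8]
        ```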
      wrong_approach: "Ignore nums in constructor"
      correct_approach: "Heapify nums and trim to size k in constructor"
  key_takeaways:
    - "**Min-heap for k largest**: A min-heap of size `k` efficiently tracks the `k`<sup>th</sup> largest element, which sits at the heap's root"
    - "**Bounded heap pattern**: Maintain a fixed-size heap by popping after each push when size exceeds `k`"
    - "**O(log k) vs O(log n)**: Limiting heap size to `k` gives faster operations than keeping all elements"
    - "**Foundation for streaming problems**: This pattern applies to any 'top k' problem in a data stream (e.g., top k frequent, k closest points)"
  time_complexity: "O(n log k) for initialisation where `n` is the size of `nums`, and O(log k) for each `add` call. Each heap operation (push/pop) takes O(log k) time since the heap never holds more than `k + 1` elements."
  space_complexity: "O(k). We store at most `k` elements in the heap between calls (briefly `k + 1` during an `add`), regardless of how many elements are added to the stream."
solutions:
  - approach_name: Min-Heap
    is_optimal: true
    code: |
      import heapq


      class KthLargest:
          def __init__(self, k: int, nums: list[int]):
              self.k = k
              self.heap = []
              # Add initial elements to the heap
              for num in nums:
                  heapq.heappush(self.heap, num)
                  # Keep only the k largest elements
                  if len(self.heap) > k:
                      heapq.heappop(self.heap)

          def add(self, val: int) -> int:
              # Add new value to the heap
              heapq.heappush(self.heap, val)
              # If heap exceeds k, remove the smallest
              if len(self.heap) > self.k:
                  heapq.heappop(self.heap)
              # The root of min-heap is the kth largest
              return self.heap[0]
    explanation: |
      **Time Complexity:** O(n log k) for the constructor, O(log k) per `add` call, since every heap operation runs on a heap of at most `k + 1` elements.

      **Space Complexity:** O(k), as the heap stores at most `k` elements between calls.

      We maintain a min-heap of the `k` largest elements seen so far. The smallest element in this heap (the root) is the `k`<sup>th</sup> largest overall. When adding a new element, if the heap grows beyond `k`, we pop the smallest, because it is no longer in the top `k`.
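
      To replay the `operations`/`arguments` format used in `test_cases`, a small driver can construct the object and then call `add` in order. This is a sketch under assumptions: `run_operations` is a hypothetical helper, not the backend's actual test runner, and it yields Python's `None` where the expected output shows `null`.

      ```python
      import heapq
      from typing import Optional


      class KthLargest:
          def __init__(self, k: int, nums: list[int]):
              self.k = k
              self.heap: list[int] = []
              for num in nums:
                  heapq.heappush(self.heap, num)
                  if len(self.heap) > k:
                      heapq.heappop(self.heap)

          def add(self, val: int) -> int:
              heapq.heappush(self.heap, val)
              if len(self.heap) > self.k:
                  heapq.heappop(self.heap)
              return self.heap[0]


      def run_operations(operations: list[str], arguments: list[list]) -> list[Optional[int]]:
          # Hypothetical driver: "KthLargest" constructs the object, "add" calls it
          results: list[Optional[int]] = []
          obj = None
          for op, args in zip(operations, arguments):
              if op == "KthLargest":
                  obj = KthLargest(*args)
                  results.append(None)  # the constructor produces no value (null)
              elif op == "add":
                  results.append(obj.add(*args))
              else:
                  raise ValueError(f"unknown operation: {op}")
          return results


      print(run_operations(
          ["KthLargest", "add", "add", "add", "add", "add"],
          [[3, [4, 5, 8, 2]], [3], [5], [10], [9], [4]],
      ))  # [None, 4, 5, 5, 8, 8]
      ```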
  - approach_name: Sorted List
    is_optimal: false
    code: |
      import bisect


      class KthLargest:
          def __init__(self, k: int, nums: list[int]):
              self.k = k
              # Keep a sorted list of the k largest elements
              self.sorted_list = sorted(nums, reverse=True)[:k]
              self.sorted_list.reverse()  # Ascending order for bisect

          def add(self, val: int) -> int:
              # Insert in sorted position
              bisect.insort(self.sorted_list, val)
              # Keep only k largest (remove smallest if needed)
              if len(self.sorted_list) > self.k:
                  self.sorted_list.pop(0)
              # Return the kth largest (smallest in our k-size list)
              return self.sorted_list[0]
    explanation: |
      **Time Complexity:** O(n log n) for the constructor, O(k) per `add` call: `bisect.insort` finds the insertion point in O(log k), but inserting into a Python list shifts up to `k` elements, and `pop(0)` shifts the remaining elements again.

      **Space Complexity:** O(k), as the list stores at most `k` elements between calls.

      This approach uses a sorted list with binary search insertion. The space is the same as the heap's, but the O(k) insertion and removal make it slower than the heap approach's O(log k) operations for large `k`.
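
      To see where the O(k) comes from, note that `bisect.insort` runs `bisect.bisect_right` to find the insertion point and then calls `list.insert`. The sketch below splits those two steps apart (`insort_with_cost` is a hypothetical helper for illustration) and reports how many existing elements the insert has to shift.

      ```python
      import bisect


      def insort_with_cost(sorted_list: list[int], val: int) -> int:
          # Binary search for the insertion point: O(log k) comparisons
          index = bisect.bisect_right(sorted_list, val)
          # Every element from position `index` onward moves one slot right
          shifted = len(sorted_list) - index
          sorted_list.insert(index, val)  # O(k) in the worst case
          return shifted


      values = [4, 5, 8]
      print(insort_with_cost(values, 3), values)  # 3 [3, 4, 5, 8]
      print(insort_with_cost(values, 9), values)  # 0 [3, 4, 5, 8, 9]
      ```

      Inserting a value smaller than everything shifts the whole list, so the insertion is linear in `k` even though the search is logarithmic.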