# codetutor/backend/data/questions/kth-largest-element-in-an-array.yaml

title: Kth Largest Element in an Array
slug: kth-largest-element-in-an-array
difficulty: medium
leetcode_id: 215
leetcode_url: https://leetcode.com/problems/kth-largest-element-in-an-array/
categories:
  - arrays
  - sorting
  - heap
patterns:
  - slug: heap
    is_optimal: true
  - slug: binary-search
    is_optimal: false

function_signature: "def find_kth_largest(nums: list[int], k: int) -> int:"

test_cases:
  visible:
    - input: { nums: [3, 2, 1, 5, 6, 4], k: 2 }
      expected: 5
    - input: { nums: [3, 2, 3, 1, 2, 4, 5, 5, 6], k: 4 }
      expected: 4
    - input: { nums: [1], k: 1 }
      expected: 1
  hidden:
    - input: { nums: [7, 7, 7, 7], k: 2 }
      expected: 7
    - input: { nums: [1, 2, 3, 4, 5], k: 5 }
      expected: 1
    - input: { nums: [1, 2, 3, 4, 5], k: 1 }
      expected: 5
    - input: { nums: [-1, -2, -3, -4], k: 1 }
      expected: -1
    - input: { nums: [5, 2, 4, 1, 3, 6, 0], k: 3 }
      expected: 4
    - input: { nums: [99, 99], k: 1 }
      expected: 99

description: |
  Given an integer array `nums` and an integer `k`, return *the* `k`<sup>th</sup> *largest element in the array*.

  Note that it is the `k`<sup>th</sup> largest element in the sorted order, not the `k`<sup>th</sup> distinct element.

  Can you solve it without sorting?

constraints: |
  - `1 <= k <= nums.length <= 10^5`
  - `-10^4 <= nums[i] <= 10^4`

examples:
  - input: "nums = [3,2,1,5,6,4], k = 2"
    output: "5"
    explanation: "The sorted array is [1,2,3,4,5,6]. The 2nd largest element is 5."
  - input: "nums = [3,2,3,1,2,4,5,5,6], k = 4"
    output: "4"
    explanation: "The sorted array is [1,2,2,3,3,4,5,5,6]. The 4th largest element is 4."

explanation:
  intuition: |
    Imagine you have a collection of exam scores and you want to find the student who ranked `k`<sup>th</sup> from the top. The most straightforward approach would be to sort all scores and pick the `k`<sup>th</sup> one from the end — but can we do better?

    Think of it like this: if you only need to find *one* specific ranking, do you really need to sort *everything*? This is similar to finding the tallest person in a room versus sorting everyone by height — the first task is much simpler.

    The key insight is that we don't need a fully sorted array. We only need to find the element that would be at position `n - k` if the array were sorted (0-indexed). This opens the door to more efficient approaches:

    1. **Heap approach**: Maintain a "top k" collection using a min-heap of size `k`. Any element smaller than our current `k`<sup>th</sup> largest can be discarded.

    2. **Quickselect approach**: Use the partitioning logic from quicksort, but only recurse into the half that contains our target position.

    Both avoid the full `O(n log n)` cost of sorting when we only need partial ordering.
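
    As a quick sanity check of the index mapping above, the sketch below compares a sort-and-index baseline with `heapq.nlargest` from Python's standard library. The helper name `kth_largest_by_sorting` is only illustrative and is not one of the reference solutions.

    ```python
    import heapq

    def kth_largest_by_sorting(nums: list[int], k: int) -> int:
        # Baseline: the kth largest sits at index n - k in ascending order
        return sorted(nums)[len(nums) - k]

    nums = [3, 2, 1, 5, 6, 4]
    # nlargest returns the k largest values in descending order,
    # so its last entry is the kth largest
    print(kth_largest_by_sorting(nums, 2), heapq.nlargest(2, nums)[-1])  # prints: 5 5
    ```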

  approach: |
    We'll focus on the **Min-Heap approach** as the primary solution due to its consistent performance and clarity:

    **Step 1: Understand the heap strategy**

    - We maintain a min-heap of size `k`
    - The min-heap always contains the `k` largest elements seen so far
    - The root of the heap (minimum of these `k` elements) is our answer

    &nbsp;

    **Step 2: Initialise the heap**

    - Create an empty min-heap
    - We'll use Python's `heapq` which implements a min-heap

    &nbsp;

    **Step 3: Process each element**

    - For each number in the array:
      - If the heap has fewer than `k` elements, push the number
      - Otherwise, if the number is larger than the heap's minimum (root), replace the root with this number
    - This ensures we always keep the `k` largest elements

    &nbsp;

    **Step 4: Return the result**

    - The root of the heap is the `k`<sup>th</sup> largest element
    - Return `heap[0]`

    &nbsp;

    **Why this works**: The heap never holds more than `k` elements, and whenever it would exceed that size we remove the smallest. Every discarded element is therefore no larger than the heap's root (ties are possible when the array contains duplicates), so once all elements are processed the root is exactly the `k`<sup>th</sup> largest overall.
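
    Steps 2 to 4 can be sketched as follows. This is one possible rendering, assuming `heapq.heapreplace` for the replace-the-root step; the function name `kth_largest_replace` is hypothetical.

    ```python
    import heapq

    def kth_largest_replace(nums: list[int], k: int) -> int:
        heap: list[int] = []  # min-heap holding the k largest values so far
        for num in nums:
            if len(heap) < k:
                # Heap not yet full: always keep the number
                heapq.heappush(heap, num)
            elif num > heap[0]:
                # Number beats the current kth largest: swap it in for the root
                heapq.heapreplace(heap, num)
        return heap[0]
    ```

    Values equal to the root are skipped rather than swapped in, which leaves the answer unchanged and saves a heap operation.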

  common_pitfalls:
    - title: Off-by-One with Heap Size
      description: |
        A common mistake is confusion about when to push vs. replace in the heap.

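        Pushing unconditionally and then popping whenever the size exceeds `k` is correct. If the new element is the smallest, the pop removes it straight away, which is exactly what should happen because it is not among the `k` largest. The off-by-one risk lies in the size check: popping when the size *reaches* `k` (rather than exceeds it) leaves only `k - 1` elements, so the root is no longer the `k`<sup>th</sup> largest.

        Alternatively, once the heap holds `k` elements you can compare each new element with the root and replace the root only when the new element is larger. This skips some heap operations but adds a branch that is easy to get wrong.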
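
        The unconditional variant can be tightened with `heapq.heappushpop`, which pushes and pops in a single call. A minimal sketch, using the hypothetical name `kth_largest_pushpop`:

        ```python
        import heapq

        def kth_largest_pushpop(nums: list[int], k: int) -> int:
            # Seed the heap with a copy of the first k values, then heapify in O(k)
            heap = nums[:k]
            heapq.heapify(heap)
            for num in nums[k:]:
                # Push num, then pop the smallest; the heap size stays at k.
                # If num is the smallest, it is the value popped, which is correct.
                heapq.heappushpop(heap, num)
            return heap[0]
        ```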
      wrong_approach: "Complex conditional logic that's easy to get wrong"
      correct_approach: "Push then pop if size > k, or use heappushpop for efficiency"

    - title: Using Max-Heap Incorrectly
      description: |
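        Some attempt to use a max-heap of the entire array and pop `k - 1` times, then read the root. This is correct, but it has costs:

        - Building a max-heap: `O(n)`
        - Popping `k - 1` times: `O(k log n)`
        - Total: `O(n + k log n)` time and `O(n)` space for the heap (or a mutated input if you heapify in place)

        A min-heap of size `k` runs in `O(n log k)` time with only `O(k)` extra space, leaves the input untouched, and works on streaming data where the full array is never available at once.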
      wrong_approach: "Max-heap of all elements, pop k-1 times"
      correct_approach: "Min-heap of size k, maintaining the k largest"

    - title: Forgetting Python's heapq is Min-Heap Only
      description: |
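        Python's `heapq` is built around a min-heap. Dedicated max-heap functions such as `heapq.heappush_max` were only added in Python 3.14, so on earlier versions you must negate values when pushing and negate again when popping to simulate a max-heap.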

        For this problem, a min-heap is actually what we want — we keep the `k` largest elements by discarding elements smaller than our current `k`<sup>th</sup> largest.
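
        For comparison, the negation trick looks like this. The sketch applies it to the whole-array max-heap strategy and uses a hypothetical function name; it is shown only to illustrate negation, not as the recommended solution.

        ```python
        import heapq

        def kth_largest_max_heap(nums: list[int], k: int) -> int:
            # Negate every value so the min-heap root is the largest original value
            heap = [-num for num in nums]
            heapq.heapify(heap)  # O(n) build
            # Discard the k - 1 largest values
            for _ in range(k - 1):
                heapq.heappop(heap)
            # Negate again to recover the original value
            return -heap[0]
        ```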
      wrong_approach: "Assuming heapq has a max-heap option"
      correct_approach: "Use min-heap directly for finding kth largest"

  key_takeaways:
    - "**Partial ordering insight**: When you only need one specific rank, you don't need to sort everything — use a heap or quickselect instead"
    - "**Min-heap for top-k**: A min-heap of size `k` naturally maintains the `k` largest elements, with the `k`<sup>th</sup> largest at the root"
    - "**Trade-off awareness**: Heap gives `O(n log k)` guaranteed; Quickselect gives `O(n)` average but `O(n^2)` worst case"
    - "**Foundation pattern**: This technique applies to streaming data, top-k frequent elements, and many ranking problems"

  time_complexity: "O(n log k). We iterate through all `n` elements, and each heap operation (push/pop) takes `O(log k)` time since the heap size is bounded by `k`."
  space_complexity: "O(k). We maintain a heap containing at most `k` elements."

  pattern_comparison: |
    **Heap vs Quickselect: Choosing the Right Pattern**

    Both approaches avoid full sorting, but they have different characteristics:

    | Approach | Time (Avg) | Time (Worst) | Space | Modifies Input? |
    |----------|------------|--------------|-------|-----------------|
    | **Min Heap** | O(n log k) | O(n log k) | O(k) | No |
    | **Quickselect** | O(n) | O(n²) | O(log n) | Yes |
    | **Sorting** | O(n log n) | O(n log n) | O(1)-O(n) | Yes |

    **When to choose Heap:**
    - You need **guaranteed** performance (no worst-case quadratic time)
    - The input array shouldn't be modified
    - `k` is small relative to `n` (the O(log k) factor stays small)
    - You're working with streaming data

    **When to choose Quickselect:**
    - You need the **fastest average** performance
    - Modifying the input array is acceptable
    - You're comfortable with randomised algorithms
    - Space is at a premium (O(log n) recursion vs O(k) heap)

    **Interview tip:** Start with Heap for its simplicity and guaranteed bounds, then mention Quickselect as an optimisation if the interviewer asks about O(n) solutions.
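
    The streaming case mentioned above can be sketched with a small class. `KthLargestStream` and its `add` method are hypothetical names chosen for illustration; the idea is simply the size-`k` min-heap applied to values that arrive one at a time.

    ```python
    import heapq
    from typing import Optional

    class KthLargestStream:
        def __init__(self, k: int) -> None:
            self.k = k
            self.heap: list[int] = []  # min-heap of the k largest values seen

        def add(self, value: int) -> Optional[int]:
            # Keep at most k values; the root is the kth largest so far
            if len(self.heap) < self.k:
                heapq.heappush(self.heap, value)
            else:
                heapq.heappushpop(self.heap, value)
            # Until k values have arrived there is no kth largest yet
            return self.heap[0] if len(self.heap) == self.k else None
    ```

    Each `add` call costs `O(log k)`, and memory stays at `O(k)` no matter how long the stream runs, which Quickselect cannot offer because it needs the whole array up front.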

solutions:
  - approach_name: Min-Heap
    is_optimal: true
    code: |
      import heapq

      def find_kth_largest(nums: list[int], k: int) -> int:
          # Min-heap to store the k largest elements
          heap = []

          for num in nums:
              # Add current number to heap
              heapq.heappush(heap, num)

              # If heap exceeds size k, remove the smallest
              # This ensures we keep only the k largest elements
              if len(heap) > k:
                  heapq.heappop(heap)

          # The root of min-heap is the kth largest
          return heap[0]
    explanation: |
      **Time Complexity:** O(n log k) — We process each of `n` elements with heap operations costing `O(log k)`.

      **Space Complexity:** O(k) — The heap stores at most `k` elements.

      This approach maintains a min-heap of the `k` largest elements seen so far. By keeping the heap size at `k` and using a min-heap, the smallest element in our collection (the root) is always the `k`<sup>th</sup> largest overall.

  - approach_name: Quickselect
    is_optimal: true
    code: |
      import random

      def find_kth_largest(nums: list[int], k: int) -> int:
          # Convert kth largest to index in sorted array
          # kth largest = element at index (n - k) in ascending order
          target_index = len(nums) - k

          def quickselect(left: int, right: int) -> int:
              # Random pivot to avoid worst-case on sorted input
              pivot = nums[random.randint(left, right)]

              # Three-way partition: [< pivot | == pivot | > pivot]
              # Grouping equal values keeps duplicate-heavy input from
              # degrading to quadratic time and very deep recursion
              lt, i, gt = left, left, right
              while i <= gt:
                  if nums[i] < pivot:
                      nums[lt], nums[i] = nums[i], nums[lt]
                      lt += 1
                      i += 1
                  elif nums[i] > pivot:
                      nums[i], nums[gt] = nums[gt], nums[i]
                      gt -= 1
                  else:
                      i += 1

              # nums[lt..gt] now holds every copy of the pivot
              if target_index < lt:
                  # Target is in the left partition
                  return quickselect(left, lt - 1)
              elif target_index > gt:
                  # Target is in the right partition
                  return quickselect(gt + 1, right)
              else:
                  # Target falls inside the block of pivot values
                  return pivot

          return quickselect(0, len(nums) - 1)
    explanation: |
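      **Time Complexity:** O(n) average, O(n^2) worst case. Each partition pass is linear in the current range, and a random pivot shrinks that range by a constant factor on average, so the expected total work is a geometric series (roughly `n + n/2 + n/4 + ...`) bounded by `O(n)`. Random pivot selection makes the quadratic worst case very unlikely.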

      **Space Complexity:** O(log n) average for recursion stack, O(n) worst case.

      Quickselect uses the partitioning logic from quicksort but only recurses into the partition containing our target index. This reduces the expected work from `O(n log n)` to `O(n)`.

  - approach_name: Sorting
    is_optimal: false
    code: |
      def find_kth_largest(nums: list[int], k: int) -> int:
          # Sort in descending order
          nums.sort(reverse=True)

          # Return the kth element (0-indexed, so k-1)
          return nums[k - 1]
    explanation: |
      **Time Complexity:** O(n log n) — Dominated by the sorting step.

      **Space Complexity:** O(1) to O(n), depending on the sorting algorithm. Python's `list.sort` (Timsort) rearranges the list in place but can allocate up to `O(n)` temporary memory while merging.

      The simplest approach: sort and index. While not optimal for this specific problem, it's worth knowing as a baseline. For small arrays or when `k` is close to `n`, the practical difference may be negligible.