# codetutor/backend/data/questions/top-k-frequent-elements.yaml

title: Top K Frequent Elements
slug: top-k-frequent-elements
difficulty: medium
leetcode_id: 347
leetcode_url: https://leetcode.com/problems/top-k-frequent-elements/
categories:
  - arrays
  - hash-tables
  - heap
  - sorting
patterns:
  - slug: counting-sort
    is_optimal: true

function_signature: "def top_k_frequent(nums: list[int], k: int) -> list[int]:"

test_cases:
  visible:
    - input: { nums: [1, 1, 1, 2, 2, 3], k: 2 }
      expected: [1, 2]
    - input: { nums: [1], k: 1 }
      expected: [1]
    - input: { nums: [4, 1, -1, 2, -1, 2, 3], k: 2 }
      expected: [-1, 2]
  hidden:
    - input: { nums: [1, 2], k: 2 }
      expected: [1, 2]
    - input: { nums: [5, 5, 5, 5, 5], k: 1 }
      expected: [5]
    - input: { nums: [1, 1, 2, 2, 3, 3, 4], k: 3 }
      expected: [1, 2, 3]
    - input: { nums: [-1, -1, -1, 2, 2, 3], k: 2 }
      expected: [-1, 2]
    - input: { nums: [3, 0, 1, 0], k: 1 }
      expected: [0]
    - input: { nums: [1, 1, 1, 2, 2, 3], k: 3 }
      expected: [1, 2, 3]

description: |
  Given an integer array `nums` and an integer `k`, return *the* `k` *most frequent elements*. You may return the answer in **any order**.

constraints: |
  - `1 <= nums.length <= 10^5`
  - `-10^4 <= nums[i] <= 10^4`
  - `k` is in the range `[1, the number of unique elements in the array]`
  - It is **guaranteed** that the answer is **unique**

  **Follow up:** Your algorithm's time complexity must be better than `O(n log n)`, where `n` is the array's size.

examples:
  - input: "nums = [1,1,1,2,2,3], k = 2"
    output: "[1,2]"
    explanation: "Element 1 appears 3 times, element 2 appears 2 times, and element 3 appears once. The two most frequent elements are 1 and 2."
  - input: "nums = [1], k = 1"
    output: "[1]"
    explanation: "There is only one element, and we need the top 1 most frequent element."

explanation:
  intuition: |
    Imagine you're analysing survey results and need to find the most popular choices. Your first instinct might be to count how many times each option appears, then sort by popularity. But can we do better than sorting?

    The key insight is that **frequency values are bounded**. If you have `n` elements, the maximum possible frequency is `n` (when all elements are the same). This means we can use the frequency itself as an index into an array — a technique called **bucket sort**.

    Think of it like this: create `n + 1` buckets labelled by frequency (0 through n). After counting each element's frequency, drop each element into its corresponding bucket. Then, starting from the highest-frequency bucket, collect elements until you have `k` of them.

    This approach cleverly avoids comparison-based sorting (which has an `O(n log n)` lower bound) by using the frequency as a direct index.
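
    The Python sketch below makes the bucket layout concrete for the first example, `nums = [1,1,1,2,2,3]`. The helper name `build_buckets` is hypothetical and used only for illustration. It is not part of the required signature.

    ```python
    def build_buckets(nums: list[int]) -> list[list[int]]:
        # Count occurrences of each value with a plain dict
        count: dict[int, int] = {}
        for num in nums:
            count[num] = count.get(num, 0) + 1
        # Bucket i holds every value that appears exactly i times (i = 0..n)
        buckets: list[list[int]] = [[] for _ in range(len(nums) + 1)]
        for num, freq in count.items():
            buckets[freq].append(num)
        return buckets


    buckets = build_buckets([1, 1, 1, 2, 2, 3])
    for freq in range(len(buckets) - 1, 0, -1):  # highest frequency first
        if buckets[freq]:
            print(freq, buckets[freq])
    # prints:
    # 3 [1]
    # 2 [2]
    # 1 [3]
    ```

    Walking the buckets from index `n` down to `1` visits `1`, then `2`, then `3`, so taking the first `k = 2` elements gives `[1, 2]`.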

  approach: |
    We solve this using a **Bucket Sort** approach:

    **Step 1: Count frequencies**

    - Use a hash map to count how many times each element appears
    - Key: the element value, Value: its frequency
    - This takes `O(n)` time with a single pass through the array
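
    Python's standard library also provides `collections.Counter`, which builds the same element-to-frequency mapping in one call. The comparison below is a minimal sketch. It assumes Python 3.7+, where dict insertion order is preserved, so the printed order is deterministic.

    ```python
    from collections import Counter

    nums = [1, 1, 1, 2, 2, 3]

    # Manual counting, as described in Step 1
    count: dict[int, int] = {}
    for num in nums:
        count[num] = count.get(num, 0) + 1

    # collections.Counter builds the same value -> frequency mapping in one call
    counter = Counter(nums)

    print(count)          # {1: 3, 2: 2, 3: 1}
    print(dict(counter))  # {1: 3, 2: 2, 3: 1}
    ```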

    &nbsp;

    **Step 2: Create frequency buckets**

    - Create an array of `n + 1` empty lists (buckets), where index `i` will hold elements that appear exactly `i` times
    - The maximum possible frequency is `n` (if all elements are identical)

    &nbsp;

    **Step 3: Fill the buckets**

    - For each element in our frequency map, add it to the bucket corresponding to its frequency
    - If element `x` appears 3 times, put `x` in `bucket[3]`

    &nbsp;

    **Step 4: Collect top k elements**

    - Starting from the highest frequency bucket (index `n`), work backwards
    - Add elements from each bucket to the result until we have `k` elements
    - Return the result

    &nbsp;

    This works because elements in higher-indexed buckets have higher frequencies, so by traversing from high to low, we naturally get the most frequent elements first.

  common_pitfalls:
    - title: Sorting the Entire Frequency Map
      description: |
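        A common approach is to count frequencies, then sort all unique elements by their frequency. While correct, sorting the `m` unique elements takes `O(m log m)` time. That is `O(n log n)` in the worst case, when most elements are distinct.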

        The follow-up explicitly asks for better than `O(n log n)`. Bucket sort achieves `O(n)` by exploiting the bounded range of frequencies.
      wrong_approach: "Sort all elements by frequency"
      correct_approach: "Use bucket sort with frequency as index"

    - title: Using a Max Heap Without Size Limit
      description: |
        Pushing every unique element onto a max heap one `heappush` at a time costs `O(m log m)`, where `m` is the number of unique elements. That is no better than sorting. Building the heap with `heapify` is `O(m)`, and the `k` extractions then cost `O(k log m)`, but the heap still holds all `m` elements.

        A leaner heap approach uses a **min heap of size k**: maintain only the current top `k` elements, evicting the minimum whenever the heap exceeds `k`. This gives `O(n log k)` time and `O(k)` heap space. It pays off when `k << n` or when the input arrives as a stream.
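
        The sketch below shows the max-heap variant in Python. Because `heapq` only implements a min-heap, it negates the frequencies. The function name `top_k_max_heap` is hypothetical.

        ```python
        import heapq


        def top_k_max_heap(nums: list[int], k: int) -> list[int]:
            # Count frequencies (m = number of unique values)
            count: dict[int, int] = {}
            for num in nums:
                count[num] = count.get(num, 0) + 1
            # heapq only provides a min-heap, so negate frequencies to get max-heap order
            heap = [(-freq, num) for num, freq in count.items()]
            heapq.heapify(heap)  # O(m) bottom-up build, versus O(m log m) for m heappush calls
            # Each pop removes the most frequent remaining value: O(k log m) in total
            return [heapq.heappop(heap)[1] for _ in range(k)]


        print(top_k_max_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
        ```

        This is correct, but the heap holds all `m` unique values. The size-`k` min heap never holds more than `k + 1` entries.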
      wrong_approach: "Max heap of all elements, extract k times"
      correct_approach: "Min heap of size k, or bucket sort for O(n)"

    - title: Off-by-One in Bucket Array Size
      description: |
        If you create only `n` buckets (indices `0` to `n - 1`), an element that appears `n` times (all elements identical) has no bucket, and `buckets[n]` raises an `IndexError`.

        Create `n + 1` buckets to handle frequencies from 0 to `n` inclusive.
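
        Here is a quick Python check of the off-by-one. It uses an all-identical input, where the maximum frequency equals `n`.

        ```python
        nums = [7, 7, 7]  # every element identical, so the max frequency equals n
        n = len(nums)

        too_few = [[] for _ in range(n)]     # indices 0..n-1
        enough = [[] for _ in range(n + 1)]  # indices 0..n

        try:
            too_few[n].append(7)  # frequency n has no slot
        except IndexError:
            print("IndexError raised")

        enough[n].append(7)
        print(enough)  # [[], [], [], [7]]
        ```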
      wrong_approach: "buckets = [[] for _ in range(n)]"
      correct_approach: "buckets = [[] for _ in range(n + 1)]"

  key_takeaways:
    - "**Bucket sort** is powerful when values are bounded — use the value itself as an array index to avoid comparison-based sorting"
    - "**Frequency counting + bucketing** is a common pattern for \"top k\" problems with bounded frequencies"
    - "**Min heap of size k** is another useful technique: maintain only what you need, not everything"
    - "This problem appears frequently in interviews and tests understanding of time complexity tradeoffs"

  time_complexity: "O(n). We make one pass to count frequencies and one pass to fill and traverse buckets."
  space_complexity: "O(n). We use a hash map for counts and an array of buckets, both proportional to the input size."

  pattern_comparison: |
    **Bucket Sort vs Heap: Which Pattern Wins?**

    Both patterns solve this problem correctly, but with different trade-offs:

    | Approach | Time | Space | When to Use |
    |----------|------|-------|-------------|
    | **Bucket Sort** | O(n) | O(n) | When frequencies are bounded (always true here since max freq ≤ n) |
    | **Min Heap** | O(n log k) | O(n) | When k is much smaller than n, or for streaming data |
    | **Sorting** | O(n log n) | O(n) | Simplest to implement, but doesn't meet the follow-up requirement |

    **Why Bucket Sort is optimal here:**
    - Frequencies are naturally bounded by array length — if you have `n` elements, the maximum frequency is `n`
    - This bounded range lets us use the frequency as a direct array index, avoiding comparison-based sorting entirely
    - We achieve O(n) time by exploiting the problem's structure

    **When to prefer Heap:**
    - Streaming scenarios where you don't know all elements upfront
    - When `k << n` (the O(log k) factor becomes negligible)
    - When you need to support dynamic updates (adding/removing elements)

solutions:
  - approach_name: Bucket Sort
    is_optimal: true
    code: |
      def top_k_frequent(nums: list[int], k: int) -> list[int]:
          # Step 1: Count frequency of each element
          count = {}
          for num in nums:
              count[num] = count.get(num, 0) + 1

          # Step 2: Create buckets where index = frequency
          # Max frequency is n (all elements identical)
          n = len(nums)
          buckets = [[] for _ in range(n + 1)]

          # Step 3: Place each element in its frequency bucket
          for num, freq in count.items():
              buckets[freq].append(num)

          # Step 4: Collect k elements from highest frequency buckets
          result = []
          for freq in range(n, 0, -1):  # Start from highest frequency
              for num in buckets[freq]:
                  result.append(num)
                  if len(result) == k:
                      return result

          return result
    explanation: |
      **Time Complexity:** O(n) — One pass to count, one pass to bucket, one pass to collect.

      **Space Complexity:** O(n) — Hash map and bucket array.

      Bucket sort exploits the fact that frequencies are bounded by `n`. By using frequency as an index, we avoid comparison-based sorting entirely and achieve linear time.

  - approach_name: Min Heap
    is_optimal: false
    code: |
      import heapq

      def top_k_frequent(nums: list[int], k: int) -> list[int]:
          # Step 1: Count frequency of each element
          count = {}
          for num in nums:
              count[num] = count.get(num, 0) + 1

          # Step 2: Use min heap to keep only top k elements
          # Heap contains (frequency, element) tuples
          heap = []
          for num, freq in count.items():
              heapq.heappush(heap, (freq, num))
              # If heap exceeds size k, remove the minimum
              if len(heap) > k:
                  heapq.heappop(heap)

          # Step 3: Extract elements from heap
          return [num for freq, num in heap]
    explanation: |
      **Time Complexity:** O(n log k) — We push each unique element onto a heap of size at most `k`.

      **Space Complexity:** O(n) — Hash map for counts, plus O(k) for the heap.

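      The min heap approach keeps only the top `k` elements at any time. Each new `(frequency, element)` pair is pushed, and whenever the heap grows past `k` the smallest pair is popped. A new element therefore displaces the current minimum only when it ranks higher. Capping the heap at `k` entries keeps each heap operation at `O(log k)` and the heap's memory at `O(k)`. That is cheaper than a max heap of all elements when `k` is small.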
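
      The push-then-pop step can also be written with `heapq.heappushpop`, which pushes an item and pops the smallest in one call, so the heap never grows past `k`. The sketch below shows that variant. The name `top_k_pushpop` is hypothetical.

      ```python
      import heapq


      def top_k_pushpop(nums: list[int], k: int) -> list[int]:
          count: dict[int, int] = {}
          for num in nums:
              count[num] = count.get(num, 0) + 1

          heap: list[tuple[int, int]] = []
          for num, freq in count.items():
              if len(heap) < k:
                  # Still filling the heap up to k entries
                  heapq.heappush(heap, (freq, num))
              else:
                  # Push the new pair and pop the smallest in one call; if the new
                  # pair is not larger than heap[0], it is returned immediately and
                  # the heap is left unchanged
                  heapq.heappushpop(heap, (freq, num))
          return [num for _, num in heap]


      print(sorted(top_k_pushpop([1, 1, 1, 2, 2, 3], 2)))  # [1, 2]
      ```

      It returns the same elements as the push-then-pop loop. Only the order within the returned list may differ, which the problem allows.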

  - approach_name: Sorting
    is_optimal: false
    code: |
      def top_k_frequent(nums: list[int], k: int) -> list[int]:
          # Step 1: Count frequency of each element
          count = {}
          for num in nums:
              count[num] = count.get(num, 0) + 1

          # Step 2: Sort by frequency (descending) and take top k
          sorted_elements = sorted(count.keys(), key=lambda x: count[x], reverse=True)

          return sorted_elements[:k]
    explanation: |
      **Time Complexity:** O(n log n) — Sorting dominates.

      **Space Complexity:** O(n) — Hash map and sorted list.

      This straightforward approach counts frequencies then sorts. While simple to implement, it doesn't meet the follow-up requirement of better than `O(n log n)`. Included here to contrast with the optimal solutions.
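
      In everyday Python, `collections.Counter.most_common(k)` bundles counting and selection. In CPython it calls `heapq.nlargest` when `k` is passed and falls back to a full sort only when `k` is omitted. Asymptotically, then, it behaves like the heap approach rather than this sorting one. Here is a minimal sketch with a hypothetical wrapper name.

      ```python
      from collections import Counter


      def top_k_most_common(nums: list[int], k: int) -> list[int]:
          # most_common(k) returns (value, count) pairs, most frequent first
          return [num for num, _ in Counter(nums).most_common(k)]


      print(top_k_most_common([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
      ```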