# codetutor/backend/data/questions/k-closest-points-to-origin.yaml

title: K Closest Points to Origin
slug: k-closest-points-to-origin
difficulty: medium
leetcode_id: 973
leetcode_url: https://leetcode.com/problems/k-closest-points-to-origin/
categories:
  - arrays
  - heap
  - sorting
patterns:
  - heap

function_signature: "def k_closest(points: list[list[int]], k: int) -> list[list[int]]:"

test_cases:
  visible:
    - input: { points: [[1, 3], [-2, 2]], k: 1 }
      expected: [[-2, 2]]
    - input: { points: [[3, 3], [5, -1], [-2, 4]], k: 2 }
      expected: [[3, 3], [-2, 4]]
  hidden:
    - input: { points: [[0, 0]], k: 1 }
      expected: [[0, 0]]
    - input: { points: [[1, 1], [2, 2], [3, 3]], k: 3 }
      expected: [[1, 1], [2, 2], [3, 3]]
    - input: { points: [[1, 0], [0, 1]], k: 2 }
      expected: [[1, 0], [0, 1]]
    - input: { points: [[-5, 4], [3, 2], [1, 1], [0, 2]], k: 2 }
      expected: [[1, 1], [0, 2]]
    - input: { points: [[10, 10], [-10, -10], [1, 1]], k: 1 }
      expected: [[1, 1]]

description: |
  Given an array of `points` where `points[i] = [x_i, y_i]` represents a point on the **X-Y** plane and an integer `k`, return the `k` closest points to the origin `(0, 0)`.

  The distance between two points on the **X-Y** plane is the Euclidean distance (i.e., `sqrt((x1 - x2)^2 + (y1 - y2)^2)`).

  You may return the answer in **any order**. The answer is **guaranteed** to be **unique** (except for the order that it is in).

constraints: |
  - `1 <= k <= points.length <= 10^4`
  - `-10^4 <= x_i, y_i <= 10^4`

examples:
  - input: "points = [[1,3],[-2,2]], k = 1"
    output: "[[-2,2]]"
    explanation: "The distance between (1, 3) and the origin is sqrt(10). The distance between (-2, 2) and the origin is sqrt(8). Since sqrt(8) < sqrt(10), (-2, 2) is closer to the origin. We only want the closest k = 1 points from the origin, so the answer is just [[-2,2]]."
  - input: "points = [[3,3],[5,-1],[-2,4]], k = 2"
    output: "[[3,3],[-2,4]]"
    explanation: "The answer [[-2,4],[3,3]] would also be accepted since any order is valid."

explanation:
  intuition: |
    Imagine you have a map with several pins representing locations, and you're standing at the center (the origin). You need to find the `k` pins closest to you.

    The **core insight** is that we need to efficiently select the k smallest values from a collection — this is the classic **top-k problem**. While sorting all points would work, it's more work than necessary. We don't need full ordering; we just need to identify which k points are closest.

    Think of it like this: imagine you're a bouncer at a club with a capacity of `k` people. As people (points) arrive, you let them in if there's room, or if they're "better" (closer) than the *farthest* person already inside, who then has to leave. You don't need to rank everyone perfectly; you just need to keep the best k at any moment.

    A **max-heap of size k** is perfect for this. The heap always holds the k closest points seen so far. When we encounter a new point, we compare it to the *farthest* point in our heap (the max). If the new point is closer, we evict the farthest and add the new one.
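
    The bouncer rule can also be written almost literally with `heapq`. The sketch below is illustrative only (`k_closest_bouncer` is a hypothetical name, not the expected signature). It compares each new point with the farthest point currently kept and touches the heap only when the newcomer is closer.

    ```python
    import heapq


    def k_closest_bouncer(points: list[list[int]], k: int) -> list[list[int]]:
        # Heap entries are (-squared_distance, x, y); the root is the farthest kept point.
        heap: list[tuple[int, int, int]] = []
        for x, y in points:
            d = x * x + y * y
            if len(heap) < k:
                # Still room inside: let the point in.
                heapq.heappush(heap, (-d, x, y))
            elif d < -heap[0][0]:
                # Closer than the farthest guest: evict the root and admit the newcomer.
                heapq.heapreplace(heap, (-d, x, y))
            # Otherwise the point is turned away without touching the heap.
        return [[x, y] for _, x, y in heap]


    print(k_closest_bouncer([[1, 3], [-2, 2]], 1))  # [[-2, 2]]
    ```

    Checking the root first means points that would be evicted straight away never cause any heap work.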

    **Key optimization**: Since we only care about *relative* distances, we can compare squared distances (`x^2 + y^2`) instead of actual Euclidean distances. Because `sqrt` is increasing on non-negative numbers, the ordering is unchanged, so we skip the square root entirely and keep every comparison in exact integer arithmetic.
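
    A quick check on the five points from the two examples shows that both keys produce the same order. This is a small illustrative sketch; `squared` and `euclidean` are hypothetical helper names.

    ```python
    import math

    points = [[1, 3], [-2, 2], [3, 3], [5, -1], [-2, 4]]


    def squared(p: list[int]) -> int:
        # Exact integer arithmetic: no rounding involved.
        return p[0] * p[0] + p[1] * p[1]


    def euclidean(p: list[int]) -> float:
        # True distance; math.hypot computes sqrt(x*x + y*y).
        return math.hypot(p[0], p[1])


    # Sorting by either key gives the same order because sqrt is increasing on [0, inf).
    by_squared = sorted(points, key=squared)
    by_euclidean = sorted(points, key=euclidean)
    print(by_squared == by_euclidean)  # True
    ```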

  approach: |
    We solve this using a **Max-Heap of Size K** approach:

    **Step 1: Define a distance function**

    - Create a helper to compute squared Euclidean distance: `x^2 + y^2`
    - We use squared distance to avoid the expensive `sqrt()` operation — comparing `d1^2` vs `d2^2` gives the same ordering as `d1` vs `d2`
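
    A minimal sketch of the Step 1 helper (`squared_distance` is a hypothetical name). Under the stated constraints the largest possible value is `2 * 10^8`, which also fits in a signed 32-bit integer, so languages with fixed-width integers can use the same formula without overflow.

    ```python
    def squared_distance(point: list[int]) -> int:
        # Squared Euclidean distance from the origin; no sqrt needed for comparisons.
        x, y = point
        return x * x + y * y


    # Worst case under the constraints: both coordinates at +/-10^4.
    largest = squared_distance([-10**4, 10**4])
    print(largest)               # 200000000
    print(largest <= 2**31 - 1)  # True
    ```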

    &nbsp;

    **Step 2: Build a max-heap of size k**

    - Iterate through each point in the input
    - Push each point onto a max-heap keyed by distance (Python's `heapq` only provides a min-heap, so push the negated distance to simulate a max-heap)
    - If the heap size exceeds `k`, pop the largest (farthest) point
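
    To see Step 2 in action, the sketch below replays Example 2 (`points = [[3,3],[5,-1],[-2,4]]`, `k = 2`) and records which point, if any, is evicted after each push. `trace_heap` is a hypothetical helper written only for illustration.

    ```python
    import heapq
    from typing import Optional


    def trace_heap(points: list[list[int]], k: int) -> list[tuple[list[int], Optional[list[int]]]]:
        max_heap: list[tuple[int, list[int]]] = []
        steps = []
        for x, y in points:
            # Negated squared distance turns heapq's min-heap into a max-heap.
            heapq.heappush(max_heap, (-(x * x + y * y), [x, y]))
            evicted = None
            if len(max_heap) > k:
                # The root holds the largest distance, i.e. the farthest point.
                _, evicted = heapq.heappop(max_heap)
            steps.append(([x, y], evicted))
        return steps


    for pushed, evicted in trace_heap([[3, 3], [5, -1], [-2, 4]], 2):
        print(f"pushed {pushed}, evicted {evicted}")
    # pushed [3, 3], evicted None
    # pushed [5, -1], evicted None
    # pushed [-2, 4], evicted [5, -1]
    ```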

    &nbsp;

    **Step 3: Extract results from the heap**

    - After processing all points, the heap contains exactly the k closest points
    - Extract and return these points
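
    The problem accepts any order, so returning the heap's contents directly is enough. If a caller did want the points nearest-first (an extra requirement assumed here for illustration), popping the max-heap yields them farthest-first, so the popped sequence only needs reversing. `drain_nearest_first` is a hypothetical helper.

    ```python
    import heapq


    def drain_nearest_first(max_heap: list[tuple[int, list[int]]]) -> list[list[int]]:
        # Each pop removes the current farthest point (most negative key).
        # Note: this empties the heap as a side effect.
        farthest_first = [heapq.heappop(max_heap)[1] for _ in range(len(max_heap))]
        # Reverse so the closest point comes first: O(k log k) extra work overall.
        return farthest_first[::-1]


    heap = [(-18, [3, 3]), (-20, [-2, 4])]
    heapq.heapify(heap)
    print(drain_nearest_first(heap))  # [[3, 3], [-2, 4]]
    ```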

    &nbsp;

    **Why this works**: By maintaining a max-heap of size k, the root is always the *farthest* among our k candidates. When a closer point arrives, it replaces the farthest, ensuring we always have the k closest. This is more efficient than sorting when `k << n`.
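
    One way to gain confidence in this invariant is to cross-check the heap method against plain sorting on random inputs. Because ties in distance can make several answers valid, the sketch compares the sorted lists of distances rather than the points themselves. `heap_k_closest`, `sort_k_closest` and `dists` are hypothetical names.

    ```python
    import heapq
    import random


    def heap_k_closest(points: list[list[int]], k: int) -> list[list[int]]:
        max_heap: list[tuple[int, list[int]]] = []
        for x, y in points:
            heapq.heappush(max_heap, (-(x * x + y * y), [x, y]))
            if len(max_heap) > k:
                heapq.heappop(max_heap)  # drop the farthest of the k + 1 candidates
        return [p for _, p in max_heap]


    def sort_k_closest(points: list[list[int]], k: int) -> list[list[int]]:
        return sorted(points, key=lambda p: p[0] * p[0] + p[1] * p[1])[:k]


    def dists(pts: list[list[int]]) -> list[int]:
        # Sorted squared distances: equal for any two valid answers.
        return sorted(x * x + y * y for x, y in pts)


    rng = random.Random(0)  # fixed seed so the check is reproducible
    for _ in range(500):
        n = rng.randint(1, 30)
        pts = [[rng.randint(-10, 10), rng.randint(-10, 10)] for _ in range(n)]
        k = rng.randint(1, n)
        assert dists(heap_k_closest(pts, k)) == dists(sort_k_closest(pts, k))
    print("heap and sort agree on all random cases")
    ```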

  common_pitfalls:
    - title: Computing Actual Euclidean Distance
      description: |
        A common mistake is to compute the actual Euclidean distance using `sqrt(x^2 + y^2)` for every point.

        While mathematically correct, `sqrt()` is an extra, comparatively expensive floating-point operation per point. Since we only need to *compare* distances (not their exact values), squared distances work just as well: for non-negative `a` and `b`, `a^2 < b^2` holds exactly when `a < b`.

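        The saving per point is small, but it adds up across as many as `10^4` points, and squared distances also keep every comparison in exact integer arithmetic, so there is no floating-point rounding to reason about.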
      wrong_approach: "Using sqrt(x^2 + y^2) for distance"
      correct_approach: "Using x^2 + y^2 for distance comparison"

    - title: Sorting All Points
      description: |
        The naive approach is to sort all n points by distance and take the first k.

        This gives O(n log n) time complexity regardless of k. When k is small (e.g., k = 10 with n = 10,000), we're doing far more work than necessary.

        The heap approach is O(n log k), which is significantly faster when `k << n`. For k = 10 and n = 10,000, the log factor drops from log2(10,000) ≈ 13.3 to log2(10) ≈ 3.3, roughly 4x fewer operations (ignoring constant factors).
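
        The log factors of the two bounds can be compared directly for k = 10 and n = 10,000. This is back-of-the-envelope arithmetic that ignores constant factors, so measured timings will differ.

        ```python
        import math

        n, k = 10_000, 10
        sort_cost = n * math.log2(n)  # ~132,877 for O(n log n)
        heap_cost = n * math.log2(k)  # ~33,219 for O(n log k)
        print(round(sort_cost / heap_cost, 6))  # 4.0, since log(10^4) / log(10^1) = 4
        ```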
      wrong_approach: "Sort all points, take first k"
      correct_approach: "Use a max-heap of size k"

    - title: Using a Min-Heap Instead of Max-Heap
      description: |
        If you use a min-heap and push all n points, you'd need to pop k times at the end. This requires O(n) space for the full heap.

        A max-heap of size k is more memory-efficient (O(k) space) and naturally evicts the farthest point when a closer one arrives. In Python, since `heapq` is a min-heap by default, negate the distances to simulate a max-heap.
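
        For contrast, here is a sketch of the pattern this pitfall warns against, next to a standard-library shortcut. `heapq.nsmallest(k, iterable, key=...)` is documented as equivalent to `sorted(iterable, key=key)[:k]`, and the docs note it performs best for small values of `k`. Both function names below are hypothetical.

        ```python
        import heapq


        def k_closest_full_min_heap(points: list[list[int]], k: int) -> list[list[int]]:
            # The pattern the pitfall describes: all n points live in the heap (O(n) space).
            heap = [(x * x + y * y, [x, y]) for x, y in points]
            heapq.heapify(heap)  # O(n)
            return [heapq.heappop(heap)[1] for _ in range(k)]  # k pops, O(log n) each


        def k_closest_nsmallest(points: list[list[int]], k: int) -> list[list[int]]:
            # Standard-library one-liner; documented as equivalent to sorted(points, key=...)[:k].
            return heapq.nsmallest(k, points, key=lambda p: p[0] * p[0] + p[1] * p[1])
        ```

        Both return the points nearest-first; only the first one holds all n entries in memory at once.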
      wrong_approach: "Min-heap with all n points"
      correct_approach: "Max-heap limited to size k"

  key_takeaways:
    - "**Top-k pattern**: When you need the k smallest/largest elements, a heap of size k is often optimal — O(n log k) beats O(n log n) sorting when `k << n`"
    - "**Squared distance optimisation**: Avoid `sqrt()` when comparing distances — squared distances preserve ordering and are faster to compute"
    - "**Max-heap for k smallest**: Use a max-heap to track k smallest values; the root lets you quickly check if a new element belongs"
    - "**Related problems**: This pattern applies to Kth Largest Element, Top K Frequent Elements, and similar selection problems"

  time_complexity: "O(n log k). We iterate through all n points, and each heap operation (push/pop) takes O(log k) time since the heap is capped at size k."
  space_complexity: "O(k). The heap stores at most k points at any time."

solutions:
  - approach_name: Max-Heap
    is_optimal: true
    code: |
      import heapq

      def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
          # Max-heap to store k closest points (negate distance for max-heap)
          max_heap = []

          for x, y in points:
              # Squared distance avoids expensive sqrt()
              dist = x * x + y * y
              # Push negative distance to simulate max-heap
              heapq.heappush(max_heap, (-dist, [x, y]))

              # If heap exceeds size k, remove the farthest point
              if len(max_heap) > k:
                  heapq.heappop(max_heap)

          # Extract the k closest points from the heap
          return [point for _, point in max_heap]
    explanation: |
      **Time Complexity:** O(n log k) — We process n points, each heap operation is O(log k).

      **Space Complexity:** O(k) — The heap stores at most k elements.

      By maintaining a max-heap of size k, we efficiently track the k closest points. The negative distance trick converts Python's min-heap into a max-heap, ensuring the farthest point is always at the root for quick comparison and removal.
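
      A possible refinement, sketched here as a variant rather than the reference solution: store each point's index instead of building a new `[x, y]` list, and use `heapq.heappushpop` once the heap is full. On equal distances the tuples then compare integer indices rather than lists. `k_closest_indexed` is a hypothetical name.

      ```python
      import heapq


      def k_closest_indexed(points: list[list[int]], k: int) -> list[list[int]]:
          # Entries are (-squared_distance, index): ties fall back to comparing ints.
          max_heap: list[tuple[int, int]] = []
          for i, (x, y) in enumerate(points):
              entry = (-(x * x + y * y), i)
              if len(max_heap) < k:
                  heapq.heappush(max_heap, entry)
              else:
                  # Push then pop in a single call; if the new point is at least as far
                  # as the farthest kept point, it is returned at once and the heap is unchanged.
                  heapq.heappushpop(max_heap, entry)
          return [points[i] for _, i in max_heap]
      ```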

  - approach_name: Sort by Distance
    is_optimal: false
    code: |
      def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
          # Sort all points by squared distance from origin
          points.sort(key=lambda p: p[0] * p[0] + p[1] * p[1])

          # Return the first k points
          return points[:k]
    explanation: |
      **Time Complexity:** O(n log n) — Sorting dominates the complexity.

      **Space Complexity:** O(n). With `key=`, Python's `list.sort()` stores one computed key per element, and its Timsort merges may need up to n/2 temporary slots.

      This approach is simpler to implement and may be preferred when k is close to n. However, for small k values relative to n, the heap approach is more efficient. The simplicity makes this a good choice when optimisation isn't critical.
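
      Note that `points.sort(...)` reorders the caller's list in place. If the input must be left untouched, `sorted()` returns a new list instead; this hypothetical variant, `k_closest_sorted`, costs one extra O(n) list.

      ```python
      def k_closest_sorted(points: list[list[int]], k: int) -> list[list[int]]:
          # sorted() builds a new list, so the caller's `points` keeps its original order.
          return sorted(points, key=lambda p: p[0] * p[0] + p[1] * p[1])[:k]


      original = [[3, 3], [5, -1], [-2, 4]]
      print(k_closest_sorted(original, 2))  # [[3, 3], [-2, 4]]
      print(original)                       # [[3, 3], [5, -1], [-2, 4]]
      ```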

  - approach_name: Quickselect
    is_optimal: false
    code: |
      import random

      def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
          def dist(point: list[int]) -> int:
              return point[0] * point[0] + point[1] * point[1]

          def partition(left: int, right: int, pivot_idx: int) -> int:
              pivot_dist = dist(points[pivot_idx])
              # Move pivot to end
              points[pivot_idx], points[right] = points[right], points[pivot_idx]
              store_idx = left

              # Move all closer points to the left
              for i in range(left, right):
                  if dist(points[i]) < pivot_dist:
                      points[store_idx], points[i] = points[i], points[store_idx]
                      store_idx += 1

              # Move pivot to its final position
              points[store_idx], points[right] = points[right], points[store_idx]
              return store_idx

          left, right = 0, len(points) - 1
          while left < right:
              pivot_idx = random.randint(left, right)
              pivot_idx = partition(left, right, pivot_idx)

              if pivot_idx == k:
                  break
              elif pivot_idx < k:
                  left = pivot_idx + 1
              else:
                  right = pivot_idx - 1

          return points[:k]
    explanation: |
      **Time Complexity:** O(n) average, O(n^2) worst case — Quickselect has linear average time.

      **Space Complexity:** O(1) — In-place partitioning.

      Quickselect has the best average-case bound of the three approaches, O(n). It partitions the array around a pivot, similar to quicksort, but then continues only into the side that contains index `k`. Choosing the pivot at random makes the O(n^2) worst case unlikely for any fixed input. However, the heap approach is often preferred in practice due to its guaranteed O(n log k) bound.
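
      One caveat for the two-way partition above: points whose distance equals the pivot's are never moved left of it. In the extreme case where every point has the same distance (for example, all points on the circle `x^2 + y^2 = 25`), each round fixes only the pivot's position, so the loop needs about k rounds of linear work, which is quadratic when k is a constant fraction of n, even with a random pivot. A three-way ("Dutch national flag") partition groups equal distances together and stops as soon as the boundary at index `k` falls inside that group. The sketch below is a swapped-in variant with the hypothetical name `k_closest_3way`.

      ```python
      import random


      def k_closest_3way(points: list[list[int]], k: int) -> list[list[int]]:
          def dist(p: list[int]) -> int:
              return p[0] * p[0] + p[1] * p[1]

          left, right = 0, len(points) - 1
          while left < right:
              pivot_dist = dist(points[random.randint(left, right)])
              # Invariant: [left, lt) closer, [lt, i) equal, (gt, right] farther than the pivot.
              lt, i, gt = left, left, right
              while i <= gt:
                  d = dist(points[i])
                  if d < pivot_dist:
                      points[lt], points[i] = points[i], points[lt]
                      lt += 1
                      i += 1
                  elif d > pivot_dist:
                      points[i], points[gt] = points[gt], points[i]
                      gt -= 1
                  else:
                      i += 1
              if k < lt:
                  right = lt - 1   # boundary lies among the closer points
              elif k > gt + 1:
                  left = gt + 1    # boundary lies among the farther points
              else:
                  break            # boundary falls inside the equal block: done
          return points[:k]


      same_distance = [[3, 4], [4, 3], [5, 0], [0, 5], [-3, 4], [-4, -3]]
      print(len(k_closest_3way(same_distance, 3)))  # 3
      ```

      Since the equal block always contains the pivot, every round shrinks the search range, and an all-equal input finishes after a single partition.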