title: K Closest Points to Origin
slug: k-closest-points-to-origin
difficulty: medium
leetcode_id: 973
leetcode_url: https://leetcode.com/problems/k-closest-points-to-origin/
categories:
  - arrays
  - heap
  - sorting
patterns:
  - heap
function_signature: "def k_closest(points: list[list[int]], k: int) -> list[list[int]]:"
test_cases:
  visible:
    - input: { points: [[1, 3], [-2, 2]], k: 1 }
      expected: [[-2, 2]]
    - input: { points: [[3, 3], [5, -1], [-2, 4]], k: 2 }
      expected: [[3, 3], [-2, 4]]
  hidden:
    - input: { points: [[0, 0]], k: 1 }
      expected: [[0, 0]]
    - input: { points: [[1, 1], [2, 2], [3, 3]], k: 3 }
      expected: [[1, 1], [2, 2], [3, 3]]
    - input: { points: [[1, 0], [0, 1]], k: 2 }
      expected: [[1, 0], [0, 1]]
    - input: { points: [[-5, 4], [3, 2], [1, 1], [0, 2]], k: 2 }
      expected: [[1, 1], [0, 2]]
    - input: { points: [[10, 10], [-10, -10], [1, 1]], k: 1 }
      expected: [[1, 1]]
description: |
  Given an array of `points` where `points[i] = [x_i, y_i]` represents a point on the **X-Y** plane and an integer `k`, return the `k` closest points to the origin `(0, 0)`.

  The distance between two points on the **X-Y** plane is the Euclidean distance (i.e., `sqrt((x1 - x2)^2 + (y1 - y2)^2)`).

  You may return the answer in **any order**. The answer is **guaranteed** to be **unique** (except for the order that it is in).
constraints: |
  - `1 <= k <= points.length <= 10^4`
  - `-10^4 <= x_i, y_i <= 10^4`
examples:
  - input: "points = [[1,3],[-2,2]], k = 1"
    output: "[[-2,2]]"
    explanation: "The distance between (1, 3) and the origin is sqrt(10). The distance between (-2, 2) and the origin is sqrt(8). Since sqrt(8) < sqrt(10), (-2, 2) is closer to the origin. We only want the closest k = 1 points from the origin, so the answer is just [[-2,2]]."
  - input: "points = [[3,3],[5,-1],[-2,4]], k = 2"
    output: "[[3,3],[-2,4]]"
    explanation: "The answer [[-2,4],[3,3]] would also be accepted since any order is valid."
explanation:
  intuition: |
    Imagine you have a map with several pins representing locations, and you're standing at the center (the origin). You need to find the `k` pins closest to you.

    The **core insight** is that we need to efficiently select the k smallest values from a collection: this is the classic **top-k problem**. While sorting all points would work, it's more work than necessary. We don't need full ordering; we just need to identify which k points are closest.

    Think of it like this: imagine you're a bouncer at a club with a capacity of `k` people. As people (points) arrive, you only let them in if there's room or if they're "better" (closer) than someone already inside. You don't need to rank everyone perfectly; you just need to maintain the best k at any moment.

    A **max-heap of size k** is perfect for this. The heap always holds the k closest points seen so far. Once the heap is full, we compare each new point to the *farthest* point in the heap (the max). If the new point is closer, we evict the farthest and add the new one.

    **Key optimisation**: Since we only care about *relative* distances, we can compare squared distances (`x^2 + y^2`) instead of actual Euclidean distances. This avoids square root calculations and keeps the arithmetic in exact integers, without affecting correctness.
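The bouncer analogy can be sketched directly: while there is room, every point is admitted; once the heap is full, a new point gets in only if it is closer than the current farthest point. This is a minimal sketch rather than the reference solution. The function name `k_closest_bouncer` is hypothetical, and it assumes `k >= 1` as the constraints guarantee. It uses the standard library's `heapq.heapreplace`, which pops the root and pushes a new item in one step.

```python
import heapq


def k_closest_bouncer(points: list[list[int]], k: int) -> list[list[int]]:
    # Hypothetical name. Entries are (-squared_distance, point), so heapq's
    # min-heap keeps the farthest admitted point at index 0.
    heap: list[tuple[int, list[int]]] = []
    for x, y in points:
        dist = x * x + y * y  # squared distance; no sqrt needed
        if len(heap) < k:
            # Still room inside: admit unconditionally.
            heapq.heappush(heap, (-dist, [x, y]))
        elif dist < -heap[0][0]:
            # Closer than the farthest admitted point: swap them.
            heapq.heapreplace(heap, (-dist, [x, y]))
    return [point for _, point in heap]


# Sorted only to make the printed order deterministic.
print(sorted(k_closest_bouncer([[3, 3], [5, -1], [-2, 4]], 2)))  # [[-2, 4], [3, 3]]
```

Checking the root before touching the heap means points that are too far away cost a single comparison instead of a push followed by a pop.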
  approach: |
    We solve this using a **Max-Heap of Size K** approach:

    **Step 1: Define a distance function**
    - Create a helper to compute squared Euclidean distance: `x^2 + y^2`
    - We use squared distance to avoid the `sqrt()` operation: for non-negative distances, comparing `d1^2` vs `d2^2` gives the same ordering as `d1` vs `d2`

    **Step 2: Build a max-heap of size k**
    - Iterate through each point in the input
    - Push each point onto a max-heap (in Python, negate the distance for a max-heap using `heapq`)
    - If the heap size exceeds `k`, pop the largest (farthest) point

    **Step 3: Extract results from the heap**
    - After processing all points, the heap contains exactly the k closest points
    - Extract and return these points

    **Why this works**: By maintaining a max-heap of size k, the root is always the *farthest* among our k candidates. When a closer point arrives, it replaces the farthest, ensuring we always have the k closest. This is more efficient than sorting when `k << n`.
  common_pitfalls:
    - title: Computing Actual Euclidean Distance
      description: |
        A common mistake is to compute the actual Euclidean distance using `sqrt(x^2 + y^2)` for every point. While mathematically correct, the `sqrt()` call is unnecessary work. Since we only need to *compare* distances (not their exact values), squared distances work just as well: if `a^2 < b^2` and both `a` and `b` are non-negative, then `a < b`. The time saved is modest at `10^4` points, but squared distances also stay in exact integer arithmetic, so no floating-point rounding can affect the comparison.
      wrong_approach: "Using sqrt(x^2 + y^2) for distance"
      correct_approach: "Using x^2 + y^2 for distance comparison"
    - title: Sorting All Points
      description: |
        The naive approach is to sort all n points by distance and take the first k. This gives O(n log n) time complexity regardless of k. When k is small (e.g., k = 10 with n = 10,000), we're doing far more work than necessary. The heap approach is O(n log k), which is significantly faster when `k << n`.
        For k = 10 and n = 10,000, that's roughly 4x fewer comparisons, since log2(10,000) ≈ 13.3 while log2(10) ≈ 3.3.
      wrong_approach: "Sort all points, take first k"
      correct_approach: "Use a max-heap of size k"
    - title: Using a Min-Heap Instead of Max-Heap
      description: |
        If you use a min-heap and push all n points, you'd need to pop k times at the end. This requires O(n) space for the full heap. A max-heap of size k is more memory-efficient (O(k) space) and naturally evicts the farthest point when a closer one arrives. In Python, since `heapq` is a min-heap by default, negate the distances to simulate a max-heap.
      wrong_approach: "Min-heap with all n points"
      correct_approach: "Max-heap limited to size k"
  key_takeaways:
    - "**Top-k pattern**: When you need the k smallest/largest elements, a heap of size k is often optimal; O(n log k) beats O(n log n) sorting when `k << n`"
    - "**Squared distance optimisation**: Avoid `sqrt()` when comparing distances; squared distances preserve ordering and are faster to compute"
    - "**Max-heap for k smallest**: Use a max-heap to track k smallest values; the root lets you quickly check if a new element belongs"
    - "**Related problems**: This pattern applies to Kth Largest Element, Top K Frequent Elements, and similar selection problems"
  time_complexity: "O(n log k). We iterate through all n points, and each heap operation (push/pop) takes O(log k) time since the heap is capped at size k."
  space_complexity: "O(k). The heap stores at most k points at any time."
solutions:
  - approach_name: Max-Heap
    is_optimal: true
    code: |
      import heapq

      def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
          # Max-heap to store k closest points (negate distance for max-heap)
          max_heap = []

          for x, y in points:
              # Squared distance avoids expensive sqrt()
              dist = x * x + y * y

              # Push negative distance to simulate max-heap
              heapq.heappush(max_heap, (-dist, [x, y]))

              # If heap exceeds size k, remove the farthest point
              if len(max_heap) > k:
                  heapq.heappop(max_heap)

          # Extract the k closest points from the heap
          return [point for _, point in max_heap]
    explanation: |
      **Time Complexity:** O(n log k). We process n points, and each heap operation is O(log k).

      **Space Complexity:** O(k). The heap stores at most k elements once the loop finishes an iteration (briefly k + 1 between the push and the pop).

      By maintaining a max-heap of size k, we efficiently track the k closest points. The negative distance trick converts Python's min-heap into a max-heap, ensuring the farthest point is always at the root for quick comparison and removal.
  - approach_name: Sort by Distance
    is_optimal: false
    code: |
      def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
          # Sort all points in place by squared distance from origin
          # (this reorders the caller's list)
          points.sort(key=lambda p: p[0] * p[0] + p[1] * p[1])

          # Return the first k points
          return points[:k]
    explanation: |
      **Time Complexity:** O(n log n). Sorting dominates the complexity.

      **Space Complexity:** O(n) in the worst case for Python's `list.sort` (Timsort); other sorting algorithms, such as heapsort, need only O(1) extra space.

      This approach is simpler to implement and may be preferred when k is close to n. However, for small k values relative to n, the heap approach is more efficient. The simplicity makes this a good choice when optimisation isn't critical. Note that `points.sort` reorders the input list in place.
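If reordering the caller's list is undesirable, the same idea can be written without `list.sort`. The following is a minimal sketch, not part of the reference solutions. The names `squared_distance`, `k_closest_sorted`, and `k_closest_nsmallest` are hypothetical; `sorted` and `heapq.nsmallest` are standard-library functions that return new lists and leave their input untouched.

```python
import heapq


def squared_distance(point: list[int]) -> int:
    # Hypothetical helper: squared distance from the origin.
    return point[0] * point[0] + point[1] * point[1]


def k_closest_sorted(points: list[list[int]], k: int) -> list[list[int]]:
    # sorted() builds a new list, so `points` keeps its original order.
    return sorted(points, key=squared_distance)[:k]


def k_closest_nsmallest(points: list[list[int]], k: int) -> list[list[int]]:
    # heapq.nsmallest returns the k smallest items, ordered by the key.
    return heapq.nsmallest(k, points, key=squared_distance)


pts = [[3, 3], [5, -1], [-2, 4]]
print(k_closest_sorted(pts, 2))     # [[3, 3], [-2, 4]]
print(k_closest_nsmallest(pts, 2))  # [[3, 3], [-2, 4]]
print(pts)                          # [[3, 3], [5, -1], [-2, 4]]
```

Both variants trade a little extra memory for leaving the input as it was, which can matter when the caller reuses the list afterwards.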
  - approach_name: Quickselect
    is_optimal: false
    code: |
      import random

      def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
          def dist(point: list[int]) -> int:
              return point[0] * point[0] + point[1] * point[1]

          def partition(left: int, right: int, pivot_idx: int) -> int:
              pivot_dist = dist(points[pivot_idx])
              # Move pivot to end
              points[pivot_idx], points[right] = points[right], points[pivot_idx]
              store_idx = left

              # Move all closer points to the left
              for i in range(left, right):
                  if dist(points[i]) < pivot_dist:
                      points[store_idx], points[i] = points[i], points[store_idx]
                      store_idx += 1

              # Move pivot to its final position
              points[store_idx], points[right] = points[right], points[store_idx]
              return store_idx

          left, right = 0, len(points) - 1

          while left < right:
              pivot_idx = random.randint(left, right)
              pivot_idx = partition(left, right, pivot_idx)

              if pivot_idx == k:
                  # Everything before index k is at least as close as the pivot
                  break
              elif pivot_idx < k:
                  left = pivot_idx + 1
              else:
                  right = pivot_idx - 1

          return points[:k]
    explanation: |
      **Time Complexity:** O(n) average, O(n^2) worst case. Quickselect has linear average time.

      **Space Complexity:** O(1). Partitioning is done in place, which also means the input list is reordered.

      Quickselect has the best average-case bound of the three approaches, O(n). It partitions the array around a pivot, similar to quicksort, but only continues into the side containing the k-th element. The randomised pivot selection makes the O(n^2) worst case unlikely. However, the heap approach is often preferred in practice due to its guaranteed O(n log k) bound.
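Because the heap and quickselect approaches return the k points in an arbitrary order, comparing their output against the expected answers needs an order-insensitive check. The following is a minimal sketch, assuming a hypothetical helper `is_valid_answer` that is not part of the original solutions. It relies on the problem's guarantee that the answer is unique apart from ordering.

```python
def is_valid_answer(points: list[list[int]], k: int, answer: list[list[int]]) -> bool:
    # Hypothetical checker: compares `answer` with a sorted baseline,
    # ignoring the order in which the points are returned.
    def dist(p: list[int]) -> int:
        return p[0] * p[0] + p[1] * p[1]

    if len(answer) != k:
        return False
    # sorted() does not mutate `points`, so the check works even after an
    # in-place approach has already reordered the list.
    baseline = sorted(points, key=dist)[:k]
    return sorted(map(tuple, answer)) == sorted(map(tuple, baseline))


pts = [[3, 3], [5, -1], [-2, 4]]
print(is_valid_answer(pts, 2, [[-2, 4], [3, 3]]))  # True
print(is_valid_answer(pts, 2, [[3, 3], [5, -1]]))  # False
```

Converting each point to a tuple before sorting keeps the comparison simple and would also allow the points to be placed in a set if needed.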