218 lines
11 KiB
YAML
title: K Closest Points to Origin
slug: k-closest-points-to-origin
difficulty: medium
leetcode_id: 973
leetcode_url: https://leetcode.com/problems/k-closest-points-to-origin/
categories:
- arrays
- heap
- sorting
patterns:
- heap

function_signature: "def k_closest(points: list[list[int]], k: int) -> list[list[int]]:"

test_cases:
visible:
- input: { points: [[1, 3], [-2, 2]], k: 1 }
expected: [[-2, 2]]
- input: { points: [[3, 3], [5, -1], [-2, 4]], k: 2 }
expected: [[3, 3], [-2, 4]]
hidden:
- input: { points: [[0, 0]], k: 1 }
expected: [[0, 0]]
- input: { points: [[1, 1], [2, 2], [3, 3]], k: 3 }
expected: [[1, 1], [2, 2], [3, 3]]
- input: { points: [[1, 0], [0, 1]], k: 2 }
expected: [[1, 0], [0, 1]]
- input: { points: [[-5, 4], [3, 2], [1, 1], [0, 2]], k: 2 }
expected: [[1, 1], [0, 2]]
- input: { points: [[10, 10], [-10, -10], [1, 1]], k: 1 }
expected: [[1, 1]]

description: |
Given an array of `points` where `points[i] = [x_i, y_i]` represents a point on the **X-Y** plane and an integer `k`, return the `k` closest points to the origin `(0, 0)`.

The distance between two points on the **X-Y** plane is the Euclidean distance (i.e., `sqrt((x1 - x2)^2 + (y1 - y2)^2)`).

You may return the answer in **any order**. The answer is **guaranteed** to be **unique** (except for the order that it is in).

constraints: |
- `1 <= k <= points.length <= 10^4`
- `-10^4 <= x_i, y_i <= 10^4`

examples:
- input: "points = [[1,3],[-2,2]], k = 1"
output: "[[-2,2]]"
explanation: "The distance between (1, 3) and the origin is sqrt(10). The distance between (-2, 2) and the origin is sqrt(8). Since sqrt(8) < sqrt(10), (-2, 2) is closer to the origin. We only want the closest k = 1 points from the origin, so the answer is just [[-2,2]]."
- input: "points = [[3,3],[5,-1],[-2,4]], k = 2"
output: "[[3,3],[-2,4]]"
explanation: "The answer [[-2,4],[3,3]] would also be accepted since any order is valid."

explanation:
intuition: |
Imagine you have a map with several pins representing locations, and you're standing at the center (the origin). You need to find the `k` pins closest to you.

The **core insight** is that we need to efficiently select the k smallest values from a collection: this is the classic **top-k problem**. While sorting all points would work, it's more work than necessary. We don't need full ordering; we just need to identify which k points are closest.

Think of it like this: imagine you're a bouncer at a club with a capacity of `k` people. As people (points) arrive, you only let them in if there's room or if they're "better" (closer) than someone already inside. You don't need to rank everyone perfectly; you just need to maintain the best k at any moment.

A **max-heap of size k** is perfect for this. The heap always holds the k closest points seen so far. When we encounter a new point, we compare it to the *farthest* point in our heap (the max). If the new point is closer, we evict the farthest and add the new one.

**Key optimization**: Since we only care about *relative* distances, we can compare squared distances (`x^2 + y^2`) instead of actual Euclidean distances. This avoids expensive square root calculations without affecting correctness.

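To see that squared distances rank points the same way as true distances, here is a minimal sketch. The helper name `squared_distance` is chosen for illustration only, and the sample points are taken from the second example.

```python
import math


def squared_distance(point: list[int]) -> int:
    """Squared Euclidean distance from the origin (exact integer arithmetic)."""
    x, y = point
    return x * x + y * y


points = [[3, 3], [5, -1], [-2, 4]]

# Rank once by squared distance (integers) and once by true distance (floats).
by_squared = sorted(points, key=squared_distance)
by_euclidean = sorted(points, key=lambda p: math.hypot(p[0], p[1]))

print(by_squared)                  # [[3, 3], [-2, 4], [5, -1]]
print(by_squared == by_euclidean)  # True
```

The squared distances here are 18, 26 and 20, so both rankings put `[3, 3]` first and `[5, -1]` last.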
approach: |
We solve this using a **Max-Heap of Size K** approach:

**Step 1: Define a distance function**

- Create a helper to compute squared Euclidean distance: `x^2 + y^2`
- We use squared distance to avoid the expensive `sqrt()` operation: for non-negative distances, comparing `d1^2` vs `d2^2` gives the same ordering as `d1` vs `d2`

**Step 2: Build a max-heap of size k**

- Iterate through each point in the input
- Push each point onto a max-heap keyed by distance (Python's `heapq` is a min-heap, so push the negated distance)
- If the heap size exceeds `k`, pop the largest (farthest) point

**Step 3: Extract results from the heap**

- After processing all points, the heap contains exactly the k closest points
- Extract and return these points

**Why this works**: By maintaining a max-heap of size k, the root is always the *farthest* among our k candidates. When a closer point arrives, it replaces the farthest, ensuring we always have the k closest. This is more efficient than sorting when `k << n`.

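The eviction behaviour described above can be traced on the second example. This is an illustrative sketch rather than the reference solution: `trace_k_closest` is a hypothetical helper, and once the heap is full it uses `heapq.heappushpop` instead of a separate push and pop.

```python
import heapq


def trace_k_closest(points: list[list[int]], k: int) -> list[list[list[int]]]:
    """Return the kept points (sorted for readability) after each input point."""
    heap: list[tuple[int, list[int]]] = []  # entries are (-squared_distance, point)
    snapshots = []
    for x, y in points:
        entry = (-(x * x + y * y), [x, y])
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            # Pushes the entry, then pops the smallest key, i.e. the farthest point.
            # If the new point is itself the farthest, it is the one discarded.
            heapq.heappushpop(heap, entry)
        snapshots.append(sorted(point for _, point in heap))
    return snapshots


for step in trace_k_closest([[3, 3], [5, -1], [-2, 4]], k=2):
    print(step)
# [[3, 3]]
# [[3, 3], [5, -1]]
# [[-2, 4], [3, 3]]
```

On the third step, `[-2, 4]` (squared distance 20) displaces `[5, -1]` (squared distance 26), which was the root of the max-heap.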
common_pitfalls:
- title: Computing Actual Euclidean Distance
description: |
A common mistake is to compute the actual Euclidean distance using `sqrt(x^2 + y^2)` for every point.

While mathematically correct, the `sqrt()` call is unnecessary work. Since we only need to *compare* distances (not their exact values), squared distances work just as well: if `a` and `b` are non-negative and `a^2 < b^2`, then `a < b`.

The saving per point is small, but skipping `sqrt()` also keeps every comparison in exact integer arithmetic: under the constraints, a squared distance is at most `2 * 10^8`.
wrong_approach: "Using sqrt(x^2 + y^2) for distance"
correct_approach: "Using x^2 + y^2 for distance comparison"

- title: Sorting All Points
description: |
The naive approach is to sort all n points by distance and take the first k.

This gives O(n log n) time complexity regardless of k. When k is small (e.g., k = 10 with n = 10,000), we're doing far more work than necessary.

The heap approach is O(n log k), which is significantly faster when `k << n`. For k = 10 and n = 10,000, that's roughly 4x fewer comparisons (`log2(10,000) ≈ 13.3` versus `log2(10) ≈ 3.3`), ignoring constant factors.

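A back-of-the-envelope estimate for k = 10 and n = 10,000 can be computed directly. This sketch only evaluates the asymptotic formulas `n * log2(n)` and `n * log2(k)`; the function name `comparison_estimates` is hypothetical, and real running times also depend on constant factors that the formulas ignore.

```python
import math


def comparison_estimates(n: int, k: int) -> tuple[float, float]:
    """Rough comparison counts: n*log2(n) for a full sort, n*log2(k) for a size-k heap."""
    return n * math.log2(n), n * math.log2(k)


sort_cost, heap_cost = comparison_estimates(n=10_000, k=10)
print(round(sort_cost))                  # 132877
print(round(heap_cost))                  # 33219
print(round(sort_cost / heap_cost, 2))   # 4.0
```

Because 10,000 is 10 to the fourth power, the ratio of the two logarithms is exactly 4.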
wrong_approach: "Sort all points, take first k"
correct_approach: "Use a max-heap of size k"

- title: Using a Min-Heap Instead of Max-Heap
description: |
If you use a min-heap and push all n points, you'd need to pop k times at the end. This requires O(n) space for the full heap, and pushing the points one at a time costs up to O(n log n), which is no better than sorting.

A max-heap of size k is more memory-efficient (O(k) space) and naturally evicts the farthest point when a closer one arrives. In Python, since `heapq` is a min-heap by default, negate the distances to simulate a max-heap.

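For contrast, here is a sketch of the min-heap variant this pitfall describes. It returns a correct answer, but the heap holds all n entries. The name `k_closest_min_heap` is hypothetical, and the sketch builds the heap with `heapq.heapify` rather than n separate pushes, which is a deliberate substitution.

```python
import heapq


def k_closest_min_heap(points: list[list[int]], k: int) -> list[list[int]]:
    """Min-heap over all n points: correct, but uses O(n) extra space."""
    heap = [(x * x + y * y, [x, y]) for x, y in points]
    heapq.heapify(heap)  # builds the heap in O(n) time
    # Pop the k smallest squared distances, O(log n) per pop.
    return [heapq.heappop(heap)[1] for _ in range(k)]


print(k_closest_min_heap([[-5, 4], [3, 2], [1, 1], [0, 2]], k=2))
# [[1, 1], [0, 2]]
```

The size-k max-heap reaches the same answer while never storing more than k + 1 entries at once.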
wrong_approach: "Min-heap with all n points"
correct_approach: "Max-heap limited to size k"

key_takeaways:
- "**Top-k pattern**: When you need the k smallest/largest elements, a heap of size k is a strong default: O(n log k) beats O(n log n) sorting when `k << n`"
- "**Squared distance optimisation**: Avoid `sqrt()` when comparing distances; squared distances preserve ordering and are faster to compute"
- "**Max-heap for k smallest**: Use a max-heap to track k smallest values; the root lets you quickly check if a new element belongs"
- "**Related problems**: This pattern applies to Kth Largest Element, Top K Frequent Elements, and similar selection problems"

time_complexity: "O(n log k). We iterate through all n points, and each heap operation (push/pop) takes O(log k) time since the heap is capped at size k."
space_complexity: "O(k). The heap stores at most k points at any time."

solutions:
- approach_name: Max-Heap
is_optimal: true
code: |
import heapq

def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
    # Max-heap to store k closest points (negate distance for max-heap)
    max_heap = []

    for x, y in points:
        # Squared distance avoids expensive sqrt()
        dist = x * x + y * y
        # Push negative distance to simulate max-heap
        heapq.heappush(max_heap, (-dist, [x, y]))

        # If heap exceeds size k, remove the farthest point
        if len(max_heap) > k:
            heapq.heappop(max_heap)

    # Extract the k closest points from the heap
    return [point for _, point in max_heap]
explanation: |
**Time Complexity:** O(n log k). We process n points, and each heap operation is O(log k).

**Space Complexity:** O(k). The heap stores at most k elements.

By maintaining a max-heap of size k, we efficiently track the k closest points. The negative distance trick converts Python's min-heap into a max-heap, ensuring the farthest point is always at the root for quick comparison and removal.

- approach_name: Sort by Distance
is_optimal: false
code: |
def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
    # Sort all points by squared distance from origin
    # (in place, so the caller's list is reordered)
    points.sort(key=lambda p: p[0] * p[0] + p[1] * p[1])

    # Return the first k points
    return points[:k]
explanation: |
**Time Complexity:** O(n log n). Sorting dominates the complexity.

**Space Complexity:** O(n) in CPython. `list.sort` is Timsort, which may allocate temporary storage proportional to n, and `key=` precomputes one key per element.

This approach is simpler to implement and may be preferred when k is close to n. However, for small k values relative to n, the heap approach is more efficient. The simplicity makes this a good choice when optimisation isn't critical.

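If reordering the caller's list is undesirable, a non-mutating variant is straightforward. The sketch below shows two options, both under hypothetical function names: `sorted()`, which builds a new list, and the standard library's `heapq.nsmallest`, which accepts the same kind of `key` function.

```python
import heapq


def squared_distance(p: list[int]) -> int:
    return p[0] * p[0] + p[1] * p[1]


def k_closest_sorted(points: list[list[int]], k: int) -> list[list[int]]:
    # sorted() returns a new list, so the input keeps its original order.
    return sorted(points, key=squared_distance)[:k]


def k_closest_nsmallest(points: list[list[int]], k: int) -> list[list[int]]:
    # nsmallest returns the k smallest items by key, in ascending order.
    return heapq.nsmallest(k, points, key=squared_distance)


original = [[3, 3], [5, -1], [-2, 4]]
print(k_closest_sorted(original, k=2))     # [[3, 3], [-2, 4]]
print(k_closest_nsmallest(original, k=2))  # [[3, 3], [-2, 4]]
print(original)                            # [[3, 3], [5, -1], [-2, 4]]
```

Both functions return references to the same inner lists as the input, so copy the points if the caller may modify them later.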
- approach_name: Quickselect
is_optimal: false
code: |
import random

def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
    def dist(point: list[int]) -> int:
        return point[0] * point[0] + point[1] * point[1]

    def partition(left: int, right: int, pivot_idx: int) -> int:
        pivot_dist = dist(points[pivot_idx])
        # Move pivot to end
        points[pivot_idx], points[right] = points[right], points[pivot_idx]
        store_idx = left

        # Move all closer points to the left
        for i in range(left, right):
            if dist(points[i]) < pivot_dist:
                points[store_idx], points[i] = points[i], points[store_idx]
                store_idx += 1

        # Move pivot to its final position
        points[store_idx], points[right] = points[right], points[store_idx]
        return store_idx

    left, right = 0, len(points) - 1
    while left < right:
        pivot_idx = random.randint(left, right)
        pivot_idx = partition(left, right, pivot_idx)

        if pivot_idx == k:
            break
        elif pivot_idx < k:
            left = pivot_idx + 1
        else:
            right = pivot_idx - 1

    return points[:k]
explanation: |
**Time Complexity:** O(n) average, O(n^2) worst case. Quickselect has linear average time.

**Space Complexity:** O(1). Partitioning is done in place, which also means `points` is reordered.

Quickselect has the best average-case bound of the three approaches, O(n). It partitions the array around a pivot, similar to quicksort, but only continues into the partition containing the k-th element. The randomised pivot selection makes the O(n^2) worst case unlikely. However, the heap approach is often preferred in practice due to its guaranteed O(n log k) bound.