title: Min Cost to Connect All Points
slug: min-cost-to-connect-all-points
difficulty: medium
leetcode_id: 1584
leetcode_url: https://leetcode.com/problems/min-cost-to-connect-all-points/
categories:
  - graphs
  - heap
patterns:
  - slug: heap
    is_optimal: true
  - slug: union-find
    is_optimal: false
function_signature: "def min_cost_connect_points(points: list[list[int]]) -> int:"
test_cases:
  visible:
    - input: { points: [[0, 0], [2, 2], [3, 10], [5, 2], [7, 0]] }
      expected: 20
    - input: { points: [[3, 12], [-2, 5], [-4, 1]] }
      expected: 18
  hidden:
    - input: { points: [[0, 0]] }
      expected: 0
    - input: { points: [[0, 0], [1, 1]] }
      expected: 2
    - input: { points: [[0, 0], [1, 0], [2, 0]] }
      expected: 2
    - input: { points: [[-1000000, -1000000], [1000000, 1000000]] }
      expected: 4000000
    - input: { points: [[0, 0], [0, 1], [1, 0], [1, 1]] }
      expected: 3
description: |
You are given an array `points` representing integer coordinates of some points on a 2D-plane, where `points[i] = [x_i, y_i]`.
The cost of connecting two points `[x_i, y_i]` and `[x_j, y_j]` is the **manhattan distance** between them: `|x_i - x_j| + |y_i - y_j|`, where `|val|` denotes the absolute value of `val`.
Return *the minimum cost to make all points connected*. All points are connected if there is **exactly one** simple path between any two points.
constraints: |
- `1 <= points.length <= 1000`
- `-10^6 <= x_i, y_i <= 10^6`
- All pairs `(x_i, y_i)` are distinct
examples:
  - input: "points = [[0,0],[2,2],[3,10],[5,2],[7,0]]"
    output: "20"
    explanation: "We can connect points with edges of total cost 20. Notice that there is a unique path between every pair of points."
  - input: "points = [[3,12],[-2,5],[-4,1]]"
    output: "18"
    explanation: "Connect the three points with edges to form a tree of minimum total cost."
explanation:
intuition: |
This problem is asking us to connect all points with the **minimum total edge cost** such that any point can reach any other point. This is exactly the definition of a **Minimum Spanning Tree (MST)**.
Think of it like this: imagine you're a city planner laying down cables between houses. Each house is a point, and the cost of laying cable between two houses is the manhattan distance. You want to connect all houses while spending as little as possible on cable.
The key insight is that to connect `n` points, we need exactly `n - 1` edges (any more would create a cycle, any fewer would leave points disconnected). Among all possible ways to choose `n - 1` edges that connect everything, we want the one with minimum total weight.
Two classic algorithms solve this:
- **Prim's Algorithm**: Start from one point and greedily add the cheapest edge that connects a new point to our growing tree
- **Kruskal's Algorithm**: Sort all edges by cost and greedily add them if they don't create a cycle
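Since the graph is **dense** (every point can connect to every other point, giving us `n(n-1)/2` edges), Prim's algorithm is a natural fit: it can compute edge weights on the fly instead of building and sorting the full edge list, as Kruskal's must. With a min-heap, both algorithms run in O(n^2 log n) time here; an array-based Prim's brings this down to O(n^2).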
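To make the edge count concrete, here is a minimal sketch that enumerates every edge of the complete graph for the first example. The helper names `manhattan` and `all_edges` are illustrative assumptions, not part of the required solution.

```python
from itertools import combinations


def manhattan(p: list[int], q: list[int]) -> int:
    """Manhattan distance between two 2D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def all_edges(points: list[list[int]]) -> list[tuple[int, int, int]]:
    """Return every edge of the complete graph as (cost, i, j) with i < j."""
    return [
        (manhattan(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    ]


points = [[0, 0], [2, 2], [3, 10], [5, 2], [7, 0]]
edges = all_edges(points)
n = len(points)
# A complete graph on n points has n(n-1)/2 edges.
print(len(edges), n * (n - 1) // 2)  # prints: 10 10
```

An MST keeps only `n - 1 = 4` of these 10 edges.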
approach: |
We'll use **Prim's Algorithm** with a min-heap to build the MST efficiently.
**Step 1: Initialise data structures**
- `total_cost`: Set to `0` to accumulate the MST weight
- `visited`: A set to track which points are already in our MST
- `min_heap`: Priority queue storing `(cost, point_index)` tuples, initialised with `(0, 0)` to start from point 0
&nbsp;
**Step 2: Build the MST greedily**
- While we haven't connected all `n` points:
- Pop the minimum cost edge from the heap
- If this point is already visited, skip it (a cheaper edge already connected it to the tree)
- Otherwise, add this point to the MST: mark as visited, add the edge cost to `total_cost`
- For each unvisited point, calculate the manhattan distance and push `(distance, point_index)` to the heap
&nbsp;
**Step 3: Return the result**
- Return `total_cost` once all `n` points are connected
&nbsp;
The min-heap ensures we always process the cheapest available edge first, guaranteeing we build an optimal MST.
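The steps above can be sketched in a variant that also records which edges end up in the tree. This is an illustration only: the function name `prim_mst_edges` and the extra `parent` field in each heap entry are assumptions added here, not part of the reference solution.

```python
import heapq


def prim_mst_edges(points: list[list[int]]) -> list[tuple[int, int, int]]:
    """Return the MST as a list of (parent, child, cost) edges."""
    n = len(points)
    visited = set()
    # Heap entries: (cost, point_index, parent_index); -1 means "no parent".
    min_heap = [(0, 0, -1)]
    mst_edges = []
    while len(visited) < n:
        cost, curr, parent = heapq.heappop(min_heap)
        if curr in visited:
            continue  # a cheaper edge already connected this point
        visited.add(curr)
        if parent != -1:
            mst_edges.append((parent, curr, cost))
        # Push an edge from the new tree point to every unvisited point.
        for nxt in range(n):
            if nxt not in visited:
                dist = (abs(points[curr][0] - points[nxt][0]) +
                        abs(points[curr][1] - points[nxt][1]))
                heapq.heappush(min_heap, (dist, nxt, curr))
    return mst_edges


edges = prim_mst_edges([[0, 0], [2, 2], [3, 10], [5, 2], [7, 0]])
print(len(edges), sum(cost for _, _, cost in edges))  # prints: 4 20
```

The starting entry `(0, 0, -1)` is not a real edge, which is why the tree has `n - 1` edges even though `n` heap entries are accepted.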
common_pitfalls:
- title: Using Adjacency List for Dense Graph
description: |
A common instinct is to precompute all edges and store them in an adjacency list. With `n` points, this creates `n(n-1)/2` edges, using O(n^2) space.
For this problem with `n <= 1000`, that's about 500,000 edges, which is manageable. However, Prim's algorithm can compute edge weights on the fly, so there is no need to build the full edge list up front, and the time complexity stays the same. Note that a lazy heap can still accumulate many entries; the array-based variant of Prim's keeps memory at O(n).
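As a sketch of on-demand computation, the array-based form of Prim's algorithm keeps only the cheapest known edge to each point and never stores the edge list. The function name `min_cost_array_prim` and the variable names are assumptions chosen for this illustration.

```python
def min_cost_array_prim(points: list[list[int]]) -> int:
    """Prim's algorithm with a linear scan instead of a heap: O(n^2) time, O(n) space."""
    n = len(points)
    if n == 0:
        return 0
    # min_dist[i] is the cheapest known edge from the tree to point i.
    min_dist = [float("inf")] * n
    min_dist[0] = 0
    in_tree = [False] * n
    total = 0
    for _ in range(n):
        # Select the closest point that is not yet in the tree.
        curr = -1
        for i in range(n):
            if not in_tree[i] and (curr == -1 or min_dist[i] < min_dist[curr]):
                curr = i
        in_tree[curr] = True
        total += min_dist[curr]
        # Compute distances from the new point on demand and relax.
        for nxt in range(n):
            if not in_tree[nxt]:
                dist = (abs(points[curr][0] - points[nxt][0]) +
                        abs(points[curr][1] - points[nxt][1]))
                if dist < min_dist[nxt]:
                    min_dist[nxt] = dist
    return total


print(min_cost_array_prim([[0, 0], [2, 2], [3, 10], [5, 2], [7, 0]]))  # prints: 20
```

Each of the `n` rounds does two linear passes, so the edge weights are computed once per (tree point, outside point) pair and then discarded.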
wrong_approach: "Precompute and store all O(n^2) edges"
correct_approach: "Compute manhattan distance on-demand during Prim's traversal"
- title: Forgetting to Check Visited Before Processing
description: |
When popping from the heap, the same point might appear multiple times with different costs (we pushed it once for each neighbor that discovered it). Always check if a point is already in the MST before processing.
Processing a visited point would add duplicate edges and inflate the total cost.
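The effect can be reproduced with a small experiment. The `skip_visited` flag below is a hypothetical switch added only for this demonstration; it turns the visited check on or off.

```python
import heapq


def prim_total(points: list[list[int]], skip_visited: bool = True) -> int:
    """Heap-based Prim's; set skip_visited=False to reproduce the bug."""
    n = len(points)
    visited = set()
    min_heap = [(0, 0)]
    total = 0
    while len(visited) < n:
        cost, curr = heapq.heappop(min_heap)
        if skip_visited and curr in visited:
            continue  # stale entry: this point is already in the MST
        visited.add(curr)
        total += cost
        for nxt in range(n):
            if nxt not in visited:
                dist = (abs(points[curr][0] - points[nxt][0]) +
                        abs(points[curr][1] - points[nxt][1]))
                heapq.heappush(min_heap, (dist, nxt))
    return total


pts = [[0, 0], [1, 0], [2, 0], [10, 0]]
print(prim_total(pts))                      # prints: 10
print(prim_total(pts, skip_visited=False))  # prints: 12
```

Without the check, the stale entry `(2, 2)` pushed from point 0 is popped after point 2 has already joined the tree via the cost-1 edge from point 1, so its cost is counted a second time.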
wrong_approach: "Process every heap entry without checking visited"
correct_approach: "Skip heap entries for already-visited points"
- title: Off-by-One in Edge Count
description: |
An MST connecting `n` nodes has exactly `n - 1` edges. Some implementations track edge count to know when to stop. If you're counting edges, ensure you stop at `n - 1`, not `n`.
Using a visited set with size check `len(visited) == n` avoids this issue entirely.
wrong_approach: "Stop when edge_count == n"
correct_approach: "Stop when len(visited) == n or edge_count == n - 1"
key_takeaways:
- "**Minimum Spanning Tree**: When connecting nodes with minimum cost and no cycles, think MST algorithms (Prim's or Kruskal's)"
- "**Dense vs Sparse graphs**: Prim's with a binary heap is O(E log V), which suits sparse graphs; for dense graphs where E approaches V^2, the array-based O(V^2) version is asymptotically faster"
- "**On-demand computation**: For fully connected graphs, compute edge weights as needed rather than storing them all"
- "**Heap for greedy selection**: Min-heaps efficiently find the next best edge in O(log n) time"
time_complexity: "O(n^2 log n). We potentially push O(n^2) edges to the heap, and each heap operation is O(log n). Alternatively, O(n^2) using Prim's with an array instead of a heap."
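space_complexity: "O(n^2) in the worst case for the lazy heap implementation. The visited set is O(n), but every newly added point pushes one entry per unvisited point, and stale entries stay in the heap until they are popped. Prim's with an array instead of a heap needs only O(n)."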
solutions:
- approach_name: Prim's Algorithm with Min-Heap
is_optimal: true
code: |
import heapq


def min_cost_connect_points(points: list[list[int]]) -> int:
    n = len(points)
    if n <= 1:
        return 0
    # Track which points are in our MST
    visited = set()
    # Min-heap: (cost, point_index)
    # Start from point 0 with cost 0
    min_heap = [(0, 0)]
    total_cost = 0
    while len(visited) < n:
        # Get the cheapest edge to an unvisited point
        cost, curr = heapq.heappop(min_heap)
        # Skip if already in MST (found a cheaper path earlier)
        if curr in visited:
            continue
        # Add this point to MST
        visited.add(curr)
        total_cost += cost
        # Explore edges to all unvisited points
        for next_point in range(n):
            if next_point not in visited:
                # Calculate manhattan distance
                dist = (abs(points[curr][0] - points[next_point][0]) +
                        abs(points[curr][1] - points[next_point][1]))
                heapq.heappush(min_heap, (dist, next_point))
    return total_cost
explanation: |
**Time Complexity:** O(n^2 log n). We push up to O(n^2) edges to the heap, and each heap operation is O(log n).
**Space Complexity:** O(n^2) in the worst case. The visited set is O(n), but the heap can hold several entries for the same point, because stale entries are only removed when they are popped.
This is the standard Prim's algorithm implementation. We greedily select the minimum cost edge that adds a new point to our growing MST, continuing until all points are connected.
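To see how large the heap gets in this lazy variant, one can instrument the loop and record its peak size. This is a sketch for experimentation; the function name `prim_with_peak_heap` and the collinear test input are assumptions made for the illustration.

```python
import heapq


def prim_with_peak_heap(points: list[list[int]]) -> tuple[int, int]:
    """Run lazy heap-based Prim's and return (total_cost, peak_heap_size)."""
    n = len(points)
    visited = set()
    min_heap = [(0, 0)]
    total_cost = 0
    peak = len(min_heap)
    while len(visited) < n:
        cost, curr = heapq.heappop(min_heap)
        if curr in visited:
            continue
        visited.add(curr)
        total_cost += cost
        for nxt in range(n):
            if nxt not in visited:
                dist = (abs(points[curr][0] - points[nxt][0]) +
                        abs(points[curr][1] - points[nxt][1]))
                heapq.heappush(min_heap, (dist, nxt))
        # Record the heap size after each batch of pushes.
        peak = max(peak, len(min_heap))
    return total_cost, peak


n = 50
line = [[i, 0] for i in range(n)]  # n collinear points, one unit apart
total, peak = prim_with_peak_heap(line)
print(total, peak > n)  # prints: 49 True
```

On this input the peak is well above `n`: each new point pushes an entry for every remaining point, while only one entry is popped per step.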
- approach_name: Kruskal's Algorithm with Union-Find
is_optimal: false
code: |
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path compression
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        # Union by rank, returns True if merged
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True


def min_cost_connect_points(points: list[list[int]]) -> int:
    n = len(points)
    if n <= 1:
        return 0
    # Generate all edges: (cost, point_i, point_j)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = (abs(points[i][0] - points[j][0]) +
                    abs(points[i][1] - points[j][1]))
            edges.append((dist, i, j))
    # Sort edges by cost
    edges.sort()
    # Build MST using Union-Find
    uf = UnionFind(n)
    total_cost = 0
    edges_used = 0
    for cost, u, v in edges:
        # Only add edge if it connects two components
        if uf.union(u, v):
            total_cost += cost
            edges_used += 1
            # MST complete when we have n-1 edges
            if edges_used == n - 1:
                break
    return total_cost
explanation: |
**Time Complexity:** O(n^2 log n) — Generating edges is O(n^2), sorting is O(n^2 log n), and Union-Find operations are nearly O(1) amortized.
**Space Complexity:** O(n^2) — We store all n(n-1)/2 edges before sorting.
Kruskal's algorithm sorts all edges and greedily adds them if they don't create a cycle. Union-Find efficiently detects cycles. This approach uses more memory due to storing all edges but is conceptually simpler.
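How Union-Find detects a cycle can be shown in isolation. The sketch below is a simplified, function-based version (path halving, no union by rank) written for illustration; it is not the `UnionFind` class used in the solution.

```python
def find(parent: list[int], x: int) -> int:
    """Return the root of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent: list[int], x: int, y: int) -> bool:
    """Merge the components of x and y; return False if already connected."""
    rx, ry = find(parent, x), find(parent, y)
    if rx == ry:
        return False
    parent[ry] = rx
    return True


parent = list(range(3))
print(union(parent, 0, 1))  # True: joins two separate components
print(union(parent, 1, 2))  # True: joins point 2 to the component {0, 1}
print(union(parent, 0, 2))  # False: 0 and 2 are already connected
```

A `False` result means the edge would close a cycle, which is exactly when Kruskal's algorithm skips it.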