name: Union Find
slug: union-find
difficulty_level: 3
pattern_type: data_structure
display_order: 18

description: >
  Track disjoint sets with efficient union and find operations. Union-Find
  (also called Disjoint Set Union) excels at dynamically grouping elements
  and answering "are these two elements in the same group?" queries.

when_to_use: |
  - Finding connected components
  - Detecting cycles in undirected graphs
  - Kruskal's minimum spanning tree
  - Dynamic connectivity queries
  - Grouping related items (accounts merge, friend circles)

metaphor: |
  Imagine a social network where you want to know if two people are connected
  (directly or through friends of friends). Instead of searching the entire
  network each time, everyone in a connected group points to a group leader.
  To check if two people are connected, just check if they have the same
  leader. When groups merge (someone bridges two groups), you just update one
  leader to point to the other.

  Another analogy: corporate acquisitions. Each company has a parent company
  (possibly itself). When companies merge, one becomes a subsidiary of the
  other. To find the ultimate parent, you follow the chain of ownership.

core_concept: |
  Union-Find maintains a forest of trees where each tree represents a set.
  Each element points to its parent, and the root of the tree is the set's
  representative.

  **Two key operations:**
  - **Find(x)**: Return the root (representative) of x's set
  - **Union(x, y)**: Merge the sets containing x and y

  **Two key optimizations:**
  - **Path compression**: During Find, make each node on the path point
    directly to the root. This flattens the tree for future queries.
  - **Union by rank/size**: Always attach the smaller tree under the larger.
    This keeps trees shallow.

  With both optimizations, operations run in nearly O(1) amortized time;
  technically O(α(n)), where α is the inverse Ackermann function, which is
  ≤ 4 for any practical input size.
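
  As a complement to the rank-based template, the sketch below illustrates the
  "size" half of "union by rank/size" together with an iterative, two-pass
  Find. The class name `SizedUnionFind` and the `size` attribute are
  illustrative assumptions, not part of the original template; treat this as
  one possible formulation rather than a reference implementation.

  ```python
  class SizedUnionFind:
      """Sketch: union by size with an iterative, path-compressing find.

      The class name and the `size` list are hypothetical names chosen for
      this example.
      """

      def __init__(self, n: int):
          self.parent = list(range(n))
          self.size = [1] * n  # size[r] is meaningful only while r is a root

      def find(self, x: int) -> int:
          # Pass 1: walk up to the root.
          root = x
          while self.parent[root] != root:
              root = self.parent[root]
          # Pass 2: re-point every node on the path directly at the root.
          while self.parent[x] != root:
              next_x = self.parent[x]
              self.parent[x] = root
              x = next_x
          return root

      def union(self, x: int, y: int) -> bool:
          root_x, root_y = self.find(x), self.find(y)
          if root_x == root_y:
              return False  # Already in the same set
          # Attach the smaller tree under the larger one.
          if self.size[root_x] < self.size[root_y]:
              root_x, root_y = root_y, root_x
          self.parent[root_y] = root_x
          self.size[root_x] += self.size[root_y]
          return True
  ```

  The iterative Find avoids Python's recursion limit on long parent chains,
  and tracking sizes makes "how large is this group?" a constant-time lookup
  at the root.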
visualization: |
  The unions below use naive linking (the second root is attached under the
  first, with no union by rank) so that a deep chain forms and the effect of
  path compression is visible in the final step.

  **Initial state (each element is its own set):**
  ```
  parent: [0, 1, 2, 3, 4]   (each points to itself)
  Sets: {0}, {1}, {2}, {3}, {4}
  ```

  **Union(0, 1):**
  ```
  parent: [0, 0, 2, 3, 4]   (1 now points to 0)

    0
    |
    1

  Sets: {0, 1}, {2}, {3}, {4}
  ```

  **Union(3, 4) then Union(2, 3):**
  ```
  parent: [0, 0, 2, 2, 3]

    0     2
    |     |
    1     3
          |
          4

  Sets: {0, 1}, {2, 3, 4}
  ```

  **Union(1, 4), which merges the two trees:**
  ```
  Find(1) = 0, Find(4) = 2
  Naive linking: root 2 is attached under root 0
  (union by rank would instead attach the shallower tree, root 0, under root 2)
  parent: [0, 0, 0, 2, 3]

      0
     / \
    1   2
        |
        3
        |
        4

  Sets: {0, 1, 2, 3, 4}
  ```

  **Path compression during Find(4):**
  ```
  Find(4): 4 → 3 → 2 → 0 (found root)
  Compress: make 4 and 3 point directly to 0 (2 already does)
  parent: [0, 0, 0, 0, 0]

        0
     / | | \
    1  2 3  4

  Now Find(4) is O(1)!
  ```

code_template: |
  from collections import defaultdict


  class UnionFind:
      """Union-Find with path compression and union by rank."""

      def __init__(self, n: int):
          self.parent = list(range(n))
          self.rank = [0] * n
          self.count = n  # Number of disjoint sets

      def find(self, x: int) -> int:
          """Find root with path compression."""
          if self.parent[x] != x:
              self.parent[x] = self.find(self.parent[x])
          return self.parent[x]

      def union(self, x: int, y: int) -> bool:
          """Union by rank. Returns True if x and y were in different sets."""
          root_x, root_y = self.find(x), self.find(y)
          if root_x == root_y:
              return False  # Already in same set

          # Union by rank: make root_x the root with the higher rank
          if self.rank[root_x] < self.rank[root_y]:
              root_x, root_y = root_y, root_x
          self.parent[root_y] = root_x
          if self.rank[root_x] == self.rank[root_y]:
              self.rank[root_x] += 1

          self.count -= 1
          return True

      def connected(self, x: int, y: int) -> bool:
          """Check if x and y are in the same set."""
          return self.find(x) == self.find(y)


  def count_components(n: int, edges: list[list[int]]) -> int:
      """Count connected components in undirected graph."""
      uf = UnionFind(n)
      for u, v in edges:
          uf.union(u, v)
      return uf.count


  def has_cycle(n: int, edges: list[list[int]]) -> bool:
      """Detect cycle in undirected graph."""
      uf = UnionFind(n)
      for u, v in edges:
          # union returns False when u and v are already connected,
          # which means this edge would close a cycle
          if not uf.union(u, v):
              return True
      return False


  def kruskal_mst(n: int, edges: list[tuple[int, int, int]]) -> int:
      """Kruskal's MST algorithm using Union-Find.

      Returns the total MST weight, or -1 if the graph is not connected.
      """
      # edges are (weight, u, v); sorted() orders by weight first
      # and leaves the caller's list unmodified
      uf = UnionFind(n)
      mst_weight = 0
      edges_used = 0

      for weight, u, v in sorted(edges):
          if uf.union(u, v):
              mst_weight += weight
              edges_used += 1
              if edges_used == n - 1:
                  break

      return mst_weight if edges_used == n - 1 else -1


  def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
      """Merge accounts with common emails."""
      email_to_id = {}
      email_to_name = {}
      uf = UnionFind(len(accounts))

      # Map emails to account indices; a repeated email links two accounts
      for i, account in enumerate(accounts):
          name = account[0]
          for email in account[1:]:
              email_to_name[email] = name
              if email in email_to_id:
                  uf.union(i, email_to_id[email])
              else:
                  email_to_id[email] = i

      # Group emails by root account
      root_to_emails = defaultdict(set)
      for email, idx in email_to_id.items():
          root = uf.find(idx)
          root_to_emails[root].add(email)

      # Build result: account name followed by its sorted emails
      return [[email_to_name[next(iter(emails))]] + sorted(emails)
              for emails in root_to_emails.values()]

recognition_signals:
  - "connected components"
  - "disjoint sets"
  - "union"
  - "groups"
  - "merge accounts"
  - "friend circles"
  - "detect cycle undirected"
  - "Kruskal"
  - "minimum spanning tree"
  - "redundant connection"
  - "equivalence"

common_mistakes:
  - title: Forgetting path compression
    description: |
      Without path compression (and without union by rank), trees can
      degenerate into long chains, so a single Find can cost O(n) and overall
      performance degrades.
    fix: |
      Always compress paths during Find:
      ```python
      if self.parent[x] != x:
          self.parent[x] = self.find(self.parent[x])
      ```

  - title: Using Union-Find for directed graphs
    description: |
      Union-Find assumes undirected connections. For directed graphs, cycles
      mean something different (back edges in DFS).
    fix: |
      Use DFS with coloring (WHITE/GRAY/BLACK) for cycle detection in
      directed graphs. Union-Find is for undirected connectivity.

  - title: Not tracking component count
    description: |
      For problems asking "how many components," recounting distinct roots at
      the end costs an extra O(n) pass and is easy to get wrong.
    fix: |
      Decrement count in union when merging two different sets:
      ```python
      if root_x != root_y:
          self.count -= 1
      ```

  - title: Union returning wrong information
    description: |
      Some solutions need to know if a union actually merged two sets or if
      they were already connected.
    fix: |
      Return boolean from union indicating if merge happened:
      ```python
      if root_x == root_y:
          return False  # Already same set
      # ... do union ...
      return True  # Merged
      ```

variations:
  - name: Basic connectivity
    description: |
      Track whether elements are in the same connected component.
    example: "Number of Connected Components, Friend Circles"

  - name: Cycle detection
    description: |
      If union is called on two already-connected elements, adding that edge
      would create a cycle.
    example: "Redundant Connection, Graph Valid Tree"

  - name: Kruskal's MST
    description: |
      Sort edges by weight, greedily add edges that don't create cycles
      (checked via Union-Find).
    example: "Min Cost to Connect All Points, Connecting Cities With Minimum Cost"

  - name: Dynamic connectivity
    description: |
      Handle streaming edge insertions while answering connectivity queries.
      Union-Find supports insertions only; it cannot undo a union when an
      edge is deleted.
    example: "Evaluate Division, Accounts Merge"

  - name: Weighted Union-Find
    description: |
      Track relative weights/distances between elements and their roots.
      Used in problems with equivalence relationships.
    example: "Evaluate Division (weighted paths)"

related_patterns:
  - dfs
  - bfs

prerequisite_patterns: []