title: Accounts Merge
slug: accounts-merge
difficulty: medium
leetcode_id: 721
leetcode_url: https://leetcode.com/problems/accounts-merge/
categories:
- graphs
- hash-tables
- strings
patterns:
- slug: union-find
is_optimal: true
- slug: dfs
is_optimal: false

function_signature: "def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:"

test_cases:
visible:
- input: { accounts: [["John", "johnsmith@mail.com", "john_newyork@mail.com"], ["John", "johnsmith@mail.com", "john00@mail.com"], ["Mary", "mary@mail.com"], ["John", "johnnybravo@mail.com"]] }
expected: [["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"], ["Mary", "mary@mail.com"], ["John", "johnnybravo@mail.com"]]
- input: { accounts: [["Gabe", "Gabe0@m.co", "Gabe3@m.co", "Gabe1@m.co"], ["Kevin", "Kevin3@m.co", "Kevin5@m.co", "Kevin0@m.co"]] }
expected: [["Gabe", "Gabe0@m.co", "Gabe1@m.co", "Gabe3@m.co"], ["Kevin", "Kevin0@m.co", "Kevin3@m.co", "Kevin5@m.co"]]
hidden:
- input: { accounts: [["Alex", "a@mail.com"]] }
expected: [["Alex", "a@mail.com"]]
- input: { accounts: [["David", "d1@m.co", "d2@m.co"], ["David", "d2@m.co", "d3@m.co"], ["David", "d3@m.co", "d4@m.co"]] }
expected: [["David", "d1@m.co", "d2@m.co", "d3@m.co", "d4@m.co"]]
- input: { accounts: [["A", "a@a.com"], ["B", "b@b.com"], ["A", "a@a.com"]] }
expected: [["A", "a@a.com"], ["B", "b@b.com"]]

description: |
Given a list of `accounts` where each element `accounts[i]` is a list of strings: the first element `accounts[i][0]` is a name, and the remaining elements are the **emails** of the account.

Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.

After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails **in sorted order**. The accounts themselves can be returned in **any order**.

constraints: |
- `1 <= accounts.length <= 1000`
- `2 <= accounts[i].length <= 10`
- `1 <= accounts[i][j].length <= 30`
- `accounts[i][0]` consists of English letters
- `accounts[i][j]` (for `j > 0`) is a valid email

examples:
- input: 'accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]'
output: '[["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]'
explanation: "The first and second Johns are the same person because they share the email 'johnsmith@mail.com'. The third John and Mary are different people because none of their email addresses are used by other accounts."
- input: 'accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Kevin","Kevin3@m.co","Kevin5@m.co","Kevin0@m.co"],["Ethan","Ethan5@m.co","Ethan4@m.co","Ethan0@m.co"],["Hanzo","Hanzo3@m.co","Hanzo1@m.co","Hanzo0@m.co"],["Fern","Fern5@m.co","Fern1@m.co","Fern0@m.co"]]'
output: '[["Ethan","Ethan0@m.co","Ethan4@m.co","Ethan5@m.co"],["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe3@m.co"],["Hanzo","Hanzo0@m.co","Hanzo1@m.co","Hanzo3@m.co"],["Kevin","Kevin0@m.co","Kevin3@m.co","Kevin5@m.co"],["Fern","Fern0@m.co","Fern1@m.co","Fern5@m.co"]]'
explanation: "No accounts share common emails, so each account remains separate. The emails within each account are sorted alphabetically."

explanation:
intuition: |
Think of this problem as a **social network** where emails are people and accounts are group chats. If two accounts share even one email, they must belong to the same person, so all emails from both accounts should be merged together.

The key insight is that this is a **connected components** problem in disguise. Imagine each email as a node in a graph. When two emails appear in the same account, they're connected by an edge. Your task is to find all connected components (groups of emails that belong to the same person) and then associate each component with the correct name.

For example, if email A and email B are in one account, and email B and email C are in another account, then A, B, and C all belong to the same person: they form a connected chain.

Two classic approaches work well here:
- **Union-Find (Disjoint Set Union)**: Efficiently groups emails by treating each email as an element and unioning emails that appear together
- **DFS/BFS**: Build an adjacency graph of emails and traverse to find all connected emails

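To make the graph view above concrete, here is a minimal sketch that lists the edges implied by a set of accounts. The helper name `email_edges` and the sample addresses are hypothetical, and linking every email to the first email of its account is just one possible way to connect an account's emails:

```python
def email_edges(accounts: list[list[str]]) -> list[tuple[str, str]]:
    """List one edge per extra email, linking it to the account's first email."""
    edges = []
    for _name, first_email, *other_emails in accounts:
        for email in other_emails:
            edges.append((first_email, email))
    return edges


# Hypothetical input: A and B share one account, B and C share another.
chain_accounts = [["P", "a@x.com", "b@x.com"], ["P", "b@x.com", "c@x.com"]]
chain_edges = email_edges(chain_accounts)
```

Both edges touch `b@x.com`, so all three emails end up in the same connected component.
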
approach: |
We'll use the **Union-Find** approach, which is particularly elegant for this problem:

**Step 1: Build email-to-index mapping and union structure**

- Create a mapping from each email to a unique index
- Initialise the Union-Find structure where each email starts as its own parent
- Also track which name is associated with each email

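A minimal sketch of Step 1, assuming the email strings themselves serve as keys instead of integer indices (the full solution further down makes the same choice). The function name `init_union_find` is hypothetical:

```python
def init_union_find(accounts: list[list[str]]) -> tuple[dict[str, str], dict[str, str]]:
    """Make every email its own parent and remember the name it came with."""
    parent: dict[str, str] = {}
    email_to_name: dict[str, str] = {}
    for name, *emails in accounts:
        for email in emails:
            if email not in parent:  # first time this email is seen
                parent[email] = email
                email_to_name[email] = name
    return parent, email_to_name


parent, email_to_name = init_union_find(
    [["John", "a@m.co", "b@m.co"], ["Mary", "c@m.co"]]
)
```

At this point every email is a singleton set; no merging has happened yet.
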
**Step 2: Union emails within each account**

- For each account, union all emails together
- Use the first email in the account as the "anchor": union every other email with it
- This ensures all emails in an account end up in the same connected component

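The anchoring idea in Step 2 might look like the sketch below. The names `find` and `union_account` are illustrative, union by rank is left out for brevity, and the `parent` dictionary is assumed to have been initialised as in Step 1:

```python
def find(parent: dict[str, str], x: str) -> str:
    """Return the root of x, pointing every node on the path at the root."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:  # second pass: path compression
        parent[x], x = root, parent[x]
    return root


def union_account(parent: dict[str, str], account: list[str]) -> None:
    """Union every email of one account with the account's first email."""
    anchor = account[1]
    for email in account[2:]:
        root_a, root_b = find(parent, anchor), find(parent, email)
        if root_a != root_b:
            parent[root_b] = root_a


# Hypothetical single-letter "emails" keep the example short.
demo_parent = {e: e for e in "abcd"}
union_account(demo_parent, ["N", "a", "b"])
union_account(demo_parent, ["N", "b", "c"])
```

After both calls, `a`, `b`, and `c` share a root while `d` remains on its own.
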
**Step 3: Group emails by their root parent**

- For each email, find its root parent using path compression
- Group all emails that share the same root together
- Use a dictionary mapping root → list of emails

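Step 3 can be sketched as follows. The helper names are hypothetical, and `find_root` uses path halving, a simple variant of path compression, rather than the recursive form shown in the full solution:

```python
from collections import defaultdict


def find_root(parent: dict[str, str], x: str) -> str:
    """Find the root of x, shortening the path along the way (path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # skip one level
        x = parent[x]
    return x


def group_by_root(parent: dict[str, str]) -> dict[str, list[str]]:
    """Map each root to the list of emails in its component."""
    groups: dict[str, list[str]] = defaultdict(list)
    for email in parent:
        groups[find_root(parent, email)].append(email)
    return dict(groups)


# Hypothetical parent map: c -> b -> a form one tree, d stands alone.
groups = group_by_root({"a": "a", "b": "a", "c": "b", "d": "d"})
```

Each key of `groups` is a root, and its value holds every email that reaches that root.
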
**Step 4: Build the final result**

- For each group of emails, get the associated name from any email in the group
- Sort the emails alphabetically
- Construct the result as `[name, email1, email2, ...]`

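A short sketch of Step 4, assuming `groups` maps each root email to its component and `email_to_name` was filled in during Step 1. The function name `build_result` and the sample data are hypothetical:

```python
def build_result(
    groups: dict[str, list[str]], email_to_name: dict[str, str]
) -> list[list[str]]:
    """Turn each component into [name, sorted emails...]."""
    result = []
    for root, emails in groups.items():
        # Any email in the group carries the right name; the root is convenient.
        result.append([email_to_name[root]] + sorted(emails))
    return result


merged = build_result(
    {"b@m.co": ["b@m.co", "a@m.co"], "z@m.co": ["z@m.co"]},
    {"a@m.co": "John", "b@m.co": "John", "z@m.co": "Mary"},
)
```

Looking up the name through the root works because every email in a component came from accounts with the same name.
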
The Union-Find approach is efficient because, with path compression and union by rank, each union and find operation runs in amortized O(α(n)) time, which is effectively constant.

common_pitfalls:
- title: Assuming Same Name Means Same Person
description: |
A critical mistake is merging accounts just because they share the same name.

For example, `["John", "a@mail.com"]` and `["John", "b@mail.com"]` must be kept as **separate accounts** because they share no email. The problem explicitly states: "even if two accounts have the same name, they may belong to different people."

Only merge accounts when they share at least one email address.
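The sketch below shows the mistake in action. `merge_by_name` is a deliberately wrong, hypothetical helper written only to illustrate this pitfall:

```python
from collections import defaultdict


def merge_by_name(accounts: list[list[str]]) -> list[list[str]]:
    """Deliberately WRONG: merges accounts on the name alone."""
    by_name: dict[str, set[str]] = defaultdict(set)
    for name, *emails in accounts:
        by_name[name].update(emails)
    return [[name] + sorted(emails) for name, emails in by_name.items()]


two_johns = [["John", "a@mail.com"], ["John", "b@mail.com"]]
wrong = merge_by_name(two_johns)
```

The two Johns share no email, so the correct answer keeps them as two accounts, yet `wrong` contains a single merged entry.
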
wrong_approach: "Grouping accounts by name"
correct_approach: "Group by shared emails using Union-Find or graph traversal"

- title: Missing Transitive Connections
description: |
If account 1 has emails `[a, b]` and account 2 has emails `[b, c]`, then a, b, and c ALL belong to the same person through the transitive connection via `b`.

A naive approach might only merge direct pairs, missing that `a` and `c` are connected through `b`. Union-Find naturally handles transitivity: when you union `a-b` and `b-c`, calling `find(a)` and `find(c)` will return the same root.
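One way a naive merge can miss a transitive link is sketched below. `naive_single_pass` is a hypothetical, deliberately flawed helper: it adds each account to the first group it overlaps and never merges two existing groups:

```python
def naive_single_pass(accounts: list[list[str]]) -> list[set[str]]:
    """Deliberately WRONG: joins the first overlapping group, never merges groups."""
    groups: list[set[str]] = []
    for _name, *emails in accounts:
        for group in groups:
            if group & set(emails):
                group.update(emails)
                break
        else:  # no existing group shares an email
            groups.append(set(emails))
    return groups


# The third account bridges the first two, but arrives after both exist.
bridged = [["P", "a", "b"], ["P", "c", "d"], ["P", "b", "c"]]
naive_groups = naive_single_pass(bridged)
```

All four emails belong to one person, but the naive pass leaves two groups because the bridge account is absorbed into the first group only.
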
wrong_approach: "Only checking direct email matches between accounts"
correct_approach: "Use Union-Find or DFS to capture all transitive connections"

- title: Forgetting to Sort Emails
description: |
The problem requires emails within each merged account to be in **sorted order**. It's easy to forget this step after successfully grouping the emails.

Always sort the email list before constructing the final result.
wrong_approach: "Returning emails in arbitrary order"
correct_approach: "Sort emails alphabetically before adding to result"

- title: Inefficient Union-Find Without Optimizations
description: |
A basic Union-Find implementation without path compression or union by rank can degrade to O(n) per operation, making the overall solution O(n²) in the number of emails.

With path compression (flattening the tree during `find`) and union by rank (attaching the lower-rank, shallower tree under the higher-rank root), operations become nearly O(1) amortized.
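The effect of union by rank on tree height can be seen in a small experiment. This is an illustrative sketch, not part of the solution: `chain_depth` is a hypothetical helper that uses integer elements and omits path compression on purpose so the raw tree shape stays visible:

```python
def chain_depth(n: int, use_rank: bool) -> int:
    """Union 0-1, 1-2, ..., then return the largest node-to-root distance."""
    parent = list(range(n))
    rank = [0] * n

    def find(x: int) -> int:
        # No path compression, so the tree shape is left untouched.
        while parent[x] != x:
            x = parent[x]
        return x

    for i in range(n - 1):
        root_a, root_b = find(i), find(i + 1)
        if use_rank:
            if rank[root_a] < rank[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a  # lower rank goes under higher rank
            if rank[root_a] == rank[root_b]:
                rank[root_a] += 1
        else:
            parent[root_a] = root_b  # naive: old root hangs under the newcomer

    def depth(x: int) -> int:
        steps = 0
        while parent[x] != x:
            x = parent[x]
            steps += 1
        return steps

    return max(depth(x) for x in range(n))
```

For 100 elements the naive unions build a single chain of depth 99, while union by rank keeps every element one step from the root.
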
wrong_approach: "Basic Union-Find without optimizations"
correct_approach: "Use path compression and union by rank for O(α(n)) operations"

key_takeaways:
- "**Connected components pattern**: When you need to group items by shared relationships, think Union-Find or graph traversal"
- "**Union-Find efficiency**: Path compression and union by rank make Union-Find nearly O(1) amortized per operation, which is ideal for grouping problems"
- "**Don't trust names**: In real-world data, names are not unique identifiers; only concrete links (like shared emails) can establish identity"
- "**Transitive relationships**: Union-Find elegantly handles chains of relationships without explicit graph construction"

time_complexity: "O(n × k × log(n × k)) where `n` is the number of accounts and `k` is the maximum number of emails per account. The union and find operations cost O(n × k × α(n × k)) in total, and the `α` (inverse Ackermann) function grows so slowly it's effectively constant, so sorting the emails of each merged account dominates."
space_complexity: "O(n × k). We store each email once in the parent dictionary and once in the grouping phase."

solutions:
- approach_name: Union-Find
is_optimal: true
code: |
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    from collections import defaultdict

    # Union-Find helper functions with path compression
    def find(x: str) -> str:
        # Find root with path compression
        if parent[x] != x:
            parent[x] = find(parent[x])  # Flatten the tree
        return parent[x]

    def union(x: str, y: str) -> None:
        # Union two emails by connecting their roots
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # Union by rank for efficiency
            if rank[root_x] < rank[root_y]:
                parent[root_x] = root_y
            elif rank[root_x] > rank[root_y]:
                parent[root_y] = root_x
            else:
                parent[root_y] = root_x
                rank[root_x] += 1

    # Initialise Union-Find structures
    parent = {}  # Maps email -> parent email
    rank = {}  # Tracks tree depth for union by rank
    email_to_name = {}  # Maps email -> account name

    # Process each account
    for account in accounts:
        name = account[0]
        first_email = account[1]

        for email in account[1:]:
            # Initialise email if not seen before
            if email not in parent:
                parent[email] = email
                rank[email] = 0
                email_to_name[email] = name

            # Union this email with the first email in the account
            union(first_email, email)

    # Group emails by their root parent
    groups = defaultdict(list)
    for email in parent:
        root = find(email)
        groups[root].append(email)

    # Build result: [name, sorted emails...]
    result = []
    for root, emails in groups.items():
        name = email_to_name[root]
        # Sort emails alphabetically as required
        result.append([name] + sorted(emails))

    return result
explanation: |
**Time Complexity:** O(n × k × log(n × k)). The union and find operations take O(n × k × α(n × k)) in total, since each is nearly O(1) amortized with path compression and union by rank. Sorting the emails of each merged account adds the log factor.

**Space Complexity:** O(n × k). We store each unique email in the parent dictionary, rank dictionary, and email-to-name mapping.

The Union-Find approach efficiently groups emails by maintaining a forest of trees where each tree represents a connected component. Path compression ensures trees stay flat, and union by rank prevents worst-case linear trees.

- approach_name: DFS Graph Traversal
is_optimal: false
code: |
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    from collections import defaultdict

    # Build adjacency list: email -> set of connected emails
    graph = defaultdict(set)
    email_to_name = {}

    for account in accounts:
        name = account[0]
        first_email = account[1]

        for email in account[1:]:
            # Connect all emails in this account to the first email
            graph[first_email].add(email)
            graph[email].add(first_email)
            email_to_name[email] = name

    # DFS to find all connected emails
    def dfs(start: str, component: list[str]) -> None:
        # An explicit stack keeps the traversal depth-first while avoiding
        # RecursionError on long chains of linked accounts
        stack = [start]
        visited.add(start)
        while stack:
            email = stack.pop()
            component.append(email)
            for neighbor in graph[email]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    stack.append(neighbor)

    visited = set()
    result = []

    # Find all connected components
    for email in graph:
        if email not in visited:
            component = []
            dfs(email, component)
            # Get name from any email in the component
            name = email_to_name[component[0]]
            # Sort emails and build result
            result.append([name] + sorted(component))

    return result
explanation: |
**Time Complexity:** O(n × k × log(n × k)). Building the graph is O(n × k), DFS visits each email once in O(n × k), and sorting each component adds the log factor.

**Space Complexity:** O(n × k). The graph stores edges between emails, and the visited set tracks processed emails.

This approach explicitly builds a graph where edges connect emails that appear together in an account. DFS then finds all connected components. While conceptually clearer than Union-Find, it requires more memory for the adjacency list.

- approach_name: BFS Graph Traversal
is_optimal: false
code: |
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    from collections import defaultdict, deque

    # Build adjacency list
    graph = defaultdict(set)
    email_to_name = {}

    for account in accounts:
        name = account[0]
        first_email = account[1]

        for email in account[1:]:
            graph[first_email].add(email)
            graph[email].add(first_email)
            email_to_name[email] = name

    # BFS to find connected component
    def bfs(start: str) -> list[str]:
        component = []
        queue = deque([start])
        visited.add(start)

        while queue:
            email = queue.popleft()
            component.append(email)
            for neighbor in graph[email]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)

        return component

    visited = set()
    result = []

    for email in graph:
        if email not in visited:
            component = bfs(email)
            name = email_to_name[component[0]]
            result.append([name] + sorted(component))

    return result
explanation: |
**Time Complexity:** O(n × k × log(n × k)). Same as DFS: graph construction, traversal, and sorting.

**Space Complexity:** O(n × k). Graph storage plus the BFS queue in the worst case.

BFS achieves the same result as DFS but explores the graph with a queue and no recursion. This matters when depth is a concern: even within this problem's constraints (≤1000 accounts), a long chain of accounts linked one to the next can push a recursive traversal past CPython's default recursion limit of 1000.
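
All three solutions return the merged accounts in whatever order the components happen to be discovered, and the problem accepts any order. A checker therefore has to compare results without regard to account order. A minimal sketch, with the hypothetical helper name `same_accounts`:

```python
def same_accounts(actual: list[list[str]], expected: list[list[str]]) -> bool:
    """Compare two merged results, ignoring the order of the accounts."""
    # Emails inside each account are already required to be sorted,
    # so only the outer list needs normalising.
    return sorted(actual) == sorted(expected)


expected = [["John", "a@m.co", "b@m.co"], ["Mary", "m@m.co"]]
reordered = [["Mary", "m@m.co"], ["John", "a@m.co", "b@m.co"]]
unsorted_emails = [["John", "b@m.co", "a@m.co"], ["Mary", "m@m.co"]]
```

With this helper, `reordered` matches `expected`, while `unsorted_emails` does not because its emails violate the sorted-order requirement.
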