codetutor/backend/data/questions/accounts-merge.yaml

title: Accounts Merge
slug: accounts-merge
difficulty: medium
leetcode_id: 721
leetcode_url: https://leetcode.com/problems/accounts-merge/
categories:
- graphs
- hash-tables
- strings
patterns:
- union-find
- dfs
description: |
Given a list of `accounts` where each element `accounts[i]` is a list of strings, where the first element `accounts[i][0]` is a name, and the rest of the elements are **emails** representing emails of the account.
Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.
After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails **in sorted order**. The accounts themselves can be returned in **any order**.
constraints: |
- `1 <= accounts.length <= 1000`
- `2 <= accounts[i].length <= 10`
- `1 <= accounts[i][j].length <= 30`
- `accounts[i][0]` consists of English letters
- `accounts[i][j]` (for `j > 0`) is a valid email
examples:
- input: 'accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]'
output: '[["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]'
explanation: "The first and second Johns are the same person as they have the common email 'johnsmith@mail.com'. The third John and Mary are different people as none of their email addresses are used by other accounts."
- input: 'accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Kevin","Kevin3@m.co","Kevin5@m.co","Kevin0@m.co"],["Ethan","Ethan5@m.co","Ethan4@m.co","Ethan0@m.co"],["Hanzo","Hanzo3@m.co","Hanzo1@m.co","Hanzo0@m.co"],["Fern","Fern5@m.co","Fern1@m.co","Fern0@m.co"]]'
output: '[["Ethan","Ethan0@m.co","Ethan4@m.co","Ethan5@m.co"],["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe3@m.co"],["Hanzo","Hanzo0@m.co","Hanzo1@m.co","Hanzo3@m.co"],["Kevin","Kevin0@m.co","Kevin3@m.co","Kevin5@m.co"],["Fern","Fern0@m.co","Fern1@m.co","Fern5@m.co"]]'
explanation: "No accounts share common emails, so each account remains separate. The emails within each account are sorted alphabetically."
explanation:
intuition: |
Think of this problem as a **social network** where emails are people and accounts are group chats. If two accounts share even one email, they must belong to the same person — meaning all emails from both accounts should be merged together.
The key insight is that this is a **connected components** problem in disguise. Imagine each email as a node in a graph. When two emails appear in the same account, they're connected by an edge. Your task is to find all connected components (groups of emails that belong to the same person) and then associate each component with the correct name.
Think of it like this: if email A and email B are in one account, and email B and email C are in another account, then A, B, and C all belong to the same person — they form a connected chain.
Two classic approaches work well here:
- **Union-Find (Disjoint Set Union)**: Efficiently groups emails by treating each email as an element and unioning emails that appear together
- **DFS/BFS**: Build an adjacency graph of emails and traverse to find all connected emails
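As a small illustration of the chain described above, the following sketch builds the email graph for two hypothetical accounts and collects everything reachable from one email. The account data and the `reachable` helper are made up for this example and are not part of the reference solutions.

```python
from collections import defaultdict

# Hypothetical accounts: the first shares b@x.com with the second.
accounts = [
    ["Ann", "a@x.com", "b@x.com"],
    ["Ann", "b@x.com", "c@x.com"],
]

# Each email is a node; emails in the same account are linked to the first one.
graph = defaultdict(set)
for _name, first, *rest in accounts:
    for email in rest:
        graph[first].add(email)
        graph[email].add(first)


def reachable(start: str) -> set[str]:
    """Collect every email connected to `start` (illustrative helper)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        # `graph[node] - seen` is a fresh set, so updating `seen` here is safe.
        for neighbor in graph[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen


component = reachable("a@x.com")
print(sorted(component))  # ['a@x.com', 'b@x.com', 'c@x.com']
```

Although `a@x.com` and `c@x.com` never appear in the same account, they land in the same component through `b@x.com`.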
approach: |
We'll use the **Union-Find** approach, which is particularly elegant for this problem:
**Step 1: Build email-to-index mapping and union structure**
- Create a mapping from each email to a unique index
- Initialise the Union-Find structure where each email starts as its own parent
- Also track which name is associated with each email
&nbsp;
**Step 2: Union emails within each account**
- For each account, union all emails together
- Use the first email in the account as the "anchor" — union every other email with it
- This ensures all emails in an account end up in the same connected component
&nbsp;
**Step 3: Group emails by their root parent**
- For each email, find its root parent using path compression
- Group all emails that share the same root together
- Use a dictionary mapping root → list of emails
&nbsp;
**Step 4: Build the final result**
- For each group of emails, get the associated name from any email in the group
- Sort the emails alphabetically
- Construct the result as `[name, email1, email2, ...]`
&nbsp;
The Union-Find approach is efficient because the union and find operations run in nearly O(1) amortized time with path compression and union by rank.
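The four steps above could be sketched with integer indices roughly as follows. This is one possible reading of the steps, not the reference solution; the function name `merge_accounts_sketch` and the helper names are assumptions made for illustration.

```python
from collections import defaultdict


def merge_accounts_sketch(accounts: list[list[str]]) -> list[list[str]]:
    # Step 1: give each distinct email an index and remember its name.
    email_to_index: dict[str, int] = {}
    index_to_name: list[str] = []
    for name, *emails in accounts:
        for email in emails:
            if email not in email_to_index:
                email_to_index[email] = len(email_to_index)
                index_to_name.append(name)

    parent = list(range(len(email_to_index)))
    rank = [0] * len(email_to_index)

    def find(i: int) -> int:
        # Walk up to the root, then point every visited node at it.
        root = i
        while parent[root] != root:
            root = parent[root]
        while parent[i] != root:
            next_i = parent[i]
            parent[i] = root
            i = next_i
        return root

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Attach the lower-rank root under the higher-rank root.
        if rank[root_a] < rank[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        if rank[root_a] == rank[root_b]:
            rank[root_a] += 1

    # Step 2: union every email with the first email of its account.
    for _name, first, *rest in accounts:
        anchor = email_to_index[first]
        for email in rest:
            union(anchor, email_to_index[email])

    # Step 3: group emails by the index of their root.
    groups: dict[int, list[str]] = defaultdict(list)
    for email, index in email_to_index.items():
        groups[find(index)].append(email)

    # Step 4: prepend the name and sort the emails.
    return [[index_to_name[root]] + sorted(emails) for root, emails in groups.items()]
```

Using list indices instead of string keys keeps `parent` and `rank` as plain lists; a dictionary keyed by email works equally well.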
common_pitfalls:
- title: Assuming Same Name Means Same Person
description: |
A critical mistake is merging accounts just because they share the same name.
For example, `["John", "a@mail.com"]` and `["John", "b@mail.com"]` are **different people** unless they share a common email. The problem explicitly states: "even if two accounts have the same name, they may belong to different people."
Only merge accounts when they share at least one email address.
wrong_approach: "Grouping accounts by name"
correct_approach: "Group by shared emails using Union-Find or graph traversal"
- title: Missing Transitive Connections
description: |
If account 1 has emails `[a, b]` and account 2 has emails `[b, c]`, then a, b, and c ALL belong to the same person through the transitive connection via `b`.
A naive approach might only merge direct pairs, missing that `a` and `c` are connected through `b`. Union-Find naturally handles transitivity — when you union `a-b` and `b-c`, calling `find(a)` and `find(c)` will return the same root.
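A minimal sketch of that transitivity, using a deliberately stripped-down Union-Find (no rank, emails shortened to single letters for illustration):

```python
parent: dict[str, str] = {}


def find(x: str) -> str:
    # Register unseen items as their own root, then walk up to the root.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps chains short
        x = parent[x]
    return x


def union(x: str, y: str) -> None:
    # Hang the root of x under the root of y.
    parent[find(x)] = find(y)


union("a", "b")  # account 1 holds a and b
union("b", "c")  # account 2 holds b and c

same_person = find("a") == find("c")
print(same_person)  # True
```

No `union("a", "c")` call was ever made, yet both emails resolve to the same root.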
wrong_approach: "Only checking direct email matches between accounts"
correct_approach: "Use Union-Find or DFS to capture all transitive connections"
- title: Forgetting to Sort Emails
description: |
The problem requires emails within each merged account to be in **sorted order**. It's easy to forget this step after successfully grouping the emails.
Always sort the email list before constructing the final result.
wrong_approach: "Returning emails in arbitrary order"
correct_approach: "Sort emails alphabetically before adding to result"
- title: Inefficient Union-Find Without Optimizations
description: |
A basic Union-Find implementation without path compression or union by rank can degrade to O(n) per operation, making the overall solution O(n²).
With path compression (flattening the tree during `find`) and union by rank (attaching the shallower tree under the deeper one), operations become nearly O(1) amortized.
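To make the degradation concrete, the sketch below builds the same set of unions twice, once naively and once with union by rank, and measures the resulting tree depth. Path compression is left out of both versions on purpose so that only the effect of rank is visible; the helper names are hypothetical.

```python
def max_depth(parent: list[int]) -> int:
    """Longest path from any node up to its root."""
    deepest = 0
    for start in range(len(parent)):
        node, depth = start, 0
        while parent[node] != node:
            node = parent[node]
            depth += 1
        deepest = max(deepest, depth)
    return deepest


def naive_unions(n: int) -> list[int]:
    """Union 0 with 1..n-1, always hanging the first root under the second."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            x = parent[x]
        return x

    for i in range(1, n):
        parent[find(0)] = find(i)
    return parent


def ranked_unions(n: int) -> list[int]:
    """Same unions, but the lower-rank root goes under the higher-rank root."""
    parent = list(range(n))
    rank = [0] * n

    def find(x: int) -> int:
        while parent[x] != x:
            x = parent[x]
        return x

    for i in range(1, n):
        root_a, root_b = find(0), find(i)
        if rank[root_a] < rank[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        if rank[root_a] == rank[root_b]:
            rank[root_a] += 1
    return parent


print(max_depth(naive_unions(100)))   # 99
print(max_depth(ranked_unions(100)))  # 1
```

In the naive version every `find(0)` walks the whole chain built so far, which is where the quadratic total cost comes from.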
wrong_approach: "Basic Union-Find without optimizations"
correct_approach: "Use path compression and union by rank for O(α(n)) operations"
key_takeaways:
- "**Connected components pattern**: When you need to group items by shared relationships, think Union-Find or graph traversal"
- "**Union-Find efficiency**: Path compression and union by rank make Union-Find nearly O(1) per operation — ideal for grouping problems"
- "**Don't trust names**: In real-world data, names are not unique identifiers — only concrete links (like shared emails) can establish identity"
- "**Transitive relationships**: Union-Find elegantly handles chains of relationships without explicit graph construction"
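time_complexity: "O(n × k × log(n × k)) where `n` is the number of accounts and `k` is the average number of emails per account. The union-find operations cost O(n × k × α(n × k)), where the `α` (inverse Ackermann) function grows so slowly it's effectively constant, but sorting the emails of each merged account dominates."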
space_complexity: "O(n × k). We store each email once in the parent dictionary and once in the grouping phase."
solutions:
- approach_name: Union-Find
is_optimal: true
code: |
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    from collections import defaultdict

    # Union-Find helper functions with path compression
    def find(x: str) -> str:
        # Find root with path compression
        if parent[x] != x:
            parent[x] = find(parent[x])  # Flatten the tree
        return parent[x]

    def union(x: str, y: str) -> None:
        # Union two emails by connecting their roots
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # Union by rank for efficiency
            if rank[root_x] < rank[root_y]:
                parent[root_x] = root_y
            elif rank[root_x] > rank[root_y]:
                parent[root_y] = root_x
            else:
                parent[root_y] = root_x
                rank[root_x] += 1

    # Initialise Union-Find structures
    parent = {}  # Maps email -> parent email
    rank = {}  # Tracks tree depth for union by rank
    email_to_name = {}  # Maps email -> account name

    # Process each account
    for account in accounts:
        name = account[0]
        first_email = account[1]
        for email in account[1:]:
            # Initialise email if not seen before
            if email not in parent:
                parent[email] = email
                rank[email] = 0
                email_to_name[email] = name
            # Union this email with the first email in the account
            union(first_email, email)

    # Group emails by their root parent
    groups = defaultdict(list)
    for email in parent:
        root = find(email)
        groups[root].append(email)

    # Build result: [name, sorted emails...]
    result = []
    for root, emails in groups.items():
        name = email_to_name[root]
        # Sort emails alphabetically as required
        result.append([name] + sorted(emails))
    return result
explanation: |
**Time Complexity:** O(n × k × log(n × k)): the union/find operations total O(n × k × α(n × k)), since each is nearly O(1) with path compression and union by rank, and the final sort of each group's emails adds the log factor.
**Space Complexity:** O(n × k) — We store each unique email in the parent dictionary, rank dictionary, and email-to-name mapping.
The Union-Find approach efficiently groups emails by maintaining a forest of trees where each tree represents a connected component. Path compression ensures trees stay flat, and union by rank prevents worst-case linear trees.
- approach_name: DFS Graph Traversal
is_optimal: true
code: |
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    from collections import defaultdict

    # Build adjacency list: email -> set of connected emails
    graph = defaultdict(set)
    email_to_name = {}
    for account in accounts:
        name = account[0]
        first_email = account[1]
        for email in account[1:]:
            # Connect all emails in this account to the first email
            graph[first_email].add(email)
            graph[email].add(first_email)
            email_to_name[email] = name

    # DFS to find all connected emails.
    # Note: recursion depth can reach the number of emails in a component,
    # so a long chain of accounts may exceed CPython's default recursion limit.
    def dfs(email: str, component: list[str]) -> None:
        visited.add(email)
        component.append(email)
        for neighbor in graph[email]:
            if neighbor not in visited:
                dfs(neighbor, component)

    visited = set()
    result = []
    # Find all connected components
    for email in graph:
        if email not in visited:
            component = []
            dfs(email, component)
            # Get name from any email in the component
            name = email_to_name[component[0]]
            # Sort emails and build result
            result.append([name] + sorted(component))
    return result
explanation: |
**Time Complexity:** O(n × k × log(n × k)) — Building the graph is O(n × k), DFS visits each email once O(n × k), and sorting each component adds the log factor.
**Space Complexity:** O(n × k) — The graph stores edges between emails, and the visited set tracks processed emails.
This approach explicitly builds a graph where edges connect emails that appear together in an account. DFS then finds all connected components. While conceptually clearer than Union-Find, it requires more memory for the adjacency list.
- approach_name: BFS Graph Traversal
is_optimal: false
code: |
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    from collections import defaultdict, deque

    # Build adjacency list
    graph = defaultdict(set)
    email_to_name = {}
    for account in accounts:
        name = account[0]
        first_email = account[1]
        for email in account[1:]:
            graph[first_email].add(email)
            graph[email].add(first_email)
            email_to_name[email] = name

    # BFS to find connected component
    def bfs(start: str) -> list[str]:
        component = []
        queue = deque([start])
        visited.add(start)
        while queue:
            email = queue.popleft()
            component.append(email)
            for neighbor in graph[email]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        return component

    visited = set()
    result = []
    for email in graph:
        if email not in visited:
            component = bfs(email)
            name = email_to_name[component[0]]
            result.append([name] + sorted(component))
    return result
explanation: |
**Time Complexity:** O(n × k × log(n × k)) — Same as DFS: graph construction, traversal, and sorting.
**Space Complexity:** O(n × k) — Graph storage plus the BFS queue in the worst case.
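BFS achieves the same result as DFS but uses a queue instead of recursion. This is preferable when recursion depth is a concern, which it can be under this problem's constraints: a chain of 1000 accounts that each share one email with the next forms a single component of about 1000 emails, enough for a recursive DFS to hit CPython's default recursion limit of 1000.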
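If depth-first order is still wanted without recursion, an explicit stack is one option. The following is a sketch under that assumption; the name `accounts_merge_iterative` is hypothetical and the function is not one of the listed solutions.

```python
from collections import defaultdict


def accounts_merge_iterative(accounts: list[list[str]]) -> list[list[str]]:
    # Build the same star-shaped graph: every email links to the account's first email.
    graph: dict[str, set[str]] = defaultdict(set)
    email_to_name: dict[str, str] = {}
    for name, first_email, *rest in accounts:
        graph[first_email]  # register single-email accounts as nodes too
        email_to_name[first_email] = name
        for email in rest:
            graph[first_email].add(email)
            graph[email].add(first_email)
            email_to_name[email] = name

    visited: set[str] = set()
    result = []
    for start in graph:
        if start in visited:
            continue
        # Depth-first traversal driven by a list used as a stack.
        visited.add(start)
        stack = [start]
        component = []
        while stack:
            email = stack.pop()
            component.append(email)
            for neighbor in graph[email]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    stack.append(neighbor)
        result.append([email_to_name[start]] + sorted(component))
    return result


# A chain of 1000 accounts, each sharing one email with the next.
chain = [["N", f"e{i}@m.co", f"e{i + 1}@m.co"] for i in range(1000)]
merged = accounts_merge_iterative(chain)
print(len(merged), len(merged[0]))  # 1 1002
```

The chain input produces one merged account holding the name plus 1001 emails, and the traversal depth is bounded by the stack size rather than the interpreter's call stack.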