# codetutor/backend/data/questions/accounts-merge.yaml

title: Accounts Merge
slug: accounts-merge
difficulty: medium
leetcode_id: 721
leetcode_url: https://leetcode.com/problems/accounts-merge/
categories:
  - graphs
  - hash-tables
  - strings
patterns:
  - slug: union-find
    is_optimal: true
  - slug: dfs
    is_optimal: false

function_signature: "def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:"

test_cases:
  visible:
    - input: { accounts: [["John", "johnsmith@mail.com", "john_newyork@mail.com"], ["John", "johnsmith@mail.com", "john00@mail.com"], ["Mary", "mary@mail.com"], ["John", "johnnybravo@mail.com"]] }
      expected: [["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"], ["Mary", "mary@mail.com"], ["John", "johnnybravo@mail.com"]]
    - input: { accounts: [["Gabe", "Gabe0@m.co", "Gabe3@m.co", "Gabe1@m.co"], ["Kevin", "Kevin3@m.co", "Kevin5@m.co", "Kevin0@m.co"]] }
      expected: [["Gabe", "Gabe0@m.co", "Gabe1@m.co", "Gabe3@m.co"], ["Kevin", "Kevin0@m.co", "Kevin3@m.co", "Kevin5@m.co"]]
  hidden:
    - input: { accounts: [["Alex", "a@mail.com"]] }
      expected: [["Alex", "a@mail.com"]]
    - input: { accounts: [["David", "d1@m.co", "d2@m.co"], ["David", "d2@m.co", "d3@m.co"], ["David", "d3@m.co", "d4@m.co"]] }
      expected: [["David", "d1@m.co", "d2@m.co", "d3@m.co", "d4@m.co"]]
    - input: { accounts: [["A", "a@a.com"], ["B", "b@b.com"], ["A", "a@a.com"]] }
      expected: [["A", "a@a.com"], ["B", "b@b.com"]]

description: |
  You are given a list `accounts`, where each element `accounts[i]` is a list of strings: the first element `accounts[i][0]` is a name, and the remaining elements are the **emails** of that account.

  Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.

  After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails **in sorted order**. The accounts themselves can be returned in **any order**.

constraints: |
  - `1 <= accounts.length <= 1000`
  - `2 <= accounts[i].length <= 10`
  - `1 <= accounts[i][j].length <= 30`
  - `accounts[i][0]` consists of English letters
  - `accounts[i][j]` (for `j > 0`) is a valid email

examples:
  - input: 'accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]'
    output: '[["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]'
    explanation: "The first and second Johns are the same person because they share the email 'johnsmith@mail.com'. The third John and Mary are different people because none of their email addresses are used by other accounts."
  - input: 'accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Kevin","Kevin3@m.co","Kevin5@m.co","Kevin0@m.co"],["Ethan","Ethan5@m.co","Ethan4@m.co","Ethan0@m.co"],["Hanzo","Hanzo3@m.co","Hanzo1@m.co","Hanzo0@m.co"],["Fern","Fern5@m.co","Fern1@m.co","Fern0@m.co"]]'
    output: '[["Ethan","Ethan0@m.co","Ethan4@m.co","Ethan5@m.co"],["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe3@m.co"],["Hanzo","Hanzo0@m.co","Hanzo1@m.co","Hanzo3@m.co"],["Kevin","Kevin0@m.co","Kevin3@m.co","Kevin5@m.co"],["Fern","Fern0@m.co","Fern1@m.co","Fern5@m.co"]]'
    explanation: "No accounts share common emails, so each account remains separate. The emails within each account are sorted alphabetically."

explanation:
  intuition: |
    Think of this problem as a **social network** where emails are people and accounts are group chats. If two accounts share even one email, they must belong to the same person — meaning all emails from both accounts should be merged together.

    The key insight is that this is a **connected components** problem in disguise. Imagine each email as a node in a graph. When two emails appear in the same account, they're connected by an edge. Your task is to find all connected components (groups of emails that belong to the same person) and then associate each component with the correct name.

    Think of it like this: if email A and email B are in one account, and email B and email C are in another account, then A, B, and C all belong to the same person — they form a connected chain.

    Two classic approaches work well here:
    - **Union-Find (Disjoint Set Union)**: Efficiently groups emails by treating each email as an element and unioning emails that appear together
    - **DFS/BFS**: Build an adjacency graph of emails and traverse to find all connected emails
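
    To make the graph view concrete, here is a minimal sketch that turns accounts into email-to-email edges. The helper name `build_email_graph` is hypothetical, and linking each email to the first email of its account is just one reasonable way to choose the edges.

    ```python
    def build_email_graph(accounts: list[list[str]]) -> dict[str, set[str]]:
        """Link every email in an account to that account's first email."""
        graph: dict[str, set[str]] = {}
        for _name, first_email, *other_emails in accounts:
            # Single-email accounts still need a node of their own
            graph.setdefault(first_email, set())
            for email in other_emails:
                graph[first_email].add(email)
                graph.setdefault(email, set()).add(first_email)
        return graph


    graph = build_email_graph([
        ["David", "d1@m.co", "d2@m.co"],
        ["David", "d2@m.co", "d3@m.co"],
    ])
    # d1 and d3 never share an account, but both are neighbors of d2
    ```

    Here `d1@m.co` and `d3@m.co` end up in the same component only because each is linked to `d2@m.co`.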

  approach: |
    We'll use the **Union-Find** approach, which is particularly elegant for this problem:

    **Step 1: Build email-to-index mapping and union structure**

    - Create a mapping from each email to a unique index
    - Initialise the Union-Find structure where each email starts as its own parent
    - Also track which name is associated with each email

    &nbsp;

    **Step 2: Union emails within each account**

    - For each account, union all emails together
    - Use the first email in the account as the "anchor" — union every other email with it
    - This ensures all emails in an account end up in the same connected component
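
    As a rough sketch of this step, the snippet below unions each email with its account's first email. The names `parent`, `find`, and `union_account` are illustrative, and union by rank is left out to keep the example short.

    ```python
    parent: dict[str, str] = {}


    def find(email: str) -> str:
        """Return the root of the email's set, compressing the path on the way."""
        parent.setdefault(email, email)
        if parent[email] != email:
            parent[email] = find(parent[email])
        return parent[email]


    def union_account(account: list[str]) -> None:
        """Union every email in the account with the first one (the anchor)."""
        anchor = account[1]
        for email in account[1:]:
            parent[find(email)] = find(anchor)


    union_account(["John", "johnsmith@mail.com", "john_newyork@mail.com"])
    union_account(["John", "johnsmith@mail.com", "john00@mail.com"])
    # All three emails now share one root because both accounts contain johnsmith@mail.com
    ```

    An account with `k` emails needs only `k - 1` effective unions this way, rather than one union per pair of emails.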

    &nbsp;

    **Step 3: Group emails by their root parent**

    - For each email, find its root parent using path compression
    - Group all emails that share the same root together
    - Use a dictionary mapping root → list of emails
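
    The grouping step might look like the sketch below. The `parent` dictionary is hand-written sample data standing in for the state after Step 2, and `find` here is an iterative path-halving variant chosen for brevity rather than the recursive form.

    ```python
    from collections import defaultdict

    # Hypothetical state after the union step: each email points to its parent
    parent = {
        "d1@m.co": "d1@m.co",
        "d2@m.co": "d1@m.co",
        "d3@m.co": "d2@m.co",
        "mary@mail.com": "mary@mail.com",
    }


    def find(email: str) -> str:
        """Follow parent pointers to the root, halving the path along the way."""
        while parent[email] != email:
            parent[email] = parent[parent[email]]
            email = parent[email]
        return email


    # Map each root to the list of emails in its component
    groups: dict[str, list[str]] = defaultdict(list)
    for email in parent:
        groups[find(email)].append(email)
    ```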

    &nbsp;

    **Step 4: Build the final result**

    - For each group of emails, get the associated name from any email in the group
    - Sort the emails alphabetically
    - Construct the result as `[name, email1, email2, ...]`
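
    A small sketch of this last step, assuming a `groups` dictionary (root email to emails) and an `email_to_name` mapping like those described above. Both are filled with made-up sample data here.

    ```python
    # Hypothetical output of the grouping step: root email -> emails in that component
    groups = {
        "d1@m.co": ["d3@m.co", "d1@m.co", "d2@m.co"],
        "mary@mail.com": ["mary@mail.com"],
    }
    email_to_name = {
        "d1@m.co": "David",
        "d2@m.co": "David",
        "d3@m.co": "David",
        "mary@mail.com": "Mary",
    }

    # Any email in a component carries the right name, so the root is as good as any
    result = [[email_to_name[root]] + sorted(emails) for root, emails in groups.items()]
    ```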

    &nbsp;

    The Union-Find approach is efficient because, with path compression and union by rank, each union or find runs in amortized O(α(n)) time, which is effectively constant.

  common_pitfalls:
    - title: Assuming Same Name Means Same Person
      description: |
        A critical mistake is merging accounts just because they share the same name.

        For example, `["John", "a@mail.com"]` and `["John", "b@mail.com"]` are **different people** unless they share a common email. The problem explicitly states: "even if two accounts have the same name, they may belong to different people."

        Only merge accounts when they share at least one email address.
      wrong_approach: "Grouping accounts by name"
      correct_approach: "Group by shared emails using Union-Find or graph traversal"

    - title: Missing Transitive Connections
      description: |
        If account 1 has emails `[a, b]` and account 2 has emails `[b, c]`, then a, b, and c ALL belong to the same person through the transitive connection via `b`.

        A naive approach might only merge direct pairs, missing that `a` and `c` are connected through `b`. Union-Find naturally handles transitivity — when you union `a-b` and `b-c`, calling `find(a)` and `find(c)` will return the same root.
      wrong_approach: "Only checking direct email matches between accounts"
      correct_approach: "Use Union-Find or DFS to capture all transitive connections"

    - title: Forgetting to Sort Emails
      description: |
        The problem requires emails within each merged account to be in **sorted order**. It's easy to forget this step after successfully grouping the emails.

        Always sort the email list before constructing the final result.
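
        One detail worth checking: Python's `sorted` compares strings by Unicode code point, so digits sort before `_`, which sorts before lowercase letters. The short check below, using emails from the first example, illustrates the resulting order.

        ```python
        emails = ["johnsmith@mail.com", "john_newyork@mail.com", "john00@mail.com"]

        # '0' (48) < '_' (95) < 's' (115), so john00 sorts first and johnsmith last
        merged = ["John"] + sorted(emails)
        ```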
      wrong_approach: "Returning emails in arbitrary order"
      correct_approach: "Sort emails alphabetically before adding to result"

    - title: Inefficient Union-Find Without Optimizations
      description: |
        A basic Union-Find implementation without path compression or union by rank can degrade to O(n) per operation, making the overall solution O(n²).

        With path compression (flattening the tree during `find`) and union by rank (attaching the root of the shallower tree under the root of the deeper one), operations become nearly O(1) amortized.
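
        For illustration, a compact class with both optimizations could look like this. `DisjointSet` is a hypothetical name, not something defined elsewhere in this file.

        ```python
        class DisjointSet:
            """Union-Find over strings with path compression and union by rank."""

            def __init__(self) -> None:
                self.parent: dict[str, str] = {}
                self.rank: dict[str, int] = {}

            def find(self, x: str) -> str:
                if x not in self.parent:
                    # First time we see x: it starts as its own root
                    self.parent[x] = x
                    self.rank[x] = 0
                if self.parent[x] != x:
                    self.parent[x] = self.find(self.parent[x])  # path compression
                return self.parent[x]

            def union(self, x: str, y: str) -> None:
                root_x, root_y = self.find(x), self.find(y)
                if root_x == root_y:
                    return
                # Union by rank: hang the lower-rank root under the higher-rank one
                if self.rank[root_x] < self.rank[root_y]:
                    root_x, root_y = root_y, root_x
                self.parent[root_y] = root_x
                if self.rank[root_x] == self.rank[root_y]:
                    self.rank[root_x] += 1


        ds = DisjointSet()
        for a, b in [("a", "b"), ("c", "d"), ("a", "c"), ("e", "a")]:
            ds.union(a, b)
        ```

        After these four unions all five items share the root `"a"`, and its rank is only 2 because ranks grow only when two equal-rank trees are joined.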
      wrong_approach: "Basic Union-Find without optimizations"
      correct_approach: "Use path compression and union by rank for O(α(n)) operations"

  key_takeaways:
    - "**Connected components pattern**: When you need to group items by shared relationships, think Union-Find or graph traversal"
    - "**Union-Find efficiency**: Path compression and union by rank make Union-Find nearly O(1) per operation — ideal for grouping problems"
    - "**Don't trust names**: In real-world data, names are not unique identifiers — only concrete links (like shared emails) can establish identity"
    - "**Transitive relationships**: Union-Find elegantly handles chains of relationships without explicit graph construction"

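  time_complexity: "O(n × k × log(n × k)) where `n` is the number of accounts and `k` is the average number of emails per account. The Union-Find operations cost O(n × k × α(n × k)) in total, where the inverse Ackermann function `α` grows so slowly it's effectively constant, so sorting the emails of each merged account dominates."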
  space_complexity: "O(n × k). We store each email once in the parent dictionary and once in the grouping phase."

solutions:
  - approach_name: Union-Find
    is_optimal: true
    code: |
      def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
          # Union-Find helper functions with path compression
          def find(x: str) -> str:
              # Find root with path compression
              if parent[x] != x:
                  parent[x] = find(parent[x])  # Flatten the tree
              return parent[x]

          def union(x: str, y: str) -> None:
              # Union two emails by connecting their roots
              root_x, root_y = find(x), find(y)
              if root_x != root_y:
                  # Union by rank for efficiency
                  if rank[root_x] < rank[root_y]:
                      parent[root_x] = root_y
                  elif rank[root_x] > rank[root_y]:
                      parent[root_y] = root_x
                  else:
                      parent[root_y] = root_x
                      rank[root_x] += 1

          # Initialise Union-Find structures
          parent = {}  # Maps email -> parent email
          rank = {}    # Upper bound on each root's tree height, used for union by rank
          email_to_name = {}  # Maps email -> account name

          # Process each account
          for account in accounts:
              name = account[0]
              first_email = account[1]

              for email in account[1:]:
                  # Initialise email if not seen before
                  if email not in parent:
                      parent[email] = email
                      rank[email] = 0
                  email_to_name[email] = name

                  # Union this email with the first email in the account
                  union(first_email, email)

          # Group emails by their root parent
          from collections import defaultdict
          groups = defaultdict(list)
          for email in parent:
              root = find(email)
              groups[root].append(email)

          # Build result: [name, sorted emails...]
          result = []
          for root, emails in groups.items():
              name = email_to_name[root]
              # Sort emails alphabetically as required
              result.append([name] + sorted(emails))

          return result
    explanation: |
      **Time Complexity:** O(n × k × log(n × k)). The union/find operations take O(n × k × α(n × k)) in total, since each is nearly O(1) with path compression and union by rank, but sorting the emails of each merged account adds the log factor.

      **Space Complexity:** O(n × k) — We store each unique email in the parent dictionary, rank dictionary, and email-to-name mapping.

      The Union-Find approach efficiently groups emails by maintaining a forest of trees where each tree represents a connected component. Path compression ensures trees stay flat, and union by rank prevents worst-case linear trees.

  - approach_name: DFS Graph Traversal
    is_optimal: false
    code: |
      def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
          from collections import defaultdict

          # Build adjacency list: email -> set of connected emails
          graph = defaultdict(set)
          email_to_name = {}

          for account in accounts:
              name = account[0]
              first_email = account[1]

              for email in account[1:]:
                  # Connect all emails in this account to the first email
                  graph[first_email].add(email)
                  graph[email].add(first_email)
                  email_to_name[email] = name

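          # Iterative DFS (explicit stack) to collect one connected component.
          # A recursive version can exceed Python's default recursion limit
          # when many accounts link together into one long chain.
          def dfs(start: str, component: list[str]) -> None:
              stack = [start]
              visited.add(start)
              while stack:
                  email = stack.pop()
                  component.append(email)
                  for neighbor in graph[email]:
                      if neighbor not in visited:
                          visited.add(neighbor)
                          stack.append(neighbor)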

          visited = set()
          result = []

          # Find all connected components
          for email in graph:
              if email not in visited:
                  component = []
                  dfs(email, component)
                  # Get name from any email in the component
                  name = email_to_name[component[0]]
                  # Sort emails and build result
                  result.append([name] + sorted(component))

          return result
    explanation: |
      **Time Complexity:** O(n × k × log(n × k)) — Building the graph is O(n × k), DFS visits each email once O(n × k), and sorting each component adds the log factor.

      **Space Complexity:** O(n × k) — The graph stores edges between emails, and the visited set tracks processed emails.

      This approach explicitly builds a graph where edges connect emails that appear together in an account. DFS then finds all connected components. While conceptually clearer than Union-Find, it requires more memory for the adjacency list.

  - approach_name: BFS Graph Traversal
    is_optimal: false
    code: |
      def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
          from collections import defaultdict, deque

          # Build adjacency list
          graph = defaultdict(set)
          email_to_name = {}

          for account in accounts:
              name = account[0]
              first_email = account[1]

              for email in account[1:]:
                  graph[first_email].add(email)
                  graph[email].add(first_email)
                  email_to_name[email] = name

          # BFS to find connected component
          def bfs(start: str) -> list[str]:
              component = []
              queue = deque([start])
              visited.add(start)

              while queue:
                  email = queue.popleft()
                  component.append(email)
                  for neighbor in graph[email]:
                      if neighbor not in visited:
                          visited.add(neighbor)
                          queue.append(neighbor)

              return component

          visited = set()
          result = []

          for email in graph:
              if email not in visited:
                  component = bfs(email)
                  name = email_to_name[component[0]]
                  result.append([name] + sorted(component))

          return result
    explanation: |
      **Time Complexity:** O(n × k × log(n × k)) — Same as DFS: graph construction, traversal, and sorting.

      **Space Complexity:** O(n × k) — Graph storage plus the BFS queue in the worst case.

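      BFS finds the same components as DFS but explores the graph level by level with a queue. Because it never recurses, it cannot hit Python's recursion limit (1000 frames by default), which a recursive traversal can exceed even within this problem's constraints: up to 1000 accounts can link into a single chain of more than 1000 emails.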