# codetutor/backend/data/questions/group-anagrams.yaml

title: Group Anagrams
slug: group-anagrams
difficulty: medium
leetcode_id: 49
leetcode_url: https://leetcode.com/problems/group-anagrams/
categories:
  - strings
  - hash-tables
  - sorting
patterns:
  - hashing

function_signature: "def group_anagrams(strs: list[str]) -> list[list[str]]:"

test_cases:
  visible:
    - input: { strs: ["eat", "tea", "tan", "ate", "nat", "bat"] }
      expected: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
    - input: { strs: [""] }
      expected: [[""]]
    - input: { strs: ["a"] }
      expected: [["a"]]
  hidden:
    - input: { strs: ["abc", "bca", "cab", "xyz", "zyx"] }
      expected: [["abc", "bca", "cab"], ["xyz", "zyx"]]
    - input: { strs: ["", ""] }
      expected: [["", ""]]
    - input: { strs: ["listen", "silent", "enlist"] }
      expected: [["listen", "silent", "enlist"]]

description: |
  Given an array of strings `strs`, group the **anagrams** together. You can return the answer in **any order**.

  An **anagram** is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.

constraints: |
  - `1 <= strs.length <= 10^4`
  - `0 <= strs[i].length <= 100`
  - `strs[i]` consists of lowercase English letters

examples:
  - input: 'strs = ["eat","tea","tan","ate","nat","bat"]'
    output: '[["bat"],["nat","tan"],["ate","eat","tea"]]'
    explanation: "Words with the same letters are grouped together."
  - input: 'strs = [""]'
    output: '[[""]]'
    explanation: "Empty string forms its own group."
  - input: 'strs = ["a"]'
    output: '[["a"]]'
    explanation: "Single character forms its own group."

explanation:
  intuition: |
    What makes two words anagrams? They have exactly the same letters in exactly the same quantities. "eat" and "tea" both have one 'e', one 'a', and one 't'.

    Think of it like this: if you sort the letters of any anagram, you get the same result. `''.join(sorted("eat"))` gives `"aet"`, and so does `''.join(sorted("tea"))`. This sorted form is a **canonical representation**: a fingerprint that's identical for all anagrams.

    So the strategy is simple: for each word, compute its fingerprint (sorted letters), and group words with the same fingerprint together. A hash map is perfect for this — the fingerprint is the key, and each key maps to a list of original words.

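    There's an alternative fingerprint: instead of sorting, count each letter's frequency. `"eat"` becomes `(1,0,0,0,1,0,...,0,1,0,0,0,0,0,0)`, a tuple of 26 counts with a 1 at the positions for 'a', 'e', and 't'. Building it takes O(k) time for a string of length k instead of O(k log k) for sorting, which is better for long strings.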
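
    As a rough sketch, both fingerprints can be written as small helpers. The names `sorted_key` and `count_key` are illustrative and not part of the required signature; `count_key` assumes lowercase a-z input, as the constraints state:

    ```python
    def sorted_key(s: str) -> str:
        # sorted() returns a list of characters, so join it back into a string
        return "".join(sorted(s))

    def count_key(s: str) -> tuple[int, ...]:
        # Assumption: only lowercase a-z, so 26 slots are enough
        counts = [0] * 26
        for c in s:
            counts[ord(c) - ord("a")] += 1
        return tuple(counts)

    print(sorted_key("eat"), sorted_key("tea"))  # aet aet
    print(count_key("eat") == count_key("tea"))  # True
    print(count_key("eat") == count_key("bat"))  # False
    ```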

  approach: |
    We solve this using **Hash Map with Sorted String Keys**:

    **Step 1: Create a hash map for grouping**

    - Use a `defaultdict(list)` so we can append to non-existent keys
    - Keys will be the canonical form (sorted string)
    - Values will be lists of original strings

    &nbsp;

    **Step 2: Process each string**

    - For each string `s` in the input:
      - Compute the key: `''.join(sorted(s))`
      - Append the original string to `groups[key]`

    &nbsp;

    **Step 3: Return all groups**

    - Return `list(groups.values())` — each value is one anagram group

    &nbsp;

    Why does sorting work? Two strings are anagrams if and only if they contain the same characters with the same counts. Sorting arranges those characters in a canonical order, so anagrams produce identical sorted strings and non-anagrams never do.
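
    A quick check of why the key has to keep duplicate letters. The strings `"aab"` and `"abb"` are illustrative and not taken from the test cases; comparing only the distinct letters would wrongly treat them as anagrams:

    ```python
    a, b = "aab", "abb"

    # Same set of distinct letters, but different counts: not anagrams
    print(set(a) == set(b))        # True
    print(sorted(a) == sorted(b))  # False

    # Real anagrams agree on the full sorted sequence
    print(sorted("listen") == sorted("silent"))  # True
    ```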

  common_pitfalls:
    - title: Using Unhashable Types as Dictionary Keys
      description: |
        In Python, `sorted(s)` returns a **list**, which can't be a dictionary key (lists are mutable, hence unhashable).

        You must convert to a hashable type:
        - `''.join(sorted(s))` → string key
        - `tuple(sorted(s))` → tuple key
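
        A minimal sketch of the failure and both fixes. The exact error message text varies between Python versions, so only the exception type is relied on here:

        ```python
        from collections import defaultdict

        groups = defaultdict(list)
        s = "eat"

        try:
            groups[sorted(s)].append(s)  # list key: unhashable
        except TypeError as exc:
            print(f"TypeError: {exc}")

        groups["".join(sorted(s))].append(s)  # string key works
        groups[tuple(sorted(s))].append(s)    # tuple key works
        print(len(groups))  # 2
        ```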
      wrong_approach: "groups[sorted(s)].append(s)"
      correct_approach: "groups[''.join(sorted(s))].append(s)"

    - title: Forgetting Empty Strings
      description: |
        An empty string `""` is a valid input. `sorted("")` returns `[]`, and `''.join([])` returns `""`. The algorithm handles this correctly, but edge case testing is important.
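
        A small check of that behavior, using an illustrative input with two empty strings:

        ```python
        from collections import defaultdict

        groups = defaultdict(list)
        for s in ["", "b", ""]:
            # "".join(sorted("")) is "", so every empty string shares one key
            groups["".join(sorted(s))].append(s)

        print(list(groups.values()))  # [['', ''], ['b']]
        ```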
      wrong_approach: "Assuming all strings are non-empty"
      correct_approach: "Empty strings are handled naturally — they form their own group"

    - title: Using Regular Dict Without Default
      description: |
        With a regular `dict`, you must check if a key exists before appending:
        ```python
        if key not in groups:
            groups[key] = []
        groups[key].append(s)
        ```
        Using `defaultdict(list)` eliminates this boilerplate.
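
        If you would rather stay with a plain `dict`, one possible alternative is `dict.setdefault`, which inserts the empty list only when the key is missing. This is a sketch of that option on an illustrative input, not the approach used in the solutions:

        ```python
        groups: dict[str, list[str]] = {}

        for s in ["tan", "nat", "bat"]:
            key = "".join(sorted(s))
            # setdefault returns the existing list, or stores and returns []
            groups.setdefault(key, []).append(s)

        print(groups)  # {'ant': ['tan', 'nat'], 'abt': ['bat']}
        ```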
      wrong_approach: "groups[key].append(s) with regular dict (KeyError)"
      correct_approach: "Use defaultdict(list) for automatic list creation"

  key_takeaways:
    - "**Canonical form for grouping**: Anagrams share a canonical representation (sorted or counted)"
    - "**Hash map for grouping**: When grouping by some property, use that property as the key"
    - "**Sorting vs counting**: Sorting is O(k log k), counting is O(k) — counting is faster for long strings"
    - "**defaultdict simplifies code**: Eliminates key-existence checks when building lists"

  time_complexity: "O(n × k log k). We process n strings, and sorting each string of length k takes O(k log k). With the counting approach, this becomes O(n × k)."
  space_complexity: "O(n × k). We store all n strings in the hash map. Each string has length up to k."

solutions:
  - approach_name: Sorted String Key
    is_optimal: true
    code: |
      from collections import defaultdict

      def group_anagrams(strs: list[str]) -> list[list[str]]:
          # Map: sorted string -> list of original strings
          groups = defaultdict(list)

          for s in strs:
              # All anagrams sort to the same string
              key = ''.join(sorted(s))
              groups[key].append(s)

          # Return all groups (order doesn't matter)
          return list(groups.values())
    explanation: |
      **Time Complexity:** O(n × k log k). Sorting each of the n strings, of length at most k, takes O(k log k).

      **Space Complexity:** O(n × k) — Storing all strings in the hash map.

      Sorting gives each string a canonical form. All anagrams produce the same sorted string, so they end up in the same bucket. Simple, readable, and efficient enough for most cases.

  - approach_name: Character Count Key
    is_optimal: true
    code: |
      from collections import defaultdict

      def group_anagrams(strs: list[str]) -> list[list[str]]:
          groups = defaultdict(list)

          for s in strs:
              # Count frequency of each letter (a-z)
              count = [0] * 26
              for c in s:
                  count[ord(c) - ord('a')] += 1

              # Use tuple of counts as key (tuples are hashable)
              groups[tuple(count)].append(s)

          return list(groups.values())
    explanation: |
      **Time Complexity:** O(n × k) — Counting is O(k) per string, better than O(k log k) sorting.

      **Space Complexity:** O(n × k) — Same as sorted approach.

      Instead of sorting, we count the frequency of each letter. Two strings are anagrams if and only if they have identical character counts. The count array is converted to a tuple (hashable) for use as a dictionary key. This is faster for long strings.
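
      Both solutions return groups in dictionary insertion order, while the problem accepts any order. When checking a result by hand, one option is to normalize both sides first. The helper name `normalize` is hypothetical and not part of this file's test harness:

      ```python
      def normalize(groups: list[list[str]]) -> list[list[str]]:
          # Sort the words inside each group, then sort the groups themselves
          return sorted(sorted(g) for g in groups)

      expected = [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
      actual = [["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]

      print(normalize(expected) == normalize(actual))  # True
      ```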