title: Group Anagrams
slug: group-anagrams
difficulty: medium
leetcode_id: 49
leetcode_url: https://leetcode.com/problems/group-anagrams/
categories:
  - strings
  - hash-tables
  - sorting
patterns:
  - hashing

function_signature: "def group_anagrams(strs: list[str]) -> list[list[str]]:"
test_cases:
  visible:
    - input: { strs: ["eat", "tea", "tan", "ate", "nat", "bat"] }
      expected: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
    - input: { strs: [""] }
      expected: [[""]]
    - input: { strs: ["a"] }
      expected: [["a"]]
  hidden:
    - input: { strs: ["abc", "bca", "cab", "xyz", "zyx"] }
      expected: [["abc", "bca", "cab"], ["xyz", "zyx"]]
    - input: { strs: ["", ""] }
      expected: [["", ""]]
    - input: { strs: ["listen", "silent", "enlist"] }
      expected: [["listen", "silent", "enlist"]]
    - input: { strs: ["a", "b", "c"] }
      expected: [["a"], ["b"], ["c"]]
    - input: { strs: ["abc", "def", "ghi"] }
      expected: [["abc"], ["def"], ["ghi"]]
    - input: { strs: ["aab", "aba", "baa", "ab", "ba"] }
      expected: [["aab", "aba", "baa"], ["ab", "ba"]]
description: |
  Given an array of strings `strs`, group the **anagrams** together. You can return the answer in **any order**.

  An **anagram** is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.

constraints: |
  - `1 <= strs.length <= 10^4`
  - `0 <= strs[i].length <= 100`
  - `strs[i]` consists of lowercase English letters
examples:
  - input: 'strs = ["eat","tea","tan","ate","nat","bat"]'
    output: '[["bat"],["nat","tan"],["ate","eat","tea"]]'
    explanation: "Words with the same letters are grouped together."
  - input: 'strs = [""]'
    output: '[[""]]'
    explanation: "Empty string forms its own group."
  - input: 'strs = ["a"]'
    output: '[["a"]]'
    explanation: "Single character forms its own group."
explanation:
  intuition: |
    What makes two words anagrams? They have exactly the same letters in exactly the same quantities. "eat" and "tea" both have one 'e', one 'a', and one 't'.

    Think of it like this: if you sort the letters of any anagram, you get the same result. `''.join(sorted("eat"))` and `''.join(sorted("tea"))` both give `"aet"`. This sorted form is a **canonical representation**: a fingerprint that's identical for all anagrams.

    So the strategy is simple: for each word, compute its fingerprint (sorted letters), and group words with the same fingerprint together. A hash map is perfect for this: the fingerprint is the key, and each key maps to a list of original words.

    There's an alternative fingerprint: instead of sorting, count each letter's frequency. `"eat"` becomes a tuple of 26 counts, with a 1 at the positions for 'a', 'e', and 't' and a 0 everywhere else. Computing it is O(k) instead of O(k log k), which is asymptotically better for long strings.
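
    The two fingerprints can be compared side by side. The following is a minimal sketch; `count_fingerprint` is a hypothetical helper name used only for illustration, and it assumes lowercase a-z input as stated in the constraints.

    ```python
    def count_fingerprint(s: str) -> tuple[int, ...]:
        """Return 26 letter counts for a lowercase a-z string."""
        counts = [0] * 26
        for c in s:
            counts[ord(c) - ord("a")] += 1
        return tuple(counts)

    # Sorted-letter fingerprint: join the sorted list back into a string.
    print("".join(sorted("eat")), "".join(sorted("tea")))        # aet aet

    # Count fingerprint: anagrams match, non-anagrams differ.
    print(count_fingerprint("eat") == count_fingerprint("tea"))  # True
    print(count_fingerprint("eat") == count_fingerprint("tan"))  # False
    ```

    Either fingerprint works as a dictionary key because both a string and a tuple are hashable.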
  approach: |
    We solve this using a **hash map with sorted-string keys**:

    **Step 1: Create a hash map for grouping**

    - Use a `defaultdict(list)` so we can append to non-existent keys
    - Keys will be the canonical form (sorted string)
    - Values will be lists of original strings

    **Step 2: Process each string**

    - For each string `s` in the input:
      - Compute the key: `''.join(sorted(s))`
      - Append the original string to `groups[key]`

    **Step 3: Return all groups**

    - Return `list(groups.values())`; each value is one anagram group

    Why does sorting work? Two strings are anagrams if and only if they contain the same characters with the same counts. Sorting arranges those characters in a canonical order, so anagrams, and only anagrams, produce identical sorted strings.
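
    To make the steps concrete, here is a small trace on the first visible test case. It is only a sketch that prints the intermediate hash map rather than the final list of groups.

    ```python
    from collections import defaultdict

    strs = ["eat", "tea", "tan", "ate", "nat", "bat"]

    groups = defaultdict(list)                 # Step 1: fingerprint -> words
    for s in strs:                             # Step 2: bucket each word
        groups["".join(sorted(s))].append(s)

    print(dict(groups))
    # {'aet': ['eat', 'tea', 'ate'], 'ant': ['tan', 'nat'], 'abt': ['bat']}

    print(list(groups.values()))               # Step 3: drop the keys
    # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
    ```

    The groups come out in first-seen order because Python dictionaries preserve insertion order, although the problem accepts any order.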
  common_pitfalls:
    - title: Using Unhashable Types as Dictionary Keys
      description: |
        In Python, `sorted(s)` returns a **list**, which can't be a dictionary key (lists are mutable, hence unhashable).

        You must convert to a hashable type:
        - `''.join(sorted(s))` → string key
        - `tuple(sorted(s))` → tuple key
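
        The failure is easy to reproduce. This is a minimal sketch; the variable `error_name` exists only for illustration, and the exact wording of the error message varies between Python versions, so it is not shown.

        ```python
        from collections import defaultdict

        groups = defaultdict(list)
        s = "eat"
        error_name = None

        try:
            groups[sorted(s)].append(s)        # list key: cannot be hashed
        except TypeError as exc:
            error_name = type(exc).__name__
            print(f"{error_name} raised for a list key")

        groups["".join(sorted(s))].append(s)   # string key works
        groups[tuple(sorted(s))].append(s)     # tuple key works too
        print(dict(groups))
        ```

        The string and tuple keys are different objects, so mixing them in one map (as this demo does) would split anagrams across two buckets. Pick one form and use it consistently.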
      wrong_approach: "groups[sorted(s)].append(s)"
      correct_approach: "groups[''.join(sorted(s))].append(s)"

    - title: Forgetting Empty Strings
      description: |
        An empty string `""` is a valid input. `sorted("")` returns `[]`, and `''.join([])` returns `""`, so every empty string maps to the same key. The algorithm handles this correctly, but the case is still worth testing explicitly.
      wrong_approach: "Assuming all strings are non-empty"
      correct_approach: "Empty strings are handled naturally: they all share the empty key and land in one group"

    - title: Using Regular Dict Without Default
      description: |
        With a regular `dict`, you must check if a key exists before appending:
        ```python
        if key not in groups:
            groups[key] = []
        groups[key].append(s)
        ```
        Using `defaultdict(list)` eliminates this boilerplate.
      wrong_approach: "groups[key].append(s) with regular dict (KeyError)"
      correct_approach: "Use defaultdict(list) for automatic list creation"
  key_takeaways:
    - "**Canonical form for grouping**: Anagrams share a canonical representation (sorted or counted)"
    - "**Hash map for grouping**: When grouping by some property, use that property as the key"
    - "**Sorting vs counting**: Sorting is O(k log k), counting is O(k); counting is asymptotically faster for long strings"
    - "**defaultdict simplifies code**: Eliminates key-existence checks when building lists"

  time_complexity: "O(n × k log k), where n is the number of strings and k is the maximum string length. Sorting each string takes O(k log k). With the counting approach, this becomes O(n × k)."
  space_complexity: "O(n × k). The hash map stores all n strings, each of length up to k, plus one key per group."
solutions:
  - approach_name: Sorted String Key
    is_optimal: true
    code: |
      from collections import defaultdict

      def group_anagrams(strs: list[str]) -> list[list[str]]:
          # Map: sorted string -> list of original strings
          groups = defaultdict(list)

          for s in strs:
              # All anagrams sort to the same string
              key = ''.join(sorted(s))
              groups[key].append(s)

          # Return all groups (order doesn't matter)
          return list(groups.values())
    explanation: |
      **Time Complexity:** O(n × k log k): sorting each of n strings of maximum length k.

      **Space Complexity:** O(n × k): storing all strings in the hash map.

      Sorting gives each string a canonical form. All anagrams produce the same sorted string, so they end up in the same bucket. Simple, readable, and efficient enough for most cases.
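
      Because the answer may be returned in any order, comparing the result directly against an expected list can fail even when the grouping is right. The sketch below shows one way to compare order-insensitively; `normalize` is a hypothetical helper introduced here, not part of the problem's test harness.

      ```python
      from collections import defaultdict

      def group_anagrams(strs: list[str]) -> list[list[str]]:
          groups = defaultdict(list)
          for s in strs:
              groups["".join(sorted(s))].append(s)
          return list(groups.values())

      def normalize(groups: list[list[str]]) -> list[list[str]]:
          # Hypothetical helper: sort inside each group, then sort the groups.
          return sorted(sorted(group) for group in groups)

      result = group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
      expected = [["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]

      print(result == expected)                        # False
      print(normalize(result) == normalize(expected))  # True
      ```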
  - approach_name: Character Count Key
    is_optimal: true
    code: |
      from collections import defaultdict

      def group_anagrams(strs: list[str]) -> list[list[str]]:
          groups = defaultdict(list)

          for s in strs:
              # Count frequency of each letter (a-z)
              count = [0] * 26
              for c in s:
                  count[ord(c) - ord('a')] += 1

              # Use tuple of counts as key (tuples are hashable)
              groups[tuple(count)].append(s)

          return list(groups.values())
    explanation: |
      **Time Complexity:** O(n × k): counting is O(k) per string, better than O(k log k) sorting.

      **Space Complexity:** O(n × k): same as the sorted approach.

      Instead of sorting, we count the frequency of each letter. Two strings are anagrams if and only if they have identical character counts. The count array is converted to a tuple (hashable) for use as a dictionary key. This is asymptotically faster for long strings.
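
      The O(k) bound is asymptotic. In CPython, `sorted` runs in compiled code while the counting loop runs as Python bytecode, so for strings of at most 100 characters the measured difference may be small or even reversed. The following is a rough sketch for checking this on your own machine; the helper names and the test string are assumptions chosen for illustration, and no timings are claimed here.

      ```python
      import timeit

      def sorted_key(s: str) -> str:
          return "".join(sorted(s))

      def count_key(s: str) -> tuple[int, ...]:
          counts = [0] * 26
          for c in s:
              counts[ord(c) - ord("a")] += 1
          return tuple(counts)

      # Hypothetical input at the constraint's maximum length of 100.
      s = ("abcdefghijklmnopqrstuvwxyz" * 4)[:100]

      for key_fn in (sorted_key, count_key):
          seconds = timeit.timeit(lambda: key_fn(s), number=10_000)
          print(f"{key_fn.__name__}: {seconds:.4f} s for 10,000 calls")
      ```

      Results depend on the interpreter version and hardware, so treat the output as a local measurement rather than a general ranking.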