# codetutor/backend/data/questions/majority-element-ii.yaml
title: Majority Element II
slug: majority-element-ii
difficulty: medium
leetcode_id: 229
leetcode_url: https://leetcode.com/problems/majority-element-ii/
categories:
- arrays
- hash-tables
patterns:
- slug: greedy
  is_optimal: true
function_signature: "def majority_element(nums: list[int]) -> list[int]:"
test_cases:
  visible:
  - input: { nums: [3, 2, 3] }
    expected: [3]
  - input: { nums: [1] }
    expected: [1]
  - input: { nums: [1, 2] }
    expected: [1, 2]
  hidden:
  - input: { nums: [1, 1, 1, 2, 2, 3] }
    expected: [1]
  - input: { nums: [1, 2, 3, 4, 5] }
    expected: []
  - input: { nums: [2, 2, 2, 2] }
    expected: [2]
  - input: { nums: [1, 1, 2, 2, 3] }
    expected: [1, 2]
  - input: { nums: [0, 0, 0] }
    expected: [0]
  - input: { nums: [-1, -1, -1, 2, 2] }
    expected: [-1]
  - input: { nums: [1, 2, 1, 2, 1, 2, 3] }
    expected: [1, 2]
description: |
Given an integer array of size `n`, find all elements that appear **more than** `⌊n/3⌋` times.
examples:
- input: "nums = [3,2,3]"
  output: "[3]"
  explanation: "The element 3 appears twice out of 3 elements. Since ⌊3/3⌋ = 1, and 2 > 1, the answer is [3]."
- input: "nums = [1]"
  output: "[1]"
  explanation: "The element 1 appears once out of 1 element. Since ⌊1/3⌋ = 0, and 1 > 0, the answer is [1]."
- input: "nums = [1,2]"
  output: "[1,2]"
  explanation: "Both 1 and 2 appear once out of 2 elements. Since ⌊2/3⌋ = 0, and 1 > 0, both qualify."
constraints: |
- `1 <= nums.length <= 5 * 10^4`
- `-10^9 <= nums[i] <= 10^9`
explanation:
intuition: |
This problem extends the classic Majority Element problem. Instead of finding elements appearing more than `n/2` times, we're looking for elements appearing more than `n/3` times.
Here's the key mathematical insight: **at most two elements** can appear more than `n/3` times. Why? If three elements each appeared more than `n/3` times, we'd need more than `n` elements total — impossible!
Think of it like an election where a candidate needs more than one third of the votes to win. At most two candidates can clear that bar: if three candidates each held more than a third, their shares would add up to more than 100%.
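The counting argument can be written out as a short derivation. This is a sketch only; `cnt(x)` is notation introduced here for the number of occurrences of `x` in `nums`, and `x_1, x_2, x_3` stand for three hypothetical distinct values.

```latex
% cnt(x): number of occurrences of x in nums (notation assumed for this sketch)
\begin{align*}
\operatorname{cnt}(x) > \lfloor n/3 \rfloor
  &\implies \operatorname{cnt}(x) \ge \lfloor n/3 \rfloor + 1 > \frac{n}{3}
  && \text{(counts are integers)} \\
\operatorname{cnt}(x_1) + \operatorname{cnt}(x_2) + \operatorname{cnt}(x_3)
  &> 3 \cdot \frac{n}{3} = n
  && \text{(if three distinct values all qualified)} \\
\operatorname{cnt}(x_1) + \operatorname{cnt}(x_2) + \operatorname{cnt}(x_3)
  &\le n
  && \text{(distinct values occupy distinct positions)}
\end{align*}
```

The last two lines contradict each other, so at most two values can qualify.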
This observation allows us to extend the **Boyer-Moore Voting Algorithm** to track two candidates instead of one. We run a "battle royale" where elements compete for two slots. When an element matches neither candidate and both slots are occupied, it cancels out one vote from each candidate.
At the end, we verify which candidates (if any) actually exceed the `n/3` threshold — unlike the original problem, there's no guarantee any element qualifies.
approach: |
We solve this using the **Extended Boyer-Moore Voting Algorithm**:
**Step 1: Initialise two candidate slots**
- `candidate1`, `candidate2`: Start as `None` and will store our two potential majority elements
- `count1`, `count2`: Set to `0`, track the "strength" of each candidate
&nbsp;
**Step 2: First pass — find the candidates**
- For each element in the array:
  - If it matches `candidate1`, increment `count1`
  - Else if it matches `candidate2`, increment `count2`
  - Else if `count1 == 0`, adopt this element as `candidate1` and set `count1 = 1`
  - Else if `count2 == 0`, adopt this element as `candidate2` and set `count2 = 1`
  - Else decrement both `count1` and `count2` (three distinct elements cancel out)
&nbsp;
**Step 3: Second pass — verify the candidates**
- Count actual occurrences of `candidate1` and `candidate2`
- Only include candidates that appear more than `n/3` times in the result
- Unlike the original problem, it is possible that neither candidate qualifies
&nbsp;
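The cancellation logic works because each cancellation step discards three distinct values at once: the current element plus one vote from each candidate. At most `⌊n/3⌋` such steps can occur, so an element appearing more than `n/3` times cannot have all of its occurrences cancelled and must survive as one of the two candidates.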
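To make the first pass concrete, here is a minimal sketch that records the state of both slots after each element. The helper name `trace_first_pass` and the tuple layout are assumptions made for illustration; the branch order follows Step 2 above.

```python
def trace_first_pass(nums: list[int]) -> list[tuple]:
    """Return (num, candidate1, count1, candidate2, count2) after each element."""
    candidate1, candidate2 = None, None
    count1, count2 = 0, 0
    states = []
    for num in nums:
        if candidate1 == num:
            count1 += 1
        elif candidate2 == num:
            count2 += 1
        elif count1 == 0:
            candidate1, count1 = num, 1
        elif count2 == 0:
            candidate2, count2 = num, 1
        else:
            # num matches neither candidate: cancel one vote from each slot
            count1 -= 1
            count2 -= 1
        states.append((num, candidate1, count1, candidate2, count2))
    return states


for state in trace_first_pass([1, 1, 1, 2, 2, 3]):
    print(state)
# Final state: (3, 1, 2, 2, 1), so the candidates are 1 and 2
```

In this trace both `1` and `2` survive the first pass, but only `1` appears more than `⌊6/3⌋ = 2` times, which is why Step 3 is still needed.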
common_pitfalls:
- title: Forgetting the Verification Pass
description: |
Unlike Majority Element I where the majority is guaranteed, this problem may have zero, one, or two valid answers.
For example, with `nums = [1,2,3,4,5]`, no element appears more than `⌊5/3⌋ = 1` time. The Boyer-Moore phase will still produce two candidates, but neither actually qualifies.
Always verify candidates with a second pass to count their actual occurrences.
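The following sketch shows the difference on that input. The helper name `first_pass_candidates` is hypothetical; it runs only the candidate-selection pass and returns whatever is left in the two slots.

```python
def first_pass_candidates(nums: list[int]) -> tuple:
    """Run only the voting pass and return the two surviving candidates."""
    candidate1, candidate2 = None, None
    count1, count2 = 0, 0
    for num in nums:
        if candidate1 == num:
            count1 += 1
        elif candidate2 == num:
            count2 += 1
        elif count1 == 0:
            candidate1, count1 = num, 1
        elif count2 == 0:
            candidate2, count2 = num, 1
        else:
            count1 -= 1
            count2 -= 1
    return candidate1, candidate2


nums = [1, 2, 3, 4, 5]
candidates = first_pass_candidates(nums)
threshold = len(nums) // 3

# Keep only candidates whose real occurrence count exceeds the threshold
verified = [c for c in candidates if nums.count(c) > threshold]

print(candidates)  # (4, 5): leftovers from the voting pass, not real answers
print(verified)    # []
```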
wrong_approach: "Returning candidates without verification"
correct_approach: "Count actual occurrences and filter by threshold"
- title: Using Hash Map Without Space Constraint Awareness
description: |
A hash map solution works and runs in O(n) time, but uses **O(n) space**. The follow-up specifically asks for O(1) space, which the extended Boyer-Moore algorithm achieves.
The hash map approach is acceptable if space isn't a concern, but the optimal solution uses constant space.
wrong_approach: "Hash map counting with O(n) space"
correct_approach: "Extended Boyer-Moore with O(1) space"
- title: Incorrect Order of Candidate Checks
description: |
The order of checks matters in the first pass. You must check if the element matches existing candidates *before* checking if a slot is available.
If you check `count1 == 0` first, you might reassign `candidate1` to an element that should have been counted under `candidate2`, corrupting your counts.
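A small experiment can make this visible. The function below is a sketch written for this comparison only: `find_majority` and its `matches_first` flag are hypothetical names, and the input `[1, 1, 2, 3, 2]` was chosen because the two orderings disagree on it.

```python
def find_majority(nums: list[int], matches_first: bool) -> list[int]:
    candidate1, candidate2 = None, None
    count1, count2 = 0, 0
    for num in nums:
        if matches_first:
            # Recommended order: matches, then empty slots
            if candidate1 == num:
                count1 += 1
            elif candidate2 == num:
                count2 += 1
            elif count1 == 0:
                candidate1, count1 = num, 1
            elif count2 == 0:
                candidate2, count2 = num, 1
            else:
                count1 -= 1
                count2 -= 1
        else:
            # Pitfall order: empty slots, then matches
            if count1 == 0:
                candidate1, count1 = num, 1
            elif count2 == 0:
                candidate2, count2 = num, 1
            elif candidate1 == num:
                count1 += 1
            elif candidate2 == num:
                count2 += 1
            else:
                count1 -= 1
                count2 -= 1
    # Verification pass (a set avoids reporting the same value twice)
    threshold = len(nums) // 3
    survivors = {candidate1, candidate2} - {None}
    return sorted(c for c in survivors if nums.count(c) > threshold)


nums = [1, 1, 2, 3, 2]  # both 1 and 2 appear twice, threshold is 1
print(find_majority(nums, matches_first=True))   # [1, 2]
print(find_majority(nums, matches_first=False))  # [2]
```

With slots checked first, the second `1` is adopted as `candidate2` instead of being counted under `candidate1`, and `1` is later cancelled out entirely, so even the verification pass cannot recover it.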
wrong_approach: "Checking for empty slots before checking for matches"
correct_approach: "Check matches first, then check for empty slots"
- title: Not Handling Duplicate Candidates
description: |
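The two slots must always track different values. If both slots hold the same value, you're effectively only tracking one element and may report it twice.
With matches checked first, an element that reaches the `count2 == 0` branch is already known to differ from `candidate1`, so no extra comparison is needed there. Duplicates typically creep in when empty slots are checked before matches, or when both candidates are initialised to the same value that can also occur in the input (such as `0`).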
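One way duplicates can arise is sketched below, under the assumption that both slots start at `0` rather than `None`. The function name `majority_shared_start` and the `dedupe` flag are hypothetical and exist only to toggle the final distinctness check.

```python
def majority_shared_start(nums: list[int], dedupe: bool) -> list[int]:
    # Assumption for this sketch: both slots start at 0, a value that can
    # also occur in the input, so the slots are not distinct initially
    candidate1, candidate2 = 0, 0
    count1, count2 = 0, 0
    for num in nums:
        if candidate1 == num:
            count1 += 1
        elif candidate2 == num:
            count2 += 1
        elif count1 == 0:
            candidate1, count1 = num, 1
        elif count2 == 0:
            candidate2, count2 = num, 1
        else:
            count1 -= 1
            count2 -= 1

    threshold = len(nums) // 3
    result = []
    if nums.count(candidate1) > threshold:
        result.append(candidate1)
    # The distinctness check is applied only when dedupe is True
    if nums.count(candidate2) > threshold and (not dedupe or candidate2 != candidate1):
        result.append(candidate2)
    return result


print(majority_shared_start([0, 0, 0], dedupe=False))  # [0, 0]
print(majority_shared_start([0, 0, 0], dedupe=True))   # [0]
```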
wrong_approach: "Allowing candidate1 and candidate2 to hold the same value"
correct_approach: "Ensure candidates are always distinct"
key_takeaways:
- "**Mathematical bound**: At most `k-1` elements can appear more than `n/k` times — this generalises Boyer-Moore"
- "**Verification is essential**: Unlike guaranteed-majority problems, always verify candidates when existence isn't guaranteed"
- "**Order of operations matters**: Check existing candidates before checking for empty slots"
- "**Foundation for generalisations**: The same technique extends to finding elements appearing more than `n/4`, `n/5`, etc., by tracking more candidate slots"
time_complexity: "O(n). We make two passes through the array — one for candidate selection, one for verification."
space_complexity: "O(1). We only use a fixed number of variables (two candidates, two counts) regardless of input size."
solutions:
- approach_name: Extended Boyer-Moore Voting
is_optimal: true
code: |
def majority_element(nums: list[int]) -> list[int]:
    # At most 2 elements can appear more than n/3 times
    candidate1, candidate2 = None, None
    count1, count2 = 0, 0

    # First pass: find potential candidates
    for num in nums:
        # Check matches first (order matters!)
        if candidate1 == num:
            count1 += 1
        elif candidate2 == num:
            count2 += 1
        # Then check for empty slots
        elif count1 == 0:
            candidate1 = num
            count1 = 1
        elif count2 == 0:
            candidate2 = num
            count2 = 1
        # Three distinct elements: cancel one from each
        else:
            count1 -= 1
            count2 -= 1

    # Second pass: verify candidates actually exceed threshold
    threshold = len(nums) // 3
    result = []

    # Count actual occurrences
    count1 = sum(1 for num in nums if num == candidate1)
    count2 = sum(1 for num in nums if num == candidate2)

    # Only include if they exceed n/3
    if count1 > threshold:
        result.append(candidate1)
    if candidate2 != candidate1 and count2 > threshold:
        result.append(candidate2)

    return result
explanation: |
**Time Complexity:** O(n) — Two passes through the array.
**Space Complexity:** O(1) — Only a fixed number of variables used.
The algorithm extends Boyer-Moore to track two candidates. The key insight is that at most two elements can exceed the `n/3` threshold. When three distinct elements are seen, they cancel each other out. A verification pass confirms which candidates actually qualify.
- approach_name: Hash Map Counting
is_optimal: false
code: |
from collections import Counter


def majority_element(nums: list[int]) -> list[int]:
    # Count occurrences of each element
    counts = Counter(nums)
    threshold = len(nums) // 3

    # Return all elements exceeding the threshold
    return [num for num, count in counts.items() if count > threshold]
explanation: |
**Time Complexity:** O(n) — Single pass to build the counter.
**Space Complexity:** O(n) — Hash map stores up to n distinct elements.
This approach is intuitive and easy to implement. It counts all elements and filters by the threshold. While correct, it uses more space than the optimal Boyer-Moore solution.