253 lines
10 KiB
YAML
title: Find the Duplicate Number
slug: find-the-duplicate-number
difficulty: medium
leetcode_id: 287
leetcode_url: https://leetcode.com/problems/find-the-duplicate-number/
categories:
- arrays
- two-pointers
patterns:
- slug: fast-slow-pointers
is_optimal: true
- slug: binary-search
is_optimal: false

function_signature: "def find_duplicate(nums: list[int]) -> int:"

test_cases:
visible:
- input: { nums: [1, 3, 4, 2, 2] }
expected: 2
- input: { nums: [3, 1, 3, 4, 2] }
expected: 3
- input: { nums: [3, 3, 3, 3, 3] }
expected: 3
hidden:
- input: { nums: [1, 1] }
expected: 1
- input: { nums: [2, 2, 2, 2, 2] }
expected: 2
- input: { nums: [1, 4, 4, 2, 4] }
expected: 4
- input: { nums: [1, 2, 3, 4, 5, 6, 7, 8, 9, 5] }
expected: 5
- input: { nums: [2, 5, 9, 6, 9, 3, 8, 9, 7, 1] }
expected: 9
- input: { nums: [1, 1, 2] }
expected: 1

description: |
You are given an array of integers `nums` containing `n + 1` integers, where each integer is in the range `[1, n]` inclusive.

There is only **one repeated number** in `nums`; return *this repeated number*.

You must solve the problem **without** modifying the array `nums` and using only constant extra space.

constraints: |
- `1 <= n <= 10^5`
- `nums.length == n + 1`
- `1 <= nums[i] <= n`
- All the integers in `nums` appear only **once** except for **precisely one integer** which appears **two or more** times

examples:
- input: "nums = [1,3,4,2,2]"
output: "2"
explanation: "The number 2 appears twice in the array."
- input: "nums = [3,1,3,4,2]"
output: "3"
explanation: "The number 3 appears twice in the array."
- input: "nums = [3,3,3,3,3]"
output: "3"
explanation: "The number 3 appears five times in the array."

explanation:
intuition: |
This problem has a beautiful constraint: the array has `n + 1` elements but values are only in the range `[1, n]`. By the **Pigeonhole Principle**, at least one value must repeat.

The key insight is to view the array as a **linked list** where each value points to the next index. Since values are in `[1, n]` and we have indices `[0, n]`, treating `nums[i]` as a "next pointer" always lands on a valid index.

Think of it like this: if we start at index `0` and repeatedly jump to `nums[current_index]`, we create a sequence. No value is `0`, so nothing points back to index `0` and the walk starts outside any cycle. Because one number repeats, two different indices point to the same location, which creates a **cycle**! The duplicate number is the entry point of this cycle.

For example, with `nums = [1,3,4,2,2]`:
- Index 0 → value 1 → jump to index 1
- Index 1 → value 3 → jump to index 3
- Index 3 → value 2 → jump to index 2
- Index 2 → value 4 → jump to index 4
- Index 4 → value 2 → jump to index 2 (cycle!)

The cycle exists because both index 3 and index 4 have value `2`. Floyd's Tortoise and Hare algorithm finds exactly where this cycle begins.
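
The walk above can be reproduced with a short helper. This is a minimal sketch for illustration only: `trace_walk` and `max_steps` are names made up here, and the helper uses a set, so it is not the constant-space solution.

```python
def trace_walk(nums: list[int], max_steps: int = 10) -> list[int]:
    """Return the indices visited when following index -> nums[index] from 0.

    Stops as soon as an index is reached a second time (that index is kept
    at the end of the list so the cycle is visible) or after max_steps jumps.
    """
    visited = [0]
    seen = {0}  # teaching aid only: this costs O(n) extra space
    for _ in range(max_steps):
        nxt = nums[visited[-1]]
        visited.append(nxt)
        if nxt in seen:
            break
        seen.add(nxt)
    return visited


path = trace_walk([1, 3, 4, 2, 2])
print(" -> ".join(map(str, path)))  # 0 -> 1 -> 3 -> 2 -> 4 -> 2
```

The first index to be reached twice is `2`, which is the duplicated value.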

approach: |
We solve this using **Floyd's Cycle Detection** (Tortoise and Hare):

**Step 1: Detect the cycle**

- `slow`: Moves one step at a time (`slow = nums[slow]`)
- `fast`: Moves two steps at a time (`fast = nums[nums[fast]]`)
- Both start at index `0`, so take at least one step before comparing them
- Keep moving until they meet, which must happen somewhere inside the cycle

**Step 2: Find the cycle entrance**

- Reset `slow` to index `0`, keep `fast` at the meeting point
- Move both pointers one step at a time
- The point where they meet again is the cycle entrance, which is the duplicate number

**Why does this work?**

Let `F` be the distance from the start to the cycle entrance and `C` the cycle length. When slow and fast first meet:
- Slow has traveled `F + a` steps (where `a` is the distance from the entrance to the meeting point)
- Fast has traveled `2(F + a)` steps
- Both are at the same position in the cycle, so the extra distance fast covered is a whole number of laps: `2(F + a) - (F + a) = kC` for some integer `k >= 1`, so `F + a = kC`

This means `F = kC - a`. When we reset slow to the start and both move at the same speed, slow travels `F` steps to reach the entrance, while fast travels `F = kC - a` steps from its position `a` into the cycle. That puts fast a whole number of laps past the entrance, so it is also at the entrance!
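
The relationship between `F`, `C`, and `a` can be checked numerically. The sketch below is a checking aid under stated assumptions, not part of the solution: `measure` is a hypothetical helper, both pointers start at index `0`, and it uses a dictionary (O(n) space) to locate the cycle directly.

```python
def measure(nums: list[int]) -> tuple[int, int, int]:
    """Return (F, C, a) for the walk 0 -> nums[0] -> nums[nums[0]] -> ...

    F: steps from index 0 to the cycle entrance
    C: cycle length
    a: steps from the entrance to the first slow/fast meeting point
    """
    # Record the step at which each index is first reached.
    first_seen: dict[int, int] = {}
    node, step = 0, 0
    while node not in first_seen:
        first_seen[node] = step
        node, step = nums[node], step + 1
    F = first_seen[node]  # node is the first index reached twice: the entrance
    C = step - F          # steps taken between its two visits

    # Phase 1 of Floyd's algorithm, counting how far slow travels.
    slow = fast = 0
    slow_steps = 0
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        slow_steps += 1
        if slow == fast:
            break
    a = slow_steps - F
    return F, C, a


F, C, a = measure([1, 3, 4, 2, 2])
print(F, C, a)           # 3 2 1
print((F + a) % C == 0)  # True
```

For this input `F + a = 4`, which is two full laps of the length-2 cycle.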

**Step 3: Return the result**

- The meeting point in phase 2 is the duplicate value

common_pitfalls:
- title: Using Extra Space
description: |
A common first instinct is to use a hash set to track seen numbers:

```python
def find_duplicate(nums: list[int]) -> int:
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)
```

While this works and runs in O(n) time, it uses O(n) space. The problem explicitly requires **O(1) space**, so this approach violates the constraints.
wrong_approach: "Hash set to track seen numbers"
correct_approach: "Floyd's cycle detection using the array itself"

- title: Modifying the Array
description: |
Another tempting approach is to mark visited indices by negating values:

```python
def find_duplicate(nums: list[int]) -> int:
    for num in nums:
        idx = abs(num)  # undo any negation applied earlier
        if nums[idx] < 0:
            return idx  # idx was already marked, so it is the duplicate
        nums[idx] = -nums[idx]  # mark idx as seen (mutates the input)
```

This is O(n) time and O(1) space, but it **modifies the input array**, which the problem forbids. The cycle detection approach leaves the array untouched.
wrong_approach: "Negating values to mark as visited"
correct_approach: "Read-only traversal with two pointers"

- title: Sorting the Array
description: |
Sorting and finding adjacent duplicates is intuitive but has two problems:
- It modifies the array (or requires O(n) space for a copy)
- It's O(n log n) time, not optimal

The cycle detection method achieves O(n) time with O(1) space without modification.
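
For comparison, the sorting idea might look like the sketch below. The name `find_duplicate_by_sorting` is made up for this illustration; the version shown sorts a copy, so it leaves `nums` alone but pays O(n) extra space.

```python
def find_duplicate_by_sorting(nums: list[int]) -> int:
    # sorted() returns a new list: nums is untouched, but the copy is O(n) space
    ordered = sorted(nums)
    # After sorting, equal values sit next to each other
    for prev, curr in zip(ordered, ordered[1:]):
        if prev == curr:
            return curr
    return -1  # unreachable when the input satisfies the constraints


print(find_duplicate_by_sorting([1, 3, 4, 2, 2]))  # 2
```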
wrong_approach: "Sort and find adjacent duplicates"
correct_approach: "Floyd's algorithm for O(n) time, O(1) space"

- title: Confusing Index with Value
description: |
In Floyd's algorithm, we treat values as pointers to indices. A common mistake is confusing when to use the value versus the index.

Remember: `slow = nums[slow]` means "jump to the index that equals the current value." The cycle entrance is the index that two different positions point to, so its index number is the duplicated **value**. After phase 2, return the pointer itself (`slow`), not `nums[slow]`.
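
A small check on `nums = [1,3,4,2,2]` may help keep the two roles apart. This is only an illustrative sketch; `entrance` and `pointing_in` are names chosen here, and the entrance index is written in by hand rather than computed.

```python
nums = [1, 3, 4, 2, 2]
entrance = 2  # cycle entrance of the walk 0 -> 1 -> 3 -> 2 -> 4 -> 2

# Every position whose value equals `entrance` is a node pointing into it.
pointing_in = [i for i, v in enumerate(nums) if v == entrance]

print(pointing_in)     # [3, 4]: two positions hold the value 2
print(entrance)        # 2: the duplicate is the pointer itself
print(nums[entrance])  # 4: the node after the entrance, not the answer
```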

key_takeaways:
- "**Cycle detection pattern**: When array values can be treated as pointers (value in valid index range), consider Floyd's algorithm"
- "**Pigeonhole Principle**: With `n + 1` items in `n` slots, at least one slot must have multiple items, guaranteeing a duplicate exists"
- "**Creative problem reframing**: Transforming an array duplicate problem into a linked list cycle problem unlocks an elegant O(1) space solution"
- "**Two-phase approach**: First find a meeting point inside the cycle (fast catches slow), then find *where* the cycle starts (both at same speed)"

time_complexity: "O(n). Each pointer traverses at most O(n) steps in both phases."
space_complexity: "O(1). Only two pointer variables are used, regardless of input size."

solutions:
- approach_name: Floyd's Cycle Detection
is_optimal: true
code: |
def find_duplicate(nums: list[int]) -> int:
    # Phase 1: Find the intersection point in the cycle
    slow = nums[0]
    fast = nums[0]

    # Move slow by 1, fast by 2 until they meet
    while True:
        slow = nums[slow]  # One step
        fast = nums[nums[fast]]  # Two steps
        if slow == fast:
            break

    # Phase 2: Find the entrance to the cycle (the duplicate)
    slow = nums[0]  # Reset slow to start

    # Move both at same speed until they meet at cycle entrance
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]

    # The meeting point is the duplicate number
    return slow
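explanation: |
**Time Complexity:** O(n). Each phase runs for O(n) iterations, since the walk from index `0` has at most `n + 1` distinct nodes.

**Space Complexity:** O(1). Only two pointer variables are used.

By treating array values as "next pointers," we transform this into a cycle detection problem. The duplicate causes a cycle because two indices hold the same value and so point to the same next index. Floyd's algorithm finds the cycle entrance in linear time with constant space. The code starts both pointers at `nums[0]` instead of index `0`; this is the same walk with the first jump already taken, so it finds the same cycle entrance.
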
- approach_name: Binary Search on Value Range
is_optimal: false
code: |
def find_duplicate(nums: list[int]) -> int:
    # Search the value range [1, n], not the array indices
    low, high = 1, len(nums) - 1

    while low < high:
        mid = (low + high) // 2

        # Count numbers <= mid
        count = sum(1 for num in nums if num <= mid)

        # If count > mid, duplicate is in [low, mid]
        # Otherwise, duplicate is in [mid+1, high]
        if count > mid:
            high = mid
        else:
            low = mid + 1

    return low
explanation: |
**Time Complexity:** O(n log n). Binary search over the `n` candidate values takes O(log n) iterations, and each iteration scans the whole array.

**Space Complexity:** O(1). Only a few variables are used.

This approach binary searches the *value* range, not the array. If there are more than `mid` numbers in `[1, mid]`, the duplicate must be in that range (Pigeonhole Principle). While not optimal, this demonstrates binary search on answer space rather than on array indices.
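
To see why the search can halve the value range, it may help to tabulate the count for every candidate value. The helper `counts_at_most` below is a hypothetical name used only for this sketch; it builds the whole table at once, which the actual binary search never needs to do.

```python
def counts_at_most(nums: list[int]) -> dict[int, int]:
    """Map each candidate value v in [1, n] to how many elements are <= v."""
    n = len(nums) - 1
    return {v: sum(1 for num in nums if num <= v) for v in range(1, n + 1)}


for v, count in counts_at_most([1, 3, 4, 2, 2]).items():
    print(v, count, count > v)
# 1 1 False
# 2 3 True
# 3 4 True
# 4 5 True
```

The test `count > v` is false below the duplicate and true from the duplicate onward, so the first value where it becomes true is the answer.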

- approach_name: Hash Set
is_optimal: false
code: |
def find_duplicate(nums: list[int]) -> int:
    seen = set()

    for num in nums:
        # If we've seen this number before, it's the duplicate
        if num in seen:
            return num
        seen.add(num)

    return -1  # Should never reach here given constraints
explanation: |
**Time Complexity:** O(n). Single pass through the array, with O(1) average-case set operations.

**Space Complexity:** O(n). The hash set stores up to n elements.

The most intuitive approach: track seen numbers and return when we find a repeat. While this violates the O(1) space constraint, it's included to show the trade-off between space usage and algorithmic complexity. Understanding why this isn't acceptable motivates learning Floyd's algorithm.