title: Find the Duplicate Number
slug: find-the-duplicate-number
difficulty: medium
leetcode_id: 287
leetcode_url: https://leetcode.com/problems/find-the-duplicate-number/
categories:
  - arrays
  - two-pointers
patterns:
  - slug: fast-slow-pointers
    is_optimal: true
  - slug: binary-search
    is_optimal: false
function_signature: "def find_duplicate(nums: list[int]) -> int:"
test_cases:
  visible:
    - input: { nums: [1, 3, 4, 2, 2] }
      expected: 2
    - input: { nums: [3, 1, 3, 4, 2] }
      expected: 3
    - input: { nums: [3, 3, 3, 3, 3] }
      expected: 3
  hidden:
    - input: { nums: [1, 1] }
      expected: 1
    - input: { nums: [2, 2, 2, 2, 2] }
      expected: 2
    - input: { nums: [1, 4, 4, 2, 4] }
      expected: 4
    - input: { nums: [1, 2, 3, 4, 5, 6, 7, 8, 9, 5] }
      expected: 5
    - input: { nums: [2, 5, 9, 6, 9, 3, 8, 9, 7, 1] }
      expected: 9
    - input: { nums: [1, 1, 2] }
      expected: 1
description: |
  You are given an array of integers `nums` containing `n + 1` integers, where each integer is in the range `[1, n]` inclusive.

  There is only **one repeated number** in `nums`; return *this repeated number*.

  You must solve the problem **without** modifying the array `nums` and using only constant extra space.
constraints: |
  - `1 <= n <= 10^5`
  - `nums.length == n + 1`
  - `1 <= nums[i] <= n`
  - All the integers in `nums` appear only **once** except for **precisely one integer**, which appears **two or more** times
examples:
  - input: "nums = [1,3,4,2,2]"
    output: "2"
    explanation: "The number 2 appears twice in the array."
  - input: "nums = [3,1,3,4,2]"
    output: "3"
    explanation: "The number 3 appears twice in the array."
  - input: "nums = [3,3,3,3,3]"
    output: "3"
    explanation: "The number 3 appears five times in the array."
explanation:
  intuition: |
    This problem has a beautiful constraint: the array has `n + 1` elements, but the values are only in the range `[1, n]`. By the **Pigeonhole Principle**, at least one value must repeat.

    The key insight is to view the array as a **linked list** in which each value points to the next index.

    The values are in `[1, n]` and the indices are `[0, n]`, so every value is a valid index. Treating `nums[i]` as a "next pointer" therefore creates a valid linked structure.

    Think of it like this: start at index `0` and repeatedly jump to `nums[current_index]`. This produces a sequence of indices.

    - Every index has exactly one outgoing pointer and there are finitely many indices, so the sequence must eventually revisit an index. That means it falls into a **cycle**.
    - No value equals `0`, so nothing points back to index `0`. Index `0` is therefore never on the cycle, and the walk reaches the cycle from outside.
    - The index where the walk enters the cycle has two incoming pointers. One comes from the last index before the cycle and one from the previous index inside the cycle.
    - Two different indices storing the same value is exactly what "duplicate" means. The duplicate number is therefore the entry point of this cycle.

    For example, with `nums = [1,3,4,2,2]`:
    - Index 0 → value 1 → jump to index 1
    - Index 1 → value 3 → jump to index 3
    - Index 3 → value 2 → jump to index 2
    - Index 2 → value 4 → jump to index 4
    - Index 4 → value 2 → jump to index 2 (cycle!)

    The walk enters the cycle at index 2 because both index 3 and index 4 have value `2`. Floyd's Tortoise and Hare algorithm finds exactly where this cycle begins.
  approach: |
    We solve this using **Floyd's Cycle Detection** (Tortoise and Hare):

    **Step 1: Detect the cycle**
    - `slow`: Moves one step at a time (`slow = nums[slow]`)
    - `fast`: Moves two steps at a time (`fast = nums[nums[fast]]`)
    - Both start at the same node. This can be index `0`, or `nums[0]` as in the solution code. `nums[0]` is the next node on the same path, so it leads to the same cycle entrance.
    - Keep moving until they meet; this proves a cycle exists

    **Step 2: Find the cycle entrance**
    - Reset `slow` to the starting node, keep `fast` at the meeting point
    - Move both pointers one step at a time
    - The index where they meet again is the duplicate number

    **Why does this work?**

    Let `F` be the distance from the starting node to the cycle entrance, and let `C` be the cycle length. When slow and fast first meet:
    - Slow has traveled `F + a` steps, where `a` is its distance past the entrance (`0 <= a < C`)
    - Fast has traveled `2(F + a)` steps
    - Both are on the same node inside the cycle, so fast's extra distance is a whole number of laps: `2(F + a) - (F + a) = kC` for some integer `k >= 1`, so `F + a = kC`

    This means `F = kC - a`. Reset slow to the start and move both at the same speed. Slow travels `F` steps to reach the entrance. Fast travels the same `F = kC - a` steps from its position `a` into the cycle, ending `kC` steps past the entrance, which is the entrance itself!
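
    The identity `F + a = kC` can be checked numerically. The sketch below uses a hypothetical helper, `cycle_parameters`, which is not part of the solution. It measures `F` and `C` by recording the step at which each index is first visited. That uses extra space, so treat it as a check, not an answer to the problem. It then runs phase 1 literally from index `0` to find `a`.

    ```python
    def cycle_parameters(nums: list[int]) -> tuple[int, int, int]:
        """Return (F, C, a) for the walk 0 -> nums[0] -> nums[nums[0]] -> ...

        F: steps from index 0 to the cycle entrance
        C: cycle length
        a: how far past the entrance slow is when phase 1 ends
        """
        # Record the step at which each index is first visited.
        # This uses O(n) extra space; it is a check, not a solution.
        first_step: dict[int, int] = {}
        i, step = 0, 0
        while i not in first_step:
            first_step[i] = step
            i = nums[i]
            step += 1
        F = first_step[i]         # i is the first revisited index: the entrance
        C = step - first_step[i]  # steps needed to come back around to it

        # Phase 1 started at index 0: slow moves 1 step, fast moves 2.
        slow = fast = 0
        slow_steps = 0
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]
            slow_steps += 1
            if slow == fast:
                break
        a = slow_steps - F        # slow has traveled F + a steps in total
        return F, C, a


    nums = [1, 3, 4, 2, 2]
    F, C, a = cycle_parameters(nums)
    print(F, C, a)           # 3 2 1
    print((F + a) % C == 0)  # True: F + a is a whole number of laps
    ```

    On `[1,3,4,2,2]` this gives `F = 3`, `C = 2`, `a = 1`, so `F + a = 4 = 2C`. The pointers meet at index `4`, and walking `F = 3` more steps from there (`4 → 2 → 4 → 2`) lands on the entrance, index `2`.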

    **Step 3: Return the result**
    - The meeting point in phase 2 is the duplicate value
  common_pitfalls:
    - title: Using Extra Space
      description: |
        A common first instinct is to use a hash set to track seen numbers:
        ```python
        def find_duplicate(nums: list[int]) -> int:
            seen = set()
            for num in nums:
                if num in seen:
                    return num
                seen.add(num)
            return -1  # unreachable for valid inputs
        ```
        While this works and runs in O(n) time, it uses O(n) space. The problem explicitly requires **O(1) space**, so this approach violates the constraints.
      wrong_approach: "Hash set to track seen numbers"
      correct_approach: "Floyd's cycle detection using the array itself"
    - title: Modifying the Array
      description: |
        Another tempting approach is to mark visited indices by negating values:
        ```python
        def find_duplicate_by_negation(nums: list[int]) -> int:
            for num in nums:
                idx = abs(num)  # values may already be negated
                if nums[idx] < 0:
                    return idx
                nums[idx] = -nums[idx]  # mutates the input: forbidden here
            return -1  # unreachable for valid inputs
        ```
        This is O(n) time and O(1) space, but it **modifies the input array**, which the problem forbids. The cycle detection approach leaves the array untouched.
      wrong_approach: "Negating values to mark as visited"
      correct_approach: "Read-only traversal with two pointers"
    - title: Sorting the Array
      description: |
        Sorting and finding adjacent duplicates is intuitive but has two problems:
        - It modifies the array (or requires O(n) space for a copy)
        - It's O(n log n) time, not optimal

        The cycle detection method achieves O(n) time with O(1) space without modification.
      wrong_approach: "Sort and find adjacent duplicates"
      correct_approach: "Floyd's algorithm for O(n) time, O(1) space"
    - title: Confusing Index with Value
      description: |
        In Floyd's algorithm, we treat values as pointers to indices. A common mistake is confusing when to use the value versus the index. Remember: `slow = nums[slow]` means "jump to the index that equals the current value."

        After phase 2, `slow` holds the cycle-entrance **index**. Two different indices store that same number, so the index itself is the duplicate **value**. Return `slow`, not `nums[slow]`, which is just the next node in the walk.
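
        A small illustration of the difference uses a hypothetical helper, `first_repeated_index`. It follows `i -> nums[i]` from index `0` and stops at the first index it visits twice. It relies on a set, so it is not an O(1)-space solution. On the first example, the first revisited index is `2`, which is the duplicate. The value stored there, `nums[2] = 4`, is merely the next node in the walk.
        ```python
        def first_repeated_index(nums: list[int]) -> int:
            # Follow i -> nums[i] from index 0 until some index comes up a second time.
            # The set costs O(n) space; this only illustrates index vs. value.
            visited: set[int] = set()
            i = 0
            while i not in visited:
                visited.add(i)
                i = nums[i]
            return i  # the cycle entrance: an index equal to the duplicate value


        nums = [1, 3, 4, 2, 2]
        entrance = first_repeated_index(nums)
        print(entrance)        # 2: correct, the index itself is the duplicate
        print(nums[entrance])  # 4: wrong if returned, it is only the next node
        ```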
  key_takeaways:
    - "**Cycle detection pattern**: When array values can be treated as pointers (every value is a valid index), consider Floyd's algorithm"
    - "**Pigeonhole Principle**: With `n + 1` items in `n` slots, at least one slot must hold multiple items, guaranteeing a duplicate exists"
    - "**Creative problem reframing**: Transforming an array duplicate problem into a linked list cycle problem unlocks an elegant O(1) space solution"
    - "**Two-phase approach**: First detect *that* a cycle exists (fast catches slow), then find *where* it starts (both at the same speed)"
  time_complexity: "O(n). Each pointer traverses at most O(n) steps in both phases."
  space_complexity: "O(1). Only two pointer variables are used, regardless of input size."
solutions:
  - approach_name: Floyd's Cycle Detection
    is_optimal: true
    code: |
      def find_duplicate(nums: list[int]) -> int:
          # Phase 1: Find the intersection point in the cycle
          slow = nums[0]
          fast = nums[0]
          # Move slow by 1, fast by 2 until they meet
          while True:
              slow = nums[slow]        # One step
              fast = nums[nums[fast]]  # Two steps
              if slow == fast:
                  break
          # Phase 2: Find the entrance to the cycle (the duplicate)
          slow = nums[0]  # Reset slow to the starting node
          # Move both at the same speed until they meet at the cycle entrance
          while slow != fast:
              slow = nums[slow]
              fast = nums[fast]
          # The meeting index is the duplicate number
          return slow
    explanation: |
      **Time Complexity:** O(n). The walk has at most `n + 1` distinct indices, and slow meets fast before finishing its first lap of the cycle. Phase 2 then takes `F` more steps, so each phase is O(n).

      **Space Complexity:** O(1). Only two pointer variables are used.

      By treating array values as "next pointers," we transform this into a cycle detection problem. The duplicate shows up as the cycle's entrance, because two indices store that value and therefore point to the same node. Floyd's algorithm finds the cycle entrance in linear time with constant space.
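
      To watch the two phases on the first example, here is a traced copy of the solution above. The name `find_duplicate_traced` and the `(phase, slow, fast)` trace format are illustrative additions, not part of the required signature. The pointer logic is unchanged.

      ```python
      def find_duplicate_traced(nums: list[int]) -> tuple[int, list[tuple[int, int, int]]]:
          """Same logic as find_duplicate, but records (phase, slow, fast) after every move."""
          trace: list[tuple[int, int, int]] = []

          # Phase 1: slow moves 1 step, fast moves 2, both starting from nums[0]
          slow = fast = nums[0]
          while True:
              slow = nums[slow]
              fast = nums[nums[fast]]
              trace.append((1, slow, fast))
              if slow == fast:
                  break

          # Phase 2: restart slow from nums[0]; both now move 1 step at a time
          slow = nums[0]
          trace.append((2, slow, fast))
          while slow != fast:
              slow = nums[slow]
              fast = nums[fast]
              trace.append((2, slow, fast))
          return slow, trace


      duplicate, trace = find_duplicate_traced([1, 3, 4, 2, 2])
      for phase, slow, fast in trace:
          print(f"phase {phase}: slow={slow} fast={fast}")
      print("duplicate:", duplicate)
      ```

      On `[1,3,4,2,2]`, phase 1 ends after two iterations with both pointers at index `2`. In phase 2, slow walks `1 → 3 → 2` while fast walks `2 → 4 → 2`, so they meet at `2`, the duplicate.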
  - approach_name: Binary Search on Value Range
    is_optimal: false
    code: |
      def find_duplicate(nums: list[int]) -> int:
          # Search the value range [1, n], not the array indices
          low, high = 1, len(nums) - 1
          while low < high:
              mid = (low + high) // 2
              # Count numbers <= mid
              count = sum(1 for num in nums if num <= mid)
              # If count > mid, the duplicate is in [low, mid]
              # Otherwise, the duplicate is in [mid + 1, high]
              if count > mid:
                  high = mid
              else:
                  low = mid + 1
          return low
    explanation: |
      **Time Complexity:** O(n log n). The binary search over the `n` possible values takes O(log n) iterations, and each iteration scans all `n + 1` elements.

      **Space Complexity:** O(1). Only a few variables are used.

      This approach binary searches the *value* range, not the array. If more than `mid` elements lie in `[1, mid]`, some value in that range must repeat (Pigeonhole Principle), so the duplicate is `<= mid`.

      Conversely, suppose the duplicate were `<= mid`. Then the elements above `mid` would be distinct values in `(mid, n]`, at most `n - mid` of them, leaving at least `mid + 1` elements `<= mid`. So `count <= mid` means the duplicate is above `mid`.

      While not optimal, this demonstrates binary search on the answer space rather than on array indices.
  - approach_name: Hash Set
    is_optimal: false
    code: |
      def find_duplicate(nums: list[int]) -> int:
          seen = set()
          for num in nums:
              # If we've seen this number before, it's the duplicate
              if num in seen:
                  return num
              seen.add(num)
          return -1  # Should never reach here given the constraints
    explanation: |
      **Time Complexity:** O(n). A single pass through the array.

      **Space Complexity:** O(n). The hash set stores up to n elements.

      The most intuitive approach: track seen numbers and return when we find a repeat. While this violates the O(1) space constraint, it's included to show the trade-off between space and algorithmic complexity. Understanding why this isn't acceptable motivates learning Floyd's algorithm.
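
      Because it is so easy to trust, the hash-set version also makes a handy reference oracle when testing the O(1)-space solutions. Below is a minimal randomized cross-check, assuming inputs are generated to satisfy the constraints. The generator `random_instance`, the seed, and the size limits are illustrative choices, not part of the problem.

      ```python
      import random


      def find_duplicate_hash_set(nums: list[int]) -> int:
          # Reference oracle: O(n) time, O(n) space
          seen: set[int] = set()
          for num in nums:
              if num in seen:
                  return num
              seen.add(num)
          return -1


      def find_duplicate_floyd(nums: list[int]) -> int:
          # Floyd's cycle detection, as in the first solution
          slow = fast = nums[0]
          while True:
              slow = nums[slow]
              fast = nums[nums[fast]]
              if slow == fast:
                  break
          slow = nums[0]
          while slow != fast:
              slow = nums[slow]
              fast = nums[fast]
          return slow


      def find_duplicate_binary_search(nums: list[int]) -> int:
          # Binary search on the value range [1, n], as in the second solution
          low, high = 1, len(nums) - 1
          while low < high:
              mid = (low + high) // 2
              if sum(1 for num in nums if num <= mid) > mid:
                  high = mid
              else:
                  low = mid + 1
          return low


      def random_instance(n: int, rng: random.Random) -> tuple[list[int], int]:
          """Build a valid input of length n + 1: one value d repeated k >= 2 times, the rest distinct."""
          d = rng.randint(1, n)
          k = rng.randint(2, n + 1)
          others = rng.sample([v for v in range(1, n + 1) if v != d], n + 1 - k)
          nums = others + [d] * k
          rng.shuffle(nums)
          return nums, d


      rng = random.Random(287)  # fixed seed so any failure is reproducible
      for _ in range(2000):
          nums, d = random_instance(rng.randint(1, 30), rng)
          assert find_duplicate_hash_set(nums) == d
          assert find_duplicate_floyd(nums) == d
          assert find_duplicate_binary_search(nums) == d
      print("all checks passed")
      ```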