# codetutor/backend/data/questions/first-missing-positive.yaml

title: First Missing Positive
slug: first-missing-positive
difficulty: hard
leetcode_id: 41
leetcode_url: https://leetcode.com/problems/first-missing-positive/
categories:
  - arrays
  - hash-tables
patterns:
  - cyclic-sort

description: |
  Given an unsorted integer array `nums`, return the *smallest positive integer* that is *not present* in `nums`.

  You must implement an algorithm that runs in `O(n)` time and uses `O(1)` auxiliary space.

constraints: |
  - `1 <= nums.length <= 10^5`
  - `-2^31 <= nums[i] <= 2^31 - 1`

examples:
  - input: "nums = [1,2,0]"
    output: "3"
    explanation: "The numbers in the range [1,2] are all in the array."
  - input: "nums = [3,4,-1,1]"
    output: "2"
    explanation: "1 is in the array but 2 is missing."
  - input: "nums = [7,8,9,11,12]"
    output: "1"
    explanation: "The smallest positive integer 1 is missing."

explanation:
  intuition: |
    At first glance, this problem seems straightforward — just find the smallest positive integer not in the array. But the real challenge lies in the **O(n) time and O(1) space** constraints. These constraints rule out sorting (O(n log n)) and hash sets (O(n) space).

    The key insight is to **use the array itself as a hash table**. Think of it like assigning seats in a row: if you have `n` seats numbered 1 through `n`, you want each person with ticket number `i` to sit in seat `i`. After everyone is seated, you walk through the row and find the first empty seat — that's your answer.

    Why does this work? An array of length `n` can hold at most `n` distinct positive integers, so at least one value in `[1, n+1]` is absent and the first missing positive must lie in that range. If all numbers 1 through `n` are present, the answer is `n+1`. Otherwise, some number in `[1, n]` is missing, and we want the smallest one.

    By placing each value `x` at index `x-1` (so value `1` goes to index `0`, value `2` goes to index `1`, etc.), we transform the array into a lookup table. Then a single scan finds the first index `i` whose value is not `i + 1`, and `i + 1` is the answer.
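
    The range claim can be checked empirically. The sketch below uses a hypothetical reference helper, `smallest_missing_bruteforce` (not part of the solution, and deliberately ignoring the space constraint), to confirm on small random arrays that the answer never falls outside `[1, n+1]`:

    ```python
    import random

    def smallest_missing_bruteforce(nums: list[int]) -> int:
        # Hypothetical reference helper: uses O(n) extra space on purpose.
        seen = set(nums)
        candidate = 1
        while candidate in seen:
            candidate += 1
        return candidate

    random.seed(0)
    for _ in range(1000):
        n = random.randint(1, 8)
        nums = [random.randint(-3, 10) for _ in range(n)]
        # n slots hold at most n distinct positives, so one of 1..n+1 is free.
        assert 1 <= smallest_missing_bruteforce(nums) <= n + 1
    ```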

  approach: |
    We solve this using **Cyclic Sort** (in-place rearrangement):

    **Step 1: Rearrange the array**

    - Iterate through each position in the array
    - For each element `nums[i]`, if it is an integer in the range `[1, n]` and index `nums[i] - 1` does not already hold that value, swap it to where it belongs
    - Keep swapping at the current position until the element there is out of range, already in place, or a duplicate of the value at its target index
    - This ensures each valid value ends up at index `value - 1`

    &nbsp;

    **Step 2: Find the first missing positive**

    - Scan through the rearranged array
    - The first index `i` where `nums[i] != i + 1` indicates that `i + 1` is missing
    - Return `i + 1` as the answer

    &nbsp;

    **Step 3: Handle the all-present case**

    - If all positions contain their expected values (1, 2, 3, ..., n), the answer is `n + 1`

    &nbsp;

    The cyclic sort approach works because we're essentially building a perfect hash function: value `x` maps to index `x - 1`. By rearranging in-place, we use constant extra space while achieving linear time.
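
    To watch the rearrangement happen, the following sketch records the array after every swap. The helper name `trace_cyclic_sort` is illustrative and the function works on a copy, so it is a teaching aid rather than the solution itself:

    ```python
    def trace_cyclic_sort(nums: list[int]) -> list[list[int]]:
        # Illustrative helper: returns a snapshot of the array after each swap.
        nums = list(nums)  # work on a copy so the caller's list is untouched
        n = len(nums)
        snapshots = []
        for i in range(n):
            while 1 <= nums[i] <= n and nums[i] != nums[nums[i] - 1]:
                j = nums[i] - 1
                nums[i], nums[j] = nums[j], nums[i]
                snapshots.append(list(nums))
        return snapshots

    for step in trace_cyclic_sort([3, 4, -1, 1]):
        print(step)
    # [-1, 4, 3, 1]
    # [-1, 1, 3, 4]
    # [1, -1, 3, 4]
    ```

    In the final state, index `1` holds `-1` instead of `2`, so the scan in Step 2 returns `2`.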

  common_pitfalls:
    - title: Using a Hash Set
      description: |
        The most natural approach is to put the array's values in a hash set, then iterate from 1 upward to find the first value that is missing:

        ```python
        def first_missing_positive(nums):
            seen = set(nums)
            for i in range(1, len(nums) + 2):
                if i not in seen:
                    return i
        ```

        While this is O(n) time, it uses **O(n) space** for the hash set, violating the space constraint. The problem explicitly requires O(1) auxiliary space.
      wrong_approach: "Hash set for O(1) lookup"
      correct_approach: "Use the array itself as a hash table via cyclic sort"

    - title: Sorting the Array
      description: |
        Another tempting approach is to sort the array first, then scan for the first missing positive:

        ```python
        nums.sort()
        # Find first missing...
        ```

        Sorting takes **O(n log n)** time, which violates the O(n) time constraint. Even if you're okay with that, this approach still requires careful handling of duplicates and negatives.
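
    For reference, one way the sort-then-scan idea could look is sketched below. The function name `first_missing_positive_sorted` is made up for illustration, and `sorted` also allocates a new list, so this version meets neither constraint:

    ```python
    def first_missing_positive_sorted(nums: list[int]) -> int:
        # Illustrative only: O(n log n) time, so it does not meet the constraint.
        expected = 1
        for value in sorted(nums):
            if value == expected:
                expected += 1  # found the value we were waiting for
            elif value > expected:
                break  # gap found: `expected` is missing
            # value < expected covers negatives, zero, and duplicates: skip
        return expected
    ```

    The `value < expected` branch is where duplicates and non-positive values are silently skipped, which is the careful handling mentioned above.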
      wrong_approach: "Sort first, then scan"
      correct_approach: "Cyclic sort achieves O(n) time"

    - title: Infinite Loop During Swapping
      description: |
        When implementing the swap logic, you must check if the target position already contains the correct value:

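        ```python
        # Wrong: never terminates once nums[i] equals the value at its target index
        # (a duplicate, or a value that is already in place)
        while 1 <= nums[i] <= n:
            j = nums[i] - 1
            nums[i], nums[j] = nums[j], nums[i]

        # Correct: stop when the target index already holds this value
        while 1 <= nums[i] <= n and nums[i] != nums[nums[i] - 1]:
            j = nums[i] - 1
            nums[i], nums[j] = nums[j], nums[i]
        ```

        Without the second condition, the loop keeps swapping two equal values forever. This happens with duplicates and also when `nums[i]` is already at index `nums[i] - 1`, where the element is swapped with itself.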
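
    The difference can be observed safely by capping the number of swaps. In this sketch, `swaps_until_stop`, its `guard` flag, and the `cap` parameter are all hypothetical names introduced for the demonstration:

    ```python
    def swaps_until_stop(nums: list[int], i: int, guard: bool, cap: int = 10) -> int:
        # Illustrative helper: runs the inner loop at index i on a copy,
        # giving up after `cap` swaps so the unguarded version cannot hang.
        nums = list(nums)
        n = len(nums)
        swaps = 0
        while 1 <= nums[i] <= n and swaps < cap:
            j = nums[i] - 1
            if guard and nums[i] == nums[j]:
                break  # target already holds this value: nothing left to do
            nums[i], nums[j] = nums[j], nums[i]
            swaps += 1
        return swaps

    print(swaps_until_stop([1, 1], i=1, guard=False))  # 10 (hit the cap)
    print(swaps_until_stop([1, 1], i=1, guard=True))   # 0
    ```

    Hitting the cap is the stand-in for an infinite loop: the unguarded version makes no progress, while the guarded one exits immediately.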
      wrong_approach: "Only check range bounds"
      correct_approach: "Also check if target position already has the correct value"

    - title: Forgetting the n+1 Case
      description: |
        If the array is a permutation of `1, 2, ..., n`, then every integer in `[1, n]` is present and the answer is `n + 1`. Make sure your final scan handles this edge case, typically by returning `n + 1` once the loop has found every index correctly filled.
      wrong_approach: "Only scan the array without a fallback"
      correct_approach: "Return n + 1 if all positions are correct"

  key_takeaways:
    - "**Cyclic sort pattern**: When values have a natural position (like 1 to n mapping to indices 0 to n-1), consider rearranging the array in-place"
    - "**Array as hash table**: The array itself can serve as a constant-space lookup structure when the value range is bounded"
    - "**Constraint-driven design**: The O(1) space requirement is the key hint that we must modify the input array rather than use auxiliary data structures"
    - "**Related problems**: This technique applies to finding duplicates, missing numbers, and other permutation-based problems"

  time_complexity: "O(n). Each swap puts at least one value at its final index, where it is never moved again, so the rearrangement performs at most n swaps. The two linear passes add O(n) more work."
  space_complexity: "O(1). We only use a constant number of variables; all rearrangement happens in-place."

solutions:
  - approach_name: Cyclic Sort
    is_optimal: true
    code: |
      def first_missing_positive(nums: list[int]) -> int:
          n = len(nums)

          # Phase 1: Place each value at its correct index
          # Value x should be at index x-1
          for i in range(n):
              # Keep swapping until current element is in place or invalid
              while 1 <= nums[i] <= n and nums[i] != nums[nums[i] - 1]:
                  # Swap nums[i] to its correct position
                  correct_idx = nums[i] - 1
                  nums[i], nums[correct_idx] = nums[correct_idx], nums[i]

          # Phase 2: Find first position where value doesn't match index + 1
          for i in range(n):
              if nums[i] != i + 1:
                  return i + 1

          # All values 1 to n are present, so answer is n + 1
          return n + 1
    explanation: |
      **Time Complexity:** O(n). Although there is a nested `while` loop, every swap puts at least one value at its final index, and a value at its final index is never moved again, so there are at most n swaps in total.

      **Space Complexity:** O(1) — Only a few variables are used; the array is modified in-place.

      The algorithm works in two phases: first, we rearrange the array so that value `i` sits at index `i-1`. Then we scan to find the first mismatch. This clever use of the input array as a hash table satisfies both the time and space constraints.
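
      The swap bound can be sanity-checked with a small instrumented copy of the first phase. The helper `count_swaps` below is an illustrative addition, and the random test only supports the bound on sampled inputs rather than proving it:

      ```python
      import random

      def count_swaps(nums: list[int]) -> int:
          # Illustrative instrumentation of the rearrangement phase; works on a copy.
          nums = list(nums)
          n = len(nums)
          swaps = 0
          for i in range(n):
              while 1 <= nums[i] <= n and nums[i] != nums[nums[i] - 1]:
                  j = nums[i] - 1
                  nums[i], nums[j] = nums[j], nums[i]
                  swaps += 1
          return swaps

      random.seed(1)
      for _ in range(1000):
          n = random.randint(1, 20)
          nums = [random.randint(-5, n + 5) for _ in range(n)]
          # Every swap fixes at least one index for good, so swaps never exceed n.
          assert count_swaps(nums) <= n
      ```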

  - approach_name: Hash Set
    is_optimal: false
    code: |
      def first_missing_positive(nums: list[int]) -> int:
          # Store every value in a set; non-positive values are simply never looked up
          num_set = set(nums)

          # Check each positive integer starting from 1
          for i in range(1, len(nums) + 2):
              if i not in num_set:
                  return i

          # This line is never reached given the loop bounds
          return len(nums) + 1
    explanation: |
      **Time Complexity:** O(n) — Building the set and scanning are both linear.

      **Space Complexity:** O(n) — The hash set stores up to n elements.

      This approach is intuitive and correct, but uses O(n) extra space, violating the problem's constraints. It's included to illustrate the natural solution that the cyclic sort approach improves upon.

  - approach_name: Index Marking
    is_optimal: true
    code: |
      def first_missing_positive(nums: list[int]) -> int:
          n = len(nums)

          # Step 1: Replace non-positive and out-of-range values with n+1
          for i in range(n):
              if nums[i] <= 0 or nums[i] > n:
                  nums[i] = n + 1

          # Step 2: Mark presence by negating values at corresponding indices
          for i in range(n):
              val = abs(nums[i])
              if val <= n:
                  # Mark index val-1 as "seen" by making it negative
                  nums[val - 1] = -abs(nums[val - 1])

          # Step 3: Find first positive value (indicates missing number)
          for i in range(n):
              if nums[i] > 0:
                  return i + 1

          return n + 1
    explanation: |
      **Time Complexity:** O(n) — Three linear passes through the array.

      **Space Complexity:** O(1) — Only modifies the array in-place.

      This alternative approach uses the sign of each element as a flag. After replacing invalid values with `n+1`, we mark the presence of value `x` by negating the element at index `x-1`. Finally, the first positive element indicates the missing number. Both this and cyclic sort are optimal solutions.
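
      As a worked trace, the sketch below returns the array after the replacement step and after the marking step. The helper name `index_marking_states` is illustrative, and it operates on a copy of the input:

      ```python
      def index_marking_states(nums: list[int]) -> tuple[list[int], list[int]]:
          # Illustrative helper: returns the array after step 1 and after step 2.
          nums = list(nums)
          n = len(nums)
          for i in range(n):
              if nums[i] <= 0 or nums[i] > n:
                  nums[i] = n + 1
          after_replace = list(nums)
          for i in range(n):
              val = abs(nums[i])
              if val <= n:
                  nums[val - 1] = -abs(nums[val - 1])
          return after_replace, nums

      after_replace, after_mark = index_marking_states([3, 4, -1, 1])
      print(after_replace)  # [3, 4, 5, 1]
      print(after_mark)     # [-3, 4, -5, -1]
      ```

      Index `1` is the first one left positive after marking, so the missing value is `2`.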