# codetutor/backend/data/questions/missing-number.yaml

title: Missing Number
slug: missing-number
difficulty: easy
leetcode_id: 268
leetcode_url: https://leetcode.com/problems/missing-number/
categories:
  - arrays
  - math
patterns:
  - prefix-sum

function_signature: "def missing_number(nums: list[int]) -> int:"

test_cases:
  visible:
    - input: { nums: [3, 0, 1] }
      expected: 2
    - input: { nums: [0, 1] }
      expected: 2
    - input: { nums: [9, 6, 4, 2, 3, 5, 7, 0, 1] }
      expected: 8
  hidden:
    - input: { nums: [0] }
      expected: 1
    - input: { nums: [1] }
      expected: 0
    - input: { nums: [0, 1, 2, 3, 5] }
      expected: 4

description: |
  Given an array `nums` containing `n` distinct numbers in the range `[0, n]`, return *the only number in the range that is missing from the array*.

  **Example 1:**

  **Input:** `nums = [3,0,1]`

  **Output:** `2`

  **Explanation:** `n = 3` since there are 3 numbers, so all numbers are in the range `[0,3]`. 2 is the missing number in the range since it does not appear in `nums`.

  **Example 2:**

  **Input:** `nums = [0,1]`

  **Output:** `2`

  **Explanation:** `n = 2` since there are 2 numbers, so all numbers are in the range `[0,2]`. 2 is the missing number in the range since it does not appear in `nums`.

  **Example 3:**

  **Input:** `nums = [9,6,4,2,3,5,7,0,1]`

  **Output:** `8`

  **Explanation:** `n = 9` since there are 9 numbers, so all numbers are in the range `[0,9]`. 8 is the missing number in the range since it does not appear in `nums`.

constraints: |
  - `n == nums.length`
  - `1 <= n <= 10^4`
  - `0 <= nums[i] <= n`
  - All the numbers of `nums` are **unique**

examples:
  - input: "nums = [3,0,1]"
    output: "2"
    explanation: "n = 3 since there are 3 numbers, so all numbers are in the range [0,3]. 2 is the missing number."
  - input: "nums = [0,1]"
    output: "2"
    explanation: "n = 2 since there are 2 numbers, so all numbers are in the range [0,2]. 2 is the missing number."
  - input: "nums = [9,6,4,2,3,5,7,0,1]"
    output: "8"
    explanation: "n = 9 since there are 9 numbers, so all numbers are in the range [0,9]. 8 is the missing number."

explanation:
  intuition: |
    Imagine you have a row of numbered lockers from `0` to `n`, and each locker should contain a ball with a matching number. Someone has stolen exactly one ball, and you need to figure out which one is missing.

    The key insight is that we **know what the complete set should look like**. If no ball were missing, the sum of all numbers from `0` to `n` would follow a well-known mathematical formula: `n * (n + 1) / 2`. This is the sum of an arithmetic sequence.

    Think of it like this: if you calculate the expected total and then subtract everything you actually have, whatever remains must be the missing piece. It's like knowing you should have $55 in your wallet (1+2+3+...+10), counting what's there and finding $47, and immediately knowing the missing bill is $8.

    This approach transforms a search problem into a simple arithmetic problem, which is both elegant and efficient.
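
    The wallet analogy can be checked directly. This is a minimal sketch; the names `wallet` and `missing_bill` are illustrative only and do not appear in the solution code:

    ```python
    # Bills numbered 1..10 should total 55 (figures taken from the analogy above).
    expected_total = sum(range(1, 11))

    # Hypothetical wallet contents: every bill except the one numbered 8.
    wallet = [1, 2, 3, 4, 5, 6, 7, 9, 10]
    actual_total = sum(wallet)

    # Whatever the actual total falls short by is the missing bill.
    missing_bill = expected_total - actual_total
    print(expected_total, actual_total, missing_bill)  # 55 47 8
    ```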

  approach: |
    We solve this using the **Gauss Sum Formula**:

    **Step 1: Calculate the expected sum**

    - Use the formula `n * (n + 1) / 2` to compute what the sum would be if all numbers from `0` to `n` were present
    - `n` is the length of the input array: the range `[0, n]` holds `n + 1` values and the array holds `n` of them, so exactly one is missing

    &nbsp;

    **Step 2: Calculate the actual sum**

    - Sum all elements currently in the array
    - This can be done with a simple loop or the built-in `sum()` function

    &nbsp;

    **Step 3: Find the difference**

    - Subtract the actual sum from the expected sum
    - The result is the missing number

    &nbsp;

    This works because addition is commutative and associative — the order of elements doesn't matter. Whatever is "missing" from the actual sum compared to the expected sum must be our answer.
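
    One way to gain confidence in the three steps is to try every possible gap on a small range, with the input shuffled so that order plays no part. The sketch below assumes two hypothetical helper names, `missing_number_by_sum` and `check_all_gaps`, which are not part of the reference solutions:

    ```python
    import random


    def missing_number_by_sum(nums):
        n = len(nums)
        expected_sum = n * (n + 1) // 2   # Step 1: sum of 0..n
        actual_sum = sum(nums)            # Step 2: sum of what is present
        return expected_sum - actual_sum  # Step 3: the difference


    def check_all_gaps(n, seed=0):
        """Remove each value of 0..n in turn and confirm it is recovered."""
        rng = random.Random(seed)
        for missing in range(n + 1):
            nums = [value for value in range(n + 1) if value != missing]
            rng.shuffle(nums)  # order should not affect the result
            if missing_number_by_sum(nums) != missing:
                return False
        return True


    print(check_all_gaps(8))  # True
    ```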

  common_pitfalls:
    - title: Using a Hash Set
      description: |
        A common approach is to put all numbers in a hash set, then check each number from `0` to `n` to find which one is missing.

        While this works and runs in O(n) time, it uses **O(n) extra space** for the set. The problem's follow-up asks for O(1) space, which the sum approach achieves.
      wrong_approach: "Hash set to track seen numbers"
      correct_approach: "Mathematical sum formula for O(1) space"

    - title: Sorting First
      description: |
        Another instinct is to sort the array and then scan for a gap where `nums[i] != i`.

        This works but takes **O(n log n) time** for the sort, and it either mutates the input or needs a sorted copy. The sum approach runs in O(n) time with O(1) extra space, so it is faster and never uses more memory.
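
        For comparison, the sort-then-scan idea might look like the sketch below. The function name `missing_number_sorted` is hypothetical, and `sorted()` is used so that the caller's list is left untouched, at the cost of an O(n) copy:

        ```python
        def missing_number_sorted(nums):
            ordered = sorted(nums)  # O(n log n) time, O(n) extra space for the copy
            for i, value in enumerate(ordered):
                if value != i:
                    # First position where the value and index disagree is the gap.
                    return i
            # Every index matched, so the missing value is n itself.
            return len(nums)


        print(missing_number_sorted([3, 0, 1]))  # 2
        ```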
      wrong_approach: "Sort then scan for gaps"
      correct_approach: "Sum formula for O(n) time"

    - title: Integer Overflow Concerns
      description: |
        With `n` up to `10^4`, the expected sum could be around `10^4 * 10^4 / 2 = 5 * 10^7`. This fits comfortably in a 32-bit integer (max ~2.1 billion), so overflow isn't a concern here.

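        For larger constraints in fixed-width languages, you could widen the accumulator (for example to 64 bits), interleave addition and subtraction to keep intermediate values smaller, or use the XOR approach, which never produces a value wider than its inputs. Python's integers are arbitrary-precision, so the Python solutions here cannot overflow.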
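
        The interleaving idea can be sketched as a single running total that adds each index and subtracts each value as it goes. The name `missing_number_running` is hypothetical. In Python this is purely illustrative, since integers do not overflow, and in a fixed-width language the running total can still grow quadratically for some orderings, so it reduces the risk rather than removing it:

        ```python
        def missing_number_running(nums):
            # Start from n, the one number in 0..n that is never used as an index.
            missing = len(nums)
            for i, value in enumerate(nums):
                # Add the expected number (the index) and subtract the actual one.
                missing += i - value
            return missing


        print(missing_number_running([3, 0, 1]))  # 2
        ```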

  key_takeaways:
    - "**Gauss sum formula**: `n * (n + 1) / 2` gives the sum of integers from `0` to `n` — memorise this for interview problems"
    - "**Think mathematically**: When you know what the complete set should look like, arithmetic properties can replace searching"
    - "**XOR alternative**: This problem can also be solved with XOR (`a ^ a = 0`, `a ^ 0 = a`): XOR every number from `0` to `n` together with every value in the array, and only the missing number survives"
    - "**Space optimisation**: Many problems with O(n) hash set solutions have O(1) mathematical alternatives"

  time_complexity: "O(n). We traverse the array once to compute the sum."
  space_complexity: "O(1). We only use a constant number of variables regardless of input size."

solutions:
  - approach_name: Gauss Sum Formula
    is_optimal: true
    code: |
      def missing_number(nums: list[int]) -> int:
          n = len(nums)
          # Expected sum if all numbers 0 to n were present
          expected_sum = n * (n + 1) // 2
          # Actual sum of elements in the array
          actual_sum = sum(nums)
          # The difference is the missing number
          return expected_sum - actual_sum
    explanation: |
      **Time Complexity:** O(n) — Single pass to sum all elements.

      **Space Complexity:** O(1). Only three integers are stored (`n`, `expected_sum`, `actual_sum`), regardless of input size.

      We use the mathematical formula for the sum of an arithmetic sequence. The missing number is simply the difference between what we expect and what we have.

  - approach_name: XOR Bit Manipulation
    is_optimal: true
    code: |
      def missing_number(nums: list[int]) -> int:
          n = len(nums)
          result = n  # Start with n (the largest possible value)

          for i in range(n):
              # XOR with both index and value
              result ^= i ^ nums[i]

          return result
    explanation: |
      **Time Complexity:** O(n) — Single pass through the array.

      **Space Complexity:** O(1). Only a fixed number of integer variables are used (`n`, `result`, and the loop index).

      This leverages XOR properties: `a ^ a = 0` and `a ^ 0 = a`. By XORing all indices `0` to `n-1` with all values in the array, paired numbers cancel out, leaving only the missing one. We initialise with `n` since indices only go up to `n-1`.
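
      The cancellation can also be seen by folding the two groups separately and combining them at the end. This is a sketch of the same idea rather than a replacement for the loop above; the function name `missing_number_xor_pairs` is hypothetical:

      ```python
      from functools import reduce
      from operator import xor


      def missing_number_xor_pairs(nums):
          # XOR of every number that should be present: 0, 1, ..., n.
          all_expected = reduce(xor, range(len(nums) + 1), 0)
          # XOR of every number that actually is present.
          all_present = reduce(xor, nums, 0)
          # Numbers in both groups cancel (a ^ a = 0); the missing one remains.
          return all_expected ^ all_present


      # For [3, 0, 1]: 0^1^2^3 = 0 and 3^0^1 = 2, so the result is 0 ^ 2 = 2.
      print(missing_number_xor_pairs([3, 0, 1]))  # 2
      ```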

  - approach_name: Hash Set
    is_optimal: false
    code: |
      def missing_number(nums: list[int]) -> int:
          num_set = set(nums)
          n = len(nums)

          # Check each number in range [0, n]
          for i in range(n + 1):
              if i not in num_set:
                  return i

          return -1  # Should never reach here
    explanation: |
      **Time Complexity:** O(n) — Building the set and searching are both O(n).

      **Space Complexity:** O(n) — The hash set stores all n elements.

      This approach uses extra space to enable O(1) lookups. While the time complexity is optimal, the space usage makes it suboptimal compared to the sum or XOR approaches.