codetutor/backend/data/questions/contains-duplicate-ii.yaml

title: Contains Duplicate II
slug: contains-duplicate-ii
difficulty: easy
leetcode_id: 219
leetcode_url: https://leetcode.com/problems/contains-duplicate-ii/
categories:
  - arrays
  - hash-tables
patterns:
  - sliding-window

function_signature: "def contains_nearby_duplicate(nums: list[int], k: int) -> bool:"

test_cases:
  visible:
    - input: { nums: [1, 2, 3, 1], k: 3 }
      expected: true
    - input: { nums: [1, 0, 1, 1], k: 1 }
      expected: true
    - input: { nums: [1, 2, 3, 1, 2, 3], k: 2 }
      expected: false
  hidden:
    - input: { nums: [1], k: 1 }
      expected: false
    - input: { nums: [1, 1], k: 0 }
      expected: false
    - input: { nums: [1, 2, 1], k: 2 }
      expected: true
    - input: { nums: [99, 99], k: 2 }
      expected: true
    - input: { nums: [1, 2, 3, 4, 5], k: 3 }
      expected: false
    - input: { nums: [0, 1, 2, 3, 4, 0, 0, 7, 8, 9, 10, 11, 12, 0], k: 1 }
      expected: true

description: |
  Given an integer array `nums` and an integer `k`, return `true` *if there are two **distinct indices*** `i` *and* `j` *in the array such that* `nums[i] == nums[j]` *and* `abs(i - j) <= k`.

constraints: |
  - `1 <= nums.length <= 10^5`
  - `-10^9 <= nums[i] <= 10^9`
  - `0 <= k <= 10^5`

examples:
  - input: "nums = [1,2,3,1], k = 3"
    output: "true"
    explanation: "The element 1 appears at index 0 and index 3. Since abs(0 - 3) = 3 <= k, we return true."
  - input: "nums = [1,0,1,1], k = 1"
    output: "true"
    explanation: "The element 1 appears at index 2 and index 3. Since abs(2 - 3) = 1 <= k, we return true."
  - input: "nums = [1,2,3,1,2,3], k = 2"
    output: "false"
    explanation: "While there are duplicates, no pair of equal values is within k = 2 indices of each other. Every duplicate pair (for example, the 1s at indices 0 and 3) is exactly 3 apart, and 3 > k."

explanation:
  intuition: |
    Imagine you're walking through a hallway with numbered rooms, and you need to find if any room number repeats within the last `k` rooms you've passed.

    The core insight is that we don't need to remember *every* room we've ever seen — we only care about rooms within our **sliding window** of the last `k` positions. If we encounter a room number we've seen within this window, we've found our duplicate.

    Think of it like this: as you move forward, you maintain a "memory" of the last `k` rooms. When you see a new room number, you check if it's already in your memory. If yes, you found a nearby duplicate. If not, add it to your memory and forget the oldest room (the one that's now more than `k` steps behind).

    This naturally suggests using a **hash set** as our memory: it gives us average-case O(1) lookups to check for duplicates and average-case O(1) insertions and deletions to maintain our sliding window.

  approach: |
    We solve this using a **Sliding Window with Hash Set** approach:

    **Step 1: Initialise a hash set**

    - Create an empty set `window` to store elements within our current window of size `k`
    - After each iteration, the set holds at most `k` elements

    &nbsp;

    **Step 2: Iterate through the array**

    - For each element at index `i`, check if it already exists in our `window` set
    - If yes, we found a duplicate within distance `k` — return `true`
    - If no, add the current element to the window

    &nbsp;

    **Step 3: Maintain window size**

    - If the window size exceeds `k`, remove the oldest element (the one at index `i - k`)
    - This ensures we only track elements within the valid distance

    &nbsp;

    **Step 4: Return the result**

    - If we complete the loop without finding duplicates, return `false`

    &nbsp;

    This approach efficiently combines the sliding window pattern with a hash set for O(1) operations, giving us an optimal O(n) solution.
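
    To see the steps in action, here is a minimal tracing sketch. `trace_window` is a hypothetical helper, not part of the reference solution; it records the window contents seen at each index just before the duplicate check, using the same eviction rule as the steps above.

    ```python
    def trace_window(nums: list[int], k: int) -> list[tuple[int, int, list[int]]]:
        """Record (index, value, sorted window before the check) for each step.

        Hypothetical helper for illustration only. Stops at the first step
        where a nearby duplicate is detected.
        """
        window: set[int] = set()
        steps: list[tuple[int, int, list[int]]] = []
        for i, num in enumerate(nums):
            steps.append((i, num, sorted(window)))
            if num in window:
                break  # nearby duplicate found at this step
            window.add(num)
            if i >= k:
                # Evict the element that is now more than k positions behind
                window.remove(nums[i - k])
        return steps


    for step in trace_window([1, 2, 3, 1, 2, 3], 2):
        print(step)
    ```

    For `nums = [1, 2, 3, 1, 2, 3]` and `k = 2`, the entry for index 3 is `(3, 1, [2, 3])`: the earlier `1` has already been evicted, so no match is reported.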

  common_pitfalls:
    - title: The Brute Force Trap
      description: |
        A naive approach checks every pair of elements to see if they're equal and within distance `k`:
        - Outer loop `i` from `0` to `n-1`
        - Inner loop `j` from `i+1` to `min(i+k+1, n)`

        While this limits the inner loop to at most `k` iterations, it's still **O(n × k)** in the worst case. When both `n` and `k` are at their maximum (`10^5`), this comes to roughly 5 billion comparisons (the end of the array caps the inner loop, so the count is `n(n-1)/2`), which is far too slow and causes a **Time Limit Exceeded (TLE)** error.
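
        A quick count puts a number on that worst case. The sketch below uses a hypothetical helper, `brute_force_comparisons`, which mirrors the loop bounds listed above and assumes no duplicate is ever found, so no early return shortens the work.

        ```python
        def brute_force_comparisons(n: int, k: int) -> int:
            """Worst-case number of nums[i] == nums[j] checks (no early return).

            Hypothetical helper for illustration only.
            """
            # For each i, j runs over range(i + 1, min(i + k + 1, n)).
            return sum(min(i + k + 1, n) - (i + 1) for i in range(n))


        print(brute_force_comparisons(10**5, 10**5))  # 4999950000
        ```

        Because the end of the array caps the inner loop, the count at the maximum constraints is `n(n-1)/2`, about 5 billion, which sits under the `n × k = 10^10` bound but is still far beyond what a typical time limit allows.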
      wrong_approach: "Nested loops checking pairs within distance k"
      correct_approach: "Sliding window with hash set for O(n) time"

    - title: Using a Hash Map Instead of a Set
      description: |
        While a hash map (storing value → index) works, it's more complex than necessary. You'd need to update indices as you go and compare distances.

        A hash set is simpler: because it only ever holds the last `k` elements (fewer near the start of the array), any match is guaranteed to be within the valid distance. If it's in the set, it's within range.
      wrong_approach: "Hash map with index tracking and distance calculation"
      correct_approach: "Hash set with sliding window of size k"

    - title: Off-by-One in Window Size
      description: |
        Be careful about when to remove elements from the window. The condition `abs(i - j) <= k` means indices can be up to `k` apart, so when you check index `i` the window should contain exactly the `k` previous elements, indices `i - k` through `i - 1` (not `k-1` or `k+1` of them).

        After adding `nums[i]`, remove the element at index `i - k` whenever `i >= k`, so the window never holds more than `k` elements from the past at the next check.
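
        The effect of the off-by-one is easy to reproduce. In this sketch, `nearby_duplicate` and its `buggy` flag are hypothetical names introduced only for illustration: `buggy=True` switches the eviction test from `i >= k` to `i > k`, which lets the window grow to `k + 1` past elements.

        ```python
        def nearby_duplicate(nums: list[int], k: int, buggy: bool = False) -> bool:
            """Sliding-window check; buggy=True delays eviction by one step.

            Hypothetical helper for illustration only.
            """
            window: set[int] = set()
            for i, num in enumerate(nums):
                if num in window:
                    return True
                window.add(num)
                evict = i > k if buggy else i >= k
                if evict:
                    # discard() tolerates a missing value, which keeps the
                    # buggy variant from raising while it misbehaves
                    window.discard(nums[i - k])
            return False


        print(nearby_duplicate([1, 2, 3, 1], 2))              # False
        print(nearby_duplicate([1, 2, 3, 1], 2, buggy=True))  # True
        ```

        With `k = 2`, the two `1`s are 3 apart, so the correct answer is `False`. The buggy variant still has the first `1` in its window at index 3 and wrongly reports a match.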
      wrong_approach: "Removing when i > k or keeping k+1 elements"
      correct_approach: "Remove element at index i - k when i >= k"

  key_takeaways:
    - "**Sliding window + hash set**: When you need to find duplicates within a range, combine a fixed-size window with a set for O(1) lookups"
    - "**Implicit distance guarantee**: Because the set only ever holds the last `k` elements, any match is automatically within the valid distance, so there is no need to track indices"
    - "**Set vs Map tradeoff**: Choose the simpler data structure when it suffices; a set is often cleaner than a map when you don't need the stored values"
    - "**Related problems**: This pattern extends to 'Contains Duplicate III' (within range *and* value difference) and other sliding window problems"

  time_complexity: "O(n). We traverse the array once, with average-case O(1) hash set operations (add, remove, lookup) at each step."
  space_complexity: "O(min(n, k)). The hash set stores at most `min(n, k)` elements at any time."

solutions:
  - approach_name: Sliding Window with Hash Set
    is_optimal: true
    code: |
      def contains_nearby_duplicate(nums: list[int], k: int) -> bool:
          # Set to track elements in our current window of size k
          window = set()

          for i, num in enumerate(nums):
              # If we've seen this number in our window, we found a duplicate
              if num in window:
                  return True

              # Add current element to the window
              window.add(num)

              # Maintain window size: remove element that's now too far behind
              if i >= k:
                  window.remove(nums[i - k])

          # No nearby duplicates found
          return False
    explanation: |
      **Time Complexity:** O(n) — Single pass through the array with O(1) set operations.

      **Space Complexity:** O(min(n, k)) — The set contains at most k elements.

      We maintain a sliding window of the last k elements using a hash set. For each new element, we check if it's already in the window (O(1) lookup). If found, we have a duplicate within distance k. Otherwise, we add it and remove the oldest element to maintain the window size.

  - approach_name: Hash Map with Index Tracking
    is_optimal: false
    code: |
      def contains_nearby_duplicate(nums: list[int], k: int) -> bool:
          # Map each value to its most recent index
          last_seen = {}

          for i, num in enumerate(nums):
              # Check if we've seen this number before
              if num in last_seen:
                  # Check if the previous occurrence is within distance k
                  if i - last_seen[num] <= k:
                      return True

              # Update the most recent index for this number
              last_seen[num] = i

          return False
    explanation: |
      **Time Complexity:** O(n) — Single pass with O(1) hash map operations.

      **Space Complexity:** O(n) — In the worst case, all elements are unique and stored in the map.

      This approach stores the last seen index for each value. When we encounter a number we've seen before, we check if the distance is within k. While correct and efficient, it uses more space than the sliding window approach when k is small relative to n.
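
      The space difference can be made concrete with a small measurement. `peak_sizes` is a hypothetical helper that runs both bookkeeping schemes side by side; it assumes the input has no nearby duplicates (here, all values are distinct), so neither approach would return early, and it samples the set size after eviction.

      ```python
      def peak_sizes(nums: list[int], k: int) -> tuple[int, int]:
          """Return (peak set size, peak map size) over one full pass.

          Hypothetical helper for illustration only; assumes no nearby
          duplicates, so neither approach would exit early.
          """
          window: set[int] = set()
          last_seen: dict[int, int] = {}
          peak_set = peak_map = 0
          for i, num in enumerate(nums):
              # Sliding-window bookkeeping
              window.add(num)
              if i >= k:
                  window.discard(nums[i - k])
              # Hash-map bookkeeping
              last_seen[num] = i
              peak_set = max(peak_set, len(window))
              peak_map = max(peak_map, len(last_seen))
          return peak_set, peak_map


      print(peak_sizes(list(range(1000)), 3))  # (3, 1000)
      ```

      With 1000 distinct values and `k = 3`, the set never holds more than 3 elements between iterations, while the map grows to all 1000.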

  - approach_name: Brute Force
    is_optimal: false
    code: |
      def contains_nearby_duplicate(nums: list[int], k: int) -> bool:
          n = len(nums)

          # Check each element against the next k elements
          for i in range(n):
              # Only check within the valid range
              for j in range(i + 1, min(i + k + 1, n)):
                  if nums[i] == nums[j]:
                      return True

          return False
    explanation: |
      **Time Complexity:** O(n × k) — For each element, we check up to k subsequent elements.

      **Space Complexity:** O(1) — No additional data structures used.

      This straightforward approach checks every valid pair. While it passes small test cases, it will exceed the time limit (TLE) on large inputs where both n and k approach 10^5. It is included to illustrate why the hash-based approaches are necessary.