codetutor/backend/data/questions/contains-duplicate-ii.yaml
2025-05-25 10:16:13 +01:00

184 lines
8.6 KiB
YAML
title: Contains Duplicate II
slug: contains-duplicate-ii
difficulty: easy
leetcode_id: 219
leetcode_url: https://leetcode.com/problems/contains-duplicate-ii/
categories:
- arrays
- hash-tables
patterns:
- sliding-window
description: |
Given an integer array `nums` and an integer `k`, return `true` if there are two **distinct indices** `i` and `j` in the array such that `nums[i] == nums[j]` and `abs(i - j) <= k`.
constraints: |
- `1 <= nums.length <= 10^5`
- `-10^9 <= nums[i] <= 10^9`
- `0 <= k <= 10^5`
examples:
- input: "nums = [1,2,3,1], k = 3"
output: "true"
explanation: "The element 1 appears at index 0 and index 3. Since abs(0 - 3) = 3 <= k, we return true."
- input: "nums = [1,0,1,1], k = 1"
output: "true"
explanation: "The element 1 appears at index 2 and index 3. Since abs(2 - 3) = 1 <= k, we return true."
- input: "nums = [1,2,3,1,2,3], k = 2"
output: "false"
explanation: "Every value repeats, but no two equal values are within k = 2 indices of each other. Each repeated pair (for example, 1 at indices 0 and 3) is 3 apart, and 3 > k."
explanation:
intuition: |
Imagine you're walking through a hallway with numbered rooms, and you need to find if any room number repeats within the last `k` rooms you've passed.
The core insight is that we don't need to remember *every* room we've ever seen — we only care about rooms within our **sliding window** of the last `k` positions. If we encounter a room number we've seen within this window, we've found our duplicate.
Think of it like this: as you move forward, you maintain a "memory" of the last `k` rooms. When you see a new room number, you check if it's already in your memory. If yes, you found a nearby duplicate. If not, add it to your memory and forget the oldest room (the one that will be more than `k` steps behind at your next step).
This naturally suggests using a **hash set** as our memory — it gives us O(1) lookups to check for duplicates and O(1) insertions/deletions to maintain our sliding window.
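As a rough illustration of the hallway picture, the sketch below records what the "memory" holds just before each room is inspected. The helper name `trace_window` is hypothetical, and it uses plain list slices rather than a hash set so the contents stay in order and easy to read; it is meant for tracing small inputs, not as the solution.

```python
def trace_window(nums: list[int], k: int) -> list[tuple[int, list[int]]]:
    """Return (value, remembered values before seeing it) for each step.

    Illustrative helper only: tracing stops at the first nearby duplicate.
    """
    steps = []
    for i, num in enumerate(nums):
        # The "memory" is the (at most) k values immediately before index i
        memory = nums[max(0, i - k):i]
        steps.append((num, memory))
        if num in memory:
            break  # nearby duplicate found, nothing more to trace
    return steps


# Example 1 from above: the final step sees 1 while 1 is still remembered
for value, memory in trace_window([1, 2, 3, 1], 3):
    print(value, memory)
# 1 []
# 2 [1]
# 3 [1, 2]
# 1 [1, 2, 3]
```

With `k = 2` on `[1, 2, 3, 1, 2, 3]`, the same trace never shows the current value inside its memory, which matches the `false` result of the third example.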
approach: |
We solve this using a **Sliding Window with Hash Set** approach:
**Step 1: Initialise a hash set**
- Create an empty set `window` to store the elements within the last `k` positions
- Whenever a new element is checked, the set holds at most `k` elements (it briefly holds `k + 1` between the add in Step 2 and the removal in Step 3)
&nbsp;
**Step 2: Iterate through the array**
- For each element at index `i`, check if it already exists in our `window` set
- If yes, we found a duplicate within distance `k` — return `true`
- If no, add the current element to the window
&nbsp;
**Step 3: Maintain window size**
- If the window size exceeds `k`, remove the oldest element (the one at index `i - k`)
- This ensures we only track elements within the valid distance
&nbsp;
**Step 4: Return the result**
- If we complete the loop without finding duplicates, return `false`
&nbsp;
This approach efficiently combines the sliding window pattern with a hash set for O(1) operations, giving us an optimal O(n) solution.
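Step 3 is worded in terms of the window's size, so one way to sketch the four steps is to test `len(window)` directly instead of comparing the index with `k`. The function name below is an assumption made for this illustration; because a match returns early, the set never contains repeated values, so the size test and an index test should trigger on the same iterations.

```python
def contains_nearby_duplicate_by_size(nums: list[int], k: int) -> bool:
    # Step 1: empty set for the values inside the current window
    window: set[int] = set()
    for i, num in enumerate(nums):
        # Step 2: a hit means an equal value lies at most k positions back
        if num in window:
            return True
        window.add(num)
        # Step 3: the set just grew past k, so drop the value at index i - k
        if len(window) > k:
            window.remove(nums[i - k])
    # Step 4: the loop finished without a hit
    return False


print(contains_nearby_duplicate_by_size([1, 0, 1, 1], 1))        # True
print(contains_nearby_duplicate_by_size([1, 2, 3, 1, 2, 3], 2))  # False
```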
common_pitfalls:
- title: The Brute Force Trap
description: |
A naive approach checks every pair of elements to see if they're equal and within distance `k`:
- Outer loop `i` from `0` to `n-1`
- Inner loop `j` from `i+1` to `min(i+k+1, n)`
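While this limits the inner loop to `k` iterations, it's still **O(n × k)** in the worst case. When both `n` and `k` are at their maximum (`10^5`), that bound is 10 billion operations; because the inner loop also stops at the end of the array, the actual count is about 5 billion comparisons (`n(n-1)/2`). Either figure is far too slow and causes a **Time Limit Exceeded (TLE)** error.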
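To put a number on the worst case, the sketch below tallies how many `nums[i] == nums[j]` comparisons the loop bounds above perform when no early return happens. The function name `brute_force_comparisons` is made up for this illustration.

```python
def brute_force_comparisons(n: int, k: int) -> int:
    """Count inner-loop iterations for j in range(i + 1, min(i + k + 1, n))."""
    # Index i is compared with at most k later elements, and never past the end
    return sum(min(k, n - 1 - i) for i in range(n))


print(brute_force_comparisons(6, 2))              # 9
print(brute_force_comparisons(10**5, 10**5))      # 4999950000
```

At the maximum constraints the count is roughly 5 × 10^9, which sits inside the `n × k = 10^10` bound and is still far beyond what a typical time limit allows.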
wrong_approach: "Nested loops checking pairs within distance k"
correct_approach: "Sliding window with hash set for O(n) time"
- title: Using a Hash Map Instead of a Set
description: |
While a hash map (storing value → index) works, it's more complex than necessary. You'd need to update indices as you go and compare distances.
A hash set is simpler: by keeping only the elements from the last `k` positions, we implicitly guarantee any match is within the valid distance. If it's in the set, it's within range.
wrong_approach: "Hash map with index tracking and distance calculation"
correct_approach: "Hash set with sliding window of size k"
- title: Off-by-One in Window Size
description: |
Be careful about when to remove elements from the window. The condition `abs(i - j) <= k` means indices can be up to `k` apart, so your window should contain `k` previous elements (not `k-1` or `k+1`).
Remove the element at index `i - k` only when `i >= k`, ensuring the window never exceeds `k` elements from the past.
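The effect of an off-by-one window can be seen with a small experiment. The sketch below takes the number of remembered elements as a parameter (`window_size`, a name introduced here for illustration) so that `k - 1`, `k`, and `k + 1` can be compared on the same input; it assumes `window_size >= 0`.

```python
def has_duplicate_in_window(nums: list[int], window_size: int) -> bool:
    """True if a value repeats among the previous `window_size` elements."""
    window: set[int] = set()
    for i, num in enumerate(nums):
        if num in window:
            return True
        window.add(num)
        # Evict the element that falls out of the last `window_size` positions
        if i >= window_size:
            window.remove(nums[i - window_size])
    return False


nums = [1, 2, 3, 1]  # the two 1s are exactly 3 indices apart

# k = 2: the correct answer is False
print(has_duplicate_in_window(nums, 2))  # False (window of k)
print(has_duplicate_in_window(nums, 3))  # True  (window of k + 1: false positive)

# k = 3: the correct answer is True
print(has_duplicate_in_window(nums, 3))  # True  (window of k)
print(has_duplicate_in_window(nums, 2))  # False (window of k - 1: false negative)
```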
wrong_approach: "Removing when i > k or keeping k+1 elements"
correct_approach: "Remove element at index i - k when i >= k"
key_takeaways:
- "**Sliding window + hash set**: When you need to find duplicates within a range, combine a fixed-size window with a set for O(1) lookups"
- "**Implicit distance guarantee**: By keeping only the elements from the last `k` positions, any match is automatically within the valid distance, so there is no need to track indices"
- "**Set vs Map tradeoff**: Choose the simpler data structure when it suffices; a set is often cleaner than a map when you don't need the stored values"
- "**Related problems**: This pattern extends to 'Contains Duplicate III' (within range *and* value difference) and other sliding window problems"
time_complexity: "O(n). We traverse the array once, with average-case O(1) hash set operations (add, remove, lookup) at each step."
space_complexity: "O(min(n, k)). The hash set stores at most `min(n, k)` elements at any time."
solutions:
- approach_name: Sliding Window with Hash Set
is_optimal: true
code: |
def contains_nearby_duplicate(nums: list[int], k: int) -> bool:
    # Set to track elements in our current window of size k
    window = set()
    for i, num in enumerate(nums):
        # If we've seen this number in our window, we found a duplicate
        if num in window:
            return True
        # Add current element to the window
        window.add(num)
        # Maintain window size: remove element that's now too far behind
        if i >= k:
            window.remove(nums[i - k])
    # No nearby duplicates found
    return False
explanation: |
**Time Complexity:** O(n) — Single pass through the array with O(1) set operations.
**Space Complexity:** O(min(n, k)) — The set contains at most k elements.
We maintain a sliding window of the last k elements using a hash set. For each new element, we check if it's already in the window (O(1) lookup). If found, we have a duplicate within distance k. Otherwise, we add it and remove the oldest element to maintain the window size.
- approach_name: Hash Map with Index Tracking
is_optimal: false
code: |
def contains_nearby_duplicate(nums: list[int], k: int) -> bool:
    # Map each value to its most recent index
    last_seen = {}
    for i, num in enumerate(nums):
        # Check if we've seen this number before
        if num in last_seen:
            # Check if the previous occurrence is within distance k
            if i - last_seen[num] <= k:
                return True
        # Update the most recent index for this number
        last_seen[num] = i
    return False
explanation: |
**Time Complexity:** O(n) — Single pass with O(1) hash map operations.
**Space Complexity:** O(n) — In the worst case, all elements are unique and stored in the map.
This approach stores the last seen index for each value. When we encounter a number we've seen before, we check if the distance is within k. While correct and efficient, it uses more space than the sliding window approach when k is small relative to n.
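The space difference can be made concrete with a small measurement. The sketch below runs both bookkeeping schemes side by side and reports the largest number of entries each container holds at the end of an iteration; `peak_container_sizes` is a hypothetical helper written for this comparison, not part of either solution.

```python
def peak_container_sizes(nums: list[int], k: int) -> tuple[int, int]:
    """Return (peak set size, peak map size), measured after each iteration."""
    window: set[int] = set()
    last_seen: dict[int, int] = {}
    peak_set = peak_map = 0
    for i, num in enumerate(nums):
        if num in window:
            break  # both approaches would return True here
        # Sliding-window bookkeeping
        window.add(num)
        if i >= k:
            window.remove(nums[i - k])
        # Hash-map bookkeeping
        last_seen[num] = i
        peak_set = max(peak_set, len(window))
        peak_map = max(peak_map, len(last_seen))
    return peak_set, peak_map


# 1000 unique values with a small k: the set stays at k, the map grows to n
print(peak_container_sizes(list(range(1000)), 10))  # (10, 1000)
```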
- approach_name: Brute Force
is_optimal: false
code: |
def contains_nearby_duplicate(nums: list[int], k: int) -> bool:
    n = len(nums)
    # Check each element against the next k elements
    for i in range(n):
        # Only check within the valid range
        for j in range(i + 1, min(i + k + 1, n)):
            if nums[i] == nums[j]:
                return True
    return False
explanation: |
**Time Complexity:** O(n × k) — For each element, we check up to k subsequent elements.
**Space Complexity:** O(1) — No additional data structures used.
This straightforward approach checks every valid pair. While it passes small test cases, it will TLE on large inputs where both n and k approach 10^5. Included to illustrate why the hash-based approaches are necessary.