codetutor/backend/data/questions/check-if-a-string-contains-all-binary-codes-of-size-k.yaml

title: Check If a String Contains All Binary Codes of Size K
slug: check-if-a-string-contains-all-binary-codes-of-size-k
difficulty: medium
leetcode_id: 1461
leetcode_url: https://leetcode.com/problems/check-if-a-string-contains-all-binary-codes-of-size-k/
categories:
- strings
- hash-tables
patterns:
- slug: sliding-window
is_optimal: true
function_signature: "def has_all_codes(s: str, k: int) -> bool:"
test_cases:
visible:
- input: { s: "00110110", k: 2 }
expected: true
- input: { s: "0110", k: 1 }
expected: true
- input: { s: "0110", k: 2 }
expected: false
hidden:
- input: { s: "0", k: 1 }
expected: false
- input: { s: "01", k: 1 }
expected: true
- input: { s: "00110", k: 2 }
expected: true
- input: { s: "0000000001011100", k: 4 }
expected: false
- input: { s: "11111111", k: 3 }
expected: false
- input: { s: "00011100101", k: 3 }
expected: true
description: |
Given a binary string `s` and an integer `k`, return `true` *if every binary code of length* `k` *is a substring of* `s`. Otherwise, return `false`.
A binary code of length `k` is any string consisting of exactly `k` characters, where each character is either `'0'` or `'1'`. For example, when `k = 2`, all possible binary codes are: `"00"`, `"01"`, `"10"`, and `"11"`.
You need to verify that **all** `2^k` possible binary codes appear somewhere in `s` as contiguous substrings.
constraints: |
- `1 <= s.length <= 5 * 10^5`
- `s[i]` is either `'0'` or `'1'`
- `1 <= k <= 20`
examples:
- input: 's = "00110110", k = 2'
output: "true"
explanation: "The binary codes of length 2 are \"00\", \"01\", \"10\" and \"11\". They can all be found as substrings at indices 0, 1, 3 and 2 respectively."
- input: 's = "0110", k = 1'
output: "true"
explanation: "The binary codes of length 1 are \"0\" and \"1\"; both clearly appear as substrings."
- input: 's = "0110", k = 2'
output: "false"
explanation: 'The binary code "00" is of length 2 and does not exist in the string.'
explanation:
intuition: |
Think of this problem like checking off items on a checklist. For a given `k`, there are exactly `2^k` unique binary codes (just like how there are 4 two-digit binary numbers: 00, 01, 10, 11).
As you slide through the string `s` with a window of size `k`, each window gives you one binary code substring. The question becomes: do you encounter **all** `2^k` possible codes while sliding through?
Imagine you have a box of crayons where each crayon represents a unique binary code. As you slide through the string, every window "touches" one crayon. If by the end you've touched all crayons in the box, you return `true`.
The key insight is that instead of generating all `2^k` codes and checking if each exists in `s` (which would be expensive), you simply **collect all unique substrings of length `k`** from `s` and check if you collected exactly `2^k` of them. A hash set naturally handles the uniqueness for you.
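As a rough illustration of the checklist idea, the distinct length-`k` windows of the first sample input can be collected in one pass. This is a minimal sketch; the helper name `unique_windows` is made up for this example.

```python
def unique_windows(s: str, k: int) -> set[str]:
    """Return the distinct substrings of length k in s (hypothetical helper)."""
    # range() is empty when len(s) < k, so short inputs yield an empty set
    return {s[i:i + k] for i in range(len(s) - k + 1)}


codes = unique_windows("00110110", 2)
print(sorted(codes))         # ['00', '01', '10', '11']
print(len(codes) == 2 ** 2)  # True
```

The set has 4 entries and `2^2 = 4`, so every code of length 2 is present.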
approach: |
We use a **Sliding Window with Hash Set** approach:
**Step 1: Calculate the target count**
- Compute `required = 2^k`, which is the total number of unique binary codes of length `k`
- `required` is also the minimum number of windows `s` must provide, which Step 2 checks
&nbsp;
**Step 2: Early termination check**
- The number of substrings of length `k` in `s` is `len(s) - k + 1`
- If `len(s) - k + 1 < required`, it's impossible to have all codes, so return `false`
&nbsp;
**Step 3: Slide through and collect unique substrings**
- Create an empty hash set to store unique binary codes
- Iterate through `s` with a sliding window of size `k`
- For each position `i` from `0` to `len(s) - k`, extract the substring `s[i:i+k]`
- Add each substring to the set (duplicates are automatically ignored)
&nbsp;
**Step 4: Compare the count**
- If the size of the set equals `required` (`2^k`), return `true`
- Otherwise, return `false`
&nbsp;
This approach works because a set only keeps unique elements. If we've seen all `2^k` unique codes, the set size will be exactly `2^k`.
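To watch the set grow window by window, Steps 3 and 4 can be traced on a small input. This is a sketch for illustration only: `trace_windows` is a hypothetical helper, and it deliberately skips the Step 2 check so that the full walk is visible.

```python
def trace_windows(s: str, k: int) -> list[tuple[int, str, int]]:
    """Record (start index, window, distinct codes so far) for each window."""
    seen: set[str] = set()
    steps = []
    for i in range(len(s) - k + 1):
        window = s[i:i + k]
        seen.add(window)  # duplicates leave the set unchanged
        steps.append((i, window, len(seen)))
    return steps


for i, window, distinct in trace_windows("0110", 2):
    print(i, window, distinct)
# 0 01 1
# 1 11 2
# 2 10 3
```

The count stops at 3 while `required` is 4, so the answer for `s = "0110"`, `k = 2` is `false`.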
common_pitfalls:
- title: Generating All Codes First
description: |
A tempting approach is to first generate all `2^k` binary codes, then check if each one exists in `s` using string searching.
This is inefficient for two reasons:
1. Generating all codes takes O(k * 2^k) time
2. Searching for each code in `s` takes O(n) per code, leading to O(n * 2^k) total
With `k = 20`, you'd have over 1 million codes to generate and search for!
The sliding window approach is O(n * k) instead, much better for large inputs.
wrong_approach: "Generate all 2^k codes, search for each in s"
correct_approach: "Collect unique substrings from s, count them"
- title: Off-by-One in Window Iteration
description: |
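When extracting substrings of length `k`, the last valid start index is `len(s) - k`, so the loop index must run from `0` to `len(s) - k` (inclusive).
A common mistake is writing `range(len(s) - k)`, which misses the last valid window, or `range(len(s))`, which runs past it. In Python the extra slices are silently truncated to fewer than `k` characters and pollute the set; in languages with bounds-checked substrings the same loop goes out of bounds.
For `s = "0110"` with `k = 2`:
- Valid start indices: 0, 1, 2 (giving "01", "11", "10")
- Loop should be `for i in range(len(s) - k + 1)`, i.e. `for i in range(3)`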
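The three loop bounds can be compared directly on this input. The snippet below is a small sketch; the variable names are chosen only for the demonstration.

```python
s, k = "0110", 2

# Correct bound: start indices 0 .. len(s) - k inclusive
correct = [s[i:i + k] for i in range(len(s) - k + 1)]
# Stops one index early and drops the final window
too_short = [s[i:i + k] for i in range(len(s) - k)]
# Runs past the last valid start; Python truncates the slice silently
too_long = [s[i:i + k] for i in range(len(s))]

print(correct)    # ['01', '11', '10']
print(too_short)  # ['01', '11']
print(too_long)   # ['01', '11', '10', '0']
```

Note that the stray `"0"` in the last list would bring a set built from it to 4 entries, which equals `2^2` even though `"00"` never appears in `s`.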
wrong_approach: "range(len(s)) or range(len(s) - k)"
correct_approach: "range(len(s) - k + 1)"
- title: Forgetting the Early Return Optimisation
description: |
While not strictly a bug, failing to add the early termination check can hurt performance.
If `len(s) < k`, there are zero substrings of length `k`. If `len(s) - k + 1 < 2^k`, it's mathematically impossible to have all codes.
Example: For `k = 20`, you need at least `2^20 = 1,048,576` substrings, meaning `s` must have length at least `1,048,595`.
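The bound can be tabulated for every allowed `k`. The sketch below assumes the limit `s.length <= 5 * 10^5` from the constraints; `min_length_for_all_codes` and `MAX_LEN` are names introduced here for illustration.

```python
def min_length_for_all_codes(k: int) -> int:
    """Smallest len(s) satisfying len(s) - k + 1 >= 2^k (hypothetical helper)."""
    return (1 << k) + k - 1


MAX_LEN = 5 * 10**5  # upper bound on s.length taken from the constraints

print(min_length_for_all_codes(2))   # 5
print(min_length_for_all_codes(20))  # 1048595

# Values of k for which no string within the constraints can succeed
impossible = [k for k in range(1, 21) if min_length_for_all_codes(k) > MAX_LEN]
print(impossible)                    # [19, 20]
```

Under these constraints, any input with `k = 19` or `k = 20` is rejected by the early check without touching the string.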
wrong_approach: "Always iterate through the entire string"
correct_approach: "Check if len(s) - k + 1 >= 2^k before iterating"
key_takeaways:
- "**Hash sets for counting unique items**: When you need to count distinct elements, a set automatically handles duplicates"
- "**Sliding window for substrings**: Extracting all substrings of a fixed length is a classic sliding window pattern"
- "**Think about the inverse**: Instead of checking if all codes exist, collect what exists and compare the count"
- "**Early termination**: Mathematical bounds can save computation - if there aren't enough windows, the answer is definitely `false`"
time_complexity: "O(n * k). We slide through the string once (O(n) positions), and at each position we extract a substring of length k (O(k) for hashing/copying)."
space_complexity: "O(2^k * k). In the worst case, the set stores all 2^k unique binary codes, each of length k characters."
solutions:
- approach_name: Sliding Window with Hash Set
is_optimal: true
code: |
def has_all_codes(s: str, k: int) -> bool:
    # Total number of unique binary codes of length k
    required = 1 << k  # Same as 2^k
    # Early termination: not enough substrings possible
    if len(s) - k + 1 < required:
        return False
    # Collect all unique substrings of length k
    seen = set()
    for i in range(len(s) - k + 1):
        # Extract the substring at this window position
        code = s[i:i + k]
        seen.add(code)
        # Optimisation: stop early if we've found all codes
        if len(seen) == required:
            return True
    return len(seen) == required
explanation: |
**Time Complexity:** O(n * k) — We visit each of the n - k + 1 positions once, and extracting/hashing a substring of length k takes O(k) time.
**Space Complexity:** O(2^k * k) — The set can hold up to 2^k strings, each of length k.
We slide a window of size k across the string, collecting each unique substring in a hash set. If the set reaches size 2^k, we've found all possible binary codes. The early termination when `len(seen) == required` provides a small optimisation.
- approach_name: Bit Manipulation (Rolling Hash)
is_optimal: true
code: |
def has_all_codes(s: str, k: int) -> bool:
    required = 1 << k  # 2^k
    if len(s) - k + 1 < required:
        return False
    # Use a set of integers instead of strings
    seen = set()
    # Mask to keep only k bits (e.g., k=3 -> mask=0b111=7)
    mask = required - 1
    # Convert first k-1 characters to a number
    current = 0
    for i in range(k - 1):
        current = (current << 1) | (ord(s[i]) - ord('0'))
    # Slide through, updating the hash in O(1) per step
    for i in range(k - 1, len(s)):
        # Shift left and add new bit, then mask to keep k bits
        current = ((current << 1) | (ord(s[i]) - ord('0'))) & mask
        seen.add(current)
        if len(seen) == required:
            return True
    return len(seen) == required
explanation: |
**Time Complexity:** O(n) — Each position is processed in O(1) time since we update the hash with bit operations.
**Space Complexity:** O(2^k) — The set stores up to 2^k integers.
Instead of storing substrings, we convert each k-length window to an integer. For example, "101" becomes 5. The rolling update uses bit operations: shift left by 1, OR in the new bit, and mask with `2^k - 1` to drop the oldest bit. Because the integer encodes the window exactly, different windows never collide. This avoids the O(k) cost of substring extraction and hashing, reducing time complexity from O(n * k) to O(n).
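To see the rolling update produce the same numbers as converting each window from scratch, the values can be listed for the first sample input. This is a sketch; `rolling_values` is a hypothetical helper, not part of the solution above.

```python
def rolling_values(s: str, k: int) -> list[int]:
    """Integer value of every length-k window, computed with a rolling update."""
    mask = (1 << k) - 1
    current = 0
    values = []
    for i, ch in enumerate(s):
        # Shift in the new bit, then keep only the lowest k bits
        current = ((current << 1) | int(ch)) & mask
        if i >= k - 1:  # the first full window ends at index k - 1
            values.append(current)
    return values


s = "00110110"
print(rolling_values(s, 2))                                  # [0, 1, 3, 2, 1, 3, 2]
print([int(s[i:i + 2], 2) for i in range(len(s) - 2 + 1)])   # [0, 1, 3, 2, 1, 3, 2]
```

Both lists match, and the distinct values `{0, 1, 2, 3}` cover all `2^2` codes.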
- approach_name: Brute Force (Generate and Search)
is_optimal: false
code: |
def has_all_codes(s: str, k: int) -> bool:
    # Generate all 2^k binary codes
    required = 1 << k
    for code_num in range(required):
        # Convert number to binary string of length k
        code = bin(code_num)[2:].zfill(k)
        # Check if this code exists in s
        if code not in s:
            return False
    return True
explanation: |
**Time Complexity:** O(n * 2^k) — For each of the 2^k codes, we search the string which takes O(n) time.
**Space Complexity:** O(k) — We only store one code string at a time.
This approach generates every possible binary code and checks if it's a substring of s. While intuitive, it's inefficient for large k values. With k = 20, we'd perform over a million substring searches. This solution may hit Time Limit Exceeded (TLE) on LeetCode, but it illustrates the straightforward approach that the optimal solutions improve upon.
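Although too slow for large inputs, the brute force is handy as a reference when testing a faster solution. The sketch below compares compact versions of both ideas on small random inputs; the function names `by_search` and `by_set` and the chosen input sizes are assumptions made for this example.

```python
import random


def by_search(s: str, k: int) -> bool:
    """Brute force: look up every code of length k in s."""
    return all(format(n, f"0{k}b") in s for n in range(1 << k))


def by_set(s: str, k: int) -> bool:
    """Sliding window: count distinct length-k substrings."""
    return len({s[i:i + k] for i in range(len(s) - k + 1)}) == 1 << k


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    s = "".join(rng.choice("01") for _ in range(rng.randint(1, 30)))
    k = rng.randint(1, 4)
    assert by_search(s, k) == by_set(s, k), (s, k)
print("both approaches agree")
```

Keeping `k` small here keeps the brute force cheap while still exercising short strings, missing codes, and complete coverage.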