title: Check If a String Contains All Binary Codes of Size K
slug: check-if-a-string-contains-all-binary-codes-of-size-k
difficulty: medium
leetcode_id: 1461
leetcode_url: https://leetcode.com/problems/check-if-a-string-contains-all-binary-codes-of-size-k/
categories:
  - strings
  - hash-tables
patterns:
  - sliding-window

function_signature: "def has_all_codes(s: str, k: int) -> bool:"

test_cases:
  visible:
    - input: { s: "00110110", k: 2 }
      expected: true
    - input: { s: "0110", k: 1 }
      expected: true
    - input: { s: "0110", k: 2 }
      expected: false
  hidden:
    - input: { s: "0", k: 1 }
      expected: false
    - input: { s: "01", k: 1 }
      expected: true
    - input: { s: "00110", k: 2 }
      expected: true
    - input: { s: "0000000001011100", k: 4 }
      expected: false
    - input: { s: "11111111", k: 3 }
      expected: false
    - input: { s: "00011100101", k: 3 }
      expected: true

description: |
  Given a binary string `s` and an integer `k`, return `true` *if every binary code of length* `k` *is a substring of* `s`. Otherwise, return `false`.

  A binary code of length `k` is any string consisting of exactly `k` characters, where each character is either `'0'` or `'1'`. For example, when `k = 2`, all possible binary codes are: `"00"`, `"01"`, `"10"`, and `"11"`.

  You need to verify that **all** `2^k` possible binary codes appear somewhere in `s` as contiguous substrings.

constraints: |
  - `1 <= s.length <= 5 * 10^5`
  - `s[i]` is either `'0'` or `'1'`
  - `1 <= k <= 20`

examples:
  - input: 's = "00110110", k = 2'
    output: "true"
    explanation: "The binary codes of length 2 are \"00\", \"01\", \"10\" and \"11\". They can all be found as substrings at indices 0, 1, 3 and 2 respectively."
  - input: 's = "0110", k = 1'
    output: "true"
    explanation: "The binary codes of length 1 are \"0\" and \"1\"; both clearly exist as substrings."
  - input: 's = "0110", k = 2'
    output: "false"
    explanation: 'The binary code "00" is of length 2 and does not exist in the string.'

explanation:
  intuition: |
    Think of this problem like checking off items on a checklist. For a given `k`, there are exactly `2^k` unique binary codes (just like how there are 4 two-digit binary numbers: 00, 01, 10, 11).

    As you slide through the string `s` with a window of size `k`, each window gives you one binary code substring. The question becomes: do you encounter **all** `2^k` possible codes while sliding through?

    Imagine you have a box of crayons where each crayon represents a unique binary code. As you slide through the string, every window "touches" one crayon. If by the end you've touched all crayons in the box, you return `true`.

    The key insight is that instead of generating all `2^k` codes and checking if each exists in `s` (which would be expensive), you simply **collect all unique substrings of length `k`** from `s` and check if you collected exactly `2^k` of them. A hash set naturally handles the uniqueness for you.

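    As a rough illustration of the collect-and-count idea, the sketch below gathers the distinct windows for two of the visible test cases. The helper name `collect_codes` is hypothetical and exists only for this example.

    ```python
    def collect_codes(s: str, k: int) -> set[str]:
        """Return the distinct length-k windows of s (hypothetical helper)."""
        return {s[i:i + k] for i in range(len(s) - k + 1)}


    codes = collect_codes("00110110", 2)
    print(sorted(codes))         # ['00', '01', '10', '11']
    print(len(codes) == 2 ** 2)  # True

    # "00" never appears in "0110", so only three of the four codes are collected.
    print(sorted(collect_codes("0110", 2)))  # ['01', '10', '11']
    ```
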
  approach: |
    We use a **Sliding Window with Hash Set** approach:

    **Step 1: Calculate the target count**

    - Compute `required = 2^k`, which is the total number of unique binary codes of length `k`
    - This is the number of distinct windows we need to observe in `s`

    **Step 2: Early termination check**

    - The number of substrings of length `k` in `s` is `len(s) - k + 1`
    - If `len(s) - k + 1 < required`, it's impossible to have all codes, so return `false`

    **Step 3: Slide through and collect unique substrings**

    - Create an empty hash set to store unique binary codes
    - Iterate through `s` with a sliding window of size `k`
    - For each position `i` from `0` to `len(s) - k`, extract the substring `s[i:i+k]`
    - Add each substring to the set (duplicates are automatically ignored)

    **Step 4: Compare the count**

    - If the size of the set equals `required` (`2^k`), return `true`
    - Otherwise, return `false`

    This approach works because a set only keeps unique elements, and every window of length `k` in a binary string is itself one of the `2^k` codes. The set can therefore never grow beyond `2^k`, and it reaches exactly `2^k` only when every code has been seen.

  common_pitfalls:
    - title: Generating All Codes First
      description: |
        A tempting approach is to first generate all `2^k` binary codes, then check if each one exists in `s` using string searching.

        This is inefficient for two reasons:
        1. Generating all codes takes O(k * 2^k) time
        2. Searching for each code in `s` takes O(n) per code, leading to O(n * 2^k) total

        With `k = 20`, you'd have over 1 million codes to generate and search for!

        The sliding window approach is O(n * k) instead, which is much better for large inputs.
      wrong_approach: "Generate all 2^k codes, search for each in s"
      correct_approach: "Collect unique substrings from s, count them"

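    - title: Off-by-One in Window Iteration
      description: |
        When iterating to extract substrings of length `k`, the loop should run from index `0` to `len(s) - k` (inclusive), which in Python is `range(len(s) - k + 1)`.

        A common mistake is using `range(len(s) - k)`, which misses the last valid window, or `range(len(s))`, which runs past it. In Python the slice `s[i:i+k]` does not raise near the end of the string; it silently returns a substring shorter than `k`, which then pollutes the set. In languages with bounds-checked substring calls, the same loop fails with an out-of-range error instead.

        For `s = "0110"` with `k = 2`:
        - Valid indices: 0, 1, 2 (giving "01", "11", "10")
        - Loop should be `for i in range(len(s) - k + 1)`, which here is `for i in range(3)`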
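
        A small sketch of the three loop bounds on the same input; the variable names are illustrative only. Python slicing does not raise at the end of the string, so the over-long range quietly yields a truncated piece.

        ```python
        s, k = "0110", 2

        # Correct bound: one window per start index 0 .. len(s) - k.
        correct = [s[i:i + k] for i in range(len(s) - k + 1)]

        # Too short: stops one index early and drops the final window.
        too_short = [s[i:i + k] for i in range(len(s) - k)]

        # Too long: the last slice is cut off at the end of the string.
        too_long = [s[i:i + k] for i in range(len(s))]

        print(correct)    # ['01', '11', '10']
        print(too_short)  # ['01', '11']
        print(too_long)   # ['01', '11', '10', '0']
        ```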
      wrong_approach: "range(len(s)) or range(len(s) - k)"
      correct_approach: "range(len(s) - k + 1)"

    - title: Forgetting the Early Return Optimisation
      description: |
        While not strictly a bug, failing to add the early termination check can hurt performance.

        If `len(s) < k`, there are zero substrings of length `k`. If `len(s) - k + 1 < 2^k`, it's mathematically impossible to have all codes.

        Example: For `k = 20`, you need at least `2^20 = 1,048,576` substrings, meaning `s` must have length at least `1,048,595`.
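
        The bound can be checked with a few lines of arithmetic. This is a sketch, assuming the `5 * 10^5` length limit from the constraints; `min_length_for_all_codes` and `MAX_N` are hypothetical names. Meeting the bound is necessary but not sufficient, so it can only rule inputs out.

        ```python
        def min_length_for_all_codes(k: int) -> int:
            """Smallest len(s) that satisfies len(s) - k + 1 >= 2^k."""
            return (1 << k) + k - 1


        MAX_N = 5 * 10**5  # assumed upper bound on s.length, taken from the constraints

        # Largest k for which a string within the constraints could still pass the check.
        largest_feasible_k = max(
            k for k in range(1, 21) if min_length_for_all_codes(k) <= MAX_N
        )

        print(min_length_for_all_codes(20))  # 1048595
        print(largest_feasible_k)            # 18
        ```

        Under that limit, any input with `k >= 19` can be rejected before the loop starts.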
      wrong_approach: "Always iterate through the entire string"
      correct_approach: "Check if len(s) - k + 1 >= 2^k before iterating"

  key_takeaways:
    - "**Hash sets for counting unique items**: When you need to count distinct elements, a set automatically handles duplicates"
    - "**Sliding window for substrings**: Extracting all substrings of a fixed length is a classic sliding window pattern"
    - "**Think about the inverse**: Instead of checking if all codes exist, collect what exists and compare the count"
    - "**Early termination**: Mathematical bounds can save computation - if there aren't enough windows, the answer is definitely `false`"

  time_complexity: "O(n * k). We slide through the string once (O(n) positions), and at each position we extract a substring of length k (O(k) for hashing/copying)."
  space_complexity: "O(2^k * k). In the worst case, the set stores all 2^k unique binary codes, each of length k characters."

solutions:
  - approach_name: Sliding Window with Hash Set
    is_optimal: true
    code: |
      def has_all_codes(s: str, k: int) -> bool:
          # Total number of unique binary codes of length k
          required = 1 << k  # Same as 2^k

          # Early termination: not enough substrings possible
          if len(s) - k + 1 < required:
              return False

          # Collect all unique substrings of length k
          seen = set()
          for i in range(len(s) - k + 1):
              # Extract the substring at this window position
              code = s[i:i + k]
              seen.add(code)

              # Optimisation: stop early if we've found all codes
              if len(seen) == required:
                  return True

          return len(seen) == required
    explanation: |
      **Time Complexity:** O(n * k). We visit each of the n - k + 1 positions once, and extracting/hashing a substring of length k takes O(k) time.

      **Space Complexity:** O(2^k * k). The set can hold up to 2^k strings, each of length k.

      We slide a window of size k across the string, collecting each unique substring in a hash set. If the set reaches size 2^k, we've found all possible binary codes. The early termination when `len(seen) == required` provides a small optimisation.

  - approach_name: Bit Manipulation (Rolling Hash)
    is_optimal: true
    code: |
      def has_all_codes(s: str, k: int) -> bool:
          required = 1 << k  # 2^k

          if len(s) - k + 1 < required:
              return False

          # Use a set of integers instead of strings
          seen = set()
          # Mask to keep only k bits (e.g., k=3 -> mask=0b111=7)
          mask = required - 1

          # Convert first k-1 characters to a number
          current = 0
          for i in range(k - 1):
              current = (current << 1) | (ord(s[i]) - ord('0'))

          # Slide through, updating the hash in O(1) per step
          for i in range(k - 1, len(s)):
              # Shift left and add new bit, then mask to keep k bits
              current = ((current << 1) | (ord(s[i]) - ord('0'))) & mask
              seen.add(current)

              if len(seen) == required:
                  return True

          return len(seen) == required
    explanation: |
      **Time Complexity:** O(n). Each position is processed in O(1) time since we update the hash with bit operations.

      **Space Complexity:** O(2^k). The set stores up to 2^k integers.

      Instead of storing substrings, we convert each k-length window to an integer. For example, "101" becomes 5. The rolling hash uses bit shifts: shift left by 1, add the new bit, and mask off the oldest bit. This avoids the O(k) cost of substring extraction and hashing, reducing time complexity from O(n * k) to O(n).

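      To see the shift-and-mask update in action, the sketch below traces the integer value of each window. `window_values` is a hypothetical helper written for this illustration; it mirrors the update rule only and is not part of the solution above.

      ```python
      def window_values(s: str, k: int) -> list[int]:
          """Integer value of every length-k window, via the shift-and-mask update."""
          mask = (1 << k) - 1
          current = 0
          values = []
          for i, ch in enumerate(s):
              # Shift in the new bit and drop anything above the lowest k bits.
              current = ((current << 1) | int(ch)) & mask
              if i >= k - 1:  # the first full window ends at index k - 1
                  values.append(current)
          return values


      print(window_values("00110", 2))  # [0, 1, 3, 2]
      # Cross-check against converting each substring directly.
      print([int("00110"[i:i + 2], 2) for i in range(4)])  # [0, 1, 3, 2]
      print(window_values("101", 3))  # [5]
      ```
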
  - approach_name: Brute Force (Generate and Search)
    is_optimal: false
    code: |
      def has_all_codes(s: str, k: int) -> bool:
          # Generate all 2^k binary codes
          required = 1 << k

          for code_num in range(required):
              # Convert number to binary string of length k
              code = bin(code_num)[2:].zfill(k)

              # Check if this code exists in s
              if code not in s:
                  return False

          return True
    explanation: |
      **Time Complexity:** O(n * 2^k). For each of the 2^k codes, we search the string which takes O(n) time.

      **Space Complexity:** O(k). We only store one code string at a time.

      This approach generates every possible binary code and checks if it's a substring of s. While intuitive, it's inefficient for large k values. With k = 20, we'd perform over a million substring searches. This solution may cause TLE on LeetCode but illustrates the straightforward approach that the optimal solutions improve upon.