# codetutor/backend/data/questions/check-if-a-string-contains-all-binary-codes-of-size-k.yaml

title: Check If a String Contains All Binary Codes of Size K
slug: check-if-a-string-contains-all-binary-codes-of-size-k
difficulty: medium
leetcode_id: 1461
leetcode_url: https://leetcode.com/problems/check-if-a-string-contains-all-binary-codes-of-size-k/
categories:
  - strings
  - hash-tables
patterns:
  - sliding-window

function_signature: "def has_all_codes(s: str, k: int) -> bool:"

test_cases:
  visible:
    - input: { s: "00110110", k: 2 }
      expected: true
    - input: { s: "0110", k: 1 }
      expected: true
    - input: { s: "0110", k: 2 }
      expected: false
  hidden:
    - input: { s: "0", k: 1 }
      expected: false
    - input: { s: "01", k: 1 }
      expected: true
    - input: { s: "00110", k: 2 }
      expected: true
    - input: { s: "0000000001011100", k: 4 }
      expected: false
    - input: { s: "11111111", k: 3 }
      expected: false
    - input: { s: "00011100101", k: 3 }
      expected: true

description: |
  Given a binary string `s` and an integer `k`, return `true` *if every binary code of length* `k` *is a substring of* `s`. Otherwise, return `false`.

  A binary code of length `k` is any string consisting of exactly `k` characters, where each character is either `'0'` or `'1'`. For example, when `k = 2`, all possible binary codes are: `"00"`, `"01"`, `"10"`, and `"11"`.

  You need to verify that **all** `2^k` possible binary codes appear somewhere in `s` as contiguous substrings.

constraints: |
  - `1 <= s.length <= 5 * 10^5`
  - `s[i]` is either `'0'` or `'1'`
  - `1 <= k <= 20`

examples:
  - input: 's = "00110110", k = 2'
    output: "true"
    explanation: "The binary codes of length 2 are \"00\", \"01\", \"10\" and \"11\". They can all be found as substrings at indices 0, 1, 3 and 2 respectively."
  - input: 's = "0110", k = 1'
    output: "true"
    explanation: "The binary codes of length 1 are \"0\" and \"1\"; both clearly exist as substrings."
  - input: 's = "0110", k = 2'
    output: "false"
    explanation: 'The binary code "00" is of length 2 and does not exist in the string.'

explanation:
  intuition: |
    Think of this problem like checking off items on a checklist. For a given `k`, there are exactly `2^k` unique binary codes (just like how there are 4 two-digit binary numbers: 00, 01, 10, 11).

    As you slide through the string `s` with a window of size `k`, each window gives you one binary code substring. The question becomes: do you encounter **all** `2^k` possible codes while sliding through?

    Imagine you have a box of crayons where each crayon represents a unique binary code. As you slide through the string, every window "touches" one crayon. If by the end you've touched all crayons in the box, you return `true`.

    The key insight is that instead of generating all `2^k` codes and checking if each exists in `s` (which would be expensive), you simply **collect all unique substrings of length `k`** from `s` and check if you collected exactly `2^k` of them. A hash set naturally handles the uniqueness for you.
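
    As a small illustration of that idea (a sketch rather than the full solution; the variable names are chosen here for the example), the first visible test case can be traced by hand:

    ```python
    s, k = "00110110", 2

    # Every window of length k, in order of appearance
    windows = [s[i:i + k] for i in range(len(s) - k + 1)]
    print(windows)  # ['00', '01', '11', '10', '01', '11', '10']

    # The set drops the repeats, leaving one entry per distinct code
    unique = set(windows)
    print(len(unique), 2 ** k)  # 4 4
    ```

    Seven windows collapse to four distinct codes, which matches `2^k = 4`, so the answer is `true`.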

  approach: |
    We use a **Sliding Window with Hash Set** approach:

    **Step 1: Calculate the target count**

    - Compute `required = 2^k`, which is the total number of unique binary codes of length `k`
    - Every code needs its own window, so `s` must offer at least `required` windows of length `k`

    &nbsp;

    **Step 2: Early termination check**

    - The number of substrings of length `k` in `s` is `len(s) - k + 1`
    - If `len(s) - k + 1 < required`, it's impossible to have all codes, so return `false`

    &nbsp;

    **Step 3: Slide through and collect unique substrings**

    - Create an empty hash set to store unique binary codes
    - Iterate through `s` with a sliding window of size `k`
    - For each position `i` from `0` to `len(s) - k`, extract the substring `s[i:i+k]`
    - Add each substring to the set (duplicates are automatically ignored)

    &nbsp;

    **Step 4: Compare the count**

    - If the size of the set equals `required` (`2^k`), return `true`
    - Otherwise, return `false`

    &nbsp;

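    This approach works because a set only keeps unique elements, and every window of length `k` in a binary string is itself one of the `2^k` possible codes. The set size can therefore never exceed `2^k`, and it reaches exactly `2^k` only when every code has been seen.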

  common_pitfalls:
    - title: Generating All Codes First
      description: |
        A tempting approach is to first generate all `2^k` binary codes, then check if each one exists in `s` using string searching.

        This is inefficient for two reasons:
        1. Generating all codes takes O(k * 2^k) time
        2. Searching for each code in `s` takes O(n) per code, leading to O(n * 2^k) total

        With `k = 20`, you'd have over 1 million codes to generate and search for!

        The sliding window approach is O(n * k) instead, which is far cheaper for large inputs because `k` is at most 20 while `2^k` can exceed one million.
      wrong_approach: "Generate all 2^k codes, search for each in s"
      correct_approach: "Collect unique substrings from s, count them"

    - title: Off-by-One in Window Iteration
      description: |
        When iterating to extract substrings of length `k`, the loop should run from index `0` to `len(s) - k` (inclusive).

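        A common mistake is using `range(len(s) - k)`, which misses the last valid window, or `range(len(s))`, which runs past it. In Python the slice `s[i:i+k]` does not raise an error near the end of the string; it silently returns substrings shorter than `k`, which then end up in the set. In languages with bounds-checked substring calls, the same loop can fail with an index out of bounds error.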

        For `s = "0110"` with `k = 2`:
        - Valid indices: 0, 1, 2 (giving "01", "11", "10")
        - Loop should be `for i in range(len(s) - k + 1)`, which is `range(3)` here
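
        The three loop bounds can be compared directly on this input. The `windows` helper is a hypothetical name used only for this demonstration:

        ```python
        s, k = "0110", 2

        def windows(stop: int) -> list:
            # Slice a window of length k at each start index below `stop`
            return [s[i:i + k] for i in range(stop)]

        print(windows(len(s) - k + 1))  # ['01', '11', '10']
        print(windows(len(s) - k))      # ['01', '11']
        print(windows(len(s)))          # ['01', '11', '10', '0']
        ```

        The second call drops the final window `"10"`, and the third picks up the truncated slice `"0"`, which is not a code of length 2.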
      wrong_approach: "range(len(s)) or range(len(s) - k)"
      correct_approach: "range(len(s) - k + 1)"

    - title: Forgetting the Early Return Optimisation
      description: |
        While not strictly a bug, failing to add the early termination check can hurt performance.

        If `len(s) < k`, there are zero substrings of length `k`. If `len(s) - k + 1 < 2^k`, it's mathematically impossible to have all codes.

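        Example: For `k = 20`, you need at least `2^20 = 1,048,576` substrings, meaning `s` must have length at least `1,048,595`. That exceeds the maximum length of `5 * 10^5` allowed by the constraints, so for `k = 20` the answer is always `false`.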
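
        The bound can be rearranged into a minimum length for `s`. The helper below is a hypothetical illustration, not part of the solution:

        ```python
        def min_length_for_all_codes(k: int) -> int:
            # Need len(s) - k + 1 >= 2^k, i.e. len(s) >= 2^k + k - 1
            return (1 << k) + k - 1

        print(min_length_for_all_codes(2))   # 5
        print(min_length_for_all_codes(20))  # 1048595
        ```

        For `k = 2` the bound is 5, and the string `"00110"` from the test cases has exactly that length and contains all four codes, so the bound is tight there.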
      wrong_approach: "Always iterate through the entire string"
      correct_approach: "Check if len(s) - k + 1 >= 2^k before iterating"

  key_takeaways:
    - "**Hash sets for counting unique items**: When you need to count distinct elements, a set automatically handles duplicates"
    - "**Sliding window for substrings**: Extracting all substrings of a fixed length is a classic sliding window pattern"
    - "**Think about the inverse**: Instead of checking if all codes exist, collect what exists and compare the count"
    - "**Early termination**: Mathematical bounds can save computation - if there aren't enough windows, the answer is definitely `false`"

  time_complexity: "O(n * k). We slide through the string once (O(n) positions), and at each position we extract a substring of length k (O(k) for hashing/copying)."
  space_complexity: "O(2^k * k). In the worst case, the set stores all 2^k unique binary codes, each of length k characters."

solutions:
  - approach_name: Sliding Window with Hash Set
    is_optimal: true
    code: |
      def has_all_codes(s: str, k: int) -> bool:
          # Total number of unique binary codes of length k
          required = 1 << k  # Same as 2^k

          # Early termination: not enough substrings possible
          if len(s) - k + 1 < required:
              return False

          # Collect all unique substrings of length k
          seen = set()
          for i in range(len(s) - k + 1):
              # Extract the substring at this window position
              code = s[i:i + k]
              seen.add(code)

              # Optimisation: stop early if we've found all codes
              if len(seen) == required:
                  return True

          return len(seen) == required
    explanation: |
      **Time Complexity:** O(n * k) — We visit each of the n - k + 1 positions once, and extracting/hashing a substring of length k takes O(k) time.

      **Space Complexity:** O(2^k * k) — The set can hold up to 2^k strings, each of length k.

      We slide a window of size k across the string, collecting each unique substring in a hash set. If the set reaches size 2^k, we've found all possible binary codes. The early termination when `len(seen) == required` provides a small optimisation.

  - approach_name: Bit Manipulation (Rolling Hash)
    is_optimal: true
    code: |
      def has_all_codes(s: str, k: int) -> bool:
          required = 1 << k  # 2^k

          if len(s) - k + 1 < required:
              return False

          # Use a set of integers instead of strings
          seen = set()
          # Mask to keep only k bits (e.g., k=3 -> mask=0b111=7)
          mask = required - 1

          # Convert first k-1 characters to a number
          current = 0
          for i in range(k - 1):
              current = (current << 1) | (ord(s[i]) - ord('0'))

          # Slide through, updating the hash in O(1) per step
          for i in range(k - 1, len(s)):
              # Shift left and add new bit, then mask to keep k bits
              current = ((current << 1) | (ord(s[i]) - ord('0'))) & mask
              seen.add(current)

              if len(seen) == required:
                  return True

          return len(seen) == required
    explanation: |
      **Time Complexity:** O(n) — Each position is processed in O(1) time since we update the hash with bit operations.

      **Space Complexity:** O(2^k) — The set stores up to 2^k integers.

      Instead of storing substrings, we convert each k-length window to an integer. For example, "101" becomes 5. The rolling hash uses bit shifts: shift left by 1, add the new bit, and mask off the oldest bit. This avoids the O(k) cost of substring extraction and hashing, reducing time complexity from O(n * k) to O(n).
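
      To make the rolling update concrete, the sketch below records the integer produced at each window. `rolling_codes` is a hypothetical helper written for this trace; it uses `int(ch)` in place of the `ord` arithmetic, which gives the same bit for `'0'` and `'1'`:

      ```python
      def rolling_codes(s: str, k: int) -> list:
          mask = (1 << k) - 1  # keeps only the lowest k bits
          current = 0
          codes = []
          for i, ch in enumerate(s):
              # Shift in the new bit and drop anything older than k bits
              current = ((current << 1) | int(ch)) & mask
              if i >= k - 1:  # a full window exists from index k - 1 onwards
                  codes.append(current)
          return codes

      print(rolling_codes("00110", 2))  # [0, 1, 3, 2]
      ```

      The values correspond to the windows "00", "01", "11" and "10" in order, so the set ends up holding all four integers from 0 to 3.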

  - approach_name: Brute Force (Generate and Search)
    is_optimal: false
    code: |
      def has_all_codes(s: str, k: int) -> bool:
          # Generate all 2^k binary codes
          required = 1 << k

          for code_num in range(required):
              # Convert number to binary string of length k
              code = bin(code_num)[2:].zfill(k)

              # Check if this code exists in s
              if code not in s:
                  return False

          return True
    explanation: |
      **Time Complexity:** O(n * 2^k) — For each of the 2^k codes, we search the string which takes O(n) time.

      **Space Complexity:** O(k) — We only store one code string at a time.

      This approach generates every possible binary code and checks if it's a substring of s. While intuitive, it's inefficient for large k values. With k = 20, we'd perform over a million substring searches. This solution may cause TLE on LeetCode but illustrates the straightforward approach that the optimal solutions improve upon.