codetutor/backend/data/questions/permutation-in-string.yaml

title: Permutation in String
slug: permutation-in-string
difficulty: medium
leetcode_id: 567
leetcode_url: https://leetcode.com/problems/permutation-in-string/
categories:
  - strings
  - hash-tables
  - two-pointers
patterns:
  - slug: sliding-window
    is_optimal: true
function_signature: "def check_inclusion(s1: str, s2: str) -> bool:"
test_cases:
  visible:
    - input: { s1: "ab", s2: "eidbaooo" }
      expected: true
    - input: { s1: "ab", s2: "eidboaoo" }
      expected: false
    - input: { s1: "adc", s2: "dcda" }
      expected: true
  hidden:
    - input: { s1: "a", s2: "a" }
      expected: true
    - input: { s1: "ab", s2: "a" }
      expected: false
    - input: { s1: "abc", s2: "ccccbbbbaaaa" }
      expected: false
    - input: { s1: "hello", s2: "ooolleoooleh" }
      expected: false
description: |
  Given two strings `s1` and `s2`, return `true` if `s2` contains a permutation of `s1`, or `false` otherwise.

  In other words, return `true` if one of `s1`'s permutations is a substring of `s2`.
constraints: |
  - `1 <= s1.length, s2.length <= 10^4`
  - `s1` and `s2` consist of lowercase English letters.
examples:
  - input: 's1 = "ab", s2 = "eidbaooo"'
    output: "true"
    explanation: "s2 contains one permutation of s1 (\"ba\")."
  - input: 's1 = "ab", s2 = "eidboaoo"'
    output: "false"
    explanation: "No permutation of s1 exists as a contiguous substring in s2."
explanation:
  intuition: |
    Think of this problem as searching for an **anagram** of `s1` hidden somewhere within `s2`.

    A permutation of a string is simply a rearrangement of its characters, which means any permutation has **exactly the same character frequencies** as the original. For example, "ab" and "ba" are permutations of each other because each contains one 'a' and one 'b'.

    The key insight is that we don't need to generate all permutations of `s1` (which would take factorial time). Instead, we can slide a **window of size `len(s1)`** across `s2` and check whether the characters in that window match the character frequencies of `s1`.

    Imagine you have a magnifying glass exactly as wide as `s1`. As you slide it across `s2` one character at a time, you're checking: "Do the characters under my magnifying glass form an anagram of `s1`?"

    This transforms the problem from "find any permutation" to "find a window with matching character counts", a classic sliding window pattern.
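
    As a quick check of the frequency argument, here is a minimal sketch using Python's `collections.Counter`. The helper name `is_permutation` is hypothetical and is not part of the required function signature.

    ```python
    from collections import Counter


    def is_permutation(a: str, b: str) -> bool:
        # Hypothetical helper: two strings are permutations of each other
        # exactly when every character appears the same number of times.
        return Counter(a) == Counter(b)


    print(is_permutation("ab", "ba"))    # True
    print(is_permutation("adc", "cda"))  # True
    print(is_permutation("ab", "ao"))    # False
    ```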
  approach: |
    We solve this using a **Fixed-Size Sliding Window** with character frequency counting:

    **Step 1: Handle edge cases**
    - If `s1` is longer than `s2`, it's impossible for `s2` to contain any permutation of `s1`, so return `false` immediately

    &nbsp;

    **Step 2: Build the frequency map for s1**
    - Count the frequency of each character in `s1`
    - This is our "target" that we want to match

    &nbsp;

    **Step 3: Initialise the sliding window**
    - Create a frequency map for the first `len(s1)` characters of `s2`
    - This is our initial window

    &nbsp;

    **Step 4: Check the initial window**
    - If the window's frequency map matches `s1`'s frequency map, we found a permutation, so return `true`

    &nbsp;

    **Step 5: Slide the window across s2**
    - For each new position, add the incoming character (right side) to the window
    - Remove the outgoing character (left side) from the window
    - If a character's count drops to zero, remove it from the map entirely (for clean comparison)
    - Compare the window's frequency map with `s1`'s; if they match, return `true`

    &nbsp;

    **Step 6: Return the result**
    - If no matching window is found after sliding through all of `s2`, return `false`
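
    To watch Steps 1 to 6 in action, here is a minimal tracing sketch. `trace_windows` is a hypothetical debugging helper that records every window instead of returning early. It assumes the same `Counter`-based bookkeeping as the *Sliding Window with Hash Map* solution.

    ```python
    from collections import Counter


    def trace_windows(s1: str, s2: str) -> list[tuple[int, str, bool]]:
        """Hypothetical debugging helper: (start, window, matches) per window."""
        k = len(s1)
        if k > len(s2):  # Step 1: no window of size k fits
            return []
        target = Counter(s1)  # Step 2: frequency map for s1
        window = Counter(s2[:k])  # Step 3: initial window
        rows = [(0, s2[:k], window == target)]  # Step 4: check it
        for i in range(k, len(s2)):  # Step 5: slide one position at a time
            window[s2[i]] += 1  # incoming character (right side)
            left = s2[i - k]
            window[left] -= 1  # outgoing character (left side)
            if window[left] == 0:
                del window[left]
            start = i - k + 1
            rows.append((start, s2[start:i + 1], window == target))
        # Step 6: the real solution returns True as soon as any row matches
        return rows


    for row in trace_windows("ab", "eidbaooo"):
        print(row)
    ```

    For `s1 = "ab"` and `s2 = "eidbaooo"`, only the row `(3, 'ba', True)` matches. That window is what makes the answer `true`.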
  common_pitfalls:
    - title: Generating All Permutations
      description: |
        A naive approach might try to generate all permutations of `s1` and check if any exists in `s2`.

        For a string of length `n`, there are `n!` (factorial) permutations. With `s1.length <= 10^4`, this would mean up to `10000!` permutations, an astronomically large number that's computationally impossible to enumerate.

        The sliding window approach avoids this entirely by recognising that **character frequency equality implies permutation**.
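
        To get a feel for the blow-up, here is a brute-force sketch that only makes sense as a testing oracle for tiny inputs. `check_inclusion_brute_force` is a hypothetical name. `itertools.permutations` yields all `len(s1)!` orderings, duplicates included.

        ```python
        import math
        from itertools import permutations


        def check_inclusion_brute_force(s1: str, s2: str) -> bool:
            # Tries every ordering of s1 as a substring of s2.
            # Only usable for very short s1: permutations() yields len(s1)! tuples.
            return any("".join(p) in s2 for p in permutations(s1))


        print(check_inclusion_brute_force("ab", "eidbaooo"))  # True
        print(math.factorial(10))  # 3628800 orderings for a 10-character s1
        ```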
      wrong_approach: "Generate all permutations of s1 and search for each"
      correct_approach: "Compare character frequencies using a sliding window"
    - title: Comparing Sorted Strings Instead of Frequencies
      description: |
        Sorting each window and comparing it to sorted `s1` works but is inefficient.

        Sorting a window of size `k` takes O(k log k). Doing this for each of the `n - k + 1` windows gives O(n * k log k) overall. For large inputs, this is too slow.

        With frequency maps, each slide updates two counts in O(1), and comparing two maps checks at most 26 keys, a constant for a fixed alphabet. The total is therefore O(n).
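
        A rough operation count makes the gap concrete. This is a back-of-the-envelope model, not a benchmark. It assumes `len(s2) = 10^4` and `len(s1) = 5000`, charges `k log2 k` comparisons per sort, and charges two updates plus 26 slot checks per slide for counting.

        ```python
        import math

        # Assumed sizes: n = len(s2) at its maximum, k = len(s1) at half of that.
        n, k = 10_000, 5_000
        windows = n - k + 1

        # Sorting model: each window costs about k * log2(k) comparisons.
        sort_work = windows * k * math.log2(k)

        # Counting model: build the first window (k steps) and compare 26 slots,
        # then for each of the n - k slides do two updates and compare 26 slots.
        count_work = k + 26 + (n - k) * (2 + 26)

        print(f"sorting ~ {sort_work:.1e}, counting ~ {count_work:.1e}")
        ```

        Under these assumptions, sorting costs roughly 3 × 10^8 operations and counting roughly 1.5 × 10^5.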
      wrong_approach: "Sort each window and compare to sorted s1"
      correct_approach: "Use hash maps to track and compare character frequencies"
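    - title: Not Cleaning Up Zero Counts
      description: |
        When a character's count reaches zero in the window map, failing to remove it can break map equality comparisons.

        For example, with plain Python dicts, `{'a': 1, 'b': 0}` is not equal to `{'a': 1}`, even though both describe the same multiset of characters. (Since Python 3.10, `collections.Counter` treats missing elements as zero counts in equality tests. Plain dicts, and `Counter` on earlier versions, do not.)

        Always remove characters from the map when their count reaches zero. This keeps the comparison correct regardless of map type or Python version.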
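
        The plain-dict case is easy to reproduce. Here is a minimal sketch (variable names are illustrative) that also shows two ways to drop zero counts before comparing:

        ```python
        from collections import Counter

        window = {"a": 1, "b": 0}  # plain dict with a stale zero count for 'b'
        target = {"a": 1}
        print(window == target)  # False: 'b' is still a key

        # Dropping zero entries (what `del` does in the solutions) fixes the comparison.
        cleaned = {c: n for c, n in window.items() if n != 0}
        print(cleaned == target)  # True

        # Unary plus on a Counter also strips zero and negative counts.
        print(+Counter(window) == Counter(target))  # True
        ```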
    - title: Off-by-One Errors in Window Boundaries
      description: |
        The window size must be exactly `len(s1)`. Common mistakes include:
        - Starting the slide from index 0 instead of `len(s1)` (the initial window already covers indices `0` to `len(s1) - 1`)
        - Removing the wrong character when sliding (should remove `s2[i - len(s1)]`)

        Trace through a small example manually to verify your indices.
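
        One way to do that trace is to list, for each slide, the index `i`, the incoming character `s2[i]`, and the outgoing character `s2[i - len(s1)]`. `window_moves` is a hypothetical helper written only for this check.

        ```python
        def window_moves(s1: str, s2: str) -> list[tuple[int, str, str]]:
            """Hypothetical helper: (i, incoming s2[i], outgoing s2[i - k]) per slide."""
            k = len(s1)
            # The initial window already covers s2[0:k], so sliding starts at i = k.
            return [(i, s2[i], s2[i - k]) for i in range(k, len(s2))]


        for step in window_moves("ab", "eidbaooo"):
            print(step)
        ```

        For `s1 = "ab"` and `s2 = "eidbaooo"`, the first step is `(2, 'd', 'e')`. The window moves from `"ei"` to `"id"`, so `'d'` enters and `'e'` leaves.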
  key_takeaways:
    - "**Permutation = same character frequencies**: Recognising this transforms the problem from combinatorial to linear"
    - "**Fixed-size sliding window**: When searching for a pattern of known length, use a window of that exact size"
    - "**Hash map comparison**: Comparing character counts is more efficient than generating/sorting permutations"
    - "**Pattern recognition**: This problem is nearly identical to *Find All Anagrams in a String* (LeetCode 438), with the same technique and a different return type"
  time_complexity: "O(n), where n is the length of `s2`. Each character of `s2` is added to the window once and removed at most once, hash map updates are O(1) on average, and each window comparison checks at most 26 keys. Counting `s1` costs O(len(s1)), which is at most O(n) after the early exit."
  space_complexity: "O(1). The frequency maps store at most 26 entries (lowercase English letters), which is constant regardless of input size."
solutions:
  - approach_name: Sliding Window with Hash Map
    is_optimal: true
    code: |
      from collections import Counter


      def check_inclusion(s1: str, s2: str) -> bool:
          # Edge case: s1 longer than s2
          if len(s1) > len(s2):
              return False

          # Build frequency map for s1 (our target)
          s1_count = Counter(s1)
          window_size = len(s1)

          # Build frequency map for initial window in s2
          window_count = Counter(s2[:window_size])

          # Check if initial window matches
          if window_count == s1_count:
              return True

          # Slide the window across s2
          for i in range(window_size, len(s2)):
              # Add incoming character (right side of window)
              window_count[s2[i]] += 1

              # Remove outgoing character (left side of window)
              left_char = s2[i - window_size]
              window_count[left_char] -= 1

              # Clean up zero counts for proper comparison
              if window_count[left_char] == 0:
                  del window_count[left_char]

              # Check if current window matches s1
              if window_count == s1_count:
                  return True

          return False
    explanation: |
      **Time Complexity:** O(n). We iterate through `s2` once, with O(1) work per step (each `Counter` comparison checks at most 26 keys).

      **Space Complexity:** O(1). The hash maps contain at most 26 keys (one per lowercase letter).

      The `Counter` class from Python's `collections` module provides a clean way to count character frequencies. Comparing two `Counter` objects with `==` checks whether every character has the same count. Since Python 3.10, missing elements are treated as zero in this comparison; deleting zero counts keeps the check correct on earlier versions as well.
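
      The key takeaways point out that *Find All Anagrams in a String* (LeetCode 438) uses the same window. Here is a sketch of that adaptation, which collects every matching start index instead of returning early. The name `find_anagrams` and its `(s, p)` argument order are illustrative.

      ```python
      from collections import Counter


      def find_anagrams(s: str, p: str) -> list[int]:
          """Sketch for LeetCode 438: start index of every anagram of p in s."""
          k = len(p)
          if k > len(s):
              return []
          target = Counter(p)
          window = Counter(s[:k])
          result = [0] if window == target else []
          for i in range(k, len(s)):
              window[s[i]] += 1  # incoming character
              left = s[i - k]
              window[left] -= 1  # outgoing character
              if window[left] == 0:
                  del window[left]
              if window == target:
                  result.append(i - k + 1)  # window now covers s[i-k+1 : i+1]
          return result


      print(find_anagrams("cbaebabacd", "abc"))  # [0, 6]
      ```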
  - approach_name: Sliding Window with Array (Optimised)
    is_optimal: true
    code: |
      def check_inclusion(s1: str, s2: str) -> bool:
          if len(s1) > len(s2):
              return False

          # Use arrays instead of hash maps (26 lowercase letters)
          s1_count = [0] * 26
          window_count = [0] * 26

          # Build frequency array for s1
          for c in s1:
              s1_count[ord(c) - ord('a')] += 1

          # Build frequency array for initial window
          for i in range(len(s1)):
              window_count[ord(s2[i]) - ord('a')] += 1

          # Check initial window
          if window_count == s1_count:
              return True

          # Slide the window
          for i in range(len(s1), len(s2)):
              # Add incoming character
              window_count[ord(s2[i]) - ord('a')] += 1
              # Remove outgoing character
              window_count[ord(s2[i - len(s1)]) - ord('a')] -= 1

              if window_count == s1_count:
                  return True

          return False
    explanation: |
      **Time Complexity:** O(n). Same as the hash map approach.

      **Space Complexity:** O(1). Two fixed arrays of size 26.

      This variant uses fixed-size arrays instead of hash maps. Since the input contains only lowercase English letters, each character maps to an index from 0 to 25 via `ord(c) - ord('a')`. Zero counts need no cleanup because every letter always has its own slot. Comparing two short lists is also typically slightly faster than comparing hash maps in practice.
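
      Here is a small illustration of the index mapping, using `s1 = "adc"` and the window `"cda"` from the visible test cases. `letter_counts` is a hypothetical helper, not part of the solution.

      ```python
      def letter_counts(s: str) -> list[int]:
          # Hypothetical helper: slot ord(c) - ord('a') holds the count of letter c.
          counts = [0] * 26
          for c in s:
              counts[ord(c) - ord("a")] += 1
          return counts


      print(letter_counts("adc")[:4])                      # [1, 0, 1, 1]
      print(letter_counts("adc") == letter_counts("cda"))  # True
      ```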
  - approach_name: Sorting Each Window
    is_optimal: false
    code: |
      def check_inclusion(s1: str, s2: str) -> bool:
          if len(s1) > len(s2):
              return False

          # Sort s1 once as our target
          sorted_s1 = sorted(s1)
          window_size = len(s1)

          # Check each window by sorting and comparing
          for i in range(len(s2) - window_size + 1):
              window = s2[i:i + window_size]
              if sorted(window) == sorted_s1:
                  return True

          return False
    explanation: |
      **Time Complexity:** O(n * k log k), where `n = len(s2)` and `k = len(s1)`. For each of the `n - k + 1` windows, we sort `k` characters.

      **Space Complexity:** O(k). Space for the sorted window.

      While correct, this approach is inefficient for large inputs. Sorting each window repeatedly wastes computation. The sliding window with frequency counting avoids this by incrementally updating counts instead of recomputing from scratch.