feat(patterns): data structure tutorials

2025-08-23 19:25:47 +01:00
parent f105ffa677
commit 7bf6d1f472
4 changed files with 1194 additions and 0 deletions
@@ -0,0 +1,307 @@
 name: Heap / Priority Queue
 slug: heap
 difficulty_level: 3
 description: >
  A data structure that efficiently maintains the minimum or maximum element,
  supporting O(log n) insertion and extraction. Heaps are essential when you
  repeatedly need to access the smallest or largest element from a changing set.
 when_to_use: |
  - Finding K largest/smallest elements
  - K-way merge of sorted lists
  - Finding median from data stream
  - Task scheduling by priority
  - Dijkstra's shortest path algorithm
 metaphor: |
  Imagine a hospital emergency room where patients are treated by urgency, not
  arrival time. A priority queue (heap) lets you always know who's next without
  sorting everyone whenever someone new arrives. The most urgent patient "bubbles
  up" to the front automatically.
  Another analogy: a to-do list that always shows your most important task first.
  When you add or complete tasks, the list reorganizes itself so the highest
  priority is always accessible in O(1) time.
 core_concept: |
  A **heap** is a complete binary tree where each parent is no larger (min-heap)
  or no smaller (max-heap) than its children; equal values are allowed. This
  property guarantees:
  - **Peek min/max**: O(1) — it's always at the root
  - **Insert**: O(log n) — bubble up to maintain heap property
  - **Extract min/max**: O(log n) — remove root, bubble down to reheapify
  Key insight: heaps don't fully sort the data. They only guarantee the root is
  the min/max. This partial ordering is enough for many problems and is more
  efficient than maintaining full sorted order.
  **When to use heaps:**
  - Need repeated access to min/max element
  - Data changes frequently (insertions/deletions)
  - Full sorting is overkill (only need top K, not all elements sorted)
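
  A minimal sketch of the three operations above using Python's standard
  `heapq` module (the same module `code_template` relies on). The sample values
  are arbitrary, and the array layout noted in the comment is what `heapq`
  produces for this particular push order:

  ```python
  import heapq

  heap = []
  for value in [5, 2, 8, 1]:
      heapq.heappush(heap, value)   # Insert: O(log n) each

  snapshot = list(heap)             # [1, 2, 8, 5]: a valid heap, but not sorted
  smallest = heap[0]                # Peek min: O(1), the root sits at index 0
  popped = heapq.heappop(heap)      # Extract min: O(log n)
  next_smallest = heap[0]           # The next minimum has moved to the root
  ```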
 visualization: |
  **Min-Heap Structure:**
  ```
  Array: [1, 3, 2, 7, 6, 4, 5]
  As tree:
         1          (index 0)
        / \
       3   2        (indices 1, 2)
      / \ / \
     7  6 4  5      (indices 3, 4, 5, 6)
  Parent of index i: (i-1) // 2
  Left child: 2*i + 1
  Right child: 2*i + 2
  ```
  **Inserting 0 into heap:**
  ```
  Add 0 at end:
         1
        / \
       3   2
      / \ / \
     7  6 4  5
    /
   0
  Bubble up (0 < 7, swap):
         1
        / \
       3   2
      / \ / \
     0  6 4  5
    /
   7
  Bubble up (0 < 3, swap):
         1
        / \
       0   2
      / \ / \
     3  6 4  5
    /
   7
  Bubble up (0 < 1, swap):
         0
        / \
       1   2
      / \ / \
     3  6 4  5
    /
   7
  ```
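  The bubble-up steps traced above can be sketched directly on the array form,
  using the parent formula `(i-1) // 2` from the structure diagram.
  `sift_up_insert` is a hypothetical helper name used only for illustration; in
  practice `heapq.heappush` does this work:

  ```python
  def sift_up_insert(heap: list[int], value: int) -> None:
      """Append value, then swap it with its parent until the min-heap property holds."""
      heap.append(value)
      i = len(heap) - 1
      while i > 0:
          parent = (i - 1) // 2
          if heap[i] >= heap[parent]:
              break  # Parent is already smaller or equal: heap property restored
          heap[i], heap[parent] = heap[parent], heap[i]
          i = parent

  heap = [1, 3, 2, 7, 6, 4, 5]
  sift_up_insert(heap, 0)
  # heap is now [0, 1, 2, 3, 6, 4, 5, 7], matching the final tree above
  ```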
  **Top K Elements using Min-Heap:**
  ```
  Find 3 largest from [3, 1, 4, 1, 5, 9, 2, 6]
  Maintain min-heap of size 3:
  Process 3: heap = [3]
  Process 1: heap = [1, 3]
  Process 4: heap = [1, 3, 4]
  Process 1: 1 <= heap[0]=1, skip
  Process 5: 5 > 1, remove 1, add 5 → heap = [3, 5, 4]
  Process 9: 9 > 3, remove 3, add 9 → heap = [4, 5, 9]
  Process 2: 2 <= 4, skip
  Process 6: 6 > 4, remove 4, add 6 → heap = [5, 6, 9]
  Result: [5, 6, 9] are the top 3
  ```
 code_template: |
  import heapq
  def find_k_largest(nums: list[int], k: int) -> list[int]:
      """Find k largest elements using min-heap."""
      # Min-heap of size k keeps k largest
      if k <= 0:
          return []  # Nothing to keep; also avoids indexing an empty heap below
      heap = []
      for num in nums:
          if len(heap) < k:
              heapq.heappush(heap, num)
          elif num > heap[0]:
              heapq.heapreplace(heap, num)  # Pop min, push new
      return heap
  def find_k_smallest(nums: list[int], k: int) -> list[int]:
      """Find k smallest elements using max-heap (negated values)."""
      # Max-heap (negated) of size k keeps k smallest
      if k <= 0:
          return []  # Nothing to keep; also avoids indexing an empty heap below
      heap = []
      for num in nums:
          if len(heap) < k:
              heapq.heappush(heap, -num)
          elif num < -heap[0]:
              heapq.heapreplace(heap, -num)
      return [-x for x in heap]
  def merge_k_sorted_lists(lists: list[list[int]]) -> list[int]:
      """Merge k sorted lists using min-heap."""
      heap = []
      result = []
      # Initialize heap with first element from each list
      for i, lst in enumerate(lists):
          if lst:
              heapq.heappush(heap, (lst[0], i, 0))
      while heap:
          val, list_idx, elem_idx = heapq.heappop(heap)
          result.append(val)
          # Add next element from same list
          if elem_idx + 1 < len(lists[list_idx]):
              next_val = lists[list_idx][elem_idx + 1]
              heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
      return result
  class MedianFinder:
      """Find median from data stream using two heaps."""
      def __init__(self):
          self.small = []  # Max-heap (negated) for smaller half
          self.large = []  # Min-heap for larger half
      def add_num(self, num: int) -> None:
          # Add to max-heap (smaller half)
          heapq.heappush(self.small, -num)
          # Balance: largest of small should be <= smallest of large
          if self.large and -self.small[0] > self.large[0]:
              heapq.heappush(self.large, -heapq.heappop(self.small))
          # Size balance: small can have at most 1 more element
          if len(self.small) > len(self.large) + 1:
              heapq.heappush(self.large, -heapq.heappop(self.small))
          elif len(self.large) > len(self.small):
              heapq.heappush(self.small, -heapq.heappop(self.large))
      def find_median(self) -> float:
          if len(self.small) > len(self.large):
              return -self.small[0]
          return (-self.small[0] + self.large[0]) / 2
  def kth_smallest_in_matrix(matrix: list[list[int]], k: int) -> int:
      """Find kth smallest in row-wise and column-wise sorted matrix."""
      n = len(matrix)
      heap = [(matrix[0][0], 0, 0)]
      visited = {(0, 0)}
      for _ in range(k - 1):
          val, r, c = heapq.heappop(heap)
          # Add right neighbor
          if c + 1 < n and (r, c + 1) not in visited:
              visited.add((r, c + 1))
              heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
          # Add bottom neighbor
          if r + 1 < n and (r + 1, c) not in visited:
              visited.add((r + 1, c))
              heapq.heappush(heap, (matrix[r + 1][c], r + 1, c))
      return heap[0][0]
 recognition_signals:
  - "kth largest"
  - "kth smallest"
  - "top k"
  - "merge sorted"
  - "median"
  - "priority"
  - "schedule"
  - "Dijkstra"
  - "frequency"
  - "closest points"
 common_mistakes:
  - title: Using max-heap when min-heap needed (or vice versa)
    description: |
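      Python's heapq only provides a min-heap: `heap[0]` and `heappop` always
      return the smallest element. Code that needs the largest element at the
      root (for example, the max-heap that tracks "k smallest") gets the wrong
      element if it pushes raw values.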
    fix: |
      For max-heap behavior, negate values:
      ```python
      heapq.heappush(heap, -num)  # Push negative
      max_val = -heapq.heappop(heap)  # Negate back
      ```
  - title: Wrong heap size for "top K" problems
    description: |
      For "k largest," keeping a max-heap of all elements and extracting k times
      is O(n + k log n) time but needs O(n) memory, so it does not suit streams.
      A min-heap of size k is O(n log k) time and only O(k) memory.
    fix: |
      For k largest: use min-heap of size k, remove smallest when full.
      For k smallest: use max-heap of size k, remove largest when full.
  - title: Forgetting tuple comparison order
    description: |
      When heap contains tuples, Python compares by first element, then second,
      etc. If first elements are equal, comparison moves to the second element,
      which raises `TypeError` when those items (dicts, custom objects) do not
      define an ordering.
    fix: |
      Put the comparison key first in the tuple:
      ```python
      heapq.heappush(heap, (priority, item))
      ```
      If items aren't comparable, use a counter as tiebreaker.
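
      A minimal sketch of the counter tiebreaker, assuming a hypothetical `Task`
      payload class that defines no ordering. `itertools.count` supplies a unique
      sequence number, so comparison never reaches the payload:

      ```python
      import heapq
      from itertools import count

      class Task:
          """Hypothetical payload type with no comparison methods."""
          def __init__(self, name: str):
              self.name = name

      tiebreak = count()  # Unique, increasing sequence numbers
      heap = []
      for priority, name in [(2, "write"), (1, "plan"), (1, "review")]:
          # Tuples compare priority first, then the counter; Task is never compared
          heapq.heappush(heap, (priority, next(tiebreak), Task(name)))

      order = [heapq.heappop(heap)[2].name for _ in range(3)]
      # order == ["plan", "review", "write"]: equal priorities pop in insertion order
      ```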
  - title: Modifying heap elements directly
    description: |
      Changing an element's value after it's in the heap breaks heap property.
    fix: |
      Heaps don't support "decrease key" directly. Either: (1) use lazy deletion
      (mark as invalid, skip when popped), or (2) re-heapify the entire heap.
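
      Option (1) might look like the sketch below. The names `set_priority`,
      `pop_min`, and `current` are hypothetical; the idea is to push a fresh
      entry on every update and discard stale entries as they surface:

      ```python
      import heapq

      heap = []      # Entries are (priority, item)
      current = {}   # item -> latest valid priority

      def set_priority(item: str, priority: int) -> None:
          """Push a fresh entry; any older entry for item becomes stale."""
          current[item] = priority
          heapq.heappush(heap, (priority, item))

      def pop_min() -> tuple[int, str]:
          """Pop entries until one matches the latest recorded priority."""
          while heap:
              priority, item = heapq.heappop(heap)
              if current.get(item) == priority:
                  del current[item]
                  return priority, item
              # Otherwise the entry is stale: skip it
          raise IndexError("pop from empty priority queue")

      set_priority("a", 5)
      set_priority("b", 3)
      set_priority("a", 1)  # "Decrease key": the (5, "a") entry is now stale
      ```

      The heap can temporarily hold more entries than live items, which is the
      usual cost of this approach.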
 variations:
  - name: Top K elements
    description: |
      Keep k largest using min-heap of size k, or k smallest using max-heap
      of size k.
    example: "Kth Largest Element, Top K Frequent Elements"
  - name: K-way merge
    description: |
      Merge k sorted lists efficiently by maintaining heap of current elements
      from each list.
    example: "Merge K Sorted Lists, Smallest Range Covering K Lists"
  - name: Two heaps (median)
    description: |
      Maintain two heaps: max-heap for smaller half, min-heap for larger half.
      Median is at the roots.
    example: "Find Median from Data Stream, Sliding Window Median"
  - name: Dijkstra's algorithm
    description: |
      Min-heap tracks vertices by shortest known distance. Extract minimum,
      relax edges, update heap.
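
      A sketch of that loop, assuming non-negative edge weights and a
      hypothetical adjacency-dict input format. "Update heap" is done here by
      pushing a new entry and skipping stale ones, since `heapq` has no
      decrease-key operation:

      ```python
      import heapq

      def dijkstra(graph: dict[str, list[tuple[str, int]]], source: str) -> dict[str, int]:
          """Shortest distances from source; graph maps node -> [(neighbor, weight)]."""
          dist = {source: 0}
          heap = [(0, source)]
          while heap:
              d, node = heapq.heappop(heap)
              if d > dist.get(node, float("inf")):
                  continue  # Stale entry: a shorter path was already found
              for neighbor, weight in graph.get(node, []):
                  new_dist = d + weight
                  if new_dist < dist.get(neighbor, float("inf")):
                      dist[neighbor] = new_dist  # Relax the edge
                      heapq.heappush(heap, (new_dist, neighbor))
          return dist

      graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2)], "B": [("D", 5)]}
      distances = dijkstra(graph, "A")
      ```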
    example: "Network Delay Time, Cheapest Flights Within K Stops"
  - name: Task scheduling
    description: |
      Prioritize tasks by some criteria (deadline, duration). Process highest
      priority first.
    example: "Task Scheduler, Meeting Rooms III"
 related_patterns:
  - binary-search
  - two-pointers
 prerequisite_patterns: []
@@ -0,0 +1,269 @@
 name: Monotonic Stack
 slug: monotonic-stack
 difficulty_level: 3
 description: >
  Maintain a stack where elements are always in sorted order (either increasing or
  decreasing). This enables efficient solutions for "next greater element" problems
  by leveraging the stack's ability to track candidates that might be the answer
  for future elements.
 when_to_use: |
  - Next greater/smaller element
  - Previous greater/smaller element
  - Largest rectangle in histogram
  - Daily temperatures
  - Stock span problems
 metaphor: |
  Imagine standing in a line of people of varying heights, all facing forward.
  You want to know who's the next taller person for each person in line. The
  trick: as you walk backward through the line, keep track of "potentially
  useful" tall people. When you encounter someone taller than people you're
  tracking, those shorter people will never be the answer—remove them. The
  remaining stack always contains candidates in decreasing height order.
  Another analogy: a bouncer at a club with height requirements. As people line
  up, anyone shorter than the person in front can be removed from consideration—
  they'll never be visible from the front.
 core_concept: |
  A **monotonic stack** maintains elements in sorted order by popping elements
  that violate the ordering when pushing new ones:
  - **Monotonically decreasing**: Pop elements smaller than current before pushing
  - **Monotonically increasing**: Pop elements larger than current before pushing
  The key insight is that when we pop an element, we've found its "next
  greater/smaller"—it's the current element we're about to push. The stack
  efficiently tracks candidates that might be answers for future elements.
  **Pattern recognition:**
  - "Next greater" → decreasing stack (pop when current > top)
  - "Next smaller" → increasing stack (pop when current < top)
  - "Previous greater/smaller" → process elements and query stack before pushing
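
  The "previous greater" case can be sketched as follows: the answer is read
  from the top of the stack before the current element is pushed.
  `previous_greater_element` is an illustrative name, and returning -1 for
  "none" is an assumption that mirrors the other templates in this file:

  ```python
  def previous_greater_element(nums: list[int]) -> list[int]:
      """For each position, the nearest strictly greater value to its left, or -1."""
      result = []
      stack = []  # Values, strictly decreasing from bottom to top
      for num in nums:
          # Anything not greater than num can never be an answer again
          while stack and stack[-1] <= num:
              stack.pop()
          result.append(stack[-1] if stack else -1)  # Query before pushing
          stack.append(num)
      return result

  previous = previous_greater_element([4, 5, 2, 10, 8])
  ```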
 visualization: |
  **Next Greater Element:**
  ```
  Array: [4, 5, 2, 10, 8]
  Find next greater element for each
  Process right to left (or left to right with index tracking):
  Process 8:  stack=[]        → no greater, push 8
              stack=[8]       → answer[4] = -1
  Process 10: stack=[8]       → 10 > 8, pop 8
              stack=[]        → no greater, push 10
              stack=[10]      → answer[3] = -1
  Process 2:  stack=[10]      → 2 < 10, don't pop
              stack=[10,2]    → answer[2] = 10
  Process 5:  stack=[10,2]    → 5 > 2, pop 2
              stack=[10]      → 5 < 10, don't pop
              stack=[10,5]    → answer[1] = 10
  Process 4:  stack=[10,5]    → 4 < 5, don't pop
              stack=[10,5,4]  → answer[0] = 5
  Result: [5, 10, 10, -1, -1]
  ```
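  The trace above scans right to left, while `next_greater_element` in
  `code_template` scans left to right. A sketch of the right-to-left form
  follows; the function name is illustrative. It pops on `<=` so that equal
  values are not reported as "greater", a detail the trace does not exercise:

  ```python
  def next_greater_right_to_left(nums: list[int]) -> list[int]:
      """Next strictly greater value to the right of each position, or -1."""
      result = [-1] * len(nums)
      stack = []  # Candidate values to the right, decreasing from bottom to top
      for i in range(len(nums) - 1, -1, -1):
          while stack and stack[-1] <= nums[i]:
              stack.pop()  # Too small to be the answer for nums[i] or anything left of it
          if stack:
              result[i] = stack[-1]
          stack.append(nums[i])
      return result

  answers = next_greater_right_to_left([4, 5, 2, 10, 8])
  ```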
  **Largest Rectangle in Histogram:**
  ```
  Heights: [2, 1, 5, 6, 2, 3]
  Use increasing stack (pop when current < top)
  When popping, calculate rectangle with popped height as the smallest bar.
  Process each bar:
  - 2: push (0,2)
  - 1: 1 < 2, pop (0,2) → width=1, area=2×1=2
       push (0,1) [take popped index]
  - 5: push (2,5)
  - 6: push (3,6)
  - 2: 2 < 6, pop (3,6) → width=1, area=6×1=6
       2 < 5, pop (2,5) → width=2, area=5×2=10
       push (2,2)
  - 3: push (5,3)
  - end: pop remaining, calculate areas
  Max area = 10
  ```
 code_template: |
  def next_greater_element(nums: list[int]) -> list[int]:
      """Find next greater element for each position."""
      n = len(nums)
      result = [-1] * n
      stack = []  # Stack of indices
      for i in range(n):
          # Pop elements smaller than current
          while stack and nums[stack[-1]] < nums[i]:
              idx = stack.pop()
              result[idx] = nums[i]
          stack.append(i)
      return result
  def next_smaller_element(nums: list[int]) -> list[int]:
      """Find next smaller element for each position."""
      n = len(nums)
      result = [-1] * n
      stack = []
      for i in range(n):
          # Pop elements larger than current
          while stack and nums[stack[-1]] > nums[i]:
              idx = stack.pop()
              result[idx] = nums[i]
          stack.append(i)
      return result
  def daily_temperatures(temperatures: list[int]) -> list[int]:
      """Days until warmer temperature."""
      n = len(temperatures)
      result = [0] * n
      stack = []  # Stack of indices
      for i in range(n):
          while stack and temperatures[stack[-1]] < temperatures[i]:
              idx = stack.pop()
              result[idx] = i - idx  # Days difference
          stack.append(i)
      return result
  def largest_rectangle_histogram(heights: list[int]) -> int:
      """Largest rectangle area in histogram."""
      stack = []  # Stack of (index, height)
      max_area = 0
      for i, h in enumerate(heights):
          start = i
          while stack and stack[-1][1] > h:
              idx, height = stack.pop()
              max_area = max(max_area, height * (i - idx))
              start = idx  # This index can extend back
          stack.append((start, h))
      # Process remaining in stack
      for idx, height in stack:
          max_area = max(max_area, height * (len(heights) - idx))
      return max_area
  def stock_span(prices: list[int]) -> list[int]:
      """Span per day: consecutive days ending today with price <= today's price."""
      n = len(prices)
      result = [0] * n
      stack = []  # Stack of indices
      for i in range(n):
          while stack and prices[stack[-1]] <= prices[i]:
              stack.pop()
          # Span = distance to previous higher (or from start)
          result[i] = i - stack[-1] if stack else i + 1
          stack.append(i)
      return result
 recognition_signals:
  - "next greater element"
  - "next smaller element"
  - "previous greater"
  - "daily temperatures"
  - "stock span"
  - "largest rectangle"
  - "histogram"
  - "trapping rain water"
  - "132 pattern"
  - "buildings with ocean view"
 common_mistakes:
  - title: Wrong comparison direction
    description: |
      Using `<` when you should use `>` (or vice versa) results in the wrong
      type of monotonic stack.
    fix: |
      Remember: "next greater" needs decreasing stack, so pop when `nums[top] < current`.
      "Next smaller" needs increasing stack, so pop when `nums[top] > current`.
  - title: Storing values instead of indices
    description: |
      Storing just values makes it impossible to calculate distances (like
      "how many days until...").
    fix: |
      Store indices in the stack. You can always access `nums[stack[-1]]` for
      the value when needed.
  - title: Not processing remaining stack elements
    description: |
      Elements left in the stack after processing all input have no "next
      greater/smaller" in the array.
    fix: |
      After the main loop, process remaining elements. For histogram problems,
      their rectangle extends to the end. For "next greater," their answer is -1.
  - title: Off-by-one with span calculations
    description: |
      Forgetting whether to include the current element in span calculations
      gives wrong results.
    fix: |
      For span problems, if stack is empty, span = i + 1 (from beginning).
      If stack has elements, span = i - stack[-1] (not +1 because previous
      greater is exclusive).
 variations:
  - name: Next greater element
    description: |
      Find the first element to the right that is greater than current.
      Decreasing monotonic stack.
    example: "Next Greater Element I/II, Daily Temperatures"
  - name: Next smaller element
    description: |
      Find the first element to the right that is smaller than current.
      Increasing monotonic stack.
    example: "Next Smaller Element"
  - name: Previous greater/smaller
    description: |
      Query the stack before pushing to find the previous greater/smaller.
      The top of stack is the answer.
    example: "Stock Span, Buildings With Ocean View"
  - name: Largest rectangle
    description: |
      Use increasing stack. When popping, calculate area using popped height
      and width from popped index to current index.
    example: "Largest Rectangle in Histogram, Maximal Rectangle"
  - name: Trapping rain water
    description: |
      Can use monotonic stack to track left boundaries, calculating trapped
      water when finding right boundary. (Alternative: two-pointer approach)
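
      One way the stack version could look, as a sketch: water is added one
      horizontal layer at a time whenever a taller bar closes a basin. The
      function name `trap` is an assumption borrowed from the usual problem
      statement:

      ```python
      def trap(height: list[int]) -> int:
          """Total trapped water, using a stack of indices with non-increasing heights."""
          stack = []
          water = 0
          for i, h in enumerate(height):
              while stack and height[stack[-1]] < h:
                  bottom = stack.pop()  # Floor of the basin being filled
                  if not stack:
                      break  # No left boundary, so nothing is trapped
                  left = stack[-1]
                  width = i - left - 1
                  depth = min(height[left], h) - height[bottom]
                  water += width * depth
              stack.append(i)
          return water
      ```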
    example: "Trapping Rain Water"
 related_patterns:
  - two-pointers
  - sliding-window
 prerequisite_patterns: []
@@ -0,0 +1,305 @@
 name: Trie
 slug: trie
 difficulty_level: 3
 description: >
  A tree-like data structure for efficient string prefix operations. Each node
  represents a character, and paths from root to nodes spell out prefixes. Tries
  enable O(m) search, insert, and prefix queries where m is the word length.
 when_to_use: |
  - Autocomplete systems
  - Spell checkers
  - Word dictionary with prefix search
  - Word break problems
  - IP routing (longest prefix matching)
 metaphor: |
  Imagine a filing cabinet where files are organized by name, one letter per
  drawer. To find "apple," you open drawer 'a', then find sub-drawer 'p', then
  'p', then 'l', then 'e'. You don't search through all files—you navigate
  directly to the right location. Finding "application" shares the same path
  up to "appl" before diverging.
  Another analogy: a phone book organized as a tree. Instead of a flat
  alphabetical list, common prefixes are grouped, making it fast to find all
  names starting with "Joh" or check if "Johnson" exists.
 core_concept: |
  A **Trie** (pronounced "try") stores strings character by character:
  - **Root**: Empty node representing the empty prefix
  - **Edges**: Labeled with characters
  - **Nodes**: Represent prefixes; may be marked as "end of word"
  Key insight: all words sharing a prefix share the same path from root.
  This makes prefix operations extremely efficient:
  - **Insert word**: O(m) — create path from root
  - **Search word**: O(m) — follow path, check end marker
  - **Starts with prefix**: O(m) — just follow path, no end check needed
  **Trade-off**: Tries use more memory than hash sets (each character is a node),
  but enable prefix queries that hash sets cannot support.
 visualization: |
  **Trie containing: ["app", "apple", "apply", "apt", "bat"]**
  ```
           (root)
          /      \
         a        b
         |        |
         p        a
        / \       |
       p*  t*     t*
       |
       l
      / \
     e*  y*
  * = end of word marker
  Paths:
  - "app" → a-p-p*
  - "apple" → a-p-p-l-e*
  - "apply" → a-p-p-l-y*
  - "apt" → a-p-t*
  - "bat" → b-a-t*
  ```
  **Search for "apple":**
  ```
  Start at root
  → 'a': found, move to 'a' node
  → 'p': found, move to 'p' node
  → 'p': found, move to second 'p' node
  → 'l': found, move to 'l' node
  → 'e': found, move to 'e' node
  → end of word marker? Yes!
  "apple" exists ✓
  ```
  **Search for "app":**
  ```
  Follow path a-p-p
  → end of word marker on second 'p'? Yes!
  "app" exists ✓
  ```
  **Starts with "ap":**
  ```
  Follow path a-p
  → reached end of prefix successfully
  Words with prefix "ap" exist ✓
  ```
 code_template: |
  class TrieNode:
      def __init__(self):
          self.children = {}
          self.is_end = False
  class Trie:
      def __init__(self):
          self.root = TrieNode()
      def insert(self, word: str) -> None:
          """Insert a word into the trie."""
          node = self.root
          for char in word:
              if char not in node.children:
                  node.children[char] = TrieNode()
              node = node.children[char]
          node.is_end = True
      def search(self, word: str) -> bool:
          """Check if word exists in trie."""
          node = self._traverse(word)
          return node is not None and node.is_end
      def starts_with(self, prefix: str) -> bool:
          """Check if any word starts with prefix."""
          return self._traverse(prefix) is not None
      def _traverse(self, s: str) -> "TrieNode | None":
          """Traverse trie following string s."""
          node = self.root
          for char in s:
              if char not in node.children:
                  return None
              node = node.children[char]
          return node
  class WordDictionary:
      """Trie with wildcard search support."""
      def __init__(self):
          self.root = TrieNode()
      def add_word(self, word: str) -> None:
          node = self.root
          for char in word:
              if char not in node.children:
                  node.children[char] = TrieNode()
              node = node.children[char]
          node.is_end = True
      def search(self, word: str) -> bool:
          """Search with '.' as wildcard for any character."""
          def dfs(node: TrieNode, i: int) -> bool:
              if i == len(word):
                  return node.is_end
              char = word[i]
              if char == '.':
                  # Try all children
                  return any(dfs(child, i + 1)
                             for child in node.children.values())
              else:
                  if char not in node.children:
                      return False
                  return dfs(node.children[char], i + 1)
          return dfs(self.root, 0)
  def word_break(s: str, word_dict: list[str]) -> bool:
      """Check if string can be segmented into dictionary words."""
      trie = Trie()
      for word in word_dict:
          trie.insert(word)
      n = len(s)
      dp = [False] * (n + 1)
      dp[0] = True  # Empty string can be segmented
      for i in range(n):
          if not dp[i]:
              continue
          node = trie.root
          for j in range(i, n):
              if s[j] not in node.children:
                  break
              node = node.children[s[j]]
              if node.is_end:
                  dp[j + 1] = True
      return dp[n]
  def find_words_with_prefix(trie: Trie, prefix: str) -> list[str]:
      """Find all words starting with prefix."""
      node = trie._traverse(prefix)
      if not node:
          return []
      results = []
      def dfs(node: TrieNode, path: str):
          if node.is_end:
              results.append(path)
          for char, child in node.children.items():
              dfs(child, path + char)
      dfs(node, prefix)
      return results
 recognition_signals:
  - "prefix"
  - "autocomplete"
  - "word dictionary"
  - "spell check"
  - "word search"
  - "word break"
  - "longest common prefix"
  - "starts with"
  - "implement trie"
  - "wildcard"
 common_mistakes:
  - title: Confusing search vs starts_with
    description: |
      Search checks if the exact word exists (must have end marker).
      Starts_with only checks if the prefix path exists.
    fix: |
      For search, always check `node.is_end` at the end:
      ```python
      def search(self, word):
          node = self._traverse(word)
          return node is not None and node.is_end
      ```
  - title: Not handling empty string
    description: |
      Empty string is a valid prefix (everything starts with it) but may not
      be a valid word in the dictionary.
    fix: |
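      starts_with("") returns True with the template above even when the trie
      is empty, because the root node always exists; check `self.root.children`
      first if an empty trie should return False.
      search("") should return True only if empty string was explicitly inserted.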
  - title: Using array instead of dict for children
    description: |
      Using `children = [None] * 26` assumes only lowercase letters. This fails
      for other character sets.
    fix: |
      Use a dictionary for flexibility:
      ```python
      self.children = {}  # Works for any characters
      ```
      Or use array only when character set is known and fixed.
  - title: Memory leaks when deleting words
    description: |
      Simply unmarking is_end doesn't free memory for nodes that are no longer
      part of any word.
    fix: |
      For deletion, either: (1) accept memory isn't freed (common), or
      (2) implement proper deletion that removes orphaned nodes bottom-up.
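
      A sketch of option (2), written as standalone functions over a minimal
      `TrieNode` so the block runs on its own. `insert` and `delete` here are
      illustrative helpers, not methods of the `Trie` class in `code_template`:

      ```python
      class TrieNode:
          def __init__(self):
              self.children = {}
              self.is_end = False

      def insert(root: TrieNode, word: str) -> None:
          node = root
          for char in word:
              if char not in node.children:
                  node.children[char] = TrieNode()
              node = node.children[char]
          node.is_end = True

      def delete(root: TrieNode, word: str) -> bool:
          """Remove word and prune nodes no other word uses. False if word was absent."""
          path = [root]
          for char in word:
              nxt = path[-1].children.get(char)
              if nxt is None:
                  return False
              path.append(nxt)
          if not path[-1].is_end:
              return False
          path[-1].is_end = False
          # Walk back toward the root, dropping childless non-terminal nodes
          for depth in range(len(word), 0, -1):
              node = path[depth]
              if node.children or node.is_end:
                  break  # Still needed by another word
              del path[depth - 1].children[word[depth - 1]]
          return True
      ```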
 variations:
  - name: Basic Trie
    description: |
      Standard insert, search, and prefix check operations.
    example: "Implement Trie (Prefix Tree)"
  - name: Wildcard search
    description: |
      Support '.' as wildcard matching any single character. Requires DFS
      to explore all possibilities when encountering wildcard.
    example: "Design Add and Search Words Data Structure"
  - name: Word search in grid
    description: |
      Use Trie to efficiently search for multiple words in a 2D grid.
      Prune branches that don't match any word prefix.
    example: "Word Search II"
  - name: Autocomplete
    description: |
      Find all words starting with a given prefix. DFS from the prefix
      endpoint to collect all words.
    example: "Design Search Autocomplete System"
  - name: Compressed Trie (Radix Tree)
    description: |
      Merge chains of single-child nodes into one node with a string label.
      Saves space for sparse tries.
    example: "Longest Common Prefix optimizations"
 related_patterns:
  - dfs
  - backtracking
  - dynamic-programming
 prerequisite_patterns: []
@@ -0,0 +1,313 @@
 name: Union Find
 slug: union-find
 difficulty_level: 3
 description: >
  Track disjoint sets with efficient union and find operations. Union-Find
  (also called Disjoint Set Union) excels at dynamically grouping elements and
  answering "are these two elements in the same group?" queries.
 when_to_use: |
  - Finding connected components
  - Detecting cycles in undirected graphs
  - Kruskal's minimum spanning tree
  - Dynamic connectivity queries
  - Grouping related items (accounts merge, friend circles)
 metaphor: |
  Imagine a social network where you want to know if two people are connected
  (directly or through friends of friends). Instead of searching the entire
  network each time, everyone in a connected group points to a group leader.
  To check if two people are connected, just check if they have the same leader.
  When groups merge (someone bridges two groups), you just update one leader to
  point to the other.
  Another analogy: corporate acquisitions. Each company has a parent company
  (possibly itself). When companies merge, one becomes a subsidiary of the other.
  To find the ultimate parent, you follow the chain of ownership.
 core_concept: |
  Union-Find maintains a forest of trees where each tree represents a set.
  Each element points to its parent, and the root of the tree is the set's
  representative.
  **Two key operations:**
  - **Find(x)**: Return the root (representative) of x's set
  - **Union(x, y)**: Merge the sets containing x and y
  **Two key optimizations:**
  - **Path compression**: During Find, make each node point directly to root.
    This flattens the tree for future queries.
  - **Union by rank/size**: Always attach the smaller tree under the larger.
    This keeps trees shallow.
  With both optimizations, operations run in nearly O(1) amortized time:
  technically O(α(n)) per operation, where α is the inverse Ackermann function,
  which is ≤ 4 for any practical input size.
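
  A sketch of the union-by-size alternative mentioned above, paired with an
  iterative two-pass `find`. `UnionFindBySize` is a hypothetical class name;
  `code_template` in this file uses union by rank with a recursive `find`
  instead. The iterative form avoids Python's recursion limit on long parent
  chains:

  ```python
  class UnionFindBySize:
      def __init__(self, n: int):
          self.parent = list(range(n))
          self.size = [1] * n  # Only meaningful at roots

      def find(self, x: int) -> int:
          root = x
          while self.parent[root] != root:  # First pass: locate the root
              root = self.parent[root]
          while self.parent[x] != root:     # Second pass: repoint the path at the root
              next_node = self.parent[x]
              self.parent[x] = root
              x = next_node
          return root

      def union(self, x: int, y: int) -> bool:
          root_x, root_y = self.find(x), self.find(y)
          if root_x == root_y:
              return False
          if self.size[root_x] < self.size[root_y]:
              root_x, root_y = root_y, root_x
          self.parent[root_y] = root_x      # Smaller tree goes under the larger
          self.size[root_x] += self.size[root_y]
          return True
  ```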
 visualization: |
  **Initial state (each element is its own set):**
  ```
  parent: [0, 1, 2, 3, 4]  (each points to itself)
  Sets: {0}, {1}, {2}, {3}, {4}
  ```
  **Union(0, 1):**
  ```
  parent: [0, 0, 2, 3, 4]  (1 now points to 0)
      0
      |
      1
  Sets: {0, 1}, {2}, {3}, {4}
  ```
  **Union(2, 3) then Union(3, 4):**
  ```
  Union(2, 3): equal ranks, 3 attaches under 2 (rank of 2 becomes 1)
  Union(3, 4): Find(3) = 2, Find(4) = 4; root 4 has lower rank, attaches under root 2
  parent: [0, 0, 2, 2, 2]
      0       2
      |      / \
      1     3   4
  Sets: {0, 1}, {2, 3, 4}
  ```
  **Union(1, 4), merging the two trees:**
  ```
  Find(1) = 0, Find(4) = 2
  Union by rank: both roots have rank 1, so root 2 attaches under root 0
  (rank of 0 becomes 2)
  parent: [0, 0, 0, 2, 2]
          0
         / \
        1   2
           / \
          3   4
  Sets: {0, 1, 2, 3, 4}
  ```
  **Path compression during Find(4):**
  ```
  Find(4): 4 → 2 → 0 (found root)
  Compress: make 4 point directly to 0 (2 already does)
  parent: [0, 0, 0, 2, 0]
        0
       /|\
      1 2 4
        |
        3
  Now Find(4) is O(1); 3 is compressed the next time Find(3) runs.
  ```
 code_template: |
  class UnionFind:
      """Union-Find with path compression and union by rank."""
      def __init__(self, n: int):
          self.parent = list(range(n))
          self.rank = [0] * n
          self.count = n  # Number of disjoint sets
      def find(self, x: int) -> int:
          """Find root with path compression."""
          if self.parent[x] != x:
              self.parent[x] = self.find(self.parent[x])
          return self.parent[x]
      def union(self, x: int, y: int) -> bool:
          """Union by rank. Returns True if x and y were in different sets."""
          root_x, root_y = self.find(x), self.find(y)
          if root_x == root_y:
              return False  # Already in same set
          # Union by rank
          if self.rank[root_x] < self.rank[root_y]:
              root_x, root_y = root_y, root_x
          self.parent[root_y] = root_x
          if self.rank[root_x] == self.rank[root_y]:
              self.rank[root_x] += 1
          self.count -= 1
          return True
      def connected(self, x: int, y: int) -> bool:
          """Check if x and y are in the same set."""
          return self.find(x) == self.find(y)
  def count_components(n: int, edges: list[list[int]]) -> int:
      """Count connected components in undirected graph."""
      uf = UnionFind(n)
      for u, v in edges:
          uf.union(u, v)
      return uf.count
  def has_cycle(n: int, edges: list[list[int]]) -> bool:
      """Detect cycle in undirected graph."""
      uf = UnionFind(n)
      for u, v in edges:
          if uf.connected(u, v):
              return True  # Adding edge creates cycle
          uf.union(u, v)
      return False
  def kruskal_mst(n: int, edges: list[tuple[int, int, int]]) -> int:
      """Kruskal's MST algorithm using Union-Find."""
      # edges are (weight, u, v)
      edges.sort()  # Sort by weight
      uf = UnionFind(n)
      mst_weight = 0
      edges_used = 0
      for weight, u, v in edges:
          if uf.union(u, v):
              mst_weight += weight
              edges_used += 1
              if edges_used == n - 1:
                  break
      return mst_weight if edges_used == n - 1 else -1
  def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
      """Merge accounts with common emails."""
      email_to_id = {}
      email_to_name = {}
      uf = UnionFind(len(accounts))
      # Map emails to account indices
      for i, account in enumerate(accounts):
          name = account[0]
          for email in account[1:]:
              email_to_name[email] = name
              if email in email_to_id:
                  uf.union(i, email_to_id[email])
              else:
                  email_to_id[email] = i
      # Group emails by root account
      from collections import defaultdict
      root_to_emails = defaultdict(set)
      for email, idx in email_to_id.items():
          root = uf.find(idx)
          root_to_emails[root].add(email)
      # Build result
      return [[email_to_name[next(iter(emails))]] + sorted(emails)
              for emails in root_to_emails.values()]
 recognition_signals:
  - "connected components"
  - "disjoint sets"
  - "union"
  - "groups"
  - "merge accounts"
  - "friend circles"
  - "detect cycle undirected"
  - "Kruskal"
  - "minimum spanning tree"
  - "redundant connection"
  - "equivalence"
 common_mistakes:
  - title: Forgetting path compression
    description: |
      Without path compression, repeated Find operations can be O(n) each,
      degrading overall performance.
    fix: |
      Always compress paths during Find:
      ```python
      if self.parent[x] != x:
          self.parent[x] = self.find(self.parent[x])
      ```
  - title: Using Union-Find for directed graphs
    description: |
      Union-Find assumes undirected connections. For directed graphs, cycles
      mean something different (back edges in DFS).
    fix: |
      Use DFS with coloring (WHITE/GRAY/BLACK) for cycle detection in directed
      graphs. Union-Find is for undirected connectivity.
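
      A sketch of the coloring approach, assuming nodes are numbered 0..n-1 and
      edges are (u, v) pairs meaning u → v. The function name is illustrative,
      and the recursion may need an iterative rewrite for very deep graphs:

      ```python
      WHITE, GRAY, BLACK = 0, 1, 2  # Unvisited, on the current DFS path, finished

      def has_directed_cycle(n: int, edges: list[tuple[int, int]]) -> bool:
          graph = [[] for _ in range(n)]
          for u, v in edges:
              graph[u].append(v)
          color = [WHITE] * n

          def visit(u: int) -> bool:
              color[u] = GRAY
              for v in graph[u]:
                  if color[v] == GRAY:
                      return True  # Back edge to a node on the current path
                  if color[v] == WHITE and visit(v):
                      return True
              color[u] = BLACK
              return False

          return any(color[u] == WHITE and visit(u) for u in range(n))
      ```

      The edge list [(0, 1), (0, 2), (1, 2)] has no directed cycle, yet the
      undirected `has_cycle` check would report one; that is the mismatch this
      mistake describes.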
  - title: Not tracking component count
    description: |
      For problems asking "how many components," counting distinct roots at the
      end takes an extra O(n) pass of Find calls.
    fix: |
      Decrement count in union when merging two different sets:
      ```python
      if root_x != root_y:
          self.count -= 1
      ```
  - title: Union returning wrong information
    description: |
      Some solutions need to know if a union actually merged two sets or if
      they were already connected.
    fix: |
      Return boolean from union indicating if merge happened:
      ```python
      if root_x == root_y:
          return False  # Already same set
      # ... do union ...
      return True  # Merged
      ```
 variations:
  - name: Basic connectivity
    description: |
      Track whether elements are in the same connected component.
    example: "Number of Connected Components, Friend Circles"
  - name: Cycle detection
    description: |
      If union is called on two already-connected elements, adding that edge
      would create a cycle.
    example: "Redundant Connection, Graph Valid Tree"
  - name: Kruskal's MST
    description: |
      Sort edges by weight, greedily add edges that don't create cycles
      (checked via Union-Find).
    example: "Min Cost to Connect All Points, Connecting Cities With Minimum Cost"
  - name: Dynamic connectivity
    description: |
      Handle streaming edge insertions while answering connectivity queries.
      Plain Union-Find cannot undo a union, so edge deletions need another approach.
    example: "Evaluate Division, Accounts Merge"
  - name: Weighted Union-Find
    description: |
      Track relative weights/distances between elements and their roots.
      Used in problems with equivalence relationships.
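
      A sketch in the spirit of Evaluate Division, assuming every relation has
      the form a / b = value with non-zero values. `WeightedUnionFind`, `ratio`,
      and `query` are hypothetical names; no union by rank is applied, to keep
      the weight bookkeeping easy to follow:

      ```python
      class WeightedUnionFind:
          """ratio[x] stores value(x) / value(parent[x])."""
          def __init__(self):
              self.parent = {}
              self.ratio = {}

          def find(self, x):
              if x not in self.parent:
                  self.parent[x] = x
                  self.ratio[x] = 1.0
              if self.parent[x] != x:
                  old_parent = self.parent[x]
                  root = self.find(old_parent)           # Compresses old_parent first
                  self.ratio[x] *= self.ratio[old_parent]  # Now x / root
                  self.parent[x] = root
              return self.parent[x]

          def union(self, a, b, value: float) -> None:
              """Record the relation a / b = value."""
              root_a, root_b = self.find(a), self.find(b)
              if root_a == root_b:
                  return
              # a = ratio[a] * root_a and b = ratio[b] * root_b, so
              # root_a / root_b = value * ratio[b] / ratio[a]
              self.parent[root_a] = root_b
              self.ratio[root_a] = value * self.ratio[b] / self.ratio[a]

          def query(self, a, b):
              """Return a / b, or None if the two are not known to be related."""
              if a not in self.parent or b not in self.parent:
                  return None
              if self.find(a) != self.find(b):
                  return None
              return self.ratio[a] / self.ratio[b]
      ```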
    example: "Evaluate Division (weighted paths)"
 related_patterns:
  - dfs
  - bfs
 prerequisite_patterns: []