questions D-E

2025-05-25 11:08:40 +01:00
parent 615e3f1291
commit 360b5fa255
18 changed files with 4022 additions and 0 deletions
@@ -0,0 +1,233 @@
+title: Design HashMap
+slug: design-hashmap
+difficulty: easy
+leetcode_id: 706
+leetcode_url: https://leetcode.com/problems/design-hashmap/
+categories:
+  - arrays
+  - hash-tables
+  - linked-lists
+patterns:
+  - heap
+
+description: |
+  Design a HashMap without using any built-in hash table libraries.
+
+  Implement the `MyHashMap` class:
+
+  - `MyHashMap()` initialises the object with an empty map.
+  - `void put(int key, int value)` inserts a `(key, value)` pair into the HashMap. If the `key` already exists in the map, update the corresponding `value`.
+  - `int get(int key)` returns the `value` to which the specified `key` is mapped, or `-1` if this map contains no mapping for the `key`.
+  - `void remove(int key)` removes the `key` and its corresponding `value` if the map contains the mapping for the `key`.
+
+constraints: |
+  - `0 <= key, value <= 10^6`
+  - At most `10^4` calls will be made to `put`, `get`, and `remove`.
+
+examples:
+  - input: |
+      ["MyHashMap", "put", "put", "get", "get", "put", "get", "remove", "get"]
+      [[], [1, 1], [2, 2], [1], [3], [2, 1], [2], [2], [2]]
+    output: "[null, null, null, 1, -1, null, 1, null, -1]"
+    explanation: |
+      MyHashMap myHashMap = new MyHashMap();
+      myHashMap.put(1, 1); // The map is now [[1,1]]
+      myHashMap.put(2, 2); // The map is now [[1,1], [2,2]]
+      myHashMap.get(1);    // return 1
+      myHashMap.get(3);    // return -1 (not found)
+      myHashMap.put(2, 1); // The map is now [[1,1], [2,1]] (update existing value)
+      myHashMap.get(2);    // return 1
+      myHashMap.remove(2); // remove the mapping for 2, map is now [[1,1]]
+      myHashMap.get(2);    // return -1 (not found)
+
+explanation:
+  intuition: |
+    A hash map is one of the most fundamental data structures in computer science. At its core, it provides **constant-time average lookup** by converting keys into array indices using a *hash function*.
+
+    Think of it like a library with numbered shelves. Instead of searching every shelf for a book, you use the book's title to calculate which shelf number it belongs on. When you want to find it later, you perform the same calculation and go directly to that shelf.
+
+    The challenge arises when two different keys "hash" to the same index — this is called a **collision**. Imagine two books that both map to shelf #42. We need a strategy to handle this:
+
+    1. **Chaining**: Each shelf holds a list (or linked list) of all items that hashed there
+    2. **Open Addressing**: If a shelf is occupied, look for the next empty shelf
+
+    For this problem, **chaining with linked lists** is the most intuitive approach. Each "bucket" in our array holds a linked list of key-value pairs. When we add, get, or remove, we first hash the key to find the bucket, then traverse the linked list to find the specific entry.
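+
+    A minimal sketch of that collision, assuming a bucket count of `1000` and a plain Python list standing in for each bucket's linked list (the names `NUM_BUCKETS`, `bucket_index`, and `buckets` are illustrative, not part of the problem):
+
+    ```python
+    NUM_BUCKETS = 1000  # assumed bucket count, for illustration only
+
+
+    def bucket_index(key: int) -> int:
+        """Map a key to a bucket ("shelf") number with a modulo hash."""
+        return key % NUM_BUCKETS
+
+
+    # Keys 42 and 1042 differ but land on the same shelf: a collision.
+    assert bucket_index(42) == bucket_index(1042) == 42
+
+    # Chaining: every bucket keeps a list of (key, value) pairs.
+    buckets = [[] for _ in range(NUM_BUCKETS)]
+    for key, value in [(42, 7), (1042, 9)]:
+        buckets[bucket_index(key)].append((key, value))
+
+    print(buckets[42])  # [(42, 7), (1042, 9)]
+    ```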
+
+  approach: |
+    We implement a hash map using **chaining** with an array of linked list heads.
+
+    **Step 1: Choose an array size and hash function**
+
+    - Create an array of size `1000` (with at most `10^4` calls, evenly spread chains hold no more than about 10 nodes, and the array itself stays small)
+    - Use a simple modulo hash function: `hash(key) = key % size`
+    - Each array position is a "bucket" that will hold a linked list head
+
+    &nbsp;
+
+    **Step 2: Define the linked list node structure**
+
+    - Each node stores a `key`, `value`, and pointer to the `next` node
+    - This allows multiple key-value pairs with the same hash to coexist in one bucket
+
+    &nbsp;
+
+    **Step 3: Implement `put(key, value)`**
+
+    - Calculate the bucket index using the hash function
+    - Traverse the linked list at that bucket
+    - If the key already exists, update its value
+    - If not found, append a new node to the end of the list
+
+    &nbsp;
+
+    **Step 4: Implement `get(key)`**
+
+    - Calculate the bucket index
+    - Traverse the linked list looking for the key
+    - Return the value if found, or `-1` if not found
+
+    &nbsp;
+
+    **Step 5: Implement `remove(key)`**
+
+    - Calculate the bucket index
+    - Traverse the linked list to find the key
+    - Remove the node by updating the previous node's `next` pointer
+    - Use a dummy head node to simplify edge cases (removing the first node)
+
+  common_pitfalls:
+    - title: Forgetting to Handle Collisions
+      description: |
+        A naive approach might use the hash directly as an array index without handling cases where multiple keys hash to the same bucket.
+
+        For example, with `size = 1000`, both `key = 5` and `key = 1005` hash to index `5`. Without a collision resolution strategy, one would overwrite the other.
+
+        Using linked list chaining ensures all keys with the same hash coexist in the same bucket.
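+
+        The difference can be sketched as follows, with Python lists standing in for linked lists; `SIZE`, `slots`, and `chains` are names assumed for this illustration:
+
+        ```python
+        SIZE = 1000  # bucket count from the example above
+
+        # Wrong: one slot per bucket, so a colliding key replaces the old entry.
+        slots = [None] * SIZE
+        slots[5 % SIZE] = (5, 50)
+        slots[1005 % SIZE] = (1005, 60)  # silently overwrites the entry for key 5
+        print(slots[5])  # (1005, 60)
+
+        # Right: one chain per bucket, so both entries survive.
+        chains = [[] for _ in range(SIZE)]
+        chains[5 % SIZE].append((5, 50))
+        chains[1005 % SIZE].append((1005, 60))
+        print(chains[5])  # [(5, 50), (1005, 60)]
+        ```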
+      wrong_approach: "Single value per bucket (overwrites on collision)"
+      correct_approach: "Linked list per bucket to chain colliding entries"
+
+    - title: Not Updating Existing Keys
+      description: |
+        The `put` operation must check if the key already exists before inserting. If it exists, we update the value rather than adding a duplicate entry.
+
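+        Failing to do this creates multiple nodes with the same key. Because new nodes are appended to the end of the list, `get` finds the older node first and returns its stale value, `remove` deletes only one of the duplicates, and memory is wasted.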
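+
+        A small sketch of both behaviours on a single bucket, modelled here as a Python list of `[key, value]` pairs for brevity (the helper names are hypothetical):
+
+        ```python
+        def put_always_append(bucket, key, value):
+            """Wrong: never checks whether the key is already present."""
+            bucket.append([key, value])
+
+
+        def put_update_or_append(bucket, key, value):
+            """Right: update in place if the key exists, otherwise append."""
+            for entry in bucket:
+                if entry[0] == key:
+                    entry[1] = value
+                    return
+            bucket.append([key, value])
+
+
+        def get(bucket, key):
+            """Return the first matching value, or -1 if the key is absent."""
+            for k, v in bucket:
+                if k == key:
+                    return v
+            return -1
+
+
+        bad, good = [], []
+        for put, bucket in ((put_always_append, bad), (put_update_or_append, good)):
+            put(bucket, 2, 2)
+            put(bucket, 2, 1)
+
+        print(get(bad, 2), len(bad))    # 2 2  (stale value, duplicate entry)
+        print(get(good, 2), len(good))  # 1 1
+        ```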
+      wrong_approach: "Always append without checking for existing key"
+      correct_approach: "Traverse the list first, update if found, else append"
+
+    - title: Removal Edge Cases
+      description: |
+        Removing a node from a linked list requires updating the `next` pointer of the **previous** node. This is tricky when removing the first node in a bucket.
+
+        Using a dummy head node as a sentinel simplifies the logic — the dummy's `next` always points to the real first node, so removal logic is uniform.
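+
+        The two styles can be compared side by side. This is a sketch only; `Node`, `remove_without_dummy`, `remove_with_dummy`, and `keys` are hypothetical names introduced for the comparison:
+
+        ```python
+        class Node:
+            """Singly linked node used only for this sketch."""
+            def __init__(self, key=-1, value=-1, next=None):
+                self.key, self.value, self.next = key, value, next
+
+
+        def remove_without_dummy(head, key):
+            """Head removal needs its own branch; returns the (possibly new) head."""
+            if head is None:
+                return None
+            if head.key == key:  # special case: the first node has no predecessor
+                return head.next
+            curr = head
+            while curr.next:
+                if curr.next.key == key:
+                    curr.next = curr.next.next
+                    break
+                curr = curr.next
+            return head
+
+
+        def remove_with_dummy(dummy, key):
+            """The dummy precedes every real node, so one loop covers all cases."""
+            curr = dummy
+            while curr.next:
+                if curr.next.key == key:
+                    curr.next = curr.next.next
+                    return
+                curr = curr.next
+
+
+        def keys(node):
+            """Collect the keys of a chain into a list, for printing."""
+            out = []
+            while node:
+                out.append(node.key)
+                node = node.next
+            return out
+
+
+        head = Node(1, 10, Node(2, 20))
+        head = remove_without_dummy(head, 1)  # caller must reassign the head
+        dummy = Node(next=Node(1, 10, Node(2, 20)))
+        remove_with_dummy(dummy, 1)           # the bucket reference never changes
+
+        print(keys(head), keys(dummy.next))  # [2] [2]
+        ```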
+      wrong_approach: "Special-case removal of head node separately"
+      correct_approach: "Use a dummy head node so all removals follow the same pattern"
+
+  key_takeaways:
+    - "**Hash function basics**: A good hash function distributes keys uniformly across buckets to minimise collisions"
+    - "**Chaining vs open addressing**: Chaining uses linked lists for collision resolution; open addressing probes for empty slots"
+    - "**Trade-off between space and time**: More buckets mean fewer collisions (faster lookup) but more memory; fewer buckets save memory but increase collision likelihood"
+    - "**Foundation for real hash tables**: This simple implementation illustrates the core idea behind production hash tables: hash the key to a bucket, then resolve collisions. Java's `HashMap` also uses chaining, while Python's `dict` uses open addressing"
+
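+  time_complexity: "O(n/k) on average for all operations, where `n` is the number of keys and `k` is the number of buckets. With at most `10^4` keys spread evenly over `1000` buckets, each chain holds about 10 nodes, so operations are effectively O(1). The worst case is O(n), when every key lands in one bucket."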
+  space_complexity: "O(k + n). We use `k` buckets for the array plus `n` nodes for the stored key-value pairs."
+
+solutions:
+  - approach_name: Chaining with Linked Lists
+    is_optimal: true
+    code: |
+      class ListNode:
+          """Node for the linked list in each bucket."""
+          def __init__(self, key: int = -1, value: int = -1, next: 'ListNode | None' = None):
+              self.key = key
+              self.value = value
+              self.next = next
+
+
+      class MyHashMap:
+          def __init__(self):
+              # 1000 buckets for simplicity; a prime size such as 1009 would
+              # spread keys that share a common factor more evenly
+              self.size = 1000
+              # Each bucket starts with a dummy head node
+              self.buckets = [ListNode() for _ in range(self.size)]
+
+          def _hash(self, key: int) -> int:
+              """Simple modulo hash function."""
+              return key % self.size
+
+          def put(self, key: int, value: int) -> None:
+              # Find the bucket for this key
+              index = self._hash(key)
+              curr = self.buckets[index]
+
+              # Traverse to find if key exists (skip dummy head)
+              while curr.next:
+                  if curr.next.key == key:
+                      # Key exists - update value
+                      curr.next.value = value
+                      return
+                  curr = curr.next
+
+              # Key not found - append new node
+              curr.next = ListNode(key, value)
+
+          def get(self, key: int) -> int:
+              # Find the bucket for this key
+              index = self._hash(key)
+              curr = self.buckets[index].next  # Skip dummy head
+
+              # Traverse looking for the key
+              while curr:
+                  if curr.key == key:
+                      return curr.value
+                  curr = curr.next
+
+              # Key not found
+              return -1
+
+          def remove(self, key: int) -> None:
+              # Find the bucket for this key
+              index = self._hash(key)
+              curr = self.buckets[index]
+
+              # Find the node before the one to remove
+              while curr.next:
+                  if curr.next.key == key:
+                      # Remove by skipping over the node
+                      curr.next = curr.next.next
+                      return
+                  curr = curr.next
+    explanation: |
+      **Time Complexity:** O(n/k) average for `put`, `get`, and `remove`, where `n` is the total number of keys and `k` is the number of buckets (1000). In the worst case (all keys hash to one bucket), operations are O(n).
+
+      **Space Complexity:** O(k + n) — `k` bucket heads plus `n` nodes for stored entries.
+
+      This implementation uses chaining with linked lists. Each bucket has a dummy head node to simplify insertion and removal logic. The hash function is a simple modulo operation. Production hash tables typically add resizing and stronger hash functions, but this version demonstrates the core concepts of hash table implementation.
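+
+      As a rough check of the `n/k` estimate, the sketch below counts chain lengths for two hypothetical key sets that respect the constraints (at most `10^4` calls, keys up to `10^6`); `K`, `N`, `even`, and `skewed` are illustrative names:
+
+      ```python
+      from collections import Counter
+
+      K = 1000    # bucket count used by the solution above
+      N = 10_000  # upper bound on stored keys: at most 10^4 calls in total
+
+      # Evenly spread keys: consecutive keys 0..N-1 fill every bucket equally.
+      even = Counter(key % K for key in range(N))
+      print(max(even.values()))      # 10, i.e. N / K nodes per chain
+
+      # Worst case: every key is a multiple of K, so all land in bucket 0.
+      skewed = Counter(key % K for key in range(0, 10**6 + 1, K))
+      print(len(skewed), skewed[0])  # 1 1001
+      ```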
+
+  - approach_name: Direct Array (Large Array)
+    is_optimal: false
+    code: |
+      class MyHashMap:
+          def __init__(self):
+              # Use array size of 10^6 + 1 to cover all possible keys
+              # Each position stores the value, or -1 if empty (safe because values are >= 0)
+              self.data = [-1] * (10**6 + 1)
+
+          def put(self, key: int, value: int) -> None:
+              # Direct indexing - no hash function needed
+              self.data[key] = value
+
+          def get(self, key: int) -> int:
+              # Return value at index, -1 if never set
+              return self.data[key]
+
+          def remove(self, key: int) -> None:
+              # Reset to -1 to indicate removed
+              self.data[key] = -1
+    explanation: |
+      **Time Complexity:** O(1) for all operations — direct array indexing.
+
+      **Space Complexity:** O(10^6) — fixed array size regardless of actual usage.
+
+      This approach exploits the constraint that keys are in range `[0, 10^6]`. By allocating an array large enough to hold all possible keys, we avoid hashing and collisions entirely. Each key maps directly to its index.
+
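+      This achieves worst-case O(1) time, but it allocates about a million slots even when only a few keys are stored, and it works only because keys are bounded non-negative integers and `-1` is never a valid value. It is included to show the trade-off between time and space. The chaining approach is preferred in practice because its memory scales with the number of stored keys.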
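+
+      To make the memory cost concrete, the sketch below counts allocated slots against slots in use after a single `put`; `MAX_KEY` and `used` are names assumed for this illustration:
+
+      ```python
+      MAX_KEY = 10**6  # upper bound on keys from the problem constraints
+
+      data = [-1] * (MAX_KEY + 1)
+      data[3] = 7  # equivalent to put(3, 7)
+
+      # One slot exists per possible key, however many are actually in use.
+      used = sum(1 for value in data if value != -1)
+      print(len(data), used)  # 1000001 1
+      ```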