title: Median of Two Sorted Arrays
slug: median-of-two-sorted-arrays
difficulty: hard
leetcode_id: 4
leetcode_url: https://leetcode.com/problems/median-of-two-sorted-arrays/
categories:
- arrays
- binary-search
patterns:
- binary-search
description: |
Given two sorted arrays `nums1` and `nums2` of sizes `m` and `n` respectively, return **the median** of the two sorted arrays.
The overall run time complexity should be **O(log(m+n))**.
constraints: |
- `nums1.length == m`
- `nums2.length == n`
- `0 <= m <= 1000`
- `0 <= n <= 1000`
- `1 <= m + n <= 2000`
- `-10^6 <= nums1[i], nums2[i] <= 10^6`
examples:
- input: "nums1 = [1,3], nums2 = [2]"
output: "2.0"
explanation: "Merged array is [1,2,3]. Median is 2."
- input: "nums1 = [1,2], nums2 = [3,4]"
output: "2.5"
explanation: "Merged array is [1,2,3,4]. Median is (2+3)/2 = 2.5."
explanation:
intuition: |
The median splits a sorted sequence into two halves of equal size (the left half holds one extra element when the total is odd). For two sorted arrays, we need to find a **partition** that puts `(m + n + 1) // 2` elements on the left and the rest on the right.
Think of it like this: imagine cutting both arrays with vertical lines. If we take `i` elements from `nums1` and `j` elements from `nums2` for the "left half", we need `i + j = (m + n + 1) // 2`. For this partition to be valid:
- Everything in the left half ≤ everything in the right half
The key insight: once we choose `i` (how many from `nums1`), `j` is determined. So we **binary search on `i`**!
Because each array is already sorted, only the elements next to the cuts need checking:
- `nums1[i-1] <= nums2[j]` (left of nums1 ≤ right of nums2)
- `nums2[j-1] <= nums1[i]` (left of nums2 ≤ right of nums1)
If not valid, adjust `i`: if `nums1[i-1] > nums2[j]`, we took too many from nums1, so decrease `i`; otherwise `nums2[j-1] > nums1[i]`, we took too few, so increase `i`.
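
To make the definition concrete, here is a minimal sketch that checks validity directly from the two halves rather than from the boundary elements. The helper names `split_at` and `is_valid_partition` are hypothetical, the check is deliberately linear-time (it is for illustration only, not the fast algorithm), and it assumes `len(nums1) <= len(nums2)`.

```python
def split_at(nums1: list[int], nums2: list[int], i: int) -> tuple[list[int], list[int]]:
    """Return (left_half, right_half) when i elements are taken from nums1.

    Hypothetical helper for illustration; assumes len(nums1) <= len(nums2)
    so that j below stays within [0, len(nums2)].
    """
    j = (len(nums1) + len(nums2) + 1) // 2 - i  # elements taken from nums2
    return nums1[:i] + nums2[:j], nums1[i:] + nums2[j:]


def is_valid_partition(nums1: list[int], nums2: list[int], i: int) -> bool:
    """Check the definition directly: max of left half <= min of right half."""
    left, right = split_at(nums1, nums2, i)
    # An empty side imposes no constraint.
    return not left or not right or max(left) <= min(right)


nums1, nums2 = [1, 3, 8], [2, 4, 6, 9]
for i in range(len(nums1) + 1):
    left, right = split_at(nums1, nums2, i)
    print(i, left, right, is_valid_partition(nums1, nums2, i))
```

For these inputs only `i = 2` is valid: the left half is `[1, 3, 2, 4]`, the right half is `[8, 6, 9]`, and the median of the merged array `[1, 2, 3, 4, 6, 8, 9]` is `4`, the largest value on the left.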
approach: |
We solve this using **Binary Search on Partition**:
**Step 1: Ensure nums1 is the smaller array**
- If `m > n`, swap the arrays
- This guarantees `0 <= j <= n` for every `i` in `[0, m]` and keeps the search at O(log min(m, n))
&nbsp;
**Step 2: Binary search for the correct partition**
- Search for `i` in range `[0, m]` (elements taken from nums1)
- Calculate `j = half_len - i` where `half_len = (m + n + 1) // 2`
- For each `i`, check if partition is valid
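
As a rough illustration of how few candidates for `i` get inspected, the sketch below counts loop iterations for the search range `[0, m]` when every probe answers "increase `i`" (the longest path with this midpoint rule). The function name `worst_case_iterations` is hypothetical and no array data is involved.

```python
def worst_case_iterations(m: int) -> int:
    """Count loop iterations when the search over i in [0, m] always moves right."""
    left, right, steps = 0, m, 0
    while left <= right:
        i = (left + right) // 2
        left = i + 1  # pretend every probe says "increase i"
        steps += 1
    return steps


print(worst_case_iterations(1000))  # 10
```

With the shorter array capped at 1000 elements by the constraints, the loop runs at most 10 times.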
&nbsp;
**Step 3: Handle boundary cases with infinity**
- If `i = 0`, there's no left element in nums1 → use `-infinity`
- If `i = m`, there's no right element in nums1 → use `+infinity`
- Same for `j = 0` and `j = n` in nums2
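
The sentinel rule can be sketched as one small helper applied to either array. The name `cut_neighbours` is hypothetical; it returns the values immediately left and right of a cut placed after `k` elements.

```python
import math


def cut_neighbours(nums: list[int], k: int) -> tuple[float, float]:
    """Return the values just left and right of a cut after k elements.

    Hypothetical helper: a missing neighbour becomes -inf (left) or +inf (right).
    """
    left = nums[k - 1] if k > 0 else -math.inf
    right = nums[k] if k < len(nums) else math.inf
    return left, right


print(cut_neighbours([1, 3, 8], 0))  # (-inf, 1)
print(cut_neighbours([1, 3, 8], 3))  # (8, inf)
print(cut_neighbours([], 0))         # (-inf, inf)
```

The last call shows why an empty array needs no special case: both of its neighbours are sentinels, so it never constrains the partition.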
&nbsp;
**Step 4: Compute the median**
- If partition is valid:
  - **Odd total**: median = `max(left1, left2)`
  - **Even total**: median = `(max(left1, left2) + min(right1, right2)) / 2`
- If not valid, adjust binary search bounds
&nbsp;
The median is formed by the boundary elements at the valid partition.
common_pitfalls:
- title: Not Handling Boundary Cases
description: |
When `i = 0` or `i = m`, there's no left or right element in nums1. Accessing `nums1[m]` raises `IndexError`, and in Python `nums1[i-1]` with `i = 0` is `nums1[-1]`, which silently returns the last element instead of failing.
Use `float('-inf')` for missing left elements and `float('inf')` for missing right elements. This ensures comparisons always work correctly.
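
A short sketch of why this mistake is easy to miss in Python: index `-1` wraps around to the end of the list, so the unguarded access returns a plausible-looking value. Both function names are hypothetical.

```python
def left_neighbour_unsafe(nums: list[int], i: int) -> int:
    # Bug: for i == 0 this reads nums[-1], the LAST element.
    return nums[i - 1]


def left_neighbour_safe(nums: list[int], i: int) -> float:
    # Guarded version: no element to the left of the cut means -infinity.
    return float("-inf") if i == 0 else nums[i - 1]


nums1 = [1, 3, 8]
print(left_neighbour_unsafe(nums1, 0))  # 8 (wrong: there is no left element)
print(left_neighbour_safe(nums1, 0))    # -inf
```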
wrong_approach: "Accessing nums1[i-1] when i = 0"
correct_approach: "nums1_left = float('-inf') if i == 0 else nums1[i-1]"
- title: Binary Searching on the Longer Array
description: |
Always search on the shorter array. If `m > n` and we search on nums1, `j = half_len - i` can become negative (for large `i`) or exceed `n` (for small `i`), and both are invalid.
Swapping ensures `j` is always valid: `0 <= j <= n`.
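
The range of `j` can be checked with a few lines of arithmetic. In this sketch the helper `j_range` is hypothetical; `m` is the length of the array being searched and `n` the length of the other one.

```python
def j_range(m: int, n: int) -> tuple[int, int]:
    """Return (min_j, max_j) over all i in [0, m], where j = half_len - i.

    Hypothetical helper: m is the length of the searched array, n the other.
    """
    half_len = (m + n + 1) // 2
    return half_len - m, half_len  # j at i = m, and j at i = 0


# Searching the longer array (m = 5, n = 1): j leaves [0, n] at both ends.
print(j_range(5, 1))  # (-2, 3)
# Searching the shorter array (m = 1, n = 5): j stays within [0, 5].
print(j_range(1, 5))  # (2, 3)
```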
wrong_approach: "Binary searching on the longer array"
correct_approach: "if m > n: swap arrays, then binary search on the shorter one"
- title: Odd vs Even Total Length
description: |
For **odd** total `(m + n)`: the median is a single value — `max(left1, left2)`.
For **even** total: the median is the average of two middle values.
Using the wrong branch gives an incorrect result whenever the total length has the other parity.
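
The difference is easy to see on the two examples from the problem statement. This is a minimal sketch; `median_from_boundaries` is a hypothetical helper that takes the largest left-half value and the smallest right-half value of a valid partition.

```python
def median_from_boundaries(left_max: float, right_min: float, total: int) -> float:
    """Median given the largest left-half value and smallest right-half value.

    Hypothetical helper; `total` is m + n, and the left half is assumed to
    hold (total + 1) // 2 elements.
    """
    if total % 2 == 1:
        return float(left_max)  # odd: the extra element sits in the left half
    return (left_max + right_min) / 2


# nums1 = [1, 3], nums2 = [2]: left half is {1, 2}, right half is {3}.
print(median_from_boundaries(2, 3, total=3))  # 2.0
print((2 + 3) / 2)                            # 2.5, the "always average" mistake
# nums1 = [1, 2], nums2 = [3, 4]: left half is {1, 2}, right half is {3, 4}.
print(median_from_boundaries(2, 3, total=4))  # 2.5
```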
wrong_approach: "Always averaging two values"
correct_approach: "Check (m + n) % 2 and handle odd/even separately"
key_takeaways:
- "**Binary search on partition, not values**: Search for how many elements to take from nums1"
- "**Partition both arrays to split total elements in half**: Once we choose `i`, `j` is determined"
- "**Handle boundaries with infinity**: Prevents index errors at array edges"
- "**O(log min(m,n))**: Binary search on the smaller array is sufficient"
time_complexity: "O(log min(m, n)). Binary search on the smaller array."
space_complexity: "O(1). Only constant extra variables for pointers and boundary values."
solutions:
- approach_name: Binary Search on Partition
is_optimal: true
code: |
def find_median_sorted_arrays(nums1: list[int], nums2: list[int]) -> float:
    # Ensure nums1 is the smaller array for valid j values
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    half_len = (m + n + 1) // 2  # Size of left half (ceiling for odd total)
    left, right = 0, m  # Binary search bounds for i
    while left <= right:
        i = (left + right) // 2  # Elements from nums1 in left half
        j = half_len - i  # Elements from nums2 in left half
        # Handle boundary cases with infinity
        nums1_left = float('-inf') if i == 0 else nums1[i - 1]
        nums1_right = float('inf') if i == m else nums1[i]
        nums2_left = float('-inf') if j == 0 else nums2[j - 1]
        nums2_right = float('inf') if j == n else nums2[j]
        # Check if partition is valid
        if nums1_left <= nums2_right and nums2_left <= nums1_right:
            # Valid partition found: compute median
            if (m + n) % 2 == 1:
                # Odd total: median is max of left half (cast so the result is a float)
                return float(max(nums1_left, nums2_left))
            else:
                # Even total: median is average of middle two
                return (max(nums1_left, nums2_left) +
                        min(nums1_right, nums2_right)) / 2
        elif nums1_left > nums2_right:
            # Too many from nums1, decrease i
            right = i - 1
        else:
            # Too few from nums1, increase i
            left = i + 1
    return 0.0  # Should never reach here with valid input
explanation: |
**Time Complexity:** O(log min(m, n)) — Binary search on the smaller array.
**Space Complexity:** O(1) — Only constant extra variables.
We binary search for the correct partition point in the smaller array. A valid partition has all left elements ≤ all right elements. Once found, the median is computed from the four boundary elements: max of left side for odd totals, average of max-left and min-right for even totals.
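
One way to gain confidence in a partition-based solution is to compare it against a simple merge-and-sort baseline on random inputs. The harness below is a sketch under stated assumptions: `reference_median` and `cross_check` are hypothetical names, the input sizes and value ranges are arbitrary small choices, and the solution under test is passed in as `candidate`.

```python
import random
import statistics
from typing import Callable


def reference_median(nums1: list[int], nums2: list[int]) -> float:
    """O((m + n) log(m + n)) baseline: merge, sort, take the median."""
    return float(statistics.median(sorted(nums1 + nums2)))


def cross_check(
    candidate: Callable[[list[int], list[int]], float],
    trials: int = 500,
    seed: int = 0,
) -> list[tuple[list[int], list[int]]]:
    """Return the random inputs on which `candidate` disagrees with the baseline."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        m, n = rng.randint(0, 6), rng.randint(0, 6)
        if m + n == 0:
            continue  # the constraints require at least one element
        nums1 = sorted(rng.randint(-10, 10) for _ in range(m))
        nums2 = sorted(rng.randint(-10, 10) for _ in range(n))
        if candidate(nums1, nums2) != reference_median(nums1, nums2):
            failures.append((nums1, nums2))
    return failures


# Sanity check of the harness itself: the baseline agrees with itself.
print(cross_check(reference_median))  # []
```

Calling `cross_check(find_median_sorted_arrays)` should return an empty list if the solution is correct. Any returned pair is a concrete counterexample to debug.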