Problem Statement:
Giving m unsorted arrays, return the k smallest elements
Method #1: Min-heap
Method 1: Using bottom-up heap construction to build a min-heap, then pop the smallest element n times
Complexity analysis:
Time to construct min-heap: O(n)
Time to extract min k times: O(k * lgn)
Total time complexity: O(n + k * lgn)
Total space complexity: O(1)
def k_smallest_1(array, k):
"""
input: int[] array, int k
return: int[]
Notice:
1. the children for node i is 2*i+1 and 2*i+2
"""
def swap(i, j):
array[i], array[j] = array[j], array[i]
def heapify(i):
left, right = 2 * i + 1, 2 * i + 2
if left >= len(array):
return
index = left if right >= len(array) or array[left] < right else right
if array[index] < array[i]:
swap(i, index)
heapify(index)
for i in range(len(array)):
heapify(len(array) - i - 1)
res = []
for i in range(k):
res.append(array.pop(0))
heapify(0)
return res
Method #2: Max-heap
Method 2: Using a size k max-heap, and loop through the rest n-k element, if element e is smaller than the
largest number in the heap, extract the max element and insert element e into the heap
Complexity analysis:
Time to construct a size k max-heap: O(k)
Time to extract-max and insert n-k times: O((n-k) * lgk)
Time to sort the max-heap by extract min k times= O(k * lgk)
Total time complexity: O(k + (n-k) * lgk + k * lgk) = O((n * lgk)
def k_smallest_2(array, k):
"""
Notice:
1. we can use the array[:k] as our max-heap so when we do the heapify, the boundary should be k instead of n
"""
def swap(i, j):
array[i], array[j] = array[j], array[i]
# heapify in max-heap
def heapify(i):
left, right = 2 * i + 1, 2 * i + 2
if left >= k:
return
index = left if right >= k or array[left] > right else right
if array[index] > array[i]:
swap(i, index)
heapify(index)
for i in range(k):
heapify(k - i - 1)
for i in range(k, len(array)):
if array[i] < array[0]:
swap(i, 0)
heapify(0)
return sorted(array[:k])
Method #3: Quick Select
Method 3: Using quick select to find the k th element: randomly pick a pivot point and sort the array s.t. all
elements smaller or equal than pivot are on the left and all elements larger than pivot are on the right. Compare
pivot's index p with k:
1. if p == k: pivot is the kth smallest element
2. if p > k: search the kth smallest in the array[0:p]
3. if p < k: search the (k-p-1) in array[p+1, n]
After we find the kth smallest element as the pivot, return a sorted array of the first k element.
Complexity analysis:
Time to find out the kth element as pivot:
- Assume the randomly picked pivot is always the mid point, that is, it can always separate the array into two
equal parts. Then the total element we need to examine is O(n + n/2 + n/4 + ...) = O(n).
- Or we can use the master theorem, f(n) = O(time to divide) + O(time to combine) = O(n/2) + O(1) = O(n) in
average, a = # of sub-problems = 1, n/b = size of sub-problem: n/2 in average => n ** log_b(a) = O(1). This
is case #3: total time complexity if O(n)
Time to sort the first k elements: O(k * lgk)
Average total time complexity: O(k * lgk + n)
def k_smallest_3(array, k):
"""
Notice:
1. In actually implementation, since we are using the left, right index boundary, we are actually always looking for if the pivot's index between left and right is k-1 or not.
2. Like quick sort, in quick selection, after finding the pivot point, swap pivot with the rightmost index, perform a rainbow sort and swap it back with the left index
"""
if k == 0:
return []
def swap(i, j):
array[i], array[j] = array[j], array[i]
def quick_select(left, right):
pivot = random.randint(left, right)
swap(pivot, right)
l, r = left, right - 1
while l <= r:
if array[l] <= array[right]:
l += 1
elif array[r] > array[right]:
r -= 1
else:
swap(l, r)
l += 1
r -= 1
swap(l, right)
if l == k - 1:
return
elif l > k - 1:
quick_select(left, l - 1)
else:
quick_select(left + 1, right)
quick_select(0, len(array) - 1)
return sorted(array[:k])
Variant #1: Find k smallest elements in m sorted arrays
Method #4: Min heap
Method #4: Initialize a min-heap of size m containing the first element of each element. Execute extract min k times
and after popping an element, add the next element into the heap
Complexity analysis:
Time to construct heap: O(m)
Time to extract-min and insert k times: O(k * lgm)
Total time complexity: O(m + k * lgm)
def kth_smallest_1(matrix, k):
"""
Notice:
1. heapq.heapify() method transforms the list in-place without returned the modified list
2. syntax for enumerate a list: for idx, item in enumerate(list)
3. how to override comparator of an object:
setattr(obj, "__lt__", lambda self, other: self.val <= other.val)
4. how to make a max-heap:
using -1 * val in a min-heap
"""
queue = [(row[0], idx, 0) for idx, row in enumerate(matrix)]
heapq.heapify(queue)
heapq._heapify_max(queue)
res = None
for i in range(k):
res, row_idx, col_idx = heapq.heappop(queue)
if col_idx + 1 < len(matrix[row_idx]):
heapq.heappush(queue, (matrix[row_idx][col_idx + 1], row_idx, col_idx + 1))
return res
Method #5: Max-heap
Method #5: Initialize a max-heap of size k, scan through the rest (N - k) elements (N is the total number elements
in m sorted array, if e < root, execute extract max and insert(e)
Complexity analysis:
Time to construct max-heap: O(k)
Time to extract max and insert N - k times: O((N - k) * lg(k))
This is slower than method #4 since it's a general method to find k smallest in unsorted array
Variant #2: Sorted Matrix
if m sorted array have the same size n, and the columns are also sorted. Find the kth smallest number
Method #6: Binary Search
Method #6: Since the minimum number in matrix of size m*n is matrix[0][0] and the maximum number is matrix[m-1][n-1]. We can do binary search within range [matrix[0][0], matrix[m-1][n-1]]. kth smallest is equivalent to the smallest number num in the range s.t. |elements in matrix that are less than or equal to num| >= k.
Complexity Analysis:
To check if a number num satisfies such quality needs only O(m+n) time: Notice if we know the number in row X-1 androw X+1 that are less or equal to num, i and j, we only need O(i-j) time to check row X.
row X-1 xxxxxxxxxxxxxxxxxxixxxxxxxxx
row X xxxxxxxxxxxxx[ ]xxxxxxxxx
row X+1 xxxxxxxxxxxxxjxxxxxxxxxxxxxx
X[j] <= X+1[j] <= num and X[i+1] >= X-1[i+1] > num => We only need to examine X[j+1:i+1] to find out the index of the largest number smaller than or equal to num. We can apply this recursively to find out the time complexity of checking all m rows is just O(n + m).
Total time complexity = O(lg(MAX - MIN) * (m + n))
Follow-up: can we use another binary search in finding the inf(num) in X[i+1:j+1] ?
Time complexity: O(lg n * m) worse than O(m + n)
Compare with method #1:
O(lg(MAX - MIN) * (m + n)) vs O(m + k * lgm)
def kth_smallest_3(matrix, k):
"""
Notice:
1. we can count num of element le than num by scanning from the top right corner to the bottom left corner
"""
m, n = len(matrix), len(matrix[0])
def count_num_le(num):
cnt = 0
row_idx, col_idx = 0, n - 1 # start from row 0
while col_idx >= 0 and row_idx < m:
if matrix[row_idx][col_idx] <= num:
cnt += col_idx + 1
row_idx += 1
else:
col_idx -= 1
return cnt
left, right = matrix[0][0], matrix[m - 1][n - 1]
while left < right - 1:
mid = left + (right - left >> 1)
cnt = count_num_le(mid)
if cnt < k:
left = mid + 1
else:
right = mid
return left if count_num_le(left) >= k else right
Variant #3 (Leetcode #4) Find medium in two sorted array
What if m == 2? Find the medium, assume arr1 has n1 elements, arr2 has n2 elements and n = n1 + n2.
Method #7: Medium Cut
Method #7:
Consider the definition of medium: a number that can separate the array into two equal half of size n // 2
Let's say a cut i is a number between 0 to n1 that separate arr1 into two parts: LEFT = arr1[0 ~ (i - 1)] and
RIGHT = arr1[i ~ (n1- 1)]. Let j = n // 2 - i. We can now add j more element from arr2 into LEFT s.t. LEFT is the
combination of arr1[0 ~ (i - 1)] and arr2[0 ~ (j - 1)] where the size(LEFT) = i + j = i + n // 2 - 1 = n // 2.
LEFT RIGHT
arr1 0 ~ (i - 1) | i ~ (n1 - 1)
arr2 0 ~ (j - 1) | j ~ (n2 - 1)
Notice the range of i can be 0 to n1, range of j can be 0 to n2. The difference between size(LEFT) and size(
RIGHT) is at most 1 (when n is odd).
For a given cut i, i is a valid cut if 0 <= j <= n2. j < 0 or j > n2 means for cut i there is no ways for us to
add j elements form arr2 to make two equal parts.
A valid cut is a medium cut if and only if max(LEFT) <= min(RIGHT), i.e. arr1[i - 1] <= arr2[j]
and arr2[j - 1] <= arr1[i].
Then the problem becomes finding a valid cut within range [0, n1] s.t. this cut is a medium cut. We can use
binary search to find such cut.
For any valid cut i, compare arr1[i-1] vs arr2[j] and arr2[j-1] vs arr1[i]
1. if arr1[i-1] <= arr2[j] and arr2[j-1] <= arr1[i]:
we found the medium
2. if j < 0 or arr1[i-1] > arr2[j]:
move i to the left
3. if j > n2 or arr2[j-1] > arr1[i]:
move i to the right
Notice when we want to compare the value, we need to make sure that their index is valid. Special cases are when
i = 0, i = n1, j = 0 and j = n2.
When we find cut i successfully separate the arrays into two equal half, we output the medium by following:
if i is odd:
min(arr1[i], arr2[j]) is the medium: if i is odd, size(RIGHT) = size(LEFT) + 1. The minimum element
in the RIGHT is the medium
if i is even:
(max(arr1[i - 1], arr2[j - 1]) + min(arr1[i], arr2[j])) / 2 is the medium: if i is even,
size(LEFT) = size(RIGHT). The medium is the average element cross the cut i. Notice we also need to make
sure these indexes are valid.
Complexity Analysis:
O(min(n1, n2))
def find_medium_in_two_sorted_array(arr1, arr2):
n1, n2 = len(arr1), len(arr2)
n = n1 + n2
left, right = 0, n1
while left <= right:
i = left + (right - left >> 1)
j = n // 2 - i
# compare arr1[i-1] and arr2[j] if they exist
if j < 0 or (i > 0 and j < n2 and arr1[i - 1] > arr2[j]):
right = i - 1
# compare arr2[j-1] and arr1[i] if they exist
elif j > n2 or (i < n1 and j > 0 and arr2[j - 1] > arr1[i]):
left = i + 1
else:
left_max = max(arr1[i - 1] if i > 0 else float('-inf'), arr2[j - 1] if j > 0 else float('-inf'))
right_min = min(arr1[i] if i < n1 else float('inf'), arr2[j] if j < n2 else float('inf'))
return right_min if n & 1 else (left_max + right_min) / 2
Variant #4: Find kth smallest in two sorted array
Method #8: K-th Cut
Method #8: This is similar to Method #7. A cut i is the k-th cut is size(LEFT) = k
Let j = k - i
LEFT RIGHT
arr1 0 ~ (i - 1) | i ~ (n1 - 1)
arr2 0 ~ (j - 1) | j ~ (n2 - 1)
We can perform binary search to find i in range 0 ~ n1. Notice j is in range of 0 to n2 as well.
Compare arr1[i-1] vs arr2[j] and arr2[j-1] vs arr1[i]:
if j < 0 or arr1[i-1] > arr2[j]:
move i to the left
if j > n2 or arr2[j-1] > arr1[i]:
move i to the right
else
return max(arr1[i-1], arr2[j-1])
Notice we need to examine if these indexes are valid before accessing
def find_kth_smallest_in_two_sorted_1(arr1, arr2, k):
n1, n2 = len(arr1), len(arr2)
left, right = 0, len(arr1)
while left <= right:
i = left + (right - left >> 1)
j = k - i
if j < 0 or (i > 0 and j < n2 and arr1[i - 1] > arr2[j]):
right = i - 1
elif j > n2 or (i < n1 and j > 0 and arr2[j - 1] > arr1[i]):
left = i + 1
else:
a_left = float('-inf') if i == 0 else arr1[i - 1]
b_left = float('-inf') if j == 0 else arr2[j - 1]
return max(a_left, b_left)
Method #9: Binary Search
Method #9: To find the kth smallest element in the combination of arr1 and arr2, we can compare arr1[k//2 - 1] and
arr2[k//2 - 1]: we know the total number of elements in arr1[0 : k//2] + arr2[0 : k//2] is k//2 * 2 <= k:
if arr1[k//2 - 1] < arr2[k//2 - 1]:
arr1[0 : k//2] is less than arr2[k//2 - 1], they can not be the kth element. Remove arr1[0: k//2] and update
k to k - k // 2
if arr1[k//2 - 1] < arr2[k//2 - 1]:
arr2[0 : k//2] is less than arr1[k//2 - 1], they can not be the kth element. Remove arr2[0: k//2] and update
k to k - k // 2
if arr1[k//2 - 1] = arr2[k//2 - 1]:
we found the kth element:
if k is even, the kth element is arr1[k//2 - 1] = arr2[k//2 - 1]
if k is odd, the kth element is min(arr1[k//2 - 1] = arr2[k//2 - 1])
Notice we need to examine if these indexes are valid before accessing
Complexity analysis:
In each iteration, k decrease to k - k // 2 until 1 or one array has less elements than k // 2. Total complexity
is O(lgk) = O(lg(m+n)) in the worst case
def find_kth_smallest_in_two_sorted_2(arr1, arr2, k):
n1, n2 = len(arr1), len(arr2)
a_left, b_left = 0, 0
while k > 0:
if a_left >= n1:
return arr2[b_left + k - 1]
elif b_left >= n2:
return arr1[a_left + k - 1]
elif k == 1:
return min(arr1[a_left], arr2[b_left])
idx = k // 2 - 1
a_val = float('inf') if idx + a_left >= n1 else arr1[idx + a_left]
b_val = float('inf') if idx + b_left >= n2 else arr2[idx + b_left]
if a_val >= b_val:
b_left += k // 2
elif a_val < b_val:
a_left += k // 2
k -= k // 2
Like this:
Like Loading...