Full Analysis of Finding K Smallest in Sorted/Unsorted Arrays

Problem Statement:

Giving m unsorted arrays, return the k smallest elements

Method #1: Min-heap

Method 1: Using bottom-up heap construction to build a min-heap, then pop the smallest element n times

Complexity analysis:
    Time to construct min-heap: O(n)
    Time to extract min k times: O(k * lgn)
    Total time complexity: O(n + k * lgn)
    Total space complexity: O(1)
def k_smallest_1(array, k):
    """
    input: int[] array, int k
    return: int[]

    Notice:
    1. the children for node i is 2*i+1 and 2*i+2
    """

    def swap(i, j):
        array[i], array[j] = array[j], array[i]

    def heapify(i):
        left, right = 2 * i + 1, 2 * i + 2
        if left >= len(array):
            return

        index = left if right >= len(array) or array[left] < right else right
        if array[index] < array[i]:
            swap(i, index)
            heapify(index)

    for i in range(len(array)):
        heapify(len(array) - i - 1)

    res = []
    for i in range(k):
        res.append(array.pop(0))
        heapify(0)

    return res

Method #2: Max-heap

Method 2: Using a size k max-heap, and loop through the rest n-k element, if element e is smaller than the 
largest number in the heap, extract the max element and insert element e into the heap

Complexity analysis:
    Time to construct a size k max-heap: O(k)
    Time to extract-max and insert n-k times: O((n-k) * lgk)
    Time to sort the max-heap by extract min k times= O(k * lgk)
    Total time complexity: O(k + (n-k) * lgk + k * lgk) = O((n * lgk)
def k_smallest_2(array, k):
    """
    Notice:
    1. we can use the array[:k] as our max-heap so when we do the heapify, the boundary should be k instead of n
    """

    def swap(i, j):
        array[i], array[j] = array[j], array[i]

    # heapify in max-heap
    def heapify(i):
        left, right = 2 * i + 1, 2 * i + 2
        if left >= k:
            return

        index = left if right >= k or array[left] > right else right
        if array[index] > array[i]:
            swap(i, index)
            heapify(index)

    for i in range(k):
        heapify(k - i - 1)

    for i in range(k, len(array)):
        if array[i] < array[0]:
            swap(i, 0)
            heapify(0)

    return sorted(array[:k])

Method #3: Quick Select

Method 3: Using quick select to find the k th element: randomly pick a pivot point and sort the array s.t. all 
elements smaller or equal than pivot are on the left and all elements larger than pivot are on the right. Compare 
pivot's index p with k:
    1. if p == k: pivot is the kth smallest element
    2. if p > k: search the kth smallest in the array[0:p]
    3. if p < k: search the (k-p-1) in array[p+1, n] 
After we find the kth smallest element as the pivot, return a sorted array of the first k element.

Complexity analysis:
    Time to find out the kth element as pivot:
        - Assume the randomly picked pivot is always the mid point, that is, it can always separate the array into two 
equal parts. Then the total element we need to examine is O(n + n/2 + n/4 + ...) = O(n).
        - Or we can use the master theorem, f(n) = O(time to divide) + O(time to combine) = O(n/2) + O(1) = O(n) in 
average, a = # of sub-problems = 1, n/b = size of sub-problem: n/2 in average => n ** log_b(a) = O(1). This 
is case #3: total time complexity if O(n)
    Time to sort the first k elements: O(k * lgk)
    Average total time complexity: O(k * lgk + n)
def k_smallest_3(array, k):
    """
    Notice:
    1. In actually implementation, since we are using the left, right index boundary, we are actually always looking for if the pivot's index between left and right is k-1 or not.

    2. Like quick sort, in quick selection, after finding the pivot point, swap pivot with the rightmost index, perform a rainbow sort and swap it back with the left index

    """
    if k == 0:
        return []

    def swap(i, j):
        array[i], array[j] = array[j], array[i]

    def quick_select(left, right):
        pivot = random.randint(left, right)
        swap(pivot, right)
        l, r = left, right - 1
        while l <= r:
            if array[l] <= array[right]:
                l += 1
            elif array[r] > array[right]:
                r -= 1
            else:
                swap(l, r)
                l += 1
                r -= 1
        swap(l, right)
        if l == k - 1:
            return
        elif l > k - 1:
            quick_select(left, l - 1)
        else:
            quick_select(left + 1, right)

    quick_select(0, len(array) - 1)
    return sorted(array[:k])

Variant #1: Find k smallest elements in m sorted arrays

Method #4: Min heap

Method #4: Initialize a min-heap of size m containing the first element of each element. Execute extract min k times 
and after popping an element, add the next element into the heap

Complexity analysis:
    Time to construct heap: O(m)
    Time to extract-min and insert k times: O(k * lgm)
    Total time complexity: O(m + k * lgm)
def kth_smallest_1(matrix, k):
    """
    Notice:
        1. heapq.heapify() method transforms the list in-place without returned the modified list
        2. syntax for enumerate a list: for idx, item in enumerate(list)
        3. how to override comparator of an object:
            setattr(obj, "__lt__", lambda self, other: self.val <= other.val)
        4. how to make a max-heap:
            using -1 * val in a min-heap
    """
    queue = [(row[0], idx, 0) for idx, row in enumerate(matrix)]
    heapq.heapify(queue)
    heapq._heapify_max(queue)
    res = None
    for i in range(k):
        res, row_idx, col_idx = heapq.heappop(queue)
        if col_idx + 1 < len(matrix[row_idx]):
            heapq.heappush(queue, (matrix[row_idx][col_idx + 1], row_idx, col_idx + 1))
    return res

Method #5: Max-heap

Method #5: Initialize a max-heap of size k, scan through the rest (N - k) elements (N is the total number elements 
in m sorted array, if e < root, execute extract max and insert(e)

Complexity analysis:
    Time to construct max-heap: O(k)
    Time to extract max and insert N - k times: O((N - k) * lg(k))
    
This is slower than method #4 since it's a general method to find k smallest in unsorted array

Variant #2: Sorted Matrix

if m sorted array have the same size n, and the columns are also sorted. Find the kth smallest number

Method #6: Binary Search

Method #6: Since the minimum number in matrix of size m*n is matrix[0][0] and the maximum number is matrix[m-1][n-1]. We can do binary search within range [matrix[0][0], matrix[m-1][n-1]]. kth smallest is equivalent to the smallest number num in the range s.t. |elements in matrix that are less than or equal to num| >= k. 

Complexity Analysis:
    To check if a number num satisfies such quality needs only O(m+n) time: Notice if we know the number in row X-1 androw X+1 that are less or equal to num, i and j, we only need O(i-j) time to check row X. 

        row X-1   xxxxxxxxxxxxxxxxxxixxxxxxxxx
        row X     xxxxxxxxxxxxx[    ]xxxxxxxxx
        row X+1   xxxxxxxxxxxxxjxxxxxxxxxxxxxx
    
    X[j] <= X+1[j] <= num and X[i+1] >= X-1[i+1] > num => We only need to examine X[j+1:i+1] to find out the index of the largest number smaller than or equal to num. We can apply this recursively to find out the time complexity of checking all m rows is just O(n + m).

    Total time complexity = O(lg(MAX - MIN) * (m + n))

Follow-up: can we use another binary search in finding the inf(num) in X[i+1:j+1] ?
    Time complexity: O(lg n * m) worse than O(m + n)
    
Compare with method #1:
    O(lg(MAX - MIN) * (m + n)) vs O(m + k * lgm)
def kth_smallest_3(matrix, k):
    """
    Notice:
        1. we can count num of element le than num by scanning from the top right corner to the bottom left corner
    """
    m, n = len(matrix), len(matrix[0])

    def count_num_le(num):
        cnt = 0
        row_idx, col_idx = 0, n - 1  # start from row 0
        while col_idx >= 0 and row_idx < m:
            if matrix[row_idx][col_idx] <= num:
                cnt += col_idx + 1
                row_idx += 1
            else:
                col_idx -= 1
        return cnt

    left, right = matrix[0][0], matrix[m - 1][n - 1]
    while left < right - 1:
        mid = left + (right - left >> 1)
        cnt = count_num_le(mid)
        if cnt < k:
            left = mid + 1
        else:
            right = mid

    return left if count_num_le(left) >= k else right

Variant #3 (Leetcode #4) Find medium in two sorted array

What if m == 2? Find the medium, assume arr1 has n1 elements, arr2 has n2 elements and n = n1 + n2. 

Method #7: Medium Cut

Method #7: 
    Consider the definition of medium: a number that can separate the array into two equal half of size n // 2
    
    Let's say a cut i is a number between 0 to n1 that separate arr1 into two parts: LEFT = arr1[0 ~ (i - 1)] and 
RIGHT = arr1[i ~ (n1- 1)]. Let j = n // 2 - i. We can now add j more element from arr2 into LEFT s.t. LEFT is the 
combination of arr1[0 ~ (i - 1)] and arr2[0 ~ (j - 1)] where the size(LEFT) = i + j = i + n // 2 - 1 = n // 2.
    
                      LEFT             RIGHT
        arr1      0 ~ (i - 1)   |   i ~ (n1 - 1)
        arr2      0 ~ (j - 1)   |   j ~ (n2 - 1)
    
    Notice the range of i can be 0 to n1, range of j can be 0 to n2. The difference between size(LEFT) and size(
RIGHT) is at most 1 (when n is odd).
    
    For a given cut i, i is a valid cut if 0 <= j <= n2. j < 0 or j > n2 means for cut i there is no ways for us to 
add j elements form arr2 to make two equal parts.
    
    A valid cut is a medium cut if and only if max(LEFT) <= min(RIGHT), i.e. arr1[i - 1] <= arr2[j] 
and arr2[j - 1] <= arr1[i]. 
    
    Then the problem becomes finding a valid cut within range [0, n1] s.t. this cut is a medium cut. We can use 
binary search to find such cut.
    
    For any valid cut i, compare arr1[i-1] vs arr2[j] and arr2[j-1] vs arr1[i]
        1. if arr1[i-1] <= arr2[j] and arr2[j-1] <= arr1[i]:
            we found the medium
        2. if j < 0 or arr1[i-1] > arr2[j]:
            move i to the left
        3. if j > n2 or arr2[j-1] > arr1[i]:
            move i to the right
    Notice when we want to compare the value, we need to make sure that their index is valid. Special cases are when 
i = 0, i = n1, j = 0 and j = n2.

    When we find cut i successfully separate the arrays into two equal half, we output the medium by following:
        if i is odd:
            min(arr1[i], arr2[j]) is the medium: if i is odd, size(RIGHT) = size(LEFT) + 1. The minimum element 
in the RIGHT is the medium
        
        if i is even:
            (max(arr1[i - 1], arr2[j - 1]) + min(arr1[i], arr2[j])) / 2 is the medium: if i is even, 
size(LEFT) = size(RIGHT). The medium is the average element cross the cut i. Notice we also need to make 
sure these indexes are valid.
            
Complexity Analysis:
    O(min(n1, n2))
def find_medium_in_two_sorted_array(arr1, arr2):
    n1, n2 = len(arr1), len(arr2)
    n = n1 + n2
    left, right = 0, n1
    while left <= right:
        i = left + (right - left >> 1)
        j = n // 2 - i

        # compare arr1[i-1] and arr2[j] if they exist
        if j < 0 or (i > 0 and j < n2 and arr1[i - 1] > arr2[j]):
            right = i - 1
        # compare arr2[j-1] and arr1[i] if they exist
        elif j > n2 or (i < n1 and j > 0 and arr2[j - 1] > arr1[i]):
            left = i + 1
        else:
            left_max = max(arr1[i - 1] if i > 0 else float('-inf'), arr2[j - 1] if j > 0 else float('-inf'))
            right_min = min(arr1[i] if i < n1 else float('inf'), arr2[j] if j < n2 else float('inf'))
            return right_min if n & 1 else (left_max + right_min) / 2

Variant #4: Find kth smallest in two sorted array

Method #8: K-th Cut

Method #8: This is similar to Method #7. A cut i is the k-th cut is size(LEFT) = k

    Let j = k - i 
                      LEFT             RIGHT
        arr1      0 ~ (i - 1)   |   i ~ (n1 - 1)
        arr2      0 ~ (j - 1)   |   j ~ (n2 - 1)
        
    We can perform binary search to find i in range 0 ~ n1. Notice j is in range of 0 to n2 as well.
    
    Compare arr1[i-1] vs arr2[j] and arr2[j-1] vs arr1[i]:
        if j < 0 or arr1[i-1] > arr2[j]:
            move i to the left
        if j > n2 or arr2[j-1] > arr1[i]:
            move i to the right
        else
            return max(arr1[i-1], arr2[j-1])
        Notice we need to examine if these indexes are valid before accessing 
def find_kth_smallest_in_two_sorted_1(arr1, arr2, k):
    n1, n2 = len(arr1), len(arr2)
    left, right = 0, len(arr1)
    while left <= right:
        i = left + (right - left >> 1)
        j = k - i
        if j < 0 or (i > 0 and j < n2 and arr1[i - 1] > arr2[j]):
            right = i - 1
        elif j > n2 or (i < n1 and j > 0 and arr2[j - 1] > arr1[i]):
            left = i + 1
        else:
            a_left = float('-inf') if i == 0 else arr1[i - 1]
            b_left = float('-inf') if j == 0 else arr2[j - 1]
            return max(a_left, b_left)

Method #9: Binary Search

Method #9: To find the kth smallest element in the combination of arr1 and arr2, we can compare arr1[k//2 - 1] and 
arr2[k//2 - 1]: we know the total number of elements in arr1[0 : k//2] + arr2[0 : k//2] is k//2 * 2 <= k:
    if arr1[k//2 - 1] < arr2[k//2 - 1]:
        arr1[0 : k//2] is less than arr2[k//2 - 1], they can not be the kth element. Remove arr1[0: k//2] and update 
k to k - k // 2

    if arr1[k//2 - 1] < arr2[k//2 - 1]:
        arr2[0 : k//2] is less than arr1[k//2 - 1], they can not be the kth element. Remove arr2[0: k//2] and update 
k to k - k // 2

    if arr1[k//2 - 1] = arr2[k//2 - 1]:
        we found the kth element:
            if k is even, the kth element is arr1[k//2 - 1] = arr2[k//2 - 1]
            if k is odd, the kth element is min(arr1[k//2 - 1] = arr2[k//2 - 1])
    
    Notice we need to examine if these indexes are valid before accessing 
    
Complexity analysis:
    In each iteration, k decrease to k - k // 2 until 1 or one array has less elements than k // 2. Total complexity 
is O(lgk) = O(lg(m+n)) in the worst case
def find_kth_smallest_in_two_sorted_2(arr1, arr2, k):
    n1, n2 = len(arr1), len(arr2)
    a_left, b_left = 0, 0
    while k > 0:
        if a_left >= n1:
            return arr2[b_left + k - 1]
        elif b_left >= n2:
            return arr1[a_left + k - 1]
        elif k == 1:
            return min(arr1[a_left], arr2[b_left])

        idx = k // 2 - 1
        a_val = float('inf') if idx + a_left >= n1 else arr1[idx + a_left]
        b_val = float('inf') if idx + b_left >= n2 else arr2[idx + b_left]
        if a_val >= b_val:
            b_left += k // 2
        elif a_val < b_val:
            a_left += k // 2
        k -= k // 2
Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: