Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding kth smallest number from n sorted arrays

Tags:

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e the combined array formed by merging all the n sorted arrays)

I have been trying it and its other variants for quite a while now, and till now I only feel comfortable in the case where there are two arrays of equal length, both sorted and one has to return the median of these two. This has logarithmic time complexity.

After this I tried to generalize it to finding kth smallest among two sorted arrays. Here is the question on SO. Even here the solution given is not obvious to me. But even if I somehow manage to convince myself of this solution, I am still curious as to how to solve the absolute general case (which is my question)

Can somebody explain me a step by step solution (which again in my opinion should take logarithmic time i.e O( log(n1) + log(n2) ... + log(nN) where n1, n2...nN are the lengths of the n arrays) which starts from the more specific cases and moves on to the more general one?

I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.

Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to able to generalize it.

Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.

PS: Do you think this can be further generalized to the case of unsorted arrays?

PPS: It's not a homework problem, I am just preparing for interviews.

like image 961
Sushant Avatar asked Jan 06 '12 04:01

Sushant


People also ask

How do you find the kth smallest element in a sorted array?

Algorithm. The following steps are involved in finding the k t h k^{th} kth smallest element using sorting. Sort the given array in ascending order using a sorting algorithm like merge sort, bubble sort, or heap sort. Return the element at index k − 1 k - 1 k−1 in the sorted array.

What is used to find the kth smallest element of a list of numbers?

The basic idea here is to create a min-heap of all n elements and then extract the minimum element K times. The last element to be extracted will be the Kth smallest element. Extract the minimum elements K-1 times, i.e. delete the root and perform heapify operation K times.

Which of the following method is used for finding kth smallest element?

This is how you find the kth smallest element using QuickSort (well not entirely quicksort, I will tell why). In quicksort you choose a random pivot element and divide the entire array into two and recursively sort the left and the right subarrays to form the entire sorted array.


1 Answers

This doesn't generalize the links, but does solve the problem:

  1. Go through all the arrays and if any have length > k, truncate to length k (this is silly, but we'll mess with k later, so do it anyway)
  2. Identify the largest remaining array A. If more than one, pick one.
  3. Pick the middle element M of the largest array A.
  4. Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
  5. Based on the indexes of the various elements, calculate the total number of elements <= M and > M. This should give you two numbers: L, the number <= M and G, the number > M
  6. If k < L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
  7. If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element (k-L).

When you get to the point where you only have one element per array (or 0), make a new array of size n with those data, sort, and pick the kth element.

Because you're always guaranteed to remove at least half of one array, in N iterations, you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2 That's all, of course, worst case, based on the assumption that you only get rid of half of the largest array, not of the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.

like image 144
Michael Avatar answered Oct 03 '22 15:10

Michael