I'm working on a sorting/ranking algorithm that works with a quite large number of items, and I need to implement the following step efficiently to make it work:
There are two lists of numbers. They are equally long, about 100-500 thousand items each. From these I need to find the n-th biggest product between the lists, i.e. the n-th biggest cell of a matrix that has one list along the top, the other list down the side, and in each cell the product of the number above and the number on the side.
Example: The lists are A=[1, 3, 4] and B=[2, 2, 5]. Then the products are [2, 2, 5, 6, 6, 15, 8, 8, 20]. If I wanted the 3rd biggest from that, it would be 8.
The naive solution would be to simply generate those numbers, sort them and then select the n-th biggest. But that is O(m^2 * log m^2), where m is the number of elements in the small lists, and that is just not fast enough.
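For reference, the naive approach described above can be sketched in a few lines of Python. The function name nth_largest_product_naive is made up for illustration, and n is assumed to be 1-based; this is only usable as a correctness check on small inputs.

```python
def nth_largest_product_naive(a, b, n):
    """Return the n-th largest product a[i] * b[j] (n is 1-based).

    Builds all len(a) * len(b) products, so it needs O(m^2) memory
    and O(m^2 * log m^2) time. Hypothetical helper, for small inputs only.
    """
    products = [x * y for x in a for y in b]
    products.sort(reverse=True)  # biggest product first
    return products[n - 1]


print(nth_largest_product_naive([1, 3, 4], [2, 2, 5], 3))  # prints 8
```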
I think what I need is to first sort the two small lists in descending order. That is O(m * log m). Then I know for sure that the biggest one is A[0]*B[0]. The second biggest one is either A[0]*B[1] or A[1]*B[0], ...
I feel like this could be done in O(f(n)) steps, independent of the size of the matrix. But I can't figure out an efficient way to do this part.
Edit: There was an answer that got deleted, which suggested remembering the position in the two sorted sets and then looking at A[a]*B[b+1] and A[a+1]*B[b], returning the bigger one and incrementing a or b accordingly. I was going to post this comment before it got deleted:
This won't work. Imagine two lists A=B=[3,2,1]. This will give you a matrix like [9,6,3 ; 6,4,2 ; 3,2,1]. So you start at (0,0)=9, go to (0,1)=6 and then the choice is (0,2)=3 or (1,1)=4. However, this will miss the (1,0)=6, which is bigger than both. So you can't just look at the two neighbors; you have to backtrack.
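The counterexample can be checked with a small Python sketch of that two-neighbor walk. The name greedy_neighbor_walk is hypothetical, and breaking ties toward b+1 is an assumption chosen to match the path described in the comment; the deleted answer's exact code is not known.

```python
def greedy_neighbor_walk(a_desc, b_desc, n):
    """Flawed idea: from cell (a, b), only ever look at (a, b+1) and (a+1, b).

    a_desc and b_desc must already be sorted in descending order.
    Returns the value reached after n - 1 moves (n is 1-based).
    """
    a = b = 0
    value = a_desc[0] * b_desc[0]
    for _ in range(n - 1):
        right = a_desc[a] * b_desc[b + 1] if b + 1 < len(b_desc) else None
        down = a_desc[a + 1] * b_desc[b] if a + 1 < len(a_desc) else None
        if right is None and down is None:
            raise ValueError("walked off the matrix")
        # Assumption: ties go to the right neighbor, as in the comment's path.
        if down is None or (right is not None and right >= down):
            b, value = b + 1, right
        else:
            a, value = a + 1, down
    return value


# Visits 9, then 6, then 4, although the true 3rd biggest product is 6.
print(greedy_neighbor_walk([3, 2, 1], [3, 2, 1], 3))  # prints 4
```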
I think it can be done in O(n log n + n log m). Here's a sketch of my algorithm, which I think will work. It's a little rough.
1. Sort A in descending order. (takes O(m log m))
2. Sort B in descending order. (takes O(m log m))
3. Let s be min(m, n). (takes O(1))
4. Create s lazy sequence iterators L[0] through L[s-1]. L[i] will iterate through the s values A[i]*B[0], A[i]*B[1], ..., A[i]*B[s-1]. (takes O(s))
5. Put the iterators in a priority queue q. The iterators will be prioritized according to their current value. (takes O(s) because initially they are already in order)
6. Pull n values from q. The last value pulled will be the desired result. When an iterator is pulled, it is re-inserted in q using its next value as the new priority. If the iterator has been exhausted, do not re-insert it. (takes O(n log s))

In all, this algorithm will take O(m log m + (s + n)log s), but s is equal to either m or n.
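A minimal Python sketch of this priority-queue idea, using the standard heapq module, might look as follows. The function name nth_largest_product is made up for illustration. It assumes n is 1-based and that all numbers are non-negative, so that every row of the sorted product matrix is non-increasing; negative values would need extra handling. Instead of real iterator objects, each heap entry stores a (row, column) cursor, and rows are not capped at s columns, since no row can be advanced more than n - 1 times anyway.

```python
import heapq


def nth_largest_product(a, b, n):
    """Return the n-th largest product a[i] * b[j] (n is 1-based).

    Assumption: all inputs are non-negative, so products in each row of
    the descending-sorted matrix never increase from left to right.
    """
    if not 1 <= n <= len(a) * len(b):
        raise ValueError("n out of range")
    a_desc = sorted(a, reverse=True)
    b_desc = sorted(b, reverse=True)
    s = min(len(a_desc), n)  # rows beyond the first s cannot hold the answer

    # One cursor per row, starting at column 0. heapq is a min-heap,
    # so products are stored negated to pop the biggest first.
    heap = [(-a_desc[i] * b_desc[0], i, 0) for i in range(s)]
    heapq.heapify(heap)

    # Discard the n - 1 biggest products, advancing the popped row each time.
    for _ in range(n - 1):
        _, i, j = heapq.heappop(heap)
        if j + 1 < len(b_desc):  # an exhausted row is not re-inserted
            heapq.heappush(heap, (-a_desc[i] * b_desc[j + 1], i, j + 1))
    return -heap[0][0]


print(nth_largest_product([1, 3, 4], [2, 2, 5], 3))  # prints 8
print(nth_largest_product([3, 2, 1], [3, 2, 1], 3))  # prints 6
```

Apart from the two sorted copies, the extra memory is the s heap entries, so the full m-by-m matrix is never materialized.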