Finding n-th biggest product in a large matrix of numbers, fast

Tags:

performance

language-agnostic

algorithm

sorting

search

I'm working on a sorting/ranking algorithm that works with quite a large number of items, and I need to implement the following algorithm efficiently to make it work:

There are two lists of numbers. They are equally long, about 100-500 thousand items each. From these I need to find the n-th biggest product between the lists, i.e. if you create a matrix with one list along the top and the other down the side, each cell is the product of the number above it and the number beside it.

Example: The lists are A=[1, 3, 4] and B=[2, 2, 5]. Then the products are [2, 2, 5, 6, 6, 15, 8, 8, 20]. If I wanted the 3rd biggest from that it would be 8.

The naive solution would be to simply generate those numbers, sort them and then select the n-th biggest. But that is O(m^2 * log m^2) where m is the number of elements in the small lists, and that is just not fast enough.
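For reference, the naive approach could be sketched like this in Python. The function name `nth_biggest_product_naive` is made up for illustration, and `n` is assumed to be 1-based. It is only usable for small lists, but it is handy as a baseline to check a faster version against:

```python
from itertools import product


def nth_biggest_product_naive(a, b, n):
    """Return the n-th biggest product (1-based) by building all len(a) * len(b) products."""
    # Materialize every cell of the product matrix, then sort descending.
    products = sorted((x * y for x, y in product(a, b)), reverse=True)
    return products[n - 1]


print(nth_biggest_product_naive([1, 3, 4], [2, 2, 5], 3))  # 8
```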

I think what I need is to first sort the two lists in descending order. That is O(m * log m). Then I know for sure that the biggest one is A[0]*B[0]. The second biggest one is either A[0]*B[1] or A[1]*B[0], ...

I feel like this could be done in O(f(n)) steps, independent of the size of the matrix. But I can't figure out an efficient way to do this part.

Edit: There was an answer that got deleted, which suggested remembering the current position in the two sorted lists, then looking at A[a]*B[b+1] and A[a+1]*B[b], returning the bigger one and incrementing a or b accordingly. I was going to post this comment before it got deleted:

This won't work. Imagine two lists A=B=[3,2,1]. This will give you a matrix like [9,6,3 ; 6,4,2 ; 3,2,1]. So you start at (0,0)=9, go to (0,1)=6, and then the choice is (0,2)=3 or (1,1)=4. However, this will miss (1,0)=6, which is bigger than both. So you can't just look at the two neighbors; you have to backtrack.
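To make the counterexample concrete, here is a small Python sketch of the two-neighbor walk as I understood the deleted answer. The name `greedy_walk` and the tie-breaking rule (prefer moving right) are my own assumptions, and both lists are assumed to be sorted descending already:

```python
def greedy_walk(a, b, n):
    """Two-neighbor walk: keep a single position (i, j) and step to the bigger neighbor."""
    i = j = 0
    value = a[0] * b[0]
    for _ in range(n - 1):
        # Candidates are the cell to the right and the cell below, if they exist.
        right = a[i] * b[j + 1] if j + 1 < len(b) else float("-inf")
        down = a[i + 1] * b[j] if i + 1 < len(a) else float("-inf")
        if right >= down:
            j += 1
            value = right
        else:
            i += 1
            value = down
    return value


a = b = [3, 2, 1]
print(greedy_walk(a, b, 3))                                  # 4
print(sorted((x * y for x in a for y in b), reverse=True)[2])  # 6
```

The walk reports 4 as the 3rd biggest product, while the real answer is 6, because the cell (1,0) is never revisited once the walk has moved right.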

asked May 17 '12 13:05

Timmy

1 Answer

I think it can be done in O(n log n + n log m). Here's a sketch of my algorithm, which I think will work. It's a little rough.

1. Sort A descending. (takes O(m log m))
2. Sort B descending. (takes O(m log m))
3. Let s be min(m, n). (takes O(1))
4. Create s lazy sequence iterators L[0] through L[s-1]. L[i] will iterate through the s values A[i]*B[0], A[i]*B[1], ..., A[i]*B[s-1]. (takes O(s))
5. Put the iterators in a priority queue q. The iterators will be prioritized according to their current value. (takes O(s) because initially they are already in order)
6. Pull n values from q. The last value pulled will be the desired result. When an iterator is pulled, it is re-inserted in q using its next value as the new priority. If the iterator has been exhausted, do not re-insert it. (takes O(n log s))

In all, this algorithm will take O(m log m + (s + n)log s), but s is equal to either m or n.
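The steps above could be sketched in Python roughly as follows. This is only a sketch under some assumptions of mine: the function name `nth_biggest_product` is made up, `n` is 1-based, and all numbers are assumed to be non-negative (with negative numbers the rows of the sorted matrix are no longer in descending order, so the idea breaks). Instead of real lazy iterators, each heap entry stores a row index and a column index, and values are negated because `heapq` is a min-heap:

```python
import heapq


def nth_biggest_product(a, b, n):
    """Return the n-th biggest product a[i] * b[j] (1-based), assuming non-negative inputs."""
    if not 1 <= n <= len(a) * len(b):
        raise ValueError("n must be between 1 and len(a) * len(b)")
    a = sorted(a, reverse=True)
    b = sorted(b, reverse=True)
    rows = min(len(a), n)  # s in the steps above: only the first n rows can matter
    cols = min(len(b), n)  # likewise only the first n columns can matter
    # One entry per row, standing in for a lazy iterator: (-current value, row, column).
    heap = [(-a[i] * b[0], i, 0) for i in range(rows)]
    heapq.heapify(heap)
    # Discard the n - 1 biggest products; the n-th is then on top of the heap.
    for _ in range(n - 1):
        _, i, j = heapq.heappop(heap)
        if j + 1 < cols:  # re-insert the row with its next value unless it is exhausted
            heapq.heappush(heap, (-a[i] * b[j + 1], i, j + 1))
    return -heap[0][0]


print(nth_biggest_product([1, 3, 4], [2, 2, 5], 3))  # 8
print(nth_biggest_product([3, 2, 1], [3, 2, 1], 3))  # 6
```

Keeping one heap entry per row is what provides the "backtracking" the question asks about: the head of every row stays a candidate until it is actually pulled.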

answered Sep 29 '22 20:09

recursive
