Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to intersect two sorted arrays the fastest possible way?

I have two huge sorted arrays (~100K items each). I need to intersect them really fast. Now I am doing it the standard way:

  • if a[i] < b[j] then i++
  • if a[i] > b[j] then j++
  • else: add a[i] to intersection, i++, j++

But it takes too long (~ 350 microseconds) to complete, which results in quite poor overall performance. Is there a way to do it quicker?

P.S. Intersection size is not larger that 1000 items (on average) and I need only 25 to 100 of them.

like image 908
Denis Kulagin Avatar asked Oct 18 '22 16:10

Denis Kulagin


1 Answers

Running through 2 100k arrays in parallel requires about 200k comparisons. You are currently completing it in 350 microseconds = 350k nanoseconds. So your per comparison time is just under 2 nanoseconds. If your CPU is around 4 GHz, then that's 8 clock cycles.

That's good. You could try to be sophisticated, detecting runs and so on, but probably you'll hurt yourself more with pipeline stalls than you'll save work.

There are only 2 ways to speed this up. Do less work, or add more workers.

You've indicating that doing less work is feasible, which is why Tamas Hegedus suggested that. Instead of creating the intersection, create an Iterator that will return the next thing in the intersection. This will require you to rewrite the logic that uses said iterator, but you'll do under 10% of the current computation. Which is going to be close to 10x faster.

As for adding workers, you'll want to divide the work among worker threads and keep them from stepping all over each other. For k small (no larger than your number of CPUs!), with a logarithmic amount of work in the size of your arrays, you can do a quickselect to find k-1 values that break the combined arrays into k even chunks (oops Adapt http://www.geeksforgeeks.org/median-of-two-sorted-arrays/ instead of doing a quickselect...), and the indexes of those values in each array. This creates k problems of even difficulty, each of which can be specified as 4 numbers. Spin up k threads and let each get a chunk of the answer. This will be roughly k times faster than what you are currently doing.

At the cost of a lot more effort, these approaches can be combined. What you do is have the iterator create, say, 4 workers and hand out blocks to each. When you call iter.next() the iterator will hand you a next value that it has if it has one. If it doesn't have one it will wait for the worker that is producing its next block to complete, grab that block, hand that worker another block if one is ready, and then hand out the first value in that block. You can play with the block size. You want it large enough that the CPU does a good job of figuring out that it should be streaming from RAM to CPU caches, and doesn't think that there is synchronization contention between threads.

My guess given the size and synchronization constraints, the hybrid approach won't be much of a win, if any, over the iterator approach. But if you're really desperate, you can try it.

like image 199
btilly Avatar answered Oct 21 '22 05:10

btilly