Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find common elements from two very large Arrays

Tags:

c++

algorithm

There are two integer arrays ,each in very large files (size of each is larger than RAM). How would you find the common elements in the arrays in linear time.

I cant find a decent solution to this problem. Any ideas?

like image 468
Kshitij Banerjee Avatar asked Jan 23 '12 14:01

Kshitij Banerjee


1 Answers

One pass on one file build a bitmap (or a Bloom filter if the integer range is too large for a bitmap in memory).

One pass on the other file find the duplicates (or candidates if using a Bloom filter).

If you use a Bloom filter, the result is probabilistic. New passes can reduce the false positive (Bloom filters don't have false negative).

like image 118
AProgrammer Avatar answered Oct 31 '22 14:10

AProgrammer