Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Programming Pearls: find one integer appears at least twice

It's in the section 2.6 and problem 2, the original problem is like this:

"Given a sequential file containing 4,300,000,000 32-bit integers, how can you find one that appears at least twice?"

My question toward this exercise is that: what is the tricks of the above problem and what kind of general algorithm category this problem is in?

like image 754
Haiyuan Zhang Avatar asked Apr 02 '11 06:04

Haiyuan Zhang


1 Answers

Create a bit array of length 2^32 bits (initialize to zero), that would be about 512MB and will fit into RAM on any modern machine.

Start reading the file, int by int, check bit with the same index as the value of the int, if the bit is set you have found a duplicate, if it is zero, set to one and proceed with the next int from the file.

The trick is to find a suitable data structure and algorithm. In this case everything fits into RAM with a suitable data structure and a simple and efficient algorithm can be used.
If the numbers are int64 you need to find a suitable sorting strategy or make multiple passes, depending on how much additional storage you have available.

like image 74
Albin Sunnanbo Avatar answered Sep 28 '22 05:09

Albin Sunnanbo