Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dealing with M occurrences among N

Question I've been given at the job interview. I was close to the solution but did not solve it unfortunately.

Assume we have a sequence that contains N numbers of type long. And we know for sure that among this sequence each number does occur exactly n times except for the one number that occurs exactly m times (0 < m < n). How do we find that number with O(N) operations and O(1) additional memory?

For the simplest case (when n = 2 and m = 1) all that we should do is just to perform accumulative xor on every number in sequence. The result will be equal to the desired number. But i'm stuck while trying to deal with arbitrary m and n.

I would appreciate an actual C++ solution.


EDIT: We know actual values of m and n a priori.

Example. We know that n = 3 and m = 2. The sequence (N = 8) is

5 11 5 2 11 5 2 11

And the right answer is 2 in this particular case because it occurs only twice.

like image 486
Keynslug Avatar asked Oct 18 '10 21:10

Keynslug


3 Answers

You do 64 sums, one for each bit, for each of the sums you calculate sum mod n, this calculation return m for each bit that should to be set in the result, and 0 for each bit that should not be set.

Example:
n = 3, m = 2. list = [5 11 5 2 11 5 2 11]

              5  11   5   2  11   5   2  11
sum of bit 0: 1 + 1 + 1 + 0 + 1 + 1 + 0 + 1 = 6   6 % 3 = 0
sum of bit 1: 0 + 1 + 0 + 1 + 1 + 0 + 1 + 1 = 5   5 % 3 = 2
sum of bit 2: 1 + 0 + 1 + 0 + 0 + 1 + 0 + 0 = 3   3 % 3 = 0
sum of bit 3: 0 + 1 + 0 + 0 + 1 + 0 + 0 + 1 = 3   3 % 3 = 0

So only bit 1 is set, which means the result is 2.

Optimizing implementation:
(Tricks and considerations that are also useful for real problems)
It is worth noting that when iterating over an array, execution speed will to some extend be limited by memory access, if one need to perform multiple operations with each element it is usually fastest to perform them all on one element at a time, thus the processor only need to load each element from memory once. Interesting blog post on memory and cache.

It is possible to sum multiple bits in a single integer, rather than applying 64 different bitmasks to get each bit on it's own one could for instance use only 4 bitmasks that each extract 16 bits with 3 bits of space between each, as long as no overflow occur a normal addition operation will work exactly as if dealing with 16 4-bit integers, thus this method will work for 15 numbers. After 15 numbers have been processed this way the results must be added to a storage capable of holding bigger integers (could be 8 64-bit integers each holding 8 8-bit integers, they must of course in turn be emptied into bigger integers etc.).
The result is that rather than for each value doing 64 bitmasks, 63 bitshifts and 64 additions one need only do 4 bitmasks, 3 bitshifts and 4 additions, plus for every 15 values 8 bitmasks, 4 bitshifts and 8 additions, plus for every 255 values 16 bitmasks, 8 bitshifts and 16 additions etc.

Visualization:
(Summing 4x4-bit integers using 16-bit integers)

1000 1000 1000 1000 +
1000 0000 0000 1000 +
0000 0000 0000 1000 +
1000 1000 0000 0000 +
1000 0000 1000 0000 +
0000 0000 1000 1000 =
0010 0100 1100 0010

The result is the same whether you consider this to be 4 columns of 4-bit integers or 1 column of 16-bit integers, this is only true as long as long the 4-bit integers do not overflow.

like image 81
aaaaaaaaaaaa Avatar answered Oct 30 '22 17:10

aaaaaaaaaaaa


edit) Okay, this method isn't as sound as I initially thought. eBusiness's solution is much simpler and works correctly for cases such as n=4, m=2.

We can generalise the xor method to work with arbitrary m and n. We first need to pick a base b such that gcd(n, b) = b, and gcd(m, b) < b. As odd n/even m pairs satisfy this for base 2, the standard binary xor works for these pairs.

Firstly we define (a^^n) to mean (a^a^...^a) for n a's, with generalised xor of base b. For example, with standard binary xor, a^^2 = 0.

We need to define our generalised xor. The properties we desire are basically the same as of addition (commutativity, associativity), and we need a^^b = 0. The obvious solution is (x^y) = (x+y)%b for each digit in the base b representation (convince yourself this works, and is the same as binary xor for base 2). Then, we simply apply this to all numbers in the sequence, and end up with result = s^^(m%b), where s is the special number.
Lastly, we need to revert our 'xor'ed base b number to the expected number. We can simply compute i^^(m%b) for i=0..b-1, and then look up which value we have in s for each digit in result.

This algorithm is be O(N). For each number in the list, we have a constant number of operations to do, because we have at most 64 digits. Reverting at the end is at worst O(N) for large b. We can do this last step in constant space by computing i^^(m%b) for all i for each digit (again, we have a constant number of digits).


Example:

n = 3, m = 2. list = [5 11 5 2 11 5 2 11]

First we choose a base b. Obviously we have to choose base 3.

The xor table for reference:

  0|1|2
0|0|1|2
1|1|2|0
2|2|0|1

The computation:

  5     11      5      2     11      5      2     11
0^0=0. 0^1=1. 1^0=1. 1^0=1. 1^1=2. 2^0=2. 2^0=2. 2^1=0.
0^1=1. 1^0=1. 1^1=2. 2^0=2. 2^0=2. 2^1=0. 0^0=0. 0^0=0.
0^2=2. 2^2=1. 1^2=0. 0^2=2. 2^2=1. 1^2=0. 0^2=2. 2^2=1.

m % b = 2.

Thus we have s^^2 = [001]. We generate a table of i^^2 for each digit i, and then do a reverse lookup.

   i | 0 | 1 | 2 |
i^^2 | 0 | 2 | 1 |

0 -> 0
0 -> 0
1 -> 2

We lastly convert our result back into binary/decimal. [002] = 2.

like image 24
Nabb Avatar answered Oct 30 '22 17:10

Nabb


You simplest case can be more general, you can use the same technique you described for an odd number m and even number n.

like image 33
grokus Avatar answered Oct 30 '22 16:10

grokus