Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the count of a particular number in an infinite stream of numbers at a particular moment

I faced this problem in a recent interview:

You have a stream of incoming numbers in range 0 to 60000 and you have a function which will take a number from that range and return the count of occurrence of that number till that moment. Give a suitable Data structure/algorithm to implement this system.

My solution is:

Make an array of size 60001 pointing to bit-vectors. These bit vectors will contain the count of the incoming numbers and the incoming numbers will also be used to index into the array for the corresponding number. Bit-vectors will dynamically increase as the count gets too big to hold in them.

So, if the numbers are coming at rate 100numbers/sec then, in 1million years total numbers will be = (100*3600*24)*365*1000000 = 3.2*10^15. In the worst case where all numbers in the stream is same it will take ceil((log(3.2*10^15) / log 2) )= 52bits and if the numbers are uniformly distributed the we will have (3.2*10^15) / 60001 = 5.33*10^10 number of occurrences for each number which will require total of 36 bits for each numbers. So, assuming 4byte pointers we need (60001 * 4)/1024 = 234 KB memory for the array and for the case with same numbers, we need bit vector size = 52/8 = 7.5 bytes which is still around 234KB. And for the other case we need (60001 * 36 / 8)/1024 = 263.7 KB for bit vector totaling about 500KB. So, it is very much feasible to do this with ordinary PC and memory.

But the interviewer said, as it is infinite stream it will eventually overflow and gave me hint like how can we do this if there were many PCs and we could pass messages between them or think about file system etc. But I kept thinking if this solution was not working then, others would too. Needless to say, I did not get the job.

How to do this problem with less memory? Can you think of an alternative approach (using network of PCs may be)?

like image 514
Nilanjan Basu Avatar asked Oct 08 '22 02:10

Nilanjan Basu


1 Answers

A formal model for the problem could be the following.

We want to know if it exists a constant space bounded Turing machine such that, in any given time it recognizes the language L of all couples (number,number of occurrences so far). This means that all correct couples will be accepted and all incorrect couples will be rejected.

As a corollary of the Theorem 3.13 in Hopcroft-Ullman we know that every language recognized by a constant space bounded machine is regular.

It can be proven by using the pumping lemma for regular languages that the language described above is not a regular language. So you can't recognize it with a constant space bounded machine.

like image 91
Vitaly Olegovitch Avatar answered Oct 13 '22 11:10

Vitaly Olegovitch