Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set S of n numbers - have a subset with the probability of each element of S occuring in it equal

Tags:

algorithm

set

I have this problem

You are given a set S of n numbers. You must pick a subset S′ of k numbers from S such that the probability of each element of S occurring in S′ is equal (i.e., each is selected with probability k/n). You may make only one pass over the numbers. What if n is unknown?

and I even have a solution: http://www.algorithm.cs.sunysb.edu/algowiki/index.php/TADM2E_2.43

Still: I don't understand the problem text at all.

I'm given a set S of n numbers. Fine. I need to pick a subset (2^n subsets are possible) of k numbers such that the probability of each element of S occurring in S' is equal... the obvious answer to me would be to just grab the empty S' set: each element in S would have 0 probability to be in S'.

If this is not acceptable (and it should have been stated), I suppose I should count for the most-recurring element (appearing T times) in S and make every other element have exactly T instances in S' (it should still be a subset if the elements are contained in S).

I don't really understand the priority queue solution, nor the k/n probability. Can somebody help me out with this?

like image 749
Albert Avatar asked May 30 '15 13:05

Albert


1 Answers

This is a very famous problem with a resulting technique called Reservoir Sampling - a very useful algorithm for stream processing of big data. The preceding link can give you the precise setting, motivation, and explanation of the solution.

like image 120
Ami Tavory Avatar answered Nov 05 '22 09:11

Ami Tavory