Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating m distinct random numbers in the range [0..n-1]

I have two methods of generating m distinct random numbers in the range [0..n-1]

Method 1:

//C++-ish pseudocode int result[m]; for(i = 0; i < m; ++i) {    int r;    do    {       r = rand()%n;    }while(r is found in result array at indices from 0 to i)    result[i] = r;    } 

Method 2:

//C++-ish pseudocode int arr[n]; for(int i = 0; i < n; ++i)     arr[i] = i; random_shuffle(arr, arr+n); result = first m elements in arr; 

The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise. But "much larger" isn't that strict a notion, is it? :)

Question: What formula of n and m should I use to determine whether method1 or method2 will be more efficient? (in terms of mathematical expectation of the running time)

like image 923
Armen Tsirunyan Avatar asked Aug 04 '11 19:08

Armen Tsirunyan


People also ask

How do you generate a random number between 0 and 1?

The random. uniform() function is perfectly suited to generate a random number between the numbers 0 and 1, as it is utilized to return a random floating-point number between two given numbers specified as the parameters for the function.

How do you generate a distinct random number in C++?

Method 2: //C++-ish pseudocode int arr[n]; for(int i = 0; i < n; ++i) arr[i] = i; random_shuffle(arr, arr+n); result = first m elements in arr; The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise.


2 Answers

Pure mathematics:
Let's calculate the quantity of rand() function calls in both cases and compare the results:

Case 1: let's see the mathematical expectation of calls on step i = k, when you already have k numbers chosen. The probability to get a number with one rand() call is equal to p = (n-k)/n. We need to know the mathematical expectation of such calls quantity which leads to obtaining a number we don't have yet.

The probability to get it using 1 call is p. Using 2 calls - q * p, where q = 1 - p. In general case, the probability to get it exactly after n calls is (q^(n-1))*p. Thus, the mathematical expectation is
Sum[ n * q^(n-1) * p ], n = 1 --> INF. This sum is equal to 1/p (proved by wolfram alpha).

So, on the step i = k you will perform 1/p = n/(n-k) calls of the rand() function.

Now let's sum it overall:

Sum[ n/(n - k) ], k = 0 --> m - 1 = n * T - the number of rand calls in method 1.
Here T = Sum[ 1/(n - k) ], k = 0 --> m - 1

Case 2:

Here rand() is called inside random_shuffle n - 1 times (in most implementations).

Now, to choose the method, we have to compare these two values: n * T ? n - 1.
So, to choose the appropriate method, calculate T as described above. If T < (n - 1)/n it's better to use the first method. Use the second method otherwise.

like image 114
Grigor Gevorgyan Avatar answered Oct 02 '22 16:10

Grigor Gevorgyan


Check the Wikipedia description of the original Fisher-Yates algorithm. It advocates using essentially your method 1 for up to n/2, and your method 2 for the remainder.

like image 21
Mark Ransom Avatar answered Oct 02 '22 17:10

Mark Ransom