Generating m distinct random numbers in the range [0..n-1]

Tags:

I have two methods of generating m distinct random numbers in the range [0..n-1]

Method 1:

//C++-ish pseudocode int result[m]; for(i = 0; i < m; ++i) {    int r;    do    {       r = rand()%n;    }while(r is found in result array at indices from 0 to i)    result[i] = r;    }

Method 2:

//C++-ish pseudocode int arr[n]; for(int i = 0; i < n; ++i)     arr[i] = i; random_shuffle(arr, arr+n); result = first m elements in arr;

The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise. But "much larger" isn't that strict a notion, is it? :)

Question: What formula of n and m should I use to determine whether method1 or method2 will be more efficient? (in terms of mathematical expectation of the running time)

923

asked Aug 04 '11 19:08

Armen Tsirunyan

2 Answers

Pure mathematics:
Let's calculate the quantity of rand() function calls in both cases and compare the results:

Case 1: let's see the mathematical expectation of calls on step i = k, when you already have k numbers chosen. The probability to get a number with one rand() call is equal to p = (n-k)/n. We need to know the mathematical expectation of such calls quantity which leads to obtaining a number we don't have yet.

The probability to get it using 1 call is p. Using 2 calls - q * p, where q = 1 - p. In general case, the probability to get it exactly after n calls is (q^(n-1))*p. Thus, the mathematical expectation is
Sum[ n * q^(n-1) * p ], n = 1 --> INF. This sum is equal to 1/p (proved by wolfram alpha).

So, on the step i = k you will perform 1/p = n/(n-k) calls of the rand() function.

Now let's sum it overall:

Sum[ n/(n - k) ], k = 0 --> m - 1 = n * T - the number of rand calls in method 1.
Here T = Sum[ 1/(n - k) ], k = 0 --> m - 1

Case 2:

Here rand() is called inside random_shuffle n - 1 times (in most implementations).

Now, to choose the method, we have to compare these two values: n * T ? n - 1.
So, to choose the appropriate method, calculate T as described above. If T < (n - 1)/n it's better to use the first method. Use the second method otherwise.

114

answered Oct 02 '22 16:10

Grigor Gevorgyan

Check the Wikipedia description of the original Fisher-Yates algorithm. It advocates using essentially your method 1 for up to n/2, and your method 2 for the remainder.

answered Oct 02 '22 17:10

Mark Ransom

Related questions
                            
                                lifetime of a std::initializer_list return value
                            
                                Hinnant's short_alloc and alignment guarantees
                            
                                Simpler way to create a C++ memorystream from (char*, size_t), without copying the data?
                            
                                Testing C++ code for endian-independence
                            
                                Semicolon on a function parameters
                            
                                Visual Studio 2012 different values Release/Debug mode
                            
                                Get std::fstream failure error messages and/or exceptions
                            
                                Why is MinGW very slow?
                            
                                How do smart pointers choose between delete and delete[]?
                            
                                Questions about Hinnant's stack allocator
                            
                                std::shared_ptr Exception Safety
                            
                                Do I need to cast to unsigned char before calling toupper(), tolower(), et al.?
                            
                                c++ Initializing a struct with an array as a member
                            
                                is there in C# a method for List<T> like resize in c++ for vector<T>
                            
                                Execute CMD commands using C++
                            
                                A Good C++ Library for SOAP
                            
                                Error with address of parenthesized member function
                            
                                How do i write a pointer-to-member-function with std::function?
                            
                                Meaningful stack traces for address sanitizer in GCC
                            
                                Universal less<> for pointers in C++ standard

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Generating m distinct random numbers in the range [0..n-1]

Tags:

c++

performance

algorithm

random

Armen Tsirunyan

People also ask

2 Answers

Grigor Gevorgyan

Mark Ransom

Recent Activity

Donate For Us