How to generate a list of ascending random integers

Tags: c++, algorithm, sorting, random

I have an external collection containing n elements, from which I want to select some number (k) at random, writing the indices of the selected elements to a serialized data file. The indices must be output in strictly ascending order, with no duplicates. Both n and k may be quite large, so it is generally not feasible to store entire arrays of that size in memory.

The first algorithm I came up with was to pick a random number r[0] from 1 to n-k, and then pick each successive random number r[i] from r[i-1]+1 to n-k+i, which only requires storing two entries of r at any one time. However, a fairly simple analysis reveals that the probability of selecting small numbers is inconsistent with a uniform selection over the whole set. For example, if n were a billion and k half a billion, this approach would select the first entry with a very tiny probability (1 in half a billion), whereas, since half of the entries are being selected, the first entry should be selected 50% of the time.

Even if I use external sorting to sort k random numbers, I would have to discard any duplicates and try again. As k approaches n, the number of retries would keep growing, with no guarantee of termination.
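To make the bias of the first scheme concrete, here is a small simulation sketch in C++11, assuming 1-based indices. The helper name naive_first_index_rate is hypothetical. It runs the scheme many times and reports how often index 1 ends up selected. Under the scheme, index 1 can only ever be r[0], so the rate should come out near 1/(n-k), while a uniform selection would give k/n.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: simulates the naive scheme from the question
// (r[0] uniform in 1..n-k, then r[i] uniform in r[i-1]+1..n-k+i) and returns
// the fraction of runs in which index 1 is selected.
double naive_first_index_rate(long long n, long long k, int trials, unsigned seed)
{
    assert(1 <= k && k < n);  // the scheme needs n - k >= 1
    std::mt19937 engine(seed);
    int hits = 0;
    for (int t = 0; t < trials; ++t) {
        long long prev = 0;  // makes the first draw come from 1..n-k
        bool picked_first = false;
        for (long long i = 0; i < k; ++i) {
            std::uniform_int_distribution<long long> d(prev + 1, n - k + i);
            long long r = d(engine);
            if (r == 1)
                picked_first = true;
            prev = r;
        }
        if (picked_first)
            ++hits;
    }
    return static_cast<double>(hits) / trials;
}
```

For example, with n = 10 and k = 5 the rate should land near 0.2, whereas a uniform choice of 5 out of 10 would include index 1 half the time.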

I would like to find an O(k) or O(k log k) algorithm for this, if one exists. The implementation language will be C++11, but descriptions in pseudocode would still be helpful.


asked Jun 07 '16 22:06

markt1964

2 Answers

If in practice k is of the same order of magnitude as n, a very straightforward O(n) algorithm may suffice:

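#include <cassert>
#include <iostream>
#include <random>

void print_sorted_sample(long long n, long long k) {
    assert(k <= n);
    std::mt19937_64 engine{std::random_device{}()};
    std::uniform_real_distribution<double> rnd(0.0, 1.0);
    for (long long i = 0; i < n && k > 0; i++) {
        // keep index i with probability (still needed) / (still available)
        if (rnd(engine) * (n - i) < k) {
            std::cout << i << '\n';  // '\n' avoids flushing on every line
            k--;
        }
    }
}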

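Each index i is kept with probability (number still needed) / (number of indices remaining), so it produces every ascending sequence of k indices with equal probability.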
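The same idea is the selection-sampling technique (Knuth's Algorithm S). As a self-contained sketch, the hypothetical helper ascending_sample below collects the chosen indices in a std::vector so they can be checked, and draws an integer in [0, n-i) instead of a double so that the acceptance probability k/(n-i) is exact. For the very large n and k in the question, each index would be written to the output file instead of being stored.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: selection sampling (Knuth's Algorithm S).
// Walks 0..n-1 once and keeps index i with probability
// (indices still needed) / (indices still available), so the result is
// ascending, duplicate-free, and every k-subset is equally likely.
std::vector<long long> ascending_sample(long long n, long long k, std::mt19937_64& engine)
{
    assert(0 <= k && k <= n);
    std::vector<long long> out;
    for (long long i = 0; i < n && k > 0; ++i) {
        // integer draw in [0, n - i): it is < k with probability exactly k / (n - i)
        std::uniform_int_distribution<long long> d(0, n - i - 1);
        if (d(engine) < k) {
            out.push_back(i);  // for huge k, write i to the output file instead
            --k;
        }
    }
    return out;
}
```

When k equals the number of remaining indices, the draw is always below k, so exactly k indices are produced; with k = n every index is kept and with k = 0 none are.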


answered Sep 26 '22 01:09

Vlad Shcherbina

You can solve this recursively in O(k log k) if you partition in the middle of your range, and randomly sample from the hypergeometric probability distribution to choose how many values lie above and below the middle point (i.e. the values of k for each subsequence), then recurse for each:

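#include <iostream>
#include <random>

// One engine shared by every call. Constructing a fresh, default-seeded
// engine inside sample_hypergeometric would replay the same numbers each time.
std::mt19937_64 generator{std::random_device{}()};

// Samples the hypergeometric distribution and returns the number of "successes"
// when there are n draws without replacement from a population of N with K
// possible successes. Something similar to scipy.stats.hypergeom.rvs in Python.
// In this case, "success" means the selected value lying below the midpoint.
long long sample_hypergeometric(long long n, long long K, long long N)
{
    std::uniform_real_distribution<double> distribution(0.0, 1.0);

    long long successes = 0;
    for (long long trial = 0; trial < n; trial++)
    {
        // the next draw is a success with probability K / N
        if (distribution(generator) * N < K)
        {
            successes++;
            K--;
        }
        N--;
    }
    return successes;
}

// Outputs k distinct values from start + 1 to start + n in ascending order,
// e.g. select_k_from_n(0, k, n) prints k values from 1 to n.
void select_k_from_n(long long start, long long k, long long n)
{
    if (k == 0)
        return;
    if (k == 1)
    {
        std::uniform_int_distribution<long long> pick(1, n);
        std::cout << start + pick(generator) << '\n';
        return;
    }

    // find the number of results below the mid-point:
    long long k1 = sample_hypergeometric(k, n >> 1, n);
    select_k_from_n(start, k1, n >> 1);
    select_k_from_n(start + (n >> 1), k - k1, n - (n >> 1));
}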
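For reference, here is why the split count is hypergeometric. Write h for the size of the lower part (n >> 1 in the code). If k of the n values are chosen uniformly without replacement, the number k1 that land in the lower part has the distribution below, and its mean is k·h/n.

```latex
P(k_1 = j) = \frac{\binom{h}{j}\binom{n-h}{k-j}}{\binom{n}{k}},
\qquad \max(0,\, k-(n-h)) \le j \le \min(k,\, h),
\qquad h = \lfloor n/2 \rfloor
```

Given k1, the lower values form a uniform k1-subset of the lower part and the rest a uniform (k - k1)-subset of the upper part, which is why recursing on each part independently keeps the overall sample uniform.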

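Sampling from the binomial distribution with p = (n >> 1) / (double)n could also be used to approximate the hypergeometric distribution, rejecting samples where k1 > (n >> 1) or k - k1 > n - (n >> 1), i.e. where either half would have to hold more values than it contains. Because this is an approximation, the resulting sample is only approximately uniform.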
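A minimal sketch of that binomial variant, assuming C++11 `<random>`: the hypothetical helper approx_split would stand in for the call sample_hypergeometric(k, n >> 1, n) in select_k_from_n. Note the cast to double, since (n >> 1) / n is 0 in integer arithmetic.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: approximates the hypergeometric split with a binomial
// draw, p = h / n, and rejects draws that do not fit: k1 must fit in the
// lower h values and k - k1 in the upper n - h values.
long long approx_split(long long k, long long n, std::mt19937_64& engine)
{
    assert(n >= 1 && 0 <= k && k <= n);
    const long long h = n >> 1;                   // size of the lower part
    const double p = static_cast<double>(h) / n;  // avoid integer division
    std::binomial_distribution<long long> binom(k, p);
    for (;;) {
        const long long k1 = binom(engine);
        if (k1 <= h && k - k1 <= n - h)
            return k1;
    }
}
```

When k is close to n the window of acceptable k1 values shrinks (for k = n only k1 = n >> 1 is accepted), so the rejection loop can need many retries; the exact hypergeometric sampler does not have that problem.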

answered Sep 24 '22 01:09

samgak
