Best way of random sampling in C++

I have a data vector A of length 1 million (indices 0 to 999,999). From A, I want to create a vector B (whose length is, let's say, just 10% of A) containing indices of A that are randomly sampled from A. I tried using srand() and random_shuffle. Is this a good way to extract samples from very large vectors? Can anyone please suggest a better approach?

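    #include <algorithm>  // std::random_shuffle (deprecated in C++14, removed in C++17)
    #include <cstdlib>    // std::srand
    #include <vector>

    std::vector<int> samplingIndex;
    samplingIndex.reserve(1000000);
    for (int i = 0; i < 1000000; ++i) { samplingIndex.push_back(i); }
    std::srand(50);  // fixed seed; note the 2-argument random_shuffle's random source is implementation-defined (often, but not necessarily, std::rand)
    std::random_shuffle(samplingIndex.begin(), samplingIndex.end());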

After this, I take the first 10% of the indices in samplingIndex to make B.
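Since `std::random_shuffle` was deprecated in C++14 and removed in C++17, here is a minimal sketch of the same idea using `<random>`. Because only the first 10% of the shuffled vector is used, it is enough to run the first k steps of a Fisher–Yates shuffle (a "partial" shuffle) instead of shuffling all n entries. The function name `sampleIndices` is hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: partial Fisher–Yates shuffle of 0..n-1.
// Step i swaps a uniformly chosen element of the unchosen tail [i, n)
// into slot i, so after k steps idx[0..k) is a uniform random sample
// of k distinct indices, in random order.
std::vector<int> sampleIndices(std::size_t k, std::size_t n, std::mt19937& gen)
{
    assert(k <= n);
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1

    for (std::size_t i = 0; i < k; ++i) {
        std::uniform_int_distribution<std::size_t> dis(i, n - 1);
        std::swap(idx[i], idx[dis(gen)]);
    }
    idx.resize(k);  // keep only the shuffled prefix
    return idx;
}
```

With the question's numbers, `std::mt19937 gen(50);` followed by `sampleIndices(100000, 1000000, gen)` produces B. This still allocates all n indices, but does only k swaps, and the seed is tied explicitly to the generator rather than to global `std::rand` state.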

Hum asked Oct 28 '25 10:10


1 Answer

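You can use a Fisher–Yates shuffle and avoid constructing the huge array A at all. Only the first `size` slots of the random permutation are stored, so memory grows with the sample size, not with A: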

Something like:

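#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// "Inside-out" Fisher–Yates shuffle that keeps only the first `size`
// elements of a random permutation of 0 .. max_size-1.
std::vector<int> FisherYatesShuffle(std::size_t size,
                                    std::size_t max_size,
                                    std::mt19937& gen)
{
    assert(size <= max_size);
    std::vector<int> res(size);

    for (std::size_t i = 0; i != max_size; ++i) {
        // Draw j uniformly from [0, i].
        std::uniform_int_distribution<std::size_t> dis(0, i);
        std::size_t j = dis(gen);
        if (j < res.size()) {
            if (i < res.size()) {
                res[i] = res[j];  // move the old value at slot j to slot i
            }
            res[j] = static_cast<int>(i);  // value i goes to slot j
        }
        // Otherwise slot j lies beyond the stored prefix, so nothing kept changes.
    }
    return res;
}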

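The loop above still runs `max_size` iterations. If that matters, another option (not taken from the answer; a different technique swapped in here for comparison) is Robert Floyd's sampling algorithm, which draws k distinct indices from [0, n) in O(k) expected time and O(k) memory. This is a minimal sketch; the name `floydSample` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical helper: Floyd's algorithm. Returns a uniformly random
// k-subset of [0, n). The order of the returned indices is NOT uniformly
// random, so shuffle the result if order matters.
std::vector<int> floydSample(std::size_t k, std::size_t n, std::mt19937& gen)
{
    assert(k <= n);
    std::unordered_set<int> chosen;
    chosen.reserve(k);
    std::vector<int> out;
    out.reserve(k);

    for (std::size_t j = n - k; j < n; ++j) {
        std::uniform_int_distribution<std::size_t> dis(0, j);
        int t = static_cast<int>(dis(gen));
        // If t is already taken, take j instead; j cannot be taken yet,
        // because earlier iterations only drew values below j.
        int pick = chosen.count(t) ? static_cast<int>(j) : t;
        chosen.insert(pick);
        out.push_back(pick);
    }
    return out;
}
```

For a 10% sample of 1 million, this touches only about 100,000 random draws and never allocates the full index range.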

Jarod42 answered Oct 31 '25 01:10