Best way of random sampling in C++

Question

I have a data vector A of length 1 million (0 to 1 million). From A, I want to create a vector B (whose length is, let's say, just 10% of A's) containing indexes into A. Those indexes should be a random sample of A's indexes. I tried using srand() and random_shuffle; is this a good way to extract samples from very large vectors? Any suggestions would be appreciated.

    std::vector<int> samplingIndex;

    for (int i = 0; i < 1000000; ++i) { samplingIndex.push_back(i); }
    std::srand(50);
    std::random_shuffle(samplingIndex.begin(), samplingIndex.end());

After this, I take the first 10% of the indexes from samplingIndex to make B.
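
For reference, here is a minimal sketch of the shuffle-and-take approach described above, assuming a C++11 or later compiler. It uses std::shuffle with std::mt19937 in place of std::random_shuffle, which was deprecated in C++14 and removed in C++17. The helper name shuffle_and_take and its parameters are hypothetical, not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: shuffle the indexes 0..n-1 and keep the first k.
std::vector<std::size_t> shuffle_and_take(std::size_t n, std::size_t k, unsigned seed)
{
    assert(k <= n);

    // Build the full index array 0, 1, ..., n-1 (this is the O(n) memory cost).
    std::vector<std::size_t> indices(n);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    // Fixed seed for reproducibility, playing the role of std::srand(50).
    std::mt19937 gen(seed);
    std::shuffle(indices.begin(), indices.end(), gen);

    // Keep only the first k shuffled indexes.
    indices.resize(k);
    return indices;
}
```

With these assumptions, shuffle_and_take(1000000, 100000, 50) would return 100000 distinct indexes into A. The whole index array is still built and shuffled, even though only 10% of it is kept.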

Jarod42 · Accepted Answer

You may use a Fisher–Yates shuffle and so avoid constructing the huge array of indexes:

Something like:

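// Fisher–Yates shuffle that keeps only the first `size` positions
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

std::vector<int> FisherYatesShuffle(std::size_t size,
                                    std::size_t max_size,
                                    std::mt19937& gen)
{
    assert(size <= max_size);
    std::vector<int> res(size);

    for (std::size_t i = 0; i != max_size; ++i) {
        // Pick j uniformly from [0, i].
        std::uniform_int_distribution<std::size_t> dis(0, i);
        const std::size_t j = dis(gen);
        if (j < res.size()) {
            if (i < res.size()) {
                res[i] = res[j];           // move the old occupant of slot j to slot i
            }
            res[j] = static_cast<int>(i);  // index i enters the sample at slot j
        }
    }
    return res;
}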
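
To make it easier to see why the loop above yields a uniform sample, it can be split into two phases. This is only a sketch under the assumption that the two forms are meant to behave identically; the name sample_indices and the std::size_t element type are hypothetical choices, not part of the answer. While i < k, the drawn j always satisfies j <= i < k, so the loop is an "inside-out" Fisher–Yates shuffle of 0..k-1. Once i >= k, each new index replaces a random slot with probability k / (i + 1), which is the reservoir sampling scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical name; same sampling scheme as above, written as two loops.
std::vector<std::size_t> sample_indices(std::size_t k, std::size_t n, std::mt19937& gen)
{
    assert(k <= n);
    std::vector<std::size_t> res(k);

    // Phase 1: inside-out shuffle of 0..k-1. Here j <= i < k always holds.
    for (std::size_t i = 0; i < k; ++i) {
        std::uniform_int_distribution<std::size_t> dis(0, i);
        const std::size_t j = dis(gen);
        res[i] = res[j];
        res[j] = i;
    }

    // Phase 2: index i enters the sample only if j lands inside the reservoir,
    // which happens with probability k / (i + 1).
    for (std::size_t i = k; i < n; ++i) {
        std::uniform_int_distribution<std::size_t> dis(0, i);
        const std::size_t j = dis(gen);
        if (j < k) {
            res[j] = i;
        }
    }
    return res;
}
```

For the question's sizes, one would seed a generator, for example std::mt19937 gen(50), and call sample_indices(100000, 1000000, gen). Memory use is proportional to the sample size rather than to the length of A, although the loop still draws one random number per index.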

