I have a data vector A of length 1 million. From A, I want to create a vector B (whose length is, let's say, just 10% of A's) containing indexes into A. Those indexes should be chosen at random. I tried using srand() and random_shuffle. Is this a good way to extract samples from very large vectors? Any suggestions would be appreciated.
std::vector<int> samplingIndex;
for (int i = 0; i < 1000000; ++i) {
    samplingIndex.push_back(i);
}
std::srand(50);
std::random_shuffle(samplingIndex.begin(), samplingIndex.end());
After this, I take the first 10% of the indexes from samplingIndex to make B.
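For reference, here is a minimal sketch of the shuffle-then-truncate approach described above, assuming C++11 or later. `std::random_shuffle` was deprecated in C++14 and removed in C++17, so this version uses `std::shuffle` with a `std::mt19937` engine instead. The helper name `shuffleAndTake` and its parameters are illustrative and not part of the original post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: shuffle all n indexes, then keep the first `count`.
std::vector<int> shuffleAndTake(std::size_t n, std::size_t count, unsigned seed)
{
    assert(count <= n);
    std::vector<int> samplingIndex(n);
    // Fill with 0, 1, ..., n-1.
    std::iota(samplingIndex.begin(), samplingIndex.end(), 0);
    std::mt19937 gen(seed);  // fixed seed gives a repeatable sequence
    std::shuffle(samplingIndex.begin(), samplingIndex.end(), gen);
    samplingIndex.resize(count);  // drop everything after the first `count`
    return samplingIndex;
}
```

Calling `shuffleAndTake(1000000, 100000, 50)` would give the 10% sample for B. Note that it still allocates and shuffles all one million indexes, even though only a tenth of them are kept.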
You can use a variant of the Fisher–Yates shuffle that avoids constructing the huge index array, since it only ever stores the sampled indexes:
Something like:
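// Fisher–Yates shuffle (inside-out variant) restricted to `size` slots:
// returns `size` distinct random indexes from [0, max_size).
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

std::vector<int> FisherYatesShuffle(std::size_t size,
                                    std::size_t max_size,
                                    std::mt19937& gen)
{
    assert(size <= max_size);
    std::vector<int> res(size);
    for (std::size_t i = 0; i != max_size; ++i) {
        // j is uniform over [0, i]
        std::uniform_int_distribution<std::size_t> dis(0, i);
        const std::size_t j = dis(gen);
        if (j < res.size()) {
            if (i < res.size()) {
                res[i] = res[j];  // still filling: move the old value to slot i
            }
            res[j] = static_cast<int>(i);
        }
    }
    return res;
}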
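As a rough usage sketch, the function above could be called like this for the sizes in the question. The function is repeated so the snippet compiles on its own (with the distribution typed as `std::size_t`). The checker `isValidSample` is a hypothetical helper added only to show what the result should satisfy: every index lies in [0, max_size) and none repeats.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Repeated from the answer so this snippet is self-contained.
std::vector<int> FisherYatesShuffle(std::size_t size,
                                    std::size_t max_size,
                                    std::mt19937& gen)
{
    assert(size <= max_size);
    std::vector<int> res(size);
    for (std::size_t i = 0; i != max_size; ++i) {
        std::uniform_int_distribution<std::size_t> dis(0, i);
        const std::size_t j = dis(gen);
        if (j < res.size()) {
            if (i < res.size()) {
                res[i] = res[j];
            }
            res[j] = static_cast<int>(i);
        }
    }
    return res;
}

// Hypothetical helper: true if all indexes are in [0, max_size) and distinct.
// Takes the sample by value because it sorts its own copy.
bool isValidSample(std::vector<int> sample, std::size_t max_size)
{
    std::sort(sample.begin(), sample.end());
    const bool distinct =
        std::adjacent_find(sample.begin(), sample.end()) == sample.end();
    const bool inRange = sample.empty() ||
        (sample.front() >= 0 &&
         static_cast<std::size_t>(sample.back()) < max_size);
    return distinct && inRange;
}
```

With `std::mt19937 gen(50);`, the call `FisherYatesShuffle(100000, 1000000, gen)` returns the 100,000 indexes for B, and `A[B[k]]` then reads the k-th sampled element. Memory use is proportional to the sample size rather than to the size of A, although the loop still runs `max_size` times.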