Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is sample Not Randomly Populating my Vector?

I was experimenting with a toy sample program:

map<int, char> foo { { 1, 'a' }, { 2, 'b' }, { 3, 'c' } };
vector<pair<decltype(foo)::key_type, decltype(foo)::mapped_type>> bar(size(foo));

sample(begin(foo), end(foo), begin(bar), size(foo), mt19937{ random_device{}() });

Live Example

But bar always contains the contents of foo in order. Is this a gcc implementation problem, or am I just repeatedly getting unlucky?

like image 548
Jonathan Mee Avatar asked Mar 05 '23 11:03

Jonathan Mee


2 Answers

std::sample selects elements from the range you pass. From cppreference (emphasize mine):

Selects n elements from the sequence [first; last) such that each possible sample has equal probability of appearance, and writes those selected elements into the output iterator out. Random numbers are generated using the random number generator g.

If n is greater than the number of elements in the sequence, selects last-first elements.

I think the docs could be more clear, but returning only last-first in case the number of requested elements is greater, only makes sense if each element is selected at maximum once.

Try:

map<int, char> foo { { 1, 'a' }, { 2, 'b' }, { 3, 'c' } };
vector<pair<decltype(foo)::key_type, decltype(foo)::mapped_type>> bar(size(foo)-1);

sample(begin(foo), end(foo), begin(bar), bar.size(), mt19937{ random_device{}() });

to get two random samples out of foo.

Also note that

The algorithm is stable only if PopulationIterator meets the requirements of ForwardIterator

ie it was not just out of luck that you always got the same result.

like image 53
463035818_is_not_a_number Avatar answered Mar 18 '23 22:03

463035818_is_not_a_number


Sampling is about returning some subset of a larger population.

It is not intended to return elements in a random order, or any other order. It could, but that's not really what it's there for.

cppreference hints at ordering in this statement:

The algorithm is stable only if PopulationIterator meets the requirements of ForwardIterator

"Stable" here means it would return results in the same order as the input, thus the order is guaranteed to not be random with a ForwardIterator. Related: What is stability in sorting algorithms and why is it important?

This also makes sense, since, similar to what's been noted in the comments, to be efficient, you'll first need to figure out which elements to pick, and then go through the iterator and pick the elements, since you can only iterate from one direction to the other. Thus it should be trivial to keep the elements in the same order.

As for when you're not using a ForwardIterator, it makes no guarantee about the order one way or the other. So, even if it might appear to be randomly ordered, it would not be wise to rely on this, as the randomness of the ordering may be implementation-dependent and it may or may not have high entropy.

If you want a random order, you should shuffle it.

like image 21
Bernhard Barker Avatar answered Mar 18 '23 20:03

Bernhard Barker