Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can the order of std::shuffle depend on anything but the RNG?

For shuffling two vectors in the same order, it's tempting to do something like

whatever_rng_type rng2(rng1);

std::shuffle(vec1.begin(), vec1.end(), rng1);
std::shuffle(vec2.begin(), vec2.end(), rng2);

where identical RNG states are used for the two shuffles. However, I don't see any requirement that these shuffles actually produce the same order in the standard draft I checked.

std::shuffle must use the provided RNG as its source of randomness, but the implementation might also do something like take a different code path for different element sizes. Maybe an implementation might use AVX512 gather/scatter instructions for some types and a generic, non-vectorized code path for others, and that could affect the result ordering.

Is performing two shuffles with the same seed actually a safe way to get the same order? Is there something I missed in a later standard version, or a defect report or something?

like image 943
user2357112 supports Monica Avatar asked Sep 22 '20 14:09

user2357112 supports Monica


People also ask

Why was std:: random_ shuffle removed?

The reason for removing random_shuffle is, that the iterator only version is usually depending on std::rand, which is now also discussed for deprecation, and should be replaced with the classes of the <random> header, as std::rand is considered harmful.

Can random shuffle function be applied on vectors?

Using Fisher-Yates Shuffle Algorithm The algorithm does a linear scan of the vector and swaps each element with a random element among all remaining elements, including the element itself. That's all about shuffling a vector in C++.

How do you shuffle a string in C++?

The shuffle() function in C++ is a function in vector library. It is a function that will rearrange the elements of any range by placing the elements at random positions. To shuffle it uses a uniform random generator which helps in shuffling the elements.


1 Answers

I double-checked the Standard and agree there's nothing requiring shuffle make any guarantees about its use of random numbers. It simply remarks:

"To the extent that the implementation of this function makes use of random numbers, the object referenced by g shall serve as the implementation’s source of randomness.".

So, the remaining question seems to be whether you're interested in implementation-defined or -observed behaviour for any specific implementation(s), or will stick to a Standards-compliant portable solution (e.g. shuffling proxy objects or using an array of indices)...?

Based on your comments, it seems you're opposed to the array-of-indices suggestion, so below - an implementation of using custom iterators and proxies to shuffle the vectors themselves...

(Not done very carefully - more a proof of concept / illustration, so check carefully before using for anything important....)

The approach requires a move_together object that keeps references to the vectors, and then passes shuffle iterators that have a pointer to the move_together object and an index in the vectors being processed. You could arguably simplify this by foregoing the move_together object and having pointers or references to both vectors directly in the iterators. When the iterators are dereferenced, they return proxy objects that then support swapping.

It ostensibly works with GCC 10.2 and clang 10, but there could be a different implementation of std::shuffle for another compiler that requires a more fully fleshed out iterator or proxy....

#include <iostream>
#include <vector>
#include <random>
#include <string>
#include <algorithm>

template <typename T1, typename T2>
struct move_together;

template <typename T1, typename T2>
struct proxy
{
    const move_together<T1, T2>* p_;
    const size_t i_;
    proxy& operator=(const proxy& rhs);
};

template <typename T1, typename T2>
struct move_together
{
    move_together(std::vector<T1>& v1, std::vector<T2>& v2)
      : v1_(v1), v2_(v2)
    { }
    struct iterator
    {
        using iterator_category = std::random_access_iterator_tag;
        using difference_type = ssize_t;
        using value_type = proxy<T1, T2>;
        using pointer = value_type*;
        using reference = value_type&;

        const move_together* p_;
        size_t i_;
        value_type operator*() { return {p_, i_}; }
        bool operator==(const iterator& rhs) const { return i_ == rhs.i_; }
        bool operator!=(const iterator& rhs) const { return !(*this == rhs); }
        difference_type operator-(const iterator& rhs) const
           { return i_ - rhs.i_; }
        iterator operator+(int distance) const
           { return {p_, i_ + distance}; }
        iterator operator++(int) { auto x = *this; ++i_; return x; }
        iterator& operator++() { ++i_; return *this; }
    };

    iterator begin() { return {this, 0}; }
    iterator end()   { return {this, std::min(v1_.size(), v2_.size())}; }
    std::vector<T1>& v1_;
    std::vector<T2>& v2_;
};

template <typename T1, typename T2>
proxy<T1, T2>& proxy<T1, T2>::operator=(const proxy<T1, T2>& rhs)
{
    p_->v1_[i_] = rhs.p_->v1_[rhs.i_];
    p_->v2_[i_] = rhs.p_->v2_[rhs.i_];
}

template <typename T1, typename T2>
void swap(proxy<T1, T2> lhs, proxy<T1, T2> rhs) {
    using std::swap;
    swap(lhs.p_->v1_[lhs.i_], rhs.p_->v1_[rhs.i_]);
    swap(lhs.p_->v2_[lhs.i_], rhs.p_->v2_[rhs.i_]);
}

int main()
{
    std::vector<int> v1{ {1, 2, 3, 4, 5} };
    std::vector<std::string> v2{ {"one", "two", "three", "four", "five"} };

    std::random_device rd;
    std::mt19937 rng{rd()};

    move_together m{v1, v2};
    std::shuffle(m.begin(), m.end(), rng);

    for (const auto& x : v1) std::cout << x << '/';
    std::cout << '\n';
    for (const auto& x : v2) std::cout << x << '/';
    std::cout << '\n';
}
like image 104
Tony Delroy Avatar answered Oct 29 '22 06:10

Tony Delroy