I'm working on a multithreaded program where all threads share some vector (read-only). The goal of each thread is to walk the entire vector. Nonetheless, all threads must visit this vector in a different way.
Since the vector is const and shared among all threads, i cannot use random_shuffle and just iterate over it. For now my solution is to build a crossref vector that will contain indices over the shared vector and then shuffle this vector, i.e.
std::vector<int> crossref(SIZE) ; // SIZE is the size of the shared vector
std::iota (std::begin(crossref), std::end(crossref), 0); // Fill with indices ref
std::mt19937 g(SEED); // each thread has it own seed.
std::shuffle (crossref_.begin(), crossref_.end(), g); // Shuffle it
Nonetheless, doing this reveal some problems (1) it is not very efficient since every thread needs to access its crossref vector before accessing the shared one, (2) i have some performances issue because of the amount of memory required : the shared vector is very big and i have a lot of thread and processors.
Does anyone has some improvement ideas that will avoid the need of extra memory?
You can use the algebraic notion of primitive root modulo n. Basically
If n is a positive integer, the integers between 1 and n − 1 that are coprime to n form the group of primitive classes modulo n. This group is cyclic if and only if n is equal to 2, 4, p^k, or 2p^k where p^k is a power of an odd prime number
Wikipedia displays how you can generate numbers below 7
using 3
as generator.
From this statement you derive an algorithm.
n
m
which is bigger than n
F(0)
between 2
and m
F(i+1) = (F(i) * F(0)) mod m
. If that index is within [0, n]
range, access the element. If not go towards the next index.m - 1
iterations (or when you obtain 1, it is the same thing).Because m
is prime, every number between 2 and m-1 is coprime to m
so is a generator of the sequence {1 ... m}
. You are guaranteed that no number will repeat in the first m - 1
steps, and that all m - 1
numbers will appear.
Complexity :
O(thread count)
O(m)
time, O(1)
in space per thread. You dont need to store the F(i). You only need to know first value and last value. This is the same properties as incrementationIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With