void NetClass::Modulate(vector<synapse> &synapses)
{
    const int size  = synapses.size();
    const int split = 200 / 2;                // midpoint of the % 200 range
    for (int w = 0; w < size; w++)
        if (synapses[w].active)               // perturb only active synapses
            synapses[w].rmod = (rand_r(seedp) % 200 - split) / 1000.0;  // roughly uniform in [-0.100, 0.099]
}
The function rand_r(seedp) is seriously bottlenecking my program. Specifically, it slows me down by 3X when run serially, and by 4.4X when run on 16 cores. rand() is not an option because it is even worse. Is there anything I can do to streamline this? If it will make a difference, I think I can sustain some loss of statistical randomness. Would pre-generating (before execution) a list of random numbers and then loading them onto the thread stacks be an option?
The problem is that the seedp variable (and its memory location) is shared among several threads. Processor cores must synchronize their caches each time they access this ever-changing value (classic cache-line contention), which hampers performance. The solution is to give every thread its own seedp, so the cache synchronization goes away.
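As a minimal sketch of that fix, assuming C++11 thread_local and POSIX rand_r (the synapse struct and the free-standing Modulate are stand-ins for the real types above):

#include <stdlib.h>   // rand_r (POSIX)
#include <time.h>
#include <vector>

struct synapse { bool active; double rmod; };   // stand-in for the real type

// thread_local gives every thread a private seed in its own storage,
// so no cache line is ever shared between cores for this variable.
static thread_local unsigned int t_seed = 0;

void Modulate(std::vector<synapse> &synapses)
{
    if (t_seed == 0)   // lazy one-time init, different in each thread
        t_seed = (unsigned int)time(NULL) ^ (unsigned int)(size_t)&t_seed;

    const int split = 200 / 2;
    for (size_t w = 0; w < synapses.size(); w++)
        if (synapses[w].active)
            synapses[w].rmod = (rand_r(&t_seed) % 200 - split) / 1000.0;
}

Seeding from the address of the thread-local variable is just a cheap way to make the per-thread seeds differ; any per-thread unique value works.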
It depends on how good the statistical randomness needs to be. For high quality, the Mersenne twister, or its SIMD variant, is a good choice. You can generate and buffer a large block of pseudo-random numbers at a time, and each thread can have its own state vector. The Park-Miller-Carta PRNG is extremely simple; it has even been implemented as a CUDA kernel.
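To show how simple it is, here is a sketch of the standard Park-Miller recurrence with Carta's division-free reduction (this is not the CUDA code referred to above, and fill_block is a made-up helper illustrating the block-buffering idea):

#include <stdint.h>
#include <stddef.h>

// Park-Miller "minimal standard" PRNG: x(n+1) = 16807 * x(n) mod (2^31 - 1).
// The entire generator state is one 32-bit word, so giving every thread its
// own copy is trivial. State must start nonzero and below 2^31 - 1.
static inline uint32_t park_miller(uint32_t *state)
{
    uint64_t p = (uint64_t)*state * 16807u;
    // Carta's trick: since 2^31 = 1 (mod 2^31 - 1), fold the high bits back
    // into the low bits instead of dividing.
    uint32_t x = (uint32_t)((p >> 31) + (p & 0x7fffffffu));
    if (x & 0x80000000u)
        x = (x & 0x7fffffffu) + 1;
    return *state = x;
}

// Refill a per-thread buffer in one pass, so the hot loop just reads an array.
static void fill_block(uint32_t *state, uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = park_miller(state);
}

Modulate would then consume its own thread's buffer and call fill_block only when the buffer runs dry, which is essentially the pre-generation scheme proposed in the question.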