Using C++11's <random> library, I encountered an odd performance drop when using std::mt19937 (32- and 64-bit versions) in combination with a uniform_real_distribution (float or double, doesn't matter). Compared to a g++ compile, it's more than an order of magnitude slower! The culprit isn't just the mt generator, as it's fast with a uniform_int_distribution. And it isn't a general flaw in the uniform_real_distribution, since that's fast with other generators like default_random_engine. Just that specific combination is oddly slow.
I'm not very familiar with the internals, but the Mersenne Twister algorithm is more or less strictly defined, so a difference in implementation couldn't account for this, I guess? The measurement program follows below, but first here are my results for clang 3.4 and gcc 4.8.1 on a 64-bit Linux machine:
gcc 4.8.1
runtime_int_default:   185.6
runtime_int_mt:        179.198
runtime_int_mt_64:     175.195
runtime_float_default: 45.375
runtime_float_mt:      58.144
runtime_float_mt_64:   94.188

clang 3.4
runtime_int_default:   215.096
runtime_int_mt:        201.064
runtime_int_mt_64:     199.836
runtime_float_default: 55.143
runtime_float_mt:      744.072   <-- this
runtime_float_mt_64:   783.293   <-- and this are slow
Program to generate this and try it out yourself:

#include <iostream>
#include <vector>
#include <chrono>
#include <random>

template <typename T_rng, typename T_dist>
double time_rngs(T_rng& rng, T_dist& dist, int n) {
    std::vector<typename T_dist::result_type> vec(n, 0);
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n; ++i)
        vec[i] = dist(rng);
    auto t2 = std::chrono::high_resolution_clock::now();
    auto runtime = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() / 1000.0;
    auto sum = vec[0]; // access the data so the compiler can't skip the loop
    (void)sum;
    return runtime;
}

int main() {
    const int n = 10000000;
    unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();

    std::default_random_engine rng_default(seed);
    std::mt19937               rng_mt(seed);
    std::mt19937_64            rng_mt_64(seed);

    std::uniform_int_distribution<int>    dist_int(0, 1000);
    std::uniform_real_distribution<float> dist_float(0.0, 1.0);

    // print max values
    std::cout << "rng_default_random.max(): " << rng_default.max() << std::endl;
    std::cout << "rng_mt.max(): " << rng_mt.max() << std::endl;
    std::cout << "rng_mt_64.max(): " << rng_mt_64.max() << std::endl << std::endl;

    std::cout << "runtime_int_default: " << time_rngs(rng_default, dist_int, n) << std::endl;
    std::cout << "runtime_int_mt: " << time_rngs(rng_mt, dist_int, n) << std::endl;
    std::cout << "runtime_int_mt_64: " << time_rngs(rng_mt_64, dist_int, n) << std::endl;
    std::cout << "runtime_float_default: " << time_rngs(rng_default, dist_float, n) << std::endl;
    std::cout << "runtime_float_mt: " << time_rngs(rng_mt, dist_float, n) << std::endl;
    std::cout << "runtime_float_mt_64: " << time_rngs(rng_mt_64, dist_float, n) << std::endl;
}
Compile via clang++ -O3 -std=c++11 random.cpp or g++ respectively. Any ideas?
edit: Finally, Matthieu M. had a great idea: The culprit is inlining, or rather a lack thereof. Increasing the clang inlining limit eliminated the performance penalty. That actually solved a number of performance oddities I encountered. Thanks, I learned something new.
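For completeness: if changing compiler flags isn't an option, one can also sidestep the distribution layer entirely and turn the raw mt19937 output into floats by hand, so there is hardly anything left to inline. A minimal sketch (not part of the original question or answer; the seed and loop count are only illustrative, and the mapping is not bit-for-bit identical to what uniform_real_distribution produces):

#include <iostream>
#include <random>

int main() {
    std::mt19937 rng(42);  // illustrative fixed seed

    // Take the top 24 bits of one 32-bit draw and scale by 2^-24.
    // 24-bit integers are exactly representable as float, so the result
    // lies in [0, 1) without rounding surprises.
    const float scale = 1.0f / 16777216.0f;  // 2^-24

    float sum = 0.0f;
    for (int i = 0; i < 10000000; ++i)
        sum += static_cast<float>(rng() >> 8) * scale;

    std::cout << sum << std::endl;  // use the result so the loop isn't optimized away
}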
As already stated in the comments, the problem is caused by the fact that gcc inlines more aggressively than clang. If we make clang inline very aggressively, the effect disappears:
Compiling your code with g++ -O3 yields
runtime_int_default:   3000.32
runtime_int_mt:        3112.11
runtime_int_mt_64:     3069.48
runtime_float_default: 859.14
runtime_float_mt:      1027.05
runtime_float_mt_64:   1777.48
while clang++ -O3 -mllvm -inline-threshold=10000 yields
runtime_int_default:   3623.89
runtime_int_mt:        751.484
runtime_int_mt_64:     751.132
runtime_float_default: 1072.53
runtime_float_mt:      968.967
runtime_float_mt_64:   1781.34
Apparently, clang now out-inlines gcc in the int_mt cases, but all of the other runtimes are in the same order of magnitude. I used gcc 4.8.3 and clang 3.4 on Fedora 20, 64-bit.
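Instead of raising the global inline threshold on the command line, the same idea can also be expressed in the source. GCC's flatten attribute asks the compiler to inline every call made from the annotated function; below is a sketch of how the hot loop could be annotated (my own illustration, not part of the original answer, and whether a given clang version actually acts on the attribute is worth verifying):

#include <iostream>
#include <random>

// flatten asks the compiler to inline, where possible, all calls made
// inside the annotated function, including the nested calls behind
// dist(rng). GCC honors it; clang support varies by version.
__attribute__((flatten))
double sum_uniform_floats(std::mt19937& rng, int n) {
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += dist(rng);
    return sum;
}

int main() {
    std::mt19937 rng(12345);  // illustrative seed
    std::cout << sum_uniform_floats(rng, 10000000) << std::endl;
}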