I expected the following code to give similar results. But on my hardware, the range-based for() loop takes ~80 milliseconds while std::transform() takes ~120 milliseconds.
#include <algorithm>
#include <cctype>    // std::tolower
#include <chrono>
#include <cstddef>   // std::size_t
#include <iostream>
#include <string>    // std::string

int main()
{
    const auto t1 = std::chrono::high_resolution_clock::now();

    /* When using loop #1, this code takes 80 milliseconds to run.
     * When using loop #2, this code takes 120 milliseconds to run.
     */
    for (std::size_t idx = 0; idx < 999999; idx++)
    {
        std::string str = "Hello, World!";
#if 1
        // loop #1
        for (auto & c : str)
        {
            c = std::tolower(c);
        }
#else
        // loop #2
        std::transform(str.begin(), str.end(), str.begin(),
            [](unsigned char c)
            {
                return std::tolower(c);
            });
#endif
    }

    const auto t2 = std::chrono::high_resolution_clock::now();
    std::cout
        << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
        << " milliseconds" << std::endl;
    return 0;
}
The code was compiled with g++ -O0 test.cpp to disable optimizations, because I was curious whether there was any difference between the two types of loops, not about the actual run time once optimized.
Does anyone know why the range-based for() loop would be faster?
The reason is simply that you're using -O0. Nothing in C++, including std::transform(), was designed with fast unoptimized execution in mind. On another compiler, the for loop could be slower. You're basically flipping a coin, and then acting surprised that it didn't land on its edge.
Now at -O2, the compiler should do only what's really necessary, which is the same for the two loops. And then you do indeed get the same outcome: nothing is necessary, because the lowercased string is never used afterwards.
This is a common problem in benchmarking. Compilers are smart. Even if you had printed hello, world! afterwards, a compiler might notice that calling tolower() once is sufficient. If you design a benchmark, make absolutely sure that every operation that you're benchmarking is actually required to execute.
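One way to make the work "required" is to let every iteration depend on the loop index and to feed every result into a value that you print at the end. The following is only a minimal sketch under those assumptions: lower_with_loop, lower_with_transform, run_benchmark, BenchResult and check_variants_agree are hypothetical names made up for this illustration, and a sufficiently clever optimizer may still simplify parts of it, so treat the timings with care.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <chrono>
#include <cstddef>
#include <string>

// Loop #1 from the question: range-based for.
std::string lower_with_loop(std::string str)
{
    for (auto & c : str)
    {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return str;
}

// Loop #2 from the question: std::transform with a lambda.
std::string lower_with_transform(std::string str)
{
    std::transform(str.begin(), str.end(), str.begin(),
        [](unsigned char c)
        {
            return static_cast<char>(std::tolower(c));
        });
    return str;
}

// Sanity check: both variants must agree before their timings are compared.
void check_variants_agree()
{
    assert(lower_with_loop("Hello, World!") == lower_with_transform("Hello, World!"));
}

// Hypothetical result type: elapsed time plus a checksum of the outputs.
struct BenchResult
{
    long long milliseconds;
    std::size_t checksum;
};

// Hypothetical helper: times `iterations` calls of `lower`.
template <typename Fn>
BenchResult run_benchmark(Fn lower, std::size_t iterations)
{
    std::size_t checksum = 0;
    const auto t1 = std::chrono::steady_clock::now();
    for (std::size_t idx = 0; idx < iterations; idx++)
    {
        std::string str = "Hello, World!";
        // Vary the input so that one call cannot stand in for all of them.
        str[0] = static_cast<char>('A' + idx % 26);
        const std::string out = lower(str);
        // Consume the result so that the call cannot simply be dropped.
        checksum += static_cast<unsigned char>(out[0]);
    }
    const auto t2 = std::chrono::steady_clock::now();
    return { std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count(),
             checksum };
}
```

Call check_variants_agree() once, then run_benchmark(lower_with_loop, 999999) and run_benchmark(lower_with_transform, 999999), and print both the milliseconds and the checksum of each result. Printing the checksum is the point: it is what gives the compiler a reason to keep the work.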