I am running a while loop in 4 threads; inside the loop I evaluate a function and increment a counter.
while(1) {
    int fitness = EnergyFunction::evaluate(sequence);
    mutex.lock();
    counter++;
    mutex.unlock();
}
When I run this loop in the 4 threads, I get ~ 20 000 000 evaluations per second.
while(1) {
    if (dist(mt) == 0) {
        sequence[distDim(mt)] = -1;
    } else {
        sequence[distDim(mt)] = 1;
    }
    int fitness = EnergyFunction::evaluate(sequence);
    mainMTX.lock();
    overallGeneration++;
    mainMTX.unlock();
}
If I add a random mutation to the sequence, I get ~ 13 000 000 evaluations per second.
while(1) {
    if (dist(mt) == 0) {
        sequence[distDim(mt)] = -1;
    } else {
        sequence[distDim(mt)] = 1;
    }
    int fitness = EnergyFunction::evaluate(sequence);
    mainMTX.lock();
    if (fitness < overallFitness)
        overallFitness = fitness;
    overallGeneration++;
    mainMTX.unlock();
}
But when I add a simple if statement that checks whether the new fitness is smaller than the old fitness and, if so, replaces the old fitness with the new one, the performance loss is massive! Now I get ~ 20 000 evaluations per second. If I remove the random mutation part, I still get only ~ 20 000 evaluations per second.
The variable overallFitness is declared as
extern int overallFitness;
I am having trouble figuring out what causes such a big performance loss. Is comparing two ints really such an expensive operation?
I also don't believe it is related to the mutex locking.
UPDATE
This performance loss was not caused by branch prediction; the compiler simply ignored the call int fitness = EnergyFunction::evaluate(sequence);. Now that I have added volatile, the compiler no longer ignores the call.
Also, thank you for pointing out branch misprediction and atomic<int>, I didn't know about them!
Because of atomic<int> I also removed the mutex part, so the final code looks like this:
while(1) {
    sequence[distDim(mt)] = lookup_Table[dist(mt)];
    fitness = EnergyFunction::evaluate(sequence);
    if (fitness < overallFitness)
        overallFitness = fitness;
    ++overallGeneration;
}
Now I am getting ~ 25 000 evaluations per second.
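The declarations backing this loop look roughly like this (a sketch to show the volatile and atomic<int> changes; the exact placement of fitness is not important here):

#include <atomic>

volatile int fitness;               // volatile, so the call to evaluate() is not optimized away
std::atomic<int> overallGeneration; // atomic counter, replaces the mutex
extern int overallFitness;          // still a plain int, updated without synchronization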
You need to run a profiler to get to the bottom of this. On Linux, use perf.
My guess is that EnergyFunction::evaluate() is being entirely optimized away, because in the first examples you don't use the result, so the compiler can discard the whole thing. You can try writing the return value to a volatile variable, which should force the compiler or linker not to optimize the call away. A 1000x speedup is definitely not attributable to a simple comparison.
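A minimal sketch of that idea (the name sink is only illustrative):

volatile int sink;  // writing to a volatile is an observable side effect,
                    // so the compiler has to keep the call
while(1) {
    sink = EnergyFunction::evaluate(sequence);
}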
There is actually an atomic instruction to increase an int by 1, so a smart compiler may be able to remove the mutex entirely, although I'd be surprised if it did. You can test this by looking at the assembly, or by removing the mutex, changing the type of overallGeneration to atomic<int>, and checking how fast it still is. This optimization is no longer possible with your last, slow example.
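A sketch of that test, assuming C++11 <atomic> is available:

#include <atomic>

std::atomic<int> overallGeneration{0};

while(1) {
    int fitness = EnergyFunction::evaluate(sequence);
    ++overallGeneration;   // atomic increment, no mutex needed
}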
Also, if the compiler can see that evaluate does nothing to the global state and the result isn't used, then it can skip the entire call to evaluate. You can find out whether that's the case by looking at the assembly, or by removing the call to EnergyFunction::evaluate(sequence) and looking at the timing - if it doesn't speed up, the function wasn't being called in the first place. This optimization is no longer possible with your last, slow example. You should be able to stop the compiler from eliding EnergyFunction::evaluate(sequence) by defining the function in a different object file (another cpp file or a library) and disabling link-time optimization.
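For example, a sketch of that split (file names and the parameter type are illustrative - only the separation into another translation unit matters):

// EnergyFunction.h
#include <vector>
struct EnergyFunction {
    static int evaluate(const std::vector<int>& sequence);   // declaration only
};

// EnergyFunction.cpp - compiled into its own object file
int EnergyFunction::evaluate(const std::vector<int>& sequence) {
    /* ... actual energy computation ... */
    return 0;
}

// main.cpp only sees the declaration, so without link-time optimization
// the compiler cannot prove the call has no side effects and must emit it.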
There are other effects here that also create a performance difference, but none of them can explain a difference of a factor of 1000. A factor of 1000 usually means the compiler cheated in the previous test and the change now prevents it from cheating.