I need to use a logarithm function in some of my code, but the base doesn't matter. So I set out to pick between log(), log2(), and log10() by performance, provided I found any significant differences. (I will refer to said functions as ln, lb, and lg respectively.)
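Why the base doesn't matter: by the change-of-base identity, log_b(x) = log(x) / log(b), so logarithms to different bases differ only by a constant factor. A quick illustration (the log_base helper is mine, purely for demonstration):

#include <cmath>

// Switching bases only changes a constant divisor, so the choice
// between ln, lb and lg is purely a question of speed here.
double log_base(double x, double b) { return std::log(x) / std::log(b); }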
Why am I fussing about this? Because I will be calling the function as often as 400,000,000 times per iteration of an optimisation algorithm. This is neither optional nor the topic of my question.
I set up some really basic tests, like so:
timespec start, end;
double sum = 0, m;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
for (int n = 1; n < INT_MAX; ++n)
{
m = n * 10.1;
sum += log(m);
}
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
cout << "ln=";
cout << diff(start, end).tv_sec << ":" << diff(start, end).tv_nsec << endl;
... // likewise for log2 and log10
(The timing helper has the signature timespec diff(timespec start, timespec end), if you want the details.)
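A minimal sketch of such a helper, assuming end >= start (my reconstruction, not necessarily the exact code used):

timespec diff(timespec start, timespec end)
{
    timespec d;
    d.tv_sec = end.tv_sec - start.tv_sec;
    d.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (d.tv_nsec < 0) // borrow a second if the nanoseconds underflow
    {
        --d.tv_sec;
        d.tv_nsec += 1000000000L;
    }
    return d;
}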
The following results were obtained:
GCC v4.6.3
-O0
ln=140:516853107
lb=155:878100147
lg=173:534086352
-O1
ln=133:948317112
lb=144:78885393
lg=163:870021712
-O2
ln=9:108117039
lb=9:134447209
lg=4:87951676
-O3
ln=9:102016996
lb=9:204672042
lg=4:153153558
I've looked at the output of compiling with -S at both -O0 and -O3, but I really don't have a good enough grip on assembler to fully understand the differences.
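Generating such listings is just a matter of passing -S, e.g. (bench.cpp being a placeholder name):

g++ -O0 -S bench.cpp -o bench-O0.s
g++ -O3 -S bench.cpp -o bench-O3.s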
Why does lg optimise better with -O2/-O3?
EDIT: Source code (note the typo in the third loop; it is the cause of log10 seeming faster, as the multiplication gets optimised out). I have accepted the answer I believe is closest, since the question has now been closed, although I learnt a lot from drhirsch's and janneb's answers.
This is going to depend on the implementation of the log() functions in the C library, compiler version, hardware architecture, and so on. Anyway, below I'm using GCC 4.4 on x86-64 with glibc 2.11.
Changing the example so that I add a line
cout << "sum=" << sum << endl;
which prevents the compiler from optimizing away the log() calls (as I mentioned in a comment), I get the following timings (whole seconds only, -O2):
These timings seem roughly consistent with the -O0 and -O1 timings in the original post; at higher optimization levels the log evaluations are optimized away, hence the -O2 and -O3 results are so different.
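A minimal, self-contained reconstruction of the benchmark with that fix applied (my sketch, not necessarily the OP's exact code; link with -lrt on older glibc):

#include <climits>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    timespec start, end;
    double sum = 0;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    for (int n = 1; n < INT_MAX; ++n)
    {
        double m = n * 10.1;
        sum += log(m);
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    cout << "sum=" << sum << endl; // printing sum keeps the log() calls alive
    cout << "ln=" << end.tv_sec - start.tv_sec << "s" << endl;
    return 0;
}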
Furthermore, looking at the log example with the "perf" profiler, the top 5 offenders in the report are
# Samples: 3259205
#
# Overhead Command Shared Object Symbol
# ........ .............. ......................... ......
#
87.96% log /lib/libm-2.11.1.so [.] __ieee754_log
5.51% log /lib/libm-2.11.1.so [.] __log
2.88% log ./log [.] main
2.84% log /lib/libm-2.11.1.so [.] __isnan
0.69% log ./log [.] log@plt
Except for main, all the other symbols are related to the log() call. Summing up these, we can conclude that 97% of the total runtime of this example is spent in log().
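For reference, a profile like the one above comes from something along these lines, where ./log is the benchmark binary:

perf record ./log
perf report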
The implementation of __ieee754_log can be found here in the glibc git repo. Correspondingly, the other implementations are: log2, log10. Note that the previous links are to the HEAD versions; for released versions, see the corresponding branches.
Unfortunately the OP failed to show us the original code; he chose to obfuscate it slightly by converting it to assembly.
In the assembly code the OP linked (annotations by me):
.L10:
    cvtsi2sd %ebx, %xmm0     // convert n to double
    xorpd %xmm1, %xmm1       // zero
    mulsd .LC0(%rip), %xmm0  // multiply n by 10.1
    ucomisd %xmm0, %xmm1     // compare the product against zero
    jae .L31                 // jump if 0 >= m: never taken, since m is always positive
    addl $1, %ebx            // n++
    cmpl $2147483647, %ebx   // end of for loop
    jne .L10
...
.L31:
    call log10, log2, whatever... // this point is never reached
One can see that the call to log is never executed, as stepping through the code with gdb confirms. All the code does is 2^31 multiplications and comparisons of a double.
This also explains the stunning factor-of-30 increase in the execution speed of the log function when compiled with -O2, in case anybody found that strange too.
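One way to confirm it yourself, assuming the binary is called log: set a breakpoint on the libm function in gdb and watch the program run to completion without ever stopping there.

gdb ./log
(gdb) break log   # breakpoint on the libm log function
(gdb) run         # the program finishes; the breakpoint is never hit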
Edit:
for (int n = 1; n < INT_MAX; ++n)
{
m = n * 10.1;
sum += log(m);
}
The compiler isn't able to optimize the loop away completely, because it cannot prove that the call to log will always succeed: log has a side effect (it sets errno) if its argument is zero or negative. So it replaces the loop body with a comparison against zero, and log is only called if the result of the multiplication is less than or equal to zero. Which means it is never called :-)
What stays in the loop is the multiplication and the test of whether the result might be non-positive.
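In C terms, what the compiler actually left behind is roughly this (my paraphrase of the assembly, not real compiler output):

for (int n = 1; n < INT_MAX; ++n)
{
    double m = n * 10.1;
    if (m <= 0.0)      // never true for n >= 1
        log(m);        // kept only for its side effect (setting errno)
}
// sum is never updated, since its value is unused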
An interesting result happens if I add -ffast-math to the compiler options, which relieves the compiler of strict IEEE compliance:
ln=0:000000944
lb=0:000000475
lg=0:000000357
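The flag simply goes on the compile line, e.g. (bench.cpp being a placeholder):

g++ -O3 -ffast-math bench.cpp -o bench

With -ffast-math (which implies -fno-math-errno) the compiler may treat log as a pure function, so even the guarding comparison disappears and the whole loop is eliminated; hence the timings collapse to a few hundred nanoseconds.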