
Wrong result with log10 math function in armv6 on Raspberry Pi

I have this very simple code:

#include <stdio.h>
#include <math.h>
int main()
{
    long v = 35;
    double app = (double)v;
    app /= 100;
    app = log10(app);
    printf("Calculated log10 %lf\n", app);
    return 0;
}

This code works perfectly on x86, but on ARM it prints 0.00000 as the result. Any ideas?

Other info:

Operating system: linux 3.2.27

I built the ARM toolchain with ct-ng (crosstool-NG): arm-unknown-linux-gnueabi-

libc version 2.13

Output of gcc -v:

Using built-in specs.
COLLECT_GCC=arm-unknown-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/opt/x-tools/arm-unknown-linux-gnueabi/libexec/gcc/arm-unknown-linux-gnueabi/4.5.1/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: /home/mirko/misc/rasppi-ct-ng-files/.build/src/gcc-4.5.1/configure --build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu --target=arm-unknown-linux-gnueabi --prefix=/opt/x-tools/arm-unknown-linux-gnueabi --with-sysroot=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --enable-languages=c --disable-multilib --with-pkgversion=crosstool-NG-1.9.3 --enable-__cxa_atexit --disable-libmudflap --disable-libgomp --disable-libssp --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-gmp=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpfr=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpc=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-ppl=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-cloog=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-libelf=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --enable-threads=posix --enable-target-optspace --with-local-prefix=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --disable-nls --enable-symvers=gnu --enable-c99 --enable-long-long
Thread model: posix
gcc version 4.5.1 (crosstool-NG-1.9.3)

asked Dec 20 '12 08:12 by MirkoBanchi



1 Answer

Floating point support on ARM Linux distributions is not trivial. Because of that, you should use a toolchain that matches your system (that is, both the operating system and the hardware) and use the right compile switches.

The first thing you need to understand is ARM's calling convention, which answers the question "how are arguments passed when you call a function?". ARM, being a RISC architecture, can only operate on registers. There are no instructions that manipulate memory directly. If you need to change a value in memory, you first load it into a register, modify it, and then store it back to memory.

When you call a function you may need to pass arguments to it. You could put the arguments on the stack (memory), but since ARM can only work with registers, the first thing your function would probably do is load them back into registers. To avoid this waste, the ARM calling convention passes arguments in registers. However, since ARM has a limited number of registers, the calling convention also dictates that only the first four registers (r0-r3) are used, for the first four arguments; any remaining arguments must still be passed on the stack.

The second thing is that early ARM cores didn't have any floating point support; floating point operations were implemented in software. (This is what is still supported via gcc's -mfloat-abi=soft.)

We can easily demonstrate what this means with the following snippet.

float pi2(float a) {
    return a * 3.14f;
}

Compiling this with -c -O3 -mfloat-abi=soft and disassembling the result with objdump gives us

00000000 <pi2>:
   0:   f24f 51c3   movw    r1, #62915  ; 0xf5c3
   4:   b508        push    {r3, lr}
   6:   f2c4 0148   movt    r1, #16456  ; 0x4048
   a:   f7ff fffe   bl  0 <__aeabi_fmul>
   e:   bd08        pop {r3, pc}

As you can see (well, it is not actually visible :) ), pi2 receives its parameter in r0, loads the constant 3.14 into r1, and calls __aeabi_fmul to multiply the two and return the result in r0. Because __aeabi_fmul follows the same calling convention, r0 is never touched explicitly. All our function does is populate r1 and delegate to __aeabi_fmul.

When hardware floating point support was added to ARM, it came (again because of the architecture's style) with its own set of registers (s0, s1, ...).

If we compile the same snippet with -c -O3 -mfloat-abi=softfp and dump it, we get

00000000 <pi2>:
   0:   eddf 7a04   vldr    s15, [pc, #16]  ; 14 <pi2+0x14>
   4:   ee07 0a10   vmov    s14, r0
   8:   ee27 7a27   vmul.f32    s14, s14, s15
   c:   ee17 0a10   vmov    r0, s14
  10:   4770        bx  lr
  12:   bf00        nop
  14:   4048f5c3    .word   0x4048f5c3

As you can see, the compiler no longer emits a call to __aeabi_fmul. Instead it moves the argument from r0 to s14, loads 3.14 into s15, and emits a vmul.f32 instruction. After the multiplication it moves the result from s14 back to r0, because the calling convention makes any caller of this function expect it there.

Now if you think of pi2 as a library provided to you by some third party, you can see that the soft and softfp implementations do the same thing for you, and you can use them interchangeably. If the system provides the library, you don't need to care whether your app runs on a system with hardware floating point support or not. This was quite good for keeping old software running on new hardware.

However, while it preserves compatibility, this approach introduces the overhead of moving values between the ARM registers and the FP registers. This obviously affects performance, and it is addressed by a new calling convention, called hard by gcc. This new convention states that floating point arguments can be passed in floating point registers, interleaved with other arguments in the normal registers, and that floating point values are returned in the floating point register s0.

Again, if we compile our snippet with -c -O3 -mfloat-abi=hard and dump it, we get

00000000 <pi2>:
   0:   eddf 7a02   vldr    s15, [pc, #8]   ; c <pi2+0xc>
   4:   ee20 0a27   vmul.f32    s0, s0, s15
   8:   4770        bx  lr
   a:   bf00        nop
   c:   4048f5c3    .word   0x4048f5c3

You can see that no registers are moved around. The argument to pi2 is passed in s0, the compiler emits code to load 3.14 into s15, and vmul.f32 s0, s0, s15 leaves the result we want in s0.

The big problem with this new convention is that, while it improves the code produced by the compiler, it completely breaks compatibility. You can't expect an application built with the hard convention to work with libraries built for soft/softfp, and an application built for soft/softfp won't work with libraries built for hard.

For more information on calling conventions you should check ARM's website.

answered Oct 17 '22 09:10 by auselen
