
Define a `static const` SIMD Variable within a `C` Function

I have a function of this form (from "Fastest Implementation of Exponential Function Using SSE"):

__m128 FastExpSse(__m128 x)
{
    static __m128  const a   = _mm_set1_ps(12102203.2f); // (1 << 23) / ln(2)
    static __m128i const b   = _mm_set1_epi32(127 * (1 << 23) - 486411);
    static __m128  const m87 = _mm_set1_ps(-87);
    // fast exponential function, x should be in [-87, 87]
    __m128 mask = _mm_cmpge_ps(x, m87);

    __m128i tmp = _mm_add_epi32(_mm_cvtps_epi32(_mm_mul_ps(a, x)), b);
    return _mm_and_ps(_mm_castsi128_ps(tmp), mask);
}

I want to make it C compatible.
However, when I use a C compiler, it doesn't accept the form static __m128i const b = _mm_set1_epi32(127 * (1 << 23) - 486411);

I also don't want the first 3 values to be recalculated on each function call.
One solution is to inline the function (but compilers sometimes refuse to do that).

Is there a C-style way to achieve this in case the function isn't inlined?

Thank You.

Royi asked Dec 24 '22 04:12


2 Answers

Remove static and const.

Also remove them from the C++ version. const is OK, but static is horrible: it introduces guard variables that are checked on every call, plus a very expensive initialization the first time.

__m128 a = _mm_set1_ps(12102203.2f); is not a function call, it's just a way to express a vector constant. No time can be saved by "doing it only once" - it normally happens zero times, with the constant vector being prepared in the data segment of the program and simply being loaded at runtime, without the junk around it that static introduces.

Check the asm to be sure. Without static, this is what happens (from godbolt):

FastExpSse(float __vector(4)):
        movaps  xmm1, XMMWORD PTR .LC0[rip]
        cmpleps xmm1, xmm0
        mulps   xmm0, XMMWORD PTR .LC1[rip]
        cvtps2dq        xmm0, xmm0
        paddd   xmm0, XMMWORD PTR .LC2[rip]
        andps   xmm0, xmm1
        ret
.LC0:
        .long   3266183168
        .long   3266183168
        .long   3266183168
        .long   3266183168
.LC1:
        .long   1262004795
        .long   1262004795
        .long   1262004795
        .long   1262004795
.LC2:
        .long   1064866805
        .long   1064866805
        .long   1064866805
        .long   1064866805
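
For reference, here is a minimal sketch of the function from the question with static dropped so that it compiles as C. It assumes an x86 target with SSE2 and a compiler that provides <emmintrin.h>; the constants are the ones from the question, and const is kept only as a compile-time check.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Fast approximate exp(x) for four floats; intended for x in [-87, 87]. */
__m128 FastExpSse(__m128 x)
{
    /* Automatic const variables: valid C, no static, no guard variable. */
    const __m128  a   = _mm_set1_ps(12102203.2f);   /* (1 << 23) / ln(2) */
    const __m128i b   = _mm_set1_epi32(127 * (1 << 23) - 486411);
    const __m128  m87 = _mm_set1_ps(-87.0f);

    /* Lanes with x < -87 get an all-zero mask, so they come out as 0.0f. */
    __m128 mask = _mm_cmpge_ps(x, m87);

    __m128i tmp = _mm_add_epi32(_mm_cvtps_epi32(_mm_mul_ps(a, x)), b);
    return _mm_and_ps(_mm_castsi128_ps(tmp), mask);
}
```

With optimization enabled, this should produce asm like the listing above, but it is worth checking the output of your own compiler.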
harold answered Dec 26 '22 10:12


_mm_set1_ps(-87); or any other _mm_set intrinsic is not a valid static initializer with current compilers, because it's not treated as a constant expression.

In C++, it compiles to runtime initialization of the static storage location (copying from a vector literal somewhere else). And if it's a static __m128 inside a function, there's a guard variable to protect it.

In C, it simply refuses to compile, because C doesn't support non-constant initializers / constructors for static storage. _mm_set is not equivalent to a braced initializer for the underlying GNU C native vector, which is what @benjarobin's answer uses.
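
To illustrate the braced-initializer form mentioned above, here is a small sketch. It relies on a GNU C extension, so it is assumed to work only with GCC and clang, where __m128 is a native vector of four floats. The names kM87 and clamp_low are hypothetical and exist only for this example.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_max_ps */

/* GNU C extension (GCC/clang): a braced list of constants is a constant
   expression, so this is a valid static initializer even in C. */
static const __m128 kM87 = { -87.0f, -87.0f, -87.0f, -87.0f };

/* Hypothetical helper: clamps each lane to be at least -87. */
__m128 clamp_low(__m128 x)
{
    return _mm_max_ps(x, kM87);
}
```

Note that GCC defines __m128i as a vector of two long long elements, so a braced initializer for the integer constant in the question would need two 64-bit values rather than four 32-bit ones. This form is also not portable to MSVC, which is one more reason to prefer the non-static _mm_set1 version.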


This is really dumb, and seems to be a missed optimization in all 4 mainstream x86 C++ compilers (gcc/clang/ICC/MSVC). Even if it somehow matters that each static const __m128 var have a distinct address, the compiler could achieve that by using initialized read-only storage instead of copying at runtime.

So it seems like constant propagation fails to go all the way to turning _mm_set into a constant initializer even when optimization is enabled.


Never use static const __m128 var = _mm_set... even in C++; it's inefficient.

Inside a function is even worse, but global scope is still bad.

Instead, avoid static. You can still use const to stop yourself from accidentally assigning something else, and to tell human readers that it's a constant. Without static, it has no effect on where/how your variable is stored. const on automatic storage just does compile-time checking that you don't modify the object.

const __m128 var = _mm_set1_ps(-87);    // not static

Compilers are good at this, and will optimize the case where multiple functions use the same vector constant, the same way they de-duplicate string literals and put them in read-only memory.

Defining constants this way inside small helper functions is fine: compilers will hoist the constant-setup out of a loop after inlining the function.
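
A minimal sketch of that pattern follows, assuming C99 and SSE. The helper add_bias, the function add_bias_array, and the value 0.5f are hypothetical and only illustrate the shape of the code.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_add_ps, unaligned load/store */

/* Hypothetical helper: the constant is a plain local, not static. */
static inline __m128 add_bias(__m128 v)
{
    const __m128 bias = _mm_set1_ps(0.5f);
    return _mm_add_ps(v, bias);
}

/* Adds the bias to n floats in place; n is assumed to be a multiple of 4. */
void add_bias_array(float *data, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        _mm_storeu_ps(data + i, add_bias(_mm_loadu_ps(data + i)));
    }
}
```

Once add_bias is inlined, an optimizing compiler would typically load the constant into a register once before the loop rather than on every iteration. Inspecting the generated asm is the way to confirm it for a given compiler and flags.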

It also lets compilers avoid storing the full 16 bytes: they can keep a single 4-byte element in memory and load it with something like vbroadcastss xmm0, dword [mem].

Peter Cordes answered Dec 26 '22 12:12