Force Clang to "perform math early" on constant values

This is related to How to force const propagation through an inline function? Clang has an integrated assembler and does not use the system's assembler (which is often GNU AS (GAS)). Compilers other than Clang performed the math early, and everything "just worked".

I say "early" because @n.m. objected to describing it as "math performed by the preprocessor." But the idea is the value is known at compile time, and it should be evaluated early, like when the preprocessor evaluates a #if (X % 32 == 0).

Below, Clang 3.6 is complaining about violating a constraint. It appears the constant is not being propagated throughout:

$ export CXX=/usr/local/bin/clang++
$ $CXX --version
clang version 3.6.0 (tags/RELEASE_360/final)
Target: x86_64-apple-darwin12.6.0
...
$ make
/usr/local/bin/clang++ -DNDEBUG -g2 -O3 -Wall -fPIC -arch i386 -arch x86_64 -pipe -Wno-tautological-compare -c integer.cpp
In file included from integer.cpp:8:
In file included from ./integer.h:7:
In file included from ./secblock.h:7:
./misc.h:941:44: error: constraint 'I' expects an integer constant expression
        __asm__ ("rolb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
                                                  ^~~~~~~~~~~~~~~~~~~~
./misc.h:951:44: error: constraint 'I' expects an integer constant expression
...

The functions above are inlined template specializations:

template<> inline byte rotrFixed<byte>(byte x, unsigned int y)
{
    // The I constraint ensures we use the immediate-8 variant of the
    // shift amount y. However, y must be in [0, 31] inclusive. We
    // rely on the preprocessor to propagate the constant and perform
    // the modular reduction so the assembler generates the instruction.
    __asm__ ("rorb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
    return x;
}

They are being invoked with a const value, so the rotate amount is known at compile time. A typical caller might look like:

unsigned int x1 =  rotrFixed<byte>(1, 4);
unsigned int x2 =  rotrFixed<byte>(1, 32);

None of these [questionable] tricks would be required if GCC or Clang provided an intrinsic to perform the rotate in near constant time. I'd even settle for "perform the rotate" since they don't even have that.

What is the trick needed to get Clang to resume performing the preprocessing of the const value?


Astute readers will recognize rotrFixed<byte>(1, 32) could be undefined behavior if using a traditional C/C++ rotate. So we drop into assembly to avoid the C/C++ limitations and enjoy the 1 instruction speedup.

Curious readers may wonder why we would do this. The cryptographers call out the specs, and sometimes those specs are not sympathetic to the underlying hardware or standards bodies. Rather than changing the cryptographer's specification, we attempt to provide it verbatim to make audits easier.


A bug has been opened for this issue: LLVM Bug 24226 - Constant not propagated into inline assembly, results in "constraint 'I' expects an integer constant expression".

I don't know what guarantees Clang makes, but I know the compiler and integrated assembler claim to be compatible with GCC and GNU's assembler. And GCC and GAS provide the propagation of the constant value.

Asked by jww, Jul 23 '15 03:07


1 Answer

Since you seem to be out of luck trying to force a constant evaluation due to design decisions, the ror r/m8, cl form might be a good compromise:

__asm__ ("rorb %b1, %b0" : "+q,m" (x) : "c,c" (y) : "cc");

The multiple alternative constraint syntax is to 'promote' register use over memory use due to an issue with clang, covered here. I don't know if this issue has been resolved in later versions. gcc tends to be better at constraint matching and avoiding spills.

This does require loading (y) into the rcx/ecx/cl register, but the compiler can probably hide it behind another latency. Furthermore, there are no range issues for (y). rorb effectively uses (%cl % 8). The "cc" clobber isn't required.
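For context, here is a minimal sketch of how the cl form might be wrapped in a complete function. The name rotr8_cl and the use of std::uint8_t in place of the question's byte typedef are assumptions for illustration, as is the preprocessor guard with a plain C++ fallback for non-x86 targets; none of this comes from the original misc.h.

```cpp
#include <cstdint>

// Hypothetical wrapper (not from the original code): rotate an 8-bit value
// right by y, using the "ror r/m8, cl" form on x86 with GCC-style inline asm.
inline std::uint8_t rotr8_cl(std::uint8_t x, unsigned int y)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    // "c" places y in rcx/ecx; %b1 prints its low byte, cl. The hardware
    // reduces the count, so y needs no range check here.
    __asm__ ("rorb %b1, %b0" : "+q,m" (x) : "c,c" (y) : "cc");
    return x;
#else
    // Assumed fallback for other targets: mask both shift counts so that
    // neither shift can reach the operand width.
    y &= 7;
    return static_cast<std::uint8_t>((x >> y) | (x << ((8 - y) & 7)));
#endif
}
```

With this sketch, rotr8_cl(1, 4) yields 0x10 and rotr8_cl(1, 32) yields 1, matching the two callers shown in the question.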


Both gcc and clang provide __builtin_constant_p to test whether an expression is a compile-time constant:

if (__builtin_constant_p(y))
    __asm__("rorb %1, %b0" : "+q,m" (x) : "N,N" ((unsigned char) y) : "cc");
else
    ... non-constant (y) ...

or as alluded to in the mailing list:

if (__builtin_constant_p(y))
{
    if ((y &= 0x7) != 0)
        x = (x >> y) | (x << (8 - y)); /* gcc generates rotate. */
}
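The shift-and-or idiom can also be written as a standalone function with no inline assembly at all. The following is only a sketch: the name rotr8_portable is hypothetical, and masking both shift counts is one way (an assumption here, not something from the mailing list thread) to keep a count such as 32 from causing undefined behavior. Whether a given compiler turns this into a single rotate instruction depends on the compiler and version.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original code): rotate an 8-bit value
// right by y without inline assembly. x is promoted to int before shifting,
// and both counts are masked to [0, 7], so every shift is well defined.
constexpr std::uint8_t rotr8_portable(std::uint8_t x, unsigned int y)
{
    return static_cast<std::uint8_t>((x >> (y & 7)) | (x << ((8 - (y & 7)) & 7)));
}

// Because the function is constexpr, constant arguments are folded at
// compile time, which is the "early math" the question asks about.
static_assert(rotr8_portable(1, 4) == 0x10, "rotate right by 4");
static_assert(rotr8_portable(1, 32) == 1, "a count of 32 reduces to 0");
```

The static_assert lines double as a check that the reduction modulo 8 happens during compilation when the arguments are constants.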
Answered by Brett Hale, Oct 29 '22 03:10