
How do I ensure lrint is inlined in gcc?

Tags: c++, c, gcc, c99

After reading around the subject, I have found overwhelming evidence from numerous sources that using standard C or C++ casts to convert floating-point numbers to integers on Intel is very slow. To meet the ANSI/ISO specification, Intel CPUs need to execute a large number of instructions, including those needed to switch the rounding mode of the FPU hardware.

There are a number of workarounds described in various documents, but the cleanest and most portable seems to be the lrint() call added in the C99 and C++0x standards. Many documents say that a compiler should expand these functions inline when optimization is enabled, leading to code that is faster than either a conventional cast or a function call.

I even found references to gcc feature-tracking bugs requesting this inline expansion in the gcc optimizer, but in my own performance tests I have been unable to get it to work. All my attempts show lrint performing much slower than a simple C or C++ style cast. Examining the compiler's assembly output and disassembling the compiled objects always shows an explicit call to an external lrint() or lrintf() function.

The gcc versions I have been working with are 4.4.3 and 4.6.1, and I have tried a number of flag combinations on 32-bit and 64-bit x86 targets, including options to explicitly enable SSE.

How do I get gcc to expand lrint inline and give me fast conversions?

Al Riddoch asked Feb 23 '12 14:02



1 Answer

The lrint() function may raise domain and range errors. One way a libc can report such errors is by setting errno (see C99/C11 section 7.12.1). The overhead of this error checking can be quite significant, and in this particular case it seems to be enough for the optimizer to decide against inlining.

The gcc flag -fno-math-errno (which is part of -ffast-math) will disable these checks. It might be a good idea to look into -ffast-math if you do not rely on standards-compliant handling of floating-point semantics, in particular NaNs and infinities.

Christoph answered Oct 05 '22 13:10
