Since the fsin instruction for computing sin(x) on x86 dates back to the Pentium era, and apparently it doesn't even use SSE registers, I was wondering whether there is a newer and better set of instructions for computing trigonometric functions.
I'm used to coding in C++ and doing some asm optimizations, so anything that fits in a pipeline going from C++, to C, to asm will do for me.
Thanks.
I'm on 64-bit Linux for now, with gcc and clang (even though clang doesn't really offer any FPU-related optimizations, AFAIK).
EDIT: I have written my own sin function; it's usually 2 times faster than std::sin, even with SSE on. fsin is usually more accurate, but considering that fsin never outperforms my sin implementation, I'll keep my sin for now. Also, my sin is totally portable, whereas fsin is x86 only.

If you need an approximation of sine optimized for absolute accuracy over -π … π, use:
x * (1 + x * x * (-0.1661251158026961831813227851437597220432 + x * x * (8.03943560729777481878247432892823524338e-3 + x * x * -1.4941402004593877749503989396238510717e-4)))
It can be implemented with:
float xx = x * x;
float s = x + (x * xx) * (-0.16612511580269618f + xx * (8.0394356072977748e-3f + xx * -1.49414020045938777495e-4f));
And it can perhaps be optimized further depending on the characteristics of your target architecture. Also, not noted in the linked blog post: if you are implementing this in assembly, do use the FMADD instruction. If implementing in C or C++ and you use, say, the fmaf() C99 standard function, make sure that an FMADD instruction is actually generated. The emulated version is much more expensive than a multiplication and an addition, because what fmaf() does is not exactly equivalent to a multiplication followed by an addition (so it would be incorrect to implement it that way).
The difference between sin(x) and the above polynomial over [-π, π] graphs like so:
The polynomial is optimized to reduce the difference between it and sin(x) between -π and π, not just something that someone thought was a good idea.
If you only need the [-1 … 1] definition interval, then the polynomial can be made more accurate on that interval by ignoring the rest. Running the optimization algorithm again for this definition interval produces:
x * (1 + x * x * (-1.666659904470566774477504230733785739156e-1 + x * x * (8.329797530524482484880881032235130379746e-3 + x * x * (-1.928379009208489415662312713847811393721e-4))))
The absolute error graph:
If that is too accurate for you, it is possible to optimize a polynomial of lower degree for the same objective. Then the absolute error will be larger but you will save a multiplication or two.
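In the same style as the earlier snippet, the [-1 … 1] polynomial could be coded as follows (a sketch; the function name is mine, and the coefficients are truncated to float precision):

```c
#include <math.h>

/* Sketch: sine approximation optimized for x in [-1, 1], using the
   coefficients given above. */
static float my_sin_unit(float x) {
    float xx = x * x;
    return x + (x * xx) * (-1.666659904470566774e-1f
            + xx * (8.329797530524482485e-3f
            + xx * -1.928379009208489416e-4f));
}
```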