 

A faster but less accurate fsin for Intel asm?

Since the fsin instruction for computing sin(x) on x86 dates back to the Pentium era, and apparently it doesn't even use SSE registers, I was wondering if there is a newer and better set of instructions for computing trigonometric functions.

I'm used to coding in C++ and doing some asm optimizations, so anything that fits in a pipeline going from C++ to C to asm will do for me.

Thanks.


I'm on 64-bit Linux for now, with gcc and clang (even though clang doesn't really offer any FPU-related optimizations AFAIK).

EDIT

  • I have already implemented a sin function; it's usually 2 times faster than std::sin, even with SSE on.
  • My function is never slower than fsin, even though fsin is usually more accurate; but considering that fsin never outperforms my sin implementation, I'll keep my sin for now. Also, my sin is totally portable, whereas fsin is x86-only.
  • I need this for real-time computation, so I'll trade precision for speed; I think I'll be fine with 4-5 decimals of precision.
  • No to a table-based approach: I'm not using it, it screws up the cache and makes everything slower. No algorithms based on memory access or lookup tables, please.
asked May 23 '14 by user2485710


1 Answer

If you need an approximation of sine optimized for absolute accuracy over -π … π, use:

    x * (1 + x * x * (-0.1661251158026961831813227851437597220432
        + x * x * (8.03943560729777481878247432892823524338e-3
        + x * x * -1.4941402004593877749503989396238510717e-4)))

It can be implemented with:

    /* Same polynomial in single precision, with x*x factored out */
    float xx = x * x;
    float s = x + (x * xx) * (-0.16612511580269618f + xx * (8.0394356072977748e-3f + xx * -1.49414020045938777495e-4f));

It can perhaps be optimized further depending on the characteristics of your target architecture. Also, not noted in the linked blog post: if you are implementing this in assembly, do use the FMADD instruction. If implementing in C or C++ with, say, the C99 standard function fmaf(), make sure that an actual FMADD is generated. The emulated version is much more expensive than a multiplication followed by an addition, because what fmaf() computes is not exactly equivalent to a multiplication followed by an addition (so it would be incorrect to implement it that way).
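For illustration, here is a minimal self-contained sketch of the same polynomial arranged so that each step can be contracted into an FMA (my own arrangement, not from the blog post; the function name is made up). On x86-64 with gcc or clang, compile with something like -O2 -mfma and check the generated assembly for vfmadd instructions:

    #include <math.h>   /* fmaf(); link with -lm if your toolchain needs it */

    /* Evaluate x + x^3 * (c0 + x^2 * (c1 + x^2 * c2)) in Horner form,
       using fmaf() so the compiler can emit FMADD instructions. */
    static inline float sin_approx_fma(float x)
    {
        float xx = x * x;
        float p  = -1.49414020045938777495e-4f;      /* c2 */
        p = fmaf(p, xx, 8.0394356072977748e-3f);     /* p = p*xx + c1 */
        p = fmaf(p, xx, -0.16612511580269618f);      /* p = p*xx + c0 */
        return fmaf(p, x * xx, x);                   /* x + p*x^3 */
    }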

The difference between sin(x) and the above polynomial between -π and π graphs so:

[Graph: absolute difference between the polynomial and sin(x) over -π … π]

The polynomial is optimized to reduce the difference between it and sin(x) between -π and π, not just something that someone thought was a good idea.
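If you want to check the error on your own machine, a small test harness along these lines (mine, not part of the original answer) samples the interval and reports the largest absolute deviation from the C library's sin():

    #include <math.h>
    #include <stdio.h>

    /* The single-precision implementation from above. */
    static float sin_poly(float x)
    {
        float xx = x * x;
        return x + (x * xx) * (-0.16612511580269618f
                   + xx * (8.0394356072977748e-3f
                   + xx * -1.49414020045938777495e-4f));
    }

    int main(void)
    {
        const double pi = 3.14159265358979323846;
        double max_err = 0.0;
        for (int i = 0; i <= 1000000; ++i) {
            double x = -pi + 2.0 * pi * i / 1000000.0;
            double err = fabs((double)sin_poly((float)x) - sin(x));
            if (err > max_err)
                max_err = err;
        }
        printf("max abs error on [-pi, pi]: %g\n", max_err);
        return 0;
    }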

If you only need the [-1 … 1] definition interval, then the polynomial can be made more accurate on that interval by ignoring the rest. Running the optimization algorithm again for this definition interval produces:

    x * (1 + x * x * (-1.666659904470566774477504230733785739156e-1
        + x * x * (8.329797530524482484880881032235130379746e-3
        + x * x * (-1.928379009208489415662312713847811393721e-4))))

The absolute error graph:

[Graph: absolute error of the polynomial over the [-1 … 1] interval]
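Dropped into the same snippet style as before, that reduced-interval polynomial would look like this (again only a sketch; the function name is mine, and the result is only meaningful for |x| ≤ 1):

    /* Coefficients re-optimized for [-1, 1]; do not use outside that range. */
    static inline float sin_approx_unit(float x)
    {
        float xx = x * x;
        return x + (x * xx) * (-1.666659904470566774e-1f
                   + xx * (8.329797530524482485e-3f
                   + xx * -1.928379009208489416e-4f));
    }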

If that is too accurate for you, it is possible to optimize a polynomial of lower degree for the same objective. Then the absolute error will be larger but you will save a multiplication or two.

answered by Pascal Cuoq