Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using "sincos" in Java

In a lot of situations I not only need the sine, but also the cosine of the same parameter.

For C, there is the sincos function in the common unix m math library. And actually, at least on i386, this should be a single assembly instruction, fsincos.

sincos, sincosf, sincosl - calculate sin and cos simultaneously

I guess these benefits exist because there is an obvious overlap in computing sine and cosine: sin(x)^2 + cos(x)^2 = 1. But AFAIK it does not pay off to try to shortcut this as cos = Math.sqrt(1 - sin*sin), as the sqrt function comes at a similar cost.

Is there any way to reap the same benefits in Java? I guess I'm going to pay a price for a double[] then; which maybe makes all the efforts moot because of the added garbage collection.

Or is the Hotspot compiler smart enough to recognize that I need both, and will compile this to a sincos command? Can I test whether it recognizes it, and can I help it recognizing this, e.g. by making sure the Math.sin and Math.cos commands are directly successive in my code? This would actually make the most sense from a Java language point of view: having the comiler optimize this to use the fsincos assembly call.

Collected from some assembler documentation:

Variations    8087         287        387      486     Pentium
fsin           -            -       122-771  257-354   16-126  NP
fsincos        -            -       194-809  292-365   17-137  NP
 Additional cycles required if operand > pi/4 (~3.141/4 = ~.785)
sqrt        180-186      180-186    122-129   83-87    70      NP

fsincos should need an extra pop, but that should come at 1 clock cycle. Assuming that the CPU also does not optimize this, sincos should be almost twice as fast as calling sin twice (second time to compute cosine; so i figure it will need to do an addition). sqrt could be faster in some situations, but sine can be faster.

Update: I've done some experiments in C, but they are inconclusive. Interestingly enough, sincos seems to be even slightly faster than sin (without cos), and the GCC compiler will use fsincos when you compute both sin and cos - so it does what I'd like Hotspot to do (or does Hotspot, too?). I could not yet prevent the compiler from outsmarting me by using fsincos except by not using cos. It will then fall back to a C sin, not fsin.

like image 739
Has QUIT--Anony-Mousse Avatar asked Nov 19 '12 19:11

Has QUIT--Anony-Mousse


2 Answers

I have performed some microbenchmarks with caliper. 10000000 iterations over a (precomputed) array of random numbers in the range -4*pi .. 4*pi. I tried my best to get the fastest JNI solution I could come up going - it's a bit hard to predict whether you will actually get fsincos or some emulated sincos. Reported numbers are the best of 10 caliper trials (which in turn consist of 3-10 trials, the average of which is reported). So roughly it's 30-100 runs of the inner loop each.

I've benchmarked several variants:

  • Math.sin only (reference)
  • Math.cos only (reference)
  • Math.sin + Math.cos
  • sincos via JNI
  • Math.sin + cos via Math.sqrt( (1+sin) * (1-sin) ) + sign reconstruction
  • Math.cos + sin via Math.sqrt( (1+cos) * (1-cos) ) + sign reconstruction

(1+sin)*(1-sin)=1-sin*sin mathematically, but if sin is close to 1 it should be more precise? Runtime difference is minimal, you save one addition.

Sign reconstruction via x %= TWOPI; if (x<0) x+=TWOPI; and then checking the quadrant. If you have an idea how to do this with less CPU, I'd be happy to hear.

Numerical loss via sqrt seems to be okay, at least for common angles. On the range of 1e-10 from rough experiments.

Sin         1,30 ==============
Cos         1,29 ==============
Sin, Cos    2,52 ============================
JNI sincos  1,77 ===================
SinSqrt     1,49 ================
CosSqrt     1,51 ================

The sqrt(1-s*s) vs. sqrt((1+s)*(1-s)) makes about 0,01 difference. As you can see, the sqrt based approach wins hands down against any of the others (as we can't currently access sincos in pure Java). The JNI sincos is better than computing sin and cos, but the sqrt approach is still faster. cos itself seems to be consistently a tick (0,01) better than sin, but the case distinction to reconstruct the sign has an extra > test. I don't think my results support that either sin+sqrt or cos+sqrt is clearly preferrable, but they do save around 40% of the time compared to sin then cos.

If we would extend Java to have an intrinsic optimized sincos, then this would likely be even better. IMHO it is a common use case, e.g. in graphics. When used in AWT, Batik etc. numerous applications could benefit from it.

If I'd run this again, I would also add JNI sin and a noop to estimate the cost of JNI. Maybe also benchmark the sqrt trick via JNI. Just to make sure that we actually do want an intrinsic sincos in the long run.

like image 171
Erich Schubert Avatar answered Oct 13 '22 11:10

Erich Schubert


Most sin and cos calculations are calls directly to the hardware. There isn't much of a faster way to calculate it than that. Specifically, in the range +- pi/4, the rates are extremely fast. If you use hardware acceleration in general, and try to limit the values to those specified, you should be fine. Source.

like image 39
PearsonArtPhoto Avatar answered Oct 13 '22 12:10

PearsonArtPhoto