In a lot of situations I not only need the sine, but also the cosine of the same parameter.
For C, there is the sincos
function in the common unix m
math library. And actually, at least on i386, this should be a single assembly instruction, fsincos
.
sincos, sincosf, sincosl - calculate sin and cos simultaneously
I guess these benefits exist because there is an obvious overlap in computing sine and cosine: sin(x)^2 + cos(x)^2 = 1
. But AFAIK it does not pay off to try to shortcut this as cos = Math.sqrt(1 - sin*sin)
, as the sqrt
function comes at a similar cost.
Is there any way to reap the same benefits in Java? I guess I'm going to pay a price for a double[]
then; which maybe makes all the efforts moot because of the added garbage collection.
Or is the Hotspot compiler smart enough to recognize that I need both, and will compile this to a sincos
command? Can I test whether it recognizes it, and can I help it recognizing this, e.g. by making sure the Math.sin
and Math.cos
commands are directly successive in my code? This would actually make the most sense from a Java language point of view: having the comiler optimize this to use the fsincos
assembly call.
Collected from some assembler documentation:
Variations 8087 287 387 486 Pentium
fsin - - 122-771 257-354 16-126 NP
fsincos - - 194-809 292-365 17-137 NP
Additional cycles required if operand > pi/4 (~3.141/4 = ~.785)
sqrt 180-186 180-186 122-129 83-87 70 NP
fsincos
should need an extra pop, but that should come at 1 clock cycle. Assuming that the CPU also does not optimize this, sincos
should be almost twice as fast as calling sin
twice (second time to compute cosine; so i figure it will need to do an addition). sqrt
could be faster in some situations, but sine can be faster.
Update: I've done some experiments in C, but they are inconclusive. Interestingly enough, sincos
seems to be even slightly faster than sin
(without cos
), and the GCC compiler will use fsincos
when you compute both sin
and cos
- so it does what I'd like Hotspot to do (or does Hotspot, too?). I could not yet prevent the compiler from outsmarting me by using fsincos
except by not using cos
. It will then fall back to a C sin
, not fsin
.
I have performed some microbenchmarks with caliper. 10000000 iterations over a (precomputed) array of random numbers in the range -4*pi .. 4*pi. I tried my best to get the fastest JNI solution I could come up going - it's a bit hard to predict whether you will actually get fsincos
or some emulated sincos
. Reported numbers are the best of 10 caliper trials (which in turn consist of 3-10 trials, the average of which is reported). So roughly it's 30-100 runs of the inner loop each.
I've benchmarked several variants:
Math.sin
only (reference)Math.cos
only (reference)Math.sin
+ Math.cos
sincos
via JNIMath.sin
+ cos via Math.sqrt( (1+sin) * (1-sin) )
+ sign reconstructionMath.cos
+ sin via Math.sqrt( (1+cos) * (1-cos) )
+ sign reconstruction(1+sin)*(1-sin)=1-sin*sin
mathematically, but if sin is close to 1 it should be more precise? Runtime difference is minimal, you save one addition.
Sign reconstruction via x %= TWOPI; if (x<0) x+=TWOPI;
and then checking the quadrant. If you have an idea how to do this with less CPU, I'd be happy to hear.
Numerical loss via sqrt
seems to be okay, at least for common angles. On the range of 1e-10 from rough experiments.
Sin 1,30 ==============
Cos 1,29 ==============
Sin, Cos 2,52 ============================
JNI sincos 1,77 ===================
SinSqrt 1,49 ================
CosSqrt 1,51 ================
The sqrt(1-s*s)
vs. sqrt((1+s)*(1-s))
makes about 0,01 difference. As you can see, the sqrt
based approach wins hands down against any of the others (as we can't currently access sincos
in pure Java). The JNI sincos
is better than computing sin
and cos
, but the sqrt
approach is still faster. cos
itself seems to be consistently a tick (0,01) better than sin
, but the case distinction to reconstruct the sign has an extra >
test. I don't think my results support that either sin+sqrt
or cos+sqrt
is clearly preferrable, but they do save around 40% of the time compared to sin
then cos
.
If we would extend Java to have an intrinsic optimized sincos, then this would likely be even better. IMHO it is a common use case, e.g. in graphics. When used in AWT, Batik etc. numerous applications could benefit from it.
If I'd run this again, I would also add JNI sin
and a noop
to estimate the cost of JNI. Maybe also benchmark the sqrt
trick via JNI. Just to make sure that we actually do want an intrinsic sincos
in the long run.
Most sin and cos calculations are calls directly to the hardware. There isn't much of a faster way to calculate it than that. Specifically, in the range +- pi/4, the rates are extremely fast. If you use hardware acceleration in general, and try to limit the values to those specified, you should be fine. Source.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With