I only noticed today that Java 9 added Math.fma(a, b, c), which computes a * b + c (for double and float values). The Javadoc says:
Returns the fused multiply add of the three arguments; that is, returns the exact product of the first two arguments summed with the third argument and then rounded once to the nearest float. The rounding is done using the round to nearest even rounding mode. In contrast, if a * b + c is evaluated as a regular floating-point expression, two rounding errors are involved, the first for the multiply operation, the second for the addition operation.
So it looks like it improves accuracy by doing one rounding instead of two. Is that correct? Is that conditional on CPU capabilities, or can we count on it always?
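Here is a little experiment I put together to convince myself (the values are contrived, chosen only so the single rounding becomes visible):

double a = 1.0 + Math.ulp(1.0);          // 1 + 2^-52
double p = a * a;                        // plain product: rounded, the tiny 2^-104 term is lost
double twoRoundings = a * a - p;         // multiply rounds first, so this is exactly 0.0
double oneRounding = Math.fma(a, a, -p); // exact a*a minus p, rounded once: 2^-104
System.out.println(twoRoundings);        // prints 0.0
System.out.println(oneRounding);         // prints 4.930380657631324E-32

So Math.fma recovers the rounding error of the product that the plain expression throws away.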
I'm guessing it might be implemented using special CPU instructions. Is that the case? And if so, can we expect performance benefits as well? I'm interested to read about actual benefits with current platforms/CPUs, but also about hypothetical future benefits.
Edit (trying to make it a bit less broad): I'm not looking for very detailed answers: yes/no on the few items to correct/confirm my understanding, plus a few pointers, would be enough for me to mark an answer as accepted. I'm really interested in both the accuracy and the performance aspects, and I think they go together...
Yes, FMA improves accuracy for the very reason you said.
The JVM uses FMA CPU instructions if they are available. However, FMA is not available everywhere; for example, Intel x86 CPUs before Haswell don't have it, which means that most Intel CPUs currently in use don't have FMA.
If a hardware FMA is not available, Java falls back to a very slow solution: it performs the FMA using java.math.BigDecimal (that is the current implementation; it may change in the future, but I bet it will always be slow compared to a CPU FMA instruction).
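To illustrate the idea, such a fallback could look roughly like this (a sketch of the approach only, not the actual JDK source, and it ignores special cases like NaN and infinities that the real fallback has to handle):

import java.math.BigDecimal;

static double fmaSketch(double a, double b, double c) {
    // new BigDecimal(double) is exact, so the product and the sum
    // below introduce no rounding at all...
    BigDecimal exact = new BigDecimal(a).multiply(new BigDecimal(b))
                                        .add(new BigDecimal(c));
    // ...and the single rounding happens here, converting back to double.
    return exact.doubleValue();
}

Exact BigDecimal arithmetic on long intermediate values like this is orders of magnitude slower than one CPU instruction, which is where the "very slow" comes from.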
I'm on a Mac with a 5th-generation i7. When I run:
sysctl -n machdep.cpu.brand_string
I can see that my CPU is Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz and that it supports FMA; you can check that with:
sysctl -a | grep machdep.cpu | grep FMA
and as a result I get a line where FMA is present. Now let's see if the JVM actually uses it.
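As a side note, you can also ask HotSpot itself whether it enabled its FMA support; as far as I know the relevant flag is called UseFMA:

java -XX:+PrintFlagsFinal -version | grep UseFMA

If it prints true, the JVM believes the CPU has a usable FMA instruction.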
Those methods (one for double and one for float) are annotated with @HotSpotIntrinsicCandidate, which means the JIT can replace them with actual native CPU instructions, if such are available. For that to happen the method has to be hot enough, i.e. called enough times, and the exact threshold is JVM dependent.
I'm trying to trigger that with:
package org.so;

public class FMATest {

    public static void main(String[] args) {
        double result = 0;
        // call mine(...) often enough for the JIT to consider it hot
        for (int i = 0; i < 50_000; ++i) {
            result = result + mine(i);
        }
        System.out.println(result);
    }

    // with int arguments, overload resolution picks the float version of Math.fma
    private static float mine(int x) {
        return Math.fma(x, x, x);
    }
}
And I run that with:
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintInlining \
     -XX:+PrintIntrinsics \
     -XX:CICompilerCount=2 \
     -XX:+PrintCompilation \
     org.so.FMATest
There will be a bunch of lines there, but one of them is:
@ 6 java.lang.Math::fma (12 bytes) (intrinsic)
which means that the JVM has indeed replaced Math.fma with its intrinsic, backed by the CPU's FMA instruction.
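To actually measure the performance side, a micro-benchmark is the safer tool. A minimal JMH sketch (it assumes the org.openjdk.jmh dependency is on the classpath; the class and field names are mine):

package org.so;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class FmaBenchmark {

    // non-constant fields so the JIT cannot fold the expressions away
    double a = 1.0000001;
    double b = 0.9999999;
    double c = 3.14159;

    @Benchmark
    public double fma() {
        return Math.fma(a, b, c);
    }

    @Benchmark
    public double mulAdd() {
        return a * b + c;
    }
}

On FMA-capable hardware I'd expect the two scores to be in the same ballpark, since one fused instruction replaces a multiply plus an add; on hardware without FMA the fma variant should be drastically slower because of the BigDecimal fallback.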