I only noticed today that Java 9 added Math.fma(a, b, c), which computes a * b + c (for double and float values). The Javadoc says:
Returns the fused multiply add of the three arguments; that is, returns the exact product of the first two arguments summed with the third argument and then rounded once to the nearest float. The rounding is done using the round to nearest even rounding mode. In contrast, if a * b + c is evaluated as a regular floating-point expression, two rounding errors are involved, the first for the multiply operation, the second for the addition operation.
So it looks like it improves accuracy by doing one rounding instead of two. Is that correct? Is that conditional on CPU capabilities, or can we count on it always?
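Here is a little experiment I put together to convince myself (the values are contrived, chosen only so the single rounding becomes visible):

double a = 1.0 + Math.ulp(1.0);          // 1 + 2^-52
double p = a * a;                        // plain product: rounded, the tiny 2^-104 term is lost
double twoRoundings = a * a - p;         // multiply rounds first, so this is exactly 0.0
double oneRounding = Math.fma(a, a, -p); // exact a*a minus p, rounded once: 2^-104
System.out.println(twoRoundings);        // prints 0.0
System.out.println(oneRounding);         // prints 4.930380657631324E-32

So Math.fma recovers the rounding error of the product that the plain expression throws away.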
I'm guessing it might be implemented using special CPU instructions. Is that the case? And if so, can we expect performance benefits as well? I'm interested to read about actual benefits with current platforms/CPUs, but also about hypothetical future benefits.
Edit (trying to make it a bit less broad): I'm not looking for very detailed answers: yes/no on the few items to correct/confirm my understanding, plus a few pointers, would be enough for me to mark an answer as accepted. I'm really interested in both the accuracy and the performance aspects, and I think they go together...
Yes, FMA improves accuracy for the very reason you said.
The JVM uses FMA CPU instructions if they are available. However, FMA is not available everywhere; for example, Intel x86 CPUs before Haswell don't have it, which means that most Intel CPUs currently in use don't have FMA.
If a hardware FMA is not available, Java falls back to a very slow solution: it performs the FMA using java.math.BigDecimal (that is the current implementation; it may change in the future, but I bet it will always be slow compared to a CPU FMA instruction).
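To illustrate the idea, such a fallback could look roughly like this (a sketch of the approach only, not the actual JDK source, and it ignores special cases like NaN and infinities that the real fallback has to handle):

import java.math.BigDecimal;

static double fmaSketch(double a, double b, double c) {
    // new BigDecimal(double) is exact, so the product and the sum
    // below introduce no rounding at all...
    BigDecimal exact = new BigDecimal(a).multiply(new BigDecimal(b))
                                        .add(new BigDecimal(c));
    // ...and the single rounding happens here, converting back to double.
    return exact.doubleValue();
}

Exact BigDecimal arithmetic on long intermediate values like this is orders of magnitude slower than one CPU instruction, which is where the "very slow" comes from.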
I'm on a Mac with a 5th-generation i7. When I run:
sysctl -n machdep.cpu.brand_string
I can see that my CPU is Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz and that it supports FMA; you can check that with:
sysctl -a | grep machdep.cpu | grep FMA
and as a result I get a line where FMA is present. Now let's see if the JVM actually uses it.
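As a side note, you can also ask HotSpot itself whether it enabled its FMA support; as far as I know the relevant flag is called UseFMA:

java -XX:+PrintFlagsFinal -version | grep UseFMA

If it prints true, the JVM believes the CPU has a usable FMA instruction.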
Those methods (one for double and one for float) are annotated with @HotSpotIntrinsicCandidate, which means the JIT can replace them with actual native CPU instructions, if such are available. For that to happen the method has to be hot enough, i.e. called enough times, and the exact threshold is JVM dependent.
I'm trying to trigger that with:
package org.so;

public class FMATest {

    public static void main(String[] args) {
        double result = 0;
        // call mine(...) often enough for the JIT to consider it hot
        for (int i = 0; i < 50_000; ++i) {
            result = result + mine(i);
        }
        System.out.println(result);
    }

    // with int arguments, overload resolution picks the float version of Math.fma
    private static float mine(int x) {
        return Math.fma(x, x, x);
    }
}
And I run that with:
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintInlining \
     -XX:+PrintIntrinsics \
     -XX:CICompilerCount=2 \
     -XX:+PrintCompilation \
     org.so.FMATest
There will be a bunch of lines there, but one of them is:
@ 6 java.lang.Math::fma (12 bytes) (intrinsic)
which means that the JVM has indeed replaced Math.fma with its intrinsic, backed by the CPU's FMA instruction.
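To actually measure the performance side, a micro-benchmark is the safer tool. A minimal JMH sketch (it assumes the org.openjdk.jmh dependency is on the classpath; the class and field names are mine):

package org.so;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class FmaBenchmark {

    // non-constant fields so the JIT cannot fold the expressions away
    double a = 1.0000001;
    double b = 0.9999999;
    double c = 3.14159;

    @Benchmark
    public double fma() {
        return Math.fma(a, b, c);
    }

    @Benchmark
    public double mulAdd() {
        return a * b + c;
    }
}

On FMA-capable hardware I'd expect the two scores to be in the same ballpark, since one fused instruction replaces a multiply plus an add; on hardware without FMA the fma variant should be drastically slower because of the BigDecimal fallback.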