(How) does the Java JIT compiler optimize my code?

I'm writing fairly low-level code that must be highly optimized for speed. Every CPU cycle counts. Since the code is in Java I can't write as low-level as in C, for example, but I want to get everything out of the VM that I can.

I'm processing an array of bytes. There are two parts of my code that I'm primarily interested in at the moment. The first one is:

int key =  (data[i]     & 0xff)
        | ((data[i + 1] & 0xff) <<  8)
        | ((data[i + 2] & 0xff) << 16)
        | ((data[i + 3] & 0xff) << 24);

and the second one is:

key = (key << 15) | (key >>> 17);

Judging from the performance I'm guessing that these statements aren't optimized the way I expect. The second statement is basically a ROTL 15, key. The first statement loads 4 bytes into an int. The 0xff masks are there only to compensate for the added sign bits resulting from the implicit cast to int if the accessed byte happens to be negative. This should be easy to translate to efficient machine code, but to my surprise performance goes up if I remove the masks. (Which of course breaks my code, but I was interested to see what happens.)
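(For reference, and not part of the original question: both operations have standard-library equivalents that HotSpot optimizes well. Integer.rotateLeft is typically compiled down to a single rotate instruction on x86, and a little-endian ByteBuffer.getInt expresses the same 4-byte load as the mask-and-shift version above. A sketch, assuming the little-endian layout implied by the shifts:)

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PackSketch {
    // Same as (key << 15) | (key >>> 17): a left rotation by 15 bits.
    static int rotl15(int key) {
        return Integer.rotateLeft(key, 15);
    }

    // Same as the mask-and-shift load: data[i] is the least significant byte.
    // In real code, wrap the buffer once outside the hot loop instead of per call.
    static int readIntLE(byte[] data, int i) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(i);
    }
}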

What's going on here? Do the most common Java VMs optimize this code during JIT in the way one would expect a good C++ compiler to optimize the equivalent C++ code? Can I influence this process? Setting -XX:+AggressiveOpts seems to make no difference.

(CPU: x64, Platform: Linux/HotSpot)
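(Also for reference: HotSpot can show what the JIT is doing. These are standard HotSpot flags; -XX:+PrintAssembly additionally requires the hsdis disassembler library, and MyApp is just a placeholder class name.)

java -XX:+PrintCompilation MyApp
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly MyApp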

asked Nov 24 '11 by Rinke

People also ask

Does the Java compiler optimize code?

The JVM's JIT compiler is one of the fascinating mechanisms on the Java platform. It optimizes your code for performance without giving away its readability. Beyond "static" optimizations such as inlining, it also makes decisions based on the way the code performs in practice.

How does the JIT compiler improve performance?

The JIT compiler helps improve the performance of Java programs by compiling bytecodes into native machine code at run time. The JIT compiler is enabled by default. When a method has been compiled, the JVM calls the compiled code of that method directly instead of interpreting it.

What is the advantage of compiling Java source code using a JIT?

Advantages of just-in-time compilation: JIT compilers use less memory, they run after the program starts, code optimization can be done while the code is running, and page faults can be reduced.

Why is a JIT compiler faster?

A JIT compiler can be faster because the machine code is being generated on the exact machine that it will also execute on. This means that the JIT has the best possible information available to it to emit optimized code.


2 Answers

How do you test the performance?

Here are a few good articles:

http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html

http://www.ibm.com/developerworks/java/library/j-benchmark2/index.html

http://ellipticgroup.com/html/benchmarkingArticle.html
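(The articles above predate JMH, but they describe exactly the pitfalls a harness like JMH handles for you: warm-up, dead-code elimination, and so on. A minimal sketch of a JMH benchmark for the code in the question; the class and field names are made up for the example.)

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class KeyBench {
    byte[] data = new byte[64];

    @Benchmark
    public int packAndRotate() {
        int key =  (data[0]     & 0xff)
                | ((data[1] & 0xff) <<  8)
                | ((data[2] & 0xff) << 16)
                | ((data[3] & 0xff) << 24);
        // Returning the result keeps the JIT from eliminating the whole method as dead code.
        return (key << 15) | (key >>> 17);
    }
}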

answered Oct 27 '22 by Puce


I've done a lot of performance work in Java, and I've even coded directly in bytecode, enough to be sure of a couple of things: the JIT is a black box with obscure behaviour, the JIT and the compiler are incredibly efficient, and the simplest code usually yields the best performance.

This is normal when you think about the GOAL of the JIT: to extract the best possible performance from any Java code. When you add that Java is quite a simple and plain language, the result is that simple code gets optimized well, and any further trick generally does no good.

Of course, there are some common pitfalls and gotchas you ought to know, but I see none in your code samples. Were I to optimize your code, I would go straight to the higher level: algorithm. What is the complexity of your code? Can some data be cached? What APIs are used? Etc... There's a seemingly endless pit of performance to be extracted from algorithmic tricks alone.

And if even that is not sufficient, if the language is not fast enough, if your machine is not fast enough, if your algorithm cannot be made any faster, then the answer won't lie in "clock cycles": you might squeeze out 20% more efficiency, but 20% will never be enough once your data grows. To be sure you never hit a performance wall again, the ultimate answer lies in scalability: make your algorithm and your data endlessly distributable, so you can simply throw more workers at the task.

answered Oct 27 '22 by solendil