Using the stock Sun 1.6 compiler and JRE/JIT, is it a good idea to unroll a loop extensively, in the style of Duff's Device? Or does that just end up as code obfuscation with no performance benefit?
The Java profiling tools I've used are less informative about line-by-line CPU usage than, say, valgrind, so I was looking to augment measurement with other people's experience.
Note that, of course, you can't write Duff's Device exactly in Java, but you can do the basic unrolling, and that's what I'm asking about. For example:
```java
short stateType = data.getShort(ptr);
switch (stateType) {
case SEARCH_TYPE_DISPATCH + 16:
    if (c > data.getChar(ptr + (3 << 16) - 4)) {
        ptr += 3 << 16;
    }
    // no break: falls through to the next (smaller) step
case SEARCH_TYPE_DISPATCH + 15:
    if (c > data.getChar(ptr + (3 << 15) - 4)) {
        ptr += 3 << 15;
    }
// ... down through many other values.
```
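For comparison, every case of the fall-through switch in the question performs the same test and step, just with a different shift, so the whole thing can be written as an ordinary loop, which is the kind of code the JIT is built to optimize. This is a sketch under assumptions: `data` is taken to be a `java.nio.ByteBuffer` (the snippet's `getShort`/`getChar` calls suggest this but don't confirm it), and because the lower cases are elided, their lowest level is passed in as the hypothetical parameter `minLevel`. The caller would pass `stateType - SEARCH_TYPE_DISPATCH` as `level`. Unlike the switch, which silently does nothing for values outside its case labels, this sketch trusts the caller to pass only valid levels.

```java
import java.nio.ByteBuffer;

/** Hypothetical rolled-up version of the unrolled, fall-through switch. */
public final class LevelDescent {

    private LevelDescent() {}

    /**
     * @param data     search table, assumed to be a ByteBuffer
     * @param ptr      starting byte offset, as in the original snippet
     * @param c        the char being searched for
     * @param level    starting level, i.e. stateType - SEARCH_TYPE_DISPATCH
     * @param minLevel hypothetical stand-in for the elided lowest case
     * @return the offset after stepping through every level
     */
    public static int descend(ByteBuffer data, int ptr, char c, int level, int minLevel) {
        // One iteration per former case, from the largest step to the smallest.
        for (int shift = level; shift >= minLevel; shift--) {
            int step = 3 << shift;
            // Same comparison and advance as each "case" in the switch.
            if (c > data.getChar(ptr + step - 4)) {
                ptr += step;
            }
        }
        return ptr;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64); // zero-filled
        buf.putChar(20, 'm');
        buf.putChar(32, 'z');
        buf.putChar(26, 'a');
        System.out.println(descend(buf, 0, 'z', 3, 1)); // prints 30
    }
}
```

Written this way the loop bounds are explicit, and whether to unroll it becomes the JIT's decision rather than something hard-coded in the source.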
It doesn't much matter whether it's a good idea (it's not), because it won't compile.
EDIT: This is mentioned explicitly in the JLS:
A trick known as Duff's device can be used in C or C++ to unroll the loop, but this is not valid code in the Java programming language:
Or, more bluntly (from the same section):
Great C hack, Tom, but it's not valid here.
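For contrast, the "basic unroll" the question asks about does compile in Java. The sketch below is not Duff's device itself, which jumps into the middle of the loop body. Instead, a separate fall-through switch handles the leftover `n % 4` elements, and a loop then processes four elements per iteration. Summing an `int[]` and the unroll factor of 4 are arbitrary choices for illustration, and the class and method names are hypothetical. Whether the unrolled version is actually faster on a given JVM can only be settled by measurement; HotSpot's JIT can unroll simple counted loops like `sumRolled` on its own.

```java
/** Hypothetical example: legal Java manual unrolling with a remainder switch. */
public final class UnrolledSum {

    private UnrolledSum() {}

    /** Plain loop, left for the JIT to optimize. */
    public static long sumRolled(int[] a) {
        long sum = 0;
        for (int x : a) {
            sum += x;
        }
        return sum;
    }

    /** Same result, manually unrolled by a factor of 4. */
    public static long sumUnrolled(int[] a) {
        long sum = 0;
        int n = a.length;
        int i = 0;
        // Handle the n % 4 leftover elements first; each case falls through.
        switch (n & 3) {
            case 3: sum += a[i++]; // fall through
            case 2: sum += a[i++]; // fall through
            case 1: sum += a[i++];
            default: break;
        }
        // The remaining element count is now a multiple of 4.
        for (; i < n; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(sumRolled(data) + " " + sumUnrolled(data)); // prints "28 28"
    }
}
```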
EDIT: To answer your more (too) general question, usually no. You should generally rely on the JIT.
You are ignoring the fact that Java compiles to bytecode for a stack-based virtual machine. Whatever low-level optimization trick you attempt at the Java source level is largely ineffective. The real optimization happens when the JIT compiler generates machine code for the target architecture, a process that, for the most part, you can neither control nor need to worry about.
Instead, optimize at the level of the bigger picture (algorithms and data structures), and let the JIT compiler handle the low-level optimizations.