 

Is an "excessive" number of dup instructions in Java bytecode considered "poor" code?

This is a two-part question, but the parts wouldn't make sense on their own. Is a large number of dup instructions in the bytecode output an indicator of poorly written code? Here "large" means some percentage of all bytecode instructions. Further, how does one go about rewriting code that generates a dup instruction?
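As a rough sketch of the metric described above (dup-family instructions as a fraction of all instructions), the hypothetical `DupRatio` class below parses the text printed by `javap -c`. It assumes javap's usual layout, where each instruction line starts with a bytecode offset, a colon and a lowercase mnemonic; it lumps all six dup variants together and is not a full bytecode parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: fraction of instructions in `javap -c` output that are
 * dup-family opcodes (dup, dup_x1, dup_x2, dup2, dup2_x1, dup2_x2).
 * Assumes instruction lines look like "  <offset>: <mnemonic> ...".
 */
public class DupRatio {
    // Offset, colon, whitespace, then a mnemonic starting with a lowercase letter.
    // Switch-table entries such as "1: 36" are skipped because their target is numeric.
    private static final Pattern INSTRUCTION =
            Pattern.compile("^\\s*\\d+:\\s+([a-z][a-z0-9_]*)", Pattern.MULTILINE);

    public static double dupRatio(String javapOutput) {
        Matcher m = INSTRUCTION.matcher(javapOutput);
        int total = 0;
        int dups = 0;
        while (m.find()) {
            total++;
            if (m.group(1).startsWith("dup")) {
                dups++;
            }
        }
        return total == 0 ? 0.0 : (double) dups / total;
    }

    public static void main(String[] args) {
        // The listing for `Foo foo = new Foo();` quoted in one of the answers.
        String listing = String.join("\n",
                "0:  new #1; //class Foo",
                "3:  dup",
                "4:  invokespecial   #23; //Method \"<init>\":()V",
                "7:  astore_1");
        System.out.println(dupRatio(listing)); // 0.25 (1 dup out of 4 instructions)
    }
}
```

In practice you would feed it the saved output of `javap -c SomeClass`; what threshold counts as "excessive" is still a judgment call.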

Woot4Moo asked Feb 15 '13 20:02


3 Answers

Are we talking about javac output you are analyzing, or about your own compiler/generator? If you are concerned about the quality of your Java code from the perspective of what javac produces, forget about it. First of all, javac produces suboptimal bytecode and relies on the JVM/JIT to do all the optimizations (a very good choice). Still, that bytecode is probably much better than anything one can come up with quickly. It's similar to asking about the quality of the assembly code generated by a C compiler.

If you are generating bytecode yourself, an excessive number of dup instructions may look bad, but it might just as well have no impact on performance. Remember that bytecode is translated to assembly on the target machine. The JVM is a stack machine, but most architectures these days are register-based. dup is used only because some bytecode instructions are destructive (they pop values off the operand stack when reading them). This doesn't happen with registers: you can read them as many times as you want. Take the following code as an example:

new java/lang/Object
dup
invokespecial java/lang/Object <init> ()V

dup must be used here because invokespecial pops the top of the operand stack. Creating an object only to lose the reference to it right after calling the constructor would make little sense. But in assembly there is no dup and no copying or duplication of data: you just have a single CPU register pointing to the new java/lang/Object instance.
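To make the stack bookkeeping concrete, here is a toy model of the operand stack (hypothetical `DupDemo` class, with references modelled as strings) that replays the three instructions above with and without the dup. It only mimics the stack effects of `new`, `dup` and `invokespecial <init>()V`; it says nothing about what the JIT actually generates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy operand-stack model, for illustration only: references are plain
 * Strings and only the stack effects of three opcodes are simulated.
 */
public class DupDemo {
    private final Deque<String> stack = new ArrayDeque<>();

    // new: push a reference to the freshly allocated (uninitialized) object
    public void newObject(String ref) { stack.push(ref); }

    // dup: push a second copy of the top element
    public void dup() { stack.push(stack.peek()); }

    // invokespecial <init>()V: consumes the receiver, pushes nothing back
    public void invokespecialInit() { stack.pop(); }

    public int depth() { return stack.size(); }

    public String top() { return stack.peek(); }

    public static void main(String[] args) {
        DupDemo withDup = new DupDemo();
        withDup.newObject("obj#1");
        withDup.dup();
        withDup.invokespecialInit();
        System.out.println(withDup.depth() + " " + withDup.top()); // 1 obj#1: a reference survives

        DupDemo withoutDup = new DupDemo();
        withoutDup.newObject("obj#2");
        withoutDup.invokespecialInit();
        System.out.println(withoutDup.depth()); // 0: the only reference was consumed
    }
}
```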

In other words, suboptimal bytecode is translated into "more optimal" assembly on the fly. Just... don't bother.

Tomasz Nurkiewicz answered Nov 15 '22 19:11


The dup instruction simply duplicates the top element of the operand stack. If the compiler knows that it's going to use a value multiple times within a relatively short span, it can choose to duplicate the value and hold it on the operand stack until needed.

One of the most common cases where you see dup is when you create an object and store it in a variable:

Foo foo = new Foo();

Running javap -c, you get the following bytecode:

0:  new #1; //class Foo
3:  dup
4:  invokespecial   #23; //Method "<init>":()V
7:  astore_1

In English: the new operation creates a new instance of Foo, and invokespecial executes the Foo constructor. Since you need the reference on the stack both to invoke the constructor and to store it in the variable, it makes a lot of sense to use dup (especially since the alternative, storing the reference in the variable and then retrieving it to run the constructor, could violate the Java memory model).

Here's a case where the Oracle Java Compiler (1.6) didn't use dup when I would have expected it to:

int x = 12;

public int bar(int z)
{
    int y = x + x * 3;
    return y + z;
}

I'd expect the compiler to dup the value of x, since it appears multiple times in the expression. Instead, it emitted code that loaded the value from the object each time:

0:  aload_0
1:  getfield    #12; //Field x:I
4:  aload_0
5:  getfield    #12; //Field x:I
8:  iconst_3
9:  imul
10: iadd

I would have expected the dup because it's relatively expensive to retrieve a value from an object (even after HotSpot does its magic), whereas two stack cells are likely to be on the same cache line.
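On the source side, the usual way to read a field once and reuse it is to copy it into a local variable. The sketch below (hypothetical class `Hoist`) shows both forms of the method above. For the rewritten form, javac would be expected to emit a single `getfield` followed by loads of a local slot rather than a `dup`; verify with `javap -c`. Whether either form is faster at run time is up to the JIT.

```java
/**
 * Hypothetical illustration: the original method reads the field x twice,
 * the rewritten one reads it once into a local and reuses the local.
 */
public class Hoist {
    private int x = 12;

    // Original form: javac emits two aload_0/getfield pairs for x.
    public int bar(int z) {
        int y = x + x * 3;
        return y + z;
    }

    // Rewritten form: one field read; the value is reused from a local variable.
    public int barHoisted(int z) {
        int xv = x;
        int y = xv + xv * 3;
        return y + z;
    }

    public static void main(String[] args) {
        Hoist h = new Hoist();
        System.out.println(h.bar(5) + " " + h.barHoisted(5)); // 53 53
    }
}
```

One behavioural difference: if another thread writes `x` without synchronization, the original method could in principle observe two different values, while the rewritten one reads it exactly once.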

parsifal answered Nov 15 '22 18:11


If you're worried about the impact of dup and its relatives on performance, don't bother. The JVM does just-in-time compilation, so it shouldn't actually make any difference in performance.

As far as code quality goes, there are two main things that will cause javac to generate dup instructions. The first is object instantiation, where it is unavoidable. The second is certain uses of intermediate values in expressions, such as using the result of an assignment or increment inside a larger expression. If you see a lot of the latter, it could indicate poor-quality code, since you don't usually want complicated expressions like that in your source code (they're less readable).
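As an illustration of the second case (the methods below are hypothetical, not taken from the answer), a chained assignment uses the value of one assignment as the input of another, which javac typically compiles with a `dup`. Splitting it into two statements usually replaces the `dup` with a store and a reload of a local variable. Check with `javap -c`, since exact output can vary between compiler versions.

```java
/**
 * Hypothetical example: the same computation written with a chained
 * assignment (value of `b = ...` reused, typically compiled with dup)
 * and as two separate statements (typically store + load, no dup).
 */
public class ChainedAssign {
    // Typically: iload_0, iconst_2, imul, dup, istore_2, istore_1, ...
    public static int chained(int n) {
        int a, b;
        a = b = n * 2;
        return a + b;
    }

    // Typically: iload_0, iconst_2, imul, istore_1, iload_1, istore_2, ...
    public static int split(int n) {
        int b = n * 2;
        int a = b;
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(chained(3) + " " + split(3)); // 12 12
    }
}
```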

The other versions of dup (dup_x1, dup_x2, dup2, dup2_x1, and dup2_x2) are especially suspicious, since object instantiation doesn't use them, so their presence almost certainly points to the latter. Of course, even then it's not a huge problem: all it means is that the source code isn't as readable as it could be.
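The sketch below (hypothetical class `DupVariants`) shows source patterns that typically make javac emit these variants when the old value of an increment is used as a result: `dup_x1` for an `int` field, `dup2` plus `dup_x2` for an array element, and `dup2_x1` for a `long` field. The opcodes in the comments are what javac commonly produces, not a guarantee; `nextIdSplit` shows a more readable rewrite of `nextId` that typically avoids the dup family entirely.

```java
/**
 * Hypothetical examples of source that javac commonly compiles to the
 * less common dup variants. Verify the actual opcodes with `javap -c`.
 */
public class DupVariants {
    private int counter;
    private long total;

    // Typically: aload_0, dup, getfield, dup_x1, iconst_1, iadd, putfield, ireturn
    public int nextId() {
        return counter++;
    }

    // Same behaviour as nextId, written as separate statements (typically no dup at all).
    public int nextIdSplit() {
        int id = counter;
        counter = id + 1;
        return id;
    }

    // Typically: aload_0, iload_1, dup2, iaload, dup_x2, iconst_1, iadd, iastore, ireturn
    public static int bump(int[] arr, int i) {
        return arr[i]++;
    }

    // Typically: aload_0, dup, getfield, dup2_x1, lconst_1, ladd, putfield, lreturn
    public long addOne() {
        return total++;
    }

    public static void main(String[] args) {
        DupVariants d = new DupVariants();
        int[] arr = {5};
        System.out.println(d.nextId() + " " + d.nextIdSplit() + " " + d.nextId()); // 0 1 2
        System.out.println(bump(arr, 0) + " " + arr[0]);                         // 5 6
        System.out.println(d.addOne() + " " + d.addOne());                       // 0 1
    }
}
```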

If the code isn't compiled from Java, all bets are off. The presence or absence of particular instructions doesn't really tell you much, especially for languages whose compilers perform compile-time optimization.

Antimony answered Nov 15 '22 18:11