
Can Java break/label statements act as "goto"s in bytecode obfuscation?

Tags: java, bytecode

I'm attempting to deobfuscate some Java .class files after decompiling them, and I've come across a part of the code where it's using labels in a way I don't think they can be used. I don't know if this is the decompiler's fault in misunderstanding the labels, or if the code was intentionally obfuscated this way. In other words, can labels be used this way in Java bytecode?

Note that the labels appear AFTER the related break statements, not before. It almost seems to be using them as a goto, rather than a label being used to go out of a loop. There are also no loops at all, so I'm a bit confused as to how they're supposed to be used here.

What's going on here? I've marked the three labels in comments (###).

if (i != 96)
  {
    if ((i ^ 0xFFFFFFFF) != -98)
    {
      if (i == 98)
        break label417;  // ### Here are the three breaks... The relevant labels appear later in the code
      if (i != 99)
        break label540;
      if (!bool)
        break label461;
    }
  }
  else
  {
    if (localwb == this.localWB5)
    {
      if (this.localWB4 != null) {
          this.localWB4.a((byte)-92, this);
        if (!bool);
      }
      else
      {
          this.localWB6.a((byte)-9, this);
      }
      return true;
    }
    if (localwb == this.localWB4)
    {
        this.localWB6.a((byte)-59, this);
      return true;
    }
    if (this.localWB3 != localwb)
      break label540;
      this.localWB2.a((byte)-38, this);
    return true;
  }
  if (this.localWB6 == localwb)
  {
    if (this.localWB4 != null) {
        this.localWB4.a((byte)-122, this);
      if (!bool);
    }
    else {
        this.localWB5.a((byte)-63, this);
    }
    return true;
  }
  if (this.localWB4 == localwb)
  {
    this.localWB5.a((byte)-22, this);
    return true;
  }
  if ((this.localWB2 == localwb) && (this.localWB3.M))
  {
    this.localWB3.a((byte)-84, this);
    return true;
    label417:  //  ### The first label.  Note how this next if-statement has inaccessible code... if the above if-statement is true, it would have already returned true;  However, the label appears after the return statement, almost as if the label is being used as a goto.
    if (localwb == this.localWB2)
    {
        this.localWB6.a((byte)-86, this);
      return true;
    }
    if (this.localWB3 == localwb)
    {
      this.localWB5.a((byte)-31, this);
      return true;
      label461:  //  ###  The second label
      if ((this.localWB6 == localwb) || (this.localWB4 == localwb))
      {
          this.localWB2.a((byte)-60, this);
        return true;
      }
      if (localwb == this.localWB5)
      {
        if (this.localWB3.M)
        {
          this.localWB3.a((byte)-44, this);
          if (!bool);
        }
        else {
            this.localWB2.a((byte)-9, this);
        }
        return true;
      }
    }
  }
  label540:  //  ###  The final label.
Khalid Mahmoud asked Jan 24 '13 07:01



3 Answers

The goto bytecode instruction (yes, it's actually called "goto") is used to implement break and other constructs.
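As a rough illustration of that point, here is a minimal sketch of a labeled break that exits a plain block, with no loop involved. The class and method names (BreakAsGoto, classify) are made up for this example. Disassembling the compiled class with javap -c should show the break as a jump instruction (a goto or a conditional branch), though the exact instructions depend on the compiler.

```java
public class BreakAsGoto {
    // Hypothetical example method: a labeled break out of a plain block.
    static int classify(int value) {
        int result = -1;
        done: {
            if (value < 0) {
                // Leaves the labeled block early; at the bytecode level
                // this is just a forward jump to the code after the block.
                break done;
            }
            result = value * 2;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(-5)); // prints -1
        System.out.println(classify(21)); // prints 42
    }
}
```

Note that the jump target is always the end of the statement that carries the label, which is the restriction the Java language adds on top of the unrestricted bytecode goto.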

The specification of goto itself only restricts the target to be within the same method as the goto instruction.

There are many other constraints, defined in section 4.10 of the Java Virtual Machine Specification, "Verification of class Files", specifically in "Checking Code", which describes how the actual bytecode of a method is to be verified.

I suspect these rules prevent a goto from producing an inconsistent interpretation of the local variables and operand stack, for example by requiring the state at the target instruction to be compatible with the state at the source instruction. However, the actual specification is written in Prolog, and I'd be thankful if anyone could point to the relevant part where this is ensured.

Joachim Sauer answered Oct 18 '22 09:10



break <label> can be used to exit code blocks, like so:

public static boolean is_answer(int arg) {
    boolean ret = false;
    label: {
        if (arg != 42)
            break label;
        ret = true;
    }
    return ret;
}

However, the decompiled code that you show is not valid Java due to the following JLS requirement:

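"A break statement transfers control out of an enclosing statement."

In the decompiled output, the labeled statements appear after the break statements instead of enclosing them, so the rule is violated.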
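To make the rule concrete, here is a small sketch (the names EnclosingLabel and describe are invented for illustration). The first break compiles because it sits inside the statement labeled outer. The commented-out break sits after that statement, which is the situation in the decompiled code, and javac would reject it if it were uncommented.

```java
public class EnclosingLabel {
    // Hypothetical example method showing where "break outer" is legal.
    static String describe(int i) {
        outer: {
            if (i == 98) {
                break outer;   // legal: this break is enclosed by "outer"
            }
            return "not 98";
        }
        // break outer;        // illegal here: "outer" no longer encloses this point
        return "98";
    }

    public static void main(String[] args) {
        System.out.println(describe(98));
        System.out.println(describe(7));
    }
}
```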

NPE answered Oct 18 '22 08:10



The problem stems from a mismatch between the Java language and JVM bytecode. Java imposes a lot of restrictions that aren't present at the bytecode level. If all you're doing is decompiling a normally compiled Java classfile, this won't be a problem. However, obfuscators will usually rearrange the control flow of the method into an equivalent version that no longer corresponds to valid Java. A naive decompiler will get confused and just emit invalid Java, as you've seen.
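For simple cases, forward jumps like the ones in the question can be written as valid Java by nesting labeled blocks, so that each label encloses its breaks and the jump target is the code right after the block. The sketch below borrows the label names from the question, but everything else (the class NestedLabels, the method dispatch, the returned strings) is a placeholder, and it does not reproduce the original method's logic. In particular, a jump into the middle of an if body, as with label417 in the question, needs more restructuring than this.

```java
public class NestedLabels {
    // Hypothetical stand-in for the obfuscated method: each "break labelN"
    // lands on the code immediately following the block named labelN.
    static String dispatch(int i, boolean flag) {
        label540: {
            label461: {
                label417: {
                    if (i == 98) break label417;
                    if (i != 99) break label540;
                    if (!flag) break label461;
                    return "99 with flag";
                }
                // Target of "break label417".
                return "98";
            }
            // Target of "break label461".
            return "99 without flag";
        }
        // Target of "break label540".
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(dispatch(98, false));
        System.out.println(dispatch(99, true));
        System.out.println(dispatch(99, false));
        System.out.println(dispatch(1, true));
    }
}
```

The innermost block holds the code containing the breaks, and the targets follow in the order the blocks close, which is why the label with the nearest target is the innermost one.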

If you are interested in decompiling obfuscated classfiles, you can try the open source Krakatau Decompiler I wrote. It is much smarter about trying to transform obfuscated bytecode back into valid Java, so it can often decompile classes which no other decompiler can. However, the resulting code will probably not be pretty even if it is valid, and the decompiler may still fail.

Antimony answered Oct 18 '22 10:10
