What Scala statements or code can produce bytecode that cannot be translated to Java?

I have read an answer to a question about converting Scala code to Java code. It says:

I don't think it's possible to convert from scala back to standard java since Scala does some pretty low-level byte-code manipulation. I'm 90% sure they do some things that can't exactly be translated back into normal Java code.

So what Scala statements or code can produce bytecode which cannot be translated to Java?

P.S. I generally agree with that answer, but want a concrete example for learning purposes.

Cherry asked Jul 04 '14 04:07


4 Answers

The answer really depends on how hard you want to try to convert the code.

Since Java and Scala are both Turing complete, any program in one can trivially be converted to the other, but this isn't really interesting or useful.

What you really want is to convert the results into readable, idiomatic code. From this perspective, even bytecode compiled from Java can't automatically be converted back to Java, because compilation loses information (though relatively little compared to C), and machines aren't as good as humans at writing human-readable code anyway.

If you had an expert in both Java and Scala, they could probably rewrite your Scala codebase in Java and end up with reasonably idiomatic Java code. But it wouldn't be as readable as the original, for the simple reason that Scala is a language designed to improve on Java: it tries to remove Java's warts and provide powerful high-level programming features, eliminating the need for all the classic Java boilerplate.

From this perspective, the answer is "any feature in Scala that is not in Java".

Antimony answered Sep 30 '22 05:09


Scala's nested blocks do not have a Java equivalent.

Nested block in Scala (taken from this question):

def apply(x: Boolean) = new Tuple2(null, {
  while (x) { }
  null
})

Produces the bytecode

 0: new           #12                 // class scala/Tuple2
 3: dup           
 4: aconst_null   
 5: iload_1       
 6: ifne          5
 9: aconst_null   
10: invokespecial #16                 // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)V
13: areturn   

At instruction 0 an uninitialised object is pushed onto the stack, and it is initialised at instruction 10. Between these two points there is a backwards jump from 6 to 5. This actually reveals a bug in the OpenJDK bytecode verifier, which rejects this code even though it is acceptable under the JVM specification. This probably got through testing because this bytecode can't be generated from Java.

Since nested blocks in Java are not expressions that evaluate to a value, the closest Java equivalent would be

public Tuple2 apply(boolean x){
  while(x){}
  return new Tuple2(null,null);
}

Which would compile to something akin to

 0: iload_1
 1: ifne          0
 4: new           #12                 // class scala/Tuple2
 7: dup
 8: aconst_null
 9: dup
10: invokespecial #16                 // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)V
13: areturn

Note that this doesn't have the uninitialised object on the stack at the time of the backwards jump. (N.B. bytecode was written by hand, do not execute!)
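Another way to keep the Scala evaluation order in Java is to move the block into a helper method. Per the Java Language Specification, a class instance creation allocates the object first and only then evaluates the constructor arguments, so the uninitialised object is still on the stack while the helper runs, but the backward jump now lives in a separate method. The sketch below assumes a hypothetical `Pair` class standing in for `scala.Tuple2` so that it compiles without scala-library; the names are illustrative only.

```java
public final class NestedBlockHelper {
    // Hypothetical stand-in for scala.Tuple2 so the sketch needs no Scala library.
    static final class Pair {
        final Object first;
        final Object second;

        Pair(Object first, Object second) {
            this.first = first;
            this.second = second;
        }
    }

    // The Scala block { while (x) { }; null } moved into its own method.
    // Its backward branch is in this method, not in apply().
    private static Object spinThenNull(boolean x) {
        while (x) { }
        return null;
    }

    // The Pair is allocated, then the arguments are evaluated (including the
    // helper call), then the constructor runs, as in the Scala version.
    public static Pair apply(boolean x) {
        return new Pair(null, spinThenNull(x));
    }
}
```

Calling `apply(true)` never returns, exactly like the original Scala method.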


This paper from Li, White, and Singer shows differences between JVM languages, including the bytecode they compile to. In an N-gram analysis of bytecodes, it finds that 58.5% of the 4-grams executed by Scala programs are not found in the bytecode executed by Java programs. This is not to say that Java can't produce these bytecodes, only that they weren't present in the Java corpus.
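To make the 4-gram statistic concrete, here is a small Java sketch. It collects the distinct 4-grams of opcode mnemonics from one trace and counts how many never occur in another. It only counts distinct n-grams from toy static listings (the two listings in this answer); the paper works with dynamic traces from real benchmark corpora, and its exact weighting may differ. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class NGramOverlap {
    // Collect the distinct n-grams of consecutive opcode mnemonics in one trace.
    static Set<List<String>> ngrams(List<String> opcodes, int n) {
        Set<List<String>> result = new HashSet<>();
        for (int i = 0; i + n <= opcodes.size(); i++) {
            result.add(List.copyOf(opcodes.subList(i, i + n)));
        }
        return result;
    }

    // Fraction of the distinct n-grams in `a` that never occur in `b`.
    static double fractionNotIn(List<String> a, List<String> b, int n) {
        Set<List<String>> gramsA = ngrams(a, n);
        if (gramsA.isEmpty()) {
            return 0.0;
        }
        Set<List<String>> gramsB = ngrams(b, n);
        long missing = gramsA.stream().filter(g -> !gramsB.contains(g)).count();
        return (double) missing / gramsA.size();
    }

    public static void main(String[] args) {
        // Toy traces: the scalac listing and the hand-written javac-style listing.
        List<String> scala = List.of("new", "dup", "aconst_null", "iload_1", "ifne",
                "aconst_null", "invokespecial", "areturn");
        List<String> java = List.of("iload_1", "ifne", "new", "dup", "aconst_null",
                "dup", "invokespecial", "areturn");
        System.out.println(fractionNotIn(scala, java, 4)); // prints 1.0
    }
}
```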

ggovan answered Sep 30 '22 06:09


As you noted, Scala eventually compiles to JVM bytecode. An obvious JVM instruction that has no equivalent in the Java language is goto.

A Scala compiler might use goto, for instance, to optimize loops or tail-recursive methods. In that case, you would have to emulate the behavior of goto in Java.
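As a rough sketch of the tail-call case: take a self-recursive Scala method annotated with `@tailrec`, such as `def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)`. scalac (2.x) turns the recursive call into a jump back to the start of the method. The closest hand-written Java is a loop that reassigns the parameters. The class name below is hypothetical, and this is one way to write it, not what a decompiler would produce.

```java
public final class TailCallLoop {
    private TailCallLoop() {}

    // Mirrors the Scala source:
    //   @annotation.tailrec
    //   def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
    public static int gcd(int a, int b) {
        // while (true) stands in for the backward goto to the method entry.
        while (true) {
            if (b == 0) {
                return a;
            }
            // "Call" gcd(b, a % b) by overwriting the parameters and looping.
            int next = a % b;
            a = b;
            b = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(gcd(12, 18)); // prints 6
    }
}
```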

As Antimony hinted, a Turing complete language can at least emulate another Turing complete language. However the resulting program may be heavyweight and suboptimal.

As a final note, decompilers may help. I'm not familiar with the internals of decompilers, but I assume they rely heavily on patterns. For example, if a Java source pattern f(x) compiles to a bytecode pattern f'(x), then with a lot of hard work and experience, some manage to decompile bytecode f'(y) back to Java source f(y).

However, I've not heard of Scala decompilers yet (maybe someone's working on that).

[EDIT] About what I originally meant by emulating the behavior of a goto:

I had in mind switch/case statements inside a loop, and cdshines showed another way by using labeled break/continue in a loop (though I believe that using "disregarded and condemned" features is not standard).

In either of these cases, in order to jump back to an earlier instruction, an idiomatic Java loop (for/while/do-while) is required (any other suggestions?). An endless loop makes this easy to implement; a conditional loop would require more work, but I assume it is doable.
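A minimal sketch of the switch/case-inside-a-loop idea: each goto target becomes a state, and "goto X" becomes "set the state to X and go around the endless loop again". The state names and the summing example are hypothetical, chosen only to show one backward and one forward jump.

```java
public final class GotoStateMachine {
    // Hypothetical labels for the emulated goto targets.
    private static final int START = 0, LOOP = 1, DONE = 2;

    // Emulates:  start: i = 0; sum = 0
    //            loop:  if (i >= n) goto done; sum += i; i++; goto loop
    //            done:  return sum
    static int sumBelow(int n) {
        int state = START;
        int i = 0, sum = 0;
        while (true) {
            switch (state) {
                case START:
                    i = 0;
                    sum = 0;
                    state = LOOP;          // fall through to "loop"
                    break;
                case LOOP:
                    if (i >= n) {
                        state = DONE;      // "goto done" (forward jump)
                        break;
                    }
                    sum += i;
                    i++;
                    state = LOOP;          // "goto loop" (backward jump)
                    break;
                case DONE:
                    return sum;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumBelow(5)); // prints 10
    }
}
```

The unlabeled `break` only leaves the `switch`, so control returns to the top of the `while (true)` loop, which acts as the dispatcher.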

Also, goto isn't limited to loops. To jump forward, Java would require other constructs.
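One such construct for forward jumps is a labeled block combined with `break label;`, which Java allows on any labeled statement, not only loops. A minimal sketch, with a hypothetical acquire/work/release sequence standing in for real code:

```java
import java.util.ArrayList;
import java.util.List;

public final class ForwardJump {
    // Emulates the goto-style control flow:
    //       acquire
    //       if (!valid) goto cleanup
    //       work
    //   cleanup:
    //       release
    static List<String> process(boolean valid) {
        List<String> log = new ArrayList<>();
        body:
        {
            log.add("acquire");
            if (!valid) {
                break body; // forward jump to the "cleanup" point below
            }
            log.add("work");
        }
        log.add("release"); // the "cleanup:" target
        return log;
    }

    public static void main(String[] args) {
        System.out.println(process(true));  // prints [acquire, work, release]
        System.out.println(process(false)); // prints [acquire, release]
    }
}
```

This only supports jumps to the end of an enclosing block; arbitrary forward jumps into the middle of other code need the state-machine approach instead.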

A counterexample: in C, there are limitations, but you don't have to go to such great lengths, because C has a goto statement.

As a related topic, if you're interested in non-idiomatic jumps in Scala, see this old Q&A of mine. My point is that not only might a Scala compiler emit goto in a way that's not natural in Java, but a developer can also take tight control of it with the help of Scala macros.

LabelDef: A labelled expression. Not expressible in language syntax, but generated by the compiler to simulate while/do-while loops, and also by the pattern matcher. In my past tests, it could be used for forward jumps as well. On Scala Internals, developers wrote about removing LabelDef, but I don't know if and when they will.

Therefore, yes, you can reproduce the behavior of goto in Java, but because of the complexity involved in doing so, that is not what I would call standard Java, IMHO. Maybe my wording is imprecise, but in my mind, reproducing an elementary behavior by complex means is an "emulation" of that behavior.

Cheers.

eruve answered Sep 30 '22 06:09


It really depends on how you interpret the question "what Scala statements or code can produce bytecode which cannot be translated to Java?".

Ultimately, some Scala features are backed by the so-called ScalaSignature, which stores meta-information. As of 2.10, it may be deemed a secret API that is abstracted away by the Scala reflection mechanisms (which are radically different from Java reflection). The documentation is scarce, but you can check out this PDF for the details (there could have been major changes since then). There is no way to produce identical structures in plain Java unless you fall back to bytecode manipulation tools.
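From the Java side, the ScalaSignature shows up only as an opaque annotation. The sketch below assumes Scala 2.x, where scalac attaches a runtime-visible `scala.reflect.ScalaSignature` annotation (or `scala.reflect.ScalaLongSignature` for large signatures) to the top-level classes it compiles. Java reflection can detect that the annotation is present, but its payload is an encoded pickle that only Scala's own reflection understands. The class name is hypothetical, and annotations are matched by name so the sketch compiles without scala-library.

```java
import java.lang.annotation.Annotation;
import java.util.Set;

public final class ScalaSignatureCheck {
    // Annotation names assumed from Scala 2.x's scala-library.
    static final Set<String> SIGNATURE_ANNOTATIONS = Set.of(
            "scala.reflect.ScalaSignature",
            "scala.reflect.ScalaLongSignature");

    // Compare by name so this class compiles without scala-library on the classpath.
    // Note: if the annotation type itself is missing at runtime, Java reflection
    // silently omits the annotation, so this check then returns false.
    static boolean hasScalaSignature(Class<?> cls) {
        for (Annotation a : cls.getAnnotations()) {
            if (SIGNATURE_ANNOTATIONS.contains(a.annotationType().getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> cls = Class.forName(name);
        System.out.println(name + " carries a Scala signature: " + hasScalaSignature(cls));
    }
}
```

Run against a Scala-compiled class (with scala-library on the classpath), it should report `true`. Reading the `bytes` value back into Scala types is the part plain Java has no tooling for.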

In a more relaxed sense, there are macros and implicits, which interact solely with scalac and have no direct analog in Java. Yes, you can write Java code identical to the result produced by scalac, but you can't write the instructions that direct the compiler to produce it.
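To illustrate the implicit case, here is a rough Java analogue of a Scala method such as `def sortedCopy[A](xs: List[A])(implicit ord: Ordering[A]): List[A]` (the method and class names are hypothetical). After scalac resolves the implicit, the call site simply passes one more argument. Java can express that argument but not the compile-time search that found it, so the Java caller has to write it out by hand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class ExplicitImplicit {
    // In Scala the caller writes sortedCopy(xs) and scalac fills in `ord`;
    // in Java the "implicit" ordering must be passed explicitly.
    static <A> List<A> sortedCopy(List<A> xs, Comparator<? super A> ord) {
        List<A> copy = new ArrayList<>(xs);
        copy.sort(ord);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(3, 1, 2);
        System.out.println(sortedCopy(xs, Comparator.naturalOrder())); // prints [1, 2, 3]
    }
}
```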

om-nom-nom answered Sep 30 '22 07:09