
Java bytecode: types of local variables?

Tags: java, jvm, bytecode

According to this article http://slurp.doc.ic.ac.uk/pubs/observing/linking.html#assignment:

Due to the differences in information between Java code and bytecode (bytecode does not contain the types of local variables), the verifier does not need to check subtypes for assignments to local variables, or to parameters.

My question: Why does the bytecode not contain type information for local variables, whilst it does indeed contain type information for the parameters and return value?

JB2 asked Apr 14 '13 16:04



1 Answer

First off, there are several different notions of type. There are the compile-time types, which include generics. However, generics are erased by the compiler and don't exist after compile time.
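As a rough illustration of generics not surviving compilation, the sketch below compares the runtime classes of two differently parameterized lists. The class name ErasureDemo and its method are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns true when two differently parameterized lists share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Both objects are plain ArrayList instances; the type arguments are gone.
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```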

There is the static type that the verifier infers for a variable, which can be int, float, long, double, returnAddress, or an object reference. Object references are additionally typed with an upper bound, so that, for instance, a given reference is known to always point to a subtype of java/lang/String. Fields can additionally have one of the short types: byte, short, char, or boolean. These are treated identically to ints for execution purposes but have different storage.

Finally, there is the runtime type. For primitives it is the same as the verified static type, but for object references it is the actual class of the instance being referenced. Note that due to verifier laziness, there are some cases where the runtime type may not actually be a subtype of the verified type. For instance, a variable of declared type Comparable can actually hold any object in HotSpot because the VM doesn't check interfaces at verification time.

Compile-time type information is not preserved except through optional attributes for reflection and debugging. This is because the VM doesn't need it to execute the code.
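One place where such an optional attribute shows up is reflection on a generic field. The sketch below assumes a hypothetical class SignatureDemo with a field named names; the erased type is what the field descriptor carries, while the generic type is recovered from the optional signature metadata that javac emits.

```java
import java.lang.reflect.Field;
import java.util.List;

class SignatureDemo {
    List<String> names; // hypothetical field, only here to be inspected

    private static Field namesField() {
        try {
            return SignatureDemo.class.getDeclaredField("names");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // The type the VM itself works with: the erased class.
    static String erasedType() {
        return namesField().getType().getName();
    }

    // The compile-time type, available only because of optional metadata.
    static String genericType() {
        return namesField().getGenericType().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(erasedType());  // prints "java.util.List"
        System.out.println(genericType()); // prints "java.util.List<java.lang.String>"
    }
}
```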

Local variables have no explicit type information (except for the new StackMapTable attribute, but that's a technicality). Instead, when the class is loaded, the bytecode verifier infers a type for each value by running a static dataflow analysis. The purpose of this is not to catch bugs the way compile-time type checking does, because it is assumed that the bytecode already went through such checking at compile time.

Instead, the purpose of verification is to ensure that the instructions are not dangerous to the VM itself. For example, it needs to make sure that you aren't taking an integer and interpreting it as an object reference, because that could lead to arbitrary memory access and hacking the VM.
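To make the idea concrete, here is a heavily simplified sketch of that kind of inference. It is not how any real VM implements verification: the instruction set, the ToyVerifier class, and the two-type lattice (INT and REF) are all invented for illustration, and it only handles straight-line code with no branches, so it never needs to merge states. It tracks a type for every stack entry and local slot and rejects code that loads an int slot as a reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyVerifier {
    // Two verification types are enough for this sketch; the real JVM has more.
    enum VType { INT, REF }

    // Hypothetical mini instruction set, loosely named after real opcodes.
    enum Op { ICONST, ACONST_NULL, ISTORE, ASTORE, ILOAD, ALOAD, IADD }

    static final class Insn {
        final Op op;
        final int slot; // local variable index; unused for stack-only ops
        Insn(Op op, int slot) { this.op = op; this.slot = slot; }
        Insn(Op op) { this(op, -1); }
    }

    // Walks the code once, tracking the type of every stack entry and local slot.
    // Returns the slot types at the end; null means "never assigned".
    static VType[] infer(Insn[] code, int maxLocals) {
        VType[] locals = new VType[maxLocals];
        Deque<VType> stack = new ArrayDeque<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case ICONST:      stack.push(VType.INT); break;
                case ACONST_NULL: stack.push(VType.REF); break;
                case ISTORE:      locals[insn.slot] = pop(stack, VType.INT); break;
                case ASTORE:      locals[insn.slot] = pop(stack, VType.REF); break;
                case ILOAD:       stack.push(require(locals[insn.slot], VType.INT)); break;
                case ALOAD:       stack.push(require(locals[insn.slot], VType.REF)); break;
                case IADD:
                    pop(stack, VType.INT);
                    pop(stack, VType.INT);
                    stack.push(VType.INT);
                    break;
            }
        }
        return locals;
    }

    private static VType pop(Deque<VType> stack, VType expected) {
        return require(stack.poll(), expected); // poll() yields null on an empty stack
    }

    private static VType require(VType actual, VType expected) {
        if (actual != expected) {
            throw new IllegalStateException("expected " + expected + " but found " + actual);
        }
        return actual;
    }

    // Convenience wrapper: true if the code passes the toy checks.
    static boolean verifies(Insn[] code, int maxLocals) {
        try {
            infer(code, maxLocals);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Insn[] ok = { new Insn(Op.ICONST), new Insn(Op.ISTORE, 0),
                      new Insn(Op.ILOAD, 0), new Insn(Op.ILOAD, 0),
                      new Insn(Op.IADD), new Insn(Op.ISTORE, 0) };
        // Stores an int into slot 0, then tries to load it as a reference.
        Insn[] bad = { new Insn(Op.ICONST), new Insn(Op.ISTORE, 0), new Insn(Op.ALOAD, 0) };
        System.out.println(verifies(ok, 1));  // prints "true"
        System.out.println(verifies(bad, 1)); // prints "false"
    }
}
```

Note that nothing in the toy code declares a type for slot 0; its type at any point is whatever the last store put there, which is also why the same slot can hold an int at one point and a reference at another.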

So while bytecode values don't have explicit type information, they do have an implicit type which is the result of static type inference. The details of this vary based on the internal implementation details of each VM, though they are supposed to follow the JVM standard. But you'll only have to worry about that in handwritten bytecode.

Fields have an explicit type since the VM needs to know which type of data is being stored in them. Method parameters and return types are encoded in what is known as a method descriptor, which is also used in type checking. They can't be inferred automatically because these values cross method boundaries and can come from or go to arbitrary code, while type checking is done on a per-class basis.
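For a feel of what a method descriptor looks like, the standard java.lang.invoke.MethodType class can print one. The method being described here (taking a String and a long and returning an int) is a made-up example, as is the class name DescriptorDemo.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Descriptor for a hypothetical method: int find(String s, long from)
    static String descriptor() {
        return MethodType.methodType(int.class, String.class, long.class)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor()); // prints "(Ljava/lang/String;J)I"
    }
}
```

The parameter types appear inside the parentheses (an object type as L...; and long as J) and the return type follows (I for int). Parameter names and local variable types are not part of it.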

P.S. I left out a few minor details when talking about the verification types. Object types additionally track whether they have been initialized or not, and which instruction created them if uninitialized. Address types track the target of the jsr that created them.

Antimony answered Sep 29 '22 01:09
