
Is it possible to decompile Java bytecode back to original generic type parameters

I know that, as part of type erasure, the Java compiler replaces every type parameter in a generic type with its bound, or with Object if the type parameter is unbounded. The resulting bytecode reflects those bounds (or Object) rather than the original type parameters.

Is there a way to take the resulting bytecode and decompile it back to a Java file that contains the original type parameters in generic types? Does a decompiler exist that can achieve this? Or is the process simply irreversible due to the nature of compilation?

OLIVER.KOO asked Aug 31 '17 21:08



2 Answers

You are correct that, at the bytecode level, much information gets lost when you define and interact with generic types. Type erasure was nice for preserving compatibility: if you mostly enforce type safety at compile time, you don't need to do much at runtime, so you can reduce generic types to their 'raw' equivalents.

And that's the key: compile-time verification. If you want the flexibility and type safety of generics, your compiler has to know a lot about the generic types you interact with. In many cases, you won't have the source code for those classes, so it has to get the information from somewhere. And it does: metadata. Embedded in the .class file alongside the bytecode is a wealth of information: everything the compiler needs to know that you're using generic library types safely. So what kind of generics information gets preserved?

Type variables and constraints

The most basic thing a compiler needs to know in order to consume a generic type is the list of type variables. For any generic type or generic method, the names and positions of the type variables are preserved. Moreover, any bounds on those variables get included as well (in Java, a type variable can only declare upper bounds; lower bounds appear only on wildcards).
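One way to see this metadata without a decompiler is reflection, which reads the same generic signature information from the class file. The following is a minimal sketch; `Box` and `TypeVariableDemo` are hypothetical names invented for the illustration.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic class, used only for this illustration.
class Box<T extends Comparable<T>, U> {
    T first;
    U second;
}

class TypeVariableDemo {
    // Renders each declared type variable together with its upper bounds,
    // as recovered from the generic signature stored in the class file.
    static List<String> describe(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (TypeVariable<?> tv : type.getTypeParameters()) {
            StringBuilder sb = new StringBuilder(tv.getName()).append(" extends ");
            Type[] bounds = tv.getBounds();
            for (int i = 0; i < bounds.length; i++) {
                if (i > 0) {
                    sb.append(" & ");
                }
                sb.append(bounds[i].getTypeName());
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one line per type variable of Box, in declaration order.
        describe(Box.class).forEach(System.out::println);
    }
}
```

An unbounded variable such as `U` reports `java.lang.Object` as its single bound, which is also what it erases to.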

Generic supertype signatures

Sometimes you write a class that extends a generic class or implements a generic interface. If you write a StringList that extends ArrayList<String>, you inherit a lot of functionality. If someone wants to use your StringList as intended and without the source code, it's not enough for the compiler to know that you extended ArrayList; it has to know you extended ArrayList<String>. This applies transitively up the hierarchy: it has to know ArrayList<E> extends AbstractList<E>, and so on. So this information gets preserved. Your class file will include the complete generic signatures of any generic supertypes (classes or interfaces).
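The StringList case can be checked with reflection. This is a sketch only: `StringList` is the class described above, and `SupertypeDemo` with its helper methods is a hypothetical name introduced here.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

// The StringList from the text: a class extending a parameterized supertype.
class StringList extends ArrayList<String> { }

class SupertypeDemo {
    // Returns the full generic superclass signature, not just the erased class.
    static String genericSuperclassOf(Class<?> type) {
        return type.getGenericSuperclass().getTypeName();
    }

    // Returns the first type argument of the superclass, or null if the
    // superclass is not parameterized.
    static Type firstTypeArgument(Class<?> type) {
        Type sup = type.getGenericSuperclass();
        if (sup instanceof ParameterizedType) {
            return ((ParameterizedType) sup).getActualTypeArguments()[0];
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(genericSuperclassOf(StringList.class));
        // One step up the hierarchy, the argument is a type variable (E).
        System.out.println(genericSuperclassOf(ArrayList.class));
        System.out.println(firstTypeArgument(StringList.class));
    }
}
```

Note that `StringList.class.getSuperclass()` would return only the erased `ArrayList` class; the type argument is available solely through the generic signature.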

Member signatures

The compiler can't verify that you're using a generic type correctly if it doesn't know the full generic types of fields, method parameters and return types. So, you guessed it: that information gets included. If any part of a class member contains a generic type, wildcard, or type variable, that member will get its signature information saved in the metadata.
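The same holds for members, and reflection can again illustrate what the metadata contains. In this sketch, `Registry` and `MemberSignatureDemo` are hypothetical names made up for the example; only the `java.lang.reflect` calls are real API.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

// Hypothetical class whose members mention type variables and a wildcard.
class Registry<K> {
    Map<K, List<String>> entries;

    <V extends Number> List<? super V> collect(K key, V value) {
        return null; // body is irrelevant; only the signature matters here
    }
}

class MemberSignatureDemo {
    // Full generic type of a declared field, as stored in the class file metadata.
    static String fieldType(Class<?> owner, String name) {
        try {
            return owner.getDeclaredField(name).getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + name, e);
        }
    }

    // Erased type of the same field, which is all the bytecode descriptor carries.
    static String erasedFieldType(Class<?> owner, String name) {
        try {
            return owner.getDeclaredField(name).getType().getName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + name, e);
        }
    }

    // Full generic return type of the first declared method with the given name.
    static String returnType(Class<?> owner, String name) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return m.getGenericReturnType().getTypeName();
            }
        }
        throw new IllegalArgumentException("no such method: " + name);
    }

    public static void main(String[] args) {
        System.out.println(erasedFieldType(Registry.class, "entries"));
        System.out.println(fieldType(Registry.class, "entries"));
        System.out.println(returnType(Registry.class, "collect"));
    }
}
```

Comparing the erased and generic forms of the `entries` field shows the gap a decompiler closes by reading the signature metadata instead of the descriptor.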

Local variables

It's not necessary to preserve information about local variable types in order to consume a type. It can be useful for debugging, but that's about it. There are metadata tables (the LocalVariableTable and LocalVariableTypeTable attributes) that can be used to record the names and types of variables, and the bytecode ranges at which they exist. Depending on the compiler, they may or may not be written by default. You can force javac to emit them by passing -g:vars, but I believe they're omitted by default.

Call sites

One of the biggest issues for decompilers, mostly affecting generic inference within method bodies, is that call sites invoking generic methods retain no information about type arguments. That creates huge headaches for APIs like Java 8 Streams, where generic operators get chained together, each one accepting anonymously typed lambdas (which may be contravariant in their argument types and covariant in their return types). That's a type inference nightmare, but it's an issue for any code that happens to interact with generics. That kind of code doesn't become substantially harder to decompile simply because it exists within a generic type.
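To make the call-site problem concrete, here is a small sketch (the class and method names are hypothetical). The two methods differ only in whether the type arguments are spelled out at the call sites. Because the invocation instructions reference only erased method descriptors, the two bodies should compile to essentially the same bytecode, so a decompiler has to re-infer the arguments either way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CallSiteDemo {
    // Type arguments written explicitly at each generic call site.
    static List<Integer> withExplicitArguments(List<String> words) {
        return words.stream()
                .<Integer>map(String::length)
                .collect(Collectors.<Integer>toList());
    }

    // The same pipeline, with the type arguments left to inference.
    static List<Integer> withInference(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "abc");
        // Both calls behave identically; the explicit arguments exist only in source.
        System.out.println(withExplicitArguments(words));
        System.out.println(withInference(words));
    }
}
```

Running `javap -c` on the compiled class is one way to compare the two method bodies yourself.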

How this affects decompilation

Modern Java decompilers like Procyon and CFR should be able to reconstruct generic types reasonably well. If the local variable metadata is available, the results should be pretty close to the original code. If not, they'll have to try to infer generic type arguments in method bodies based on data flow analysis. Essentially, the decompiler must look at what data flows in and out of generic instantiations, and use what it knows about the type of that data to guess the type arguments. Sometimes it works really well; other times, not so much (see earlier comment about Java 8 Streams).

At the API level, though—type and member signatures—the results should be spot-on.

Caveats

Strictly speaking, all of the metadata described here is optional: it's only needed at compile time (or decompile time). If someone has run their compiled classes through an obfuscator, optimizer, or some other utility, all of this information could get stripped out. It won't make a difference at runtime.

TL;DR: Conclusion

Yes, it is certainly possible to decompile generic types and methods with their type parameters intact. Assuming the required metadata is present, getting the type and member signatures right is the 'easy' part. Correctly inferring the type arguments of generic instances and method invocations is the tricky bit, but that's a problem for any code that happens to interact with generics.

As mentioned, Procyon and CFR should both do a pretty decent job of restoring generic types and methods.

Mike Strobel answered Oct 08 '22 23:10



That depends mostly on whether the code has been obfuscated. While it is true that generics use type erasure, compilers typically include source-level information such as generic types as metadata in the classfile for various reasons: reflection, debugging, compilation against closed-source libraries, etc.

So for a well-behaved classfile, it should be possible to get the information back. Whether there are any off-the-shelf tools for this, I don't know. A lot of decompilers do try to recover generic types, but I don't know how reliable they are.

If the code has been obfuscated, the metadata will often have been stripped out, in which case there is no hope of recovering the original generic types from the classfile alone.

Antimony answered Oct 08 '22 22:10
