The legacy project I am working on includes an external library in the form of a set of binary JAR files. For analysis and potential patching, we decided to obtain the sources of this library, use them to build new binaries, and switch to those binaries after sufficiently long and detailed regression testing.
Assume that we have already retrieved and built the sources (I am actually still in the planning phase). Before real testing, I would like to perform some "compatibility checks" to rule out the possibility that the sources represent something dramatically different from what is in the "old" binaries.
Using the javap tool, I was able to extract the version of the JDK used for compilation (at least I believe it is the JDK version). It reports that the binaries were built with major version 46 and minor version 0. According to this article, that maps to JDK 1.2.
Assume that the same JDK would be used to compile the sources.
The question is: Is there a reliable and reasonably efficient way to verify whether both sets of binaries were built from the same sources? I would like to know whether all method signatures and class definitions are identical, and whether most or all of the method implementations are identical or at least similar.
The library is pretty big, so I think a detailed manual analysis of the decompiled binaries may not be an option.
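For reference, the version numbers that javap reports come from the first eight bytes of every class file: a magic number `0xCAFEBABE`, then the minor and major version as unsigned 16-bit big-endian values. Below is a minimal sketch of reading them directly; the class and method names are hypothetical. Note that the major version records the class-file format the compiler targeted, which is not necessarily the version of the JDK that did the compiling.

```java
import java.nio.ByteBuffer;

public class ClassVersionReader {
    /**
     * Reads the class-file version from the first 8 bytes of a .class file.
     * Returns an array {major, minor}.
     */
    public static int[] readVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("Need at least 8 bytes of class-file header");
        }
        // ByteBuffer is big-endian by default, matching the class-file format.
        ByteBuffer buffer = ByteBuffer.wrap(header);
        int magic = buffer.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        int minor = buffer.getShort() & 0xFFFF; // treat as unsigned
        int major = buffer.getShort() & 0xFFFF; // treat as unsigned
        return new int[] { major, minor };
    }
}
```

Running this over every class in both the old and the rebuilt JARs is a cheap way to confirm that the two builds target the same class-file version before doing any deeper comparison.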
I suggest a multi-stage process:
Apply the previously suggested Jardiff or similar to see if there are any API differences. If possible, pick a tool that has an option for reporting private methods etc. In practice, any substantial implementation change in Java is likely to change some methods and classes, even if the public API is unchanged.
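If no suitable tool is at hand, a rough substitute for the API comparison is to dump the members of each class with `javap -p` (which includes private members) and compare the resulting lines as sets. The sketch below assumes you have already collected those lines for the old and the rebuilt class; `SignatureDiff` and `onlyIn` are hypothetical names, and this is a plain set difference, not what Jardiff does internally.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

public class SignatureDiff {
    /**
     * Returns the non-blank, trimmed signature lines that occur in
     * {@code first} but not in {@code second}, in sorted order.
     */
    public static Set<String> onlyIn(Collection<String> first, Collection<String> second) {
        Set<String> result = new TreeSet<>();
        for (String line : first) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        for (String line : second) {
            result.remove(line.trim());
        }
        return result;
    }
}
```

Calling `onlyIn(oldLines, newLines)` lists members that disappeared, and `onlyIn(newLines, oldLines)` lists members that were added. Two empty results for every class indicate an API match at the signature level.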
If you have an API match, compile a few randomly selected files with the indicated compiler, decompile the result and the original class files, and compare the results. If they match, apply the same process to larger and larger bodies of code until you either find a mismatch, or have checked everything.
Diffs of decompiled code are more likely to give you clues about the nature of the differences, and are easier to filter for non-significant differences, than the actual class files.
If you get a mismatch, analyze it. It may be due to something you do not care about. If so, try to construct a script that will delete that form of difference and resume the compile-and-compare process. If you get widespread mismatches, experiment with compiler parameters such as optimization. If adjustments to the compiler parameters eliminate the differences, continue with the bulk comparison. The objective in this phase is to find a combination of compiler parameters and decompiled code filters that produce a match on the sample files, and apply them to bulk comparison of the library.
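As an illustration of a decompiled-code filter, here is a minimal sketch that removes comments and collapses whitespace before comparing two decompiled sources. The class and method names are hypothetical, and which differences count as non-significant is an assumption you would adjust after looking at real mismatches. The regular expressions are deliberately naive: they would also strip `//` or `/* */` sequences inside string literals.

```java
import java.util.regex.Pattern;

public class DecompiledNormalizer {
    // Naive patterns: they do not recognize comment markers inside string literals.
    private static final Pattern BLOCK_COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    private static final Pattern LINE_COMMENT = Pattern.compile("//[^\\n]*");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Strips comments and collapses every run of whitespace to a single space. */
    public static String normalize(String source) {
        String text = BLOCK_COMMENT.matcher(source).replaceAll(" ");
        text = LINE_COMMENT.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    /** True if the two decompiled sources are equal once normalized. */
    public static boolean sameAfterNormalizing(String first, String second) {
        return normalize(first).equals(normalize(second));
    }
}
```

Each new kind of harmless difference you identify becomes one more replacement step in `normalize`, so the filter grows alongside the compile-and-compare loop.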
If you cannot get a reasonably close match in the decompiled code, you probably do not have the right source code. Even so, if you have an API match it may be worth building your system and running your tests using the result of the compilation. If your tests run at least as well with the version you built from source, continue work using it.