
Can the Java 8 compiler be forced into creating reproducible class files?

My employer has a business need to make Java builds byte-for-byte reproducible. I am aware of the difficulties in making JAR files reproducible (due to archiving order and time stamps), but at this point I’m talking about class files.

I have builds of the same code using Java 8u65, both on Mac and on Linux. The resulting class files are not byte-identical. Both decompile back to the same source; the difference only shows up in the output of the javap disassembler.

The source code seems to be:

final TrustStrategy acceptingTrustStrategy =
              (X509Certificate[] chain, String authType) -> true;

On one build, the result is:

private static boolean lambda$restTemplate$38(java.security.cert.X509Certificate[], java.lang.String) throws java.security.cert.CertificateException;
        Code:
           0: iconst_1
           1: ireturn
     

On the other, it is:

private static boolean lambda$restTemplate$15(java.security.cert.X509Certificate[], java.lang.String) throws java.security.cert.CertificateException;
        Code:
           0: iconst_1
           1: ireturn

Anonymous lambdas are getting names with different numbers in them (lambda$restTemplate$15 versus lambda$restTemplate$38).

It appears that, when I rebuild on the same host, I get the same bytes. When the host differs, the numbers change; two Linux hosts produced different bytes.

What determines these numbers? Is there a way to force every compilation to use the same numbers here, and thus produce the same class files? Or is Java 8 class file compilation nondeterministic?

Robert Mandeville asked Mar 13 '19 20:03


3 Answers

I haven't looked into this deeply, but this DZone article discusses reproducible builds in Java, and reproducible-builds has some tools that help make builds (and classes) reproducible.

The link you're probably looking for is the Reproducible Build Maven Plugin, made specifically for Java to try to "strip non-reproducible data from the generated artifacts".

Major answered Sep 24 '22 18:09


The compiler numbers lambda expressions itself, incrementing a counter each time it encounters another lambda expression.

If the files are read by the compiler in the same order, it should give the same compiled classes.
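If file order is indeed what drives the numbering, one way to take the filesystem out of the picture is to hand the compiler an explicitly sorted list of sources. The following is a minimal sketch under that assumption; the class name SortedSources and the method list are hypothetical, and it only builds the list, it does not invoke the compiler.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects .java files under a root in a stable order.
public class SortedSources {

    // Returns paths relative to root, with '/' separators, sorted as plain strings,
    // so the result does not depend on the directory listing order of the host.
    static List<String> list(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .filter(name -> name.endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The resulting list could be written to a text file, one path per line, and passed to javac as an argument file (javac @sources.txt). Whether this alone makes the lambda numbers match across hosts is untested here.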

In any case, since you are building the code yourself, you could simply replace the lambda expressions with anonymous class declarations.
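As a rough illustration of that workaround, the lambda from the question could be written as an anonymous class. The interface TrustCheck below is a hypothetical stand-in for the TrustStrategy type used in the question, declared here only so the sketch compiles on its own; the class name AnonymousTrust is also an assumption.

```java
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;

public class AnonymousTrust {

    // Hypothetical stand-in for the TrustStrategy interface from the question.
    interface TrustCheck {
        boolean isTrusted(X509Certificate[] chain, String authType) throws CertificateException;
    }

    // Same behavior as: (X509Certificate[] chain, String authType) -> true
    static TrustCheck acceptAll() {
        return new TrustCheck() {
            @Override
            public boolean isTrusted(X509Certificate[] chain, String authType) {
                return true;
            }
        };
    }
}
```

Written this way, the compiler emits a separate class file for the anonymous class instead of a synthetic lambda$... method in the enclosing class. Anonymous classes are numbered in order of appearance within their enclosing top-level class, so the generated name should not depend on what else is compiled alongside it.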

EDIT: I just noticed you said the classes are built on two different operating systems. This can introduce differences in the compilation phase. For a build to be reproducible, it must be performed on the same architecture. Is there a reason you cannot deploy the artifacts as built on one platform (either macOS or Linux)?

ebigeon answered Sep 24 '22 18:09


As mentioned in the DZone article linked in Major's answer, for Gradle this is all you need:

tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}

After adding this to build.gradle, the md5sum of the .jar file was stable between builds on the same system. I could not test with other systems because everyone I asked had different compiler versions, and that makes the build different.
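To show what those two settings address, here is a rough Java sketch of the same idea using java.util.zip directly: entries are written in sorted order and every entry gets the same fixed timestamp. This is not how Gradle implements it; the class name DeterministicArchive and the chosen timestamp are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DeterministicArchive {

    // Arbitrary fixed timestamp (2000-01-01T00:00:00Z), an assumption for this sketch.
    static final long FIXED_TIME_MILLIS = 946684800000L;

    // Writes the given entries (name -> content) into an in-memory ZIP archive.
    static byte[] write(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            // Copying into a TreeMap gives sorted iteration regardless of input order.
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                // Same modification time for every entry, instead of the file's own.
                entry.setTime(FIXED_TIME_MILLIS);
                zip.putNextEntry(entry);
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }
}
```

Two calls with the same entries, supplied in any order, should yield identical bytes within one JVM. Note that ZipEntry.setTime converts the value using the default time zone, so hosts in different time zones could still produce different archives unless the zone is pinned as well.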

Luc answered Sep 24 '22 18:09