
Is decompression of pack200 deterministic and identical on all platforms?

Tags: java

I would like to distribute my 20-jar application as pack200 files, but I also need to provide file checksums for the sake of validation.

Because I am paranoid (thank you, JWS), I would like to also have checksums on decompressed files.

Is decompression of pack200 deterministic, and does it give identical results on all platforms (Win/Mac/Linux, 32- and 64-bit)?

In other words, can I decompress the files on one computer, compute their checksums, and expect them to always be identical if decompressed at other computers?

EDIT: Thanks for the comments. I am looking for some hard specification to confirm or deny this.

Making assumptions (even based on testing on a few machines) means risk.

Implementations may vary across platforms and Java versions. Even the same implementation could give different results (for example, in the order of items in the ZIP directory). That's why I ask whether the output is the same for all platforms and Java versions AND deterministic.


If this cannot be confirmed or denied, how about this follow-up question. How can I verify that after decompression a jar is valid? Thinking of half-finished files, gamma rays corrupting single bits in the file and whatnot.

Konrad Garus asked May 27 '11 11:05



2 Answers

I think this passage from the Pack200 standard is what you're looking for:

...However, for any given Pack200 archive, every decompressor is required to produce a particular byte-wise image for each class file transmitted. This requirement is placed on decompressors in order to make it possible for compressors to transmit information, such as message digests, which relates to the eventual byte-wise contents of transmitted class files. This section describes the restrictions placed on every decompressor that makes the byte-wise contents of its output files a well-defined function of its input.

This means you can do what you want here. JNF/Pack200 works by factoring out constants that are shared across classes and compressing the .class files intelligently. This portion of the standard acknowledges that class files COULD be reconstructed in several ways, which would make it impossible to verify them with digests. To avoid that problem, Pack200 specifies exactly how decoding must work: the output .class files may not be identical to the input .class files, but every Pack200 decompressor's output .class files will match those of every other Pack200 decompressor.

So your best bet is to pack the files with Pack200, unpack them, then compute an MD5 (or comparable) digest of the unpacked files and use that to verify them.
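One way to act on this is to digest each unpacked entry rather than the JAR file as a whole, so the checksum does not depend on container details such as entry order in the ZIP directory. The following is only a minimal sketch: `JarEntryDigests` and its methods are hypothetical names, SHA-256 is swapped in for MD5, and treating non-class entries the same way as class files is an assumption (the quoted guarantee speaks only of class files).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of any Pack200 tooling.
final class JarEntryDigests {

    // Maps each non-directory entry name to the hex SHA-256 of its
    // uncompressed bytes. A TreeMap keeps the result sorted by name, so
    // the order of entries inside the archive does not affect comparison.
    static Map<String, String> digestEntries(Path jar)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> digests = new TreeMap<>();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(jar);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                int n;
                // read() returns -1 at the end of the current entry.
                while ((n = zip.read(buffer)) != -1) {
                    md.update(buffer, 0, n);
                }
                digests.put(entry.getName(), toHex(md.digest()));
            }
        }
        return digests;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

You would compute this map once on the machine that builds the release, publish it, and compare it with `Map.equals` against the map computed on the client after unpacking.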

Hope that answers your question!

Travis answered Nov 16 '22 04:11



"I am looking for some hard specification to confirm or deny this."

@Travis's answer says that the reconstructed class files are not byte-for-byte identical to the original class files, and this (obviously) means that the JAR files won't be identical either.

Furthermore, none of the documentation says that unpack200 will produce identical JAR files across all platforms, and I wouldn't expect it to. (For a start, different platforms will be running different versions of unpack200 ...)

"If this cannot be confirmed or denied, how about this follow-up question. How can I verify that after decompression a jar is valid? Thinking of half-finished files, gamma rays corrupting single bits in the file and whatnot."

I don't think there's a way to do this either. If we assume that regenerated JAR files may be platform dependent, then we've no baseline to generate a checksum from.

I think your best bet is to send a high-quality checksum of the pack200 file, and trust that unpack200 will either work correctly or exit with a non-zero exit code when it fails ... as any correctly written utility should.
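A rough sketch of that approach follows, under stated assumptions: `PackFileCheck` and its methods are hypothetical names, SHA-256 stands in for "a high quality checksum", and the command passed to `runOrFail` is whatever unpacker you use (the `unpack200` tool shipped with older JDKs and is absent from JDK 14 and later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper for illustration only.
final class PackFileCheck {

    // Hex SHA-256 of the whole file, streamed so large files are fine.
    static String sha256Hex(Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // True if the file's digest equals the published one (case-insensitive).
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    // Runs an external command and fails loudly on a non-zero exit code.
    static void runOrFail(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException(
                "Command failed with exit code " + exit + ": " + command);
        }
    }
}
```

With hypothetical file names, usage might look like checking `PackFileCheck.matches(Path.of("app.pack.gz"), publishedHex)` first and only then calling `PackFileCheck.runOrFail(List.of("unpack200", "app.pack.gz", "app.jar"))`.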

BTW, if you are that worried about random errors, how are you going to detect "cosmic ray" effects when the JVM loads code from the JAR files? The sensible approach is to use ECC memory, etc., and leave this to the hardware to deal with.

Stephen C answered Nov 16 '22 04:11
