I have a fat/uber JAR generated by the Gradle Shadow plugin. I often need to send the fat JAR over the network, so it would be convenient to send only the delta of the file instead of about 40 MB of data. rsync is a great tool for this purpose. However, a small change in my source code leads to a large change in the final fat JAR, and consequently rsync does not help as much as it could.
Can I convert the fat JAR to an rsync-friendly JAR?
There are two ways to do this, both of which involve turning compression off: first with Gradle, then with the jar tool.
You can do this in Gradle (this part of the answer actually came from the OP):
shadowJar {
    zip64 true
    entryCompression = org.gradle.api.tasks.bundling.ZipEntryCompression.STORED
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
    manifest {
        attributes 'Main-Class': 'com.my.project.Main'
    }
}
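together with:
jar {
    manifest {
        attributes(
            'Main-Class': 'com.my.project.Main'
        )
    }
}
task fatJar(type: Jar) {
    manifest.from jar.manifest
    // 'classifier' was removed in Gradle 8; newer builds use archiveClassifier
    classifier = 'all'
    // entryCompression is not set here, so this task keeps the default (DEFLATED)
    from {
        // configurations.runtime was removed in Gradle 7; newer builds use runtimeClasspath
        configurations.runtime.collect { it.isDirectory() ? it : zipTree(it) }
    } {
        // Signature files from dependencies are no longer valid in a merged JAR
        exclude "META-INF/*.SF"
        exclude "META-INF/*.DSA"
        exclude "META-INF/*.RSA"
    }
    with jar
}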
The key thing here is that compression has been turned off, i.e.
org.gradle.api.tasks.bundling.ZipEntryCompression.STORED
You can find the docs here:
https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/bundling/ZipEntryCompression.html#STORED
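To confirm that a built JAR really contains stored (uncompressed) entries, one option is to inspect it with a short script. The following is a minimal Python sketch, not part of the original answer. The function name compression_methods and the demo entry names are made up for illustration, and the demo builds a throwaway archive rather than a real shadow JAR.

```python
import os
import tempfile
import zipfile
from collections import Counter


def compression_methods(jar_path):
    """Count the entries of a ZIP/JAR file per compression method."""
    names = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(jar_path) as jar:
        return Counter(names.get(info.compress_type, "other")
                       for info in jar.infolist())


# Self-contained demo on a throwaway archive (hypothetical entry names).
with tempfile.TemporaryDirectory() as tmp:
    demo_jar = os.path.join(tmp, "demo.jar")
    with zipfile.ZipFile(demo_jar, "w", compression=zipfile.ZIP_STORED) as jar:
        jar.writestr("com/example/A.class", b"\xca\xfe\xba\xbe" * 64)
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    demo_counts = compression_methods(demo_jar)

print(demo_counts)
```

Pointing compression_methods at the path of your own fat JAR should report only "stored" entries if the Gradle setting took effect.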
Yes, you can speed it up by about 40% for a new archive and by more than 200% for a JAR you have already rsync'd. The trick is to not compress the JAR, so that you can take advantage of rsync's chunking algorithm.
I used the following commands to package a directory with a lot of class files...
jar cf0 uncompressed.jar .
jar cf compressed.jar .
This created the following two jars...
-rw-r--r-- 1 rsync jar 28331212 Apr 13 14:11 ./compressed.jar
-rw-r--r-- 1 rsync jar 38746054 Apr 13 14:10 ./uncompressed.jar
Note that the size of the uncompressed Jar is about 10MB larger.
I then rsync'd these files and timed the transfers using the following commands. (Note: even turning on rsync's own compression with -z for the compressed file had little effect; I'll explain later.)
Compressed Jar
time rsync -av -e ssh compressed.jar [email protected]:/tmp/
building file list ... done
compressed.jar
sent 28334806 bytes received 42 bytes 2982615.58 bytes/sec
total size is 28331212 speedup is 1.00
real 0m9.208s
user 0m0.248s
sys 0m0.483s
Uncompressed Jar
time rsync -avz -e ssh uncompressed.jar [email protected]:/tmp/
building file list ... done
uncompressed.jar
sent 11751973 bytes received 42 bytes 2136730.00 bytes/sec
total size is 38746054 speedup is 3.30
real 0m5.145s
user 0m1.444s
sys 0m0.219s
We have gained a speedup of nearly 50%. This at least speeds up the initial rsync and gives a good boost, but what about subsequent rsyncs where a small change has been made?
I removed one class file that was 170 bytes in size from the directory and recreated the JARs; now they are this size:
-rw-r--r-- 1 rsync jar 28330943 Apr 13 14:30 compressed.jar
-rw-r--r-- 1 rsync jar 38745784 Apr 13 14:30 uncompressed.jar
Now the timings are very different.
Compressed Jar
building file list ... done
compressed.jar
sent 12166657 bytes received 31998 bytes 2217937.27 bytes/sec
total size is 28330943 speedup is 2.32
real 0m5.435s
user 0m0.378s
sys 0m0.335s
Uncompressed Jar
building file list ... done
uncompressed.jar
sent 220163 bytes received 43624 bytes 175858.00 bytes/sec
total size is 38745784 speedup is 146.88
real 0m1.533s
user 0m0.363s
sys 0m0.047s
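The "speedup" figure rsync prints is the total file size divided by the bytes actually transferred (sent plus received). As a check on the numbers above, here is a small Python sketch, not part of the original answer, that recomputes them from the transcripts. The helper names rsync_speedup and time_saved_percent are made up for illustration.

```python
def rsync_speedup(total_size, sent, received):
    """Ratio rsync reports as 'speedup': file size over bytes transferred."""
    return total_size / (sent + received)


def time_saved_percent(before_s, after_s):
    """Percentage of wall-clock time saved going from before_s to after_s."""
    return 100 * (1 - after_s / before_s)


# (total size, bytes sent, bytes received) copied from the rsync output above
runs = {
    "compressed, first sync": (28331212, 28334806, 42),
    "uncompressed, first sync": (38746054, 11751973, 42),
    "compressed, second sync": (28330943, 12166657, 31998),
    "uncompressed, second sync": (38745784, 220163, 43624),
}

speedups = {name: round(rsync_speedup(*args), 2) for name, args in runs.items()}
first_sync_saved = time_saved_percent(9.208, 5.145)   # "real" times, first sync
second_sync_saved = time_saved_percent(5.435, 1.533)  # "real" times, second sync

for name, value in speedups.items():
    print(f"{name}: speedup {value:.2f}")
print(f"first sync: {first_sync_saved:.0f}% less time")
print(f"second sync: {second_sync_saved:.0f}% less time")
```

On the second sync the uncompressed JAR transfers roughly 260 KB in total instead of roughly 12 MB, which is where the large speedup figure comes from.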
So we can speed up rsyncing large JAR files a lot using this method. The reason is related to information theory. Compressing data in effect removes everything that is redundant, so what is left looks very much like random data; the better the compressor, the more of this redundancy it removes. As a result, with most compression algorithms a small change anywhere in the input has a dramatic effect on the compressed output.
Zip compression effectively makes it harder for rsync to find blocks whose checksums are the same on the server and the client, which means it needs to transfer more data. When you leave the JAR uncompressed, you let rsync do what it's good at: sending less data to sync the two files.
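The effect can be illustrated with a rough Python sketch. This is not part of the original answer and not rsync's actual algorithm: it splits the old data into fixed-size blocks and counts how many of them still occur, at any byte offset, in the new data, which approximates what a receiver could reuse. The block size, the test data, and the name shared_fraction are assumptions made for illustration, and the data is compressed as one zlib stream for simplicity.

```python
import random
import zlib

BLOCK = 256  # illustrative block size; real rsync derives its own from the file size


def shared_fraction(old, new, block=BLOCK):
    """Fraction of old's fixed-size blocks that appear at any byte offset in new."""
    blocks = [old[i:i + block] for i in range(0, len(old) - block + 1, block)]
    return sum(1 for b in blocks if b in new) / len(blocks)


# Compressible stand-in for class-file data (made-up content, fixed seed).
rng = random.Random(0)
words = [b"class", b"public", b"static", b"void",
         b"import", b"return", b"String", b"final"]
old_raw = b" ".join(rng.choice(words) for _ in range(20000))
# A small edit near the start of the data.
new_raw = old_raw[:1000] + b"a small edit " + old_raw[1000:]

raw_shared = shared_fraction(old_raw, new_raw)
compressed_shared = shared_fraction(zlib.compress(old_raw), zlib.compress(new_raw))

print(f"uncompressed: {raw_shared:.1%} of blocks reusable")
print(f"compressed:   {compressed_shared:.1%} of blocks reusable")
```

The uncompressed data should lose only the block that contains the edit, while the compressed stream is expected to share far fewer blocks. Note that a JAR compresses each entry on its own rather than as one stream, so this sketch illustrates the principle rather than reproducing the measurements above.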