I use this code to create a .zip with a list of files:
try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile))) {
    byte[] buffer = new byte[1024];
    for (File srcFile : srcFiles) {
        // One entry per source file, named after the file (no directory part)
        zos.putNextEntry(new ZipEntry(srcFile.getName()));
        // try-with-resources closes the input stream even if a read or write fails
        try (InputStream fis = new FileInputStream(srcFile)) {
            int read;
            // read() returns -1 at end of stream
            while ((read = fis.read(buffer)) != -1) {
                zos.write(buffer, 0, read);
            }
        }
        zos.closeEntry();
    }
}
I don't know how the zip algorithm and ZipOutputStream work. If it writes something before I have read and sent all of the data to 'zos', the resulting file might differ in size depending on the buffer size I choose.
In other words, I don't know if the algorithm is like:
READ DATA-->PROCESS DATA-->CREATE .ZIP
or
READ CHUNK OF DATA-->PROCESS CHUNK OF DATA-->WRITE CHUNK IN .ZIP-->(back to READ CHUNK OF DATA, until no data is left)
If this is the case, what buffer size is the best?
Update:
I have tested this code, changing the buffer size from 1024 to 64 and zipping the same files: with the 1024-byte buffer the 80 KB result file was 3 bytes smaller than with the 64-byte buffer. Which buffer size is best for producing the smallest .zip in the fastest time?
Short answer: I would pick something like 16k.
Long answer:
ZIP uses the DEFLATE algorithm for compression (http://en.wikipedia.org/wiki/DEFLATE). DEFLATE belongs to the Lempel-Ziv family, but it is not LZW: it combines LZ77 with Huffman coding.
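To get a feel for what DEFLATE does with redundancy, here is a minimal sketch using java.util.zip.Deflater directly. The class name, helper names and sample data are made up for illustration; only Deflater itself is part of the JDK. It compresses 10,000 bytes of a repeating pattern and 10,000 pseudo-random bytes and prints the compressed sizes.

```java
import java.util.Random;
import java.util.zip.Deflater;

class DeflateRedundancyDemo {

    // Compresses `input` with DEFLATE (zlib wrapper, default level)
    // and returns the number of compressed bytes produced.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out);
        }
        deflater.end();
        return total;
    }

    // Illustrative data: "abcabcabc..." repeated up to n bytes.
    static byte[] repetitive(int n) {
        byte[] data = new byte[n];
        for (int i = 0; i < n; i++) {
            data[i] = (byte) "abc".charAt(i % 3);
        }
        return data;
    }

    // Illustrative data: n pseudo-random bytes with a fixed seed.
    static byte[] random(int n) {
        byte[] data = new byte[n];
        new Random(42).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        System.out.println("repetitive: " + deflatedSize(repetitive(10_000)) + " bytes");
        System.out.println("random:     " + deflatedSize(random(10_000)) + " bytes");
    }
}
```

The repeating input should shrink to a tiny fraction of its size because LZ77 replaces repeats with back-references, while the random input should stay at roughly its original size.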
DEFLATE is a dictionary-based compressor, and as far as I know, from the algorithm's standpoint the buffer size used when feeding data into the deflater should have almost no impact. What matters most for LZ77 is the size of the sliding window (the dictionary of recently seen data, 32 KB in DEFLATE), which is not controlled by the buffer size in your example.
I think you can experiment with different buffer sizes if you want and plot a graph, but I am sure you will not see any significant changes in compression ratio (3/80000 = 0.00375%).
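If you want to run that experiment without touching the disk, something like the following sketch should do. It zips the same in-memory data with several buffer sizes, using the same read/write loop as in the question, and prints the archive size for each. The class name, the helper methods and the generated sample text are my own assumptions, not anything from your project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipBufferSizeDemo {

    // Zips `data` as a single entry, copying it through a buffer of the
    // given size, and returns the size of the finished archive in bytes.
    static int zippedSize(byte[] data, int bufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out);
             InputStream in = new ByteArrayInputStream(data)) {
            zos.putNextEntry(new ZipEntry("sample.txt"));
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = in.read(buffer)) != -1) {
                zos.write(buffer, 0, read);
            }
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The archive is complete here: the try block has closed zos.
        return out.size();
    }

    // Illustrative input: words drawn from a small vocabulary with a fixed seed.
    static byte[] sampleText(int words) {
        String[] vocab = {"zip", "buffer", "deflate", "stream", "window", "entry", "huffman", "chunk"};
        Random rnd = new Random(1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            sb.append(vocab[rnd.nextInt(vocab.length)]).append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleText(20_000);
        System.out.println("input size: " + data.length);
        for (int bufferSize : new int[] {64, 1024, 16 * 1024}) {
            System.out.println("buffer " + bufferSize + " -> zip size " + zippedSize(data, bufferSize));
        }
    }
}
```

I would expect the reported sizes to be identical or to differ by only a handful of bytes, in line with the 3-byte difference you measured.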
The biggest impact the buffer size has is on the speed due to the amount of overhead code that is executed when you make the calls to FileInputStream.read and zos.write. From this point of view you should take into account what you gain and what you spend.
When increasing from 1 byte to 1024 bytes, you spend 1023 extra bytes of memory (in theory) and cut the number of .read and .write calls, and with it the per-call overhead, by a factor of about 1024. When increasing from 1k to 64k, however, you spend 63k more to reduce that overhead by only another factor of 64.
So this comes with diminishing returns, thus I would choose somewhere in the middle (let's say 16k) and stick with that.
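To put numbers on the diminishing returns, here is a rough sketch that counts how many read calls the copy loop makes for 1 MiB of data at different buffer sizes. CountingInputStream and readCalls are hypothetical helpers written for this illustration, and the data comes from a ByteArrayInputStream, so this only counts calls. With a real FileInputStream each of those calls also carries native I/O overhead.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadCallCounter {

    // Wraps a stream and counts invocations of read(byte[], int, int).
    // FilterInputStream.read(byte[]) delegates to this overload.
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads `totalBytes` bytes through a buffer of `bufferSize` bytes and
    // returns the number of read calls, including the final one that returns -1.
    static long readCalls(int totalBytes, int bufferSize) {
        CountingInputStream in =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        byte[] buffer = new byte[bufferSize];
        try {
            while (in.read(buffer) != -1) {
                // Data is discarded; only the number of calls matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.calls;
    }

    public static void main(String[] args) {
        int oneMiB = 1 << 20;
        for (int bufferSize : new int[] {1, 1024, 16 * 1024, 64 * 1024}) {
            System.out.println("buffer " + bufferSize + " -> " + readCalls(oneMiB, bufferSize) + " read calls");
        }
    }
}
```

Going from 1 byte to 1k removes over a million calls per MiB, while going from 16k to 64k removes only a few dozen, which is why a size in the middle is a reasonable place to stop.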