 

Zip / 7zip Compression Differences

Tags:

zip

7zip

I have around 130 zip files that I need to distribute to users. Each zip file contains a number of similar text, html, xml, and jpg files. Together, the zip files come to 146mb; unzipped, their contents total 551mb.

I want to distribute all these files together to users in as small a format as possible. I looked into two different ways of doing it, each using two different compression schemes, zip and 7zip (which I understand is either LZMA or a variant thereof):

  1. Compress all the zip files into a compressed file and send that file (single.zip/7z)
  2. Compress the unzipped contents of the zip files into a compressed file and send that file (combined.zip/7z)

For example, say that I have 3 zip files, A.zip, B.zip and C.zip, each of which contains one text file, one html file, and one XML file. With method 1, a single compressed file would be created containing A.zip, B.zip and C.zip. With method 2, a single compressed file would be created containing A.txt, A.html, A.xml, B.txt, B.html, B.xml, C.txt, C.html, and C.xml.

My assumption was that under either compression scheme, the file generated by method 2 would be smaller or at least the same size as the file generated by method 1, as you might be able to exploit efficiencies by considering all the files together. At the very least, method 2 would avoid the overhead of multiple zip files.

The surprising results (the sizes of files generated by the 7zip tool) were as follows:

  1. single.zip - 142mb
  2. single.7z - 124mb
  3. combined.zip - 149mb
  4. combined.7z - 38mb

I'm not surprised that the 7zip format produced smaller files than the zip format (result 2/4 vs result 1/3), as it generally compresses better than zip. What was surprising was that for the zip format, compressing all 130 zip files together resulted in a smaller output file than compressing all their uncompressed contents (result 3 vs result 1).

Why is it more efficient to zip several zip files together, than to zip their unzipped contents together?

The only thing I can think of is that during compression, the 7zip format builds a dictionary across all the file contents, so it can exploit similarities between files, while the zip format builds the dictionary per-file. Is that true? And even that still doesn't explain why result 3 was 7mb larger than result 1.
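One way to check that hypothesis is a small experiment. The following is only a sketch under stated assumptions: it does not call the zip or 7-Zip tools. It uses Python's `zlib` (Deflate, the method zip normally uses) applied to each file separately, and Python's `lzma` module applied to all contents concatenated, as stand-ins for "per-file" and "across-files" compression. The helper names and the synthetic near-identical files are made up for illustration.

```python
import lzma
import random
import zlib


def make_similar_files(count=20, seed=0):
    """Build synthetic files that share most of their content (hypothetical data)."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz <>/=\n"
    # A block of hard-to-compress text that every file repeats verbatim.
    shared = "".join(rng.choice(alphabet) for _ in range(20000))
    files = []
    for i in range(count):
        unique = f"<title>document {i}</title>\n"
        files.append((unique + shared).encode("ascii"))
    return files


def per_file_size(files):
    """Compress each file on its own, so no redundancy between files is visible."""
    return sum(len(zlib.compress(data, 9)) for data in files)


def solid_size(files):
    """Compress all contents as one stream, so repeats across files can be matched."""
    return len(lzma.compress(b"".join(files)))


files = make_similar_files()
raw = sum(len(data) for data in files)
independent = per_file_size(files)
solid = solid_size(files)

print(f"raw bytes:             {raw}")
print(f"per-file (zlib) total: {independent}")
print(f"one stream (lzma):     {solid}")
```

With files this similar, the single-stream total should come out far smaller than the per-file total, which is the same pattern as combined.7z versus combined.zip. It says nothing about the 7mb gap between result 1 and result 3, which depends on how the original zip files were created.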

Thanks for your help.

Colen asked Feb 24 '14 15:02




1 Answer

  • Both .zip and .7z are lossless compression formats. .7z is newer and is likely to give you a better compression ratio, but it's not as widely supported as .zip, and I think it's somewhat more computationally expensive to compress/decompress.

  • How much better depends on the types of files you are compressing, but according to the Wikipedia article on 7-Zip:

In 2011, TopTenReviews found that the 7z compression was at least 17% better than ZIP,[15] and 7-Zip's own site has since 2002 reported that while compression ratio results are very dependent upon the data used for the tests, "Usually, 7-Zip compresses to 7z format 30–70% better than to zip format, and 7-Zip compresses to zip format 2–10% better than most other zip-compatible programs."[16]

Anmol Tomer answered Oct 26 '22 13:10