 

How can we estimate “overhead” of a compressed file?

Suppose we compress, for example, a .txt file that is 7 bytes in size. After compression into a .zip file, the size becomes 190 bytes.

Is there a way to estimate or compute the approximate size of the “overhead”?

What factors affect the overhead size?

The zlib documentation quantifies its overhead: “... only expansion is an overhead of five bytes per 16 KB block (about 0.03%), plus a one-time overhead of six bytes for the entire stream.”

I mention that source only to show that it is possible to estimate the "overhead" size.

Note: overhead here means the extra data added to the compressed version of the data.
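
For reference, a minimal Python sketch (the payload and file name are placeholders, not the actual file from the question) that reproduces the effect: the same few bytes pick up only a handful of bytes of overhead through zlib, but far more once wrapped in a .zip archive:

    import io
    import zlib
    import zipfile

    payload = b"7 bytes"  # a tiny 7-byte payload, as in the question

    # zlib: raw Deflate plus a 2-byte header and a 4-byte Adler-32 checksum
    zlib_out = zlib.compress(payload)
    print("zlib size:", len(zlib_out))        # roughly 15 bytes

    # ZIP: the same data wrapped in a local file header, central directory and EOCD
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("note.txt", payload)
    print("zip size:", len(buf.getvalue()))   # well over 100 bytes, mostly headers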

asked Mar 12 '14 by user3184352



1 Answer

From the ZIP format specification:

Assuming that there is only one central directory and no comments and no extra fields, the overhead should be similar to the following. (The overhead will only go up if any additional metadata is added.)

  • Per file (Local file header) - 30+len(filename)
  • Per file (Data descriptor) - 12 (to 16)
  • Per file (Central directory header) - 46+len(filename)
  • Per archive (EOCD) - 22

So the overhead, where afn is the average length of all file names, and f is the number of files:

  f * ((30 + afn) + 12 + (46 + afn)) + 22
= f * (88 + 2 * afn) + 22
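
As a sketch of how that estimate might be coded (the helper name estimate_zip_overhead is made up here, and it assumes a single central directory, no comments, no extra fields, and a 12-byte data descriptor per entry):

    def estimate_zip_overhead(filenames):
        # Rough ZIP container overhead, in bytes, for one archive
        per_file = sum(30 + len(name)      # local file header
                       + 12                # data descriptor
                       + 46 + len(name)    # central directory header
                       for name in filenames)
        return per_file + 22               # end of central directory record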

This of course makes ZIP a very poor choice for very tiny bits of compressed data where a (file) structure or metadata is not required - zlib, on the other hand, is a very thin Deflate wrapper.

For small payloads, a poor Deflate implementation may also result in a significantly larger "compressed" size, such as the notorious .NET implementation.


Examples:

  • Storing 1 file, with name "hello world note.txt" (len = 20),

    = 1 * (88 + 2 * 20) + 22 = 150 bytes overhead

  • Storing 100 files, with an average file-name length of 14 characters,

    = 100 * (88 + 2 * 14) + 22 = 11622 bytes overhead
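
To sanity-check the first example, one can build an archive containing a single empty file with Python's zipfile module; note that zipfile records the sizes directly in the local header instead of emitting a data descriptor, so the actual archive comes out 12-16 bytes under the estimate above:

    import io
    import zipfile

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("hello world note.txt", b"")   # empty payload: size == pure overhead

    print(len(buf.getvalue()))   # 138 with CPython's zipfile: (30+20) + (46+20) + 22, no data descriptor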

answered Sep 18 '22 by user2864740