 

Estimating zip size/creation time

I need to create ZIP archives on demand, using either the Python zipfile module or Unix command-line utilities.

Resources to be zipped are often > 1GB and not necessarily compression-friendly.

How can I efficiently estimate an archive's creation time and size?

asked Apr 20 '09 by ohnoes




2 Answers

Extract a number of small parts from the big file, say 64 randomly selected chunks of 64 kB each.

Concatenate the data, compress it, and measure the time and the compression ratio. Since you've randomly selected parts of the file, chances are you have compressed a representative subset of the data.

Now all you have to do is extrapolate the size and time for the whole file from your test data, as in the sketch below.
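A minimal sketch of this sampling approach in Python (the question mentions the zipfile module; zlib stands in here for the DEFLATE codec that zipfile's ZIP_DEFLATED mode uses, and estimate_zip and "big_resource.bin" are hypothetical names):

```python
import os
import random
import time
import zlib

def estimate_zip(path, num_chunks=64, chunk_size=64 * 1024):
    # Hypothetical helper: compress randomly sampled chunks and scale the
    # measured ratio and time up to the whole file. zlib stands in for the
    # DEFLATE codec that zipfile's ZIP_DEFLATED mode uses.
    total_size = os.path.getsize(path)
    sample = bytearray()
    with open(path, "rb") as f:
        for _ in range(num_chunks):
            f.seek(random.randrange(max(1, total_size - chunk_size)))
            sample.extend(f.read(chunk_size))

    start = time.perf_counter()
    compressed = zlib.compress(bytes(sample), 6)  # 6 is zlib's usual default level
    elapsed = time.perf_counter() - start

    ratio = len(compressed) / len(sample)  # compressed bytes per input byte
    scale = total_size / len(sample)       # how much bigger the real file is
    return total_size * ratio, elapsed * scale  # (est. bytes, est. seconds)

# "big_resource.bin" is a placeholder path.
est_size, est_time = estimate_zip("big_resource.bin")
print(f"~{est_size / 1e6:.0f} MB compressed, ~{est_time:.1f} s to compress")
```

Note that the time estimate only covers compression itself; on a cold disk cache, I/O can dominate for multi-gigabyte files, so treat it as a lower bound.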

answered Sep 20 '22 by Nils Pipenbrinck


I suggest you measure the average time it takes to produce a zip of a given size, then calculate the estimate from that measurement; a sketch follows below. However, the estimate will be very rough in any case if you don't know how well the data compresses. If the data you want to compress has a very similar "profile" each time, you can probably make better predictions.
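A rough sketch of that calibration idea, again in Python; "representative_sample.dat" (a smaller file with a similar data profile) and "big_resource.bin" are placeholder paths:

```python
import os
import time
import zipfile

def calibrate(sample_path, archive_path="calibration.zip"):
    # Zip a representative sample once and derive per-byte figures.
    start = time.perf_counter()
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(sample_path)
    elapsed = time.perf_counter() - start
    in_bytes = os.path.getsize(sample_path)
    out_bytes = os.path.getsize(archive_path)
    return elapsed / in_bytes, out_bytes / in_bytes  # (seconds/byte, ratio)

secs_per_byte, ratio = calibrate("representative_sample.dat")
target_bytes = os.path.getsize("big_resource.bin")
print(f"estimated time: {secs_per_byte * target_bytes:.1f} s")
print(f"estimated size: {ratio * target_bytes / 1e6:.0f} MB")
```

The larger and more representative the calibration sample, the better the per-byte figures will transfer to the real workload.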

answered Sep 19 '22 by Skurmedel