
Bulk Compress (Zip) files

Use case: Our users have many objects in our AWS S3 account. We are adding a feature to download entire projects at once. We are more concerned with efficiency than with storage.

After looking at different options (ZipArchive, PclZip) I came across this guide recommending the use of Chilkat.

Its method makes a lot of sense. In summary:

  • Prezip each file on upload and store it in S3
  • "Project Download" downloads each compressed file, then uses QuickAppend (Chilkat terminology) to add it "instantly" (200ms per file) to the overall compressed file
  • Upload new Zip file to S3, provide link
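
The "prezip on upload" step could be done with PHP's built-in ZipArchive, along these lines. This is only a minimal sketch: `prezipFile` and its parameters are hypothetical names, the paths are assumed to be local files, and the S3 upload itself is left out.

```php
<?php
// Hypothetical helper (not from the guide): compress one uploaded file
// into its own zip archive so it can be stored in S3 pre-compressed.
function prezipFile(string $sourcePath, string $zipPath): bool
{
    $zip = new ZipArchive;
    // CREATE | OVERWRITE starts a fresh archive even if $zipPath already exists.
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }

    // Store the entry under its bare file name rather than the full local path.
    $added = $zip->addFile($sourcePath, basename($sourcePath));

    // close() is where the file is actually read, compressed and written.
    $closed = $zip->close();

    return $added && $closed;
}
```

The resulting archive would then be uploaded to S3 in place of (or alongside) the original file.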

The issue I am running into is that a Chilkat license costs $249, and I am looking for free alternatives.

A free alternative uses a similar concept:

  • Prezip each file on upload and store it in S3
  • "Project Download" starts downloading each compressed file, then tars them together
  • Upload new Zip file to S3, provide link
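
The "tar them together" step can be sketched with PHP's bundled PharData class, which writes plain tar archives. This is an illustration under assumptions: `tarFiles` is a hypothetical name, and the pre-zipped files are assumed to have already been downloaded from S3 to local paths.

```php
<?php
// Hypothetical helper: bundle already-compressed files into one tar archive.
// tar adds no compression of its own, so the pre-zipped files are only copied.
function tarFiles(array $paths, string $tarPath): int
{
    // PharData picks the tar format from the ".tar" extension of $tarPath.
    $tar = new PharData($tarPath);

    $count = 0;
    foreach ($paths as $path) {
        // Use the bare file name as the entry name inside the archive.
        $tar->addFile($path, basename($path));
        ++$count;
    }

    return $count;
}
```

Since each member is already compressed, the cost here should be dominated by disk and network I/O rather than CPU.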

Is there a "standard" or "ideal" way for dealing with this?

Kerry Jones asked Mar 14 '14 12:03



1 Answer

On my local system, PHP's built-in zip library is able to merge a 10-file, 24MB zip into a 21-file, 51MB zip in about 870ms, or roughly 87ms per file added. That compares well with the 200ms/file you reported, but I'm not sure how large your files are or what type of hardware you're using.

Unlike the Java library that the author of your guide initially used, PHP's zip library is implemented in C, so you won't see the same Java-to-C performance gains that the author saw. Having said that, I don't know how Chilkat's QuickAppend works or how it compares to PHP's zip library, but appending to pre-zipped files, whether you do it with PHP or Chilkat, does seem to be the fastest solution.

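// Merge every file in a.zip into the existing archive b.zip.
$destination = new ZipArchive;
$source = new ZipArchive;

if ($source->open('a.zip') === TRUE
    && $destination->open('b.zip') === TRUE) {

    $time_start = microtime(true);

    // Extract the source archive into a unique temporary directory.
    $temp_dir = "/tmp/zip_" . uniqid();
    mkdir($temp_dir, 0700, true);
    $source->extractTo($temp_dir);
    $source->close();

    // scandir() is not recursive, so this assumes a.zip has no subdirectories.
    $files = scandir($temp_dir);
    $file_count = 0;

    foreach ($files as $file) {
        if ($file == '.' || $file == '..')
            continue;

        // Pass the bare file name as the entry name; without it the entry
        // is stored under the temporary directory's path.
        $destination->addFile("$temp_dir/$file", $file);
        ++$file_count;
    }

    // close() reads and compresses the added files, so they must still exist here.
    $destination->close();
    exec("rm -rf " . escapeshellarg($temp_dir) . " &");

    $time_end = microtime(true);
    $time = $time_end - $time_start;

    print "Added $file_count files in " . ($time * 1000) . "ms \n";
}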

Output

-rw-rw-r-- 1 fuzzytree fuzzytree 24020997 Jun  4 15:57 a.zip
-rw-rw-r-- 1 fuzzytree fuzzytree 51418980 Jun  4 15:57 b.zip

fuzzytree@atlas:~/testzip$ php zip.php 
Added 10 files in 872.43795394897ms

fuzzytree@atlas:~/testzip$ ls -ltr *zip
-rw-rw-r-- 1 fuzzytree fuzzytree 24020997 Jun  4 15:57 a.zip
-rw-rw-r-- 1 fuzzytree fuzzytree 75443030 Jun  4 15:57 b.zip
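
A variation that avoids the temporary directory is to copy entries from one archive to the other in memory with `getFromIndex()` and `addFromString()`. This is a sketch, not a benchmarked replacement: `mergeZip` is a hypothetical name, each entry is held fully in memory, and the data is still decompressed and recompressed rather than copied raw.

```php
<?php
// Hypothetical helper: copy every file entry of $sourcePath into $destPath
// without extracting to disk. Returns the number of entries copied.
function mergeZip(string $sourcePath, string $destPath): int
{
    $source = new ZipArchive;
    $destination = new ZipArchive;

    if ($source->open($sourcePath) !== true) {
        throw new RuntimeException("Cannot open $sourcePath");
    }
    // CREATE opens an existing archive or starts a new one if it is missing.
    if ($destination->open($destPath, ZipArchive::CREATE) !== true) {
        $source->close();
        throw new RuntimeException("Cannot open $destPath");
    }

    $count = 0;
    for ($i = 0; $i < $source->numFiles; $i++) {
        $name = $source->getNameIndex($i);
        // Skip directory entries; their names end with a slash.
        if ($name === false || substr($name, -1) === '/') {
            continue;
        }
        // Read the uncompressed contents of the entry into memory.
        $data = $source->getFromIndex($i);
        if ($data === false) {
            continue;
        }
        // Queue the entry under the same name; an existing name is replaced.
        $destination->addFromString($name, $data);
        ++$count;
    }

    // The destination is compressed and written when it is closed.
    $destination->close();
    $source->close();

    return $count;
}
```

Usage would look like `mergeZip('a.zip', 'b.zip');`. For very large entries the temporary-directory approach may be the safer choice, since it does not hold each file in memory.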
FuzzyTree answered Oct 01 '22 22:10