Weird behaviour with ZipArchive() adding null bytes to archive

I have a simple Zip creation script that copies a load of files into a single directory and then creates a .zip file from that directory. This approach sounds simple, however the archives it produces have issues opening.

At first I was confused, since the archives open fine in tools like 7-Zip, WinRAR and so on, but fail to open in the Windows built-in archive opener. To rule out any issues with my main server, which runs Nginx + PHP-FPM on Fedora 16, I also tested on a more standard server running Apache and mod_php on Ubuntu.

In both cases the issue was the same: the archive always opened fine in tools like 7-Zip but failed in the Windows built-in opener. After some random digging I came up with the idea of opening the file in Notepad++ to check its initial headers.

It turns out that ZipArchive is doing two things it shouldn't be doing.

The first problem is simple: it's including the full path as a null path in the archive. It shouldn't be, but it is. This may be due to my recursion, though, so I can live with this bit.

The second issue is the big problem that's causing the files not to open. It's prepending a null byte at the very start of the archive. All I have to do is manually open the file in Notepad++, delete the byte and then save it, and voilà: the file opens in everything, including the Windows built-in opener, with no problems at all.


I've never encountered this before, and a quick Google search finds many issues with ZipArchive, but I couldn't find anything specific like this.

Here is my Zip create method:

private function zipcreate($source, $destination) {
    if (!extension_loaded('zip') || !file_exists($source)) {
        return false;
    }
    $zip = new ZipArchive();
    if (!$zip->open($destination, ZIPARCHIVE::CREATE)) {
        return false;
    }
    $source = str_replace('\\', '/', realpath($source));
    if (is_dir($source) === true) {
        $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($source), RecursiveIteratorIterator::SELF_FIRST);
        foreach ($files as $file) {
            $file = str_replace('\\', '/', realpath($file));
            if( in_array(substr($file, strrpos($file, '/')+1), array('.', '..')) )
                continue;

            $file = realpath($file);

            if (is_dir($file) === true) {
                $zip->addEmptyDir(str_replace($source . '/', '', $file . '/'));
            } else if (is_file($file) === true) {
                $zip->addFromString(str_replace($source . '/', '', $file), file_get_contents($file));
            }
        }
    } else if (is_file($source) === true) {
        $zip->addFromString(basename($source), file_get_contents($source));
    }
    return $zip->close();
}

Called by doing:

$this->zipcreate($newdirpath, getcwd()."/$siteid-CompliancePack.zip");

For reference phpinfo() on primary server:

phpinfo() for primary server

As requested, here are the first 60 bytes of the file in hex:

[root@sid tmp]# od --format=x1 --read-bytes=60 54709-CompliancePack.zip
0000000 50 4b 03 04 14 00 00 00 08 00 39 4e 92 45 59 28
0000020 27 b3 37 53 00 00 00 f2 00 00 36 00 00 00 47 45
0000040 4e 45 52 41 4c 5f 4e 65 77 20 53 69 74 65 20 20
0000060 48 6f 77 61 72 74 68 20 54 69 6d 62
0000074
[root@sid tmp]#
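
The dump above starts with 50 4b 03 04, which is the ZIP local file header signature ("PK" followed by bytes 03 and 04), so the file on the server has no extra leading byte. The same check can be scripted. This is only a minimal sketch: the helper names `startsWithZipSignature` and `leadingHex` are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): true when the data
// begins with the ZIP local file header signature "PK\x03\x04".
function startsWithZipSignature(string $bytes): bool
{
    return strncmp($bytes, "PK\x03\x04", 4) === 0;
}

// Hypothetical helper: space-separated hex of the first $count bytes,
// in the same style as the od output above.
function leadingHex(string $bytes, int $count = 8): string
{
    return implode(' ', str_split(bin2hex(substr($bytes, 0, $count)), 2));
}

// First 8 bytes taken from the od dump above.
$good = "PK\x03\x04\x14\x00\x00\x00";
// The same bytes with a stray leading space, as seen in the broken download.
$bad = ' ' . $good;

$goodIsZip = startsWithZipSignature($good);
$badIsZip  = startsWithZipSignature($bad);
```

Running the same check on `file_get_contents()` of the generated file and on the downloaded copy would show at which stage the extra byte appears.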

NEW DEVELOPMENT :) So I thought I'd try something completely different! I set up a WAMP stack on my Windows desktop (I normally test and develop exclusively on Linux).

I ran the portal site on the Windows machine reading data from the Linux primary server exactly the same as the live portal site does (only difference is live portal runs on Linux!)

This time the file was created perfectly; the difference is 1 byte! This is exactly the same code as on live, running against the same back-end server. The only difference is that the user-facing (portal) code is running on a Windows server rather than Linux.

[Images: Windows file; Linux file]

The file is created by the back-end server as a zip, then base64-encoded and returned via NuSOAP to the portal server, which then streams the file directly to the client browser with the following code. SitesClass.downloadCompliancePack is just a method that moves all the files into a temp folder and then runs the zipcreate method above, so nothing magical.

$result = $client->call('SitesClass.downloadCompliancePack', array('appusername' => 'xxx','apppassword' => 'xxx','apikey' => 'xxx','siteid' => 54709));
// Display the result
header('Content-type: application/octet-stream');
header('Content-disposition: attachment; filename="54709-CompliancePack.zip"');
$base = json_decode($result[2]);
echo base64_decode($base->FileData);

So now I'm even more confused, as a simple base64_decode shouldn't differ between Windows and Linux.
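
That expectation can be checked directly on each machine. The following is a minimal sketch, not taken from the original code, that round-trips every possible byte value (including NUL and space) through base64 and compares the result byte for byte; the variable names are illustrative only.

```php
<?php
// Build a payload containing every byte value from 0x00 to 0xFF.
$bytes = '';
for ($i = 0; $i < 256; $i++) {
    $bytes .= chr($i);
}

$encoded = base64_encode($bytes);
// Strict mode: returns false if the input contains invalid characters.
$decoded = base64_decode($encoded, true);

// True when decoding gives back exactly the original bytes.
$roundTripOk = ($decoded === $bytes);
```

If `$roundTripOk` is true on both platforms, the extra byte is more likely to come from something else that writes to the response than from the decoding step.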

UPDATE Jan 2015

Sorry for the delay, all those that have posted/helped so far; I've been a little busy and only just got around to looking at this!

I've done some testing based on the information posted below and I've narrowed down the point of failure! I now know exactly which bit of code is responsible for it. See the screenshot below.

The hex output in the text area is created by the following code.

<?php
// get configuration
include "system/config.php";
include "pages/pageclasses/carbon.class.php";
//////////// document action ///////////////
$sid = $_GET['sid'];
// Pull in the NuSOAP code
require_once('lib/nusoap.php');
// Create the client instance
$client = new nusoap_client($api_link); // using nusoap_client
// Call the SOAP method
$result = $client->call('SitesClass.downloadCompliancePack', array('appusername' => $api_username,'apppassword' => $api_password,'apikey' => $api_key,'siteid' => $sid));
// Display the result
$base = json_decode($result[2]);
echo "<textarea>".bin2hex(trim(base64_decode($base->FileData)))."</textarea>";
?>

The other block of code, whose output has a 0x20 byte (a space) prepended, is this:

<?php
// get configuration
include "system/config.php";
include "pages/pageclasses/carbon.class.php";
//////////// document action ///////////////
$sid = $_GET['sid'];
// Pull in the NuSOAP code
require_once('lib/nusoap.php');
// Create the client instance
$client = new nusoap_client($api_link); // using nusoap_client
// Call the SOAP method
$result = $client->call('SitesClass.downloadCompliancePack', array('appusername' => $api_username,'apppassword' => $api_password,'apikey' => $api_key,'siteid' => $sid));
// Display the result
header('Content-type:application/octet-stream');
header('Content-disposition:attachment;filename="'.$sid.'-CompliancePack.zip"');
$base = json_decode($result[2]);
echo trim(base64_decode($base->FileData));
?>

In both cases the code is run on the same front-end web server (Linux) and the same back-end/web services server (Linux). The only difference is that one outputs the file data to a textarea and the other outputs the file data to the browser in a direct stream.

Both code blocks are the entire file contents and there are no spaces before the php opening or after the php closing and just to be on the safe side none in the header() and none at the end of any line.

So now I'm in the rather odd situation of this block of code appearing to add a random space to the file before it's streamed:

header('Content-type:application/octet-stream');
header('Content-disposition:attachment;filename="'.$sid.'-CompliancePack.zip"');
echo trim(base64_decode($base->FileData));
Dave asked Dec 11 '14 09:12


2 Answers

Regarding your first issue, I would suggest using

  • ZipArchive::addGlob http://php.net/manual/en/ziparchive.addglob.php or
  • ZipArchive::addPattern() http://php.net/manual/en/ziparchive.addpattern.php

These functions have additional arguments to manipulate the filenames:

"remove_path"

Prefix to remove from matching file paths before adding to the archive.

And they seem to be doing the filesystem traversal job also.

The null characters may be related to the paths.

They mention here an old bug relating to empty files: http://grokbase.com/t/php/php-bugs/094pkepf54/48048-new-empty-files-corrupt-zip I don't really think it is relevant, but it may be worth trying to remove the empty files as a test.
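
As an alternative to addGlob/addPattern, the path handling can also be sketched with the iterator approach from the question. This is a hedged sketch, not a drop-in fix: `relativeEntryName` and `zipDirectory` are hypothetical names, and the sketch assumes the unwanted paths come from the "." and ".." entries, which it skips with FilesystemIterator::SKIP_DOTS. It also strips the source directory only when it is a real prefix of the path, and compares the result of open() against true, because open() returns an integer error code on failure.

```php
<?php
// Hypothetical helper: archive entry name for $path relative to $root.
function relativeEntryName(string $root, string $path): string
{
    $root = rtrim(str_replace('\\', '/', $root), '/') . '/';
    $path = str_replace('\\', '/', $path);
    // Strip the root only when it is a true prefix of the path.
    if (strncmp($path, $root, strlen($root)) === 0) {
        return substr($path, strlen($root));
    }
    return basename($path);
}

// Hypothetical helper: zip the contents of $sourceDir into $zipPath.
function zipDirectory(string $sourceDir, string $zipPath): bool
{
    $root = realpath($sourceDir);
    if ($root === false || !is_dir($root)) {
        return false;
    }
    $zip = new ZipArchive();
    // open() returns true on success and an error code otherwise.
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST
    );
    foreach ($items as $item) {
        $name = relativeEntryName($root, $item->getPathname());
        if ($item->isDir()) {
            $zip->addEmptyDir($name);
        } elseif ($item->isFile()) {
            // addFile() reads the file from disk when the archive is closed.
            $zip->addFile($item->getPathname(), $name);
        }
    }
    return $zip->close();
}
```

A call such as `zipDirectory($newdirpath, getcwd() . "/$siteid-CompliancePack.zip")` would mirror the original usage.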

Lajos Veres answered Nov 09 '22 03:11


Regarding the additional byte at the beginning of your file:

The file seems OK when generated on the server, so the problem evidently comes from the transfer/encoding process.

Check the scripts where you actually serve the file. For example, if your server script looks like this:

_<?php readfile('zipfile.zip');

and you have a space (indicated by underscore) or any other character at the beginning of your script, it will be part of the output.

If the character is not part of your script, check included scripts that might break your output.

Update according to new code samples:

Try to clean the output buffer before sending the binary data to the browser:

header('Content-type:application/octet-stream');
header('Content-disposition:attachment;filename="'.$sid.'-CompliancePack.zip"');
ob_clean();
echo trim(base64_decode($base->FileData));
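
A related way to find out where the extra byte comes from is to run the includes and the SOAP call under an output buffer and inspect whatever they printed. The sketch below is only an illustration under assumptions: `runQuietly` is a hypothetical helper, and the closure is a stand-in for the real includes and `$client->call(...)`, with an `echo ' '` simulating a stray space emitted by an included file.

```php
<?php
// Hypothetical helper (not from the original post): run $setup under an
// output buffer and return [its return value, any bytes it printed].
function runQuietly(callable $setup): array
{
    ob_start();
    try {
        $value = $setup();
    } finally {
        // Stop buffering and keep what was printed instead of sending it.
        $stray = ob_get_clean();
    }
    return [$value, $stray];
}

// Stand-in for the includes and the SOAP call.
list($zipBytes, $stray) = runQuietly(function () {
    echo ' '; // simulated stray output from an included file
    return "PK\x03\x04" . 'rest-of-archive';
});

// Report what was discarded so the offending file can be tracked down.
$report = ($stray === '') ? 'clean' : 'stray bytes: ' . bin2hex($stray);

// Nothing has been sent yet, so headers and the raw bytes could follow:
// header('Content-Type: application/octet-stream');
// header('Content-Length: ' . strlen($zipBytes));
// echo $zipBytes; // no trim(): by default it also strips NUL bytes at the ends
```

The sketch leaves out trim() on purpose: its default character list includes the NUL byte, so applying it to binary data can remove bytes that belong to the archive.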
Gerd K answered Nov 09 '22 01:11