
List files in .7z, .rar and .tar archives using PHP

I want to list the files inside an archive without extracting them.

The types of archives I am interested in:

  • .7z (7-Zip)
  • .rar (WinRAR)
  • .tar (POSIX, e.g. GNU tar).
  • .zip (ISO standard, e.g. WinZip)

For .zip files, I have been able to achieve this:

<?php
    $za = new ZipArchive();
    $za->open('theZip.zip');
    for ($i = 0; $i < $za->numFiles; $i++) {
        $stat = $za->statIndex($i);
        print_r(basename($stat['name']) . PHP_EOL);
    }
?>

However, I have not managed to do the same for .7z files. I haven't tested .rar and .tar yet, but I will need them as well.

M.M asked Jul 21 '16 17:07


4 Answers

Arnuld's comment is a clue as to the most practical way to solve the problem. Even if you can find PHP-accessible implementations of every archive type you want to support, only ZIP and gzip are natively supported by a PHP extension. The remainder will either be native PHP code or shell out to invoke a standalone binary. The former will be a bit of a performance/resource bottleneck, while the latter will be dependent on your underlying platform.

(BTW, unless you trust the users completely with access to your server, or are a relatively good programmer, you're going to have to do more inspection of the content than just listing what is inside an uploaded archive).

Once you've collected a ragtag assortment of utilities and audited the code to a reasonable level, you should wrap them in a uniform API to ensure that your glue code doesn't turn into spaghetti.

If it were me, I'd start from scratch and implement an interface like the one PHP provides for zip, wrapped around the standalone binaries; PHP is, after all, a scripting language. That you will be applying this to user-uploaded files is no reason not to use an existing native-code implementation. Indeed, the security consideration is a strong argument for such an approach.
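One way to picture such a uniform API is sketched below. The names ArchiveLister, ZipLister and ListerRegistry are hypothetical and invented for this illustration; only ZipArchive and its methods come from PHP itself. The sketch assumes PHP 7.4+ and the zip extension, and listers for other formats (for example one that shells out to a binary) would implement the same interface.

```php
<?php
// Hypothetical uniform interface: every archive format gets one lister class.
interface ArchiveLister
{
    /** @return string[] names of the entries in the archive */
    public function listEntries(string $path): array;
}

// Lister backed by PHP's zip extension.
final class ZipLister implements ArchiveLister
{
    public function listEntries(string $path): array
    {
        $zip = new ZipArchive();
        if ($zip->open($path) !== true) {
            throw new RuntimeException("Cannot open archive: $path");
        }
        $names = [];
        for ($i = 0; $i < $zip->numFiles; $i++) {
            $names[] = $zip->getNameIndex($i);
        }
        $zip->close();
        return $names;
    }
}

// Picks a lister by file extension so the calling code never branches on format.
final class ListerRegistry
{
    /** @var ArchiveLister[] keyed by lower-case extension */
    private array $listers = [];

    public function register(string $extension, ArchiveLister $lister): void
    {
        $this->listers[strtolower($extension)] = $lister;
    }

    public function listEntries(string $path): array
    {
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (!isset($this->listers[$extension])) {
            throw new InvalidArgumentException("Unsupported archive type: $extension");
        }
        return $this->listers[$extension]->listEntries($path);
    }
}
```

With this in place, the glue code would register one lister per extension (for example 'zip' with a ZipLister) and then call listEntries() on the registry for any uploaded file, regardless of format.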

Remember to watch out for zip bombs.

symcbean answered Oct 22 '22 02:10


This is something that has come up before (for various reasons like this and this and the one with broken links in the answer).

Generally, the prevailing opinion at the moment is to create a wrapper (either DIY or from a library) that relies on a 7-zip binary (executable) being accessible on the server and wraps calls to that binary using exec(), rather than a pure PHP solution.
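A minimal DIY wrapper of that kind might look like the sketch below. It assumes a 7z binary on the PATH and relies on the technical listing produced by `7z l -slt`, where the per-entry blocks follow a line of dashes and each block has a "Path = ..." line. The function names are made up for this example, and the exact output layout may vary between 7-Zip versions, so treat the parser as a starting point rather than a robust implementation.

```php
<?php
// Parse the text printed by `7z l -slt <archive>` into a list of entry paths.
function parseSevenZipListing(string $output): array
{
    $paths = [];
    $inEntries = false;
    foreach (preg_split('/\R/', $output) as $line) {
        if (!$inEntries) {
            // Skip the header, which has its own "Path = " line for the archive itself.
            $inEntries = (strpos($line, '----------') === 0);
            continue;
        }
        if (strpos($line, 'Path = ') === 0) {
            $paths[] = substr($line, strlen('Path = '));
        }
    }
    return $paths;
}

// Run the 7z binary (assumed to be installed and on the PATH) and return the entry paths.
function listSevenZipEntries(string $archivePath): array
{
    // escapeshellarg() keeps user-supplied file names from being interpreted by the shell;
    // "--" tells 7z that no further switches follow.
    $command = '7z l -slt -- ' . escapeshellarg($archivePath) . ' 2>&1';
    $lines = [];
    exec($command, $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("7z failed with exit code $exitCode");
    }
    return parseSevenZipListing(implode("\n", $lines));
}
```

Keeping the parsing separate from the exec() call means the parser can be exercised with captured output, without the binary being present.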

Since the 7zip format supports a variety of compression algorithms, I'm assuming that you probably want a pure PHP implementation of reading/decompressing the LZMA format. There are LZMA SDKs available for C, C++, C# and Java, and someone has made a PHP extension for LZMA2 (and a fork for LZMA). However, even though there has been interest on the 7-zip forums for quite a while, no one seems to have ported this over as a comprehensive PECL extension or as pure PHP yet.

Depending on your needs & motivation, this leaves you with:

  • add the 7-zip binary to your server, and use a wrapper library, be it your own or someone else's
  • install and use an unofficial PECL extension
  • bravely port the LZMA SDK to PHP yourself (and hopefully contribute it back to open source!)

For other formats you can look to the PHP documentation for examples and details on usage:

  • .rar has its own official PECL extension
  • .tar can be extracted by the Phar PECL extension (also see SO for examples)
  • .zip has an official PECL extension
  • .gz has an official PECL extension
  • and a couple of other formats

Since all of these involve PECL extensions, if you're limited by your webhost in some way and need pure PHP solutions for this, it might be easier to just shift to a more amenable webhost.

To attempt to protect against zip bombs, you can look at the compression ratios as suggested by this answer (unpacked size divided by packed size, treating anything over a certain threshold as invalid), although the zip bomb discussed in the answer to one of the linked questions indicates that this can be ineffective against multi-layered zip bombs. For those, you would need to check whether the files you're listing are archives as well, make sure you're not doing any kind of recursive extraction, and then treat archives that contain archives as invalid.
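The two checks described above could be sketched roughly as follows for zip files. The function names, the threshold of 100 and the list of archive extensions are assumptions chosen for illustration; the 'name', 'size' and 'comp_size' keys are the ones returned by ZipArchive::statIndex(). Note that these sizes come from the archive's own headers, so this is a first filter and not a guarantee.

```php
<?php
// Flag entries that look dangerous before anything is extracted.
// $stats is a list of arrays shaped like the result of ZipArchive::statIndex().
function findSuspiciousEntries(array $stats, float $maxRatio = 100.0): array
{
    // Extensions treated as nested archives (an illustrative, incomplete list).
    $archiveExtensions = ['zip', '7z', 'rar', 'tar', 'gz', 'bz2', 'xz'];
    $suspicious = [];
    foreach ($stats as $stat) {
        $extension = strtolower(pathinfo($stat['name'], PATHINFO_EXTENSION));
        if (in_array($extension, $archiveExtensions, true)) {
            $suspicious[$stat['name']] = 'nested archive';
            continue;
        }
        // Avoid division by zero for entries with a packed size of 0.
        $packed = max(1, (int) $stat['comp_size']);
        $ratio = (int) $stat['size'] / $packed;
        if ($ratio > $maxRatio) {
            $suspicious[$stat['name']] = 'compression ratio';
        }
    }
    return $suspicious;
}

// Collect the stat arrays for every entry of a zip file.
function statZipEntries(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open archive: $path");
    }
    $stats = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $stats[] = $zip->statIndex($i);
    }
    $zip->close();
    return $stats;
}
```

A caller would pass the result of statZipEntries() to findSuspiciousEntries() and reject the upload if the returned array is not empty.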

For completeness, some usage examples for official PECL extensions:

RAR:

<?php
// open the archive file
$archive = RarArchive::open('archive.rar');
// make sure it's valid
if ($archive === false) return;
// retrieve a list of entries in the archive
$entries = $archive->getEntries();
// make sure the entry list is valid
if ($entries === false) return;
// example output of entry count
echo "Found ".count($entries)." entries.\n";
// loop over entries
foreach ($entries as $e) {
    echo $e->getName()."\n";
}
// close the archive file
$archive->close();
?>

TAR:

<?php
// open the archive file
try {
    $archive = new PharData('archive.tar');
}
// make sure it's valid
catch (UnexpectedValueException $e) {
    return;
}
// make sure the entry list is valid
if ($archive->count() === 0) return;
// example output of entry count
echo "Found ".$archive->count()." entries.\n";
// loop over entries (PharData is already a list of entries in the archive)
foreach ($archive as $entry) {
    echo $entry."\n";
}
// no need to close a PharData
?>

ZIP (adapted from OP's question which is from here):

<?php
// open the archive file
$archive = new ZipArchive;
$valid = $archive->open('archive.zip');
// make sure it's valid (if not ZipArchive::open() returns various error codes)
if ($valid !== true) return;
// make sure the entry list is valid
if ($archive->numFiles === 0) return;
// example output of entry count
echo "Found ".$archive->numFiles." entries.\n";
// loop over entries
for ($i = 0; $i < $archive->numFiles; $i++) {
    $e = $archive->statIndex($i);
    echo $e['name']."\n";
}
// close the archive file (redundant as called automatically at the end of the script)
$archive->close();
?>

GZ:

Since gz (GNU zip, i.e. gzip) is a compression mechanism rather than an archive format, this is different in PHP. If you open a .gz file by itself (rather than treating it like a .tar) with gzopen(), any reads from it are transparently decompressed. Since the most common case is .tar.gz, you can treat it like a .tar as above (also see this answer on another question). Or you can first decompress it to a plain .tar with PharData::decompress(), as in this answer on another question.
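To illustrate the transparent decompression mentioned above, here is a small sketch that reads the first bytes of a standalone .gz file. It assumes the zlib extension is loaded; the function name readGzipHead and the default length are made up for this example.

```php
<?php
// Read up to $length decompressed bytes from the start of a .gz file.
function readGzipHead(string $path, int $length = 512): string
{
    $handle = gzopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $data = '';
    // gzread() returns decompressed bytes; loop because a read may return fewer than requested.
    while (!gzeof($handle) && strlen($data) < $length) {
        $chunk = gzread($handle, $length - strlen($data));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }
    gzclose($handle);
    return $data;
}
```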

Leith answered Oct 22 '22 04:10


I think this class might help you

Code sample from the link

// Open an archive.
$archive = new SevenZipArchive('docs.7z');

// Show number of contained files:
print $archive->count() . " file(s) in archive\n";

// Show info about the first contained file:
$entry = $archive->get(0);
print 'First file name: ' . $entry['Name'] . "\n";

// Iterate over all the contained files in archive, and dump all their info:
foreach ($archive as $entry) {
    print_r($entry);
}

Update
As promised in my comments, and since the OP asked for a way to check uploaded files for zip bombs, here is a link that describes one. ClamAV® is an open source antivirus engine for detecting trojans, viruses, malware and other malicious threats.

On the ClamavNet site I found this information:

Whenever a file exceeds ArchiveMaxCompressionRatio (see clamd.conf man page), it’s considered a logic bomb and marked as Oversized.zip . Try increasing your ArchiveMaxCompressionRatio setting.

That said, my experience with uploading files comes from typically trusted users. If I were you, I would first research how zip bombs and other threats work; this will help you prevent them through extra coding or another solution.

Furthermore, depending on your business size, your budget and how critical your web app is, it is a good idea to define a strategy, policies and roles for your site that describe how your web app may be used. Part of that is a file upload policy: which types of files may be uploaded, what the maximum size is, who can upload, and acceptance of a disclaimer in which you mention these points. That policy should be presented as a guideline to the audience using your web app's services.

Here are a few links about zip bombs:

  • How does one make a Zip bomb?
  • How can I protect myself from a zip bomb?
  • https://en.wikipedia.org/wiki/Zip_bomb

Maytham answered Oct 22 '22 02:10


Try this

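<?php
// List the archive with the 7z binary and keep only the lines that start
// with a date (YYYY-), which is how each entry line of the listing begins.
exec("7z l ./test.zip | awk '/^[0-9]{4}-/{print}'", $lines);
foreach ($lines as $line)
{
    // Split on spaces; the last column of an entry line is the file name.
    // Note: this only works for names that do not contain spaces.
    $columns = explode(" ", $line);
    echo end($columns)."\n";
}
?>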

Labradorcode answered Oct 22 '22 02:10