This is not so much a coding problem as a general question about security. I'm currently working on a project that allows user-submitted content. A key part of this content is a zip file that the user uploads. The zip file should contain only MP3 files.
I then unzip those files to a directory on the server, so that we can stream the audio on the website for users to listen to.
My concern is that this opens us up to potentially damaging zip files. I've read about 'zip bombs' in the past, and obviously don't want a malicious zip file causing damage.
So, is there a safe way of doing this? Can I scan the zip file without unzipping it first, and if it contains anything other than MP3s, delete it or flag a warning to the admin?
If it makes a difference, I'm developing the site on WordPress. I currently use WordPress's built-in upload features to let the user upload the zip file to our server (I'm not sure whether WordPress already has any form of security that scans the zip file).
Code: extract only the MP3s from the zip and ignore everything else.
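$zip = new ZipArchive();
$filename = 'newzip.zip';
if ($zip->open($filename) !== true) {
    exit("cannot open <$filename>\n");
}
for ($i = 0; $i < $zip->numFiles; $i++) {
    $info = $zip->statIndex($i);
    // PATHINFO_EXTENSION yields '' for entries without an extension (e.g. directories),
    // which avoids reading a missing 'extension' array key.
    $ext = pathinfo($info['name'], PATHINFO_EXTENSION);
    if (strtolower($ext) === 'mp3') {
        // basename() drops any directory part of the entry name, so the file
        // is always written into the current directory.
        file_put_contents(basename($info['name']), $zip->getFromIndex($i));
    }
}
$zip->close();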
I would also use something like id3_get_version (http://www.php.net/manual/en/function.id3-get-version.php) to check that the contents of each file really are MP3. Note that this only looks at ID3 tags, so an MP3 without tags would not be recognised.
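If the id3 extension isn't available, a rough content check can be done by hand. This is only a sketch: `looksLikeMp3` is a made-up helper name, and the test is a weak heuristic (an "ID3" prefix, or an MPEG Layer III frame header somewhere in the bytes you pass in), so it can be fooled and shouldn't be the only check.

```php
<?php
// Hypothetical helper: a cheap sniff test, not a full MP3 parser.
// Pass in the first few kilobytes of the file rather than the whole thing.
function looksLikeMp3(string $bytes): bool
{
    // An ID3v2 tag, when present, starts the file with the ASCII bytes "ID3".
    if (strncmp($bytes, 'ID3', 3) === 0) {
        return true;
    }
    // Otherwise look for an MPEG audio frame header: 11 sync bits set,
    // with the 2-bit layer field equal to 01 (Layer III).
    $len = strlen($bytes);
    for ($i = 0; $i + 1 < $len; $i++) {
        $b0 = ord($bytes[$i]);
        $b1 = ord($bytes[$i + 1]);
        if ($b0 === 0xFF && ($b1 & 0xE0) === 0xE0 && ($b1 & 0x06) === 0x02) {
            return true;
        }
    }
    return false;
}
```

With the code from the answer above you could call it as `looksLikeMp3(substr($zip->getFromIndex($i), 0, 4096))` before writing the file out.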
Is there a reason they need to ZIP the MP3s? Unless there are a lot of text frames in the ID3v2 info in the MP3s, the file size will actually increase with the ZIP: the audio data is already compressed, and the archive adds its own overhead (headers and compression tables).
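You can see the size effect with a few lines of PHP. This is a sketch under an assumption: random bytes stand in for MP3 audio data (both are close to incompressible), and it needs the zlib extension for `gzdeflate`.

```php
<?php
// Random bytes are a stand-in for already-compressed audio (an assumption,
// not real MP3 data).
$raw = random_bytes(100000);

// Deflate is the compression method ZIP normally uses; 9 is the highest level.
$packed = gzdeflate($raw, 9);

printf("original: %d bytes, deflated: %d bytes\n", strlen($raw), strlen($packed));
```

The deflated size should come out no smaller than the original, and that is before the ZIP's per-file headers are added.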
As far as I know, there isn't any way to scan a ZIP without actually parsing it. Entry names and declared sizes can be read from the archive's directory without extracting anything, but the file data itself is opaque until it has been decompressed. And how would you determine which file is an MP3? By file extension? By frames? MP3 encoders follow a loose standard (decoders have a more stringent spec), which makes it difficult to scan the file structure without false negatives.
Here are some ZIP security risks:
So, either do a lot of scrubbing and integrity checks, or at the very least use PHP to scan the archive: check each file for its MP3-ness (however you do that; by extension and the presence of MP3 headers? You can't rely on them being at byte 0, though: http://en.wikipedia.org/wiki/MP3#File_structure) and for its inflated (uncompressed) file size (http://www.php.net/manual/en/function.zip-entry-filesize.php). Bail out if an inflated file is too big, or if there are any non-MP3s present.
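The "scan first, bail out on anything suspicious" idea could look roughly like this with ZipArchive. Treat it as a sketch: `findZipProblems` and the three limits are made-up names and numbers, and only the extension and declared sizes are checked here.

```php
<?php
// Hypothetical limits; tune them for your site.
const MAX_ENTRIES     = 200;
const MAX_ENTRY_BYTES = 50 * 1024 * 1024;   // per file, uncompressed
const MAX_TOTAL_BYTES = 500 * 1024 * 1024;  // whole archive, uncompressed

// Returns a list of problems; an empty list means the archive passed.
// Only the archive's directory is read; nothing is extracted.
function findZipProblems(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return ['cannot open archive'];
    }
    $problems = [];
    $total = 0;
    if ($zip->numFiles > MAX_ENTRIES) {
        $problems[] = 'too many entries';
    }
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $info = $zip->statIndex($i);
        $name = $info['name'];
        if (substr($name, -1) === '/') {
            continue; // directory entry, not a file
        }
        if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'mp3') {
            $problems[] = "not an MP3: $name";
        }
        if ($info['size'] > MAX_ENTRY_BYTES) { // 'size' is the uncompressed size
            $problems[] = "too large: $name";
        }
        $total += $info['size'];
    }
    $zip->close();
    if ($total > MAX_TOTAL_BYTES) {
        $problems[] = 'archive too large when inflated';
    }
    return $problems;
}
```

If the returned list is non-empty, delete the upload or flag it for the admin instead of extracting. The sizes come from the archive's own directory, which a hostile file could in principle misstate, so it is still worth capping the number of bytes you actually write during extraction.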