 

checksums on zip files

Tags: php, zip, checksum

I am currently working on a tool that uploads a group of files, then uses md5 checksums to compare them to the last batch that was uploaded and tells you which files have changed.

For regular files this works fine, but some of the uploaded files are zip archives, and these are almost always reported as changed, even when the files inside them are the same.

Is there a different type of checksum I can use to check whether these archives have changed, without having to unzip each one and compare its contents file by file?

Here is my current function:

function check_if_changed($date, $folder, $filename)
{
  $base = './wp-content/uploads/Base/';
  // List the dated batch folders, skipping the '.' and '..' entries.
  $folders = array_values(array_diff(scandir($base), array('.', '..')));
  sort($folders);
  $position = array_search($date, $folders);
  // No earlier batch to compare against: report the file as changed.
  if ($position === false || $position === 0) {
    return true;
  }
  $prev_folder = $folders[$position - 1];
  $newhash = md5_file($base.$date.'/'.$folder.'/'.$filename);
  $oldhash = md5_file($base.$prev_folder.'/'.$folder.'/'.$filename);
  // Changed when the two md5 hashes differ.
  return $oldhash !== $newhash;
}
Kit Barnes asked May 21 '12 16:05



1 Answer

Inside a zip archive, each "file" is stored with metadata such as its last modification time, filename, and size in bytes, and, most importantly, a CRC-32 checksum. That checksum is computed over the uncompressed data, so it stays the same when only the timestamps or compression settings differ.

You can therefore read the zip archive in binary mode, find each file's metadata header, and compare its checksum to the previously stored checksums. You don't need to decompress anything to access the metadata in a zip archive, so this is extremely fast.

http://en.wikipedia.org/wiki/Zip_(file_format)

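Edit: actually, PHP's ZipArchive class offers this functionality. statIndex() returns the details of each entry, including its CRC-32, without extracting it. See: http://www.php.net/manual/en/ziparchive.statindex.php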
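
A minimal sketch of the statIndex() approach, assuming the zip extension is enabled. The helper names zip_entry_checksums() and zip_contents_changed() are made up for illustration and are not part of ZipArchive or of the asker's code. The idea is to collect each entry's name and CRC-32 from the archive metadata and compare those maps instead of hashing the whole zip file.

```php
<?php
// Hypothetical helper: returns a map of entry name => CRC-32 for every
// entry in a zip archive, read from the archive metadata only.
// Returns false if the file cannot be opened as a zip archive.
function zip_entry_checksums($zip_path)
{
  $zip = new ZipArchive();
  if ($zip->open($zip_path) !== true) {
    return false;
  }
  $checksums = array();
  for ($i = 0; $i < $zip->numFiles; $i++) {
    $stat = $zip->statIndex($i);
    $checksums[$stat['name']] = $stat['crc'];
  }
  $zip->close();
  // Sort by name so the order in which entries were added does not matter.
  ksort($checksums);
  return $checksums;
}

// Hypothetical helper: true when the two archives differ in their entry
// names or in any entry's CRC-32.
function zip_contents_changed($old_zip, $new_zip)
{
  $old = zip_entry_checksums($old_zip);
  $new = zip_entry_checksums($new_zip);
  if ($old === false || $new === false) {
    return true; // treat an unreadable archive as changed
  }
  return $old !== $new;
}
```

In check_if_changed() this could replace the md5_file() comparison for filenames ending in .zip. Note that CRC-32 is meant for error detection rather than security, so it is suitable for spotting accidental changes but not for guarding against deliberate tampering.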

goat answered Sep 19 '22 13:09
