How do I check whether a zip file is corrupt? I have a zip file with 10 JPG images. I can extract, say, 8 of them, but two of the images in the zip are corrupt and I cannot extract those. Is there a way to check for this in a Python script?
Ideally, the best way to check whether a zip file is corrupt is a CRC check, but this can take a long time, especially when there are many large zip files. I would be happy with just a quick file size or header check.
isdir() or a file with TarInfo.isfile(). Similarly, you can determine whether a file is a zip file using zipfile.is_zipfile().
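If a quick header check is all you need, the following is a minimal sketch that avoids decompression and CRC verification. `zipfile.is_zipfile()` only looks for the end-of-central-directory record. `ZipFile.infolist()` reads the central directory, and opening a member with `ZipFile.open()` parses that member's local file header without reading its data. The function name `quick_zip_check` is hypothetical, and the sketch assumes the members are not encrypted. It will not detect damaged image bytes inside a member, which is the kind of corruption described in the question. Treat it as a cheap pre-filter, not a replacement for a CRC check.

```python
import zipfile


def quick_zip_check(path):
    """Cheap structural check (hypothetical helper): no decompression, no CRC check.

    Returns a list of problem descriptions; an empty list means the quick check passed.
    """
    # is_zipfile() only looks for the end-of-central-directory record
    if not zipfile.is_zipfile(path):
        return ["not a zip file (no end-of-central-directory record found)"]
    problems = []
    try:
        with zipfile.ZipFile(path) as zf:  # parses the central directory
            for info in zf.infolist():
                try:
                    # Opening a member validates its local file header but reads no data
                    with zf.open(info):
                        pass
                except (zipfile.BadZipFile, RuntimeError, NotImplementedError) as exc:
                    # RuntimeError: encrypted member; NotImplementedError: unsupported compression
                    problems.append("%s: %s" % (info.filename, exc))
    except zipfile.BadZipFile as exc:
        problems.append("bad central directory: %s" % exc)
    return problems
```

Because this touches only headers, its cost grows with the number of members rather than with their size. That matters when there are many large archives.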
Your code is basically OK; try to find out which file is responsible for the corrupted zip file. Check whether digitalFile.getFile() always returns a valid and accessible argument to FileInputStream. Just add a bit of logging to your code and you will find out what's wrong.
This code will either raise an exception (if the zip file is badly damaged or is not a zip file at all) or print the name of the first bad file in the zip file.
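import sys
import zipfile

if __name__ == "__main__":
    # The path to the zip file is the first command-line argument
    path = sys.argv[1]
    print("Testing zip file: %s" % path)
    with zipfile.ZipFile(path) as the_zip_file:
        # testzip() reads every member and checks its CRC-32; it returns
        # the name of the first bad file, or None if all files are good
        ret = the_zip_file.testzip()
    if ret is not None:
        print("First bad file in zip: %s" % ret)
        sys.exit(1)
    else:
        print("Zip file is good.")
        sys.exit(0)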
You should, of course, wrap this in proper try/except clauses, but that's the basics.
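One way to add that error handling is sketched below as a function rather than a script. The name `check_zip` and the `(ok, message)` return shape are hypothetical choices. `testzip()` itself only catches `zipfile.BadZipFile` for individual members. The sketch therefore also assumes damaged compressed data may surface as `zlib.error` or `EOFError`, and that unreadable files raise `OSError`.

```python
import zipfile
import zlib


def check_zip(path):
    """Hypothetical wrapper around testzip(): return (ok, message) instead of raising."""
    try:
        with zipfile.ZipFile(path) as zf:
            # Name of the first member with a bad CRC or header, or None
            bad = zf.testzip()
    except zipfile.BadZipFile as exc:
        return False, "not a valid zip file: %s" % exc
    except (zlib.error, EOFError) as exc:
        # Damaged or truncated compressed data can escape testzip() as these
        return False, "corrupt compressed data: %s" % exc
    except OSError as exc:
        return False, "could not read file: %s" % exc
    if bad is not None:
        return False, "first bad file in zip: %s" % bad
    return True, "zip file is good"
```

Usage: `ok, message = check_zip("images.zip")`. You can then print `message` and exit with a status based on `ok`, as the script above does.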
Use the zipfile module's testzip function; see http://docs.python.org/library/zipfile.html#zipfile.ZipFile.testzip
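`testzip()` stops at the first bad member. In the question, two of the ten images are corrupt, so it may be useful to list every bad member. The sketch below does the same work as `testzip()`: it reads each member to the end so the CRC-32 is compared. Instead of stopping, it collects every failure. The function name `find_bad_members` and the 1 MiB chunk size are assumptions made for illustration.

```python
import zipfile
import zlib


def find_bad_members(path, chunk_size=1 << 20):
    """Hypothetical helper: like testzip(), but report all bad members, not just the first."""
    bad = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries have no data to check
            try:
                with zf.open(info) as member:
                    # Reading to EOF makes zipfile compare the CRC-32 and
                    # raise BadZipFile on a mismatch
                    while member.read(chunk_size):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                bad.append((info.filename, str(exc)))
    return bad
```

This has the same cost as a full CRC check, since every byte is read and, where needed, decompressed. Reading in chunks keeps memory use flat even for very large images.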