
zipfile can't handle some types of zip data?

I ran into this problem while trying to decompress a zip file.

-- zipfile.is_zipfile(my_file) always returns False, even though the UNIX command unzip handles the file just fine. Also, when I try zipfile.ZipFile(path/file_handle_to_path), that fails as well.

-- the file command reports "Zip archive data, at least v2.0 to extract", and running less on the file shows:

PKZIP for iSeries by PKWARE
    Length  Method       Size  Cmpr  Date        Time   CRC-32    Name
2113482674  Defl:S  204502989   90%  2010-11-01  08:39  2cee662e  myfile.txt
2113482674          204502989   90%                               1 file

Any ideas on how I can get around this issue? It would be nice if I could make Python's zipfile work, since I already have some unit tests that I would have to drop if I switched to running subprocess.call("unzip").

hyperboreean asked Feb 07 '11 15:02



2 Answers

I ran into the same issue with my files and was able to solve it. As in the example above, I'm not sure how they were generated. They all had trailing data at the end, which both Windows and 7z ignored but which made Python's zipfile fail.

This is the code to solve the issue:

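import logging

def fixBadZipfile(zipFile):
    # Reads the whole archive into memory, so this is costly for very large files.
    with open(zipFile, 'r+b') as f:
        data = f.read()
        # Search from the end: the signature bytes can also occur by chance in compressed data.
        pos = data.rfind(b'\x50\x4b\x05\x06')  # End of central directory signature
        if pos < 0:
            # No signature found, so the file is truncated rather than padded.
            raise ValueError("No end of central directory record found in " + zipFile)
        logging.info("Truncating file at location " + str(pos + 22) + ".")
        f.seek(pos + 22)  # size of 'ZIP end of central directory record' (no archive comment)
        f.truncate()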
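If changing the file on disk is not an option, the same trick can be applied to a copy held in memory. This is only a minimal sketch: it assumes the archive fits in memory and has no archive comment, and `open_trimmed_zip` is a hypothetical helper name, not something zipfile provides.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory signature
EOCD_SIZE = 22                  # size of the record when there is no archive comment


def open_trimmed_zip(path):
    """Open a ZipFile over an in-memory copy, dropping bytes after the EOCD record.

    Hypothetical helper: assumes the file fits in memory and has no archive comment.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Look for the last occurrence of the signature, as the record sits near the end.
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise zipfile.BadZipFile("no end of central directory record found")
    # Keep everything up to and including the fixed-size record.
    return zipfile.ZipFile(io.BytesIO(data[:pos + EOCD_SIZE]))
```

Because the trimmed copy lives in an `io.BytesIO` buffer, the original file is left untouched, which may fit existing unit tests better than truncating in place.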
Uri Cohen answered Oct 22 '22 04:10


You say that running less on the file shows such and such. Do you mean this?

less my_file

If so, I would guess these are comments that the zip program put in the file. Looking at a user guide for the iSeries PKZIP I found on the web, this appears to be the default behavior.

The docs for zipfile say "This module does not currently handle ZIP files which have appended comments." Perhaps this is the problem? (Of course, if less shows them, this would seem to imply that they're prepended, FWIW.)

It appears you (or whoever created the zipfile on an iSeries machine) can turn this off with ARCHTEXT(*NONE), or use ARCHTEXT(*CLEAR) to remove it from an existing zipfile.
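To tell an archive comment apart from stray trailing bytes, it may help to look at the end of central directory record directly. A rough sketch, assuming the file fits in memory: `describe_zip_tail` is a hypothetical helper name, and the field layout is the fixed 22-byte record from the ZIP format.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
# Fixed 22-byte end of central directory record: signature, four 16-bit fields,
# two 32-bit fields, then the 16-bit archive comment length.
EOCD_STRUCT = struct.Struct("<4s4H2LH")


def describe_zip_tail(path):
    """Report where the EOCD record sits and how many bytes follow it.

    Hypothetical diagnostic helper; it reads the whole file into memory.
    Returns None if no complete record is found.
    """
    with open(path, "rb") as f:
        data = f.read()
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_STRUCT.size > len(data):
        return None
    fields = EOCD_STRUCT.unpack_from(data, pos)
    comment_len = fields[-1]  # declared length of the archive comment
    after_record = len(data) - (pos + EOCD_STRUCT.size)
    return {
        "eocd_offset": pos,
        "declared_comment_length": comment_len,
        "bytes_after_record": after_record,
        "unexplained_trailing_bytes": after_record - comment_len,
    }
```

If `unexplained_trailing_bytes` is greater than zero, there is data after the record that the declared comment length does not account for.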

Tom Zych answered Oct 22 '22 03:10