Not sure why this is happening, but when I unzip a file (e.g. apache-groovy-binary-2.4.7.zip) at the command line, the extracted entries get permissions like:
rwxr-xr-x
rwxr-xr-x
or rw-r--r--
But when I run zipfile.extractall() from a Python 2.7 script on the same file, I get:
rwxr-x---
rw-r-----
- even for the files that should be executable as per above. My umask setting is 0027, which partly explains what's going on, but why is the executable bit being removed from all files?
What's the easiest fix to get Python adopting similar behaviour to the command-line version (apart from shelling out, of course!)?
The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module will require an understanding of the format, as defined in PKZIP Application Note.
with ZipFile(file_name, 'r') as zip: Here, a ZipFile object is created by calling the ZipFile constructor, which accepts the zip file name and a mode parameter. We create the ZipFile object in read mode and name it zip.
In the zipfile module, you'll find the ZipFile class. This class works pretty much like Python's built-in open() function, allowing you to open your ZIP files using different modes. The read mode ( "r" ) is the default. You can also use the write ( "w" ), append ( "a" ), and exclusive ( "x" ) modes.
To unzip it, first create a ZipFile object by opening the zip file in read mode, then call extractall() on that object. This extracts all the files in the archive into the current directory. If files with the same names already exist at the extraction location, they are overwritten.
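For reference, here is a minimal sketch of that open-then-extractall() pattern. The helper name extract_all and the throwaway demo.zip archive are made up for illustration; only ZipFile, extractall(), namelist() and writestr() come from the standard zipfile module.

```python
import os
import tempfile
import zipfile


def extract_all(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Build a tiny throwaway archive so the example is self-contained.
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("hello.txt", "hello")

# Extract into an explicit directory instead of the current one.
out_dir = os.path.join(work_dir, "out")
names = extract_all(demo_zip, out_dir)
```

Passing a destination directory is optional; without it, extractall() writes into the current working directory as described above.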
The reason for this can be found in the _extract_member() method in zipfile.py: it only calls shutil.copyfileobj(), which writes the output file without any execute bits.
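The permissions are still recorded in the archive, though. As a rough sketch (assuming the zip was created on a Unix-like system, where the upper 16 bits of each entry's external_attr hold the file mode), you can inspect them like this. stored_mode is a hypothetical helper name, and the in-memory archive is only there to make the example self-contained.

```python
import io
import stat
import zipfile


def stored_mode(info):
    """Return the Unix permission bits recorded for a ZipInfo entry (0 if none)."""
    return stat.S_IMODE(info.external_attr >> 16)


# Build an in-memory archive with one entry marked as an executable file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    entry = zipfile.ZipInfo("bin/run.sh")
    entry.external_attr = (stat.S_IFREG | 0o755) << 16
    archive.writestr(entry, "#!/bin/sh\necho hi\n")

# Read the archive back and collect the recorded mode of each entry.
with zipfile.ZipFile(buf, "r") as archive:
    modes = {info.filename: stored_mode(info) for info in archive.infolist()}
```

Archives created on other systems may leave those bits at zero, so any fix built on them should probably treat a zero mode as "nothing recorded".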
The easiest way to solve this is by subclassing ZipFile and changing extract() (or patching in an extended version). By default it is:
    def extract(self, member, path=None, pwd=None):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a ZipInfo object. You can
           specify a different directory using `path'.
        """
        if not isinstance(member, ZipInfo):
            member = self.getinfo(member)

        if path is None:
            path = os.getcwd()

        return self._extract_member(member, path, pwd)
This last line should be changed to actually set the mode based on the original attributes. You can do it this way:
import os
import sys
from zipfile import ZipFile, ZipInfo


class MyZipFile(ZipFile):

    if sys.version_info < (3, 6):

        # Older versions: extractall() goes through extract() for each member.
        def extract(self, member, path=None, pwd=None):
            if not isinstance(member, ZipInfo):
                member = self.getinfo(member)
            if path is None:
                path = os.getcwd()
            ret_val = self._extract_member(member, path, pwd)
            # The upper 16 bits of external_attr hold the Unix mode, if any.
            attr = member.external_attr >> 16
            if attr != 0:
                os.chmod(ret_val, attr)
            return ret_val

    else:

        # Python 3.6+: extractall() calls _extract_member() directly.
        def _extract_member(self, member, targetpath, pwd):
            if not isinstance(member, ZipInfo):
                member = self.getinfo(member)
            path = super(MyZipFile, self)._extract_member(member, targetpath, pwd)
            # Only chmod when a Unix mode was actually recorded.
            if member.external_attr > 0xffff:
                os.chmod(path, member.external_attr >> 16)
            return path


with MyZipFile('test.zip') as zfp:
    zfp.extractall()
(The above is based on Python 3.5 and assumes the zipfile is called test.zip)
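If you would rather not subclass at all, a different approach (not the one above) is to call extractall() as usual and then reapply the stored modes in a second pass. This is only a sketch: extract_with_permissions is a hypothetical helper name, it assumes a POSIX system, and it assumes the entry names in the archive map directly onto paths under the destination directory (extractall() sanitises unusual names such as absolute paths, in which case the join below would not match).

```python
import os
import stat
import tempfile
import zipfile


def extract_with_permissions(zip_path, dest_dir):
    """Extract zip_path into dest_dir, then reapply each entry's stored Unix mode."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest_dir)
        for info in archive.infolist():
            mode = stat.S_IMODE(info.external_attr >> 16)
            if mode == 0:
                # No Unix mode recorded (e.g. archive made elsewhere): leave as is.
                continue
            os.chmod(os.path.join(dest_dir, info.filename), mode)


# Build a small archive containing one entry flagged as executable.
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    script = zipfile.ZipInfo("run.sh")
    script.external_attr = (stat.S_IFREG | 0o755) << 16
    archive.writestr(script, "#!/bin/sh\necho hi\n")

out_dir = os.path.join(work_dir, "out")
extract_with_permissions(demo_zip, out_dir)
extracted_mode = stat.S_IMODE(os.stat(os.path.join(out_dir, "run.sh")).st_mode)
```

Because os.chmod() is not subject to the umask, this restores exactly what the archive recorded, which matches what the command-line unzip output in the question looks like. The second pass does mean the files briefly exist with the default permissions.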