Can somebody please explain the following mystery?
I created a binary file of about 37 MB. Zipping it in Ubuntu, from the terminal, took less than 1 second. I then tried Python: zipping it programmatically (using the zipfile module) also took about 1 second.
I then tried to unzip the zip file I had created. In Ubuntu, from the terminal, this took less than 1 second.
In Python, the code to unzip it (using the zipfile module) took close to 37 seconds to run! Any ideas why?
One cause of extremely slow unzipping on Windows can be Windows Defender, which runs in the background and scans each file. This usually happens when you unzip a file that was downloaded from online storage (e.g. Google Drive) or that you received as an email attachment.
To unzip a file in Python, use the ZipFile.extractall() method. It accepts three optional arguments (path, members, and pwd) and extracts the contents of the archive: everything by default, or only the entries listed in members.
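As a rough illustration of the three arguments, here is a minimal sketch for Python 3. The archive and its file names (example.zip, a.txt, docs/b.txt) are made up for the demo and built in a temporary directory, so nothing here refers to a real file on your system.

```python
import os
import tempfile
from zipfile import ZipFile

# Build a small throwaway archive (hypothetical names, for illustration only).
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "example.zip")
with ZipFile(zip_path, "w") as archive:
    archive.writestr("a.txt", "alpha")
    archive.writestr("docs/b.txt", "beta")

# path: the directory to extract into (defaults to the current directory).
all_dir = os.path.join(workdir, "all")
with ZipFile(zip_path) as archive:
    archive.extractall(path=all_dir)

# members: restrict extraction to a subset of the names in namelist().
subset_dir = os.path.join(workdir, "subset")
with ZipFile(zip_path) as archive:
    archive.extractall(path=subset_dir, members=["docs/b.txt"])

# pwd: only needed for encrypted archives; in Python 3 it must be bytes,
# e.g. archive.extractall(path=all_dir, pwd=b"secret"). Not used here
# because the demo archive is not encrypted.

print(sorted(os.listdir(all_dir)))
print(sorted(os.listdir(subset_dir)))
```

The second call leaves a.txt out of the subset directory, which is the point of members.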
I was struggling to unzip/decompress/extract zip files with Python as well, and that "create ZipFile object, loop through its .namelist(), read the files and write them to file system" low-level approach didn't seem very Pythonic. So I started digging into ZipFile objects, which I believe are not very well documented, and listed all of the object's methods:
>>> from zipfile import ZipFile
>>> filepath = '/srv/pydocfiles/packages/ebook.zip'
>>> zip = ZipFile(filepath)
>>> dir(zip)
['NameToInfo', '_GetContents', '_RealGetContents', '__del__', '__doc__', '__enter__', '__exit__', '__init__', '__module__', '_allowZip64', '_didModify', '_extract_member', '_filePassed', '_writecheck', 'close', 'comment', 'compression', 'debug', 'extract', 'extractall', 'filelist', 'filename', 'fp', 'getinfo', 'infolist', 'mode', 'namelist', 'open', 'printdir', 'pwd', 'read', 'setpassword', 'start_dir', 'testzip', 'write', 'writestr']
There we go: the "extractall" method works just like tarfile's extractall! (It is available in Python 2.6 and 2.7, but NOT 2.5.)
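For comparison, here is a minimal Python 3 sketch of the low-level loop next to the extractall call. The helper names (extract_manually, extract_with_extractall, list_files) and the sample archive are hypothetical, written only for this illustration. Note that the manual loop does no path sanitizing, so treat it as a demonstration rather than something to run on untrusted archives.

```python
import os
import tempfile
from zipfile import ZipFile


def extract_manually(zip_path, dest):
    """Low-level approach: loop over namelist(), read each member, write it."""
    with ZipFile(zip_path) as archive:
        for name in archive.namelist():
            target = os.path.join(dest, name)
            if name.endswith("/"):
                # Directory entry: just make sure it exists.
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(archive.read(name))


def extract_with_extractall(zip_path, dest):
    """The one-call approach."""
    with ZipFile(zip_path) as archive:
        archive.extractall(path=dest)


def list_files(root):
    """Map each relative file path under root to its bytes, for comparison."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            full = os.path.join(dirpath, fname)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                found[rel] = fh.read()
    return found


# Build a small throwaway archive (hypothetical contents).
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "sample.zip")
with ZipFile(zip_path, "w") as archive:
    archive.writestr("book/chapter1.txt", "first chapter")
    archive.writestr("book/chapter2.txt", "second chapter")

manual_dir = os.path.join(workdir, "manual")
auto_dir = os.path.join(workdir, "auto")
extract_manually(zip_path, manual_dir)
extract_with_extractall(zip_path, auto_dir)

# Both approaches should leave the same files with the same contents.
same = list_files(manual_dir) == list_files(auto_dir)
print(same)
```

Both routes produce the same tree; extractall simply saves you the bookkeeping.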
Then the performance concerns: the file ebook.zip is 84.6 MB (mostly PDF files) and the uncompressed folder is 103 MB; it was zipped with the default "Archive Utility" under Mac OS X 10.5. So I timed the extraction with Python's timeit module:
>>> from timeit import Timer
>>> t = Timer("filepath = '/srv/pydocfiles/packages/ebook.zip'; \
... extract_to = '/tmp/pydocnet/build'; \
... from zipfile import ZipFile; \
... ZipFile(filepath).extractall(path=extract_to)")
>>>
>>> t.timeit(1)
1.8670060634613037
That took less than 2 seconds on a heavily loaded machine with 90% of its memory in use by other applications.
Hope this helps someone.