 

python unzip -- tremendously slow?

Can somebody please explain the following mystery?

I created a binary file of about 37 MB. Zipping it in Ubuntu from the terminal took less than 1 second. I then tried Python: zipping it programmatically (using the zipfile module) also took about 1 second.

I then tried to unzip the zip file I had created. In Ubuntu, using the terminal, this took less than 1 second.

In Python, the code to unzip it (using the zipfile module) took close to 37 seconds to run! Any ideas why?
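A minimal sketch for reproducing the experiment described above, assuming Python 3. Random bytes stand in for the unspecified binary file, and `ZIP_DEFLATED` compression is an assumption because the question does not say which compression was used. The function name `time_zip_roundtrip` and the file names are hypothetical. It times writing the archive with `ZipFile.write()` and extracting it with `ZipFile.extractall()`, so the two numbers can be compared with the terminal tools.

```python
import os
import time
import zipfile


def time_zip_roundtrip(workdir, size_bytes=37 * 1024 * 1024):
    """Zip and then unzip one random binary file, timing each step.

    Returns (zip_seconds, unzip_seconds). File names, the ~37 MiB default
    size and ZIP_DEFLATED compression are assumptions for illustration.
    """
    src = os.path.join(workdir, "data.bin")
    archive = os.path.join(workdir, "data.zip")
    out_dir = os.path.join(workdir, "extracted")

    # Random bytes stand in for "a binary file"; they barely compress.
    with open(src, "wb") as f:
        f.write(os.urandom(size_bytes))

    start = time.perf_counter()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname="data.bin")
    zip_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with zipfile.ZipFile(archive) as zf:
        # In CPython 3, extractall copies each member to disk in chunks.
        zf.extractall(out_dir)
    unzip_seconds = time.perf_counter() - start

    return zip_seconds, unzip_seconds
```

For example, call it inside a `tempfile.TemporaryDirectory()` context and print the returned pair of timings.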

user3262424 asked Feb 14 '11 22:02



People also ask

Why is it taking so long to unzip a file?

One reason for extremely slow unzipping on Windows can be Defender, which runs in the background and scans each file. This usually happens when you unzip a file that was downloaded from online storage (e.g. Google Drive) or received as an email attachment.

How to unzip files in Python?

To unzip a file in Python, use the ZipFile.extractall() method. It accepts optional path, members and pwd arguments and extracts all members of the archive (or only those listed in members) into path, which defaults to the current working directory.
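A small sketch of the `members` and `path` arguments, assuming Python 3. The archive contents and the helper names `build_demo_archive` and `extract_selected` are hypothetical. `pwd` is omitted because it only matters for encrypted archives.

```python
import io
import os
import zipfile


def build_demo_archive():
    """Return the bytes of a small in-memory ZIP (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "alpha")
        zf.writestr("docs/b.txt", "beta")
    return buf.getvalue()


def extract_selected(archive_bytes, dest, wanted=None):
    """Extract the `wanted` members (all members if None) into `dest`.

    Uses ZipFile.extractall(path=None, members=None, pwd=None);
    `members` must be a subset of namelist().
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        zf.extractall(path=dest, members=wanted)
    # Walk the destination and report what ended up on disk.
    found = []
    for root, _dirs, files in os.walk(dest):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), dest)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

Passing `["docs/b.txt"]` as `wanted` extracts only that file; leaving it as `None` extracts everything.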


1 Answer

I was struggling to unzip/decompress/extract zip files with Python as well, and the low-level approach of "create a ZipFile object, loop through its .namelist(), read the files and write them to the file system" didn't seem very Pythonic. So I started digging into ZipFile objects, which I believe are not very well documented, and listed all of the object's methods:

>>> from zipfile import ZipFile
>>> filepath = '/srv/pydocfiles/packages/ebook.zip'
>>> zip = ZipFile(filepath)
>>> dir(zip)
['NameToInfo', '_GetContents', '_RealGetContents', '__del__', '__doc__', '__enter__', '__exit__', '__init__', '__module__', '_allowZip64', '_didModify', '_extract_member', '_filePassed', '_writecheck', 'close', 'comment', 'compression', 'debug', 'extract', 'extractall', 'filelist', 'filename', 'fp', 'getinfo', 'infolist', 'mode', 'namelist', 'open', 'printdir', 'pwd', 'read', 'setpassword', 'start_dir', 'testzip', 'write', 'writestr'] 

There we go: the extractall method works just like tarfile's extractall! (It is available in Python 2.6 and 2.7, but NOT in 2.5.)
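For comparison, the low-level "loop and write" approach that extractall replaces might look like the sketch below, assuming Python 3.6+ (for `ZipInfo.is_dir()`); `manual_extract` is a hypothetical name. It streams each member through `ZipFile.open()` and `shutil.copyfileobj()` rather than reading whole files into memory. Unlike `extractall`, it does not sanitize member names such as `../x`, so it is only suitable for trusted archives; `extractall` remains the simpler and safer choice.

```python
import os
import shutil
import zipfile


def manual_extract(archive_path, dest):
    """Hand-rolled equivalent of ZipFile.extractall, for trusted archives only.

    Loops over infolist() and streams each member to disk in 1 MiB chunks.
    Member names are NOT sanitized (extractall does that for you).
    """
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            target = os.path.join(dest, info.filename)
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            # Make sure the parent directory exists before writing the file.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out, 1024 * 1024)
```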

Then there are the performance concerns. The file ebook.zip is 84.6 MB (mostly PDF files) and the uncompressed folder is 103 MB; it was zipped with the default "Archive Utility" under Mac OS X 10.5. So I timed the extraction with Python's timeit module:

>>> from timeit import Timer
>>> t = Timer("filepath = '/srv/pydocfiles/packages/ebook.zip'; \
...         extract_to = '/tmp/pydocnet/build'; \
...         from zipfile import ZipFile; \
...         ZipFile(filepath).extractall(path=extract_to)")
>>> 
>>> t.timeit(1)
1.8670060634613037

That took less than 2 seconds on a heavily loaded machine with 90% of its memory in use by other applications.
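The same measurement can also be written with a callable instead of a semicolon-joined statement string, since `timeit.Timer` accepts a zero-argument callable. This is a sketch assuming Python 3; `time_extractall` is a hypothetical helper name, and `filepath`/`extract_to` mirror the variables in the timing snippet. Unlike the string version, the `zipfile` import happens outside the timed code.

```python
import timeit
import zipfile


def time_extractall(filepath, extract_to, number=1):
    """Return the seconds taken by `number` runs of ZipFile(...).extractall()."""
    def run():
        with zipfile.ZipFile(filepath) as zf:
            zf.extractall(path=extract_to)

    return timeit.Timer(run).timeit(number=number)
```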

Hope this helps someone.

kirpit answered Sep 22 '22 22:09
