Unzip nested zip files in python

Tags: python, zip

I am looking for a way to unzip nested zip files in Python. For example, consider the following structure (hypothetical names for ease):

  • Folder
    • ZipfileA.zip
      • ZipfileA1.zip
      • ZipfileA2.zip
    • ZipfileB.zip
      • ZipfileB1.zip
      • ZipfileB2.zip

...etc. I am trying to access text files that are within the second zip. I certainly don't want to extract everything, as the sheer numbers would crash the computer (there are several hundred zips in the first layer, and almost 10,000 in the second layer (per zip)).

I have been playing around with the 'zipfile' module - I am able to open the 1st level of zipfiles. E.g.:

zipfile_obj = zipfile.ZipFile("/Folder/ZipfileA.zip")
next_layer_zip = zipfile_obj.open("ZipfileA1.zip")

However, this returns a "ZipExtFile" instance (not a file or zipfile instance) - and I can't then go on and open this particular data type. That is, I can't do this:

data = next_layer_zip.open("data.txt")

I can, however, "read" this zip file with:

next_layer_zip.read()

But this is entirely useless! (i.e. I can only read compressed data/gobbledygook).

Does anyone have any ideas on how I might go about this (without using ZipFile.extract)??

I came across zip_open (http://pypi.python.org/pypi/zip_open/), which looks to do exactly what I want, but it doesn't seem to work for me: I keep getting "[Errno 2] No such file or directory:" for the files I am trying to process with that module.

Any ideas would be much appreciated!! Thanks in advance

djmac asked Aug 13 '12 08:08


1 Answer

For those looking for a function that extracts a nested zip file (any level of nesting) and cleans up the original zip files:

import os
import zipfile

def extract_nested_zip(zippedFile, toFolder):
    """ Unzip a zip file and its contents, including nested zip files
        Delete the zip file(s) after extraction
    """
    with zipfile.ZipFile(zippedFile, 'r') as zfile:
        zfile.extractall(path=toFolder)
    os.remove(zippedFile)
    for root, dirs, files in os.walk(toFolder):
        for filename in files:
            fileSpec = os.path.join(root, filename)
            # A recursive call may already have extracted and deleted a
            # sibling zip from this listing, so check that it still exists
            if filename.lower().endswith('.zip') and os.path.exists(fileSpec):
                extract_nested_zip(fileSpec, root)
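
If, as in the question, extracting everything to disk is not an option, a nested archive can also be read in memory. The following is only a minimal sketch under some assumptions: the helper names read_nested_member and iter_nested_text are made up for illustration, and each inner zip is assumed to be small enough to hold in memory. It relies on zipfile.ZipFile accepting any seekable file-like object, so the inner archive's bytes are wrapped in io.BytesIO.

```python
import io
import zipfile


def read_nested_member(outer_zip, inner_zip_name, member_name):
    """Return the bytes of member_name stored inside inner_zip_name,
    which is itself a member of outer_zip (a path or file-like object).
    Nothing is written to disk.
    """
    with zipfile.ZipFile(outer_zip) as outer:
        # outer.read() returns the inner archive's raw bytes; BytesIO makes
        # them look like a seekable file, which ZipFile requires.
        inner_bytes = io.BytesIO(outer.read(inner_zip_name))
    with zipfile.ZipFile(inner_bytes) as inner:
        return inner.read(member_name)


def iter_nested_text(outer_zip, suffix=".txt"):
    """Lazily yield (inner_zip_name, member_name, data) for every member
    of every inner zip whose name ends with suffix.
    """
    with zipfile.ZipFile(outer_zip) as outer:
        for inner_name in outer.namelist():
            if not inner_name.lower().endswith(".zip"):
                continue
            # Only one inner archive is held in memory at a time.
            with zipfile.ZipFile(io.BytesIO(outer.read(inner_name))) as inner:
                for member in inner.namelist():
                    if member.lower().endswith(suffix):
                        yield inner_name, member, inner.read(member)
```

With the names from the question, read_nested_member("/Folder/ZipfileA.zip", "ZipfileA1.zip", "data.txt") would return the contents of data.txt as bytes; call .decode() on the result if text is needed. Because iter_nested_text is a generator, the inner zips are processed one at a time rather than all at once.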

ronnydw answered Oct 15 '22 19:10