Recursively unzip archives, store (filename, extracted-contents) in dictionary

Could you please help me write a function returning:

{"file1.txt": <contents of file1>,
 "file2.txt": <contents of file2>,
 "file3.txt": <contents of file3>,
 "file4.txt": <contents of file4>}

Given this input:

    file.zip:
        outer\
        outer\inner1.zip:
                file1.txt
                file2.txt
        outer\inner2.zip:
                file3.txt
                file4.txt
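To experiment without real files, here is a sketch that builds an in-memory archive with the same layout as file.zip above. The text stored in file1.txt through file4.txt is placeholder data, and the helper names build_sample_archive and make_zip are hypothetical. Member paths use forward slashes because that is how the zip format stores them; the backslashes in the listing above are Windows-style display.

```python
import io
import zipfile

def build_sample_archive():
    """Return the bytes of an in-memory equivalent of the question's file.zip.

    Hypothetical helper; the text inside file1.txt .. file4.txt is placeholder data.
    """
    def make_zip(members):
        # Write each (name, data) pair into a fresh in-memory zip and return its bytes
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, 'w') as zf:
            for name, data in members.items():
                zf.writestr(name, data)
        return buf.getvalue()

    inner1 = make_zip({'file1.txt': b'contents of file1', 'file2.txt': b'contents of file2'})
    inner2 = make_zip({'file3.txt': b'contents of file3', 'file4.txt': b'contents of file4'})
    # The zip format stores paths with forward slashes; 'outer/' is a directory entry
    return make_zip({'outer/': b'', 'outer/inner1.zip': inner1, 'outer/inner2.zip': inner2})

outer = zipfile.ZipFile(io.BytesIO(build_sample_archive()))
print(outer.namelist())  # ['outer/', 'outer/inner1.zip', 'outer/inner2.zip']
```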

My attempts, each with the exception it raises:

  • http://ideone.com/s1tyb

    WindowsError: [Error 32] The process cannot access the file because it is being used by another process

  • http://ideone.com/Y2oTw

    "File is not a zip file"

  • http://ideone.com/0HoGa

    "File is not a zip file"

  • http://ideone.com/owmdK

    AttributeError: ZipFile instance has no attribute 'seek'
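The last error is what ZipFile() raises when it is handed another ZipFile object: the constructor accepts a path or a seekable file-like object, and a ZipFile instance has no seek() method. The linked code is not reproduced here, so that cause is an assumption. A minimal sketch of the usual fix reads the inner archive's bytes and wraps them in io.BytesIO; open_nested is a hypothetical helper, and the tiny archive is throwaway demo data.

```python
import io
import zipfile

def open_nested(outer, member):
    # Hypothetical helper: ZipFile() needs a path or a seekable file object,
    # so read the nested archive's bytes and wrap them in BytesIO
    return zipfile.ZipFile(io.BytesIO(outer.read(member)))

# Throwaway nested archive for the demo: outer/inner1.zip holds one text file
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, 'w') as zf:
    zf.writestr('file1.txt', b'hello')
outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, 'w') as zf:
    zf.writestr('outer/inner1.zip', inner_buf.getvalue())

outer = zipfile.ZipFile(outer_buf)
try:
    zipfile.ZipFile(outer)  # passing the ZipFile object itself fails: it has no seek()
except AttributeError as exc:
    print('AttributeError:', exc)

with open_nested(outer, 'outer/inner1.zip') as inner:
    print(inner.read('file1.txt'))  # b'hello'
```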

asked Jun 05 '12 17:06 by user1438003


2 Answers

Finally worked it out, with a bit of help from the question "Extracting a zipfile to memory?":

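from io import BytesIO
from zipfile import ZipFile, is_zipfile

def extract_zip(input_zip):
    with ZipFile(input_zip) as zf:  # input_zip: a path or a file-like object
        return {name: zf.read(name) for name in zf.namelist()}

def extract_all(input_zip):
    # Read each nested archive into memory instead of looking for it on disk
    with ZipFile(input_zip) as outer:
        nested = {name: BytesIO(outer.read(name)) for name in outer.namelist()}
    return {name: extract_zip(data) for name, data in nested.items() if is_zipfile(data)}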
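The code above looks only one level deep and returns a dictionary nested by inner archive name, while the question asks for a flat {filename: contents} mapping. Here is a sketch of a recursive, in-memory variant under these assumptions: every member fits in memory, nested archives are detected with is_zipfile() (a signature-based check, not full validation) rather than by file extension, and keys are names relative to each member's own archive, so equal names in different archives overwrite one another. unzip_to_dict is a hypothetical name; ZipInfo.is_dir() needs Python 3.6 or later.

```python
import io
from zipfile import ZipFile, is_zipfile

def unzip_to_dict(source, result=None):
    """Recursively map member name -> bytes for every file that is not itself a zip.

    `source` may be a path or a seekable file-like object such as io.BytesIO.
    Hypothetical helper: nested archives are read fully into memory.
    """
    if result is None:
        result = {}
    with ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries such as "outer/"
            data = zf.read(info.filename)
            buf = io.BytesIO(data)
            # is_zipfile() looks for zip end-of-archive structures; it is a heuristic
            if is_zipfile(buf):
                unzip_to_dict(buf, result)  # nested archive: descend into it
            else:
                # Keys are relative to the member's own archive, so duplicates overwrite
                result[info.filename] = data
    return result
```

For the layout in the question, unzip_to_dict('file.zip') should return the keys file1.txt through file4.txt with their contents as bytes. Because nested archives are opened from BytesIO buffers, no temporary files are created, so there is nothing to delete afterwards and the WindowsError [Error 32] from the first attempt does not come up.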

answered Nov 15 '22 06:11 by user1438003


I modified your code: you should close every ZipFile (and every file you open) before deleting the temporary directory, and I added extraction of the inner zip files:

import os
import shutil
import tempfile
from zipfile import ZipFile, is_zipfile

def unzip_recursively(parent_archive):
    result = {}
    tmpdir = tempfile.mkdtemp()
    try:
        # Close the outer archive (via "with") before the temp dir is removed
        with ZipFile(parent_archive) as parent_zip:
            parent_zip.extractall(path=tmpdir)
            namelist = parent_zip.namelist()
        for name in namelist:
            inner_zip_path = os.path.join(tmpdir, name)
            # Skip directory entries such as "outer/" and members that are not zips
            if name.endswith('/') or not is_zipfile(inner_zip_path):
                continue
            inner_extract_path = inner_zip_path + '.content'
            os.makedirs(inner_extract_path, exist_ok=True)
            with ZipFile(inner_zip_path) as inner_zip:
                inner_zip.extractall(path=inner_extract_path)
                inner_names = inner_zip.namelist()
            for inner_file_name in inner_names:
                if inner_file_name.endswith('/'):
                    continue
                # Read in binary mode and close the handle so rmtree can delete the file
                with open(os.path.join(inner_extract_path, inner_file_name), 'rb') as f:
                    result[inner_file_name] = f.read()
    finally:
        shutil.rmtree(tmpdir)
    return result

if __name__ == '__main__':
    print(unzip_recursively('file.zip'))

answered Nov 15 '22 07:11 by Arseniy