Extract files from a zip without keeping the top-level folder with Python zipfile

I'm using the following code to extract the files from a zip file while keeping the directory structure:

import zipfile

zip_file = zipfile.ZipFile('archive.zip', 'r')
zip_file.extractall('/dir/to/extract/')
zip_file.close()

Here is the structure of an example zip file:

/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg

At the end I want this:

/dir/to/extract/file.jpg
/dir/to/extract/file1.jpg
/dir/to/extract/file2.jpg

But the top-level folder should be dropped only if every file in the zip is inside it. So when I extract a zip with this structure:

/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
/dir2/file.txt
/file.mp3

It should stay like this:

/dir/to/extract/dir1/file.jpg
/dir/to/extract/dir1/file1.jpg
/dir/to/extract/dir1/file2.jpg
/dir/to/extract/dir2/file.txt
/dir/to/extract/file.mp3

Any ideas?

xsquirrel asked Dec 31 '11 18:12


2 Answers

If I understand your question correctly, you want to strip any common prefix directories from the items in the zip before extracting them.

If so, then the following script should do what you want:

import os
import sys
from zipfile import ZipFile

def get_members(archive):
    parts = []
    # get all the path prefixes
    for name in archive.namelist():
        # only check files (directory entries end with '/')
        if not name.endswith('/'):
            # keep list of path elements (minus filename)
            parts.append(name.split('/')[:-1])
    # now find the common path prefix (if any); comparing lists of
    # path elements means only whole directory names can match
    prefix = os.path.commonprefix(parts)
    if prefix:
        # re-join the path elements
        prefix = '/'.join(prefix) + '/'
    # get the length of the common prefix
    offset = len(prefix)
    # now re-set the filenames
    for zipinfo in archive.infolist():
        name = zipinfo.filename
        # skip entries for the prefix directories themselves
        if len(name) > offset:
            # remove the common prefix
            zipinfo.filename = name[offset:]
            yield zipinfo

args = sys.argv[1:]

if args:
    path = args[1] if len(args) > 1 else '.'
    # the context manager closes the archive once extraction is done
    with ZipFile(args[0]) as archive:
        archive.extractall(path, get_members(archive))
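
As a side note on why the script splits each name on '/' before looking for the common prefix: os.path.commonprefix compares raw strings character by character, so two sibling folders with similar names could look like they share a prefix. The small sketch below illustrates the difference. The file names are made up for the example, and passing lists of path elements relies on CPython's commonprefix accepting sequences of lists, which the script above relies on as well.

```python
import os

# Hypothetical archive entries in two different top-level folders.
names = ['dir1/a.jpg', 'dir10/b.jpg']

# Character-by-character comparison stops in the middle of a folder name.
by_chars = os.path.commonprefix(names)

# Element-by-element comparison only matches whole folder names.
by_parts = os.path.commonprefix([n.split('/')[:-1] for n in names])

print(repr(by_chars))  # 'dir1'
print(by_parts)        # []
```

With the string form, "dir1" would be taken as a shared prefix and stripped from both names, which would mangle the second one. With the list form no common folder is found, so nothing is stripped.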
ekhumoro answered Sep 27 '22 21:09


Read the entries returned by ZipFile.namelist() to see if they're in the same directory, and then open/read each entry and write it to a file opened with open().
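
A rough sketch of that approach might look like the following. The helper name extract_flat and its arguments are made up for illustration, and it only handles a single shared top-level folder, as in the question, rather than an arbitrarily deep common prefix. Since it writes files with open() instead of calling extractall(), it also has to do its own basic path checking.

```python
import os
import shutil
from zipfile import ZipFile

def extract_flat(zip_path, dest):
    """Hypothetical helper: extract zip_path into dest, dropping the
    top-level folder only if every file in the archive is inside it."""
    with ZipFile(zip_path) as archive:
        # Directory entries end with '/'; only files need to be written.
        files = [n for n in archive.namelist() if not n.endswith('/')]
        # First path element of each file; '' marks a file at the root.
        tops = {n.split('/', 1)[0] if '/' in n else '' for n in files}
        strip = len(tops) == 1 and '' not in tops
        for name in files:
            rel = name.split('/', 1)[1] if strip else name
            if '..' in rel.split('/'):
                continue  # skip entries that could escape dest
            target = os.path.join(dest, *rel.split('/'))
            os.makedirs(os.path.dirname(target) or '.', exist_ok=True)
            # Copy the entry's bytes into a regular file opened with open().
            with archive.open(name) as src, open(target, 'wb') as out:
                shutil.copyfileobj(src, out)
```

Calling extract_flat('archive.zip', '/dir/to/extract/') on the first example archive should give /dir/to/extract/file.jpg and so on, while the second example keeps dir1 and dir2 because file.mp3 sits at the root of the archive.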

Ignacio Vazquez-Abrams answered Sep 27 '22 21:09