I have a zip file which contains three zip files in it like this:
zipfile.zip\
    dirA.zip\
        a
    dirB.zip\
        b
    dirC.zip\
        c
I want to extract the contents of each inner zip file into a directory named after it (dirA, dirB, dirC).
Basically, I want to end up with the following layout:
output\
    dirA\
        a
    dirB\
        b
    dirC\
        c
I have tried the following:
import os, re
from zipfile import ZipFile
os.makedirs(directory)  # where directory is "\output"
with ZipFile(self.archive_name, "r") as archive:
    for id, files in data.items():
        if files:
            print("Creating", id)
            dirpath = os.path.join(directory, id)
            os.mkdir(dirpath)
            for file in files:
                match = pattern.match(filename)
                new = match.group(2)
                new_filename = os.path.join(dirpath, new)
                content = archive.open(file).read()
                with open(new_filename, "wb") as outfile:
                    outfile.write(content)
But it only extracts the inner zip files themselves, without unpacking them, and I end up with:
output\
    dirA\
        dirA.zip
    dirB\
        dirB.zip
    dirC\
        dirC.zip
Any suggestions, including code segments, will be much appreciated, because I have tried many different things and read the docs without success.
When extracting the outer zip file, you want to read the inner zip files into memory instead of writing them to disk. To do this, I've used BytesIO.
Check out this code:
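import os
import io
import zipfile
def extract(filename):
    with zipfile.ZipFile(filename) as z:
        for f in z.namelist():
            # skip anything that is not an inner zip file
            if not f.lower().endswith(".zip"):
                continue
            # get directory name from file, e.g. "dirA.zip" -> "dirA"
            dirname = os.path.splitext(f)[0]
            # create new directory (no error if it already exists)
            os.makedirs(dirname, exist_ok=True)
            # read inner zip file into bytes buffer
            content = io.BytesIO(z.read(f))
            # extract every member of the inner zip into the new directory
            with zipfile.ZipFile(content) as zip_file:
                zip_file.extractall(dirname)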
If you run extract("zipfile.zip") with zipfile.zip laid out as:
zipfile.zip/
    dirA.zip/
        a
    dirB.zip/
        b
    dirC.zip/
        c
Output should be:
dirA/
    a
dirB/
    b
dirC/
    c
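The code above unpacks into the current working directory, while the question asks for everything under an output folder. Here is a minimal sketch of the same BytesIO idea with a target directory added. The function name extract_inner_zips and the output_dir parameter are my own, not from the question, and it assumes the inner archives are small enough to hold in memory:

```python
import io
import os
import zipfile


def extract_inner_zips(outer_path, output_dir):
    """Extract every inner .zip of outer_path into output_dir/<inner name>.

    Hypothetical helper: returns the list of directories it created or reused.
    """
    created = []
    with zipfile.ZipFile(outer_path) as outer:
        for name in outer.namelist():
            # Ignore directory entries and members that are not zip archives.
            if not name.lower().endswith(".zip"):
                continue
            # "dirA.zip" -> "<output_dir>/dirA"
            stem = os.path.splitext(os.path.basename(name))[0]
            target = os.path.join(output_dir, stem)
            os.makedirs(target, exist_ok=True)
            # Read the inner archive into memory and unpack it into target.
            with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                inner.extractall(target)
            created.append(target)
    return created
```

Calling extract_inner_zips("zipfile.zip", "output") on the archive from the question should give output/dirA/a, output/dirB/b and output/dirC/c.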
For a function that extracts a nested zip file (any level of nesting) and cleans up the original zip files:
import zipfile, re, os
def extract_nested_zip(zippedFile, toFolder):
    """ Extract a zip file including any nested zip files
        Delete the zip file(s) after extraction
    """
    with zipfile.ZipFile(zippedFile, 'r') as zfile:
        zfile.extractall(path=toFolder)
    os.remove(zippedFile)
    for root, dirs, files in os.walk(toFolder):
        for filename in files:
            if re.search(r'\.zip$', filename):
                fileSpec = os.path.join(root, filename)
                extract_nested_zip(fileSpec, root)
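Note that the function above deletes the zip file you pass in. If you would rather keep the original archive, one possible variant is to recurse over in-memory buffers instead of files on disk. This is only a sketch: the name extract_nested_in_memory is made up for illustration, it assumes the archives come from a trusted source and fit in memory, and it places each nested archive's contents in a folder named after that archive:

```python
import io
import os
import zipfile


def extract_nested_in_memory(source, to_folder):
    """Recursively extract source (a path or file-like object) into to_folder.

    Nothing is deleted; nested .zip members are unpacked into a folder
    named after the member instead of being written out as .zip files.
    """
    with zipfile.ZipFile(source) as zfile:
        for info in zfile.infolist():
            if info.is_dir():
                continue
            if info.filename.lower().endswith(".zip"):
                # "dirA.zip" -> "<to_folder>/dirA", then recurse on the bytes.
                stem = os.path.splitext(info.filename)[0]
                nested = io.BytesIO(zfile.read(info))
                extract_nested_in_memory(nested, os.path.join(to_folder, stem))
            else:
                # Regular file: extract() creates parent folders as needed.
                zfile.extract(info, to_folder)
```

For the archive in the question, extract_nested_in_memory("zipfile.zip", "output") should produce the requested layout and leave zipfile.zip in place.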