a.zip---
-- b.txt
-- c.txt
-- d.txt
Ways to process the zip file with Python:
I could expand the zip file to a temporary directory, then process each txt file one by one.
What I would really like to know is whether Python provides a way to treat the zip file as a specialized folder and process each txt file directly, so that I don't have to expand the zip file manually.
Python can work directly with data in ZIP files. You can look at the list of items in the directory and work with the data files themselves.
from zipfile import ZipFile

file_name = "….zip"

# opening the zip file in READ mode
with ZipFile(file_name, 'r') as zip:
    # printing all the contents of the zip file
    zip.printdir()

    # extracting all the files
    print('Extracting all the files now...')
    zip.extractall()
    print('Done!')
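Since the question asks how to avoid extraction, here is a minimal sketch that reads each text member in place instead of calling extractall(). The function name count_lines_per_text_member is hypothetical, and the sketch assumes the members are UTF-8 encoded; it relies only on ZipFile.infolist(), ZipFile.open() and io.TextIOWrapper from the standard library.

```python
import io
import zipfile


def count_lines_per_text_member(archive_path, encoding="utf-8"):
    """Return {member name: line count} for every .txt member, read in place.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    counts = {}
    with zipfile.ZipFile(archive_path, "r") as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a .txt file.
            if info.is_dir() or not info.filename.endswith(".txt"):
                continue
            # ZipFile.open() returns a binary file-like object; wrapping it
            # decodes the bytes as text while streaming, so nothing is
            # written to disk.
            with archive.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding)
                counts[info.filename] = sum(1 for _ in text)
    return counts
```

Streaming through ZipFile.open() keeps memory use low for large members, whereas ZipFile.read() loads a whole member at once.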
The zipfile module in the Python standard library handles this.
Doug Hellmann writes very informative posts about selected modules: https://pymotw.com/3/zipfile/
To comment on David's post: from Python 2.7 on, the ZipFile object provides a context manager, so the recommended way would be:
import zipfile
with zipfile.ZipFile("zipfile.zip", "r") as f:
    for name in f.namelist():
        data = f.read(name)
        print(name, len(data), repr(data[:10]))
The close method will be called automatically because of the with statement. This is especially important if you write to the file.
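To illustrate the point about writing, here is a small sketch of creating an archive inside a with statement. The helper names write_text_members and read_text_member are hypothetical, and UTF-8 is an assumed encoding. Closing matters here because the archive's central directory is only written when close() runs, so an archive that is never closed may be unreadable.

```python
import zipfile


def write_text_members(archive_path, texts):
    """Create archive_path with one member per (name, text) pair.

    Hypothetical helper for illustration only.
    """
    # Mode "w" creates or overwrites the archive. Leaving the with block
    # calls close(), which finalizes the archive on disk.
    with zipfile.ZipFile(archive_path, "w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in texts.items():
            # writestr() accepts str (encoded as UTF-8) or bytes.
            archive.writestr(name, text)


def read_text_member(archive_path, name, encoding="utf-8"):
    """Return one member decoded as text, without extracting it to disk."""
    with zipfile.ZipFile(archive_path, "r") as archive:
        return archive.read(name).decode(encoding)
```

For example, write_text_members("a.zip", {"b.txt": "hello"}) followed by read_text_member("a.zip", "b.txt") should give back "hello".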
Yes, you can process each file by itself. Take a look at the tutorial here. For your needs, you can do something like this example from that tutorial:
import zipfile
file = zipfile.ZipFile("zipfile.zip", "r")
for name in file.namelist():
    data = file.read(name)
    print(name, len(data), repr(data[:10]))
This will iterate over each file in the archive and print out its name, length and the first 10 bytes.
The comprehensive reference documentation is here.
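If the goal is to treat the archive like a folder, newer Python versions (3.8 and later) also offer zipfile.Path, which gives a pathlib-style view of the archive's contents. The following is only a sketch: the function name read_text_files is hypothetical, it looks at top-level entries only, and it assumes UTF-8 text.

```python
import zipfile


def read_text_files(archive_path, encoding="utf-8"):
    """Return {name: text} for the .txt files at the top level of the archive.

    Hypothetical helper for illustration; requires Python 3.8+.
    """
    contents = {}
    with zipfile.ZipFile(archive_path, "r") as archive:
        # zipfile.Path exposes the archive root like a directory object.
        root = zipfile.Path(archive)
        for entry in root.iterdir():
            if entry.is_file() and entry.name.endswith(".txt"):
                contents[entry.name] = entry.read_text(encoding=encoding)
    return contents
```

For the a.zip layout in the question, this would return a dictionary keyed by b.txt, c.txt and d.txt, each mapped to that file's text.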