 

How to process zip file with Python

Tags:

python

a.zip---
      -- b.txt
      -- c.txt
      -- d.txt

What are the ways to process a zip file with Python?

I could extract the zip file to a temporary directory and then process each txt file one by one.

However, I am more interested in whether Python provides a way to treat the zip file as a specialized folder, so that I can process each txt file directly without extracting the archive manually.

q0987 asked Sep 23 '11 19:09




2 Answers

The Python standard library helps you.

Doug Hellmann writes very informative posts about selected modules: https://pymotw.com/3/zipfile/

To comment on David's post: from Python 2.7 on, the ZipFile object provides a context manager, so the recommended way would be:

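import zipfile

# The with statement closes the archive even if an error occurs.
with zipfile.ZipFile("zipfile.zip", "r") as archive:
    for name in archive.namelist():
        data = archive.read(name)  # contents of the member as bytes
        print(name, len(data), repr(data[:10]))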

The close method is called automatically when the with block exits. This is especially important if you write to the archive, because essential records are only written when the archive is closed.
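To illustrate the point about writing, here is a minimal sketch, assuming Python 3. The helper names write_archive and list_members and the member names are made up for this example; only zipfile.ZipFile, writestr and namelist come from the standard library.

```python
import os
import tempfile
import zipfile


def write_archive(path, files):
    """Write a {member_name: text} mapping into a new zip archive at `path`."""
    # Leaving the with block calls close(), which finalizes the archive.
    with zipfile.ZipFile(path, "w") as archive:
        for member_name, text in files.items():
            archive.writestr(member_name, text)


def list_members(path):
    """Return the member names stored in the archive at `path`."""
    with zipfile.ZipFile(path, "r") as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical demo: build a small archive in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp_dir:
        zip_path = os.path.join(tmp_dir, "a.zip")
        write_archive(zip_path, {"b.txt": "hello\n", "c.txt": "world\n"})
        print(list_members(zip_path))
```

The archive can be reopened for reading right after the with block ends, since it has already been closed and finalized at that point.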

rocksportrocker answered Oct 02 '22 23:10


Yes, you can process each file by itself. Take a look at the tutorial here. For your needs, you can do something like this example from that tutorial:

import zipfile

archive = zipfile.ZipFile("zipfile.zip", "r")
for name in archive.namelist():
    data = archive.read(name)  # contents of the member as bytes
    print(name, len(data), repr(data[:10]))
archive.close()  # release the file handle when done

This will iterate over each file in the archive and print out its name, length and the first 10 bytes.
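Because read() returns the whole member as bytes, larger text files may be easier to handle as a stream. The following is only a sketch, assuming Python 3.6 or later and UTF-8 encoded members; iter_text_lines is a hypothetical helper, not part of zipfile. It relies on ZipFile.open() returning a binary file-like object, which io.TextIOWrapper can decode line by line.

```python
import io
import zipfile


def iter_text_lines(zip_source, suffix=".txt", encoding="utf-8"):
    """Yield (member_name, line) pairs for matching members, without extracting."""
    with zipfile.ZipFile(zip_source, "r") as archive:
        for info in archive.infolist():
            # Skip directory entries and members with another extension.
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            # open() gives a binary stream; wrap it to decode text lazily.
            with archive.open(info) as raw:
                for line in io.TextIOWrapper(raw, encoding=encoding):
                    yield info.filename, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical demo: an in-memory archive stands in for a.zip.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("b.txt", "first line\nsecond line\n")
        demo.writestr("c.txt", "another file\n")
    buffer.seek(0)
    for member_name, line in iter_text_lines(buffer):
        print(member_name, line)
```

Nothing is written to disk here, and only one line at a time is decoded, so this should stay cheap even for big members.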

The comprehensive reference documentation is here.
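If you are on Python 3.8 or later, zipfile.Path comes close to treating the archive as a folder, with a pathlib-like interface. A rough sketch under that assumption; read_text_members is a made-up helper name, and it only looks at top-level members, which matches the layout in the question.

```python
import io
import zipfile


def read_text_members(zip_source, suffix=".txt", encoding="utf-8"):
    """Return {name: text} for top-level members whose name ends in `suffix`."""
    with zipfile.ZipFile(zip_source, "r") as archive:
        root = zipfile.Path(archive)  # behaves like the archive's root folder
        return {
            entry.name: entry.read_text(encoding=encoding)
            for entry in root.iterdir()
            if entry.is_file() and entry.name.endswith(suffix)
        }


if __name__ == "__main__":
    # Hypothetical demo: an in-memory archive stands in for a.zip.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("b.txt", "contents of b\n")
        demo.writestr("c.txt", "contents of c\n")
    buffer.seek(0)
    for name, text in read_text_members(buffer).items():
        print(name, repr(text))
```

Subfolders inside the archive show up as directory entries from iterdir(), so a recursive walk would need an extra step that is left out here.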

David Heffernan answered Oct 02 '22 22:10