
Reading multiple files contained in a zip file with pandas


I have multiple zip files, each containing different types of txt files, like below:

zip1
  - file1.txt
  - file2.txt
  - file3.txt

How can I use pandas to read in each of those files without extracting them?

I know that if there were one file per zip I could use the compression argument of read_csv, like below:

df = pd.read_csv('textfile.zip', compression='zip')

Any help on how to do this would be great.

johnnyb asked Jun 15 '17 19:06




2 Answers

You can pass the file object returned by ZipFile.open() to pandas.read_csv() to construct a pandas.DataFrame from a CSV file packed into a multi-file zip.

Code:

pd.read_csv(zip_file.open('file3.txt')) 

Example to read all .csv into a dict:

from zipfile import ZipFile

import pandas as pd

zip_file = ZipFile('textfile.zip')
# Map each .csv member name to the DataFrame read from it
dfs = {text_file.filename: pd.read_csv(zip_file.open(text_file.filename))
       for text_file in zip_file.infolist()
       if text_file.filename.endswith('.csv')}
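
Since the question's archives hold .txt members, the same idea can be wrapped in a small helper that filters on a chosen suffix and closes the archive when done. This is only a sketch: read_zip_members, its suffix parameter, and the in-memory demo archive are hypothetical names made up for illustration, and it assumes each matching member is comma-delimited text that read_csv can parse (extra keyword arguments such as sep are passed through for other layouts).

```python
import io
from zipfile import ZipFile

import pandas as pd


def read_zip_members(zip_source, suffix=".txt", **read_csv_kwargs):
    """Return {member name: DataFrame} for every member ending in `suffix`.

    Hypothetical helper: `zip_source` may be a path or a file-like object.
    """
    frames = {}
    with ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(suffix):
                continue  # skip directory entries and other file types
            with zf.open(name) as member:
                frames[name] = pd.read_csv(member, **read_csv_kwargs)
    return frames


# Build a small archive in memory so the sketch runs without files on disk.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n")
    zf.writestr("file2.txt", "a,b\n3,4\n")
    zf.writestr("notes.md", "not a table")

dfs = read_zip_members(buffer)
print(sorted(dfs))
```

To cover several archives, the helper could be called once per zip path and the resulting dicts stored under the archive name.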
Stephen Rauch answered Sep 18 '22 16:09


The simplest way to handle this, if you have multiple parts of one big CSV file compressed into a single zip file:

from zipfile import ZipFile

import pandas as pd

# Open the archive once and concatenate every member into one DataFrame
with ZipFile('some.zip') as zf:
    df = pd.concat(
        [pd.read_csv(zf.open(name)) for name in zf.namelist()],
        ignore_index=True
    )
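
When the parts are concatenated it can be useful to remember which member each row came from. The following is a rough sketch of that variation, not part of the original answer: concat_zip_parts and the source column are hypothetical names, the demo archive is built in memory, and it assumes every non-directory member is a CSV with the same columns.

```python
import io
from zipfile import ZipFile

import pandas as pd


def concat_zip_parts(zip_source):
    """Concatenate all members of one archive, tagging rows with their origin."""
    with ZipFile(zip_source) as zf:
        parts = [
            pd.read_csv(zf.open(name)).assign(source=name)
            for name in sorted(zf.namelist())
            if not name.endswith("/")  # directory entries carry no data
        ]
    return pd.concat(parts, ignore_index=True)


# In-memory archive standing in for 'some.zip'.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("part1.csv", "x,y\n1,2\n3,4\n")
    zf.writestr("part2.csv", "x,y\n5,6\n")

df = concat_zip_parts(buffer)
print(df)
```

Sorting the member names keeps the row order deterministic; drop the sort if the order stored in the archive matters.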
valentinmk answered Sep 18 '22 16:09