I have multiple zip files containing different types of txt files, like below:
zip1
  - file1.txt
  - file2.txt
  - file3.txt
How can I use pandas to read in each of those files without extracting them?
I know that if there were 1 file per zip I could use the compression argument of read_csv like below:
df = pd.read_csv('textfile.zip', compression='zip')
Any help on how to do this would be great.
Method #1: Using compression='zip' in the pandas.read_csv() method. When the compression argument of read_csv() is set to 'zip', pandas first decompresses the archive and then builds the DataFrame from the CSV file inside it. This only works when the zip contains exactly one file.
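To see where compression='zip' stops being enough, here is a minimal sketch that builds two throwaway archives and reads them back. The helper make_zip, the archive names and the sample data are all made up for illustration; the only pandas behaviour relied on is that read_csv accepts a zip with one member and raises ValueError for a zip with several.

```python
import os
import tempfile
import zipfile

import pandas as pd


def make_zip(path, members):
    """Hypothetical helper: write a zip at `path` from {member name: text}."""
    with zipfile.ZipFile(path, "w") as zf:
        for name, text in members.items():
            zf.writestr(name, text)


tmp_dir = tempfile.mkdtemp()  # throwaway directory for the demo archives

# One member: compression='zip' is enough.
single = os.path.join(tmp_dir, "single.zip")
make_zip(single, {"file1.txt": "a,b\n1,2\n3,4\n"})
df_single = pd.read_csv(single, compression="zip")

# Several members: pandas cannot tell which one to read and raises ValueError.
multi = os.path.join(tmp_dir, "multi.zip")
make_zip(multi, {"file1.txt": "a,b\n1,2\n", "file2.txt": "a,b\n5,6\n"})
try:
    pd.read_csv(multi, compression="zip")
    multi_error = None
except ValueError as exc:
    multi_error = str(exc)

print(df_single)
print("multi-file zip error:", multi_error)
```

So for the multi-file case in the question, the member has to be picked explicitly through the zipfile module.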
You can pass ZipFile.open() to pandas.read_csv() to construct a pandas.DataFrame from a csv-file packed into a multi-file zip. With zip_file being a ZipFile opened on the archive:
pd.read_csv(zip_file.open('file3.txt'))
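For reference, the same idea can be wrapped in a small function that also closes the archive when it is done. This is only a sketch: read_zip_member is a hypothetical name, and the demo archive, its member names and the tab-separated sample data are assumptions made so the snippet runs on its own.

```python
import os
import tempfile
from zipfile import ZipFile

import pandas as pd


def read_zip_member(zip_path, member, **read_csv_kwargs):
    """Read one named member of a zip into a DataFrame without extracting it to disk."""
    with ZipFile(zip_path) as zip_file:
        # ZipFile.open() returns a binary file-like object that read_csv accepts.
        with zip_file.open(member) as handle:
            return pd.read_csv(handle, **read_csv_kwargs)


# Demo archive with two members (made-up data, tab-separated).
demo_zip = os.path.join(tempfile.mkdtemp(), "textfile.zip")
with ZipFile(demo_zip, "w") as zf:
    zf.writestr("file1.txt", "x\ty\n1\t2\n")
    zf.writestr("file3.txt", "x\ty\n7\t8\n9\t10\n")

# Extra keyword arguments are passed straight through to read_csv.
df3 = read_zip_member(demo_zip, "file3.txt", sep="\t")
print(df3)
```

Passing read_csv options through keyword arguments is handy when the txt files are not comma-separated.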
Example to read all .csv into a dict:
import pandas as pd
from zipfile import ZipFile

zip_file = ZipFile('textfile.zip')
dfs = {text_file.filename: pd.read_csv(zip_file.open(text_file.filename))
       for text_file in zip_file.infolist()
       if text_file.filename.endswith('.csv')}
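Since the question is about several zip files holding .txt members, the dict approach could be extended roughly as follows. Treat it as a sketch under assumptions: read_txt_members is a hypothetical name, the key layout (zip file name, member name) is just one possible choice, and the demo archives and data are invented.

```python
import os
import tempfile
from zipfile import ZipFile

import pandas as pd


def read_txt_members(zip_paths, suffix=".txt", **read_csv_kwargs):
    """Map (zip file name, member name) -> DataFrame for every matching member."""
    frames = {}
    for zip_path in zip_paths:
        with ZipFile(zip_path) as zip_file:
            for info in zip_file.infolist():
                # Skip directory entries and members with another extension.
                if info.is_dir() or not info.filename.endswith(suffix):
                    continue
                with zip_file.open(info) as handle:
                    key = (os.path.basename(zip_path), info.filename)
                    frames[key] = pd.read_csv(handle, **read_csv_kwargs)
    return frames


# Demo: two archives, one of which also holds a file that should be ignored.
tmp_dir = tempfile.mkdtemp()
zip1 = os.path.join(tmp_dir, "zip1.zip")
zip2 = os.path.join(tmp_dir, "zip2.zip")
with ZipFile(zip1, "w") as zf:
    zf.writestr("file1.txt", "a,b\n1,2\n")
    zf.writestr("file2.txt", "a,b\n3,4\n")
with ZipFile(zip2, "w") as zf:
    zf.writestr("file1.txt", "a,b\n5,6\n")
    zf.writestr("notes.md", "not a table")

dfs_by_member = read_txt_members([zip1, zip2])
for key, frame in dfs_by_member.items():
    print(key, frame.shape)
```

Keying on both names avoids collisions when different zips contain members with the same file name.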
The simplest way to handle this (if you have multiple parts of one big csv file compressed into one zip file):
import pandas as pd
from zipfile import ZipFile

# Open the archive once and concatenate every member into one DataFrame
with ZipFile('some.zip') as zf:
    df = pd.concat(
        [pd.read_csv(zf.open(name)) for name in zf.namelist()],
        ignore_index=True,
    )
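When the parts are concatenated it can be useful to remember which member each row came from. A possible sketch, assuming all parts share the same columns; concat_zip_members and the source_file column are hypothetical names, and the in-memory archive is only there to keep the example self-contained.

```python
import io
import zipfile

import pandas as pd


def concat_zip_members(zip_source, **read_csv_kwargs):
    """Concatenate all members of a zip into one DataFrame, tagging each row's origin."""
    parts = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):  # directory entry, nothing to read
                continue
            with zf.open(name) as handle:
                part = pd.read_csv(handle, **read_csv_kwargs)
            parts.append(part.assign(source_file=name))
    return pd.concat(parts, ignore_index=True)


# Demo: build a small archive in memory (ZipFile also accepts file-like objects).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("part1.csv", "id,value\n1,10\n2,20\n")
    zf.writestr("part2.csv", "id,value\n3,30\n")

combined = concat_zip_members(buffer)
print(combined)
```

Sorting the member names keeps the row order predictable, which matters if the parts were split in sequence.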