I have many zip files stored in my path:
mypath/data1.zip
mypath/data2.zip
Each zip file contains three different txt files. For instance, in data1.zip there is:
data1_a.txt
data1_b.txt
data1_c.txt
I need to load datai_c.txt from each zipped file (that is, data1_c.txt, data2_c.txt, data3_c.txt, etc.) and concatenate them into a dataframe.
Unfortunately I am unable to do so using read_csv directly, because it only works when the zip archive contains a single file.
Any ideas how to do so? Thanks!
So you need some other code to reach into the zip file. Below is modified code from O'Reilly's Python Cookbook:
import io
import zipfile
import pandas as pd

# Make up some data for the example
x = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x.to_csv('a.txt', sep="|", index=False)
(x * 2).to_csv('b.txt', sep="|", index=False)
with zipfile.ZipFile('zipfile.zip', 'w') as myzip:
    myzip.write('a.txt')
    myzip.write('b.txt')

# Read every member of the archive and concatenate the results
frames = []
with zipfile.ZipFile('zipfile.zip') as z:
    for filename in z.namelist():
        print('File:', filename)
        # z.read() returns bytes, so wrap it in BytesIO for read_csv
        frames.append(pd.read_csv(io.BytesIO(z.read(filename)), sep="|"))
df = pd.concat(frames, ignore_index=True)
print(df)
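To pick out only the *_c.txt member from each archive in a folder, something along these lines might work. Treat it as a minimal sketch rather than a tested solution for your data: the helper name load_c_files and its suffix parameter are made up for illustration, and it assumes every target file name ends in _c.txt and that any extra read_csv options (such as sep) are passed through as keyword arguments.

```python
import glob
import os
import zipfile

import pandas as pd


def load_c_files(folder, suffix="_c.txt", **read_csv_kwargs):
    """Concatenate the member ending in `suffix` from every zip in `folder`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    frames = []
    # Sort the paths so the rows come out in a predictable order
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if member.endswith(suffix):
                    # archive.open() gives a binary file-like object
                    # that read_csv can consume directly
                    with archive.open(member) as handle:
                        frames.append(pd.read_csv(handle, **read_csv_kwargs))
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the layout from the question you would call it as df = load_c_files("mypath"), adding for example sep="|" if the files are not comma-separated. Opening each member with archive.open avoids extracting anything to disk.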