
Pandas: how to load a zip file containing multiple txt files?

I have many zip files stored in my path:

  • mypath/data1.zip
  • mypath/data2.zip
  • etc.

Each zip file contains three different txt files. For instance, in data1.zip there is:

  • data1_a.txt
  • data1_b.txt
  • data1_c.txt

I need to load the file ending in _c.txt from each zip file (that is, data1_c.txt, data2_c.txt, data3_c.txt, etc.) and concatenate them into a single dataframe.

Unfortunately, I am unable to do so using read_csv, because it only works when the zip archive contains a single file.

Any ideas how to do so? Thanks!

ℕʘʘḆḽḘ asked Feb 04 '23 15:02


1 Answer

You need some extra code to reach into the zip file. Below is code adapted from O'Reilly's Python Cookbook:

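import zipfile
import pandas as pd

# Make up some data for the example
x = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x.to_csv('a.txt', sep="|", index=False)
(x * 2).to_csv('b.txt', sep="|", index=False)

# Write both files into a single zip archive
with zipfile.ZipFile('zipfile.zip', 'w') as myzip:
    myzip.write('a.txt')
    myzip.write('b.txt')

# Reopen the archive and read each member into its own DataFrame
frames = []
with zipfile.ZipFile('zipfile.zip') as z:
    for filename in z.namelist():
        print('File:', filename)
        with z.open(filename) as f:
            frames.append(pd.read_csv(f, sep="|"))

# Stack the per-file frames into one DataFrame
df = pd.concat(frames, ignore_index=True)
print(df)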
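
Applied to the layout in the question, the same idea might look like the sketch below. It is only an illustration under some assumptions: the archives sit directly in a folder such as mypath, the wanted member of each archive is identified by its name ending in _c.txt, and the files are pipe-delimited like the example data above. The helper name load_c_files and its suffix and sep parameters are made up for this example and are not part of pandas or zipfile.

```python
import glob
import os
import zipfile

import pandas as pd


def load_c_files(folder, suffix="_c.txt", sep="|"):
    """Concatenate every member ending in `suffix` from each zip in `folder`.

    Hypothetical helper: the folder layout, suffix and separator are
    assumptions taken from the question, not a fixed API.
    """
    frames = []
    # Sort the paths so the row order is reproducible across runs
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            # Keep only the members whose name ends with the wanted suffix
            members = [n for n in archive.namelist() if n.endswith(suffix)]
            for name in members:
                # archive.open returns a file-like object read_csv accepts
                with archive.open(name) as handle:
                    frames.append(pd.read_csv(handle, sep=sep))
    if not frames:
        # Nothing matched: return an empty frame instead of raising
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

A call such as df = load_c_files("mypath") would then return one dataframe with the rows of data1_c.txt, data2_c.txt, and so on, in sorted file-name order. If the real files use a different delimiter, pass it through sep.
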
JD Long answered Feb 07 '23 18:02