
Can I append to a compressed stream with pandas?

I know that by passing the compression='gzip' argument to DataFrame.to_csv() I can save a DataFrame to a compressed CSV file.

my_df.to_csv('my_file_name.csv', compression='gzip')

I also know that if I want to append a DataFrame to the end of an existing CSV file I can use mode='a', like so:

my_df.to_csv('my_file_name.csv', mode='a', index=False)

But what if I want to append a DataFrame to the end of a compressed CSV file? Is that even possible? I tried to do so with

my_df.to_csv('my_file_name.csv', mode='a', index=False, compression='gzip')

But the resulting CSV was not compressed, although its contents were otherwise intact.


This question is motivated by my processing of a large CSV file with Pandas. I need to produce compressed CSV output, and I am reading the CSV file into a DataFrame in chunks so that I don't run into a MemoryError. Hence, the most logical approach seems to be to append each output DataFrame chunk to a single gzip-compressed file.

I am using Python 3.4 and Pandas 0.16.1.

Eric Hansen asked Jul 29 '16 09:07



3 Answers

Up-to-date answer: appending with mode='a' and compression='gzip' worked for me with pandas 1.2.4.

Code:

df.to_csv('test.csv', mode='a', compression='gzip')
new_df = pd.read_csv('test.csv', compression='gzip')

df.shape[0] # 1x
new_df.shape[0] # 2x
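
One caveat when appending this way: to_csv() writes the header row on every call by default, so a second append leaves a header line in the middle of the data. Below is a minimal sketch, assuming pandas 1.2 or later as in the answer above, that writes the header only when the file does not exist yet. The helper name append_gzip_csv, the file name, and the sample data are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def append_gzip_csv(df, path):
    """Append df to a gzip-compressed CSV, writing the header only once.

    Hypothetical helper for illustration; not part of pandas.
    """
    write_header = not os.path.exists(path)
    df.to_csv(path, mode='a', index=False, header=write_header,
              compression='gzip')


# Demo with a throwaway file in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'test.csv.gz')

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
append_gzip_csv(df, demo_path)
append_gzip_csv(df, demo_path)

# Read the appended file back as a single DataFrame.
new_df = pd.read_csv(demo_path, compression='gzip')
print(df.shape[0], new_df.shape[0])
```

Checking os.path.exists() is a simple choice for a single writer; it would not be safe if several processes appended to the same file at once.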
Julian answered Oct 13 '22 00:10



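You can open the file with the gzip module and pass the file object to to_csv():

import gzip

# Text append mode ('at'): to_csv() writes str, which a binary-mode handle may reject on older pandas versions.
with gzip.open('my_file_name.csv.gz', 'at', newline='') as compressed_file:
    df.to_csv(compressed_file, index=False)

This works because the pandas to_csv() method accepts either a path or a file-like object.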
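
For the chunked workflow described in the question, an alternative to reopening the file in append mode is to open the gzip file once and write every chunk to the same handle. The following is only a sketch under assumptions: the input file, the column names, the chunk size, and the "doubled" processing step are all invented stand-ins for the real data and logic.

```python
import gzip
import os
import tempfile

import pandas as pd

# Build a small input CSV to stand in for the large file (illustrative data).
work_dir = tempfile.mkdtemp()
input_path = os.path.join(work_dir, 'input.csv')
output_path = os.path.join(work_dir, 'output.csv.gz')
pd.DataFrame({'value': range(10)}).to_csv(input_path, index=False)

# Open the gzip file once in text mode and write every chunk to the same handle.
with gzip.open(output_path, 'wt', newline='') as compressed_file:
    for i, chunk in enumerate(pd.read_csv(input_path, chunksize=4)):
        chunk['doubled'] = chunk['value'] * 2  # placeholder processing step
        # Write the header for the first chunk only.
        chunk.to_csv(compressed_file, index=False, header=(i == 0))

# Read the compressed output back to check the result.
result = pd.read_csv(output_path, compression='gzip')
print(result.shape)
```

Only one chunk is held in memory at a time, and the header appears once at the top of the compressed output.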

paulo.filip3 answered Oct 13 '22 01:10



If passing the gzip file object directly to to_csv() does not work in your pandas version, there is another option. When df.to_csv() is given no path or file-like object, it returns the CSV data as a string, which can be encoded to bytes and written to the gzip file.

import gzip

with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
    compressed_file.write(df.to_csv().encode())
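
Appending like this works because a gzip file may contain several compressed members one after another, and Python's gzip module reads them back as one continuous stream. A small standard-library sketch of that behaviour (the file name and the CSV rows are made up):

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'members.csv.gz')

# Each open in append mode adds a new gzip member to the end of the file.
with gzip.open(path, 'ab') as f:
    f.write('a,b\n1,2\n'.encode())
with gzip.open(path, 'ab') as f:
    f.write('3,4\n'.encode())

# Reading decompresses all members in sequence as one stream.
with gzip.open(path, 'rt') as f:
    text = f.read()

# Read the first two raw bytes; a gzip file starts with the magic number 0x1f 0x8b.
with open(path, 'rb') as f:
    raw_header = f.read(2)

print(text)
```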
cmosig answered Oct 13 '22 00:10
