I know that by passing the compression='gzip' argument to DataFrame.to_csv() I can save a DataFrame to a compressed CSV file.
my_df.to_csv('my_file_name.csv', compression='gzip')
I also know that if I want to append a DataFrame to the end of an existing CSV file I can use mode='a', like so:
my_df.to_csv('my_file_name.csv', mode='a', index=False)
But what if I want to append a DataFrame to the end of a compressed CSV file? Is that even possible? I tried to do so with
my_df.to_csv('my_file_name.csv', mode='a', index=False, compression='gzip')
But the resulting CSV was not compressed, although the appended data itself was written correctly.
This question is motivated by my processing of a large CSV file with Pandas. I need to produce compressed CSV output, and I am processing the input CSV in chunks (each chunk read into a DataFrame) so that I don't run into a MemoryError. Hence, the most logical thing for me to do seems to be to append each output DataFrame chunk to one compressed gzip file.
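In other words, what I am attempting is roughly the following (just a sketch; 'big_input.csv', the chunk size, and the per-chunk processing are placeholders, and the header flag is only there to avoid repeating the column names for every chunk):

import pandas as pd

# read the large input in chunks and append each processed chunk
# to a single output file
for i, chunk in enumerate(pd.read_csv('big_input.csv', chunksize=100000)):
    processed = chunk  # per-chunk processing would go here
    processed.to_csv('my_file_name.csv', mode='a', index=False,
                     header=(i == 0), compression='gzip')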
I am using Python 3.4 and Pandas 0.16.1.
Up-to-date answer: worked for me with pandas 1.2.4
Code:
import pandas as pd

# assumes 'test.csv' already holds one gzip-compressed copy of df
df.to_csv('test.csv', mode='a', compression='gzip')
new_df = pd.read_csv('test.csv', compression='gzip')
df.shape[0]      # 1x
new_df.shape[0]  # 2x
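For context, a fuller round trip along the same lines (the .gz file name and the toy DataFrame are made up here; header=False on the append avoids writing the column names a second time):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# the first write creates the gzip file, the append adds a second gzip member
df.to_csv('test.csv.gz', index=False, compression='gzip')
df.to_csv('test.csv.gz', mode='a', index=False, header=False, compression='gzip')

new_df = pd.read_csv('test.csv.gz', compression='gzip')
print(df.shape[0], new_df.shape[0])  # 3 6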
You can do the following
import gzip
with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
df.to_csv(compressed_file, index=False)
since the pandas .to_csv method accepts either a path or a file-like object.
The above answer does not seem to work anymore. When df.to_csv() is given no path or file-like object, it returns the CSV representation of the DataFrame as a string. This string can be encoded to bytes and written to the gzip file.
import gzip

with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
    # mode 'a' opens the gzip file for appending in binary mode,
    # so the CSV string must be encoded to bytes before writing
    compressed_file.write(df.to_csv().encode())
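Applied to the chunked scenario in the question, the same idea can sit inside a loop (a sketch only; 'big_input.csv' and the chunk size are placeholders, and the header is written just once for the first chunk):

import gzip
import pandas as pd

with gzip.open('my_file_name.csv.gz', 'ab') as compressed_file:
    for i, chunk in enumerate(pd.read_csv('big_input.csv', chunksize=100000)):
        # to_csv() with no target returns a string; encode it for the binary gzip handle
        compressed_file.write(chunk.to_csv(index=False, header=(i == 0)).encode())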