I know that by passing the compression='gzip' argument to DataFrame.to_csv() I can save a DataFrame to a compressed CSV file.
my_df.to_csv('my_file_name.csv', compression='gzip')
I also know that if I want to append a DataFrame to the end of an existing CSV file I can use mode='a', like so:
my_df.to_csv('my_file_name.csv', mode='a', index=False)
But what if I want to append a DataFrame to the end of a compressed CSV file? Is that even possible? I tried to do so with
my_df.to_csv('my_file_name.csv', mode='a', index=False, compression='gzip')
But the resulting CSV was not compressed, although the appended data itself was written correctly.
This question is motivated by my processing of a large CSV file with Pandas. I need to produce compressed CSV output, and I am processing the input CSV in chunks (each chunk read into a DataFrame) so that I don't run into a MemoryError. Hence, the most logical thing for me to do seems to be to append each output DataFrame chunk to one compressed gzip file.
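In other words, what I am attempting is roughly the following (just a sketch; 'big_input.csv', the chunk size, and the per-chunk processing are placeholders, and the header flag is only there to avoid repeating the column names for every chunk):

import pandas as pd

# read the large input in chunks and append each processed chunk
# to a single output file
for i, chunk in enumerate(pd.read_csv('big_input.csv', chunksize=100000)):
    processed = chunk  # per-chunk processing would go here
    processed.to_csv('my_file_name.csv', mode='a', index=False,
                     header=(i == 0), compression='gzip')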
I am using Python 3.4 and Pandas 0.16.1.
Up-to-date answer: worked for me with pandas 1.2.4
Code:
import pandas as pd

# assumes 'test.csv' already holds one gzip-compressed copy of df
df.to_csv('test.csv', mode='a', compression='gzip')
new_df = pd.read_csv('test.csv', compression='gzip')
df.shape[0]      # 1x
new_df.shape[0]  # 2x
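For context, a fuller round trip along the same lines (the .gz file name and the toy DataFrame are made up here; header=False on the append avoids writing the column names a second time):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# the first write creates the gzip file, the append adds a second gzip member
df.to_csv('test.csv.gz', index=False, compression='gzip')
df.to_csv('test.csv.gz', mode='a', index=False, header=False, compression='gzip')

new_df = pd.read_csv('test.csv.gz', compression='gzip')
print(df.shape[0], new_df.shape[0])  # 3 6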
You can do the following
import gzip
with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
df.to_csv(compressed_file, index=False)
since the pandas .to_csv method accepts either a path or a file-like object.
The above answer does not seem to work anymore. When df.to_csv() is given no path or file-like object, it returns the CSV representation of the DataFrame as a string. This string can be encoded to bytes and written to the gzip file.
import gzip

with gzip.open('my_file_name.csv.gz', 'a') as compressed_file:
    # mode 'a' opens the gzip file for appending in binary mode,
    # so the CSV string must be encoded to bytes before writing
    compressed_file.write(df.to_csv().encode())
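Applied to the chunked scenario in the question, the same idea can sit inside a loop (a sketch only; 'big_input.csv' and the chunk size are placeholders, and the header is written just once for the first chunk):

import gzip
import pandas as pd

with gzip.open('my_file_name.csv.gz', 'ab') as compressed_file:
    for i, chunk in enumerate(pd.read_csv('big_input.csv', chunksize=100000)):
        # to_csv() with no target returns a string; encode it for the binary gzip handle
        compressed_file.write(chunk.to_csv(index=False, header=(i == 0)).encode())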