
Apply GZIP compression to a CSV in Python Pandas

I am trying to write a dataframe to a gzipped csv in python pandas, using the following:

    import pandas as pd
    import datetime
    import csv
    import gzip

    # Get data (with previous connection and script variables)
    df = pd.read_sql_query(script, conn)

    # Create today's date, to append to file
    todaysdatestring = str(datetime.datetime.today().strftime('%Y%m%d'))
    print todaysdatestring

    # Create csv with gzip compression
    df.to_csv('foo-%s.csv.gz' % todaysdatestring,
          sep='|',
          header=True,
          index=False,
          quoting=csv.QUOTE_ALL,
          compression='gzip',
          quotechar='"',
          doublequote=True,
          line_terminator='\n')

This just creates a csv called 'foo-YYYYMMDD.csv.gz', not an actual gzip archive.

I've also tried adding this:

    # Turn to_csv statement into a variable
    d = df.to_csv('foo-%s.csv.gz' % todaysdatestring,
          sep='|',
          header=True,
          index=False,
          quoting=csv.QUOTE_ALL,
          compression='gzip',
          quotechar='"',
          doublequote=True,
          line_terminator='\n')

    # Write above variable to gzip
    with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as output:
        output.write(d)

Which fails as well. Any ideas?

user2752159 asked May 12 '16 16:05



1 Answer

Using df.to_csv() with the keyword argument compression='gzip' should produce a gzip archive. I tested it using the same keyword arguments as you, and it worked.
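A minimal sketch of that kind of test, assuming a recent pandas on Python 3. The small DataFrame is a made-up stand-in for the result of pd.read_sql_query(script, conn), and the output path is a temporary file chosen for illustration. It writes with compression='gzip' and then reads the file back through the gzip module, which only works if the file really is gzip-compressed.

```python
import csv
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the result of pd.read_sql_query(script, conn).
df = pd.DataFrame({"id": [1, 2], "name": ["alpha", "beta"]})

# Illustrative output location; the question uses 'foo-YYYYMMDD.csv.gz'.
path = os.path.join(tempfile.mkdtemp(), "foo-example.csv.gz")

# Same options as in the question, minus the line terminator argument,
# whose keyword spelling differs between pandas releases.
df.to_csv(
    path,
    sep="|",
    header=True,
    index=False,
    quoting=csv.QUOTE_ALL,
    compression="gzip",
    quotechar='"',
    doublequote=True,
)

# Reading through gzip.open raises an error if the file is not gzip data.
with gzip.open(path, "rt", newline="") as handle:
    first_line = handle.readline().rstrip("\r\n")

print(first_line)
```

If the read succeeds and prints the quoted, pipe-separated header, the file on disk is a gzip archive and not a plain csv with a .gz name.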

You may need to upgrade pandas: gzip compression in to_csv was not implemented until version 0.17.1. On earlier versions, passing compression='gzip' does not raise an error; it just produces a regular, uncompressed csv. You can check your installed version by looking at the output of pd.__version__.
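Because older versions fail silently, it can help to inspect the file itself and not rely on the extension. The sketch below is one way to do that with only the standard library: a gzip stream starts with the two bytes 0x1f 0x8b, so reading them distinguishes a real archive from a plain csv that merely has a .gz name. The helper name looks_like_gzip and the sample file names are made up for illustration.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


workdir = tempfile.mkdtemp()
sample = '"id"|"name"\n"1"|"alpha"\n'

# A plain csv that only carries a .gz extension (the symptom in the question).
plain_path = os.path.join(workdir, "plain.csv.gz")
with open(plain_path, "w") as handle:
    handle.write(sample)

# The same text written through the gzip module, so it is really compressed.
real_path = os.path.join(workdir, "real.csv.gz")
with gzip.open(real_path, "wt") as handle:
    handle.write(sample)

print(looks_like_gzip(plain_path))  # False
print(looks_like_gzip(real_path))   # True
```

Running the helper on the file produced by to_csv shows whether the installed pandas actually applied the compression.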

root answered Sep 23 '22 11:09