Pandas to_csv to GzipFile in Python 3 not working

Tags: python, pandas

Saving a Pandas dataframe to a gzipped CSV in memory works like this in Python 2.7 (Pandas 0.22.0):

from io import BytesIO
import gzip
import pandas as pd
df = pd.DataFrame.from_dict({'a': ['a', 'b', 'c']})
s = BytesIO()
f = gzip.GzipFile(fileobj=s, mode='wb', filename='file.csv')
df.to_csv(f)
s.seek(0)
content = s.getvalue()

However, in Python 3.6 (Pandas 0.22.0), the same code throws an error when calling to_csv:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.6/site-packages/pandas/core/frame.py", line 1524, in to_csv
    formatter.save()
  File "lib/python3.6/site-packages/pandas/io/formats/format.py", line 1652, in save
    self._save()
  File "lib/python3.6/site-packages/pandas/io/formats/format.py", line 1740, in _save
    self._save_header()
  File "lib/python3.6/site-packages/pandas/io/formats/format.py", line 1708, in _save_header
    writer.writerow(encoded_labels)
  File "miniconda3/lib/python3.6/gzip.py", line 260, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'

How should I resolve this? Do I need to alter the GzipFile object somehow for to_csv to handle it properly?

To clarify, I want to create the gzipped file in-memory (the content variable) so that I can save it to Amazon S3 using Boto 3 put_object later.

asked Apr 26 '18 09:04

Waiski

1 Answer

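You can write the CSV to a StringIO buffer first, then encode it and write the resulting bytes to the GzipFile (gzip, df and s are the same objects as in your code):

from io import StringIO
buf = StringIO()
df.to_csv(buf)
f = gzip.GzipFile(fileobj=s, mode='wb', filename='file.csv')
f.write(buf.getvalue().encode())
f.close()

Note also the added f.close(). GzipFile writes the gzip trailer (CRC-32 and uncompressed length) only when it is closed, so calling f.flush() alone leaves the archive incomplete. Closing the GzipFile does not close the underlying BytesIO, so s.getvalue() still works afterwards.

Or as a complete example based on your code:

from io import BytesIO, StringIO
import gzip
import pandas as pd
df = pd.DataFrame.from_dict({'a': ['a', 'b', 'c']})
s = BytesIO()    # receives the compressed bytes
buf = StringIO() # receives the CSV text
f = gzip.GzipFile(fileobj=s, mode='wb', filename='file.csv')
df.to_csv(buf)
# GzipFile accepts only bytes in Python 3, so encode the text first
f.write(buf.getvalue().encode())
# close() writes the gzip trailer; s itself stays open
f.close()
s.seek(0)
content = s.getvalue()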
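
As a possible alternative (a sketch, not something tested against Pandas 0.22.0 specifically), you could wrap the GzipFile in io.TextIOWrapper, which is the same mechanism gzip.open uses for text mode. to_csv then writes str as usual and the wrapper encodes it to bytes before it reaches the compressor. The round-trip check at the end, including the restored name, is only there for illustration:

```python
import gzip
import io

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c']})

s = io.BytesIO()
# GzipFile only accepts bytes; TextIOWrapper encodes the str rows that
# to_csv writes before they are passed on to the compressor.
with gzip.GzipFile(fileobj=s, mode='wb', filename='file.csv') as gz:
    with io.TextIOWrapper(gz, encoding='utf-8', newline='') as text:
        df.to_csv(text)

# Leaving the with blocks closes the GzipFile, which writes the gzip
# trailer. The underlying BytesIO stays open, so getvalue() still works.
content = s.getvalue()

# Round-trip check: decompress the bytes and parse the CSV again.
restored = pd.read_csv(io.BytesIO(gzip.decompress(content)), index_col=0)
```

The content bytes can then be passed on the same way as in the examples above, for instance as the body of an upload.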

answered Oct 12 '22 23:10

Roland Pihlakas
