Using csv.DictWriter to output an in-memory gzipped csv file?

I want to use a DictWriter from Python's csv module to generate a .csv file that's compressed using GZip. I need to do this all in-memory, so utilizing local files is out of the question.

However, I'm having trouble dealing with each module's type requirements in Python 3. Assuming that I got the general structure right, I can't make the two modules work together: DictWriter needs to write to a text stream such as io.StringIO, while GzipFile needs a binary one such as io.BytesIO.

So, when I try to do:

buffer = io.BytesIO()
compressed = gzip.GzipFile(fileobj=buffer, mode='wb')
dict_writer = csv.DictWriter(buffer, ["a", "b"], extrasaction="ignore")

I get:

TypeError: a bytes-like object is required, not 'str'

And trying to use io.StringIO with GZip doesn't work either. How can I go about this?
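A minimal sketch that reproduces the two type errors in isolation. It uses only plain in-memory buffers rather than GzipFile itself, on the assumption that GzipFile in 'wb' mode, like io.BytesIO, accepts only bytes-like data:

```python
import csv
import io

# The csv module always passes str objects to the target's write() method
text_buf = io.StringIO()
csv.DictWriter(text_buf, ["a", "b"]).writeheader()
header_text = text_buf.getvalue()
print(repr(header_text))  # 'a,b\r\n'

errors = []
# A binary buffer (the kind GzipFile writes into) rejects str ...
try:
    io.BytesIO().write(header_text)
except TypeError as exc:
    errors.append(exc)
# ... and a text buffer rejects bytes, which is what GzipFile produces
try:
    io.StringIO().write(header_text.encode("utf-8"))
except TypeError as exc:
    errors.append(exc)

for exc in errors:
    print("TypeError:", exc)
```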

asked Dec 11 '22 03:12 by felipecgonc


2 Answers

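You can use io.TextIOWrapper to put a text interface on top of the binary GzipFile stream: DictWriter writes str to the wrapper, which encodes it and passes the resulting bytes on to GzipFile.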

import io
import gzip
import csv

buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    # newline='' passes csv's '\r\n' line endings through untranslated, as the csv docs recommend
    with io.TextIOWrapper(compressed, encoding='utf-8', newline='') as wrapper:
        dict_writer = csv.DictWriter(wrapper, ["a", "b"], extrasaction="ignore")
        dict_writer.writeheader()
        dict_writer.writerows([{'a': 1, 'b': 2}, {'a': 4, 'b': 3}])
# Leaving both blocks flushes and closes the gzip stream; buffer itself stays open
print(buffer.getvalue())  # dump the compressed binary data
buffer.seek(0)
with io.TextIOWrapper(gzip.GzipFile(fileobj=buffer, mode='rb'), encoding='utf-8', newline='') as stream:
    dict_reader = csv.DictReader(stream)
    print(list(dict_reader))  # see if uncompressing the compressed data gets us back what we wrote

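This outputs (the compressed bytes will differ from run to run because the gzip header embeds a modification timestamp; on Python 3.8+ the rows are printed as plain dicts rather than OrderedDicts):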

b'\x1f\x8b\x08\x00\x9c6[\\\x02\xffJ\xd4I\xe2\xe5\xe52\xd41\x02\x92&:\xc6@\x12\x00\x00\x00\xff\xff\x03\x00\x85k\xa2\x9e\x12\x00\x00\x00'
[OrderedDict([('a', '1'), ('b', '2')]), OrderedDict([('a', '4'), ('b', '3')])]
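For reuse, the same idea can be wrapped in a small function. This is a sketch, and the name `dicts_to_gzipped_csv` is hypothetical. It relies on `gzip.open`, which, when given an existing file object and a text mode such as `'wt'`, builds the same TextIOWrapper-over-GzipFile stack as the answer above:

```python
import csv
import gzip
import io


def dicts_to_gzipped_csv(rows, fieldnames, encoding="utf-8"):
    """Hypothetical helper: serialize dicts to CSV and gzip the result, all in memory."""
    buffer = io.BytesIO()
    # In text mode ('wt'), gzip.open wraps a GzipFile in an io.TextIOWrapper,
    # so the csv module can write str while the compressor receives bytes.
    with gzip.open(buffer, mode="wt", encoding=encoding, newline="") as wrapper:
        writer = csv.DictWriter(wrapper, fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    # Closing the wrapper writes the gzip trailer; buffer itself is left open.
    return buffer.getvalue()


# The "c" key is dropped because of extrasaction="ignore"
data = dicts_to_gzipped_csv([{"a": 1, "b": 2}, {"a": 4, "b": 3, "c": 9}], ["a", "b"])

# Round trip: decompress and parse again to check the contents
with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8", newline="") as f:
    rows = list(csv.DictReader(f))
print(rows)  # [{'a': '1', 'b': '2'}, {'a': '4', 'b': '3'}] on Python 3.8+
```

Returning `bytes` rather than the `BytesIO` keeps the caller from having to remember to `seek(0)`; wrap the result in `io.BytesIO(data)` if a file-like object is needed.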

answered Dec 21 '22 17:12 by blhsing


A roundabout way would be to write the CSV to an io.StringIO object first, then encode its contents and compress them into an io.BytesIO object:

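import csv
import gzip
import io

s = io.StringIO()
dict_writer = csv.DictWriter(s, ["a", "b"], extrasaction="ignore")

# complete your write operations, e.g.:
dict_writer.writeheader()
dict_writer.writerows([{'a': 1, 'b': 2}, {'a': 4, 'b': 3}])

b = io.BytesIO()
# GzipFile compresses whatever is written to it and stores the result in b
with gzip.GzipFile(fileobj=b, mode='wb') as compressed:
    compressed.write(s.getvalue().encode('utf-8'))  # or an encoding of your choice

gzipped_data = b.getvalue()  # the in-memory .csv.gz content

s.close()  # Remember to close your streams!
b.close()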
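The same roundabout approach can be condensed with `gzip.compress`, which takes bytes and returns a complete gzip stream. A minimal sketch, assuming the whole CSV fits comfortably in memory twice (once as text, once compressed):

```python
import csv
import gzip
import io

s = io.StringIO()
dict_writer = csv.DictWriter(s, ["a", "b"], extrasaction="ignore")
dict_writer.writeheader()
dict_writer.writerows([{"a": 1, "b": 2}, {"a": 4, "b": 3}])

# gzip.compress takes bytes and returns the full gzip stream as bytes
gzipped = gzip.compress(s.getvalue().encode("utf-8"))
s.close()

# If a file-like object is needed downstream, wrap the result
b = io.BytesIO(gzipped)
print(gzip.decompress(b.getvalue()).decode("utf-8"))
```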

Though, as @wwii's comment suggests, depending on the size of your data it may be more worthwhile to write the CSV directly as bytes instead.

answered Dec 21 '22 18:12 by r.ook