I want to use a DictWriter from Python's csv module to generate a .csv file that's compressed using GZip. I need to do this all in memory, so using local files is out of the question.

However, I'm having trouble dealing with each module's type requirements in Python 3. Assuming I've got the general structure right, I can't make the two modules work together because DictWriter needs to write to a text buffer such as io.StringIO, while GZip needs an io.BytesIO object.
So, when I try to do:
buffer = io.BytesIO()
compressed = gzip.GzipFile(fileobj=buffer, mode='wb')
dict_writer = csv.DictWriter(buffer, ["a", "b"], extrasaction="ignore")
I get:
TypeError: a bytes-like object is required, not 'str'
And trying to use io.StringIO with GZip doesn't work either. How can I go about this?
You can use io.TextIOWrapper to bridge the two: it wraps a binary stream with a text interface, so the text that DictWriter writes is encoded and passed on to the underlying binary stream:
import io
import gzip
import csv

buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with io.TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        dict_writer = csv.DictWriter(wrapper, ["a", "b"], extrasaction="ignore")
        dict_writer.writeheader()
        dict_writer.writerows([{'a': 1, 'b': 2}, {'a': 4, 'b': 3}])

print(buffer.getvalue())  # dump the compressed binary data

buffer.seek(0)
dict_reader = csv.DictReader(io.TextIOWrapper(gzip.GzipFile(fileobj=buffer, mode='rb'), encoding='utf-8'))
print(list(dict_reader))  # see if uncompressing the compressed data gets us back what we wrote
This outputs:
b'\x1f\x8b\x08\x00\x9c6[\\\x02\xffJ\xd4I\xe2\xe5\xe52\xd41\x02\x92&:\xc6@\x12\x00\x00\x00\xff\xff\x03\x00\x85k\xa2\x9e\x12\x00\x00\x00'
[OrderedDict([('a', '1'), ('b', '2')]), OrderedDict([('a', '4'), ('b', '3')])]
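If you need this in more than one place, the write side packs up neatly into a small helper. This is just a minimal sketch of the same technique; the function name and arguments are my own, not part of any library:

import csv
import gzip
import io

def dicts_to_gzipped_csv(rows, fieldnames, encoding='utf-8'):
    """Build a gzip-compressed CSV entirely in memory and return its bytes."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
        with io.TextIOWrapper(compressed, encoding=encoding) as wrapper:
            writer = csv.DictWriter(wrapper, fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
    # The BytesIO stays open after the gzip stream is closed, so we can read it back.
    return buffer.getvalue()

# Example usage with the same toy rows as above.
payload = dicts_to_gzipped_csv([{'a': 1, 'b': 2}, {'a': 4, 'b': 3}], ["a", "b"])
print(len(payload), "compressed bytes")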
A roundabout way would be to write it to an io.StringIO object first and then push the encoded content through GZip into an io.BytesIO:
s = io.StringIO()
b = io.BytesIO()
dict_writer = csv.DictWriter(s, ["a", "b"], extrasaction="ignore")
...  # complete your write operations ...
s.seek(0)  # reset cursor to the beginning of the StringIO stream
with gzip.GzipFile(fileobj=b, mode='wb') as compressed:
    compressed.write(s.read().encode('utf-8'))  # or an encoding of your choice
...
s.close()  # Remember to close your streams!
b.close()
Though, as @wwii's comment suggests, depending on the size of your data it might be more worthwhile to write the csv as bytes yourself instead.
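That could look something like the sketch below. Note this hand-rolled formatter does no quoting or escaping, so it only suits simple values; the csv module handles those edge cases for you:

import gzip
import io

rows = [{'a': 1, 'b': 2}, {'a': 4, 'b': 3}]
fieldnames = ["a", "b"]

buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    # Header line, then one comma-joined line per row, all written as encoded bytes.
    compressed.write((",".join(fieldnames) + "\r\n").encode('utf-8'))
    for row in rows:
        line = ",".join(str(row.get(name, "")) for name in fieldnames)
        compressed.write((line + "\r\n").encode('utf-8'))

print(buffer.getvalue())  # gzip-compressed CSV bytes, no csv module involved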