Python: Creating a streaming gzip'd file-like?

I'm trying to figure out the best way to compress a stream with Python's zlib.

I've got a file-like input stream (input, below) and an output function which accepts a file-like (output_function, below):

with open("file") as input:
    output_function(input)

And I'd like to gzip-compress input chunks before sending them to output_function:

with open("file") as input:
    output_function(gzip_stream(input))

It looks like the gzip module assumes that either the input or the output will be a gzip'd file-on-disk… So I assume that the zlib module is what I want.

However, it doesn't natively offer a simple way to create a stream file-like… And the stream-compression it does support comes by way of manually adding data to a compression buffer, then flushing that buffer.

Of course, I could write a wrapper around zlib.Compress.compress and zlib.Compress.flush (Compress is returned by zlib.compressobj()), but I'd be worried about getting buffer sizes wrong, or something similar.
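For reference, the manual approach described above can be sketched roughly as follows. This is only an illustration, assuming Python 3 and a binary input stream; gzip_chunks and its chunk_size and level parameters are hypothetical names, not part of zlib. Passing 16 + zlib.MAX_WBITS as wbits asks zlib to emit gzip framing, and the compression object accepts chunks of any size, so the chunk size affects memory use rather than correctness.

```python
import gzip
import io
import zlib


def gzip_chunks(stream, chunk_size=64 * 1024, level=6):
    """Yield gzip-compressed bytes for a binary file-like, chunk by chunk.

    Hypothetical helper for illustration only.
    """
    # wbits = 16 + MAX_WBITS makes zlib write a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = compressor.compress(chunk)
        if data:  # compress() may return b"" while zlib buffers internally
            yield data
    # flush() emits whatever zlib still holds, plus the gzip trailer.
    yield compressor.flush()


# Round trip on in-memory data; a real input would be a file opened with "rb".
payload = b"some repetitive data\n" * 10000
compressed = b"".join(gzip_chunks(io.BytesIO(payload), chunk_size=4096))
restored = gzip.decompress(compressed)
```

A generator like this is not yet a file-like with read(), which is what output_function expects, so it covers only the compression half of the problem.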

So, what's the simplest way to create a streaming, gzip-compressing file-like with Python?

Edit: To clarify, the input stream and the compressed output stream are both too large to fit in memory, so something like output_function(StringIO(zlib.compress(input.read()))) doesn't really solve the problem.

asked Feb 03 '10 14:02

David Wolever

1 Answer

It's quite kludgy (self-referencing, etc.; I only spent a few minutes writing it, so it's nothing really elegant), but it does what you want if you're still interested in using gzip instead of zlib directly.

Basically, GzipWrap is a (very limited) file-like object that produces a gzipped file out of a given iterable (e.g., a file-like object, a list of strings, any generator...)

Of course, it produces binary so there was no sense in implementing "readline".

You should be able to expand it to cover other cases or to be used as an iterable object itself.

from gzip import GzipFile


class GzipWrap(object):
    # input is any iterable of byte strings, e.g. a file opened in binary mode
    def __init__(self, input, filename=None):
        self.input = input
        # keep a single iterator so that successive reads resume where the
        # previous one stopped, even when input is a list
        self.chunks = iter(input)
        self.buffer = b''
        # GzipFile writes its compressed output back into this object
        self.zipper = GzipFile(filename, mode='wb', fileobj=self)

    def read(self, size=-1):
        if size < 0 or len(self.buffer) < size:
            for s in self.chunks:
                self.zipper.write(s)
                if size > 0 and len(self.buffer) >= size:
                    self.zipper.flush()
                    break
            else:
                # input exhausted: finish the stream (writes the gzip trailer)
                self.zipper.close()
        if size < 0:
            ret, self.buffer = self.buffer, b''
        else:
            ret, self.buffer = self.buffer[:size], self.buffer[size:]
        return ret

    def flush(self):
        pass

    def write(self, data):
        # called by GzipFile with compressed bytes
        self.buffer += data

    def close(self):
        self.input.close()
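If the self-referencing trick feels too fragile, the same idea can also be built on zlib directly. What follows is only a sketch, assuming Python 3 and a binary source with a read() method. GzipReadStream and its chunk_size parameter are names made up for this example and do not come from any library. It subclasses io.RawIOBase and implements readinto(), so the standard io machinery supplies read(); wrapping it in io.BufferedReader gives the usual buffered file-like behaviour.

```python
import gzip
import io
import zlib


class GzipReadStream(io.RawIOBase):
    """Read-only file-like that returns the gzip-compressed bytes of source.

    Hypothetical class for illustration only.
    """

    def __init__(self, source, chunk_size=64 * 1024):
        self._source = source
        self._chunk_size = chunk_size
        # wbits = 16 + MAX_WBITS selects gzip framing instead of raw zlib.
        self._compressor = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
        self._pending = b""
        self._finished = False

    def readable(self):
        return True

    def readinto(self, b):
        # Compress more input until there is something to hand out.
        while not self._pending and not self._finished:
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._pending = self._compressor.compress(chunk)
            else:
                # Source exhausted: emit the remaining data and the trailer.
                self._pending = self._compressor.flush()
                self._finished = True
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of stream


# Read the compressed stream in small pieces, as a consumer might.
payload = b"line of text\n" * 50000
stream = io.BufferedReader(GzipReadStream(io.BytesIO(payload), chunk_size=8192))
pieces = []
while True:
    piece = stream.read(1000)
    if not piece:
        break
    pieces.append(piece)
result = b"".join(pieces)
```

Only one input chunk and its compressed output are held at a time, so memory use stays bounded by chunk_size regardless of how large the input is.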

answered Sep 23 '22 12:09

Ricardo Cárdenes

Tags: python, gzip, zlib
