 

Python: Creating a streaming gzip'd file-like?

Tags: python, gzip, zlib

I'm trying to figure out the best way to compress a stream with Python's zlib.

I've got a file-like input stream (input, below) and an output function which accepts a file-like (output_function, below):

with open("file") as input:
    output_function(input)

And I'd like to gzip-compress input chunks before sending them to output_function:

with open("file") as input:
    output_function(gzip_stream(input))

It looks like the gzip module assumes that either the input or the output will be a gzip'd file-on-disk… So I assume that the zlib module is what I want.

However, it doesn't natively offer a simple way to create a stream file-like… And the stream-compression it does support comes by way of manually adding data to a compression buffer, then flushing that buffer.

Of course, I could write a wrapper around zlib.Compress.compress and zlib.Compress.flush (Compress is returned by zlib.compressobj()), but I'd be worried about getting buffer sizes wrong, or something similar.
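For illustration, such a wrapper can be quite small. The following is a minimal sketch, not a definitive implementation: `gzip_chunks` and `chunk_size` are hypothetical names, the input is assumed to be opened in binary mode, and passing `wbits=16 + zlib.MAX_WBITS` is what asks zlib for a gzip header and trailer instead of a plain zlib stream.

```python
import zlib


def gzip_chunks(fileobj, chunk_size=64 * 1024):
    """Yield gzip-compressed bytes for the binary file-like `fileobj`.

    `gzip_chunks` and `chunk_size` are illustrative names, not a library API.
    """
    # 16 + MAX_WBITS selects the gzip container (header plus CRC32 trailer).
    compressor = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        data = compressor.compress(chunk)
        if data:  # compress() may return b"" while zlib buffers internally
            yield data
    # flush() emits whatever is still buffered, followed by the gzip trailer.
    yield compressor.flush()
```

Because the input is read in fixed-size pieces, memory use should stay roughly bounded by `chunk_size` plus zlib's own internal buffers, regardless of how large the stream is.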

So, what's the simplest way to create a streaming, gzip-compressing file-like with Python?

Edit: To clarify, the input stream and the compressed output stream are both too large to fit in memory, so something like output_function(StringIO(zlib.compress(input.read()))) doesn't really solve the problem.

Asked by David Wolever on Feb 03 '10 at 14:02




1 Answer

It's quite kludgy (self-referencing, etc.; I only spent a few minutes writing it, so it's nothing elegant), but it does what you want if you're still interested in using gzip instead of zlib directly.

Basically, GzipWrap is a (very limited) file-like object that produces a gzipped file out of a given iterable (e.g., a file-like object, a list of strings, any generator...).

Of course, it produces binary so there was no sense in implementing "readline".

You should be able to expand it to cover other cases or to be used as an iterable object itself.

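from gzip import GzipFile

class GzipWrap(object):
    # input is a file-like object (opened in binary mode) that feeds the input
    def __init__(self, input, filename=None):
        self.input = input
        # one shared iterator, so successive reads resume where the last one stopped
        self.chunks = iter(input)
        self.buffer = b''
        # GzipFile writes its compressed output back into this object via write()
        self.zipper = GzipFile(filename, mode='wb', fileobj=self)

    def read(self, size=-1):
        if size < 0 or len(self.buffer) < size:
            for s in self.chunks:
                self.zipper.write(s)
                if size > 0 and len(self.buffer) >= size:
                    self.zipper.flush()
                    break
            else:
                # input exhausted: close() appends the gzip trailer to the buffer
                self.zipper.close()
        if size < 0:
            ret, self.buffer = self.buffer, b''
        else:
            # may return fewer than size bytes once the input has run out
            ret, self.buffer = self.buffer[:size], self.buffer[size:]
        return ret

    def flush(self):
        pass

    def write(self, data):
        # called by GzipFile with compressed bytes
        self.buffer += data

    def close(self):
        self.input.close()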
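For comparison, the same idea can be built on zlib directly. This is only a sketch under stated assumptions: `GzipStream` and `chunk_size` are hypothetical names, the source is assumed to be a binary file-like with `read()`, and subclassing `io.RawIOBase` is one possible design choice that provides `read()` and `readall()` on top of a single `readinto()` method.

```python
import io
import zlib


class GzipStream(io.RawIOBase):
    """Read-only file-like that gzip-compresses `source` as it is read.

    Hypothetical helper for illustration; not part of the standard library.
    """

    def __init__(self, source, chunk_size=64 * 1024):
        self._source = source            # binary file-like with read()
        self._chunk_size = chunk_size
        # 16 + MAX_WBITS makes zlib write a gzip header and trailer.
        self._compressor = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
        self._pending = b""              # compressed bytes not yet handed out
        self._finished = False

    def readable(self):
        return True

    def readinto(self, buf):
        # Pull input until some compressed output exists or the input ends.
        while not self._pending and not self._finished:
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._pending = self._compressor.compress(chunk)
            else:
                self._pending = self._compressor.flush()
                self._finished = True
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of stream
```

With the question's names, this would be used as `output_function(GzipStream(input))`, provided the file was opened with `open("file", "rb")`. Wrapping the object in `io.BufferedReader` may help if the consumer issues many small reads.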
Answered by Ricardo Cárdenes on Sep 23 '22 at 12:09