
Node Zlib creating invalid .gz files when streaming

I am trying to create a .gz file by streaming strings (JSONs separated by newlines) to it gradually.

I am using Node 0.11 and 0.12; both give the same result: the .gz file won't open.

I reduced the code to this:

var fs = require('fs');
var output = fs.createWriteStream('test.gz');
var zlib = require('zlib');
var compress = zlib.createGzip();
var myStream = compress.pipe(output);

myStream.write('Hello, World!');
myStream.end();

The file is created, but I cannot open it. What am I doing wrong?

Adam Abrams asked Apr 06 '15 06:04



1 Answer

Okay, so here's the fix:

var fs = require('fs');
var output = fs.createWriteStream('test.gz');
var zlib = require('zlib');
var compress = zlib.createGzip();
/* Pipe everything written into compress on to the file stream */
compress.pipe(output);
/* Because compress is piped into the file stream, the following line does:
   'Hello, World!' -> gzip compression -> file, which is the desired effect */
compress.write('Hello, World!');
compress.end();

And the explanation: piping forwards data from one stream to the next, with each stage transforming the data according to its own rules (e.g. STDOUT -> gzip compression -> encryption -> file sends everything printed to STDOUT through gzip compression, then encryption, and finally into a file).
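A minimal sketch of such a chain, assuming a reasonably recent Node and leaving out the encryption stage. The helper name gzipThroughChain, the PassThrough stream standing in for STDOUT, and the temp file name are all made up for illustration:

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var stream = require('stream');
var zlib = require('zlib');

// Hypothetical helper: send text through source -> gzip -> file.
function gzipThroughChain(text, filePath, done) {
  var source = new stream.PassThrough(); // stands in for STDOUT or any other producer
  var file = fs.createWriteStream(filePath);

  // Each pipe() call returns its destination, so stages chain left to right.
  source.pipe(zlib.createGzip()).pipe(file);

  file.on('error', done);
  file.on('finish', function () { done(null); });

  // Write to the head of the chain; every later stage receives the data in turn.
  source.end(text);
}

// Hypothetical temp path, used only for this demo.
var chainPath = path.join(os.tmpdir(), 'chain-demo.gz');
gzipThroughChain('Hello, World!', chainPath, function (err) {
  if (err) throw err;
  // Decompressing the file should give back the original text.
  console.log(zlib.gunzipSync(fs.readFileSync(chainPath)).toString('utf8'));
});
```

The data is always written at the head of the chain (source here), never at the file end.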

In your original example you are writing to the end of the pipe, which means writing straight to the file with no transformation, so you got the plain ASCII text you wrote. The confusion is about what myStream is: pipe() returns its destination, so myStream is not the entire chain (gzip -> file) but just its end (the file stream).
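A quick way to check this yourself (a sketch; the temp file name is made up, and the variable names are only for illustration):

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var zlib = require('zlib');

// Hypothetical temp path, used only for this demo.
var target = path.join(os.tmpdir(), 'pipe-return-demo.gz');
var output = fs.createWriteStream(target);
var compress = zlib.createGzip();

// pipe() returns its argument (the destination), not the source or "the chain".
var returned = compress.pipe(output);

var returnsDestination = (returned === output);
var returnsSource = (returned === compress);

console.log(returnsDestination); // true
console.log(returnsSource);      // false

// Close the chain so the process can exit cleanly.
compress.end();
```

So anything written to the value returned by compress.pipe(output) goes directly to the file and bypasses gzip.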

Once compress is piped to output, everything you write to compress flows through to output automatically; you never need to write to output yourself.
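Applied to the original use case (JSON strings separated by newlines, written gradually), this could look roughly like the sketch below. The helper writeGzipLines and the temp file name are made up, and for brevity it ignores backpressure (the return value of write()), which you would want to handle for large volumes:

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var zlib = require('zlib');

// Hypothetical helper: gzip newline-delimited JSON records into filePath.
function writeGzipLines(filePath, records, done) {
  var output = fs.createWriteStream(filePath);
  var compress = zlib.createGzip();

  compress.pipe(output);
  compress.on('error', done);
  output.on('error', done);
  // 'finish' on the file stream fires after all compressed data has been written out.
  output.on('finish', function () { done(null); });

  // Always write to the gzip stream, never to the file stream.
  records.forEach(function (record) {
    compress.write(JSON.stringify(record) + '\n');
  });
  compress.end();
}

// Hypothetical temp path, used only for this demo.
var demoPath = path.join(os.tmpdir(), 'jsonl-demo.gz');
writeGzipLines(demoPath, [{ id: 1 }, { id: 2 }], function (err) {
  if (err) throw err;
  // Read the file back and decompress it to confirm it is a valid .gz.
  console.log(zlib.gunzipSync(fs.readFileSync(demoPath)).toString('utf8'));
});
```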

Some references I found useful:
http://codewinds.com/blog/2013-08-20-nodejs-transform-streams.html#what_are_transform_streams_

http://www.sitepoint.com/basics-node-js-streams/

https://nodejs.org/api/process.html

Ishay Peled answered Sep 18 '22 07:09