
How to read very large (> 1GB) tar.gz files in Node.js?

I have never had to do this before so this is probably something really basic, but I thought I'd ask anyways.

What is the right way to read a very large file in Node.js? Say the file is just too large to read all at once. Also say the file could come in either .zip or .tar.gz format.

First question, is it best to decompress the file first and save it to disk (I'm using Stuffit on the Mac to do this now), and then work with that file? Or can you read the IO stream straight from the compressed .zip or .tar.gz version? I guess you'd need to know the format of the content in the compressed file, so you probably have to decompress (just found out this .tar.gz file is actually a .dat file)...

Then the main issue is: how do I read this large file in Node.js? Say it's a 1GB XML file; where should I look to get started parsing it? (Not how to parse XML as such, but if you're reading the large file line by line, how do you parse something like XML, which needs to know the context of previous lines?)

I have seen fs.createReadStream, but I'm afraid to mess around with it... don't want to explode my computer. Just looking for some pointers in the right direction.

Lance asked Jun 18 '12 02:06


2 Answers

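There is a built-in zlib module for streaming decompression, and the sax package (installed from npm) for streaming XML parsing. Note that zlib only removes the gzip layer; a .tar.gz still needs a tar reader after decompression.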

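var fs = require('fs');
var zlib = require('zlib');
var sax = require('sax'); // third-party package: npm install sax

var saxStream = sax.createStream();
// add your xml handlers here

// createUnzip() auto-detects gzip or deflate input and decompresses it chunk by chunk,
// so the whole file is never held in memory at once
fs.createReadStream('large.xml.gz').pipe(zlib.createUnzip()).pipe(saxStream);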
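
To see that streaming keeps memory use flat regardless of file size, here is a minimal sketch that uses only built-in modules. It decompresses a gzip file chunk by chunk and keeps two running counters across chunks, which is the same idea a streaming XML parser relies on to remember context. The function name summarizeGzip and the counting logic are illustrative assumptions, not part of the answer above.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { pipeline, Writable } = require('stream');

// Hypothetical helper: streams a .gz file and reports its decompressed
// size and newline count without ever holding the whole file in memory.
function summarizeGzip(filePath) {
  return new Promise(function (resolve, reject) {
    let bytes = 0; // state carried across chunks
    let lines = 0;

    // A sink that inspects each decompressed chunk and then discards it.
    const counter = new Writable({
      write(chunk, encoding, callback) {
        bytes += chunk.length;
        for (const byte of chunk) {
          if (byte === 0x0a) lines += 1; // 0x0a is '\n'
        }
        callback();
      }
    });

    // pipeline() wires the streams together and reports an error from
    // any stage (missing file, corrupt gzip data) through one callback.
    pipeline(
      fs.createReadStream(filePath),
      zlib.createGunzip(),
      counter,
      function (err) {
        if (err) reject(err);
        else resolve({ bytes: bytes, lines: lines });
      }
    );
  });
}

// Usage (assumes a file named large.xml.gz exists):
// summarizeGzip('large.xml.gz').then(console.log, console.error);
```

Chunks arrive at arbitrary byte boundaries, not at line or tag boundaries, so any parsing logic has to keep its state between chunks the way the counters do here.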

Andrey Sidorov answered Oct 07 '22 14:10


You can also go the other way and compress a directory into a .tar.gz archive by spawning the system tar command, something like the following:

var spawn = require('child_process').spawn;
var pathToArchive = './very_large_folder.tar.gz';
var pathToFolder = './very_large_folder';

// c = create, z = gzip, f = write to the given archive file
var tar = spawn('tar', ['czf', pathToArchive, pathToFolder]);
tar.on('exit', function (code) {
        if (code === 0) {
                console.log('completed successfully');
        } else {
                console.error('tar exited with code ' + code);
        }
});

This worked nicely :)
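
The snippet above only checks the exit code. As a rough sketch of a more defensive version, the hypothetical helper below (runCommand is an invented name) wraps spawn in a Promise, also handles the case where the command cannot be started at all, and includes whatever the command printed to stderr in the error message.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: runs a command and resolves once it exits with code 0.
function runCommand(command, args) {
  return new Promise(function (resolve, reject) {
    // stdin and stdout are ignored; only stderr is captured for diagnostics.
    const child = spawn(command, args, { stdio: ['ignore', 'ignore', 'pipe'] });

    let stderr = '';
    child.stderr.on('data', function (chunk) {
      stderr += chunk;
    });

    // 'error' fires when the process could not be started,
    // for example when the command is not installed.
    child.on('error', reject);

    // 'close' fires after the process has ended and its stdio streams are closed.
    child.on('close', function (code) {
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(command + ' exited with code ' + code + ': ' + stderr.trim()));
      }
    });
  });
}

// Usage with the same arguments as above (assumes tar is on the PATH):
// runCommand('tar', ['czf', './very_large_folder.tar.gz', './very_large_folder'])
//   .then(function () { console.log('completed successfully'); })
//   .catch(function (err) { console.error(err.message); });
```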

Vaibhav Pachauri answered Oct 07 '22 14:10