From node.js, which is faster, shell grep or fs.readFile?

I have a long running node.js process and I need to scan a log file for a pattern. I have at least two obvious choices: spawn a grep process or read the file using fs.read* and parse the buffer/stream in node.js. I haven't found a comparison of the two methods on the intarwebs. My question is twofold:

  1. which is faster?
  2. why might I prefer one technique over the other?
Matt Simerson asked Feb 08 '15 23:02


2 Answers

Forking a grep is simpler and quicker, and grep would most likely run faster and use less CPU. Although a fork has moderately high overhead (much more than opening a file), you would only fork once and stream the results. Plus, it can be tricky to get good performance out of node's file I/O.
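
To make "fork once and stream the results" concrete, here is a minimal sketch (not Andras's code; `grepLines` and its `onLine` callback are hypothetical names). It spawns grep a single time and reads its stdout line by line with Node's built-in `readline` module instead of concatenating everything into one string. It assumes a POSIX grep on the PATH and relies on grep's documented exit statuses (0 = some line matched, 1 = nothing matched, 2 = error).

```javascript
'use strict';

const { spawn } = require('child_process');
const readline = require('readline');

// Hypothetical helper: fork grep once and hand each matching line to onLine
// as it arrives, rather than buffering all of grep's output first.
// Resolves with the number of matching lines; rejects if grep cannot be
// started or exits with any status other than 0 or 1.
function grepLines (pattern, file, onLine) {
    return new Promise(function (resolve, reject) {
        const child = spawn('grep', ['-e', pattern, file]);
        const rl = readline.createInterface({ input: child.stdout, crlfDelay: Infinity });
        let count = 0;

        rl.on('line', function (line) {
            count++;
            onLine(line);
        });

        // e.g. ENOENT when grep is not on the PATH
        child.on('error', reject);

        // 'close' fires after grep exits and its stdio streams have closed,
        // so its output has been fully read by the time we settle.
        child.on('close', function (code) {
            // grep exits 0 if any line matched and 1 if none did
            if (code === 0 || code === 1) resolve(count);
            else reject(new Error('grep exited with status ' + code));
        });
    });
}
```

Against the file from the question, usage would look something like `grepLines(process.argv[2], '/var/log/maillog', console.log)`. The process is still forked only once; the difference from the benchmark script is that matches are handled as they stream in instead of being collected into one large string.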

Andras answered Oct 11 '22 20:10


To answer this question, I wrote this little program.

#!/usr/local/bin/node
'use strict';

const fs = require('fs');
const { spawn } = require('child_process');

const log = '/var/log/maillog';
const fsOpts = { flag: 'r', encoding: 'utf8' };
const pattern = process.argv[2];
const wantsRe = new RegExp(pattern);

function handleResults (err, data) {
    if (err) return console.error(err);
    console.log(data);
}

// Read the whole file into memory, split it into lines, keep the matches.
function grepWithFs (file, done) {
    fs.readFile(file, fsOpts, function (err, data) {
        if (err) return done(err);
        let res = '';
        // data is already a string because fsOpts sets an encoding
        data.split(/\n/).forEach(function (line) {
            if (!wantsRe.test(line)) return;
            res += line + '\n';
        });
        done(null, res);
    });
}

// Spawn grep and collect everything it writes to stdout.
function grepWithShell (file, done) {
    let res = '';
    const child = spawn('grep', [ '-e', pattern, file ]);
    child.stdout.on('data', function (buffer) { res += buffer.toString(); });
    child.stdout.on('end', function () { done(null, res); });
}

// Run one implementation 10 times; swap the commented line to test the other.
for (let i = 0; i < 10; i++) {
    // grepWithFs(log, handleResults);
    grepWithShell(log, handleResults);
}

Then I ran each function 10 times in that loop (uncommenting one call at a time) and measured how long it took to grep a log file that's representative of my use case:

$ ls -alh /var/log/maillog
-rw-r--r--  1 root  wheel    37M Feb  8 16:44 /var/log/maillog

The file system is a pair of mirrored SSDs which are generally quick enough that they aren't the bottleneck. Here are the results:

grepWithShell

$ time node logreader.js 3E-4C03-86DD-FB6EF

real    0m0.238s
user    0m0.181s
sys     0m1.550s

grepWithFs

$ time node logreader.js 3E-4C03-86DD-FB6EF

real    0m6.599s
user    0m5.710s
sys     0m1.751s

The difference is huge: the shell grep process is dramatically faster, roughly 28x in wall-clock time (0.238s vs. 6.599s). As Andras points out, node's I/O can be tricky, and I didn't try any other fs.read* methods. If there's a better way, please do point it out (preferably with a similar test scenario and results).
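
One in-process variant the benchmark above didn't cover is streaming the file with `fs.createReadStream` and Node's `readline` module, testing each line as it arrives instead of loading the whole 37M file with `fs.readFile` and splitting it. This is a minimal, untimed sketch: `grepWithStream` is a hypothetical name, it assumes a Node release that supports `for await` over a readline interface (current LTS releases do), and it has not been benchmarked here, so whether it narrows the gap with grep is an open question.

```javascript
'use strict';

const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: stream the file and test it one line at a time, so
// only a chunk of the file (not all of it) is held in memory at any moment.
// Pass a RegExp without the g or y flag: RegExp#test keeps state (lastIndex)
// between calls when either flag is set.
async function grepWithStream (file, re) {
    const rl = readline.createInterface({
        input: fs.createReadStream(file, { encoding: 'utf8' }),
        crlfDelay: Infinity, // treat \r\n as a single line break
    });
    const matches = [];
    for await (const line of rl) {
        if (re.test(line)) matches.push(line);
    }
    return matches;
}
```

To compare it with the numbers above, it could be dropped into the same script as `grepWithStream(log, wantsRe).then(lines => console.log(lines.join('\n')))` and timed the same way.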

Matt Simerson answered Oct 11 '22 20:10