I have a long-running Node.js process and I need to scan a log file for a pattern. I have at least two obvious choices: spawn a grep process, or read the file using fs.read* and parse the buffer/stream in Node.js. I haven't found a comparison of the two methods on the intarwebs. My question is twofold:
Single-threaded structure: Node.js is a single-threaded, asynchronous runtime. Input/output operations do not block the event loop, which means you can read files, send emails, query a database, and do other things concurrently, and each request does not start a new Node.js process.
With fs.readFile() we can read a file in a non-blocking, asynchronous way, whereas fs.readFileSync() reads the file synchronously, i.e. it tells Node.js to block until the current file read completes.
Return value: the promise-based variant, fs.promises.readFile(), returns a Promise that is resolved with the contents of the file. If no encoding is specified (via options.encoding), the data is returned as a Buffer object.
Forking a grep is simpler and quicker, and grep will most likely run faster and use less CPU. Although fork has a moderately high overhead (much more than opening a file), you would only fork once and stream the results. Plus, it can be tricky to get good performance out of Node's file I/O.
To answer this question, I wrote this little program.
#!/usr/local/bin/node
'use strict';

const fs = require('fs');
const spawn = require('child_process').spawn;

const log = '/var/log/maillog';
const fsOpts = { flag: 'r', encoding: 'utf8' };
const wantsRe = new RegExp(process.argv[2]);

function handleResults (err, data) {
  if (err) throw err;
  console.log(data);
}

// Read the whole file into memory, then filter it line by line.
function grepWithFs (file, done) {
  fs.readFile(file, fsOpts, function (err, data) {
    if (err) return done(err);
    let res = '';
    data.split(/\n/).forEach(function (line) {
      if (!wantsRe.test(line)) return;
      res += line + '\n';
    });
    done(null, res);
  });
}

// Spawn a grep child process and collect its stdout.
function grepWithShell (file, done) {
  let res = '';
  const child = spawn('grep', [ '-e', process.argv[2], file ]);
  child.stdout.on('data', function (buffer) { res += buffer.toString(); });
  child.stdout.on('end', function () { done(null, res); });
}

for (let i = 0; i < 10; i++) {
  // grepWithFs(log, handleResults);
  grepWithShell(log, handleResults);
}
Then I alternately ran each function inside a loop 10x and measured the time it took to grep the results from a log file that's representative of my use case:
$ ls -alh /var/log/maillog
-rw-r--r-- 1 root wheel 37M Feb 8 16:44 /var/log/maillog
The file system is a pair of mirrored SSDs which are generally quick enough that they aren't the bottleneck. Here are the results:
$ time node logreader.js 3E-4C03-86DD-FB6EF    # grepWithShell
real 0m0.238s
user 0m0.181s
sys 0m1.550s

$ time node logreader.js 3E-4C03-86DD-FB6EF    # grepWithFs
real 0m6.599s
user 0m5.710s
sys 0m1.751s
The difference is huge. Using a shell grep process is dramatically faster. As Andras points out, Node's I/O can be tricky, and I didn't try any other fs.read* methods. If there's a better way, please do point it out (preferably with a similar test scenario and results).