I have a long-running Node.js process and I need to scan a log file for a pattern. I have at least two obvious choices: spawn a grep process, or read the file using fs.read* and parse the buffer/stream in Node.js. I haven't found a comparison of the two methods on the intarwebs. My question is twofold:
Single-threaded structure: Node.js is a single-threaded, asynchronous runtime. Input/output operations do not block the event loop, which means you can read files, send emails, query a database, and do other things concurrently, and each request does not start a new Node.js process.
With fs.readFile() we can read a file in a non-blocking, asynchronous way, whereas fs.readFileSync() reads the file synchronously, i.e. it tells Node.js to block until the current file read completes.
Return value: the promise-based variant, fs.promises.readFile(), returns a Promise that is resolved with the contents of the file. If no encoding is specified (via options.encoding), the data is returned as a Buffer object.
Forking a grep is simpler and quicker, and grep will most likely run faster and use less CPU. Although fork has a moderately high overhead (much more than opening a file), you would only fork once and stream the results. Plus, it can be tricky to get good performance out of Node's file I/O.
To answer this question, I wrote this little program.
#!/usr/local/bin/node
'use strict';

const fs = require('fs');
const spawn = require('child_process').spawn;

const log = '/var/log/maillog';
const fsOpts = { flag: 'r', encoding: 'utf8' };
const wantsRe = new RegExp(process.argv[2]);

function handleResults (err, data) {
  if (err) throw err;
  console.log(data);
}

// Read the whole file into memory, then filter it line by line.
function grepWithFs (file, done) {
  fs.readFile(file, fsOpts, function (err, data) {
    if (err) return done(err);
    let res = '';
    data.split(/\n/).forEach(function (line) {
      if (!wantsRe.test(line)) return;
      res += line + '\n';
    });
    done(null, res);
  });
}

// Spawn a grep child process and collect its stdout.
function grepWithShell (file, done) {
  let res = '';
  const child = spawn('grep', [ '-e', process.argv[2], file ]);
  child.stdout.on('data', function (buffer) { res += buffer.toString(); });
  child.stdout.on('end', function () { done(null, res); });
}

for (let i = 0; i < 10; i++) {
  // grepWithFs(log, handleResults);
  grepWithShell(log, handleResults);
}
Then I alternately ran each function inside a loop 10x and measured the time it took to grep the results from a log file that's representative of my use case:
$ ls -alh /var/log/maillog
-rw-r--r-- 1 root wheel 37M Feb 8 16:44 /var/log/maillog
The file system is a pair of mirrored SSDs which are generally quick enough that they aren't the bottleneck. Here are the results:
$ time node logreader.js 3E-4C03-86DD-FB6EF    # grepWithShell
real 0m0.238s
user 0m0.181s
sys 0m1.550s

$ time node logreader.js 3E-4C03-86DD-FB6EF    # grepWithFs
real 0m6.599s
user 0m5.710s
sys 0m1.751s
The difference is huge. Using a shell grep process is dramatically faster. As Andras points out, Node's I/O can be tricky, and I didn't try any other fs.read* methods. If there's a better way, please do point it out (preferably with a similar test scenario and results).