
Node.js: Count the number of lines in a file

I have large text files, which range between 30MB and 10GB. How can I count the number of lines in a file using Node.js?

I have these constraints:

  • The solution must not need to load the entire file into memory
  • The solution must not need a child process to perform the task
hexacyanide asked Sep 17 '12 04:09



2 Answers

We can use indexOf to let the VM find the newlines:

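    const fs = require("fs");

    // Resolves with the number of newline bytes (0x0A) in the file.
    function countFileLines(filePath) {
      return new Promise((resolve, reject) => {
        let lineCount = 0;
        fs.createReadStream(filePath)
          .on("data", (buffer) => {
            let idx = -1;
            // The loop below also increments once when indexOf returns -1,
            // so compensate for that extra iteration up front.
            lineCount--;
            do {
              idx = buffer.indexOf(10, idx + 1); // 10 is the byte value of "\n"
              lineCount++;
            } while (idx !== -1);
          })
          .on("end", () => {
            resolve(lineCount);
          })
          .on("error", reject);
      });
    }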

This solution finds the position of the first newline with .indexOf, increments lineCount, and then searches for the next one. The second parameter to .indexOf tells it where to start looking, so each call jumps straight over the bytes between two newlines. The do...while loop runs once for every newline, plus one final time when .indexOf returns -1, which is why lineCount is decremented before the loop.
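To see the offset-based search in isolation, here is a minimal sketch that runs the same idea on an in-memory Buffer instead of a stream. The helper name countNewlines is made up for this illustration, and the loop is written as a plain while loop rather than the do...while with a decrement used above; the counting logic is otherwise the same.

```javascript
// Count newline bytes (0x0A) in a Buffer by jumping from match to match.
// countNewlines is a hypothetical helper name used only for this sketch.
function countNewlines(buffer) {
  let count = 0;
  let idx = buffer.indexOf(10); // 10 is the byte value of "\n"
  while (idx !== -1) {
    count++;
    idx = buffer.indexOf(10, idx + 1); // resume just past the last match
  }
  return count;
}

console.log(countNewlines(Buffer.from("one\ntwo\nthree\n"))); // 3
console.log(countNewlines(Buffer.from("no trailing newline"))); // 0
```

Note that this counts newline characters, so a final line without a trailing newline is not counted. That matches what wc -l reports.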

This lets the Node runtime do the searching for us. That search is implemented at a lower level and should be faster than a loop written in JavaScript.

On my system this is about twice as fast as running a for loop over the buffer length on a large file (111 MB).
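If you want to check the difference on your own machine, a rough comparison could look like the sketch below. It is not the benchmark I ran: the data is a synthetic in-memory buffer (200,000 lines of 80 bytes each, an arbitrary assumption), and the function names are made up for the example. Results will depend on line length, Node version, and hardware, so treat the timings as indicative only.

```javascript
// Synthetic test data: 200,000 lines of 80 bytes each (about 16 MB).
// The sizes are arbitrary assumptions chosen for illustration.
const line = Buffer.from("x".repeat(79) + "\n");
const lines = 200000;
const data = Buffer.alloc(line.length * lines, line); // fill by repeating `line`

// Search for newlines with Buffer.prototype.indexOf.
function countWithIndexOf(buffer) {
  let count = 0;
  let idx = buffer.indexOf(10);
  while (idx !== -1) {
    count++;
    idx = buffer.indexOf(10, idx + 1);
  }
  return count;
}

// Inspect every byte in a plain for loop.
function countWithLoop(buffer) {
  let count = 0;
  for (let i = 0; i < buffer.length; ++i) {
    if (buffer[i] === 10) count++;
  }
  return count;
}

// Time a single run of `fn` over the shared buffer and print the result.
function time(label, fn) {
  const start = process.hrtime.bigint();
  const result = fn(data);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${result} newlines in ${ms.toFixed(1)} ms`);
  return result;
}

const indexOfCount = time("indexOf", countWithIndexOf);
const loopCount = time("for loop", countWithLoop);
```

A single run like this is noisy, because the JIT may not have warmed up yet. Repeating each measurement a few times and using a real file gives a fairer picture.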

Emil Vikström answered Sep 20 '22 10:09


A solution without using wc:

    var count = 0;
    require('fs').createReadStream(process.argv[2])
      .on('data', function(chunk) {
        // Check every byte of the chunk; 10 is the byte value of '\n'.
        for (var i = 0; i < chunk.length; ++i)
          if (chunk[i] === 10) count++;
      })
      .on('end', function() {
        console.log(count);
      });

It's slower than wc, but not by as much as you might expect: 0.6 s for a 140 MB+ file, including Node.js loading and startup time.

    >time node countlines.js video.mp4
    619643

    real    0m0.614s
    user    0m0.489s
    sys     0m0.132s

    >time wc -l video.mp4
    619643 video.mp4

    real    0m0.133s
    user    0m0.108s
    sys     0m0.024s

    >wc -c video.mp4
    144681406  video.mp4

Andrey Sidorov answered Sep 20 '22 10:09