
Count the number of lines in a file without reading entire file into memory?

Tags:

ruby

I'm processing huge data files (millions of lines each).

Before I start processing I'd like to get a count of the number of lines in the file, so I can then indicate how far along the processing is.

Because of the size of the files, it would not be practical to read the entire file into memory, just to count how many lines there are. Does anyone have a good suggestion on how to do this?

smnirven asked Apr 16 '10 04:04



2 Answers

Reading the file a line at a time:

count = File.foreach(filename).inject(0) {|c, line| c+1} 

or the Perl-ish

File.foreach(filename) {}
count = $.

or, reading the whole file into memory at once (which the question wants to avoid):

count = 0
File.open(filename) {|f| count = f.read.count("\n")}

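All of these will be slower than shelling out to wc:

require 'shellwords'
# Shellwords.escape keeps filenames with spaces or quotes from breaking the command
count = %x{wc -l #{Shellwords.escape(filename)}}.split.first.to_i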
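If shelling out is not an option (for example, where wc is not installed), one pure-Ruby middle ground is to read the file in fixed-size chunks and count newline bytes in each chunk. This is only a minimal sketch: the method name count_lines and the 1 MiB default chunk size are my own choices for illustration, and I have not benchmarked it against the variants above.

```ruby
# Count newline characters by reading fixed-size chunks, so memory use
# is bounded by chunk_size no matter how large the file is.
# count_lines and chunk_size are hypothetical names for this sketch.
def count_lines(path, chunk_size: 1 << 20)
  count = 0
  File.open(path, 'rb') do |f|
    buf = String.new
    # IO#read(length, outbuf) reuses buf and returns nil at end of file
    while f.read(chunk_size, buf)
      count += buf.count("\n")
    end
  end
  count
end
```

Like wc -l, this counts newline characters, so a final line with no trailing newline is not included in the total.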
glenn jackman answered Oct 07 '22 23:10


If you are in a Unix environment, you can just let wc -l do the work.

It does not load the whole file into memory. Because wc is optimized for streaming a file and counting words and lines, it usually performs better than streaming the file yourself in Ruby.

SSCCE:

filename = 'a_file/somewhere.txt'
line_count = `wc -l "#{filename}"`.strip.split(' ')[0].to_i
p line_count

Or if you want a collection of files passed on the command line:

# wc prints a "total" line only when it is given more than one file
wc_output = `wc -l "#{ARGV.join('" "')}"`
line_count = wc_output.match(/^ *([0-9]+) +total$/).captures[0].to_i
p line_count
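The snippets above build a shell command string from the filenames, which can break on names containing quotes or other shell metacharacters. A possible variation, sketched here under the assumption that wc is on the PATH, passes the command to IO.popen as an array so that no shell is involved. The helper names wc_line_count and wc_total are hypothetical, and summing per-file calls sidesteps parsing the "total" line.

```ruby
# Ask wc for the line count without going through a shell. Passing the
# command as an array makes IO.popen run wc directly, so the path needs
# no quoting or escaping. wc_line_count is a hypothetical helper name.
def wc_line_count(path)
  output = IO.popen(['wc', '-l', '--', path], &:read)
  raise "wc failed for #{path}" unless $?.success?
  # Output looks like "  42 path"; the first field is the count
  output.split.first.to_i
end

# Total across several files, using one wc call per file.
def wc_total(paths)
  paths.sum { |path| wc_line_count(path) }
end
```

For a list of files from the command line, this could be called as p wc_total(ARGV). It spawns one process per file, which is simpler but slower than a single wc call when there are many files.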
DJ. answered Oct 07 '22 23:10