I'm processing huge data files (millions of lines each). Before I start processing I'd like to get a count of the number of lines in the file, so I can then indicate how far along the processing is. Because of the size of the files, it would not be practical to read the entire file into memory, just to count how many lines there are. Does anyone have a good suggestion on how to do this?

Count the number of lines in a file without reading entire file into memory?

2 Answers

Reading the file a line at a time:

count = File.foreach(filename).inject(0) {|c, line| c+1}
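A slightly more idiomatic spelling of the same line-at-a-time count: File.foreach without a block returns an Enumerator, and Enumerator#count walks it one line at a time. The method name count_lines is just for illustration.

```ruby
# File.foreach without a block returns an Enumerator over the file's lines;
# #count iterates it, so only one line is held in memory at a time.
def count_lines(path)
  File.foreach(path).count
end
```

Unlike wc -l, this also counts a final line that has no trailing newline.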

or the Perl-ish

File.foreach(filename) {}
count = $.
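If you'd rather not rely on the global $., the same counter is available on the File object itself as IO#lineno, which each_line increments for every line it reads. A minimal sketch (the method name count_lines_lineno is hypothetical):

```ruby
# Same idea as the $. version, but reads the counter from the File object
# itself instead of the global that tracks the last-read IO.
def count_lines_lineno(path)
  File.open(path) do |f|
    f.each_line { } # read and discard every line; each read bumps f.lineno
    f.lineno        # the block's value is returned by File.open
  end
end
```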

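or, reading the whole file at once (simple, but f.read loads the entire file into memory, which is exactly what the question is trying to avoid):

count = 0
File.open(filename) {|f| count = f.read.count("\n")}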
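A middle ground, not taken from the answers here: read the file in fixed-size binary chunks and count the newline bytes in each one. Memory use stays bounded by the chunk size, and no per-line String objects are created. The 1 MiB default and the name count_newlines are assumptions for illustration. Like wc -l, this counts newline characters, so a last line without a trailing newline is not counted.

```ruby
# Count "\n" bytes by reading the file in fixed-size chunks into one reused
# buffer, so memory use is bounded by chunk_size rather than by file size.
def count_newlines(path, chunk_size = 1 << 20) # 1 MiB: arbitrary default
  count = 0
  File.open(path, 'rb') do |f|
    buf = String.new
    # IO#read(length, outbuf) fills buf and returns nil at end of file
    while f.read(chunk_size, buf)
      count += buf.count("\n")
    end
  end
  count
end
```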

All of these will be slower than letting wc count the lines:

require 'shellwords'
count = %x{wc -l #{filename.shellescape}}.split.first.to_i
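When the filename may contain spaces or shell metacharacters, one way to sidestep quoting entirely is to run wc without a shell, passing the filename as a separate argument via Open3.capture2 from Ruby's standard library. This assumes a Unix-like system with wc on the PATH; the method name wc_line_count is hypothetical.

```ruby
require 'open3'

# Passing 'wc', '-l' and the path as separate arguments runs wc directly,
# with no shell, so the filename is never re-parsed or interpolated.
def wc_line_count(path)
  out, status = Open3.capture2('wc', '-l', path)
  raise "wc -l failed for #{path}" unless status.success?
  out.split.first.to_i # wc prints "<count> <path>"
end
```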


answered Oct 07 '22 23:10

glenn jackman

If you are in a Unix environment, you can just let wc -l do the work.

It does not load the whole file into memory: it streams through the file, and because it is optimized for counting words and lines, it is typically faster than streaming the file yourself in Ruby.

SSCCE:

filename = 'a_file/somewhere.txt'
line_count = `wc -l "#{filename}"`.strip.split(' ')[0].to_i
p line_count

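Or, if you want the total for a collection of files passed on the command line (wc prints a "total" line only when it is given more than one file, so take the count from the last line in either case):

require 'shellwords'
wc_output = `wc -l #{ARGV.shelljoin}`
line_count = wc_output.lines.last.split.first.to_i
p line_count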
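To get a count for each file rather than only the total, the per-file rows of a single wc -l call can be collected into a hash. This is a sketch under the same assumptions (Unix-like wc on the PATH; wc_counts is a hypothetical name). It runs wc without a shell and drops the trailing "total" row that wc adds when it receives more than one file. Filenames containing newlines would break this simple parsing.

```ruby
require 'open3'

# Returns { path => line_count } for each path, using a single wc -l call.
def wc_counts(paths)
  return {} if paths.empty? # with no arguments wc would read stdin instead
  out, status = Open3.capture2('wc', '-l', *paths)
  raise "wc -l exited with status #{status.exitstatus}" unless status.success?
  # Each row looks like "<count> <path>"; split at most once so a space
  # inside a path stays part of the name
  rows = out.lines.map { |line| line.strip.split(/\s+/, 2) }
  rows.pop if paths.size > 1 # drop the "<sum> total" row
  rows.to_h { |count, name| [name, count.to_i] }
end

# Usage from a script: p wc_counts(ARGV)
```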

answered Oct 07 '22 23:10

DJ.


Tags:

ruby

smnirven
