In Ruby, why is File.readlines.each not faster than File.open.each_line?

Tags: file, io, ruby

I'm analyzing my IIS logs (bonus: I happened to learn that IIS logs are encoded in ASCII).

Here's my Ruby code:

1. readlines

# keywords, extensions, total_count and hit_count are defined elsewhere (not shown)
Dir.glob("*.log").each do |filename|
  File.readlines(filename, :encoding => "ASCII").each do |line|
    # skip comment lines
    next if line[0] == '#'

    line_content = line.downcase
    # only the first matching keyword counts
    matched_keyword = keywords.find { |e| line_content.include? e }
    total_count += 1 if extensions.any? { |e| line_content.include? e }
    hit_count[matched_keyword] += 1 unless matched_keyword.nil?
  end
end

2. open

# keywords, extensions, total_count and hit_count are defined elsewhere (not shown)
Dir.glob("*.log").each do |filename|
  File.open(filename, :encoding => "ASCII").each_line do |line|
    # skip comment lines
    next if line[0] == '#'

    line_content = line.downcase
    # only the first matching keyword counts
    matched_keyword = keywords.find { |e| line_content.include? e }
    total_count += 1 if extensions.any? { |e| line_content.include? e }
    hit_count[matched_keyword] += 1 unless matched_keyword.nil?
  end
end

"readlines" reads the whole file into memory, so why is "open" always a bit faster? I tested it several times on Windows 7 with Ruby 1.9.3.

asked Mar 28 '13 08:03

rhapsodyn

1 Answer

Both readlines and open.each_line read the file only once. Ruby also buffers IO objects, so it reads data from disk a block at a time (several kilobytes, not one line) to minimize the cost of disk reads. The disk-read step should therefore take about the same time in both cases.

When you call readlines, Ruby creates an empty array [], then repeatedly reads a line from the file and pushes it onto the array. Finally, it returns the array containing every line of the file.
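To make that concrete, here is a minimal pure-Ruby sketch of the same idea. The helper name readlines_sketch is made up for illustration and the sample log lines are invented; the real method is implemented in C, so this only mirrors its shape.

```ruby
require "stringio"

# Hypothetical helper: collects every line into an array before returning,
# which is roughly the shape of what readlines does internally.
def readlines_sketch(io)
  lines = []
  while (line = io.gets)  # io.gets returns nil at end of file
    lines << line         # the array keeps growing as the file is read
  end
  lines
end

# A StringIO stands in for a log file so the sketch runs anywhere.
log = StringIO.new("#Fields: date time\nGET /index.html\nGET /logo.png\n")
all_lines = readlines_sketch(log)
puts all_lines.length
```

Nothing is handed to the caller until the last line has been read and stored.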

When you call each_line, Ruby reads one line from the file and yields it to your block. Once your block has finished with that line, Ruby reads the next one, and so on until the file has no more content.
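The streaming behaviour can be sketched in pure Ruby as well. The names each_line_sketch and count_requests are hypothetical and the sample data is invented; the real each_line lives in C.

```ruby
require "stringio"

# Hypothetical helper: hands each line to the block as soon as it is read,
# without keeping any earlier line around.
def each_line_sketch(io)
  while (line = io.gets)  # io.gets returns nil at end of file
    yield line
  end
end

# Hypothetical helper: counts the lines that are not comments.
def count_requests(io)
  seen = 0
  each_line_sketch(io) { |line| seen += 1 unless line.start_with?("#") }
  seen
end

log = StringIO.new("#Fields: date time\nGET /index.html\nGET /logo.png\n")
puts count_requests(log)  # prints 2
```

Only one line is live at a time here, so no collection has to be built or resized.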

The difference between the two methods is that readlines has to append every line to an array. When the file is large, Ruby may have to reallocate the underlying C-level array one or more times to make room.
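One rough way to watch that growth is to track the memory size an array reports while it is being appended to. This sketch assumes MRI (CRuby), where ObjectSpace.memsize_of from the objspace library reflects the array's allocated capacity; the helper name growth_steps is made up, and the exact numbers vary by Ruby version.

```ruby
require "objspace"

# Hypothetical helper: appends `count` elements and records every distinct
# memory size the array reports along the way. Each change in size hints
# that the underlying buffer was reallocated (assumption: MRI behaviour).
def growth_steps(count)
  lines = []
  sizes = [ObjectSpace.memsize_of(lines)]
  count.times do |i|
    lines << i
    size = ObjectSpace.memsize_of(lines)
    sizes << size if size != sizes.last
  end
  sizes
end

steps = growth_steps(10_000)
puts "reported size changed #{steps.length - 1} times while appending 10,000 elements"
puts "first: #{steps.first} bytes, last: #{steps.last} bytes"
```

The array grows in steps instead of on every push, so the cost is amortized, but it is still work that each_line never has to do.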

Digging into the source, readlines is implemented by io_s_readlines, which calls rb_io_readlines. rb_io_readlines calls rb_io_getline_1 to fetch each line and rb_ary_push to push it onto the array it returns.

each_line is implemented by rb_io_each_line, which calls rb_io_getline_1 to fetch each line, just like readlines, and then yields the line to your block with rb_yield.

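So each_line never has to store lines in a growing array: there is no array resizing and no copying. Each line can also be garbage-collected as soon as your block is done with it, whereas readlines keeps every line in memory until the array is released.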
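If you want to measure this yourself, a small benchmark along these lines may help. It is only a sketch: the helper names, the synthetic log contents and the line count are all made up, and the timings will depend on your machine, Ruby version and file size.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical helper: builds the full array of lines first, then iterates.
def count_with_readlines(path)
  count = 0
  File.readlines(path).each { |line| count += 1 unless line.start_with?("#") }
  count
end

# Hypothetical helper: streams the file one line at a time.
def count_with_each_line(path)
  count = 0
  File.open(path) do |file|
    file.each_line { |line| count += 1 unless line.start_with?("#") }
  end
  count
end

# Synthetic log file (invented contents) so the sketch is self-contained.
sample = Tempfile.new(["sample", ".log"])
sample.puts "#Fields: date time cs-uri-stem"
50_000.times { |i| sample.puts "2013-03-28 08:03:00 /page#{i}.html" }
sample.close  # flushes to disk; the temp file stays until the process exits

Benchmark.bm(10) do |bm|
  bm.report("readlines") { count_with_readlines(sample.path) }
  bm.report("each_line") { count_with_each_line(sample.path) }
end
```

Run it several times and with larger files before drawing conclusions, since a single run on a small file is dominated by noise.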

answered Nov 11 '22 10:11

Arie Xiao
