
In Ruby, why is File.readlines.each not faster than File.open.each_line?

Tags: file, io, ruby

I'm analyzing my IIS logs. (Bonus: I happened to learn that IIS logs are encoded in ASCII.)

Here's my Ruby code:

1. readlines

Dir.glob("*.log").each do |filename|
  File.readlines(filename,:encoding => "ASCII").each do |line|
    #comment line
    if line[0] == '#'
      next
    else
      line_content = line.downcase
      #just care about first one
      matched_keyword = keywords.select { |e| line_content.include? e }[0]
      total_count += 1 if extensions.any? { |e| line_content.include? e }
      hit_count[matched_keyword] += 1 unless matched_keyword.nil?
    end
  end
end

2. open

Dir.glob("*.log").each do |filename|
  File.open(filename,:encoding => "ASCII").each_line do |line|
    #comment line
    if line[0] == '#'
      next
    else
      line_content = line.downcase
      #just care about first one
      matched_keyword = keywords.select { |e| line_content.include? e }[0]
      total_count += 1 if extensions.any? { |e| line_content.include? e }
      hit_count[matched_keyword] += 1 unless matched_keyword.nil?
    end
  end
end

"readlines" reads the whole file into memory, so why is "open" always a bit faster? I tested this a couple of times on Windows 7 with Ruby 1.9.3.

rhapsodyn asked Mar 28 '13 08:03



1 Answer

Both readlines and open.each_line read the file only once. Ruby also buffers IO objects, so each read from disk fetches a whole block of data (e.g. 64KB) rather than a single line, which keeps the cost of disk access low. The disk-read step should therefore take about the same time in both versions.

When you call readlines, Ruby constructs an empty array [], then repeatedly reads a line from the file and pushes it onto the array. At the end it returns the array containing every line of the file.
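The loop described above can be sketched in plain Ruby. This is only an illustration of the idea, not Ruby's actual implementation (which is written in C); the helper name readlines_sketch and the sample log text are made up for the example, and a StringIO stands in for a real file.

```ruby
require 'stringio'

# Hypothetical helper: mimics the behaviour of readlines in plain Ruby.
def readlines_sketch(io)
  lines = []                # start with an empty array
  while (line = io.gets)    # gets returns nil at end of input
    lines << line           # every line stays in memory until we return
  end
  lines
end

# Made-up sample data standing in for a log file.
log = StringIO.new("#Fields: date time\nGET /index.html\nGET /logo.png\n")
all_lines = readlines_sketch(log)
puts all_lines.length
```

Nothing can be processed until the whole array has been built, and the array holds every line at once.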

When you call each_line, Ruby reads one line from the file and yields it to your block. When your block has finished processing that line, Ruby reads the next one. This repeats until there is no more content in the file.
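A rough plain-Ruby equivalent of that streaming loop might look like this. Again it is a sketch under assumptions, not the real C implementation: each_line_sketch is a hypothetical name, and the sample data is invented.

```ruby
require 'stringio'

# Hypothetical helper: mimics the behaviour of each_line in plain Ruby.
def each_line_sketch(io)
  while (line = io.gets)    # read one line at a time
    yield line              # hand it to the caller's block; nothing is stored
  end
  io
end

# Made-up sample data standing in for a log file.
log = StringIO.new("#Fields: date time\nGET /index.html\nGET /logo.png\n")
request_count = 0
each_line_sketch(log) do |line|
  next if line.start_with?('#')   # skip comment lines, as in the question
  request_count += 1
end
puts request_count
```

Only one line is referenced by the loop at any moment, so earlier lines can be garbage collected as soon as the block is done with them.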

The difference between the two methods is that readlines has to append every line to an array. When the file is large, Ruby may have to reallocate and copy the underlying C-level array one or more times to make room as it grows.

Digging into the source, readlines is implemented by io_s_readlines, which calls rb_io_readlines. rb_io_readlines calls rb_io_getline_1 to fetch each line and rb_ary_push to push it onto the array that is returned.

each_line is implemented by rb_io_each_line, which calls rb_io_getline_1 to fetch each line, just like readlines, and then yields the line to your block with rb_yield.

So each_line never has to store the lines in a growing array, which means no array resizing and no copying.
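If you want to measure this yourself, here is a minimal benchmark sketch using the standard Benchmark and Tempfile libraries. The file contents and the line count are arbitrary assumptions, and the timings will vary with file size, Ruby version and platform, so treat the numbers as indicative only.

```ruby
require 'benchmark'
require 'tempfile'

# Build a throwaway log file; the content and size are made up.
sample = Tempfile.new(['sample', '.log'])
20_000.times { |i| sample.puts("2013-03-28 08:03:00 GET /page#{i}.html 200") }
sample.close   # flush to disk; the path stays valid until unlink

readlines_count = 0
each_line_count = 0

Benchmark.bm(10) do |bm|
  bm.report('readlines') do
    # Builds the full array of lines first, then iterates over it.
    File.readlines(sample.path).each { |_line| readlines_count += 1 }
  end

  bm.report('each_line') do
    # Streams the file; the block form of File.open closes it afterwards.
    File.open(sample.path) do |file|
      file.each_line { |_line| each_line_count += 1 }
    end
  end
end

sample.unlink  # remove the temporary file
```

The block form of File.open is used here so that the file handle is closed once the loop finishes, which the bare File.open(...).each_line call does not do.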

Arie Xiao answered Nov 11 '22 10:11