
What's the fastest way to read a large file in Ruby?

I've seen answers to this question, but I couldn't figure out which of them would perform the fastest. These are the approaches I've seen; which is best?

  1. Read one line at a time using each or each_line
  2. Read one line at a time using gets
  3. Save it all into an array of lines using readlines and then use each
  4. Use grep (not sure what exactly to do with grep...)
  5. Use sed (not sure what exactly to do with sed...)
  6. Something else?

Also, would it be better to just use another language or should Ruby be fine?

EDIT:

More details: Each line contains something like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2... idn attr1_n attr2_n" (where n is very large), and I need to insert those values into a database. For that example line, I would need to insert n rows.

asked Feb 01 '13 by user1136342


1 Answer

Ruby will likely use the same or very similar low-level code (written in C) to do the actual reading from disk for the first three options, so they should perform similarly. Given that, you should choose whichever is most convenient for you; the ability to do that is what makes languages like Ruby so useful! You will be reading a lot of data from disk, so I would suggest using each_line and processing each line as you read it.
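
For illustration, here is a minimal sketch of the each_line approach; the file name is just a placeholder:

    # Stream the file one line at a time; the whole file is never loaded into memory.
    # "data.txt" is a placeholder path.
    File.open("data.txt", "r") do |file|
      file.each_line do |line|
        # process the line here
      end
    end

    # File.foreach("data.txt") { |line| ... } is an equivalent shorthand.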

I would not recommend bringing grep, sed, or any other such external utilities into the picture unless you have a very good reason, as they will make your code less portable and expose you to failures that may be difficult to diagnose.
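
Based on the format described in your EDIT, a sketch of the parsing and inserting could look like the following; insert_row is a hypothetical stand-in for whatever database call you actually use:

    # Each line looks like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2 ... idn attr1_n attr2_n".
    # insert_row is a hypothetical placeholder for your real database insert.
    def insert_row(id, attr1, attr2)
      # e.g. execute a prepared INSERT statement with your DB library
    end

    File.open("data.txt", "r") do |file|
      file.each_line do |line|
        # split on whitespace and take the values three at a time: one row per triple
        line.split.each_slice(3) do |id, attr1, attr2|
          insert_row(id, attr1, attr2)
        end
      end
    end

Here each_slice(3) walks the split tokens three at a time, so each (id, attr1, attr2) triple becomes one row to insert.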

answered Sep 28 '22 by mdunsmuir