Best way to parse large CSV files in ruby

Tags:

ruby

csv

What is the best way to parse a large CSV file in Ruby? My CSV file is almost 1 GB, and I want to filter its data according to some conditions.

Maneesha Cd asked Oct 22 '25 16:10


1 Answer

You don't specifically say, but I think most people commenting feel this is likely to be a homework question. If so, you should read "How do I ask and answer homework questions?". If not, read "How do I ask a good question?".

As G4143 stated in a comment, Ruby has an excellent CSV class that should fit your needs.

Here are a couple of quick examples using foreach, which the documentation describes as the primary method for reading CSV files. It reads one row at a time from the file instead of loading the whole file into memory, so it should work well with large files. Here is a basic example of how you might filter out a subset of CSV records with it, but I would encourage you to read the CSV class documentation and follow up with more specific questions, showing what you have tried so far, if you run into trouble.
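Because foreach works row by row, calling CSV.foreach without a block returns an Enumerator instead. A minimal sketch, assuming you only need the first few matching rows: chaining lazy lets Ruby stop reading the file as soon as enough matches have been found. The file name and sample data are hypothetical stand-ins written to a temp directory just for the demo.

```ruby
require 'csv'
require 'tmpdir'

# Demo setup: a hypothetical sample file in the system temp directory.
path = File.join(Dir.tmpdir, "sample.csv")
File.write(path, "1,2,3\n4,5,6\n1,7,8\n1,9,9\n")

# Without a block, CSV.foreach returns an Enumerator over the rows.
rows = CSV.foreach(path)

# lazy makes select pull rows on demand, so reading stops once
# two rows whose first field is "1" have been found.
first_two = rows.lazy.select { |row| row[0] == "1" }.first(2)

p first_two # => [["1", "2", "3"], ["1", "7", "8"]]
```

On a 1 GB file this matters when the matches you want appear early, since the rest of the file is never parsed.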

The basic idea is to start with an empty array, use foreach to get each row, and add every row that meets your filtering criteria to that array.

test.csv:

a, b, c
1,2,3
4,5,6

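require 'csv'

# CSV.foreach yields one parsed row (an Array of Strings) at a time.
# Keep only the rows whose first field is "1".
filtered = []
CSV.foreach("test.csv") do |row|
  filtered << row if row[0] == "1"
end

p filtered # => [["1", "2", "3"]]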
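Collecting matches in an array is fine when the filtered subset is small, but with a 1 GB input the matches themselves may not fit comfortably in memory. A sketch of one alternative, assuming you want the results in another CSV file: write each matching row out as soon as it is read. The input and output paths and the sample data are hypothetical.

```ruby
require 'csv'
require 'tmpdir'

# Demo setup: hypothetical input/output paths and sample data.
input  = File.join(Dir.tmpdir, "input.csv")
output = File.join(Dir.tmpdir, "filtered.csv")
File.write(input, "1,2,3\n4,5,6\n1,7,8\n")

matches = 0
# Open the output once and append each matching row as it is read,
# so only the current row is held in memory, not every match.
CSV.open(output, "w") do |out|
  CSV.foreach(input) do |row|
    next unless row[0] == "1"
    out << row
    matches += 1
  end
end

puts matches           # => 2
puts File.read(output) # prints 1,2,3 and 1,7,8 on separate lines
```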

If the first line of the file is a header, you can pass the headers: true option to treat it as such and then look up fields by header name:

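require 'csv'

# headers: true treats the first line as a header row, so each yielded
# row is a CSV::Row whose fields can be looked up by header name.
filtered = []
CSV.foreach("test.csv", headers: true) do |row|
  filtered << row if row["a"] == "1"
end

p filtered # => [#<CSV::Row "a":"1" " b":"2" " c":"3">]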
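The " b" and " c" keys in that output come from the spaces after the commas in the header row of test.csv. One way to normalize them, sketched here with a hypothetical temp file holding the same data, is to pass a header_converters lambda that strips each header before it is used as a key:

```ruby
require 'csv'
require 'tmpdir'

# Demo setup: same data as test.csv (note the spaces in the header row),
# written to a hypothetical temp path.
path = File.join(Dir.tmpdir, "test_headers.csv")
File.write(path, "a, b, c\n1,2,3\n4,5,6\n")

filtered = []
# The lambda strips surrounding whitespace from each header,
# so " b" becomes "b" and can be looked up as row["b"].
CSV.foreach(path, headers: true, header_converters: ->(h) { h.strip }) do |row|
  filtered << row if row["b"] == "2"
end

puts filtered.first.to_h.keys.inspect # => ["a", "b", "c"]
```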
nPn answered Oct 25 '25 13:10