data = IO::read(file).scrub("")
CSV.parse(data, col_sep: "\t", headers: headers, quote_char: '_') do |row|
  # how to detect last line of CSV?
end
I have a giant CSV file that needs to be scrubbed. It has multiple lines that apply to one DB object. In my code I'm collecting all the lines that apply to one object before passing them off to a class that will process them.
It would be very helpful if I could detect the last line in the CSV so I could make sure the last collection gets sent.
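For context, the grouping described above might look like the following minimal sketch. It assumes, purely for illustration, that rows belonging to one object are consecutive and share a key in the first column; `SAMPLE` and `each_group` are hypothetical names, not part of the real code. In this shape the final collection can be handed off once the loop finishes, without knowing which row is the last one.

```ruby
require 'csv'

# Hypothetical sample: column 0 is assumed to be the key shared by one object's rows.
SAMPLE = "a\t1\na\t2\nb\t3\nc\t4\nc\t5\n"

# Collects consecutive rows with the same key and yields each complete group.
def each_group(data, col_sep: "\t")
  group = []
  CSV.parse(data, col_sep: col_sep) do |row|
    # A change of key means the previous group is complete.
    if !group.empty? && group.last[0] != row[0]
      yield group
      group = []
    end
    group << row
  end
  # The loop has ended, so whatever is still collected is the last group.
  yield group unless group.empty?
end

groups = []
each_group(SAMPLE) { |rows| groups << rows }
p groups.length # => 3
```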
Here is a seek-based solution, which suits large files. It scans backward from the end of the file, so the time to print the last line of the CSV depends on the length of that line rather than on the size of the file:
#!/usr/bin/env ruby
require 'csv'

f = File.open('test.csv')
f.seek(-2, IO::SEEK_END) # pos -1 is the newline at the end of the file
last_line = nil
while f.pos > 0
  if f.getbyte == 10 # 10 is the byte value of "\n"
    last_line = f.read
    break
  else
    f.pos -= 2 # getbyte always advances the position by exactly 1
  end
end
last_line ||= f.read # no newline found: the file is a single line and pos is back at 0
row = CSV.parse_line(last_line.scrub(""), col_sep: "\t")
p row
f.close
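The same backward scan can be wrapped in a method that also copes with an empty file, a single-line file, and a missing trailing newline. This is only a sketch: `last_line_of` is a hypothetical helper name, and the temp file stands in for the real tab-separated CSV.

```ruby
require 'csv'
require 'tempfile'

# Returns the last non-empty line of the file at +path+ (without its line
# ending), or nil if the file is empty or contains only line endings.
def last_line_of(path)
  File.open(path, 'rb') do |f|
    pos = f.size
    # Skip trailing "\n" / "\r" bytes so the final newline is not
    # mistaken for the separator in front of the last line.
    while pos > 0
      f.seek(pos - 1)
      byte = f.getbyte
      break unless byte == 10 || byte == 13
      pos -= 1
    end
    return nil if pos.zero?

    finish = pos
    # Walk back to the newline that precedes the last line, or to byte 0.
    while pos > 0
      f.seek(pos - 1)
      break if f.getbyte == 10
      pos -= 1
    end
    f.seek(pos)
    # The file was read as raw bytes; tag as UTF-8 and drop invalid sequences.
    f.read(finish - pos).force_encoding(Encoding::UTF_8).scrub('')
  end
end

Tempfile.create('demo') do |tmp|
  tmp.write("first\tsecond\n1\t2\n3\t4\n")
  tmp.flush
  p CSV.parse_line(last_line_of(tmp.path), col_sep: "\t") # => ["3", "4"]
end
```

Seeking one byte at a time keeps the sketch short; for very long last lines, reading fixed-size chunks backward would need fewer system calls.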
test.csv
first, second, third
1,2,3
3,4,5
7,8,9
test.rb
require 'csv'

headers = 'headers'
filename = './test.csv'
line_count = File.readlines(filename).size
data = File.read(filename).scrub("")
parse_opts = { col_sep: "\t", headers: headers, quote_char: '_' }
CSV.parse(data, **parse_opts).each.with_index(1) do |row, line_num|
  puts line_num == line_count # true only for the last line
end
#=> false
#=> false
#=> false
#=> true
The line_count is generated in ~8 sec on a 10+ million row CSV. Alternatively, you can use line_count = %x(wc -l #{filename}).to_i, which takes ~1.7 sec on the same file.
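Where shelling out to wc is not an option, the count can also be streamed in pure Ruby so the whole file is never held in memory. The sketch below uses a hypothetical helper name, `count_lines`, and has not been timed against the figures above. One difference to keep in mind: wc -l counts newline characters, so for a file whose last line has no trailing newline it reports one less than File.readlines or File.foreach.

```ruby
require 'tempfile'

# Counts lines by streaming the file; only one line is in memory at a time.
def count_lines(path)
  File.foreach(path).count
end

Tempfile.create('demo') do |tmp|
  tmp.write("first\tsecond\n1\t2\n3\t4") # last line has no trailing newline
  tmp.flush
  p count_lines(tmp.path) # => 3
end
```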