
How to detect the last row in CSV (ruby)

Tags:

ruby

csv

data = File.read(file).scrub("")
# Options go in as keyword arguments; a braced hash raises ArgumentError on Ruby 3+
CSV.parse(data, col_sep: "\t", headers: headers, quote_char: '_') do |row|
  # how to detect last line of CSV?
end

I have a giant CSV file that needs to be scrubbed. It has multiple lines that apply to one DB object. In my code I'm collecting all the lines that apply to one object before passing them off to a class that will process them.

It would be very helpful if I could detect the last line in the CSV so I could make sure the last collection gets sent.
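
Whether or not the last row can be detected, the last collection can also be handled by flushing whatever is still pending once the loop ends. Below is a minimal sketch. It assumes that the rows for one DB object are consecutive and share a value in a key column. The column name `db_id`, the sample data, and the helper `each_group` are hypothetical:

```ruby
require 'csv'

# Hypothetical tab-separated sample: rows with the same db_id belong to one object
SAMPLE = "db_id\tvalue\n1\ta\n1\tb\n2\tc\n3\td\n3\te\n"

# Hypothetical helper: yield each run of consecutive rows sharing the same `key` value
def each_group(rows, key)
  return enum_for(__method__, rows, key) unless block_given?

  group = []
  rows.each do |row|
    # A new key value means the previous object's rows are complete
    if !group.empty? && group.last[key] != row[key]
      yield group
      group = []
    end
    group << row
  end
  # Flush the final collection once the rows run out
  yield group unless group.empty?
end

rows = CSV.parse(SAMPLE, col_sep: "\t", headers: true)
each_group(rows, 'db_id') do |group|
  puts "object #{group.first['db_id']}: #{group.size} row(s)"
end
```

For a file too large to load at once, `CSV.foreach(path, col_sep: "\t", headers: true)` called without a block returns an enumerator that can be passed as `rows`, so the file is read row by row. `Enumerable#slice_when` expresses the same grouping in one call, e.g. `rows.slice_when { |a, b| a['db_id'] != b['db_id'] }`.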

asked Mar 13 '23 04:03 by newUserNameHere


2 Answers

Here is a seek-based solution, optimized for large files. It scans backward from the end of the file, so the time it takes to print the last line of the CSV depends on the length of that line, not on the size of the file:

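#!/usr/bin/env ruby

require 'csv'

last_line = nil

# Binary mode makes getc read exactly one byte, so the `pos -= 2` step is valid
File.open('test.csv', 'rb') do |f|
  f.seek(-2, IO::SEEK_END) # skip the final byte, normally the trailing newline

  while f.pos > 0
    if f.getc == "\n"
      last_line = f.read
      break
    else
      f.pos -= 2 # getc advanced the position by 1
    end
  end

  # No earlier newline was found: the file holds a single line, so read all of it
  if last_line.nil?
    f.rewind
    last_line = f.read
  end
end

row = CSV.parse_line(last_line.force_encoding('UTF-8').scrub(""), col_sep: "\t")
p row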
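
A variation on the seek-based approach (not from the original answer) reads backward in fixed-size chunks instead of one byte per `getc` call. This is a sketch. The helper name `last_line` and the 4096-byte default chunk size are arbitrary choices. Because the helper accepts any seekable IO, a `StringIO` stands in for the file here:

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: return the last line of a seekable IO (without its
# trailing newline) by reading backward in chunks of `chunk_size` bytes.
def last_line(io, chunk_size = 4096)
  io.seek(0, IO::SEEK_END)
  pos  = io.pos
  tail = ''.b
  while pos > 0
    step = [chunk_size, pos].min
    pos -= step
    io.seek(pos, IO::SEEK_SET)
    tail = io.read(step).b + tail
    # Enough has been read once a newline appears before the final line
    break if tail.chomp.include?("\n")
  end
  # Keep only the text after the last remaining newline
  tail.chomp.rpartition("\n").last.force_encoding(Encoding::UTF_8).scrub("")
end

io = StringIO.new("a\tb\n1\t2\n3\t4\n")
p CSV.parse_line(last_line(io), col_sep: "\t") # prints ["3", "4"]
```

With a file on disk, the same call works on `File.open('test.csv', 'rb')`, preferably in block form so the file is closed afterwards.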

answered Mar 19 '23 04:03 by shivams


test.csv

first, second, third
1,2,3
3,4,5
7,8,9

test.rb

require 'csv'
headers    = 'headers'
filename   = './test.csv'
line_count = File.readlines(filename).size
data       = File.read(filename).scrub("")
parse_opts = { col_sep: "\t", headers: headers, quote_char: '_' }

# with_index(1) numbers rows from 1, so the last row's number equals line_count
CSV.parse(data, **parse_opts).to_enum.with_index(1) do |row, line_num|
  puts line_num == line_count
end
#=> false
#=> false
#=> false
#=> true

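Computing line_count this way takes ~8 sec on a CSV with more than 10 million rows. Alternatively, line_count = %x(wc -l #{filename}).to_i takes ~1.7 sec on the same file. Note that wc -l counts newline characters, so a final line without a trailing newline is not counted. Either way, the comparison assumes every CSV row occupies exactly one physical line, i.e. no quoted field contains a newline.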
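
A different technique (a one-row look-ahead, not from either answer) produces the same true/false flag in a single pass, without counting lines first. It counts parsed rows rather than physical lines, so quoted fields that contain newlines do not throw it off. A minimal sketch; the helper name `each_with_last_flag` and the sample data are hypothetical:

```ruby
require 'csv'

# Hypothetical helper: yield every row with a flag that is true only for the
# final row. One row is held back until the next one arrives.
def each_with_last_flag(rows)
  return enum_for(__method__, rows) unless block_given?

  pending = nil
  have_pending = false
  rows.each do |row|
    # A later row exists, so the held-back row is not the last one
    yield pending, false if have_pending
    pending = row
    have_pending = true
  end
  yield pending, true if have_pending
end

data = "first,second,third\n1,2,3\n3,4,5\n7,8,9\n"
each_with_last_flag(CSV.parse(data, headers: true)) do |row, last|
  puts "#{row.fields.inspect} last=#{last}"
end
```

For a large file, `CSV.foreach(filename, col_sep: "\t")` called without a block returns an enumerator that can be passed in place of the parsed table, so rows are streamed instead of loaded all at once.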

answered Mar 19 '23 04:03 by Travis