
Ruby CSV parsing string with escaped quotes

Tags: ruby, csv

I have a line in my CSV file that has some escaped quotes:

173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"

When I try to parse it with the Ruby CSV parser:

require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
  puts row
end

I get this error:

.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)

How can I get around this error?

Andrew asked Jan 26 '13 06:01


2 Answers

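Escaping a quote with a backslash (\") is a typical Unix convention, whereas Ruby's CSV parser expects an embedded quote to be doubled (""), as described in RFC 4180.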

To parse it:

require 'csv'
text = File.read('test.csv').gsub(/\\"/,'""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
  puts row
end

Note: File.read loads the entire file into memory, so a very large CSV file will use a lot of RAM. In that case, consider reading and converting the file one line at a time.

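Note: if your CSV file may have a backslash in front of another backslash (for example a field that ends in \\ right before its closing quote), use Andrew Grimm's suggestion, which only replaces a \" that is not itself preceded by a backslash: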

gsub(/(?<!\\)\\"/,'""')
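To see what the lookbehind changes, here is a small sketch comparing the two patterns. The second sample line (ID 174, a field ending in two backslashes) is made up for illustration and is not from the question's file.

```ruby
require 'csv'

# Sample lines. The second one is hypothetical: its middle field ends in an
# escaped backslash (two backslash characters) right before the closing quote.
escaped_quote = '173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"'
escaped_slash = '174,"C:\\\\","Japan"'

simple  = /\\"/         # any backslash followed by a quote
guarded = /(?<!\\)\\"/  # same, but only when no backslash comes before it

# Both patterns fix the line from the question.
fixed = escaped_quote.gsub(guarded, '""')
fields = CSV.parse_line(fixed)

# Only the guarded pattern leaves the second line's closing quote alone.
untouched = escaped_slash.gsub(guarded, '""')
damaged   = escaped_slash.gsub(simple, '""')

puts fixed
puts fields[1]
puts untouched
puts damaged
```

With the simple pattern, the closing quote of the second line's field gets doubled, so the field no longer ends where it should.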
joelparkerhenderson answered Nov 13 '22 06:11


CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, that can be used to strip extra spaces on all fields in a row.
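As a rough illustration of a converter that strips extra spaces, something like the following should work. The sample data and the strip_spaces name are my own, not from the question.

```ruby
require 'csv'

# Made-up sample data with padding around the unquoted fields.
data = "ID,Name,Country\n173,  Yukihiro Matsumoto  ,  Japan \n"

# A converter is called with each field after the line has been split.
# Empty fields arrive as nil, so only strip actual strings.
strip_spaces = ->(field) { field.is_a?(String) ? field.strip : field }

rows = CSV.parse(data,
                 headers: true,
                 header_converters: :symbol,
                 converters: [strip_spaces])

rows.each { |row| p row.to_h }
```

Note that header_converters: :symbol also downcases the header names, so the keys here come back as :id, :name and :country.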

Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
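A quick way to confirm the ordering is to hand CSV a converter that tries to fix the quotes and watch it fail anyway. This is only a sketch; fix_quotes is a name I made up, and the exact error message varies between versions of the csv library.

```ruby
require 'csv'

line = '173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"'

# A converter that would remove the backslashes, if it ever got the chance.
fix_quotes = ->(field) { field.is_a?(String) ? field.gsub('\"', '"') : field }

# Converter only: the parser rejects the line before the converter runs.
outcome =
  begin
    CSV.parse_line(line, converters: [fix_quotes])
    :parsed
  rescue CSV::MalformedCSVError
    :malformed
  end

# Fixing the raw line first, before CSV sees it, does work.
fields = CSV.parse_line(line.gsub('\"', '""'))

puts outcome
puts fields[1]
```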

This is my sample CSV file:

ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"

Preserving your CSV.foreach method, this is my example code for parsing it without CSV getting mad:

require 'csv'
require 'pp'

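header = []
# Read the file one line at a time so the whole file is never held in memory.
File.foreach('test.csv') do |csv_line|
  # Turn backslash-escaped quotes (\") into the doubled quotes ("") that CSV
  # expects, then parse the single line into an array of fields.
  row = CSV.parse(csv_line.gsub('\"', '""')).first

  # The first line holds the column names; keep them as symbols.
  if header.empty?
    header = row.map(&:to_sym)
    next
  end

  # Pair each header with its field to build a hash for this row.
  row = Hash[header.zip(row)]
  pp row
  puts row[:Name]
end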

And the resulting hash and name value:

{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto

I assumed you were wanting a hash back because you specified the :headers flag:

CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
the Tin Man answered Nov 13 '22 08:11