
CSV.read Illegal quoting in line x

I am using Ruby's CSV.read on a large volume of data. From time to time the library encounters a poorly formatted line and raises an error, for instance:

"Illegal quoting in line 53657." 

It would be easier to skip the offending line than to go through each CSV file and fix the formatting. How can I do this?

JZ. asked Mar 25 '12 21:03


2 Answers

I had this problem with a line like 123,456,a"b"c

The problem is that the CSV parser expects double quotes, if they appear, to surround the entire comma-delimited field.

Solution: use a quote character other than " that I was sure would not appear in my data:

CSV.read(filename, :quote_char => "|")
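
As a minimal sketch of the above: the sample string is just the line quoted in this answer, and the code assumes "|" never occurs in the data. It uses CSV.parse on a string instead of CSV.read on a file so it can run on its own; both accept the same option.

```ruby
require "csv"

# Sample line from above: double quotes sit in the middle of an unquoted field.
line = %(123,456,a"b"c\n)

# With the default quote character (") the parser rejects the line.
default_error =
  begin
    CSV.parse(line)
    nil
  rescue CSV::MalformedCSVError => e
    e
  end
puts default_error.class

# With a quote character assumed never to occur in the data,
# the double quotes are treated as ordinary text.
rows = CSV.parse(line, quote_char: "|")
p rows  # => [["123", "456", "a\"b\"c"]]
```

The trade-off is that fields genuinely wrapped in double quotes are no longer treated as quoted, so the quotes stay in the values and a comma inside such a field splits it in two.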

Ray Baxter answered Sep 19 '22 02:09


The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:

When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.

To enable it, pass it as an option to the CSV read/parse/new methods:

CSV.read(filename, liberal_parsing: true) 
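
A small sketch of what that looks like, assuming Ruby 2.4 or later. The sample strings are made up for illustration, and CSV.parse is used on a string so the snippet is self-contained:

```ruby
require "csv"

# Double quotes inside an unquoted field: illegal under the default parser.
line = %(123,456,a"b"c\n)

# liberal_parsing keeps the stray quotes as part of the field value.
rows = CSV.parse(line, liberal_parsing: true)
p rows  # => [["123", "456", "a\"b\"c"]]

# Properly quoted fields are still handled as usual.
quoted = CSV.parse(%("x,y",z\n), liberal_parsing: true)
p quoted  # => [["x,y", "z"]]
```

Unlike swapping the quote character, this leaves ordinary quoted fields working, so it is usually the less invasive choice when it is available.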
Will Madden answered Sep 20 '22 02:09