I am using Ruby's CSV.read on massive data. From time to time the library encounters poorly formatted lines and raises an error, for instance:
"Illegal quoting in line 53657."
It would be easier to skip the offending line than to go through each CSV file and fix the formatting. How can I do this?
I had this problem with a line like 123,456,a"b"c
The problem is that the CSV parser expects " characters, if they appear, to entirely surround the comma-delimited field.
Solution: use a quote character other than " that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
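To illustrate the workaround above, here is a minimal sketch that parses the example line from this answer twice: once with the default quote character and once with "|". It uses CSV.parse_line on an in-memory string instead of CSV.read on a file so that it runs on its own. The choice of "|" is only safe under the assumption that it never occurs in the data.

```ruby
require "csv"

line = '123,456,a"b"c'

# With the default quote character ("), the stray quotes inside the third
# field make the parser raise CSV::MalformedCSVError.
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  puts "default quote_char failed: #{e.message}"
end

# With a quote character that is assumed never to appear in the data,
# the double quotes are treated as ordinary text.
row = CSV.parse_line(line, quote_char: "|")
p row
```

Note that this disables normal quoting: a field such as "a,b" that was legitimately quoted with " would now be split at the comma.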
The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:
When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.
To enable it, pass it as an option to the CSV read/parse/new methods:
CSV.read(filename, liberal_parsing: true)
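Some lines may still fail even with liberal_parsing, for example one with an unclosed quoted field. As a rough sketch of how such lines could be skipped, which is what the question asks for, the code below parses each physical line separately and rescues CSV::MalformedCSVError. The helper name read_skipping_bad_lines and the sample data are hypothetical, and the approach assumes that no valid quoted field spans more than one line.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: parse each physical line on its own so that one bad
# line cannot abort the whole read. Returns the parsed rows and the numbers
# of the lines that were skipped.
def read_skipping_bad_lines(io)
  rows = []
  skipped = []
  io.each_line.with_index(1) do |line, lineno|
    begin
      rows << CSV.parse_line(line, liberal_parsing: true)
    rescue CSV::MalformedCSVError
      skipped << lineno # remember which line was dropped
    end
  end
  [rows, skipped]
end

# Made-up sample data: line 2 has stray quotes, line 3 has an unclosed quote.
sample = StringIO.new(<<~DATA)
  1,2,3
  123,456,a"b"c
  x,"unclosed
  4,5,6
DATA

rows, skipped = read_skipping_bad_lines(sample)
p rows
p skipped
```

For a real file, the same helper could be called as File.open(filename) { |f| read_skipping_bad_lines(f) }. Keeping the skipped line numbers makes it possible to check afterwards how much data was dropped.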