
Importing CSV quoting error is driving me nuts


I've been having an unbelievable time trying to import a CSV file in Ruby 1.9.2.

The file I am trying to parse has:

  • commas within columns
  • quotes within columns
  • an '@' as the :col_sep

csv.txt (representative input; the real one is 101k lines):

㔾@㔾@jié@"seal" radical in Chinese characters, (Kangxi radical 26) 

My code:

    require 'csv'

    CSV.foreach("/Users/adam/Desktop/csvtest.txt", {:col_sep => "@"}) do |row|
      puts row.to_s
    end

My desired output:

["㔾", "㔾", "jié", "\"seal\" radical in Chinese characters, (Kangxi radical 26)"] 

What I get for output:

    CSV::MalformedCSVError: Unclosed quoted field on line 1.
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1910:in `block in shift'
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `loop'
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `shift'
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1767:in `each'
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1202:in `block in foreach'
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1340:in `open'
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1201:in `foreach'
        from (irb):31
        from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/bin/irb:16:in `<main>'

It says there are unclosed quoted fields, but I can see that the quotes open and close.

Escaping the quotes does nothing. I get the same error (...@""seal"" r...). Changing them to single quotes makes it work (...@'seal' r...). The problem is I NEED them to be in double quotes.
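For comparison, here is the standard RFC 4180-style escaping that Ruby's CSV parser expects. When a field contains double quotes, the whole field has to be wrapped in double quotes and every embedded quote doubled; doubling the inner quotes alone is not enough if the field itself is not enclosed. This is a minimal sketch, assuming the input file could be regenerated in this form. `CSV.parse_line` is used on a single string only to keep the example self-contained; the `# encoding` comment is there so the non-ASCII literal also loads under 1.9.x.

```ruby
# encoding: utf-8
require 'csv'

# The 4th field is fully quoted: it starts and ends with ", and each literal " inside is doubled ("").
line = '㔾@㔾@jié@"""seal"" radical in Chinese characters, (Kangxi radical 26)"'

row = CSV.parse_line(line, :col_sep => "@")
p row
```

With the field quoted this way, the default `:quote_char` works and the literal quotes come back as part of the value.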

Any ideas?

asked Nov 10 '11 01:11 by AdamA


1 Answer

I think the problem is that CSV is trying to interpret "seal" as a quoted column, but it doesn't appear as @"seal"@, so the parser gets confused: quotes are supposed to surround entire columns. I don't see any option to tell CSV that the columns aren't quoted, but you can kludge around it by setting :quote_char to something that will never occur. If you're using UTF-8, you can safely use a zero byte as your "quote character that will never occur":

    CSV.foreach(filename, :col_sep => "@", :quote_char => "\x00") do |row|
      #...
    end

This should work as long as none of your columns are quoted.
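Applied to the sample line from the question, the workaround looks roughly like this. It is a minimal sketch: `CSV.parse_line` stands in for `CSV.foreach` only so the example runs without a file, and the options are the same `:col_sep` and `:quote_char` shown above.

```ruby
# encoding: utf-8
require 'csv'

line = '㔾@㔾@jié@"seal" radical in Chinese characters, (Kangxi radical 26)'

# A NUL byte ("\x00") is the quote character, on the assumption that it never
# appears in the data; the literal double quotes are then ordinary characters.
row = CSV.parse_line(line, :col_sep => "@", :quote_char => "\x00")
p row
```

The trade-off is that genuinely quoted fields (for example one containing an "@") would no longer be unquoted by the parser, which is why this only suits files where no column is quoted.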

answered Oct 11 '22 14:10 by mu is too short