I'm using Ruby 1.9 to parse the following CSV file, which contains MacRoman characters:
# encoding: ISO-8859-1
#csv_parse.csv
Name, main-dialogue
"Marceu", "Give it to him ó he, his wife."
I did the following to parse this.
require 'csv'
input_string = File.read("../csv_parse.rb").force_encoding("ISO-8859-1").encode("UTF-8")
#=> "Name, main-dialogue\r\n\"Marceu\", \"Give it to him \x97 he, his wife.\"\r\n"
data = CSV.parse(input_string, :quote_char => "'", :col_sep => "/\",/")
#=> [["Name, main-dialogue"], ["\"Marceu", " \"Give it to him \x97 he, his wife.\""]]
So the problem is that the first field of the second array in data loses its closing quote, rather than the row being split into 2 complete strings like:
["\"Marceu\"", " \"Give it to him \x97 he, his wife.\""]
I tried :col_sep => "," (which is the default behaviour), but it gave me 3 splits.
header = CSV.parse(input_string, :quote_char => "'")[0].map{|a| a.strip.downcase unless a.nil? }
#=> ["name", "main-dialogue"]
I have to parse again for the header, as there are no double quotes there.
The output is intended to be shown in a browser, so the character ó should show up as usual and not as \x97 or something else.
Is there any way to solve the above problems?
I think you do have MacRoman-encoded data; if you do this in irb:
>> "\x97".force_encoding('MacRoman').encode('UTF-8')
you get this:
=> "ó"
And that seems to be the character that you're expecting. So you want this:
input_string = File.read("../csv_parse.rb").force_encoding('MacRoman').encode('UTF-8')
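To see why the encoding name matters, here is a minimal sketch that decodes the same byte (0x97) under both encodings. The helper name decode_as is hypothetical and only there for illustration; the byte is built in memory rather than read from your file.

```ruby
# Hypothetical helper (not part of any library): label raw bytes with an
# encoding name, then transcode them to UTF-8.
def decode_as(raw_bytes, encoding_name)
  raw_bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

# The single byte 0x97, as it would sit in the file on disk.
raw = [0x97].pack("C")

as_macroman = decode_as(raw, "MacRoman")    # U+00F3, the letter "ó"
as_latin1   = decode_as(raw, "ISO-8859-1")  # U+0097, an invisible C1 control character

puts as_macroman.unpack("U*").map { |cp| "U+%04X" % cp }.join(" ")
puts as_latin1.unpack("U*").map { |cp| "U+%04X" % cp }.join(" ")
```

Labelling the data as ISO-8859-1 does not raise an error, it just maps the byte to a different code point, which is why the character shows up as an escape instead of ó.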
Then you have two columns in your CSV, the columns are quoted with double quotes (so you don't need :quote_char), and the delimiter is ', ' so this should work:
data = CSV.parse(input_string, :col_sep => ", ")
and data will look like this:
[
["Name", "main-dialogue"],
["Marceu", "Give it to him ó he, his wife."]
]
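Putting the two steps together, a rough end-to-end sketch might look like the following. It assumes the sample rows from the question and builds the MacRoman bytes in memory instead of reading the file; the names SAMPLE_UTF8, sample_macroman_bytes and parse_macroman_csv are made up for this example.

```ruby
require 'csv'

# Sample rows from the question, written as UTF-8 text ("\u00F3" is "ó").
SAMPLE_UTF8 = "Name, main-dialogue\r\n\"Marceu\", \"Give it to him \u00F3 he, his wife.\"\r\n"

# Simulate the bytes on disk: MacRoman-encoded, with no encoding label attached.
def sample_macroman_bytes
  SAMPLE_UTF8.encode("MacRoman").force_encoding("BINARY")
end

# Hypothetical helper: label the bytes as MacRoman, transcode to UTF-8,
# then parse with ", " as the column separator and the default double quote.
def parse_macroman_csv(raw_bytes)
  text = raw_bytes.dup.force_encoding("MacRoman").encode("UTF-8")
  CSV.parse(text, :col_sep => ", ")
end

data = parse_macroman_csv(sample_macroman_bytes)

# Normalise the header row from the same parse; no second pass is needed.
header = data.first.map { |name| name.strip.downcase }
#=> ["name", "main-dialogue"]
```

Because the header row is split by the same ', ' separator, it comes out of the single CSV.parse call, so the separate parse with :quote_char => "'" should no longer be necessary.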