This is what I was doing:
csv = CSV.open(file_name, "r")
I used this for testing:
line = csv.shift
while not line.nil?
  puts line
  line = csv.shift
end
And I ran into this:
ArgumentError: invalid byte sequence in UTF-8
I read the answer here, and this is what I tried:
csv = CSV.open(file_name, "r", encoding: "windows-1251:utf-8")
I ran into the following error:
Encoding::UndefinedConversionError: "\x98" to UTF-8 in conversion from Windows-1251 to UTF-8
Then I came across a Ruby gem - charlock_holmes. I figured I'd try using it to find the source encoding.
CharlockHolmes::EncodingDetector.detect(File.read(file_name))
=> {:type=>:text, :encoding=>"windows-1252", :confidence=>37, :language=>"fr"}
So I did this:
csv = CSV.open(file_name, "r", encoding: "windows-1252:utf-8")
And still got this:
Encoding::UndefinedConversionError: "\x8F" to UTF-8 in conversion from Windows-1252 to UTF-8
It looks like the problem is detecting the correct encoding of your file. CharlockHolmes gives you a useful hint with :confidence=>37, which simply means the detected encoding may not be the right one.
Based on the error messages and on test_transcode.rb (https://github.com/MacRuby/MacRuby/blob/master/test-mri/test/ruby/test_transcode.rb), I found an encoding that can convert both of the bytes from your error messages. String#encode makes this easy to test:
"\x8F\x98".encode("UTF-8","cp1256") # => "ڈک"
Your issue looks strictly related to the file and not to Ruby.
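The same check can be run over several candidate encodings at once. This is only a minimal sketch: the helper name convertible_encodings and the candidate list are hypothetical, and the sample bytes are the two taken from the error messages above. Converting two bytes cleanly does not prove the encoding is right for the whole file; it only rules out the ones that fail.

```ruby
# Bytes taken from the two error messages in the question.
SAMPLE = "\x8F\x98".b

# Hypothetical candidate list; extend it with whatever is plausible for your data.
CANDIDATES = %w[Windows-1251 Windows-1252 CP1250 CP1256]

# Returns the candidates that can convert every byte to UTF-8 without raising.
def convertible_encodings(bytes, candidates)
  candidates.select do |name|
    begin
      bytes.encode("UTF-8", name)
      true
    rescue EncodingError
      # Covers undefined conversions, invalid byte sequences and unknown converters.
      false
    end
  end
end

p convertible_encodings(SAMPLE, CANDIDATES)
```

For a real file you would pass in File.binread(file_name) instead of the two sample bytes.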
If we are not sure which encoding to use and can accept losing some characters, we can use the :invalid and :undef options of String#encode. In this case:
"\x8F\x98".encode("UTF-8", "CP1250",:invalid => :replace, :undef => :replace, :replace => "?") # => "Ź?"
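Applied to the CSV file from the question, the replacement options could be used roughly like this. It is a sketch under assumptions: read_csv_lossy is a hypothetical helper name, the file is assumed small enough to read into memory, and the source encoding is whatever guess you settle on.

```ruby
require "csv"

# Reads the raw bytes, converts them to UTF-8 while replacing anything that
# cannot be converted with "?", and only then hands the text to the CSV parser.
def read_csv_lossy(path, source_encoding)
  raw = File.binread(path)
  text = raw.encode("UTF-8", source_encoding,
                    invalid: :replace, undef: :replace, replace: "?")
  CSV.parse(text)
end
```

A call such as read_csv_lossy(file_name, "CP1250") returns an array of rows. Because the conversion happens before parsing, the parser never sees an invalid byte sequence, at the cost of some characters turning into "?".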
Another way is to use Iconv with the //IGNORE option on the target encoding:
Iconv.iconv("UTF-8//IGNORE","CP1250", "\x8F\x98")
As for the source encoding, the suggestion from CharlockHolmes should be pretty good.
PS. String#encode was introduced in Ruby 1.9. With Ruby 1.8 you can use Iconv.