I'm importing a CSV file into Ruby (1.8.7). File.open('path/to/file.csv').read returns this in the console:
Stefan,Engstr\232m
UniversalDetector (from the chardet gem) detects the encoding as ISO-8859-2:
UniversalDetector::chardet("Stefan,Engstr\232m")
=> {"confidence"=>0.626936305574385, "encoding"=>"ISO-8859-2"}
Trying to convert the string yields the following:
Iconv.conv("UTF-8", "ISO-8859-2", "Stefan,Engstr\232m")
=> "Stefan,Engstrm"
whereas I would expect:
=> "Stefan,Engström"
Let me know if I should provide more information or elaborate on something.
The encoding is probably "Macintosh Roman"; a couple of other options would be "Mac Central European" and "Mac Icelandic". The \nnn notation is octal, so \232 is 154 in decimal (0x9A), and character 154 is the lowercase o-umlaut ("ö") that you're expecting in all three of those encodings. I don't see 154 mapped to "ö" in any of the Windows code pages or ISO 8859 character sets. I'd guess that Mac Roman is more common than the Icelandic or Central European encodings.
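To double-check the octal arithmetic and the code-page reasoning, here is a small sketch. It assumes Ruby 1.9 or later (String#encode does not exist in 1.8.7), and it only compares Mac Roman against Windows-1252 as one representative Windows code page, not all of them.

```ruby
# "\232" is an octal escape: 2*64 + 3*8 + 2 = 154, i.e. 0x9A
octal_value = "232".to_i(8)

raw = "Stefan,Engstr\232m"
byte = raw.bytes[13] # the single byte that follows "Engstr"

puts octal_value            # 154
puts format("0x%02X", byte) # 0x9A

# Interpret that same byte under two different single-byte encodings.
# byte.chr yields a one-byte binary string; encode(dst, src) treats it as src.
puts byte.chr.encode("UTF-8", "macRoman")     # ö
puts byte.chr.encode("UTF-8", "Windows-1252") # š
```

The same byte means "ö" in Mac Roman but "š" in Windows-1252, which is why guessing the source encoding correctly matters more than the conversion call itself.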
Try using 'MacRoman' as your source encoding with Iconv:
>> Iconv.conv("UTF-8", "MacRoman", "Stefan,Engstr\232m")
=> "Stefan,Engström"
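Iconv was removed from the standard library in Ruby 2.0, so on newer Rubies the same fix is usually done with String#encode, or by telling File.read the file's external encoding and the internal encoding you want. A sketch, assuming Ruby 1.9+; the temp file below is a hypothetical stand-in for path/to/file.csv containing one Mac Roman-encoded row.

```ruby
require "tempfile"

# Ruby 1.9+ counterpart of Iconv.conv("UTF-8", "MacRoman", ...):
# encode(dst, src) reads the bytes as src and transcodes them to dst.
converted = "Stefan,Engstr\232m".encode("UTF-8", "macRoman")
puts converted # Stefan,Engström

# Hypothetical sample file standing in for path/to/file.csv.
sample = Tempfile.new(["names", ".csv"])
sample.binmode
sample.write("Stefan,Engstr\232m\n")
sample.close

# "macRoman:UTF-8" means: the file is Mac Roman, give me UTF-8 strings.
text = File.read(sample.path, encoding: "macRoman:UTF-8")
fields = text.chomp.split(",")
puts fields.join(" | ") # Stefan | Engström

sample.unlink
```

Transcoding at read time keeps every string in the program UTF-8 from the start, so nothing downstream has to know the file came from a Mac.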