How to convert UTF-8 to ISO-8859-1 in Ruby 2.0? [closed]

Time zones for date-times and encodings for strings are no problem as long as you do not have to convert between them. In Ruby 1.9 and 2.0, encodings seem to be the new time zones of older Ruby versions: they cause nothing but trouble. Iconv has been replaced by the native encoding functions. How do you convert from the standard UTF-8 to ISO-8859-1, for example for use on Windows systems? In the Ruby 2.0 console the encode method does not seem to work, although it should be able to convert from a source encoding to a destination encoding via encode(dst_encoding, src_encoding) → str:

>> "ABC äöüÄÖÜ".encoding
=> #<Encoding:UTF-8>
>> "ABC äöüÄÖÜ".encode("UTF-8").encode("ISO-8859-1")
=> "ABC \xE4\xF6\xFC\xC4\xD6\xDC"
>> "ABC äöüÄÖÜ".encode("ISO-8859-1","UTF-8")
=> "ABC \xE4\xF6\xFC\xC4\xD6\xDC"

I am using Ruby 2.0.0 (revision 41674) on a Linux system.

asked Oct 09 '13 16:10 by 0x4a6f4672


1 Answer

The encode method does work.

Let's create a string with U+00FC (ü):

uuml_utf8 = "\u00FC"       #=> "ü"

Ruby encodes this string in UTF-8:

uuml_utf8.encoding         #=> #<Encoding:UTF-8>

In UTF-8, ü is represented as 195 188 (decimal):

uuml_utf8.bytes            #=> [195, 188]

Now let's convert the string to ISO-8859-1:

uuml_latin1 = uuml_utf8.encode("ISO-8859-1")

uuml_latin1.encoding       #=> #<Encoding:ISO-8859-1>

In ISO-8859-1, ü is represented as 252 (decimal):

uuml_latin1.bytes          #=> [252]
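The same check can be applied to the string from the question. This is only a sketch: the variable names are made up for illustration, and the umlauts are written as \u escapes so the result does not depend on the encoding of the source file.

```ruby
# The question's string "ABC äöüÄÖÜ", written with \u escapes (always UTF-8).
text_utf8   = "ABC \u00E4\u00F6\u00FC\u00C4\u00D6\u00DC"
text_latin1 = text_utf8.encode("ISO-8859-1")

text_utf8.bytesize          #=> 16 (each umlaut takes two bytes)
text_latin1.bytesize        #=> 10 (each umlaut takes one byte)
text_latin1.bytes.last(6)   #=> [228, 246, 252, 196, 214, 220]

# Converting back restores the original string, so nothing was lost.
round_trip = text_latin1.encode("UTF-8")
round_trip == text_utf8     #=> true
```

The six trailing bytes are the decimal form of \xE4 \xF6 \xFC \xC4 \xD6 \xDC, which is exactly what the console printed in the question.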

In UTF-8, however, a lone byte 252 (0xFC) is not a valid sequence. That's why a terminal/console that expects UTF-8 displays the replacement character "�" (U+FFFD) or no character at all.
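You can let Ruby confirm this. The following sketch (variable names are illustrative) takes the single byte 252 and labels it once as UTF-8 and once as ISO-8859-1, without changing the byte itself:

```ruby
# One raw byte, 0xFC; pack returns a binary (ASCII-8BIT) string.
raw = [252].pack("C")

# force_encoding only relabels the bytes, it does not convert them.
as_utf8   = raw.dup.force_encoding("UTF-8")
as_latin1 = raw.dup.force_encoding("ISO-8859-1")

as_utf8.valid_encoding?           #=> false
as_latin1.valid_encoding?         #=> true

# Read as ISO-8859-1, the byte converts cleanly to the UTF-8 form of ü.
as_latin1.encode("UTF-8").bytes   #=> [195, 188]
```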

In order to display ISO-8859-1 encoded characters, you'll have to switch your terminal/console to that encoding, too.
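If the goal is to hand the data to a Windows system rather than to display it, the terminal does not matter. One possible approach, sketched here under the assumption that the consumer reads a file, is to open the file with ISO-8859-1 as its external encoding and let Ruby transcode on write. The file name latin1.txt and the temporary directory are made up for the example.

```ruby
require "tmpdir"

written_bytes = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")   # hypothetical file name

  # "w:ISO-8859-1" sets the external encoding; the UTF-8 string
  # is transcoded to ISO-8859-1 as it is written.
  File.open(path, "w:ISO-8859-1") { |f| f.write("\u00FC") }

  # Read the file back as raw bytes to see what actually landed on disk.
  written_bytes = File.binread(path).bytes
end

written_bytes   #=> [252]
```

Calling encode("ISO-8859-1") on the string yourself and writing the result in binary mode should give the same bytes; the mode string just keeps the conversion in one place.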

answered Sep 23 '22 12:09 by Stefan