 

Ruby 1.9: Convert byte array to string with multibyte UTF-8 characters


I am trying to find a way in Ruby to take a UTF-8 byte array and transform it back to a string.

In irb (Ruby 1.9.2 preview 3) I can create the correct byte array from a UTF-8 string:

ruby-1.9.2-preview3 > 'Café'.bytes.to_a  => [67, 97, 102, 195, 169] 
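As a quick check of what that array contains, the sketch below compares character count with byte count and shows that unpack('C*') produces the same array as bytes. It assumes Ruby 1.9 or later; the magic comment is only needed on 1.9, where source files are not UTF-8 by default.

```ruby
# encoding: utf-8
# 'é' (U+00E9) takes two bytes in UTF-8, so this string
# has 4 characters but 5 bytes.
s = 'Café'
p s.length        # 4
p s.bytesize      # 5
p s.bytes.to_a    # [67, 97, 102, 195, 169]

# unpack('C*') reads one unsigned 8-bit integer per byte,
# giving the same array as bytes.
p s.unpack('C*')  # [67, 97, 102, 195, 169]
```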

However, I can't find a way to round-trip from the bytes back to a string. I tried Array#pack with the U* directive, but that doesn't work for multibyte characters:

ruby-1.9.2-preview3 > [67, 97, 102, 195, 169].pack('U*')  => "CafÃ©" 
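To make the failure visible without depending on how the terminal renders non-ASCII text, the sketch below inspects code points and bytes instead of printing the string. It only uses the standard Array#pack, String#codepoints and String#bytes.

```ruby
bytes = [67, 97, 102, 195, 169]

# 'U*' reads each integer as a Unicode code point and writes its UTF-8
# encoding, so 195 becomes U+00C3 and 169 becomes U+00A9:
# five characters instead of four.
wrong = bytes.pack('U*')
p wrong.length           # 5
p wrong.codepoints.to_a  # [67, 97, 102, 195, 169]

# Both of those code points are re-encoded as two bytes each,
# so the original two bytes of 'é' have become four.
p wrong.bytes.to_a       # [67, 97, 102, 195, 131, 194, 169]

# 'U*' is the right directive when the integers really are code points:
# U+00E9 is 'é', so this yields the intended four-character string.
right = [67, 97, 102, 0xE9].pack('U*')
p right.length           # 4
p right.bytes.to_a       # [67, 97, 102, 195, 169]
```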

Does anybody know a way to take a UTF-8 byte array with multibyte characters and convert it back to a string?

Thanks.

Charlie asked Dec 13 '10 20:12



1 Answer

This has to do with how pack interprets its input data. The U directive treats each integer as a Unicode code point and writes that code point's UTF-8 encoding, so 195 and 169 become the two characters Ã (U+00C3) and © (U+00A9) instead of the single é; that is the double encoding. Instead, just pack the raw bytes with C and interpret the result as UTF-8:

irb(main):010:0> [67, 97, 102, 195, 169].pack('C*').force_encoding('utf-8') => "Café" 
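The same idea can be wrapped in a small helper. The name utf8_from_bytes is hypothetical, not a standard method. Because force_encoding only relabels the bytes and never checks them, the sketch adds a valid_encoding? check so that malformed input is rejected rather than silently producing a broken string.

```ruby
# Hypothetical helper: rebuild a String from an array of UTF-8 byte values.
def utf8_from_bytes(bytes)
  # 'C*' packs each integer as one raw byte (an ASCII-8BIT string);
  # force_encoding relabels those same bytes as UTF-8 without converting them.
  str = bytes.pack('C*').force_encoding(Encoding::UTF_8)
  # force_encoding does no validation, so check the byte sequence explicitly.
  raise ArgumentError, 'not valid UTF-8' unless str.valid_encoding?
  str
end

s = utf8_from_bytes([67, 97, 102, 195, 169])
p s.length          # 4
p s == "Caf\u00E9"  # true

# A lone continuation byte (169) is not valid UTF-8.
begin
  utf8_from_bytes([67, 169])
rescue ArgumentError => e
  p e.message       # "not valid UTF-8"
end
```

Whether to raise, or to return the string and let the caller decide, is a design choice; raising keeps invalid data from spreading further.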
Jakob Borg answered Oct 16 '22 18:10
