Ruby: Convert encoded character to actual UTF-8 character

Ruby will not play nice with UTF-8 strings. I am passing data in an XML file, and although the XML document is specified as UTF-8, Ruby treats each byte of a multi-byte character (two bytes per character in my data) as an individual character.

I have started encoding the input strings in the '\uXXXX' format, but I cannot figure out how to convert this to an actual UTF-8 character. I have been searching all over this site and Google to no avail, and my frustration is pretty high right now. I am using Ruby 1.8.6.

Basically, I want to convert the string '\u03a3' -> "Σ".

What I had is:

data.gsub /\\u([a-zA-Z0-9]{4})/,  $1.hex.to_i.chr

Which of course gives a "931 out of char range" error.

Thank you Tim

Tim Reynolds asked Dec 23 '22 07:12


1 Answer

Try this:

[0x50].pack("U")

where 0x50 is the hexadecimal Unicode code point of the character; pack("U") encodes that code point as UTF-8.
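
To apply this to the '\uXXXX' strings from the question, one option is to use the block form of gsub, so that $1 is read for each match after the regex has run (in the non-block form, $1 is evaluated before gsub is even called). The following is only a minimal sketch: the helper name unescape_unicode is hypothetical, and it assumes every escape has exactly four hex digits.

```ruby
# Hypothetical helper (name not from the original post): replaces each
# literal \uXXXX escape in the text with the UTF-8 character it names.
def unescape_unicode(text)
  # Block form: $1 holds the four hex digits of the current match.
  # pack("U") encodes the integer code point as UTF-8 bytes.
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }
end

puts unescape_unicode('\u03a3')  # prints Σ
```

The character class is narrowed to hex digits here, since [a-zA-Z0-9] would also match sequences such as '\uzzzz' that are not valid escapes.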

webtu answered Jan 08 '23 20:01