
Convert an escaped Unicode string to its characters in Ruby 1.8

I have to read some text files with the following content:

\u201CThe Pedlar Lady of Gushing Cross\u201D

In a Ruby 1.9 console, a string literal with such escapes is converted as expected:

ruby-1.9.1-p378 > "\u2714 \u2714 my great string \u2714 \u2714"
 => "✔ ✔ my great string ✔ ✔" 

In Ruby 1.8, the Unicode escapes are not converted to their characters:

ree-1.8.7-2010.01 > "\u2714 \u2714 my great string \u2714 \u2714"
 => "u2714 u2714 my great string u2714 u2714" 

Is there an easy way to get the correct characters in Ruby 1.8?

Daniel Cukier asked Oct 29 '10 19:10



3 Answers

For anyone else who stumbles on this question (like me) looking for an answer, the Ruby 1.8 equivalent for a single code point would be:

["2714".to_i(16)].pack("U*")
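
As a small sketch of the same idea, pack("U*") accepts any number of code points, so several hex codes can be converted in one call. The helper name codepoints_to_string is made up for this illustration and is not part of the answer above.

```ruby
# Hypothetical helper (illustration only): turn an array of hex code point
# strings into a single UTF-8 encoded string.
def codepoints_to_string(hex_codes)
  # String#hex parses hexadecimal text, just like to_i(16).
  hex_codes.map { |code| code.hex }.pack("U*")
end

puts codepoints_to_string(["2714"])          # one code point
puts codepoints_to_string(["201C", "201D"])  # two code points in one call
```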
Dave answered Sep 22 '22 11:09



The simplest approach might be to use a JSON parser, as JSON happens to use this very escape format (the json library has to be loaded first with require 'json'):

irb(main):014:0> JSON '["\u2714 \u2714 my great string \u2714 \u2714"]'
=> ["\342\234\224 \342\234\224 my great string \342\234\224 \342\234\224"]
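
For text read from a file, the same trick could look roughly like the sketch below. The helper name unescape_via_json is hypothetical, and the approach assumes the text contains no raw newlines or other control characters, no backslash sequences that JSON would reject, and no double quotes that are already backslash-escaped.

```ruby
require 'rubygems' # only matters on Ruby 1.8, where json is installed as a gem
require 'json'

# Hypothetical helper (illustration only): decode \uXXXX escapes by parsing
# the text as a JSON string.
def unescape_via_json(text)
  # Escape bare double quotes so they do not end the JSON string early.
  quoted = '"' + text.gsub('"') { '\\"' } + '"'
  # Wrap the string in an array, as in the answer above, because older
  # parsers accept only an array or object at the top level.
  JSON.parse('[' + quoted + ']').first
end

# Single quotes keep the backslashes literal, as in text read from a file.
puts unescape_via_json('\u201CThe Pedlar Lady of Gushing Cross\u201D')
```

In Ruby 1.8's irb the result is displayed as octal byte escapes such as \342\234\224, which are the UTF-8 bytes of the decoded character.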
Martin v. Löwis answered Sep 22 '22 11:09



This builds on @Dave's answer. I'm using the following to replace all Unicode escape sequences in a given string with the corresponding character:

string_value.gsub(/\\u([0-9a-fA-F]{4})/) {|m| [$1.hex].pack("U")}

The regular expression looks for "\u" followed by four hexadecimal digits. The block discards the "\u", converts the four hex digits to an integer, and uses pack to produce the corresponding Unicode character. gsub replaces each escape sequence this way and returns the resulting string.
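
To make those steps concrete, here is the same substitution spelled out one step at a time and applied to the sample text from the question. The method name decode_unicode_escapes is only an illustrative choice.

```ruby
# Hypothetical helper (illustration only): the one-liner above, with each
# step written out separately.
def decode_unicode_escapes(text)
  text.gsub(/\\u([0-9a-fA-F]{4})/) do
    hex_digits = $1              # the four digits captured after "\u"
    code_point = hex_digits.hex  # e.g. "201C" becomes 8220
    [code_point].pack("U")       # encode that code point as UTF-8
  end
end

# Single quotes keep the backslashes literal, as in text read from a file.
puts decode_unicode_escapes('\u201CThe Pedlar Lady of Gushing Cross\u201D')
```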

It will misbehave if the string is escaped further (e.g. if "\" is itself escaped as "\\"), but in the plain case it should work fine.
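
One possible way around the doubly escaped case, sketched under the assumption that "\\" always stands for a literal backslash, is to let the regular expression consume escaped backslashes first so that the "u" after them is never treated as the start of an escape. The method name decode_unescaped_only is hypothetical.

```ruby
# Hypothetical variant (illustration only): convert \uXXXX escapes, but leave
# alone any sequence whose backslash is itself escaped.
def decode_unescaped_only(text)
  # The first alternative matches an escaped backslash (two backslashes) and
  # captures nothing; the second matches a real \uXXXX escape.
  text.gsub(/\\\\|\\u([0-9a-fA-F]{4})/) do |match|
    # $1 is nil when the escaped-backslash alternative matched.
    $1 ? [$1.hex].pack("U") : match
  end
end

# The first escape is converted; the second, preceded by an escaped
# backslash, is left untouched.
puts decode_unescaped_only('\u2714 stays \\\\u2714')
```

This sketch leaves the doubled backslash in place; collapsing it to a single backslash would be a separate step.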

Pieter Müller answered Sep 18 '22 11:09
