I have to read some text files with the following content:
\u201CThe Pedlar Lady of Gushing Cross\u201D
In a Ruby 1.9 console, when I create a string with this kind of content, the escapes are converted to characters:
ruby-1.9.1-p378 > "\u2714 \u2714 my great string \u2714 \u2714"
=> "✔ ✔ my great string ✔ ✔"
In Ruby 1.8, the Unicode escapes are not converted to their characters:
ree-1.8.7-2010.01 > "\u2714 \u2714 my great string \u2714 \u2714"
=> "u2714 u2714 my great string u2714 u2714"
Is there an easy way to get the correct characters in Ruby 1.8?
For anyone else who stumbles on this question (like me) looking for an answer, the Ruby 1.8 equivalent for a single code point would be:
["2714".to_i(16)].pack("U*")
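To make that one-liner a bit easier to reuse, here is a minimal sketch that wraps it in a method. The name codepoint_to_utf8 is made up for illustration; the only real calls are String#to_i and Array#pack.

```ruby
# Hypothetical helper (the name is an assumption, not part of any library).
# Takes a code point written as a hex string, e.g. "2714", and returns the
# corresponding character encoded as UTF-8.
def codepoint_to_utf8(hex)
  # to_i(16) parses the hex digits; pack("U*") encodes the integer as UTF-8.
  [hex.to_i(16)].pack("U*")
end

check = codepoint_to_utf8("2714")
puts check

# The raw bytes, which is how Ruby 1.8 displays the string in irb.
p check.unpack("C*")
```

This only handles one code point at a time, so you still need something that finds the escapes in the text first.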
The simplest approach might be to use a JSON parser (after require 'json'), as JSON happens to use this very escape format:
irb(main):014:0> JSON '["\u2714 \u2714 my great string \u2714 \u2714"]'
=> ["\342\234\224 \342\234\224 my great string \342\234\224 \342\234\224"]
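For text read from a file, the same idea could be sketched like this. The helper name unescape_via_json is hypothetical, and it assumes the text contains no unescaped double quotes, raw newlines, or anything else that is invalid inside a JSON string; if it does, JSON.parse will raise.

```ruby
require 'json'

# Hypothetical helper (name is an assumption). Wraps the text in a one-element
# JSON array so the parser decodes the \uXXXX escapes, then takes the element.
# Assumes the text is otherwise valid as the body of a JSON string.
def unescape_via_json(text)
  JSON.parse('["' + text + '"]').first
end

# Single quotes keep the backslashes literal, like text read from a file.
raw = '\u201CThe Pedlar Lady of Gushing Cross\u201D'
puts unescape_via_json(raw)
```

The array wrapper is there because older versions of the json library only accept an object or array at the top level.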
This builds on @Dave's answer. I'm using the following to replace all Unicode escape sequences in a given string with the corresponding character:
string_value.gsub(/\\u([0-9a-fA-F]{4})/) {|m| [$1.hex].pack("U")}
It's a regular expression that looks for a literal "\u" followed by 4 hexadecimal digits. It then throws away the "\u", converts the 4 hex digits to an integer and uses pack to get the Unicode character (as UTF-8). gsub replaces each escape sequence with the corresponding character and returns the resulting string.
It will give you trouble if your string is escaped further (e.g. by having "\" escaped as "\\"). But in the vanilla case it should work fine.
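Here is the same gsub wrapped in a method, as a rough sketch, together with an input that shows the caveat about extra escaping. The method name unescape_unicode is just something I picked for the example.

```ruby
# Hypothetical helper (name is an assumption). Replaces every \uXXXX escape
# in the text with the character it stands for, encoded as UTF-8.
def unescape_unicode(text)
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }
end

# Single quotes keep the backslashes literal, like text read from a file.
raw = '\u2714 \u2714 my great string \u2714 \u2714'
puts unescape_unicode(raw)

# The caveat: here the backslash itself is escaped (two backslashes, then
# u2714), so the text probably means a literal "\u2714". The regex still
# matches the second backslash plus the digits and converts them anyway.
double_escaped = '\\\\u2714'
puts unescape_unicode(double_escaped)
```

If your input can contain escaped backslashes like that, a real parser (such as the JSON one) is probably the safer choice.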