
Ruby: unescape unicode string

Unicode string:

string = "CEO Frye \u2013 response to Capitalism discussion in Davos: Vote aggressively with your wallet against firms without social conscience."

I tried this (via "Is this the best way to unescape unicode escape sequences in Ruby?"):

def unescape_unicode(s)
   s.gsub(/\\u([\da-fA-F]{4})/) {|m| [$1].pack("H*").unpack("n*").pack("U*")}
end

unescape_unicode(string) #=> CEO Frye \u2013 response to Capitalism discussion in Davos: Vote aggressively with your wallet against firms without social conscience. 

But output (to file) is still identical to input! Any help would be appreciated.

Edit: Not using IRB, using RubyMine, and input is parsed from Twitter, hence the single "\u" not "\\u"

Edit 2: RubyMine IDE output

Mr. Demetrius Michael asked Oct 09 '22 17:10


1 Answer

Are you trying it from irb, or outputting the string with p?

String#inspect (which irb and p str both use) can render Unicode characters in \uxxxx form so that the string can be printed anywhere. Also, when you type "CEO Frye \u2013 response to..." in source code, the \u2013 is an escape sequence that the Ruby parser resolves. The final string already contains the actual Unicode character, not a backslash followed by u2013, so unescape_unicode finds nothing to replace.

str1 = "a\u2013b"
str1.size #=> 3
str2 = "a\\u2013b"
str2.size #=> 8
unescape_unicode(str2) == str1 #=> true
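
To make the distinction concrete, here is a minimal sketch that contrasts a double-quoted literal (escape already resolved by the parser) with a single-quoted one (backslash kept as text). The helper name unescape_unicode_hex is hypothetical and not from the question; it is one possible variant of unescape_unicode that converts the hex digits with String#hex, and it assumes UTF-8 source encoding and four-digit escapes only (surrogate pairs are not handled).

```ruby
# Hypothetical variant of unescape_unicode: turn each literal \uXXXX
# sequence into the corresponding UTF-8 character.
def unescape_unicode_hex(s)
  s.gsub(/\\u([\da-fA-F]{4})/) { [$1.hex].pack("U") }
end

parsed  = "a\u2013b"  # double quotes: the parser already produced the en dash
literal = 'a\u2013b'  # single quotes: backslash, "u", and four hex digits stay as text

puts parsed.size                               # 3
puts literal.size                              # 8
puts unescape_unicode_hex(literal) == parsed   # true
puts unescape_unicode_hex(parsed) == parsed    # true: nothing to unescape
```

If writing the string with puts or File.write already shows the dash, the string was fine all along and only the inspect-style display was escaped.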
Guilherme Bernal answered Oct 12 '22 11:10