
How do I escape a Unicode string with Ruby?

Tags: ruby, unicode

I need to encode/convert a Unicode string to its escaped form, with backslashes. Anybody know how?

Dogweather asked Apr 06 '11 02:04



2 Answers

In Ruby 1.8.x, String#inspect may be what you are looking for, e.g.

>> multi_byte_str = "hello\330\271!"
=> "hello\330\271!"

>> multi_byte_str.inspect
=> "\"hello\\330\\271!\""

>> puts multi_byte_str.inspect
"hello\330\271!"
=> nil

In Ruby 1.9, if you want multi-byte characters to have their component bytes escaped, you could write something like:

>> multi_byte_str.bytes.to_a.map(&:chr).join.inspect
=> "\"hello\\xD8\\xB9!\""

In both Ruby 1.8 and 1.9, if you instead want the escaped Unicode code points, you could do this (note that it escapes printable ASCII characters too):

>> multi_byte_str.unpack('U*').map{ |i| "\\u" + i.to_s(16).rjust(4, '0') }.join
=> "\\u0068\\u0065\\u006c\\u006c\\u006f\\u0639\\u0021"
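
If escaping the printable ASCII characters is not wanted, a variation along these lines may help. This is only a sketch for Ruby 1.9 or later: the helper name `escape_non_ascii` is made up for illustration, and it assumes the input string is UTF-8 encoded. Code points above U+FFFF do not fit in four hex digits, so they are written in the braced `\u{...}` form.

```ruby
# Hypothetical helper (not part of Ruby or of the answer above):
# escape only non-ASCII characters and leave plain ASCII as-is.
# Assumes str is a valid UTF-8 string.
def escape_non_ascii(str)
  str.each_char.map do |ch|
    cp = ch.ord                    # Unicode code point of the character
    if cp < 0x80
      ch                           # ASCII: keep unchanged
    elsif cp <= 0xFFFF
      format('\\u%04x', cp)        # four-digit form, e.g. \u0639
    else
      format('\\u{%x}', cp)        # braced form for code points above U+FFFF
    end
  end.join
end

puts escape_non_ascii("hello\u0639!")   # prints hello\u0639!
```

Note that the result is a plain string containing a literal backslash, so calling `inspect` on it will show the backslash doubled.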
Jon Jensen answered Oct 10 '22 06:10


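To write a Unicode character in a Ruby string literal, use the "\uXXXX" escape in a double-quoted string, where XXXX is the character's Unicode code point as four hexadecimal digits (not a UTF-16 value). Code points above U+FFFF need the braced form "\u{XXXXX}" (Ruby 1.9 and later). See http://leejava.wordpress.com/2009/03/11/unicode-escape-in-ruby/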
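
A short illustration of the `\u` escape in string literals, assuming Ruby 1.9 or later. The variable names are only for the example. The escape is processed in double-quoted strings only; in single quotes the backslash stays literal.

```ruby
# Double-quoted literals: the \u escape is turned into the character.
arabic = "\u0639"       # U+0639 ARABIC LETTER AIN
emoji  = "\u{1F600}"    # braced form, needed above U+FFFF

# The UTF-8 bytes of U+0639 are the same \330\271 (0xD8 0xB9) seen earlier.
puts arabic.bytes.map { |b| format('%02X', b) }.join(' ')   # prints D8 B9
puts emoji.ord.to_s(16)                                     # prints 1f600

# Single-quoted literal: no escape processing, six separate characters.
raw = '\u0639'
puts raw.length                                             # prints 6
```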

Richard Schneider answered Oct 10 '22 04:10