
Convert a symbol to its 4-digit Unicode escape representation and vice versa

1) How can I convert a symbol to its 4-digit Unicode escape representation in Python 2.7, e.g. "¥" to "\u00a5"?

2) How can I convert a Unicode escape back to the symbol on a Windows 7/8 platform, e.g. "\u00a5" to "¥"?

rdp asked Jul 30 '14 03:07


2 Answers

1) Does it need to be \u-escaped, or will \x work? If \x is acceptable, try the unicode_escape codec, which writes code points below 256 as \xNN. Otherwise, you can convert using the function below:

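# -*- coding: utf-8 -*-
def four_digit_escape(string):
    # Keep printable ASCII as-is; write every other character as \uXXXX.
    return u''.join(char if 32 <= ord(char) <= 126 else u'\\u%04x' % ord(char) for char in string)

symbol = u"hello ¥"
print symbol.encode('unicode_escape')  # hello \xa5
print four_digit_escape(symbol)        # hello \u00a5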

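2) For the reverse direction, decode with the same unicode_escape codec:

encoded_symbol = '\\u00a5'
# Prints the literal escape text: \u00a5
print encoded_symbol
# Prints the symbol itself, provided the console encoding can display it
print encoded_symbol.decode('unicode_escape')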
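
For reference, the two steps above can be combined into a round trip. This is only a sketch and assumes Python 3 rather than the 2.7 the question asks about; escape_non_ascii and unescape are hypothetical helper names, not part of any library.

```python
def escape_non_ascii(text):
    # Keep printable ASCII as-is; write every other character as \u plus hex digits.
    return "".join(
        ch if 32 <= ord(ch) <= 126 else "\\u%04x" % ord(ch) for ch in text
    )

def unescape(text):
    # The unicode_escape codec decodes bytes, so go through ASCII bytes first.
    return text.encode("ascii").decode("unicode_escape")

escaped = escape_non_ascii("hello \u00a5")
print(escaped)                              # hello \u00a5
print(unescape(escaped) == "hello \u00a5")  # True
```

Note that escape_non_ascii leaves existing backslashes untouched, so text that already contains a backslash will not survive the round trip unchanged.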
Robᵩ answered Sep 25 '22 21:09


The most reliable way I found to do this in Python is to first decode the byte string into unicode, take the ord of the resulting character, and plug that into a format string. It looks like this:

"\\u%04x" % ord("¥".decode("utf-8"))

There is also the built-in unichr, which goes the other way (code point to character), but on my system its output is displayed in a different encoding than what the OP wanted. So the above solution is the most platform-independent way that I can think of.
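
The format-string approach and its inverse can be sketched as below. This assumes Python 3, where chr takes the place of Python 2's unichr; to_escape and from_escape are hypothetical names chosen for illustration.

```python
def to_escape(symbol):
    # Format the code point of a single character as \u plus four hex digits.
    return "\\u%04x" % ord(symbol)

def from_escape(escape):
    # Drop the leading "\u" and turn the hex digits back into a character.
    if not escape.startswith("\\u"):
        raise ValueError("expected a \\uXXXX escape")
    return chr(int(escape[2:], 16))

print(to_escape("\u00a5"))                 # \u00a5
print(from_escape("\\u00a5") == "\u00a5")  # True
```

The %04x format only sets a minimum width, so a character above U+FFFF would come out with more than four hex digits.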

Andrew Johnson answered Sep 23 '22 21:09