
Printing unicode number of chars in a string (Python)

This should be simple, but I can't crack it.

I have a string of Arabic characters in the ranges u'\u0600' - u'\u06FF' and u'\uFB50' - u'\uFEFF'. For example: غينيا واستمر العصبة ضرب قد.

How do I print each character's unicode number? I'm using Python 2.7.


Something like the following gives me "TypeError: decoding Unicode is not supported":

for c in example_string:
    print unicode(c,'utf-8')
Hassan Baig asked Jan 29 '23 18:01



2 Answers

You can use the ord() function.

for c in example_string:
    print(ord(c), hex(ord(c)), c.encode('utf-8'))

This gives you the decimal and hex code point, as well as the UTF-8 encoding, of each character, like so:

(1594, '0x63a', '\xd8\xba')
(1610, '0x64a', '\xd9\x8a')
(1606, '0x646', '\xd9\x86')
(1610, '0x64a', '\xd9\x8a')
(1575, '0x627', '\xd8\xa7')
(32, '0x20', ' ')
  :
  :
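Note that the loop above assumes example_string is a unicode object. If it is a UTF-8 byte string instead, iterating over it yields single bytes rather than characters, so it has to be decoded first. A minimal sketch of that, written to run on both Python 2.7 and Python 3 (code_points is a hypothetical helper name, and UTF-8 is an assumed encoding):

```python
from __future__ import print_function


def code_points(text, encoding='utf-8'):
    """Return (decimal, hex) code point pairs for each character in text.

    code_points is a hypothetical helper, not a standard library function.
    """
    # Decode byte strings first; otherwise iteration would walk over
    # individual bytes instead of characters. The encoding is an assumption.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    return [(ord(c), hex(ord(c))) for c in text]


# UTF-8 bytes of the first three letters from the output above
raw = b'\xd8\xba\xd9\x8a\xd9\x86'
for dec, hx in code_points(raw):
    print(dec, hx)
```

If the string is already unicode, the helper leaves it untouched and just maps ord() over it.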
Arminius answered Feb 01 '23 06:02



In a comment you said '\u06FF is what I'm trying to print'. This can also be done with Python's repr function, although you seem pretty happy with hex(ord(c)). It may still be useful for someone looking for a way to get an ASCII representation of a Unicode character.

example_string = u'\u063a\u064a\u0646\u064a'

for c in example_string:
    print repr(c), c

gives output

u'\u063a' غ
u'\u064a' ي
u'\u0646' ن
u'\u064a' ي

If you want to strip out the Python unicode literal part (the leading u' and the trailing '), you can simply slice the repr:

for c in example_string:
    print repr(c)[2:-1], c

to get the output

\u063a غ
\u064a ي
\u0646 ن
\u064a ي
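Slicing the repr depends on the u'...' literal syntax, which Python 3 no longer prints. As an alternative sketch, the same escape can be built directly from ord(). The name u_escape is hypothetical, and the four-digit form assumes characters in the Basic Multilingual Plane, which covers the Arabic ranges in the question:

```python
from __future__ import print_function


def u_escape(char):
    """Return the \\uXXXX escape for a single BMP character.

    u_escape is a hypothetical helper, not a standard library function.
    """
    # Zero-pad the hex code point to four digits, matching repr's style.
    return '\\u{:04x}'.format(ord(char))


example_string = u'\u063a\u064a\u0646\u064a'

for c in example_string:
    print(u_escape(c))
```

This prints the same \u063a, \u064a, \u0646, \u064a sequence as above, without relying on how repr formats the string.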
Izaak van Dongen answered Feb 01 '23 07:02
