This should be simple, but I can't crack it.
I have a string of Arabic symbols in the ranges u'\u0600' - u'\u06FF' and u'\uFB50' - u'\uFEFF'. For example: غينيا واستمر العصبة ضرب قد.
How do I print each character's unicode number? I'm using Python 2.7.
Something like the following gives me "decoding Unicode is not supported":
for c in example_string:
    print unicode(c,'utf-8')
You can use the ord() function.
for c in example_string:
    print(ord(c), hex(ord(c)), c.encode('utf-8'))
This gives you the decimal code point, the hex code point, and the UTF-8 encoding of each character, like so:
(1594, '0x63a', '\xd8\xba')
(1610, '0x64a', '\xd9\x8a')
(1606, '0x646', '\xd9\x86')
(1610, '0x64a', '\xd9\x8a')
(1575, '0x627', '\xd8\xa7')
(32, '0x20', ' ')
...
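The error in the question most likely arises because example_string is already a unicode object, so unicode(c, 'utf-8') tries to decode something that is already decoded. A minimal sketch of a guard for both cases follows; code_points is a hypothetical helper name, not part of Python, and the sketch assumes any byte string input is UTF-8 encoded. It is written to run unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def code_points(value, encoding='utf-8'):
    """Return a list of (decimal, hex) code point pairs, one per character.

    Hypothetical helper for illustration only.
    """
    # Byte strings are decoded first, so that iteration yields whole
    # characters rather than the individual bytes of the encoding.
    # (On Python 2.7, `bytes` is an alias for `str`.)
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return [(ord(c), hex(ord(c))) for c in value]


example_string = u'\u063a\u064a\u0646\u064a\u0627'
for decimal, hexadecimal in code_points(example_string):
    print('%d %s' % (decimal, hexadecimal))
```

Passing example_string.encode('utf-8') instead should give the same pairs, because the bytes are decoded before ord() is applied.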
In a comment you said "'\u06FF' is what I'm trying to print". This can also be done with Python's repr function, although you seem pretty happy with hex(ord(c)). It may still be useful for someone looking for a way to get an ASCII representation of a Unicode character.
example_string = u'\u063a\u064a\u0646\u064a'
for c in example_string:
    print repr(c), c
gives output
u'\u063a' غ
u'\u064a' ي
u'\u0646' ن
u'\u064a' ي
If you want to strip out the Python unicode literal part, you can quite simply do
for c in example_string:
    print repr(c)[2:-1], c
to get the output
\u063a غ
\u064a ي
\u0646 ن
\u064a ي
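Slicing the repr output depends on how Python 2 renders unicode literals. As an alternative sketch, the same escape text can be built directly from ord(c) with string formatting. The name escape is a hypothetical helper, and the four-digit padding assumes characters in the Basic Multilingual Plane, which covers both ranges mentioned in the question.

```python
# -*- coding: utf-8 -*-

def escape(c):
    """Format one character as a \\uXXXX escape (hypothetical helper)."""
    # %04x pads the hex code point to at least four lowercase digits;
    # this assumes a single character no higher than U+FFFF.
    return '\\u%04x' % ord(c)


example_string = u'\u063a\u064a\u0646\u064a'
for c in example_string:
    print(escape(c))
```

This does not rely on the u'...' prefix being present in the repr, so it behaves the same on Python 2.7 and Python 3.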