Getting bytes from a Unicode string in Python

I have a 16-bit big-endian Unicode string represented as u'\u4132'.

How can I split it into its two bytes, the integers 0x41 and 0x32, in Python?

altunyurt asked Nov 21 '10 19:11


2 Answers

Here are several ways to get the bytes, depending on the form you want them in.

Python 2:

>>> chars = u'\u4132'.encode('utf-16be')
>>> chars
'A2'
>>> ord(chars[0])
65
>>> '%x' % ord(chars[0])
'41'
>>> hex(ord(chars[0]))
'0x41'
>>> ['%x' % ord(c) for c in chars]
['41', '32']
>>> [hex(ord(c)) for c in chars]
['0x41', '0x32']

Python 3:

>>> chars = '\u4132'.encode('utf-16be')
>>> chars
b'A2'
>>> chars = bytes('\u4132', 'utf-16be')
>>> chars  # Just the same.
b'A2'
>>> chars[0]
65
>>> '%x' % chars[0]
'41'
>>> hex(chars[0])
'0x41'
>>> ['%x' % c for c in chars]
['41', '32']
>>> [hex(c) for c in chars]
['0x41', '0x32']
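
If you want the two integers themselves rather than hex strings, you can also skip the encoding step and split the code point arithmetically. This is only a minimal Python 3 sketch: the helper name `split_code_unit` is made up for illustration, and it assumes a single character inside the Basic Multilingual Plane (U+0000 to U+FFFF).

```python
def split_code_unit(ch):
    """Return (high_byte, low_byte) for one BMP character (hypothetical helper)."""
    code = ord(ch)  # raises TypeError if ch is not exactly one character
    if code > 0xFFFF:
        # Such characters need two UTF-16 code units, so two bytes are not enough.
        raise ValueError("character is outside the BMP")
    # Dividing by 0x100 yields the high byte as quotient and the low byte as remainder.
    return divmod(code, 0x100)


high, low = split_code_unit('\u4132')
print(high, low)                            # 65 50
print(hex(high), hex(low))                  # 0x41 0x32

# Encoding as UTF-16BE and listing the bytes gives the same integers.
print(list('\u4132'.encode('utf-16be')))    # [65, 50]
```

Note that 65 and 50 are the decimal values of 0x41 and 0x32; format them with `'%x'` or `hex()` as above if you need the hex digits.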

Chris Morgan answered Oct 20 '22 00:10


  • Java: "\u4132".getBytes("UTF-16BE")
  • Python 2: u'\u4132'.encode('utf-16be')
  • Python 3: '\u4132'.encode('utf-16be')

Each of these returns the encoded bytes (a byte[] in Java, a str in Python 2, a bytes object in Python 3), which you can easily convert to a list of integers. Note that code points above U+FFFF are encoded as two 16-bit code units (a surrogate pair), so in UTF-16BE they take 32 bits, i.e. 4 bytes.
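
To illustrate the conversion and the surrogate-pair caveat, here is a small Python 3 sketch. The helper name `utf16be_ints` is my own invention, and U+1F600 is just an arbitrary example of a code point above U+FFFF.

```python
def utf16be_ints(text):
    """Encode text as UTF-16BE and return the bytes as a list of ints (hypothetical helper)."""
    # Iterating over a bytes object in Python 3 yields ints, so list() is enough.
    return list(text.encode('utf-16be'))


bmp = utf16be_ints('\u4132')          # one 16-bit code unit -> 2 bytes
astral = utf16be_ints('\U0001F600')   # surrogate pair D83D DE00 -> 4 bytes

print(bmp)      # [65, 50]
print(astral)   # [216, 61, 222, 0]
```

So code that assumes exactly two bytes per character will silently mis-split any text containing characters outside the BMP.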

Roland Illig answered Oct 20 '22 00:10