
How to convert list of bytes (unicode) to Python string?

I have a list of bytes (8-bit values; in C/C++ terms they make up a wchar_t string) that together form a Unicode string, byte by byte. How can I convert those values into a Python string? I tried a few things, but none of them could join each pair of bytes into one character and build the whole string from them. Thank you.

Bartosz Wójcik asked May 11 '14 21:05




2 Answers

Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.

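If you actually have a list of bytes, first join it into a single object: ''.join(bytelist) in Python 2 or b''.join(bytelist) in Python 3. (In Python 3, if the list holds integers rather than bytes objects, use bytes(bytelist) instead.)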

You need to specify the encoding that was used to encode the original Unicode string.
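Since the question mentions wchar_t and two bytes per character, the data may well be UTF-16 (wchar_t is 2 bytes on Windows and usually 4 bytes elsewhere), but that is an assumption about the asker's data, not something the question confirms. A minimal Python 3 sketch, with a hypothetical helper name bytes_list_to_str and made-up sample bytes:

```python
def bytes_list_to_str(byte_list, encoding):
    """Join a list of byte values and decode the result to a str (Python 3).

    Hypothetical helper for illustration; accepts either a list of ints
    (0..255) or a list of bytes objects.
    """
    if byte_list and isinstance(byte_list[0], int):
        raw = bytes(byte_list)       # list of ints -> one bytes object
    else:
        raw = b"".join(byte_list)    # list of bytes objects -> one bytes object
    return raw.decode(encoding)


# Assumed sample: "Hi" followed by U+0442, as 2-byte little-endian code units
wide = [0x48, 0x00, 0x69, 0x00, 0x42, 0x04]
text = bytes_list_to_str(wide, "utf-16-le")
print(text)
```

If the bytes came from a 4-byte wchar_t, "utf-32-le" would be the encoding to try instead; decoding with the wrong encoding either raises an error or produces garbage.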

However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.

Demo for Python 2:

In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']

In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'

In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест

In [5]: ''.join(bytelist) == 'тест'
Out[5]: True
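
For comparison, a rough Python 3 equivalent of the demo above might look like the following sketch. It assumes the list holds bytes objects (note the b prefix) rather than Python 2 str objects:

```python
# Python 3: the same UTF-8 bytes as in the Python 2 demo, as bytes objects
bytelist = [b'\xd1', b'\x82', b'\xd0', b'\xb5', b'\xd1', b'\x81', b'\xd1', b'\x82']

joined = b''.join(bytelist)        # a single bytes object, not a str
decoded = joined.decode('utf-8')   # a str (Unicode text)

print(decoded)

# In Python 3 bytes and str never compare equal, so encode the text first
print(joined == '\u0442\u0435\u0441\u0442'.encode('utf-8'))
```

Unlike in Python 2, the joined value is not yet a string here; the decode() call is what produces the str.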
Lev Levitsky answered Oct 30 '22 20:10


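You can also convert the list of bytes into a list of strings by calling decode() on each element. This only works when every element is a complete encoded character (for example, ASCII); a lone byte from a multi-byte sequence raises UnicodeDecodeError: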

stringlist=[x.decode('utf-8') for x in bytelist]
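
A short Python 3 sketch of where per-element decoding succeeds and where it does not; the sample lists are made up for illustration:

```python
# Each element is a complete ASCII character, so per-element decoding works
ascii_bytes = [b'a', b'b', b'c']
ascii_chars = [x.decode('utf-8') for x in ascii_bytes]
print(ascii_chars)

# Two bytes that together encode one character (U+0442) in UTF-8
multi = [b'\xd1', b'\x82']
try:
    chars = [x.decode('utf-8') for x in multi]
except UnicodeDecodeError as exc:
    chars = None
    print('per-element decode failed:', exc.reason)

# Joining first and decoding once handles the multi-byte character
whole = b''.join(multi).decode('utf-8')
print(whole)
```

For data like the question's, where two bytes make up one character, joining before decoding is the safer order.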
Umer answered Oct 30 '22 21:10