I have the following code:
a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')
print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)
And the output is:
<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432
Why, in the last case, do I see the character code instead of the character? How can I convert the byte string to a Unicode string so that the output shows the character instead of its code?
Each character in a str is one Unicode code point. A single byte can only take 256 distinct values, so Unicode encodings such as UTF-8 use more than one byte for many characters. A bytes object gives you access to those underlying bytes.
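A short Python 3 sketch of that difference: the str below holds one code point, while its encoded form needs two bytes. UTF-8 is chosen here only as an example encoding.

```python
# One Unicode code point (CYRILLIC SMALL LETTER VE) stored in a str.
s = "в"
print(len(s))          # 1

# Encoding to UTF-8 produces the underlying bytes; this letter needs two.
encoded = s.encode("utf-8")
print(encoded)         # b'\xd0\xb2'
print(len(encoded))    # 2

# Decoding with the same encoding recovers the original str.
print(encoded.decode("utf-8") == s)  # True
```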
To allow working with Unicode characters, Python 2 has a unicode type which is a collection of Unicode code points (like Python 3's str type). The line ustring = u'A unicode \u018e string \xf1' creates a Unicode string with 20 characters.
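In Python 3 the plain str type plays the role of Python 2's unicode type (and the u prefix is still accepted), so the count of 20 can be checked directly. This is a Python 3 sketch, not Python 2 code.

```python
# Python 3: str is a sequence of code points, like Python 2's unicode type.
ustring = u'A unicode \u018e string \xf1'

# \u018e and \xf1 each produce exactly one code point.
print(len(ustring))                                  # 20
print(hex(ord(ustring[10])), hex(ord(ustring[-1])))  # 0x18e 0xf1
```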
In string literals (or Unicode literals in Python 2), \u has a special meaning, namely: "here comes a Unicode character specified by its code point". Hence u"\u0432" results in the character в.
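A quick check of that claim: in an ordinary str literal, \u followed by four hex digits is one code point, so all three spellings below give the same one-character string.

```python
a = "\u0432"        # escape sequence interpreted by the str literal
b = u"\u0432"       # the u prefix is accepted but redundant in Python 3
direct = "в"        # the same character typed directly

print(a == b == direct)   # True
print(len(a))             # 1
print(hex(ord(a)))        # 0x432
```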
The b'' prefix tells you this is a sequence of 8-bit bytes. A bytes object contains no Unicode characters, so the \u escape has no special meaning there. Hence, b"\u0432" is just the sequence of six bytes \, u, 0, 4, 3 and 2.
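The bytes case can be inspected the same way; this sketch lists the byte values and repeats the question's UTF-8 decode. It spells the literal as b"\\u0432", which is the same value as b"\u0432" but avoids the invalid-escape warning that recent Python versions emit for \u inside a bytes literal.

```python
c = b"\\u0432"              # same bytes as b"\u0432": backslash, u, 0, 4, 3, 2
print(len(c))               # 6
print(list(c))              # [92, 117, 48, 52, 51, 50]

d = c.decode("utf8")        # valid UTF-8, but still six separate characters
print(len(d))               # 6
print(d == "\\u0432")       # True: the backslash survives as a literal character
```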
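Essentially, you have an 8-bit string that contains not a Unicode character but the specification of one. Decoding it as UTF-8, as the question's d = c.decode('utf8') does, only turns those six ASCII bytes into six ASCII characters; it does not interpret the escape. To convert the specification into the actual character, decode with the unicode_escape codec: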
>>> c.decode('unicode_escape')
'в'
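One caveat worth knowing: unicode_escape reads every byte that is not part of an escape as Latin-1, so it garbles bytes that already contain real UTF-8 text. The sketch below shows that pitfall and a hypothetical helper, decode_escapes (not a standard library function), that assumes the input is UTF-8 text which may also contain literal Python-style escapes. It decodes the UTF-8 first, then uses the backslashreplace error handler so that unicode_escape can rebuild every character.

```python
c = b"\\u0432"   # same bytes as b"\u0432" in the question

# Works when the bytes are pure ASCII escape sequences.
print(c.decode("unicode_escape") == "\u0432")   # True

# Pitfall: non-escape bytes are read as Latin-1, so real UTF-8 is garbled.
utf8_bytes = "в".encode("utf-8")                 # b'\xd0\xb2'
print(utf8_bytes.decode("unicode_escape") == "\xd0\xb2")  # True: two Latin-1 chars, not 'в'


def decode_escapes(raw: bytes) -> str:
    """Hypothetical helper: decode UTF-8 bytes that may also hold literal escapes."""
    text = raw.decode("utf-8")
    # Latin-1 maps code points 0-255 to single bytes; any other character
    # is written as a backslash escape, which unicode_escape then expands.
    escaped = text.encode("latin-1", "backslashreplace")
    return escaped.decode("unicode_escape")


mixed = "в and \\u0432".encode("utf-8")
print(decode_escapes(mixed) == "в and в")       # True
```

If your bytes are known to contain only ASCII escape sequences, the one-line c.decode('unicode_escape') is enough; the helper only matters when real non-ASCII text and escapes are mixed.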