In Python 3, how do I interpolate a byte string into a regular string and get the same behavior as Python 2 (i.e., get just the escape codes without the b prefix or double backslashes)?
e.g.:
Python 2.7:
>>> x = u'\u041c\u0438\u0440'.encode('utf-8')
>>> str(x)
'\xd0\x9c\xd0\xb8\xd1\x80'
>>> 'x = %s' % x
'x = \xd0\x9c\xd0\xb8\xd1\x80'
Python 3.3:
>>> x = u'\u041c\u0438\u0440'.encode('utf-8')
>>> str(x)
"b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"
>>> 'x = %s' % x
"x = b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"
Note how with Python 3, I get the b prefix in my output and double backslashes. The result that I would like to get is the result that I get in Python 2.
In Python, a byte string literal is written as a b prefix followed by the byte string's ASCII representation in quotes. A byte string can be decoded back into a character string if you know the encoding that was used to encode it.
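A minimal sketch of that round trip, assuming the bytes were produced with UTF-8 (the variable names here are illustrative, not from the original post):

```python
# Encode a character string to bytes, then decode it back.
text = '\u041c\u0438\u0440'          # three Cyrillic characters
data = text.encode('utf-8')          # str -> bytes
print(repr(data))                    # b'\xd0\x9c\xd0\xb8\xd1\x80'

decoded = data.decode('utf-8')       # bytes -> str, using the same encoding
print(decoded == text)               # True
```

Decoding with a different encoding than the one used to encode may raise UnicodeDecodeError or silently produce the wrong characters.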
String literals in Python are surrounded by either single or double quotation marks: 'hello' is the same as "hello".
In Java, one method is to create a String variable and append the byte value to it with the + operator, which converts the byte value to a string as it is appended. The simplest way to do so in Java is the valueOf() method of the String class. This does not carry over to Python 3, where adding a bytes object to a str with + raises a TypeError.
A string is a sequence of characters; these are an abstract concept, and can't be directly stored on disk. A byte string is a sequence of bytes - things that can be stored on disk.
In Python 2 you have the types str and unicode. str represents a simple byte string while unicode is a Unicode string.
For Python 3, this changed: now str is what was unicode in Python 2, and bytes is what was str in Python 2.
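As a rough illustration of the Python 3 side of that mapping (a sketch only; the names text, raw and mixed_error are made up for this example):

```python
# In Python 3, plain literals are str (Unicode) and b'...' literals are bytes.
text = 'abc'
raw = b'abc'
print(type(text).__name__, type(raw).__name__)   # str bytes

# The two types no longer mix implicitly as they did in Python 2.
try:
    text + raw
except TypeError as exc:
    mixed_error = exc            # concatenating str and bytes is rejected
else:
    mixed_error = None

print(text == raw)               # False: a str never equals a bytes object
```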
So when you do ("x = %s" % '\u041c\u0438\u0440').encode("utf-8") you can actually omit the u prefix, as it is implicit. In Python 3, every string literal that is not explicitly marked as bytes is Unicode.
This will yield your last line in Python 3:
("x = %s" % '\u041c\u0438\u0440').encode("utf-8")
Note how I encode after building the final result, which is what you should always do: take an incoming object, decode it to Unicode (however you do that), and then, when producing output, encode it in the encoding of your choice. Don't try to handle raw byte strings. That is just ugly and deprecated behaviour.
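Applied to the bytes value from the question, that decode-then-encode pattern might look like the sketch below. The helper name format_value and the choice of UTF-8 are assumptions for illustration, not part of the original answer.

```python
def format_value(raw, encoding='utf-8'):
    """Decode incoming bytes, interpolate as text, encode on the way out."""
    text = raw.decode(encoding)        # bytes -> str at the input boundary
    message = 'x = %s' % text          # all formatting happens on str
    return message.encode(encoding)    # str -> bytes at the output boundary

x = '\u041c\u0438\u0440'.encode('utf-8')
result = format_value(x)
print(result)                          # b'x = \xd0\x9c\xd0\xb8\xd1\x80'

# On Python 3.5 and later, bytes also support % formatting directly,
# which gives the same bytes without the decode/encode round trip.
same = b'x = %s' % x
print(same == result)                  # True
```

The direct bytes formatting on the last lines is not available in Python 3.3, the version used in the question, so there the decode-then-encode route is the one to use.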