
Python 3: How do I get a string literal representation of a byte string?

In Python 3, how do I interpolate a byte string into a regular string and get the same behavior as Python 2 (i.e.: get just the escape codes without the b prefix or double backslashes)?

e.g.:

Python 2.7:

>>> x = u'\u041c\u0438\u0440'.encode('utf-8')
>>> str(x)
'\xd0\x9c\xd0\xb8\xd1\x80'
>>> 'x = %s' % x
'x = \xd0\x9c\xd0\xb8\xd1\x80'

Python 3.3:

>>> x = u'\u041c\u0438\u0440'.encode('utf-8')
>>> str(x)
"b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"
>>> 'x = %s' % x
"x = b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"

Note how with Python 3, I get the b prefix in my output and doubled backslashes. The result that I would like to get is the result that I get in Python 2.

Marc Abramowitz asked Mar 13 '13 16:03



1 Answer

In Python 2 you have types str and unicode. str represents a simple byte string while unicode is a Unicode string.

For Python 3, this changed: now str is what was unicode in Python 2, and bytes is what was str in Python 2.
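
As a minimal sketch of that split (the variable names are illustrative and not from the answer), the same word can be held in Python 3 either as a str or as its UTF-8 encoded bytes:

```python
# Python 3: str holds Unicode text, bytes holds raw byte values.
text = '\u041c\u0438\u0440'      # three Cyrillic characters
data = text.encode('utf-8')      # the same word as UTF-8 bytes

print(type(text).__name__)       # str
print(type(data).__name__)       # bytes
print(len(text), len(data))      # 3 6 (each character takes two bytes in UTF-8)

# Decoding with the same encoding recovers the original text.
print(data.decode('utf-8') == text)   # True
```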

So in ("x = %s" % '\u041c\u0438\u0440').encode("utf-8") you can omit the u prefix, as it is implicit: in Python 3, every string literal is Unicode unless you explicitly mark it as bytes.

In Python 3, this yields the bytes equivalent of your last Python 2 line:

 ("x = %s" % '\u041c\u0438\u0440').encode("utf-8")
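
A short check of what that expression produces, as a sketch rather than part of the original answer. The last step assumes the goal is the escape codes as plain text. Slicing the repr() of the bytes object is one possible way to get that, not something the answer itself proposes.

```python
# Python 3. The bytes object from the question.
x = '\u041c\u0438\u0440'.encode('utf-8')

# The expression from the answer: format as text first, encode last.
result = ('x = %s' % '\u041c\u0438\u0440').encode('utf-8')
print(type(result).__name__)   # bytes
print(result == b'x = ' + x)   # True

# Assumption: the escape codes themselves are wanted as text.
# repr() of a bytes object looks like b'...', so dropping the first two
# characters and the last one leaves only the escapes.
escapes = repr(x)[2:-1]
print('x = %s' % escapes)      # x = \xd0\x9c\xd0\xb8\xd1\x80
```

Note that escapes is a 24-character str containing literal backslashes, not the six original bytes, so it is suitable for display or logging rather than for further byte-level processing.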

Note how I encode only after building the final result, which is what you should always do: take an incoming object, decode it to Unicode (however you do that), and then, when producing output, encode it in the encoding of your choice. Don't try to format raw byte strings directly. That is ugly and error-prone.
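
The decode, process, encode pattern described above might look like the following sketch. The function name render_line and the default of UTF-8 are assumptions made for illustration.

```python
def render_line(raw, encoding='utf-8'):
    """Hypothetical helper: decode on input, format as text, encode on output."""
    text = raw.decode(encoding)    # bytes -> str at the input boundary
    line = 'x = %s' % text         # all formatting happens on str
    return line.encode(encoding)   # str -> bytes at the output boundary


incoming = b'\xd0\x9c\xd0\xb8\xd1\x80'
outgoing = render_line(incoming)
print(outgoing == b'x = ' + incoming)   # True
```

Keeping the formatting step on str means the b prefix never reaches the output, because no bytes object is ever interpolated into a string.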

javex answered Sep 22 '22 02:09