r"string" b"string" u"string" Python 2 / 3 comparison

Tags: python, string, python-3.x, python-2.7

I already know r"string" from Python 2.7, where it is often used for regex patterns. I have also seen u"string", which I think is for Unicode strings. Now with Python 3 we see b"string".

I have searched for these in different sources / questions, such as What does a b prefix before a python string mean?, but it's difficult to see the big picture of all these strings with prefixes in Python, especially with Python 2 vs 3.

Question: would you have a rule of thumb to remember the different types of strings with prefixes in Python? (or maybe a table with a column for Python 2 and one for Python 3?)

NB: I have read a few questions and answers, but I haven't found an easy-to-remember comparison covering all prefixes across Python 2 and 3.

943

asked Feb 05 '19 11:02

Basj

2 Answers

From the Python docs on literals: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. As a result, in string literals, '\U' and '\u' escapes in raw strings are not treated specially. Given that Python 2.x’s raw unicode literals behave differently than Python 3.x’s the 'ur' syntax is not supported.

and

A string literal with 'f' or 'F' in its prefix is a formatted string literal; see Formatted string literals. The 'f' may be combined with 'r', but not with 'b' or 'u', therefore raw formatted strings are possible, but formatted bytes literals are not.

So:

r means raw
b means bytes
u means unicode
f means format

The r and b prefixes were already available in Python 2 (b from 2.6 on, where it simply produces an ordinary str), and similar prefixes exist in many other languages because they are sometimes very handy.

Since string literals were not Unicode in Python 2, u-strings were created to offer support for internationalization. In Python 3, Unicode strings are the default, so "..." is semantically the same as u"...".

Finally, of these, the f-string is the only one that isn't supported in Python 2.
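As a quick check of the summary above, here is a minimal sketch for Python 3.6+ that inspects what each prefix produces. The variable names and sample values are illustrative assumptions, not taken from the documentation.

```python
# Python 3.6+ sketch: what each literal prefix produces.
name = "world"  # illustrative value

plain = "caf\u00e9"          # str; \u00e9 is processed into one character
legacy = u"caf\u00e9"        # identical to the line above in Python 3
raw = r"caf\u00e9"           # str; the backslash is kept, so 9 characters
data = b"caf\xc3\xa9"        # bytes; non-ASCII values are written as escapes
raw_data = rb"\d+"           # r and b can be combined (raw bytes)
formatted = f"hello {name}"  # str built at runtime from the expression in braces

print(type(plain).__name__, type(data).__name__)
print(plain == legacy)
print(len(plain), len(raw), len(data))
print(data.decode("utf-8") == plain)
print(formatted)
```

Comparing the three lengths shows the difference between characters (str), literal backslash sequences (raw) and encoded bytes (bytes).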

198

answered Sep 28 '22 15:09

Vitor SRG

u-strings are for Unicode in Python 2. You can most probably forget about them if you're working on modern applications: strings in Python 3 are Unicode by default, and if you're migrating from Python 2, you'll most probably use from __future__ import unicode_literals, which gives [almost] the same behavior in Python 2.
b-strings are for raw bytes: they carry no notion of text, just a stream of bytes. They are rarely written as literals in your source; most often they are the result of network or low-level code, such as reading data in binary format, unpacking archives, or working with encryption libraries.

Converting between a b-string and str is done with encode and decode:
```
# python 3
>>> 'hēllö'.encode('utf-8')
b'h\xc4\x93ll\xc3\xb6'

>>> b'h\xc4\x93ll\xc3\xb6'.decode()
'hēllö'

# python 2 without __future__
>>> u'hēllö'.encode('utf-8')
'h\xc4\x93ll\xc3\xb6'

>>> 'h\xc4\x93ll\xc3\xb6'.decode('utf-8')
u'h\u0113ll\xf6'  # this is the correct representation: the Python 2 repr escapes non-ASCII characters
```
r-strings are not specifically for regex; the r stands for "raw" string. Unlike regular string literals, an r-string gives no special meaning to backslash escapes. For example, the normal string 'abc\n' is 4 characters long, and its last character is the special "newline" character, which we write in the literal by escaping with \. The raw string r'abc\n' is 5 characters long, and its last two characters are literally \ and n. Two places where raw strings are often seen:

regex patterns, so that regex escapes don't get mixed up with the string literal's own special characters
file path notation on Windows systems: since the Windows family uses \ as the delimiter, normal string literals look like 'C:\\dir\\file' or '\\\\share\\dir', while raw literals are nicer: r'C:\dir\file' and r'\\share\dir' respectively
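The points about raw strings can be checked with a short sketch like the one below. The sample text and paths are made up for illustration, and the trailing-backslash caveat is an addition that the answer itself does not mention.

```python
import re

# Escape processing: "\n" is one newline character, r"\n" is a backslash plus "n".
assert len("abc\n") == 4
assert len(r"abc\n") == 5
assert r"abc\n" == "abc\\n"  # a raw literal is just another spelling of the same str

# Regex: \b must reach the regex engine as backslash + "b" (word boundary).
# In a normal literal, "\b" would first be turned into a backspace character.
words = re.findall(r"\b\w+\b", "raw strings help")
print(words)

# Windows-style path: both spellings produce the same string.
path_escaped = "C:\\dir\\file"
path_raw = r"C:\dir\file"
print(path_escaped == path_raw)

# Caveat: a raw literal cannot end in a single backslash,
# so a trailing separator has to be appended separately.
trailing = r"C:\dir" + "\\"
print(trailing)
```

The raw and escaped spellings are interchangeable at runtime; the prefix only changes how the source text is read.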

One more notable prefix is the f-string, which arrived in Python 3.6 as a simple and powerful way of formatting strings:

f'a equals {a} and b is {b}' will substitute the values of the variables a and b at runtime.
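A minimal sketch of this on Python 3.6+ follows. The values of a and b, the word being searched for, and the sample sentence are hypothetical examples.

```python
import re

# Python 3.6+ sketch; a and b are hypothetical example values.
a = 3
b = 4.5

message = f"a equals {a} and b is {b}"
print(message)

# Braces can hold expressions and format specs, not only plain names.
summary = f"sum={a + b:.2f}"
print(summary)

# f can be combined with r: replacement fields still work,
# while backslashes outside the braces stay literal.
word = "bytes"
pattern = rf"\b{re.escape(word)}\b"
matches = re.findall(pattern, "bytes, not bytestrings")
print(pattern, matches)
```

The rf combination is convenient when a regex pattern needs both literal backslashes and an interpolated value.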

answered Sep 28 '22 13:09

Slam
