What exactly do "u" and "r" string flags do, and what are raw string literals?

2 Answers

There's not really any "raw string"; there are raw string literals, which are exactly the string literals marked by an 'r' before the opening quote.

A "raw string literal" is a slightly different syntax for a string literal, in which a backslash, \, is taken as meaning "just a backslash" (except when it comes right before a quote that would otherwise terminate the literal) -- no "escape sequences" to represent newlines, tabs, backspaces, form-feeds, and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.
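Here is a quick illustration of that rule. It is written for Python 3, but the r prefix treats backslashes the same way in Python 2. The variable names are just for the example.

```python
# Python 3 syntax; the r prefix handles backslashes the same way in Python 2.
normal = "a\nb"      # \n is an escape sequence: one newline character
raw = r"a\nb"        # \n stays as two characters: backslash and 'n'
doubled = "a\\nb"    # the normal-literal way to spell the same text

print(len(normal), len(raw))   # 3 4
print(raw == doubled)          # True

# The "except" clause: a backslash still stops the next quote from
# closing the literal, but the backslash itself stays in the string.
quoted = r"say \"hi\""
print(quoted)                  # say \"hi\"
print(len(r"\""))              # 2
```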

This syntax variant exists mostly because the syntax of regular expression patterns is heavy with backslashes (but never at the end, so the "except" clause above doesn't matter), and patterns look a bit better when you avoid doubling up each of them. That's all. It also became somewhat popular for writing native Windows file paths (which use backslashes instead of the forward slashes used on other platforms), but that's very rarely needed (since forward slashes mostly work fine on Windows too) and imperfect (due to the "except" clause above).
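The following sketch shows why the regex use case matters. It uses the standard re module under Python 3. In a normal literal, \b is itself an escape sequence (the backspace character), so a word-boundary pattern silently stops matching unless you double the backslashes or use r. The path at the end is a made-up example of the trailing-backslash limitation.

```python
import re

text = "cat catalog concat cat"

# In a normal literal "\b" is the backspace character (\x08), so this
# pattern looks for literal backspaces and finds nothing.
plain = re.findall("\bcat\b", text)
print(plain)                    # []

# In a raw literal the regex engine receives backslash + 'b',
# which it interprets as a word boundary.
raw = re.findall(r"\bcat\b", text)
print(raw)                      # ['cat', 'cat']

# The same pattern without the r prefix needs every backslash doubled.
doubled = re.findall("\\bcat\\b", text)
print(doubled)                  # ['cat', 'cat']

# A raw literal cannot end in a single backslash (r"C:\temp\" is a
# syntax error), so a trailing separator has to be added separately.
path = r"C:\temp" + "\\"
print(path)                     # C:\temp\
```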

r'...' is a byte string (in Python 2.*), ur'...' is a Unicode string (again, in Python 2.*), and any of the other three kinds of quoting also produces exactly the same types of strings (so for example r'...', r'''...''', r"...", r"""...""" are all byte strings, and so on).
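The byte/Unicode split described here is specific to Python 2 and can't be run under Python 3. In Python 3, the closest analogue is that r'...' in any quoting style produces str (Unicode text) and rb'...' produces bytes. The sketch below shows that analogue, not Python 2 behaviour.

```python
# Python 3: str is Unicode text, bytes holds 8-bit data.
# rb'...' (or br'...') is the raw form of a bytes literal.
forms = [r'\d', r"\d", r'''\d''', r"""\d"""]
print({type(s).__name__ for s in forms})    # {'str'}
print(all(s == forms[0] for s in forms))    # True

raw_bytes = rb'\d'
print(type(raw_bytes).__name__, raw_bytes)  # bytes b'\\d'
```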

Not sure what you mean by "going back": there are no intrinsic back and forward directions, because there's no raw string type. It's just an alternative syntax to express perfectly normal string objects, byte or Unicode as they may be.
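To make the "no raw string type" point concrete: the raw spelling and the doubled-backslash spelling give equal objects of the same ordinary type. The example is in Python 3, and the same holds in Python 2 for byte strings. repr() is included only because it is sometimes mistaken for a conversion to "raw"; it simply returns another str.

```python
a = r"\t"     # two characters: backslash, 't'
b = "\\t"     # the same two characters, spelled with a doubled backslash
c = "\t"      # one TAB character

print(a == b, type(a) is type(b))   # True True
print(a == c)                       # False

# repr() shows an escaped spelling of a string, but it returns an
# ordinary str too; it is not a conversion to some "raw" type.
print(repr(c))                      # '\t'
print(len(repr(c)))                 # 4
```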

And yes, in Python 2.*, u'...' is of course always distinct from just '...' -- the former is a unicode string, the latter is a byte string. What encoding the literal might be expressed in is a completely orthogonal issue.
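Python 3 works differently, which is worth keeping in mind when reading this answer. Since Python 3.3, the u prefix is accepted again but does nothing, and text and bytes are separate types (str and bytes). The point about encoding still holds. This sketch, using an example word, shows the same text turning into different bytes depending on the codec chosen.

```python
# Python 3 sketch: u'...' and '...' are both str (Unicode text).
print(type(u'ciao') is str, u'ciao' == 'ciao')   # True True

# The bytes that represent a piece of text depend on the encoding chosen,
# which is a separate decision from the type of the literal.
word = 'perché'
utf8 = word.encode('utf-8')
latin1 = word.encode('latin-1')
print(len(word), len(utf8), len(latin1))          # 6 7 6
print(utf8.decode('utf-8') == latin1.decode('latin-1') == word)  # True
```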

E.g., consider (Python 2.6):

>>> import sys
>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34

The Unicode object of course takes more memory space (very small difference for a very short string, obviously ;-).
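The numbers above come from Python 2.6 and won't match a modern interpreter. Under CPython 3.3+ (PEP 393), a str stores 1, 2 or 4 bytes per character depending on its widest character, so only relative comparisons are meaningful. This sketch assumes CPython and avoids printing exact sizes, which vary by version and platform.

```python
import sys

# CPython 3.3+ picks 1, 2 or 4 bytes per character depending on the
# widest character in the string, so the size of a str depends on its
# content, not only on its length.
ascii_text = 'ciao'
wide_text = 'cia\u20ac'      # same length, but ends with the euro sign
raw_bytes = b'ciao'

print(len(ascii_text), len(wide_text))                       # 4 4
print(sys.getsizeof(wide_text) > sys.getsizeof(ascii_text))  # True
print(sys.getsizeof(ascii_text) > sys.getsizeof(raw_bytes))  # True
```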


answered Oct 12 '22 14:10

Alex Martelli

There are two types of string in Python 2: the traditional str type and the newer unicode type. If you type a string literal without the u in front, you get the old str type, which stores 8-bit characters; with the u in front, you get the newer unicode type, which can store any Unicode character.

The r doesn't change the type at all; it just changes how the string literal is interpreted. Without the r, backslashes are treated as escape characters. With the r, backslashes are taken literally. Either way, the type is the same.

ur is of course a Unicode string where backslashes are literal backslashes, not part of escape codes.
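Note that ur'...' exists only in Python 2. Python 3 accepts the r, u and b prefixes (and, since 3.6, f), plus combinations such as rb and br, but it rejects ur. The small helper below, is_valid_literal, is hypothetical. It uses the built-in compile() to check this without crashing the script.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python 3 expression."""
    try:
        compile(source, '<literal>', 'eval')
        return True
    except SyntaxError:
        return False

print(is_valid_literal("ur'abc'"))   # False: the Python 2 prefix is gone
print(is_valid_literal("u'abc'"))    # True
print(is_valid_literal("r'abc'"))    # True
print(is_valid_literal("rb'abc'"))   # True: raw bytes literal
```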

You can try to convert a Unicode string to an old string using the str() function, but if there are any characters that cannot be represented in the old string type, you will get an exception (str() encodes with the default encoding, normally ASCII, and raises UnicodeEncodeError). You could replace them with question marks first if you wish, but of course those characters would then be unreadable. It is not recommended to use the str type if you want to handle Unicode characters correctly.
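In Python 3, the same advice means keeping text as str and encoding only when you need bytes. Here is a minimal sketch using an example word. Strict ASCII encoding raises UnicodeEncodeError, errors='replace' substitutes question marks, and UTF-8 keeps every character.

```python
text = 'caffè'

try:
    text.encode('ascii')               # strict: any non-ASCII character fails
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason
print(reason)                          # ordinal not in range(128)

replaced = text.encode('ascii', errors='replace')
print(replaced)                        # b'caff?'
print(text.encode('utf-8'))            # b'caff\xc3\xa8'
```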

answered Oct 12 '22 12:10

Mark Byers


Tags: python, unicode, python-2.x, rawstring

e-satis
