Python "string_escape" vs "unicode_escape"

Tags: python, escaping, encoding, python-2.x, quotes

According to the docs, the builtin string encoding string_escape:

Produce[s] a string that is suitable as string literal in Python source code

...while the unicode_escape:

Produce[s] a string that is suitable as Unicode literal in Python source code

So they should behave roughly the same. But they appear to treat single quotes differently:

    >>> print """before '" \0 after""".encode('string-escape')
    before \'" \x00 after
    >>> print """before '" \0 after""".encode('unicode-escape')
    before '" \x00 after

The string_escape codec escapes the single quote, while unicode_escape does not. Is it safe to assume that I can simply:

>>> escaped = my_string.encode('unicode-escape').replace("'", "\\'")

...and get the expected behaviour?

Edit: Just to be super clear, the expected behavior is getting something suitable as a literal.

698

asked Jun 03 '10 19:06

Mike Boers

2 Answers

According to my interpretation of the implementation of unicode-escape and the unicode repr in the CPython 2.6.5 source, yes; the only difference between repr(unicode_string) and unicode_string.encode('unicode-escape') is the inclusion of wrapping quotes and escaping whichever quote was used.

They are both driven by the same function, unicodeescape_string. This function takes a parameter whose sole purpose is to toggle the addition of the wrapping quotes and the escaping of that quote.
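The approach from the question can be sanity-checked by building a literal and parsing it back. This is a minimal sketch that assumes Python 3, where `encode('unicode_escape')` returns bytes and so needs an extra decode step; the helper name `to_single_quoted_literal` is hypothetical and not part of any library.

```python
import ast


def to_single_quoted_literal(text):
    """Return Python source for a single-quoted literal equal to `text`.

    Hypothetical helper for illustration, not a standard library function.
    """
    # unicode_escape handles backslashes, control characters and non-ASCII
    # characters, but leaves both quote characters untouched.
    escaped = text.encode("unicode_escape").decode("ascii")
    # Escape the one quote character that would end the literal early.
    return "'" + escaped.replace("'", "\\'") + "'"


samples = ["before '\" \0 after", "back\\slash'", "caf\u00e9 \u20ac", "line\nbreak"]
for sample in samples:
    literal = to_single_quoted_literal(sample)
    # literal_eval parses the literal; the result must equal the original text.
    assert ast.literal_eval(literal) == sample
```

The replace step is safe here because unicode_escape has already doubled every backslash, so a backslash placed in front of a quote cannot be misread as part of an earlier escape sequence.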

119

answered Oct 09 '22 13:10

Mike Boers

Within the range 0 ≤ c < 128, yes: the ' is the only difference on CPython 2.6.

    >>> set(unichr(c).encode('unicode_escape') for c in range(128)) - set(chr(c).encode('string_escape') for c in range(128))
    set(["'"])
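The comparison above only runs on Python 2. As a rough Python 3 analogue (an assumption on my part, not something from the original session), one can list which ASCII characters unicode_escape rewrites at all and confirm that neither quote character is among them:

```python
# Python 3 sketch: collect the ASCII characters that unicode_escape changes.
changed = {
    chr(c)
    for c in range(128)
    if chr(c).encode("unicode_escape") != chr(c).encode("ascii")
}

# Neither quote character is rewritten by the codec.
print("'" in changed, '"' in changed)
# The backslash and control characters such as newline are rewritten.
print("\\" in changed, "\n" in changed)
```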

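Outside of this range the two codecs are not interchangeable: string_escape only accepts str, and calling unicode_escape on a str containing non-ASCII bytes fails in the implicit ASCII decode.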

    >>> '\x80'.encode('string_escape')
    '\\x80'
    >>> '\x80'.encode('unicode_escape')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

    >>> u'1'.encode('unicode_escape')
    '1'
    >>> u'1'.encode('string_escape')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: escape_encode() argument 1 must be str, not unicode

On Python 3.x, the string_escape encoding no longer exists, since str can only store Unicode.
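This can be checked from code. The sketch below assumes Python 3 and uses `codecs.lookup` to probe for each codec name; `codec_available` is a hypothetical helper name introduced only for this example.

```python
import codecs


def codec_available(name):
    """Report whether the running interpreter knows a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


print(codec_available("string_escape"))   # expected to be False on Python 3
print(codec_available("unicode_escape"))

# On Python 3 the surviving codec maps str to bytes.
escaped = "tab\there".encode("unicode_escape")
print(type(escaped).__name__, escaped)
```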

answered Oct 09 '22 13:10

kennytm
