I would like to turn this string:
a = '\\a'
into this one
b = '\a'
It doesn't seem like there is an obvious way to do this with replace?
To be more precise, I want to change the escaping of the backslash to escaping the character a.
replaceAll("\\","");
The character '\a' is the ASCII BEL character, chr(7).
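To make the difference between the two strings concrete, here is a small check (Python 3 syntax; the names a and b are taken from the question). It only inspects the two literals and does not do any conversion yet:

```python
a = '\\a'   # two characters: a backslash followed by the letter a
b = '\a'    # one character: ASCII BEL

print(len(a), len(b))   # 2 1
print(b == chr(7))      # True
print(repr(b))          # '\x07'
```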
To do the conversion in Python 2:
from __future__ import print_function
a = '\\a'
c = a.decode('string-escape')
print(repr(a), repr(c))
output
'\\a' '\x07'
And for future reference, in Python 3:
a = '\\a'
b = bytes(a, encoding='ascii')
c = b.decode('unicode-escape')
print(repr(a), repr(c))
This gives identical output to the above snippet.
In Python 3, if you were working with bytes objects you'd do something like this:
a = b'\\a'
c = bytes(a.decode('unicode-escape'), 'ascii')
print(repr(a), repr(c))
output
b'\\a' b'\x07'
As Antti Haapala mentions, this simple strategy for Python 3 won't work if the source string also contains non-ASCII (Unicode) characters. In that case, please see his answer for a more robust solution.
On Python 2 you can use
>>> '\\a'.decode('string_escape')
'\x07'
Note how \a is repr'd as \x07.
If the string is a unicode string that also contains extended (non-ASCII) characters, you need to encode it to a bytestring explicitly first; otherwise the default encoding (ascii!) is used to convert the unicode object to a bytestring, and that fails on the extended characters.
However, the string_escape codec doesn't exist in Python 3, and things are considerably more complicated. You can use the unicode_escape codec to decode, but it is badly broken if the source string also contains non-ASCII characters:
>>> '\\aäầ'.encode().decode('unicode_escape')
'\x07Ã¤áº§'
The resulting string doesn't consist of the original Unicode characters but of their UTF-8 bytes decoded as latin-1. The solution is to re-encode to latin-1 and then decode as UTF-8 again:
>>> '\\aäầ\u1234'.encode().decode('unicode_escape').encode('latin1').decode()
'\x07äầሴ'
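For reuse, the round trip above can be wrapped in a small helper. This is only a sketch for Python 3; the function name unescape_utf8 is made up for illustration and is not part of any library:

```python
def unescape_utf8(text):
    """Interpret backslash escapes in text while keeping non-ASCII characters."""
    # Step 1: encode to UTF-8, so non-ASCII characters become multi-byte sequences.
    raw = text.encode('utf-8')
    # Step 2: unicode_escape processes the backslash escapes, but it reads
    # every other byte as latin-1, which mangles the multi-byte sequences.
    mangled = raw.decode('unicode_escape')
    # Step 3: undo the latin-1 misreading and decode the bytes as UTF-8.
    return mangled.encode('latin-1').decode('utf-8')


print(repr(unescape_utf8('\\aäầ')))   # '\x07äầ'
```

One caveat: if the input contains an escape such as a literal backslash followed by u1234 (rather than the character itself), step 2 produces a code point above U+00FF and step 3 raises UnicodeEncodeError, so this helper only suits inputs whose escapes stay within the latin-1 range.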
Unescape string is what I searched for to find this:
>>> a = r'\a'
>>> a.encode().decode('unicode-escape')
'\x07'
>>> '\a'
'\x07'
That's the way to do it with unicode. Since you're in Python 2 and may not be using unicode, you may actually want this one:
>>> a.decode('string-escape')
'\x07'