
Remove escape character from string

I would like to turn this string:

a = '\\a'

into this one

b = '\a'

It doesn't seem like there is an obvious way to do this with replace?

To be more precise, I want to change the escaping of the backslash to escaping the character a.

elelias asked Nov 06 '16 18:11



3 Answers

The character '\a' is the ASCII BEL character, chr(7).
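To see why replace doesn't help here, it may be useful to inspect the two strings directly. A small Python 3 sketch, using the variable names from the question:

```python
# The literal '\\a' holds two characters: a backslash and the letter a.
a = '\\a'
# The literal '\a' holds one character: ASCII BEL, code point 7.
b = '\a'

print(len(a), [hex(ord(ch)) for ch in a])  # 2 ['0x5c', '0x61']
print(len(b), [hex(ord(ch)) for ch in b])  # 1 ['0x7']

# There is no doubled backslash inside a to replace, so this is a no-op.
print(a.replace('\\\\', '\\') == a)  # True
```

So the task is not removing a character but interpreting the two-character escape sequence, which is what the codecs below do.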

To do the conversion in Python 2:

from __future__ import print_function
a = '\\a'
c = a.decode('string-escape')
print(repr(a), repr(c))

output

'\\a' '\x07'

And for future reference, in Python 3:

a = '\\a'
b = bytes(a, encoding='ascii')
c = b.decode('unicode-escape')
print(repr(a), repr(c))

This gives identical output to the above snippet.

In Python 3, if you were working with bytes objects you'd do something like this:

a = b'\\a'
c = bytes(a.decode('unicode-escape'), 'ascii')
print(repr(a), repr(c))

output

b'\\a' b'\x07'

As Antti Haapala mentions, this simple strategy for Python 3 won't work if the source string also contains non-ASCII Unicode characters. In that case, please see his answer for a more robust solution.
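As a rough illustration of that limitation, the Python 3 snippet can be wrapped in a function and fed a non-ASCII string. The name unescape_ascii is made up for this sketch:

```python
def unescape_ascii(s):
    """Interpret backslash escapes in an ASCII-only str (Python 3)."""
    # bytes(..., encoding='ascii') raises UnicodeEncodeError on non-ASCII input.
    return bytes(s, encoding='ascii').decode('unicode-escape')


print(repr(unescape_ascii('\\a')))  # '\x07'

try:
    unescape_ascii('\\a\u00e4')  # backslash, a, then the letter ä
except UnicodeEncodeError as exc:
    print('non-ASCII input fails:', exc.reason)
```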

PM 2Ring answered Oct 20 '22 16:10


On Python 2 you can use

>>> '\\a'.decode('string_escape')
'\x07'

Note how \a is repr'd as \x07.

If the string is a unicode object that also contains non-ASCII characters, you need to encode it to a bytestring explicitly first (for example as UTF-8); otherwise the default encoding (ascii!) is used to convert the unicode object to a bytestring, and that conversion fails.


However, this codec doesn't exist in Python 3, and things are considerably more complicated. You can decode with unicode-escape, but the result is garbled if the source string also contains non-ASCII characters:

>>> '\\aäầ'.encode().decode('unicode_escape')
'\x07Ã¤áº§'

The resulting string doesn't consist of the original Unicode characters but of their UTF-8 bytes decoded as latin-1. The solution is to re-encode to latin-1 and then decode as UTF-8 again:

>>> '\\aäầ\u1234'.encode().decode('unicode_escape').encode('latin1').decode()
'\x07äầሴ'
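For reuse, the chain above could be wrapped in a function. The sketch below (Python 3) also shows one alternative that is not part of this answer: encoding with latin-1 and the backslashreplace error handler before decoding. Both function names are made up for illustration, and neither version has been checked against every edge case.

```python
def unescape_roundtrip(s):
    """The chain shown above: UTF-8 -> unicode_escape -> latin-1 -> UTF-8."""
    return (s.encode('utf-8')
             .decode('unicode_escape')
             .encode('latin-1')
             .decode('utf-8'))


def unescape_backslashreplace(s):
    """Alternative (not from the answer): escape non-latin-1 characters first."""
    # Characters outside latin-1 become \uXXXX or \UXXXXXXXX escapes, which
    # unicode_escape then turns back into the original characters.
    return s.encode('latin-1', 'backslashreplace').decode('unicode_escape')


text = '\\aäầ'
print(repr(unescape_roundtrip(text)))         # '\x07äầ'
print(repr(unescape_backslashreplace(text)))  # '\x07äầ'

# An escaped code point above U+00FF in the input breaks the round trip,
# because the decoded character cannot be re-encoded as latin-1.
try:
    unescape_roundtrip('\\u1234')
except UnicodeEncodeError:
    print('round trip failed')
print(repr(unescape_backslashreplace('\\u1234')))  # 'ሴ'
```

The round trip is fine when the input only uses escapes such as \a or \n; the alternative is worth considering when the input may itself contain \x, \u or \U escapes.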

"Unescape string" is what I searched for to find this:

>>> a = r'\a'
>>> a.encode().decode('unicode-escape')
'\x07'
>>> '\a'
'\x07'

That's the way to do it with unicode. Since you're in Python 2 and may not be using unicode, you may actually want this one:

>>> a.decode('string-escape')
'\x07'

Trey Hunner answered Oct 20 '22 18:10