I want to convert strings containing escaped characters to their normal form, the same way Python's lexical parser does:
>>> escaped_str = 'One \\\'example\\\''
>>> print(escaped_str)
One \'example\'
>>> normal_str = normalize_str(escaped_str)
>>> print(normal_str)
One 'example'
Of course, the boring way would be to replace all known escaped characters one by one: http://docs.python.org/reference/lexical_analysis.html#string-literals
How would you implement normalize_str() in the above code?
Character combinations consisting of a backslash (\) followed by a letter or by a combination of digits are called "escape sequences." To represent a newline character, single quotation mark, or certain other characters in a character constant, you must use escape sequences.
Escape sequences are used inside strings, not just those for printf, to represent special characters. In particular, the \n escape sequence represents the newline character.
For turning a normal string into a raw string, prefix the string (before the quote) with an r or R. This is the method of choice for overcoming this escape sequence problem.
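To make the raw-string point concrete, here is a small sketch (the variable names are only illustrative). Note that the r prefix affects how a literal in source code is read; it does not convert a string value that already exists at runtime, which is what the question asks for.

```python
# A normal literal interprets backslash escapes; a raw literal keeps them.
normal = 'a\nb'   # 'a', newline, 'b'
raw = r'a\nb'     # 'a', backslash, 'n', 'b'

print(len(normal))      # 3
print(len(raw))         # 4
print(raw == 'a\\nb')   # True: same value as the doubled-backslash literal
```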
>>> escaped_str = 'One \\\'example\\\''
>>> print escaped_str.encode('string_escape')
One \\\'example\\\'
>>> print escaped_str.decode('string_escape')
One 'example'
Several similar codecs are available, such as rot13 and hex.
The above is Python 2.x. Since you said (below, in a comment) that you're using Python 3.x: a Python 3 str has no decode() method, so decoding a Unicode string object is more roundabout, but it's still possible through the codecs module. The "string_escape" codec is gone in Python 3; use "unicode_escape" instead:
Python 3.3a0 (default:b6aafb20e5f5, Jul 29 2011, 05:34:11)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> escaped_str = "One \\\'example\\\'"
>>> import codecs
>>> print(codecs.getdecoder("unicode_escape")(escaped_str)[0])
One 'example'
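Wrapped up as the normalize_str() from the question, a minimal Python 3 sketch might look like the following. The Latin-1 round trip is my own addition, not part of the answer above: unicode_escape treats its input bytes as Latin-1, so non-ASCII text can come out mangled unless it is first turned into backslash escapes.

```python
import codecs


def normalize_str(escaped: str) -> str:
    """Interpret backslash escapes in `escaped`, roughly as a string literal would."""
    # Characters outside Latin-1 become \uXXXX escapes here, so they
    # survive the byte round trip; Latin-1 characters map to single bytes.
    raw_bytes = escaped.encode("latin-1", errors="backslashreplace")
    # unicode_escape reads the bytes as Latin-1 and expands backslash escapes.
    return codecs.decode(raw_bytes, "unicode_escape")


print(normalize_str("One \\'example\\'"))  # One 'example'
```

This is a sketch rather than a full reimplementation of the lexer's rules, so treat edge cases (for example, malformed escape sequences) as untested.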