In Python 2, there are the string-escape and unicode-escape codecs. For a UTF-8 byte string, string-escape decodes backslash escapes and keeps non-ASCII bytes unchanged, like:
"你好\\n".decode('string-escape')
'\xe4\xbd\xa0\xe5\xa5\xbd\n'
However, in Python 3, string-escape has been removed. We have to encode the string into bytes and decode it with unicode-escape:
"This\\n".encode('utf_8').decode('unicode_escape')
'This\n'
This works for ASCII input. But non-ASCII bytes are altered as well:
"你好\\n".encode('utf_8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\\n'
"你好\\n".encode('utf_8').decode('unicode_escape').encode('utf_8')
b'\xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd\n'
Every non-ASCII byte is decoded as a separate character instead of as part of a UTF-8 sequence, so encoding the result back to UTF-8 produces garbled bytes.
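To make the cause visible, here is a small check. It assumes only the standard codecs; the variable names are illustrative. For bytes that contain no backslashes, unicode_escape appears to behave like a Latin-1 decode, which is why each UTF-8 byte turns into its own character:

```python
raw = "你好".encode("utf_8")  # the six UTF-8 bytes of the two characters

# With no backslash in the input, unicode_escape maps each byte to the
# code point of the same value, exactly as a Latin-1 decode would.
as_latin1 = raw.decode("latin_1")
as_escape = raw.decode("unicode_escape")
print(as_latin1 == as_escape)

# Re-encoding those six separate characters as UTF-8 yields the doubled bytes.
roundtrip = as_escape.encode("utf_8")
print(roundtrip)
```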
So is there a solution for this? Is it possible in Python 3 to keep all non-ASCII bytes and decode all escape sequences?
import codecs
codecs.getdecoder('unicode_escape')('你好\\n')
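One possible workaround, offered as a sketch rather than a definitive answer: encode the string to Latin-1 with the `backslashreplace` error handler first, so that characters outside Latin-1 become `\uXXXX` escapes, then let unicode_escape decode everything in one pass. The helper name `unescape` is hypothetical and not part of any library.

```python
def unescape(text: str) -> str:
    """Decode backslash escapes in text while keeping non-ASCII characters.

    Hypothetical helper, not a standard-library function.
    """
    # Characters that do not fit in Latin-1 are rewritten as \uXXXX or
    # \UXXXXXXXX escapes; everything else becomes a single byte.
    escaped = text.encode("latin_1", "backslashreplace")
    # unicode_escape then decodes both the original escapes (such as \n)
    # and the ones introduced by backslashreplace.
    return escaped.decode("unicode_escape")


result = unescape("你好\\n")
print(result.encode("utf_8"))
```

This sketch does not cover every edge case. For example, a literal backslash placed directly before a non-Latin-1 character would combine with the generated escape and change the result.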