
python3 unicode-escape doesn't work with non-ascii bytes?

In Python 2 there are two escape codecs, string-escape and unicode-escape. On a UTF-8 byte string, string-escape resolves backslash escape sequences and leaves non-ASCII bytes untouched, for example:

"你好\\n".decode('string-escape')
'\xe4\xbd\xa0\xe5\xa5\xbd\n'

However, in Python 3, string-escape has been removed. We have to encode the string into bytes and decode it with unicode-escape:

"This\\n".encode('utf_8').decode('unicode_escape')
'This\n'

This works for ASCII bytes, but non-ASCII bytes are transformed as well:

"你好\\n".encode('utf_8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\\n'
"你好\\n".encode('utf_8').decode('unicode_escape').encode('utf_8')
b'\xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd\n'

Every non-ASCII byte is decoded as a separate Latin-1 character, so the UTF-8 text comes out garbled instead of as the original characters.
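To make the failure concrete, here is a small sketch for Python 3 that checks the byte-to-character mapping and then undoes it by re-encoding as Latin-1. The variable names are illustrative only, and the repair step assumes the escapes in the input never produce a code point above U+00FF (a `\u4f60` escape, for instance, would make the Latin-1 encode raise).

```python
# Python 3 sketch: unicode_escape reads its input as Latin-1, one byte per character.
raw = "你好\\n".encode("utf_8")          # b'\xe4\xbd\xa0\xe5\xa5\xbd\\n'
decoded = raw.decode("unicode_escape")   # escapes resolved, UTF-8 bytes misread

# The first six characters have the same numeric values as the six UTF-8 bytes.
assert [ord(c) for c in decoded[:6]] == list(raw[:6])

# Re-encoding as Latin-1 restores the original bytes (now with a real newline),
# which can then be decoded as UTF-8. Assumption: no escape yielded a character
# above U+00FF, otherwise encode("latin_1") raises UnicodeEncodeError.
repaired = decoded.encode("latin_1").decode("utf_8")
print(repr(repaired))
```

This round trip is only a workaround for inputs like the one in the question; it is not a general replacement for string-escape.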

So is there a solution for this? Is it possible in Python 3 to keep all non-ASCII bytes while decoding all escape sequences?

Ning Sun asked Feb 17 '12 12:02



1 Answer

import codecs
# getdecoder returns a function; calling it yields a (decoded_text, length_consumed) tuple
codecs.getdecoder('unicode_escape')('你好\\n')
raylu answered Sep 19 '22 19:09
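As a separate sketch (not taken from the answer above), one way that seems to work when starting from a Python 3 str is to encode with the `backslashreplace` error handler before decoding. Characters outside Latin-1 become `\uXXXX` escapes, which unicode_escape then turns back into the original characters together with the escapes already in the text. The function name `unescape_keep_unicode` is hypothetical, and malformed input such as a trailing lone backslash is assumed not to occur (it would raise UnicodeDecodeError).

```python
def unescape_keep_unicode(text: str) -> str:
    """Resolve backslash escapes in text while keeping non-ASCII characters.

    Hypothetical helper, a sketch only. Characters that do not fit in Latin-1
    are written out as \\uXXXX or \\UXXXXXXXX escapes by backslashreplace, so
    the unicode_escape decoder restores them instead of misreading UTF-8 bytes.
    """
    as_bytes = text.encode("latin_1", errors="backslashreplace")
    return as_bytes.decode("unicode_escape")


print(repr(unescape_keep_unicode("你好\\n")))
```

For input that is already a UTF-8 byte string, decoding it to str first and then calling the helper keeps the same behaviour.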
