I was trying to make an add-on for Anki that imports OPML notes from Mubu. The content I needed was stored in a str
object like the one below, and I was not able to decode it or convert it into a bytes object.
"\x3Cspan\x3E\xE6\x88\x91\xE5\x8F\x91\xE7\x8E\xB0\xE6\x88\x91\xE5\xB1\x85\xE7\x84\xB6\xE6\xB2\xA1\xE6\x9C\x89\xE6\xB5\x8B\xE8\xAF\x95\xE8\xBF\x87\xE4\xB8\xAD\xE6\x96\x87\xEF\xBC\x8C\xE8\xBF\x99\xE4\xB8\xAA\xE5\xB0\xB1\xE5\xA4\xAA\xE7\xA6\xBB\xE8\xB0\xB1\xE4\xBA\x86\xE3\x80\x82\x3C/span\x3E"
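I previously tried to decode this string with the following method, but it does not handle UTF-8. Each \xNN escape becomes a separate character instead of being combined into one multi-byte UTF-8 character, so the Chinese text comes out garbled: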
text = text.encode().decode("unicode_escape")
Is there a way to turn a str object whose literal content is a sequence of UTF-8 byte escapes into a bytes object?
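One way to finish what the unicode_escape attempt starts is to add a Latin-1 round trip: after unicode_escape, every code point is below 256, so encoding with latin-1 turns each one back into the original byte, and those bytes can then be decoded as UTF-8. This is a minimal sketch, assuming the str holds literal backslash escapes (as it would when read from the exported file) and is otherwise plain ASCII; `unescape_utf8` is a hypothetical helper name, not part of Anki or Mubu.

```python
def unescape_utf8(text: str) -> str:
    """Turn a str holding literal \\xNN escapes of UTF-8 bytes into real text.

    Assumption: the input is ASCII apart from the escapes; characters above
    U+00FF would make the first latin-1 encode raise UnicodeEncodeError.
    """
    # Step 1: 'unicode_escape' turns each \xNN into the single code point U+00NN,
    # which is why the Chinese comes out as mojibake such as 'æ\x88\x91'.
    latin = text.encode("latin-1").decode("unicode_escape")
    # Step 2: every code point is now below 256, so latin-1 maps each one
    # back to the original byte; those bytes are the real UTF-8 data.
    raw = latin.encode("latin-1")
    # Step 3: decode the recovered bytes as UTF-8.
    return raw.decode("utf-8")


# Literal backslashes, as the text would appear when read from the file
# (the r prefix keeps the backslashes in the string).
escaped = r"\x3Cspan\x3E\xE6\x88\x91\xE5\x8F\x91\xE7\x8E\xB0\x3C/span\x3E"
print(unescape_utf8(escaped))  # <span>我发现</span>
```

The same call works on the full string from the question once it is read from the file rather than typed as a Python literal.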
In Python 3, if the string is written as a literal in your code, it can be decoded as follows:
import chardet  # third-party package: pip install chardet

# Put a b in front of the string literal to make it a bytes object
s = b"\x3Cspan\x3E\xE6\x88\x91\xE5\x8F\x91\xE7\x8E\xB0\xE6\x88\x91\xE5\xB1\x85\xE7\x84\xB6\xE6\xB2\xA1\xE6\x9C\x89\xE6\xB5\x8B\xE8\xAF\x95\xE8\xBF\x87\xE4\xB8\xAD\xE6\x96\x87\xEF\xBC\x8C\xE8\xBF\x99\xE4\xB8\xAA\xE5\xB0\xB1\xE5\xA4\xAA\xE7\xA6\xBB\xE8\xB0\xB1\xE4\xBA\x86\xE3\x80\x82\x3C/span\x3E"
# chardet guesses the encoding; fall back to UTF-8 if it cannot decide
encoding = chardet.detect(s)["encoding"] or "utf-8"
content = s.decode(encoding)
print(content)
It decodes to:
<span>我发现我居然没有测试过中文，这个就太离谱了。</span>
(Roughly: "I found that I had never actually tested Chinese; that is just absurd.")
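If the escapes are pasted into Python code without the b prefix, the interpreter has already turned each \xNN into the code point U+00NN, so the str holds mojibake rather than backslashes. A sketch of the runtime equivalent of the b prefix, assuming that situation: encoding with latin-1 maps those code points one-to-one back onto bytes. Because the data is known to be UTF-8, the bytes can also be decoded directly instead of relying on chardet's guess. The shortened string here is only an excerpt of the one in the question.

```python
# Without the b prefix, Python turns each \xNN escape into the code point U+00NN,
# so this str holds mojibake such as 'æ\x88\x91' instead of backslashes.
s = "\x3Cspan\x3E\xE6\x88\x91\xE5\x8F\x91\xE7\x8E\xB0\x3C/span\x3E"

# latin-1 maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# so this recovers exactly the bytes the b"..." literal would have produced.
raw = s.encode("latin-1")
assert raw == b"\x3Cspan\x3E\xE6\x88\x91\xE5\x8F\x91\xE7\x8E\xB0\x3C/span\x3E"

# The bytes are UTF-8, so they can be decoded directly without guessing.
content = raw.decode("utf-8")
print(content)  # <span>我发现</span>
```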