I am trying to read email with imaplib. I get this mail body:
=C4=EE=E1=F0=FB=E9 =E4=E5=ED=FC!
That is quoted-printable encoding.
I need to get UTF-8 from this. It should read: Добрый день!
I googled it, but the answers are messy because they mix Python versions. In Python 3 the body is already a Unicode string, so I can't just call .encode('utf-8') here.
How can I convert this to UTF-8?
The format of a quoted-printable message is simple. The encoder replaces any byte that must be escaped with an equal sign (=) followed by that byte's value as two uppercase hexadecimal digits. For example, a VT character (ASCII value 11) is represented as =0B and a DEL character (ASCII value 127) is represented as =7F.
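A minimal sketch of the escaping rule just described. It only handles single byte values and plain `=XX` runs; real encoders also deal with line length limits, trailing whitespace and soft line breaks, which this ignores. `escape_byte` and `unescape` are hypothetical helper names used for illustration, checked against the standard-library `quopri` module.

```python
import quopri

def escape_byte(value: int) -> str:
    """Return the quoted-printable escape for one byte value, e.g. 11 -> '=0B'."""
    if not 0 <= value <= 255:
        raise ValueError("a byte value must be in range 0..255")
    # '=' followed by exactly two uppercase hex digits
    return f"={value:02X}"

def unescape(text: str) -> bytes:
    """Turn '=XX' escapes (mixed with plain ASCII) back into raw bytes.

    Simplified: assumes ASCII input and ignores soft line breaks
    ('=' at the end of a line).
    """
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "=":
            # the two characters after '=' are the byte value in hex
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # any other character stands for its own ASCII byte
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

print(escape_byte(11))   # =0B (VT)
print(escape_byte(127))  # =7F (DEL)

sample = "=C4=EE=E1=F0=FB=E9 =E4=E5=ED=FC!"
# The simplified decoder agrees with the standard library on this input
assert unescape(sample) == quopri.decodestring(sample)
```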
UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98% of all web pages, and up to 100.0% for some languages, as of 2022.
Quoted-printable encoding is used where data is mostly US-ASCII text. It allows 8-bit characters to be represented by their hexadecimal values. For instance, a line break can be forced with the string "=0D=0A".
Quoted-Printable, or QP encoding, is a binary-to-text encoding system that uses printable ASCII characters (alphanumeric characters and the equals sign, =) to transmit 8-bit data over a 7-bit data path or, more generally, over a medium that is not 8-bit clean.
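To see the 7-bit property in practice, here is a small sketch using the standard-library `quopri` module. It encodes the greeting from the question as windows-1251 bytes (the charset is an assumption, taken from the answer's own example), checks that every encoded byte is plain ASCII, and shows that `=0D=0A` decodes to a CR LF pair.

```python
import quopri

# 8-bit data: the greeting from the question as windows-1251 bytes
raw = "Добрый день!".encode("windows-1251")

encoded = quopri.encodestring(raw)
print(encoded)  # b'=C4=EE=E1=F0=FB=E9 =E4=E5=ED=FC!'

# Every byte of the encoded form is 7-bit ASCII,
# so it survives a path that is not 8-bit clean
assert all(byte < 128 for byte in encoded)

# Decoding restores the original 8-bit bytes exactly
assert quopri.decodestring(encoded) == raw

# "=0D=0A" is an escaped carriage return followed by a line feed
print(quopri.decodestring(b"=0D=0A"))  # b'\r\n'
```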
The quopri module can convert those escapes back into the raw bytes. You then need to decode those bytes from whatever character set they're in (here windows-1251), and, if you need UTF-8 bytes, encode the resulting string back to UTF-8.
>>> import quopri
>>> b = quopri.decodestring('=C4=EE=E1=F0=FB=E9 =E4=E5=ED=FC')
>>> print(b.decode('windows-1251'))
Добрый день
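The same steps can be wrapped up end to end. This is a sketch, not the answerer's code: `qp_to_utf8` is a hypothetical helper, and its default charset of windows-1251 is an assumption that should really come from the message's Content-Type header. The second half shows an alternative using the standard `email` package, which reads the transfer encoding and charset from the headers for you. In an imaplib session the raw message bytes would typically come from `data[0][1]` after `fetch(num, '(RFC822)')`; here a hard-coded message stands in for that.

```python
import email
import quopri

def qp_to_utf8(body: str, charset: str = "windows-1251") -> bytes:
    """Decode a quoted-printable body and re-encode it as UTF-8 bytes.

    `charset` is an assumption; take it from the Content-Type header.
    """
    raw = quopri.decodestring(body)  # QP text -> raw bytes in `charset`
    text = raw.decode(charset)       # bytes -> str (Unicode)
    return text.encode("utf-8")      # str -> UTF-8 bytes

utf8 = qp_to_utf8("=C4=EE=E1=F0=FB=E9 =E4=E5=ED=FC!")
print(utf8.decode("utf-8"))  # Добрый день!

# Alternative: let the email package read the headers.
# This hard-coded message stands in for bytes fetched with imaplib.
raw_message = (
    b"Content-Type: text/plain; charset=windows-1251\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=C4=EE=E1=F0=FB=E9 =E4=E5=ED=FC!\r\n"
)
msg = email.message_from_bytes(raw_message)
payload = msg.get_payload(decode=True)          # undoes the QP transfer encoding
charset = msg.get_content_charset() or "utf-8"  # charset from the header, lowercased
text = payload.decode(charset)                  # Unicode str; .encode("utf-8") if bytes are needed
print(text.strip())
```

Relying on the headers avoids hard-coding windows-1251, since other messages in the same mailbox may use a different charset.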