How to decode a mime part of a message and get a **unicode** string in Python 2.7?

Here is a method which tries to get the html part of an email message:

from __future__ import absolute_import, division, unicode_literals, print_function

import email

html_mail_quoted_printable=b'''Subject: =?ISO-8859-1?Q?WG=3A_Wasenstra=DFe_84_in_32052_Hold_Stau?=
MIME-Version: 1.0
Content-type: multipart/mixed;
 Boundary="0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253"

--0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: multipart/alternative;
 Boundary="1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253"

--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: quoted-printable

Freundliche Gr=FC=DFe

--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: text/html; charset=ISO-8859-1
Content-Disposition: inline
Content-transfer-encoding: quoted-printable

<html><body>
Freundliche Gr=FC=DFe
</body></html>
--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253--

--0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253--

'''
def get_html_part(msg):
    for part in msg.walk():
        if part.get_content_type() == 'text/html':
            return part.get_payload(decode=True)

msg = email.message_from_string(html_mail_quoted_printable)
html = get_html_part(msg)
print(type(html))
print(html)

Output:

<type 'str'>
<html><body>
Freundliche Gr��e
</body></html>

Unfortunately I get a byte string. I would like to have a unicode string.

According to this answer, msg.get_payload(decode=True) should do the magic, but it does not in this case.

How to decode a mime part of a message and get a unicode string in Python 2.7?

asked Aug 16 '16 09:08 by guettli


1 Answer

Unfortunately I get a byte string. I would like to have a unicode string.

The decode=True parameter to get_payload only undoes the Content-Transfer-Encoding wrapper (here quoted-printable, the =XX escapes in this message). The result is still a byte string encoded in the part's charset. Decoding those bytes into characters is one of the many things the email package makes you do yourself:

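# Undo the Content-Transfer-Encoding; the result is still a byte string.
raw = part.get_payload(decode=True)
# Charset named in the part's Content-Type header, or the given fallback.
charset = part.get_content_charset('iso-8859-1')
# Decode to unicode; undecodable bytes are replaced with U+FFFD.
chars = raw.decode(charset, 'replace')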

(iso-8859-1 being the fallback in case the part specifies no charset.)
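
Put together with the loop from the question, the two steps might look like the sketch below. It is only an illustration: the names get_html_text, message_from_raw and RAW are made up for this example, the sample message is a shortened single-part stand-in for the one in the question, and the LookupError fallback for unknown charset names is an extra assumption that the question did not ask for. The getattr line is there so the same sketch also runs on Python 3, where raw bytes are parsed with email.message_from_bytes.

```python
import email

# Hypothetical, shortened sample: a single text/html part in quoted-printable.
RAW = (b"MIME-Version: 1.0\n"
       b"Content-Type: text/html; charset=ISO-8859-1\n"
       b"Content-Transfer-Encoding: quoted-printable\n"
       b"\n"
       b"<p>Freundliche Gr=FC=DFe</p>\n")


def message_from_raw(raw):
    """Parse a raw byte-string message on Python 2.7 or Python 3."""
    # Python 3 provides message_from_bytes; on 2.7 message_from_string
    # accepts a byte string directly.
    parse = getattr(email, 'message_from_bytes', email.message_from_string)
    return parse(raw)


def get_html_text(msg, fallback_charset='iso-8859-1'):
    """Return the first text/html part as a unicode string, or None."""
    for part in msg.walk():
        if part.get_content_type() == 'text/html':
            # Step 1: undo the Content-Transfer-Encoding (still bytes).
            raw = part.get_payload(decode=True)
            # Step 2: decode the bytes with the charset the part declares.
            charset = part.get_content_charset(fallback_charset)
            try:
                return raw.decode(charset, 'replace')
            except LookupError:
                # Assumption: an unknown charset name falls back as well.
                return raw.decode(fallback_charset, 'replace')
    return None


html = get_html_text(message_from_raw(RAW))
# html is now a text string (unicode on Python 2.7, str on Python 3).
```

Returning None when no text/html part exists mirrors what the question's get_html_part does implicitly; raising an exception instead would be an equally reasonable choice.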

answered Oct 15 '22 09:10 by bobince