Given an RFC822 message in Python 2.6, how can I get the right text/plain content part? Basically, the algorithm I want is this:
message = email.message_from_string(raw_message)
if has_mime_part(message, "text/plain"):
    mime_part = get_mime_part(message, "text/plain")
    text_content = decode_mime_part(mime_part)
elif has_mime_part(message, "text/html"):
    mime_part = get_mime_part(message, "text/html")
    html = decode_mime_part(mime_part)
    text_content = render_html_to_plaintext(html)
else:
    # fallback
    text_content = str(message)
return text_content
Of these things, I have get_mime_part and has_mime_part down pat, but I'm not quite sure how to get the decoded text from the MIME part. I can get the encoded text using get_payload(), but if I try to use the decode parameter of the get_payload() method (see the doc) I get an error when I call it on the text/plain part:
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/message.py", line 189, in get_payload
    raise TypeError('Expected list, got %s' % type(self._payload))
TypeError: Expected list, got <type 'str'>
In addition, I don't know how to take HTML and render it to text as closely as possible.
In a multipart e-mail, email.message.Message.get_payload() returns a list with one item for each part. The easiest way is to walk the message and get the payload on each part:
import email

msg = email.message_from_string(raw_message)
for part in msg.walk():
    # Each part is either non-multipart or another multipart message
    # that contains further parts; the message is organized like a tree.
    if part.get_content_type() == 'text/plain':
        print(part.get_payload())  # prints the raw text
For a non-multipart message, no need to do all the walking. You can go straight to get_payload(), regardless of content_type.
msg = email.message_from_string(raw_message)
msg.get_payload()
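Walking does no harm on a single-part message either: walk() yields the message itself first, so one loop can cover both cases. A small sketch with a made-up message, only to illustrate that behaviour:

```python
import email

# Hypothetical single-part message, used only for illustration.
msg = email.message_from_string("Subject: hi\n\njust text\n")

# For a non-multipart message, walk() yields exactly one item: the message.
parts = list(msg.walk())

# With no arguments, get_payload() returns the body as stored (not decoded).
body = msg.get_payload()
```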
If the content is encoded, you need to pass None as the first parameter to get_payload(), followed by True (the decode flag is the second parameter). For example, suppose that my e-mail contains an MS Word document attachment:
msg = email.message_from_string(raw_message)
for part in msg.walk():
    if part.get_content_type() == 'application/msword':
        name = part.get_param('name') or 'MyDoc.doc'
        with open(name, 'wb') as f:
            # You need None as the first param
            # because part.is_multipart() is False
            f.write(part.get_payload(None, True))
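The traceback in the question is consistent with True being passed as the first positional argument, which get_payload() treats as the part index rather than the decode flag. Here is a minimal sketch of the difference, assuming Python 3's email package (where the decoded payload comes back as bytes) and a made-up base64 message:

```python
import email

# Hypothetical single-part message with a base64-encoded body.
raw_message = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8gd29ybGQ=\n"
)
msg = email.message_from_string(raw_message)

# A positional True lands in the index parameter `i`, not in `decode`,
# so a non-multipart message raises TypeError.
try:
    msg.get_payload(True)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# Passing decode by keyword leaves the index as None and undoes the base64.
decoded = msg.get_payload(decode=True)
```

Writing get_payload(decode=True) is equivalent to get_payload(None, True) and is harder to get wrong.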
As for getting a reasonable plain-text approximation of an HTML part, I've found that html2text works pretty darn well.
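Putting the pieces together, the algorithm from the question could look roughly like the sketch below. It assumes Python 3, falls back to UTF-8 when a part declares no charset (my assumption, not something the email package mandates), and uses a crude tag stripper built on the standard library's HTMLParser as a stand-in for html2text so the example has no third-party dependency. The helper names follow the question's pseudocode; get_text_content and _TextExtractor are names I made up.

```python
import email
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Crude stand-in for html2text: keeps text nodes, drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def render_html_to_plaintext(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.chunks).split())


def decode_mime_part(part):
    """Undo the transfer encoding, then decode bytes with the part's charset."""
    data = part.get_payload(decode=True)
    charset = part.get_content_charset() or "utf-8"  # assumed default
    return data.decode(charset, errors="replace")


def get_mime_part(message, content_type):
    """Return the first non-multipart part of the given type, or None."""
    for part in message.walk():
        if part.get_content_type() == content_type and not part.is_multipart():
            return part
    return None


def get_text_content(raw_message):
    message = email.message_from_string(raw_message)
    part = get_mime_part(message, "text/plain")
    if part is not None:
        return decode_mime_part(part)
    part = get_mime_part(message, "text/html")
    if part is not None:
        return render_html_to_plaintext(decode_mime_part(part))
    return str(message)  # fallback, as in the question
```

If html2text is installed, render_html_to_plaintext can simply delegate to it; the stripper above ignores links, lists, and other structure.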
Flat is better than nested ;)
# message_from_string() returns a plain email.message.Message, so check
# is_multipart() rather than isinstance(msg, MIMEMultipart).
assert msg.is_multipart()
for text in [k.get_payload() for k in msg.walk() if k.get_content_type() == 'text/plain']:
    print(text)