Get body text of an email using python imap and email package

I want to retrieve the body (text only) of emails using Python with IMAP and the email package.

As per this SO thread, I'm using the following code:

import email

mail = email.message_from_string(email_body)
bodytext = mail.get_payload()[0].get_payload()

Although it works in some cases, sometimes I get a response similar to the following:

[<email.message.Message instance at 0x0206DCD8>, <email.message.Message instance at 0x0206D508>]
asked May 07 '13 05:05 by biztiger

2 Answers

You are assuming that messages have a uniform structure with one well-defined "main part". That is not the case: a message can consist of a single part that is not text (just an "attachment" of a binary file and nothing else), or it can be a multipart with several textual parts (or, again, none at all), and even if there is only one textual part, it need not be the first part. Furthermore, multiparts can be nested (one or more parts can themselves be MIME messages, recursively).
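To see why indexing with [0] is fragile, it can help to print the MIME tree of a message first. The following is only a sketch: `describe_structure` and the `RAW` sample message are made-up names for illustration and are not part of the email package.

```python
import email

# Hypothetical sample: a multipart/mixed message whose first part is itself
# a multipart/alternative, followed by a binary attachment.
RAW = (
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: multipart/alternative; boundary="YY"\n'
    "\n"
    "--YY\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello in plain text\n"
    "--YY\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello in HTML</p>\n"
    "--YY--\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--XX--\n"
)


def describe_structure(msg, depth=0):
    """Return one indented line per MIME part, outermost part first."""
    lines = ["  " * depth + msg.get_content_type()]
    payload = msg.get_payload()
    # For multipart (and message/rfc822) parts the payload is a list of
    # Message objects; for leaf parts it is a string.
    if isinstance(payload, list):
        for part in payload:
            lines.extend(describe_structure(part, depth + 1))
    return lines


mail = email.message_from_string(RAW)
print("\n".join(describe_structure(mail)))
```

With this sample, `mail.get_payload()[0]` is the multipart/alternative container, so calling `get_payload()` on it returns a list of two Message objects, which is the kind of output shown in the question.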

In so many words, you must inspect the MIME structure, then decide which part(s) are relevant for your application. If you only receive messages from a fairly static, small set of clients, you may be able to cut some corners (at least until the next upgrade of Microsoft Plague hits) but in general, there simply isn't a hierarchy of any kind, just a collection of (not necessarily always directly related) equally important parts.
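One possible way to do that inspection, assuming your application only cares about text/plain parts that are not attachments, is to walk the whole tree and collect every match. This is a sketch under that assumption; `plain_text_parts` and the `RAW` sample are hypothetical names, and the fallback to UTF-8 for parts without a charset is a guess rather than a rule.

```python
import email

# Hypothetical sample: the readable text is nested one level down, and a
# text/plain file is attached as well.
RAW = (
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: multipart/alternative; boundary="YY"\n'
    "\n"
    "--YY\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Hello in plain text\n"
    "--YY\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>Hello in HTML</p>\n"
    "--YY--\n"
    "--XX\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    'Content-Disposition: attachment; filename="notes.txt"\n'
    "\n"
    "Attached notes\n"
    "--XX--\n"
)


def plain_text_parts(msg):
    """Collect the decoded text of every text/plain part that is not an attachment."""
    texts = []
    for part in msg.walk():  # walk() visits the message and all nested parts
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        data = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        texts.append(data.decode(charset, errors="replace"))
    return texts


mail = email.message_from_string(RAW)
for text in plain_text_parts(mail):
    print(text)
```

The function returns a list rather than a single string on purpose, since a message may contain zero, one, or several textual parts. A part that declares an unknown charset name would raise LookupError in `decode`, so real code may want to catch that.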

answered Sep 22 '22 01:09 by tripleee

The main problem in my case was that a replied or forwarded message showed up as a message instance in bodytext.

I solved my problem using the following code:

bodytext = mail.get_payload()[0].get_payload()
# A multipart payload is a list of Message objects; join their string forms.
if isinstance(bodytext, list):
    bodytext = ','.join(str(v) for v in bodytext)
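Note that `str()` on a Message includes its headers, so the joined result is not body text only. On Python 3.6 or later, a possible alternative is to parse with `policy.default` and let `get_body()` pick the text part. This is only a sketch and not the code used above; `first_plain_body` and the `RAW` sample with a forwarded message are hypothetical.

```python
import email
from email import policy

# Hypothetical sample: a short note with the original message forwarded
# as a message/rfc822 part.
RAW = (
    "Subject: Fwd: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "See the forwarded note below.\n"
    "--XX\n"
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: hello\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Original note.\n"
    "--XX--\n"
)


def first_plain_body(raw_text):
    """Return the text of the best text/plain body candidate, or None."""
    # policy.default makes the parser return EmailMessage objects,
    # which provide get_body() and get_content().
    msg = email.message_from_string(raw_text, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return None  # no suitable text/plain part was found
    return body.get_content()


print(first_plain_body(RAW))
```

Here `get_body()` returns the outer note and does not descend into the forwarded message, so the forwarded text would still need separate handling if you want it.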
answered Sep 21 '22 01:09 by biztiger
