
Unable to extract the body of the email file in python

Tags:

python

I am reading an email file stored on my machine. I am able to extract the headers of the email, but unable to extract the body.

    # The following part works: it opens the file and reads the headers.

    import email
    from email.parser import HeaderParser

    with open(passedArgument1 + filename, "r", encoding="ISO-8859-1") as f:
        msg = email.message_from_file(f)
        print('message', msg.as_string())
        parser = HeaderParser()
        h = parser.parsestr(msg.as_string())
        print(h.keys())

        # The following line raises an error
        msgBody = msg.get_body('text/plain')

Is there a proper way to extract only the message body? I am stuck at this point.

For reference, the email file can be downloaded from:

https://drive.google.com/file/d/0B3XlF206d5UrOW5xZ3FmV3M3Rzg/view

Sumanth asked Jul 16 '17 01:07


2 Answers

Update

If you are getting the error AttributeError: 'Message' object has no attribute 'get_body', you might want to read what follows.

I ran some tests, and the documentation does indeed describe an API that differs from what the library returns by default (as of July 2017).

What you are probably looking for is the get_payload() method, which seems to do what you want to achieve. The documentation describes the payload as follows:

The conceptual model provided by an EmailMessage object is that of an ordered dictionary of headers coupled with a payload that represents the RFC 5322 body of the message, which might be a list of sub-EmailMessage objects

get_payload() is not in the current (July 2017) documentation, but help() says the following:

get_payload(i=None, decode=False) method of email.message.Message instance
  Return a reference to the payload.

The payload will either be a list object or a string. If you mutate the list object, you modify the message's payload in place. Optional i returns that index into the payload.

Optional decode is a flag indicating whether the payload should be decoded or not, according to the Content-Transfer-Encoding header (default is False).

When True and the message is not a multipart, the payload will be decoded if this header's value is 'quoted-printable' or 'base64'. If some other encoding is used, or the header is missing, or if the payload has bogus data (i.e. bogus base64 or uuencoded data), the payload is returned as-is.

If the message is a multipart and the decode flag is True, then None is returned.
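
As a rough sketch of how the help() text above translates into code, the following walks a message and decodes the first text/plain part with get_payload(decode=True). The sample message RAW and the helper name extract_plain_text are made up for illustration, and the fallback to UTF-8 when no charset is declared is an assumption, not something the library prescribes.

```python
import email

# Hypothetical sample message; it stands in for the file from the question.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8gd29ybGQ=\n"
)


def extract_plain_text(msg):
    """Return the first text/plain part of msg as a str, or None if absent."""
    # walk() yields the message itself and then every nested sub-part,
    # so the same loop handles single-part and multipart messages.
    for part in msg.walk():
        if part.is_multipart():
            # Containers only hold other parts; with decode=True their
            # payload would be None, as the help() text says.
            continue
        if part.get_content_type() == "text/plain":
            # decode=True undoes base64 / quoted-printable and returns bytes.
            raw_bytes = part.get_payload(decode=True)
            # Assumption: fall back to UTF-8 when no charset is declared.
            charset = part.get_content_charset() or "utf-8"
            return raw_bytes.decode(charset, errors="replace")
    return None


# Parsed with the default (compat32) policy, as in the question.
msg = email.message_from_string(RAW)
body = extract_plain_text(msg)
print(body)
```

The same helper should work on a message read with email.message_from_file(f), since both functions return the same kind of object under the default policy.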

Fabien answered Oct 21 '22 14:10


By default, the email library in Python 3.6 uses an API that is compatible with Python 3.2, and that is what is causing this problem.

Note the default policy in the declaration below from the docs:

email.message_from_file(fp, _class=None, *, policy=policy.compat32)

If you want to use the "new" API that you see in the 3.6 docs, you have to create the message with a different policy.

    import email
    from email import policy
    ...
    msg = email.message_from_file(f, policy=policy.default)

This gives you the new API described in the docs, which includes the very useful get_body() method.
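
To make this concrete, here is a minimal self-contained sketch of get_body() under policy.default. The sample message RAW is invented for illustration and is parsed from a string instead of the file in the question. Note that get_body() is documented as taking a preferencelist drawn from 'related', 'html' and 'plain', not a MIME type string, so the sketch passes ("plain",).

```python
import email
from email import policy

# Hypothetical multipart message with a plain-text and an HTML alternative.
RAW = (
    "From: alice@example.com\n"
    "Subject: Test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Plain text body\n"
    "--BOUNDARY\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>HTML body</p>\n"
    "--BOUNDARY--\n"
)

# policy.default makes the parser build an EmailMessage, which has get_body().
msg = email.message_from_string(RAW, policy=policy.default)

# Ask only for the plain-text candidate; get_body() returns None if there is none.
plain_part = msg.get_body(preferencelist=("plain",))

# get_content() decodes the part according to its transfer encoding and charset.
text = plain_part.get_content() if plain_part is not None else None
print(text)
```

For the file in the question, the same two calls should apply after replacing message_from_string(RAW, ...) with email.message_from_file(f, policy=policy.default).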

Arthur Cinader answered Oct 21 '22 12:10