I am reading an email file stored on my machine. I am able to extract the headers of the email, but unable to extract the body.
# The following part works: it opens the file and reads the headers.
import email
from email.parser import HeaderParser
with open(passedArgument1+filename,"r",encoding="ISO-8859-1") as f:
    msg=email.message_from_file(f)
    print('message',msg.as_string())
    parser = HeaderParser()
    h = parser.parsestr(msg.as_string())
    print(h.keys())
# The following snippet raises an error
msgBody=msg.get_body('text/plain')
Is there a proper way to extract only the message body? I am stuck at this point.
For reference the email file can be downloaded from
https://drive.google.com/file/d/0B3XlF206d5UrOW5xZ3FmV3M3Rzg/view
Update
If you are getting the error AttributeError: 'Message' object has no attribute 'get_body', you might want to read what follows.
I did some tests, and it seems the doc is indeed erroneous compared to the current library implementation (July 2017).
What you might be looking for is actually the method get_payload(), which seems to do what you want to achieve:
The conceptual model provided by an EmailMessage object is that of an ordered dictionary of headers coupled with a payload that represents the RFC 5322 body of the message, which might be a list of sub-EmailMessage objects
get_payload() is not in the current (July 2017) documentation, but help() says the following:
get_payload(i=None, decode=False) method of email.message.Message instance
    Return a reference to the payload.

    The payload will either be a list object or a string. If you mutate the list object, you modify the message's payload in place. Optional i returns that index into the payload.

    Optional decode is a flag indicating whether the payload should be decoded or not, according to the Content-Transfer-Encoding header (default is False).

    When True and the message is not a multipart, the payload will be decoded if this header's value is 'quoted-printable' or 'base64'. If some other encoding is used, or the header is missing, or if the payload has bogus data (i.e. bogus base64 or uuencoded data), the payload is returned as-is.

    If the message is a multipart and the decode flag is True, then None is returned.
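As a rough sketch of how the help text above translates into code, the following walks over the parts of a message and decodes the first text/plain part with get_payload(decode=True). The sample message and the helper name extract_plain_text are made up for illustration, and falling back to UTF-8 when no charset is declared is an assumption, not something the library prescribes.

```python
import email

# Hypothetical sample message, used only for illustration.
# The body is "Hello, world!" encoded as base64.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8sIHdvcmxkIQ==\n"
)


def extract_plain_text(msg):
    """Return the first text/plain part as a str, or None if there is none."""
    # walk() yields the message itself and then every sub-part,
    # so this handles both multipart and non-multipart messages.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


# message_from_string uses the compat32 policy by default,
# so msg is a plain email.message.Message here.
msg = email.message_from_string(RAW)
body = extract_plain_text(msg)
print(body)
```

Because decode=True returns None for a multipart container, the loop only decodes leaf parts, which is why walk() is used instead of calling get_payload() on the top-level message directly.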
The 3.6 email lib uses an API that is compatible with Python 3.2 by default and that is what is causing you this problem.
Note the default policy in the declaration below from the docs:
email.message_from_file(fp, _class=None, *, policy=policy.compat32)
If you want to use the "new" API that you see in the 3.6 docs, you have to create the message with a different policy.
import email
from email import policy
...
msg=email.message_from_file(f, policy=policy.default)
This will give you the new API that you see in the docs, which includes the very useful get_body().
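For completeness, here is a small self-contained sketch of the whole flow with policy.default. The message is built in memory (the addresses and text are invented for the example) and then parsed back so that the result does not depend on any particular file. Note that the preferencelist argument of get_body() takes values such as 'plain' and 'html' rather than full MIME types.

```python
import email
from email import policy
from email.message import EmailMessage

# Build a hypothetical multipart/alternative message so the example
# is self-contained; in practice this would come from your file.
original = EmailMessage()
original["From"] = "alice@example.com"
original["To"] = "bob@example.com"
original["Subject"] = "Test"
original.set_content("Hello in plain text.")
original.add_alternative("<p>Hello in HTML.</p>", subtype="html")

# Parsing with policy.default returns an EmailMessage,
# which has get_body(); the compat32 default does not.
msg = email.message_from_string(original.as_string(), policy=policy.default)

# Ask for the plain-text body only; get_body() returns None if no such part exists.
body_part = msg.get_body(preferencelist=("plain",))
body_text = body_part.get_content() if body_part is not None else None
print(body_text)
```

The same policy=policy.default argument can be passed to email.message_from_file(f, ...) when reading from an open file, as in the snippet above.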