Email body is a string sometimes and a list sometimes. Why?

My application is written in Python. I run a script on each email received by Postfix and do something with the email content. Procmail is responsible for running the script, passing the email as input. The problem started when I converted the input message (possibly plain text) to an email.message.Message object (because the latter comes in handy). I am using email.message_from_string (where email is the standard email module that ships with Python).

import email
message = email.message_from_string(original_mail_content)
message_body = message.get_payload()

This message_body is sometimes a list [email.message.Message instance, email.message.Message instance] and sometimes a string (the actual body content of the incoming email). Why is that? I also noticed something else. While browsing the email.message.Message.get_payload() docstring, I found this:
""" The payload will either be a list object or a string. If you mutate the list object, you modify the message's payload in place....."""

So how do I write a generic method to get the body of an email in Python? Please help me out.

None-da asked Feb 27 '09 12:02


3 Answers

As crazy as it might seem, the reason for the sometimes-string, sometimes-list semantics is given in the documentation. Basically, for a multipart message get_payload() returns a list of Message objects, one per part; for any other message it returns the body as a string.
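To see the two cases side by side, here is a minimal sketch. The two sample messages (PLAIN and MULTI) are made up for illustration and are not from the question; is_multipart() is the standard way to check which kind of payload to expect.

```python
import email

# Hypothetical sample messages, used only for illustration.
PLAIN = "Subject: hi\nContent-Type: text/plain\n\nJust text.\n"
MULTI = (
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "Plain version.\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>HTML version.</p>\n"
    "--XX--\n"
)

plain_msg = email.message_from_string(PLAIN)
multi_msg = email.message_from_string(MULTI)

# is_multipart() tells you which kind of payload get_payload() will return.
for msg in (plain_msg, multi_msg):
    print(msg.is_multipart(), type(msg.get_payload()).__name__)
```

The single-part message gives a str payload, the multipart one gives a list whose items are themselves Message objects.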

unwind answered Oct 14 '22 23:10


Rather than simply looking for a sub-part, use walk() to iterate through the message contents:

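def walkMsg(msg):
  # Yield the decoded payload of every non-container part.
  for part in msg.walk():
    # Skip every multipart container, not only multipart/alternative;
    # get_payload(decode=True) returns None for containers anyway.
    if part.is_multipart():
      continue
    yield part.get_payload(decode=True)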

The walk() method returns an iterator that you can loop with (i.e. it's a generator). If the message is not a container of parts (i.e. has no attachments or alternates), the walk() method will then return an iterator with a single element - the message itself.
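A quick way to get a feel for walk() is to print the content type of everything it visits. This is only a sketch: the nested message RAW and the one-part message `single` are made-up samples, not taken from the question.

```python
import email

# Hypothetical nested message: a multipart/mixed holding a
# multipart/alternative and one binary attachment.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="OUTER"\n'
    "\n"
    "--OUTER\n"
    'Content-Type: multipart/alternative; boundary="INNER"\n'
    "\n"
    "--INNER\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain\n"
    "--INNER\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html</p>\n"
    "--INNER--\n"
    "\n"
    "--OUTER\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--OUTER--\n"
)

msg = email.message_from_string(RAW)
# walk() visits the container first, then descends into its parts.
types = [part.get_content_type() for part in msg.walk()]
print(types)

# A message with no sub-parts yields just itself.
single = email.message_from_string("Content-Type: text/plain\n\nhello\n")
single_parts = list(single.walk())
print(len(single_parts), single_parts[0] is single)
```

Note that the containers themselves show up in the walk, which is why they need to be filtered out before reading payloads.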

You want to skip any 'multipart' parts as they are just glue.

The above method yields the decoded payload of every non-multipart part, attachments included. You may want to narrow this down to the text parts only if those contain the info you are seeking.
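Restricting the walk to text parts could look something like this. It is a sketch written for Python 3, where get_payload(decode=True) returns bytes; the name iter_text_parts, the UTF-8 fallback and the SAMPLE message are my own assumptions for illustration.

```python
import email

def iter_text_parts(msg):
    """Yield the decoded text of every text/* leaf part in msg."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        if part.get_content_maintype() != "text":
            continue  # skip images, binaries and other attachments
        raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when no usable charset is declared.
        charset = part.get_content_charset() or "utf-8"
        try:
            yield raw.decode(charset, errors="replace")
        except LookupError:  # charset name unknown to Python
            yield raw.decode("utf-8", errors="replace")

# Hypothetical sample: one base64-encoded text part plus a binary attachment.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="B"\n'
    "\n"
    "--B\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
    "--B\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--B--\n"
)

texts = list(iter_text_parts(email.message_from_string(SAMPLE)))
print(texts)
```

This still returns both text/plain and text/html bodies when a message has both, so you may want to filter on get_content_type() == "text/plain" if you only need one of them.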

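Note that as of Python 2.5, the previously deprecated methods get_type(), get_main_type(), and get_subtype() have been removed; use get_content_type(), get_content_maintype(), and get_content_subtype() instead. See http://docs.python.org/library/email.message.html#email.message.Message.walk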

timbo answered Oct 14 '22 22:10


Well, the other answers are correct and you should read the docs, but here is an example of a generic way:

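def get_first_text_part(msg):
    # Returns the first top-level text payload, or None if there is none.
    maintype = msg.get_content_maintype()
    if maintype == 'multipart':
        # Only direct children are checked; nested multiparts are not searched.
        for part in msg.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return msg.get_payload()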

This is prone to some disaster: the parts themselves might be multipart, in which case any text nested inside them is missed, and it only returns the first text part, which might not be the one you want. Still, you can play with it.
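One way around the nesting problem is to recurse into the sub-parts. The following is a sketch rather than a drop-in fix: first_text_part and the NESTED sample are names I made up, and like the function above it returns the payload without undoing any transfer encoding.

```python
import email

def first_text_part(msg):
    """Return the first text/* payload found depth-first, or None."""
    if msg.is_multipart():
        for part in msg.get_payload():
            found = first_text_part(part)  # recurse into nested containers
            if found is not None:
                return found
        return None
    if msg.get_content_maintype() == "text":
        return msg.get_payload()
    return None

# Hypothetical sample: the text lives two levels down.
NESTED = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="OUTER"\n'
    "\n"
    "--OUTER\n"
    'Content-Type: multipart/alternative; boundary="INNER"\n'
    "\n"
    "--INNER\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    "--INNER\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    "--INNER--\n"
    "\n"
    "--OUTER--\n"
)

body = first_text_part(email.message_from_string(NESTED))
print(repr(body))
```

With the non-recursive version this message would give None, because the only direct child of the outer container is itself a multipart.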

Ali Afshar answered Oct 15 '22 00:10