
Parsing Multipart emails in python and saving attachments

I am pretty new to Python and I am trying to parse email from Gmail via Python's imaplib and email modules. It is working pretty well, but I am having issues with email attachments.

I would like to parse out all of the plaintext from the email while ignoring any HTML that may be inserted as a secondary content type while also removing and saving all other attachments. I have been trying the following:

...imaplib connection and mailbox selection...

typ, msg_data = c.fetch(num, '(RFC822)')
email_body = msg_data[0][1]
mail = email.message_from_string(email_body)
for part in mail.walk():
    if part.get_content_type() == 'text/plain':
        body = body + '\n' + part.get_payload()
    else:
        continue

This was my original attempt to take just the plaintext portions of an email, but when someone sends an email with a text attachment, the contents of the text file show up in the 'body' variable above.

Can someone tell me how I can extract the plaintext portions of an email while ignoring the secondary HTML that is sometimes present, while also saving all other types of file attachments as files? I apologize if this doesn't make a lot of sense. I will update the question with more clarification if needed.

asked Jun 06 '11 16:06 by ajt



1 Answer

If you just need to keep text attachments out of the body variable with what you have there, it should be as simple as this:

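mail = email.message_from_string(email_body)
for part in mail.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content-Disposition')

    # Keep only plain-text parts that carry no Content-Disposition header;
    # parts marked as attachments (including .txt files) are skipped.
    if c_type == 'text/plain' and c_disp is None:
        body = body + '\n' + part.get_payload()
    else:
        continue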

Then if the Content-Disposition indicates that it's an attachment, you should be able to use part.get_filename() and part.get_payload() to handle the file. I don't know if any of this can vary, but it's basically what I've used in the past to interface with my mail server.
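For the saving step, here is a minimal sketch of how that could look. It assumes Python 3, where imaplib's fetch returns bytes, so it uses email.message_from_bytes instead of message_from_string. The names split_message, build_demo_message, and out_dir are hypothetical, made up for this example. It follows the same rule as above: a text/plain part with no Content-Disposition goes into the body, and any part that has a Content-Disposition and a filename is written to disk. Parts marked "inline" without a filename are simply skipped, which may or may not be what you want.

```python
import email
import os
from email.message import EmailMessage


def split_message(raw_bytes, out_dir):
    """Return (plain-text body, list of saved attachment paths).

    Hypothetical helper: raw_bytes is the RFC822 message as fetched,
    out_dir is an existing directory to write attachments into.
    """
    mail = email.message_from_bytes(raw_bytes)
    body_parts = []
    saved = []
    for part in mail.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        c_disp = part.get('Content-Disposition')
        if c_disp is None:
            if part.get_content_type() == 'text/plain':
                # decode=True undoes base64 / quoted-printable encoding
                payload = part.get_payload(decode=True)
                charset = part.get_content_charset() or 'utf-8'
                body_parts.append(payload.decode(charset, errors='replace'))
            continue  # anything else without a disposition (e.g. HTML) is ignored
        filename = part.get_filename()
        if filename:
            # basename() drops any directory components in the sender's filename
            path = os.path.join(out_dir, os.path.basename(filename))
            with open(path, 'wb') as fh:
                fh.write(part.get_payload(decode=True))
            saved.append(path)
    return '\n'.join(body_parts), saved


def build_demo_message():
    """Build a small test message: text body, HTML alternative, .txt attachment."""
    msg = EmailMessage()
    msg['Subject'] = 'demo'
    msg.set_content('Hello from the body')
    msg.add_alternative('<p>Hello from the body</p>', subtype='html')
    msg.add_attachment('notes in a file', filename='notes.txt')
    return msg.as_bytes()
```

With the question's code, you would pass msg_data[0][1] as raw_bytes. Two attachments with the same filename would overwrite each other here, so real code would probably want to make the names unique.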

answered Sep 18 '22 08:09 by robots.jpg
