I am pretty new to Python and I am trying to parse email from Gmail using Python's imaplib and email modules. It is working pretty well, but I am having issues with email attachments.
I would like to extract all of the plaintext from the email, ignoring any HTML that may be included as an alternative content type, while also removing and saving all other attachments. I have been trying the following:
...imaplib connection and mailbox selection...
    typ, msg_data = c.fetch(num, '(RFC822)')
    email_body = msg_data[0][1]
    mail = email.message_from_string(email_body)
    for part in mail.walk():
        if part.get_content_type() == 'text/plain':
            body = body + '\n' + part.get_payload()
        else:
            continue
This was my original attempt to take just the plaintext portions of an email, but when someone sends an email with a text attachment, the contents of the text file show up in the 'body' variable above.
Can someone tell me how I can extract the plaintext portions of an email while ignoring the secondary HTML that is sometimes present, while also saving all other types of file attachments as files? I apologize if this doesn't make a lot of sense. I will update the question with more clarification if needed.
imaplib is the Python standard library module that implements a client for IMAP, a standard email protocol that stores email messages on a mail server but allows the end user to view and manipulate the messages as though they were stored locally on the end user's computing device(s).
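As a rough illustration of the fetch step under Python 3, the sketch below wraps it in two small helpers. The names `open_inbox` and `fetch_message` are made up for this example, and the host, user, and password arguments are placeholders. The main assumption is that imaplib hands back the raw message as bytes in Python 3, so `email.message_from_bytes` is used instead of `email.message_from_string`.

```python
import email
import imaplib


def open_inbox(host, user, password):
    """Log in over SSL and select INBOX read-only (hypothetical helper)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select('INBOX', readonly=True)
    return conn


def fetch_message(conn, num):
    """Fetch one message by sequence number and parse it (hypothetical helper)."""
    typ, msg_data = conn.fetch(num, '(RFC822)')
    if typ != 'OK':
        raise RuntimeError('fetch failed for message %r' % (num,))
    # msg_data[0] is a (envelope, raw_message) tuple; the raw message is bytes.
    raw_bytes = msg_data[0][1]
    return email.message_from_bytes(raw_bytes)
```

Something like `mail = fetch_message(open_inbox('imap.example.com', user, password), b'1')` would then give a message object that can be walked part by part; the host name here is only a placeholder.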
Create an empty Python file download_attachment.py. Add the following lines to it.
    print 'Proceeding'
    import email
    import getpass
    import imaplib
    import os
    import sys
    userName = '[email protected]'
    passwd = 'yourpassword'
    directory = '/full/path/to/the/directory'
    detach_dir = '. '
    if 'DataFiles' not in os.
If you just need to keep text attachments out of the body variable with what you have there, it should be as simple as this:
    mail = email.message_from_string(email_body)
    for part in mail.walk():
        c_type = part.get_content_type()
        # Attachments carry a Content-Disposition header; plain body parts usually do not.
        c_disp = part.get('Content-Disposition')
        if c_type == 'text/plain' and c_disp is None:
            body = body + '\n' + part.get_payload()
        else:
            continue
Then, if the Content-Disposition indicates that the part is an attachment, you should be able to use part.get_filename() and part.get_payload(decode=True) to handle the file; passing decode=True undoes any base64 or quoted-printable transfer encoding. I don't know if any of this can vary, but it's basically what I've used in the past to interface with my mail server.
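Putting the two halves together, here is a minimal Python 3 sketch of the approach described above, not a definitive implementation. The function name `split_message` and the `save_dir` argument are made up for this example. It assumes the raw message arrives as bytes, and it treats any part that has an attachment disposition or a filename as a file to save, which may be too broad or too narrow for some mail clients.

```python
import os
from email import message_from_bytes


def split_message(raw_bytes, save_dir):
    """Return (body_text, saved_paths) for one raw message (hypothetical helper)."""
    mail = message_from_bytes(raw_bytes)
    body_parts = []
    saved_paths = []
    for part in mail.walk():
        if part.is_multipart():
            continue  # multipart containers have no payload of their own
        disposition = part.get_content_disposition()  # 'attachment', 'inline' or None
        filename = part.get_filename()
        if disposition == 'attachment' or filename:
            # Assumption: anything with a filename is worth saving as a file.
            data = part.get_payload(decode=True) or b''
            # basename() drops any directory components a sender may have included.
            name = os.path.basename(filename or 'attachment.bin')
            path = os.path.join(save_dir, name)
            with open(path, 'wb') as fh:
                fh.write(data)
            saved_paths.append(path)
        elif part.get_content_type() == 'text/plain':
            charset = part.get_content_charset() or 'utf-8'
            text = part.get_payload(decode=True).decode(charset, errors='replace')
            body_parts.append(text)
        # Everything else (for example the text/html alternative) is ignored.
    return '\n'.join(body_parts), saved_paths
```

Calling `body, files = split_message(msg_data[0][1], '/some/dir')` on the fetch result would then give the plaintext body and the list of saved attachment paths. Attachments that share a filename would overwrite each other in this sketch, so a real script would probably want to make the names unique.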