Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I download all emails with attachments from Gmail?

Hard one :-)

import email, getpass, imaplib, os

detach_dir = '.' # directory where to save attachments (default: current)
user = raw_input("Enter your GMail username:")
pwd = getpass.getpass("Enter your password: ")

# connecting to the gmail imap server
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,pwd)
m.select("[Gmail]/All Mail") # here you a can choose a mail box like INBOX instead
# use m.list() to get all the mailboxes

resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "`(RFC822)`" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object

    #Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue

    print "["+mail["From"]+"] :" + mail["Subject"]

    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headach
    for part in mail.walk():
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        counter = 1

        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, 'bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        #Check if its already there
        if not os.path.isfile(att_path) :
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()

Wowww! That was something. ;-) But try the same in Java, just for fun!

By the way, I tested that in a shell, so some errors likely remain.

Enjoy

EDIT:

Because mail-box names can change from one country to another, I recommend doing m.list() and picking an item in it before m.select("the mailbox name") to avoid this error:

imaplib.error: command SEARCH illegal in state AUTH, only allowed in states SELECTED


I'm not an expert on Perl, but what I do know is that GMail supports IMAP and POP3, 2 protocols that are completely standard and allow you to do just that.

Maybe that helps you to get started.


#!/usr/bin/env python
"""Save all attachments for given gmail account."""
import os, sys
from libgmail import GmailAccount

ga = GmailAccount("[email protected]", "pA$$w0Rd_")
ga.login()

# folders: inbox, starred, all, drafts, sent, spam
for thread in ga.getMessagesByFolder('all', allPages=True):
    for msg in thread:
        sys.stdout.write('.')
        if msg.attachments:
           print "\n", msg.id, msg.number, msg.subject, msg.sender
           for att in msg.attachments:
               if att.filename and att.content:
                  attdir = os.path.join(thread.id, msg.id)
                  if not os.path.isdir(attdir):
                     os.makedirs(attdir)                
                  with open(os.path.join(attdir, att.filename), 'wb') as f:
                       f.write(att.content)

untested

  1. Make sure TOS allows such scripts otherwise you account will be suspended
  2. There might be better options: GMail offline mode, Thunderbird + ExtractExtensions, GmailFS, Gmail Drive, etc.

Take a look at Mail::Webmail::Gmail:

GETTING ATTACHMENTS

There are two ways to get an attachment:

1 -> By sending a reference to a specific attachment returned by get_indv_email

# Creates an array of references to every attachment in your account
my $messages = $gmail->get_messages();
my @attachments;

foreach ( @{ $messages } ) {
    my $email = $gmail->get_indv_email( msg => $_ );
    if ( defined( $email->{ $_->{ 'id' } }->{ 'attachments' } ) ) {
        foreach ( @{ $email->{ $_->{ 'id' } }->{ 'attachments' } } ) {
            push( @attachments, $gmail->get_attachment( attachment => $_ ) );
            if ( $gmail->error() ) {
                print $gmail->error_msg();
            }
        }
    }
}

2 -> Or by sending the attachment ID and message ID

#retrieve specific attachment
my $msgid = 'F000000000';
my $attachid = '0.1';
my $attach_ref = $gmail->get_attachment( attid => $attachid, msgid => $msgid );

( Returns a reference to a scalar that holds the data from the attachment. )


Within gmail, you can filter on "has:attachment", use it to identify the messages you should be getting when testing. Note this appears to give both messages with attached files (paperclip icon shown), as well as inline attached images (no paperclip shown).

There is no Gmail API, so IMAP or POP are your only real options. The JavaMail API may be of some assistance as well as this very terse article on downloading attachments from IMAP using Perl. Some previous questions here on SO may also help.

This PHP example may help too. Unfortunately from what I can see, there is no attachment information contained within the imap_header, so downloading the body is required to be able to see the X-Attachment-Id field. (someone please prove me wrong).