Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Download a csv file from gmail using python

Tags:

python

csv

gmail

I tried different python scripts for download a CSV attachment from Gmail. But I could not able to get it.Is this possible. If it is possible which python script should I use? Thank you.

like image 891
Kanishka Avatar asked Nov 02 '25 01:11

Kanishka


2 Answers

TL;DR

  • I've put together a Github repo that makes getting CSV data from Gmail as simple as:

    from gmail import *
    service = get_gmail_service()
    
    # get all attachments from e-mails containing 'test'
    search_query = "test"
    service = get_gmail_service()
    csv_dfs = query_for_csv_attachments(service, search_query)
    print(csv_dfs)
    
  • Follow the instructions in the README and feel free to contribute!

THE LONG ANSWER (directly using google-api-python-client and oauth2client)

  • Follow this link and click the button: "ENABLE THE GMAIL API". After the setup you will download a file called credentials.json.
  • Install the needed Python packages:

    pip install --upgrade google-api-python-client oauth2client
    
  • The following code will allow you to connect to your Gmail account via Python:

    from googleapiclient.discovery import build
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    GMAIL_CREDENTIALS_PATH = 'credentials.json' # downloaded
    GMAIL_TOKEN_PATH = 'token.json' # this will be created
    
    store = file.Storage(GMAIL_TOKEN_PATH)
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(GMAIL_CREDENTIALS_PATH, SCOPES)
        creds = tools.run_flow(flow, store)
    service = build('gmail', 'v1', http=creds.authorize(Http()))
    
  • With this service you can read your emails and any attachments.

  • First you can query your e-mails with a search string to find the e-mail id's that have the attachments:

    search_query = "ABCD"
    result = service.users().messages().list(userId='me', q=search_query).execute()
    msgs = results['messages']
    msg_ids = [msg['id'] for msg in msgs]
    
  • For each messageId you can find the associated attachments in the email.

  • This part is a little messy so bear with me. First we obtain a list of "attachment parts" (and attachment filenames). These are components of the email that contain attachments:

    messageId = 'XYZ'
    msg = service.users().messages().get(userId='me', id=messageId).execute()
    parts = msg.get('payload').get('parts')
    all_parts = []
    for p in parts:
        if p.get('parts'):
            all_parts.extend(p.get('parts'))
        else:
            all_parts.append(p)
    
    att_parts = [p for p in all_parts if p['mimeType']=='text/csv']
    filenames = [p['filename'] for p in att_parts]
    
  • Now we can obtain the attached CSV from each part:

    messageId = 'XYZ'
    data = part['body'].get('data')
    attachmentId = part['body'].get('attachmentId')
    if not data:
        att = service.users().messages().attachments().get(
                userId='me', id=attachmentId, messageId=messageId).execute()
        data = att['data']
    
  • Now you have the CSV data but it's in an encoded format, so we change the encoding and convert the result into a Pandas dataframe:

    import base64
    import pandas as pd
    from StringIO import StringIO
    str_csv  = base64.urlsafe_b64decode(data.encode('UTF-8'))
    df = pd.read_csv(StringIO(str_csv))
    
  • That's it! you have a Pandas dataframe with the contents of the CSV attachment. You can work with this dataframe or write it to disk with pd.DataFrame.to_csv if you simply want to download it. You can use the list of filenames obtained earlier if you want to preserve the filename.

like image 75
robertwest Avatar answered Nov 04 '25 16:11

robertwest


I got it. This is not my own work. I got some codes, combined them and modified to this code. However, finally, it worked.

print 'Proceeding'

import email
import getpass
import imaplib
import os
import sys

userName = '[email protected]'
passwd = 'yourpassword'
directory = '/full/path/to/the/directory'


detach_dir = '.'
if 'DataFiles' not in os.listdir(detach_dir):
    os.mkdir('DataFiles')



try:
    imapSession = imaplib.IMAP4_SSL('imap.gmail.com')
    typ, accountDetails = imapSession.login(userName, passwd)
    if typ != 'OK':
        print 'Not able to sign in!'
        raise

    imapSession.select('[Gmail]/All Mail')
    typ, data = imapSession.search(None, 'ALL')
    if typ != 'OK':
        print 'Error searching Inbox.'
        raise


    for msgId in data[0].split():
        typ, messageParts = imapSession.fetch(msgId, '(RFC822)')
        if typ != 'OK':
            print 'Error fetching mail.'
            raise

        emailBody = messageParts[0][1]
        mail = email.message_from_string(emailBody)
        for part in mail.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            fileName = part.get_filename()

            if bool(fileName):
                filePath = os.path.join(detach_dir, 'DataFiles', fileName)
                if not os.path.isfile(filePath) :
                    print fileName
                    fp = open(filePath, 'wb')
                    fp.write(part.get_payload(decode=True))
                    fp.close()
    imapSession.close()
    imapSession.logout()

    print 'Done'


except :
    print 'Not able to download all attachments.'
like image 22
Kanishka Avatar answered Nov 04 '25 16:11

Kanishka