 

Extract information from Gmail with Python

I have come up with a solution for extracting useful information from selected emails received in a Gmail mailbox.

The aim in this example is to fetch all emails sent by a newsletter that provides monthly petroleum prices. You can subscribe to such a newsletter for free on the EIA website. All of these newsletters arrive in the same folder of my Gmail mailbox, and their subjects begin with "$".

The content of these emails looks like this:

[Screenshot: a sample newsletter email]

My objective is to write a script that fetches the 10 most recent such emails (the last 10 months) and plots petroleum prices for the different US regions over time.

Asked by kiriloff on Jan 12 '13 at 14:01





1 Answer

The Python email library, together with imaplib, will help.

import email
import email.utils  # for parsedate()
import getpass
import imaplib
import re

import matplotlib.pyplot as plt

This directory is where attachments would be saved (the rest of this example does not use it):

detach_dir = r"F:\OTHERS\CS\PYTHONPROJECTS"  # raw string, so backslashes are not escapes
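The answer sets detach_dir but never writes anything to it. A minimal sketch of how attachments could be saved with the standard email package, assuming msg is an already-parsed message object (for example the one built from the RFC822 fetch below); the helper name save_attachments and the demo message are hypothetical:

```python
import os
import tempfile
from email.message import EmailMessage


def save_attachments(msg, target_dir):
    """Write every attachment of `msg` into `target_dir`; return the paths written."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for part in msg.walk():
        # multipart containers only wrap other parts; they carry no file data
        if part.get_content_maintype() == 'multipart':
            continue
        filename = part.get_filename()
        if not filename:
            continue  # plain text/HTML bodies normally have no filename
        # keep only the base name so a crafted filename cannot escape target_dir
        path = os.path.join(target_dir, os.path.basename(filename))
        with open(path, 'wb') as fh:
            fh.write(part.get_payload(decode=True))
        saved.append(path)
    return saved


# Demo with a locally built message instead of one fetched over IMAP
demo = EmailMessage()
demo['Subject'] = '$ demo newsletter'
demo.set_content('body text')
demo.add_attachment(b'placeholder bytes', maintype='application',
                    subtype='octet-stream', filename='prices.bin')
with tempfile.TemporaryDirectory() as tmp:
    written = save_attachments(demo, tmp)
    print([os.path.basename(p) for p in written])  # → ['prices.bin']
```

Inside the fetch loop, a call such as save_attachments(msg, detach_dir) would then drop each newsletter's attachments into that folder.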

The script then asks the user (or yourself) for the account credentials:

user = input("Enter your Gmail username --> ")  # Python 3; raw_input() in Python 2
pwd = getpass.getpass("Enter your password --> ")  # Gmail may require an app password here

Then connect to the Gmail IMAP server and log in:

m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)

Select one folder (you could use the whole INBOX instead):

m.select("BUSINESS/PETROLEUM")    

Use m.list() to get the names of all mailboxes. Then search for all emails from the specified sender and collect their message ids:

resp, items = m.search(None, '(FROM "[email protected]")')
items = items[0].split()  
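The answer suggests m.list() for discovering folder names. In Python 3, imaplib returns each LIST line as bytes such as b'(\\HasNoChildren) "/" "BUSINESS/PETROLEUM"'. A rough parsing sketch, assuming the hierarchy delimiter is quoted (Gmail uses "/") and ignoring the modified UTF-7 that IMAP uses for non-ASCII names; mailbox_names is a hypothetical helper:

```python
import re

# One LIST line looks like:  (\HasNoChildren) "/" "BUSINESS/PETROLEUM"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" (?P<name>.+)')


def mailbox_names(list_data):
    """Turn the data part of m.list() into a list of mailbox names (str)."""
    names = []
    for line in list_data:
        match = LIST_LINE.match(line)
        if match is None:
            continue  # skip lines this simple pattern does not understand
        # names may or may not be quoted; drop surrounding quotes if present
        name = match.group('name').strip(b'"')
        names.append(name.decode('ascii', errors='replace'))
    return names


print(mailbox_names([b'(\\HasNoChildren) "/" "BUSINESS/PETROLEUM"',
                     b'(\\HasChildren \\Noselect) "/" "[Gmail]"']))
# → ['BUSINESS/PETROLEUM', '[Gmail]']
```

With a live connection this would be used as resp, folders = m.list() followed by print(mailbox_names(folders)).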

my_msg = []  # relevant messages, stored as [message, (year, month, day)]
msg_cnt = 0
break_ = False

I want the most recent emails first, so I iterate over items[::-1]:

for emailid in items[::-1]:

    if break_:
        break

    resp, data = m.fetch(emailid, "(RFC822)")

    for response_part in data:

        if isinstance(response_part, tuple):
            # Python 3: the raw message arrives as bytes
            msg = email.message_from_bytes(response_part[1])
            varSubject = msg['subject']
            varDate = msg['date']

I want only the ones whose subject begins with "$":

            if varSubject and varSubject.startswith('$'):
                r, d = m.fetch(emailid, "(UID BODY[TEXT])")

                # parsedate() returns a 9-tuple; keep (year, month, day)
                ymd = email.utils.parsedate(varDate)[0:3]
                my_msg.append([email.message_from_bytes(d[0][1]), ymd])

                msg_cnt += 1

I want only the last N = 100 messages:

                if msg_cnt == 100:
                    break_ = True
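Two header details can trip up the loop above. A Subject may be MIME-encoded (for example =?utf-8?q?=24...?=), in which case its first character is "=" rather than "$", and email.utils.parsedate() returns None for an unparsable Date, so slicing its result fails. A sketch of more defensive helpers built on the standard email.header and email.utils modules; the names subject_text and header_date and the sample message are illustrative only:

```python
import email
import email.utils
from email.header import decode_header, make_header


def subject_text(raw_subject):
    """Decode a possibly MIME-encoded Subject header into plain text."""
    if raw_subject is None:
        return ''
    return str(make_header(decode_header(raw_subject)))


def header_date(raw_date):
    """Return (year, month, day) from a Date header, or None if it cannot be parsed."""
    parsed = email.utils.parsedate(raw_date) if raw_date else None
    return parsed[0:3] if parsed else None


# A locally built message whose subject is an RFC 2047 encoded word
raw = (b'Subject: =?utf-8?q?=24_Diesel_prices?=\r\n'
       b'Date: Sat, 12 Jan 2013 14:01:00 +0000\r\n'
       b'\r\n'
       b'body\r\n')
msg = email.message_from_bytes(raw)
print(subject_text(msg['subject']).startswith('$'))  # → True
print(header_date(msg['date']))                       # → (2013, 1, 12)
```

In the loop, the test would then become subject_text(varSubject).startswith('$'), and messages for which header_date() returns None could simply be skipped.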

n = len(my_msg)
# one price list per region, with one slot per message
US, EastCst, NewEng, CenAtl, LwrAtl, Midwst, GulfCst, RkyMt, WCst, CA = (
    [0.0] * n for _ in range(10))
absc = list(range(n))
# each ymd is (year, month, day); build "day-month-year" tick labels
dates = ['{2}-{1}-{0}'.format(*msg[1]) for msg in my_msg]
# map each region code found in the emails to its price list
series = {'US': US, 'USA': US, 'EastCst': EastCst, 'NewEng': NewEng,
          'CenAtl': CenAtl, 'LwrAtl': LwrAtl, 'Midwst': Midwst,
          'GulfCst': GulfCst, 'RkyMt': RkyMt, 'WCst': WCst, 'CA': CA}

for cnt, msg in enumerate(my_msg):

    data = str(msg[0]).split("\n")
    for c in [k.split("\r")[0] for k in data[2:-2]]:

Use regular expressions to fetch the relevant information. In the quoted-printable text, "=3D" encodes "=", so the pattern matches lines of the form region=$price:

        match = re.match(r"(.+)(=3D\$)(.+)", c)
        if match is None:
            continue

        country, _, price = match.groups()

        if country in series:
            series[country][cnt] = float(price)
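The literal =3D in the pattern shows that the body is quoted-printable. Since the loop re-fetches only BODY[TEXT], the headers (including Content-Transfer-Encoding) are not available to the parser, which is presumably why the encoded text shows up. An alternative sketch decodes the body of the full RFC822 message with get_payload(decode=True) and then matches the plain "=". It assumes a single-part text message whose lines have the region=$price shape implied by the original regex; parse_prices and the sample numbers are made up for illustration:

```python
import email
import re

# region=$price, e.g. "US=$1.234" (line shape inferred from the answer's regex)
PRICE_LINE = re.compile(r'^\s*(\w+)=\$\s*([0-9.]+)')


def parse_prices(msg):
    """Return {region: price} from a single-part text message."""
    # decode=True undoes the Content-Transfer-Encoding (quoted-printable here)
    body = msg.get_payload(decode=True).decode('utf-8', errors='replace')
    prices = {}
    for line in body.splitlines():
        match = PRICE_LINE.match(line)
        if match:
            prices[match.group(1)] = float(match.group(2))
    return prices


# Made-up sample message; the numbers are placeholders, not EIA data
raw = (b'Subject: $ sample\r\n'
       b'Content-Type: text/plain; charset=us-ascii\r\n'
       b'Content-Transfer-Encoding: quoted-printable\r\n'
       b'\r\n'
       b'US=3D$1.234\r\n'
       b'EastCst=3D$5.678\r\n'
       b'Some other text\r\n')
print(parse_prices(email.message_from_bytes(raw)))  # → {'US': 1.234, 'EastCst': 5.678}
```

Returning a dict keyed by region also removes the need for one hand-written branch per region.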

Plot all these curves together with the US price:

plt.plot(absc, US, label='US')
plt.plot(absc, EastCst, color='#1BE0BF', label='EastCst')
plt.plot(absc, NewEng, color='#251BE0', label='NewEng')
plt.plot(absc, CenAtl, color='#E0771B', label='CenAtl')
plt.plot(absc, LwrAtl, color='#CC1BE0', label='LwrAtl')
plt.plot(absc, Midwst, color='#E01B8B', label='Midwst')
plt.plot(absc, GulfCst, color='#E01B3F', label='GulfCst')
plt.plot(absc, RkyMt, label='RkyMt')
plt.plot(absc, WCst, label='WCst')
plt.plot(absc, CA, label='CA')

plt.legend()  # labels come from label=, so they always match the curves
plt.title('Diesel price')
plt.xticks(absc, dates, rotation=45)
plt.show()
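IMAP sequence numbers normally follow arrival order, so walking items[::-1] makes my_msg[0] the most recent newsletter, and the x axis of the plot then runs from newest (left) to oldest (right). A sketch that reorders the dates and any parallel price lists oldest-first before plotting; chronological is a hypothetical helper and the sample values are placeholders:

```python
def chronological(dates_ymd, *series):
    """Reorder (year, month, day) tuples and parallel lists oldest-first.

    Returns (sorted_dates, [reordered_list_1, reordered_list_2, ...]).
    """
    # indices of dates_ymd sorted by date; tuples compare year, then month, then day
    order = sorted(range(len(dates_ymd)), key=lambda i: dates_ymd[i])
    reordered = [[s[i] for i in order] for s in series]
    return [dates_ymd[i] for i in order], reordered


ymds = [(2013, 1, 5), (2012, 12, 3), (2012, 11, 5)]  # newest first, as in my_msg
dates_sorted, (us_sorted,) = chronological(ymds, [3.0, 2.0, 1.0])
print(dates_sorted)  # → [(2012, 11, 5), (2012, 12, 3), (2013, 1, 5)]
print(us_sorted)     # → [1.0, 2.0, 3.0]
```

With the real data this would be called as chronological([msg[1] for msg in my_msg], US, EastCst, CA) and so on. matplotlib also accepts datetime.date values on the x axis, so [datetime.date(*ymd) for ymd in dates_sorted] could replace the integer abscissa and the manual tick labels.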

Some related topics:

Get only new emails

Fetch mail body

Forward emails with attachment

Fetch body emails in gmail
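For the "get only new emails" case, IMAP SEARCH (RFC 3501) has an UNSEEN key, and SINCE takes a date such as 12-Jan-2013 with an English month abbreviation. A sketch that builds such a criteria string for m.search(None, ...); the helper names imap_date and search_criteria and the sender address are placeholders:

```python
import datetime

# IMAP dates use English month abbreviations regardless of the local locale
_MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')


def imap_date(day):
    """Format a datetime.date as an IMAP date, e.g. 1-Feb-2013."""
    return '{}-{}-{}'.format(day.day, _MONTHS[day.month - 1], day.year)


def search_criteria(sender, since=None, unseen=False):
    """Build an IMAP SEARCH string such as (FROM "x" SINCE 1-Feb-2013 UNSEEN)."""
    keys = ['FROM "{}"'.format(sender)]
    if since is not None:
        keys.append('SINCE ' + imap_date(since))
    if unseen:
        keys.append('UNSEEN')
    return '(' + ' '.join(keys) + ')'


print(search_criteria('newsletter@example.com',
                      since=datetime.date(2013, 1, 12), unseen=True))
# → (FROM "newsletter@example.com" SINCE 12-Jan-2013 UNSEEN)
```

Keep in mind that fetching with (RFC822) or BODY[TEXT], as the answer's loop does, sets the \Seen flag on the server, so UNSEEN stops matching those messages afterwards; fetching with BODY.PEEK[] leaves the flag untouched.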

Here are the results, for three areas only:

[Plot: US prices]

Answered Sep 30 '22 at 18:09 (4 revisions, 3 users)
