I am using imaplib to read Gmail messages in my Python command window. The only problem is that the emails come with newlines and carriage returns. Also, the text does not seem to be formatted correctly: instead of Amount: $36.49, it returns =2436.49. How can I go about cleaning up this text? Thanks!
Sample email content:
r\nItem name: Scanner\r\nItem=23: 130585100869\r\nPurchase Date: Oct 7, 2011\r\nUnit Price: =2436.49 USD\r\nQty: 1\r\nAmount: =2436.49USD\r\nSubtotal: =2436.49 USD\r\nShipping and handling: =240.00 USD\r\nInsurance - not offered
Code:
import imaplib
import libgmail
import re
import email
from BeautifulSoup import BeautifulSoup
USER = '[email protected]'
PASSWORD = 'password'
#connecting to the gmail imap server
imap_server = imaplib.IMAP4_SSL('imap.gmail.com', 993)
imap_server.login(USER, PASSWORD)
imap_server.select('Inbox')
typ, response = imap_server.search(None, '(SUBJECT "payment received")')
Data = []
for i in response[0].split():
    results, data = imap_server.fetch(i, "(RFC822)")
    Data.append(data)
    break
for i in Data:
    print i
The data is in quoted-printable encoding, where each =XX is the hexadecimal code of a character (=24 is $, =40 is @). This little data massager should get you what you want:
text = '''\r\nPurchase Date: Oct 7, 2011\r\nUnit Price: =2436.49 USD\r\nQty: 1\r\nAmount: =2436.49 USD\r\nSubtotal: =2436.49 USD\r\nShipping and handling: =240.00 USD\r\nInsurance - not offered : ----\r\n----------------------------------------------------------------------\r\nTax: --\r\nTotal: =2436.49 USD\r\nPayment: =2436.49 USD\r\nPayment sent to: emailaddress=40gmail.com\r\n----------------------------------------------------------------------\r\n\r\nSincerely,\r\nPayPal\r\n=20\r\n----------------------------------------------------------------------\r\nHelp Center:=20\r\nhttps://www.paypal.com/us/cgi-bin/helpweb?cmd=3D_help\r\nSecurity Center:=20\r\nhttps://www.paypal.com/us/security\r\n\r\nThis email was sent by an automated system, so if you reply, nobody will =\r\nsee it. To get in touch with us, log in to your account and click =\r\n=22Contact Us=22 at the bottom of any page.\r\n\r\n'''
raw_data = text.decode("quopri")  # replace each =XX with the real character (Python 2 only)
data = [map(str.strip, l.split(":")) for l in raw_data.splitlines() if ": " in l]
print data
# [['Purchase Date', 'Oct 7, 2011'], ['Unit Price', '$36.49 USD'], ['Qty', '1'], ['Amount', '$36.49 USD'], ['Subtotal', '$36.49 USD'], ['Shipping and handling', '$0.00 USD'], ['Insurance - not offered', '----'], ['Tax', '--'], ['Total', '$36.49 USD'], ['Payment', '$36.49 USD'], ['Payment sent to', 'emailaddress@gmail.com'], ['Help Center', ''], ['Security Center', '']]
Now your data is in a format that is much easier to process. I hope it helps.
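For readers on Python 3, where byte and text strings no longer have a "quopri" codec on .decode(), the same idea can be sketched with the standard quopri module. This is only a sketch under assumptions: the body is available as bytes, it is UTF-8 compatible once decoded, and the helper name parse_fields is made up for illustration. It also splits on the first colon only, a small deviation from the split(":") used above.

```python
import quopri


def parse_fields(raw_bytes):
    """Decode quoted-printable bytes and return [label, value] pairs.

    Hypothetical helper, not part of any library.
    """
    # decodestring replaces =XX escapes and joins soft line breaks ("=\r\n")
    decoded = quopri.decodestring(raw_bytes).decode("utf-8", errors="replace")
    pairs = []
    for line in decoded.splitlines():
        if ": " in line:
            # Split on the first colon only, so values containing ":" stay whole
            label, _, value = line.partition(":")
            pairs.append([label.strip(), value.strip()])
    return pairs


sample = (
    b"Unit Price: =2436.49 USD\r\n"
    b"Qty: 1\r\n"
    b"Payment sent to: emailaddress=40gmail.com\r\n"
)
fields = parse_fields(sample)
print(fields)
```

The resulting list of pairs can be passed to dict() in the same way as the Python 2 result.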
Edit: to make it even cuter:
>>> cooked = dict(data)
>>> print cooked["Unit Price"]
$36.49 USD
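Another option, instead of decoding the text by hand, is to let the standard email package undo the transfer encoding when it parses the message. The following is a rough sketch for Python 3, assuming the message has a text/plain part; the helper name decoded_body and the sample message are invented for illustration and are not taken from the question.

```python
import email


def decoded_body(raw_message):
    """Return the first text/plain body with its transfer encoding undone.

    Hypothetical helper; raw_message is the full RFC 822 message as bytes.
    """
    msg = email.message_from_bytes(raw_message)
    # walk() yields the message itself and, for multipart mail, every sub-part
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes quoted-printable or base64 and returns bytes
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


# Made-up sample message, only for demonstration
raw = (
    b"Subject: payment received\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Unit Price: =2436.49 USD\r\n"
    b"Amount: =2436.49 USD\r\n"
)
body = decoded_body(raw)
print(body)
```

With the loop from the question, the raw message bytes would typically be found in data[0][1] of the fetch result, though that depends on the exact server response.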